Non-interleaved vertex buffers in DirectX11

2023-09-07 22:01:21 Author: 穷极一生

If my vertex positions are shared, but my normals and UVs are not (to preserve hard edges and the like), is it possible to use non-interleaved buffers in DirectX11 to solve this memory representation, so that I could use an index buffer with it? Or should I stick with duplicated vertex positions in an interleaved buffer?

And are there any performance concerns between interleaved and non-interleaved vertex buffers? Thank you!

Recommended answer

How to

There are several ways. I'll describe the simplest one.

Just create separate vertex buffers:

ID3D11Buffer* positions;
ID3D11Buffer* texcoords;
ID3D11Buffer* normals;
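A minimal CPU-side sketch of the data behind those three buffers, assuming plain 32-bit float attributes. The Float3/Float2 structs, SeparateStreams and the helper functions are hypothetical names, not part of D3D11. Each std::vector would be uploaded into its own ID3D11Buffer, and byteWidth gives the size you would request in D3D11_BUFFER_DESC::ByteWidth.

```cpp
#include <cassert>
#include <vector>

// CPU-side mirrors of HLSL float3 / float2 (hypothetical helper types).
struct Float3 { float x, y, z; };
struct Float2 { float u, v; };
static_assert(sizeof(Float3) == 12, "float3 must be tightly packed");
static_assert(sizeof(Float2) == 8, "float2 must be tightly packed");

// One std::vector per attribute: each backs its own ID3D11Buffer.
struct SeparateStreams {
    std::vector<Float3> positions;  // goes to slot 0
    std::vector<Float2> texcoords;  // goes to slot 1
    std::vector<Float3> normals;    // goes to slot 2
};

// Bytes to request for one stream's buffer.
template <typename T>
unsigned int byteWidth(const std::vector<T>& stream) {
    return static_cast<unsigned int>(stream.size() * sizeof(T));
}

// All streams must have the same element count: vertex i is built from
// element i of every stream.
bool streamsConsistent(const SeparateStreams& s) {
    return s.positions.size() == s.texcoords.size() &&
           s.positions.size() == s.normals.size();
}
```

For a four-vertex quad this gives buffers of 48, 32 and 48 bytes, matching strides of 12, 8 and 12.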

Create the input layout elements, incrementing the InputSlot member for each component:

{ "POSITION",  0,  DXGI_FORMAT_R32G32B32_FLOAT,  0, 0,                            D3D11_INPUT_PER_VERTEX_DATA, 0 },
{ "TEXCOORD",  0,  DXGI_FORMAT_R32G32_FLOAT,     1, D3D11_APPEND_ALIGNED_ELEMENT, D3D11_INPUT_PER_VERTEX_DATA, 0 },
{ "NORMAL",    0,  DXGI_FORMAT_R32G32B32_FLOAT,  2, D3D11_APPEND_ALIGNED_ELEMENT, D3D11_INPUT_PER_VERTEX_DATA, 0 },
                                             //  ^
                                             // InputSlot
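A sketch of how D3D11_APPEND_ALIGNED_ELEMENT resolves in this layout, as we understand the documented behaviour: the running offset is tracked per input slot, so an element that is the first in its slot lands at offset 0 of its own buffer. ElementDesc and resolveOffsets are hypothetical stand-ins for D3D11_INPUT_ELEMENT_DESC and the runtime's logic, and the sketch ignores any alignment padding.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <vector>

// Same numeric value as D3D11_APPEND_ALIGNED_ELEMENT in d3d11.h.
constexpr std::uint32_t kAppendAligned = 0xffffffffu;

// Hypothetical, trimmed-down stand-in for D3D11_INPUT_ELEMENT_DESC.
struct ElementDesc {
    const char* semanticName;
    std::uint32_t sizeBytes;          // e.g. 12 for DXGI_FORMAT_R32G32B32_FLOAT
    std::uint32_t inputSlot;
    std::uint32_t alignedByteOffset;  // explicit offset or kAppendAligned
};

// Resolve "append aligned" offsets with a separate running offset per slot.
// Assumes element sizes are multiples of 4, so no extra padding is inserted.
std::vector<std::uint32_t> resolveOffsets(const std::vector<ElementDesc>& layout) {
    std::map<std::uint32_t, std::uint32_t> nextOffset;  // slot -> next free byte (starts at 0)
    std::vector<std::uint32_t> result;
    for (const ElementDesc& e : layout) {
        std::uint32_t offset = (e.alignedByteOffset == kAppendAligned)
                                   ? nextOffset[e.inputSlot]
                                   : e.alignedByteOffset;
        result.push_back(offset);
        nextOffset[e.inputSlot] = offset + e.sizeBytes;
    }
    return result;
}
```

With the three elements in slots 0, 1 and 2, every offset resolves to 0; putting them all in slot 0 instead yields 0, 12 and 20, the familiar interleaved layout.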

Bind the buffers to their slots (preferably all in one call):

ID3D11Buffer* vbs[] = { positions, texcoords, normals };
unsigned int strides[] = { 12, 8, 12 };  // bytes per element: float3, float2, float3
unsigned int offsets[] = { 0, 0, 0 };    // start reading at the beginning of each buffer
m_Context->IASetVertexBuffers(0, 3, vbs, strides, offsets);

Draw as usual. You don't need to change the HLSL code; the shader sees the same input as if it came from a single buffer.
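A conceptual model of what "draw as usual" means here, which matters for the original question. For per-vertex data, DrawIndexed reads one index from the index buffer and uses that same index to fetch from every bound vertex buffer slot. Separate buffers therefore do not let positions be indexed independently of normals and UVs: each stream still needs one element per unique corner. The sketch below emulates that fetch on the CPU; assemble and AssembledVertex are illustrative names, not D3D11 API, and this is a simplification rather than how the hardware is built.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

struct Float3 { float x, y, z; };
struct Float2 { float u, v; };

// What the vertex shader receives for one vertex (mirrors the HLSL input struct).
struct AssembledVertex { Float3 position; Float2 texcoord; Float3 normal; };

// Indexed draw with three per-vertex slots: the SAME index selects the
// element from every slot.
std::vector<AssembledVertex> assemble(const std::vector<std::uint32_t>& indices,
                                      const std::vector<Float3>& positions,  // slot 0
                                      const std::vector<Float2>& texcoords,  // slot 1
                                      const std::vector<Float3>& normals) {  // slot 2
    std::vector<AssembledVertex> out;
    out.reserve(indices.size());
    for (std::uint32_t i : indices) {
        out.push_back({positions.at(i), texcoords.at(i), normals.at(i)});
    }
    return out;
}
```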

Note that these code snippets were written on the fly and may contain mistakes.

Edit: you can improve this approach by grouping buffers by update rate: if texcoords and normals never change, merge them into one buffer.
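A sketch of that split, assuming positions are rewritten often while texcoords and normals are fixed after load (StaticAttribs and mergeStatic are hypothetical names). Positions stay alone in slot 0 with stride 12; texcoords and normals are interleaved into one static buffer in slot 1 with stride 20. The input layout would then put TEXCOORD at slot 1, offset 0, and NORMAL at slot 1 with D3D11_APPEND_ALIGNED_ELEMENT (resolving to 8), and the merged buffer could be created with D3D11_USAGE_IMMUTABLE.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

struct Float3 { float x, y, z; };
struct Float2 { float u, v; };

// Attributes that never change after load, interleaved for slot 1.
struct StaticAttribs {
    Float2 texcoord;  // TEXCOORD, offset 0
    Float3 normal;    // NORMAL,   offset 8
};
static_assert(sizeof(StaticAttribs) == 20, "expect tightly packed float2 + float3");

// Interleave the two static streams; positions are left in their own buffer.
std::vector<StaticAttribs> mergeStatic(const std::vector<Float2>& texcoords,
                                       const std::vector<Float3>& normals) {
    assert(texcoords.size() == normals.size());
    std::vector<StaticAttribs> out(texcoords.size());
    for (std::size_t i = 0; i < out.size(); ++i) {
        out[i] = {texcoords[i], normals[i]};
    }
    return out;
}
```

The strides array for IASetVertexBuffers would then be { 12, 20 } for two slots.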

It is all about locality of reference: the closer together the data, the faster the access.

An interleaved buffer, in most cases, gives (by far) better performance on the GPU side (i.e. rendering): all attributes of each vertex sit next to each other. Separate buffers, however, give faster CPU access: each attribute array is contiguous, so each element sits right after the previous one.
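To make the locality argument concrete, here is a sketch of both layouts (struct and constant names are illustrative assumptions). The static_asserts pin down the 32-byte interleaved stride and the TEXCOORD/NORMAL offsets; in the separate layout, consecutive positions are only 12 bytes apart, which is what CPU loops over a single attribute benefit from.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

struct Float3 { float x, y, z; };
struct Float2 { float u, v; };

// Interleaved (array of structures): one buffer, each vertex's attributes adjacent.
struct Vertex {
    Float3 position;  // bytes 0..11
    Float2 texcoord;  // bytes 12..19
    Float3 normal;    // bytes 20..31
};
static_assert(sizeof(Vertex) == 32, "stride of the interleaved buffer");
static_assert(offsetof(Vertex, texcoord) == 12, "TEXCOORD offset");
static_assert(offsetof(Vertex, normal) == 20, "NORMAL offset");

// Non-interleaved (structure of arrays): each attribute array is contiguous.
struct VertexStreams {
    std::vector<Float3> positions;
    std::vector<Float2> texcoords;
    std::vector<Float3> normals;
};

// Byte distance from one vertex's position to the next vertex's position.
constexpr std::size_t kInterleavedPositionStep = sizeof(Vertex);  // 32
constexpr std::size_t kSeparatePositionStep = sizeof(Float3);     // 12
```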

So, overall, the performance question depends on how often you write to the buffers. If your limiting factor is CPU writes, stick with separate buffers. If not, go for a single interleaved one.
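A rough estimate of the upload size when only positions change, assuming the interleaved buffer is updated with Map and D3D11_MAP_WRITE_DISCARD (which leaves the previous contents undefined, so the whole buffer must be written again). positionUpdateCost is a hypothetical helper and the byte sizes assume the float3/float2/float3 layout used above.

```cpp
#include <cassert>
#include <cstddef>

// Bytes written per position update in each layout.
struct UploadCost {
    std::size_t separateBytes;     // only the position buffer is rewritten
    std::size_t interleavedBytes;  // the whole interleaved buffer is rewritten
};

constexpr UploadCost positionUpdateCost(std::size_t vertexCount) {
    return {vertexCount * 12,   // float3 positions only
            vertexCount * 32};  // full float3 + float2 + float3 vertices
}
```

For 10,000 vertices that is 120,000 bytes per update with a separate position buffer versus 320,000 bytes with the interleaved one; whether that difference matters is exactly what profiling should tell you.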

How will you know? There is only one way: profile, on both the CPU side and the GPU side (with the graphics debugger/profiler from your GPU's vendor).

The best practice is to limit CPU writes, so if you find that you are limited by buffer updates, you probably need to rethink your approach. Do we need to update the buffer every frame if we run at 500 fps? The user won't see the difference if you reduce the buffer update rate to 30-60 times per second (decoupling buffer updates from frame updates). So if your update strategy is reasonable, you will likely never be CPU-limited, and the best approach is classic interleaving.
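A minimal sketch of decoupling buffer updates from the frame rate, assuming a per-frame delta time in seconds is available. UpdateThrottle is a hypothetical helper, not a D3D11 API: when shouldUpdate returns true you would map and rewrite the buffer, otherwise you render with its existing contents.

```cpp
#include <cassert>

// Render every frame, but report "update now" at most `hz` times per second.
class UpdateThrottle {
public:
    explicit UpdateThrottle(double hz) : interval_(1.0 / hz) { assert(hz > 0.0); }

    // Call once per frame with that frame's duration in seconds.
    bool shouldUpdate(double frameSeconds) {
        accumulated_ += frameSeconds;
        if (accumulated_ < interval_) return false;
        accumulated_ -= interval_;
        if (accumulated_ > interval_) accumulated_ = 0.0;  // drop backlog after a long stall
        return true;
    }

private:
    double interval_;            // seconds between buffer updates
    double accumulated_ = 0.0;   // time since the last update
};
```

For example, a game running at 500 fps with UpdateThrottle(30.0) rewrites the buffer roughly 30 times per second instead of 500.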

You can also consider redesigning your data pipeline, or preparing the data offline (we call it "baking"), so that you don't need to deal with non-interleaved buffers at all. That is quite a reasonable option too.
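One way to "bake" the mesh from the question, sketched under the assumption that the source data is OBJ-style, with separate position/texcoord/normal indices per triangle corner (Corner, BakedMesh and bake are hypothetical names). Each distinct index triple becomes one interleaved vertex, so a position is duplicated only where its normal or UV actually differs, and the result feeds an ordinary interleaved vertex buffer plus a single index buffer.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <tuple>
#include <vector>

struct Float3 { float x, y, z; };
struct Float2 { float u, v; };
struct Vertex { Float3 position; Float2 texcoord; Float3 normal; };

// One triangle corner with a separate index into each attribute array.
struct Corner { std::uint32_t p, t, n; };

struct BakedMesh {
    std::vector<Vertex> vertices;        // interleaved vertex buffer contents
    std::vector<std::uint32_t> indices;  // single index buffer contents
};

// Weld identical (position, texcoord, normal) triples into shared vertices.
BakedMesh bake(const std::vector<Float3>& positions,
               const std::vector<Float2>& texcoords,
               const std::vector<Float3>& normals,
               const std::vector<Corner>& corners) {
    BakedMesh mesh;
    std::map<std::tuple<std::uint32_t, std::uint32_t, std::uint32_t>, std::uint32_t> seen;
    for (const Corner& c : corners) {
        auto key = std::make_tuple(c.p, c.t, c.n);
        auto it = seen.find(key);
        if (it == seen.end()) {
            auto index = static_cast<std::uint32_t>(mesh.vertices.size());
            mesh.vertices.push_back({positions.at(c.p), texcoords.at(c.t), normals.at(c.n)});
            it = seen.emplace(key, index).first;
        }
        mesh.indices.push_back(it->second);
    }
    return mesh;
}
```

Two triangles sharing an edge produce six vertices when each triangle has its own normal (a hard edge), but only four when they share one.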

It is a memory-versus-performance tradeoff, the eternal question: duplicate memory to take advantage of interleaving, or not?

The answer is... "it depends". Are you programming the next CryEngine, targeting top GPUs with gigabytes of memory? Or are you programming for embedded mobile systems, where memory is slow and limited? Is 1 megabyte of memory worth the hassle at all? Or do you have huge models, 100 MB each? We don't know.
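A back-of-the-envelope comparison for a hard-edged cube (8 positions, 6 faces x 4 = 24 unique corners), assuming float3 positions and normals and float2 texcoords. It also shows a point worth checking before choosing non-interleaved buffers for memory reasons: with a single index buffer, three separate streams cost exactly as much as one interleaved stream. The saving only appears if positions could be indexed independently, which the input assembler does not do; the third function is a hypothetical lower bound that ignores whatever extra index data such a scheme would need.

```cpp
#include <cassert>
#include <cstddef>

constexpr std::size_t kPositionBytes = 12;  // float3
constexpr std::size_t kTexcoordBytes = 8;   // float2
constexpr std::size_t kNormalBytes = 12;    // float3

// One interleaved buffer with one vertex per unique corner.
constexpr std::size_t interleavedBytes(std::size_t corners) {
    return corners * (kPositionBytes + kTexcoordBytes + kNormalBytes);
}

// Three separate buffers fed through one index buffer: every stream still
// needs one element per unique corner, so positions are duplicated too.
constexpr std::size_t separateBytes(std::size_t corners) {
    return corners * kPositionBytes + corners * kTexcoordBytes + corners * kNormalBytes;
}

// Hypothetical lower bound if positions could be shared (not possible with a
// single IA index): unique positions plus per-corner texcoords and normals.
constexpr std::size_t sharedPositionBytes(std::size_t positions, std::size_t corners) {
    return positions * kPositionBytes + corners * (kTexcoordBytes + kNormalBytes);
}
```

For the cube that is 768 bytes either way with standard indexing, versus at best 576 bytes with shared positions, before counting any additional indices.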

It's up to you to decide. But remember: there is no free candy. If you find that the memory savings are worth the performance loss, go for it. Profile and compare to be sure.

Hope it helps somehow. Happy coding! =)