Today we see the price of doing weekly blog entries -- I don't have anything even remotely finished, I'm annoyed at several issues, and I don't feel well. So this should be a really upbeat entry!
I've talked for ages about mixing blocks and polygon-based scenery in SeaOfMemes. I want to use blocks there for building, but have the landscape be smooth. The only changes you'd make in the landscape would be digging holes or leveling a building area.
In Crafty, on the other hand, the whole world is blocks, and it's more a matter of how to represent distant blocks. Figure 1 shows what I had months ago -- a heightmap-based landscape in the distance, and editable blocks nearby. This approach has problems.
So Use Blocks!
I decided to try a compromise. I render the distant scenery with a heightmap, but draw it as blocks. Patches are still 32 by 32 and get coarse in the distance, but the heights are accurate. This gives a more gradual landscape and the blocks in the distance match the nearby blocks. It all looks like a "block world" and I can use actual block data to build the reduced resolution versions. So a large structure would still show in the distance. That looks like Figure 3.
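For concreteness, here is a minimal sketch of that idea, assuming a terrain height function and a cube-emitting helper (both names are mine, not from the real code):

    #include <cmath>

    double sampleHeight(int x, int z);             // terrain height (hypothetical)
    void emitCube(int x, int y, int z, int size);  // add a cube to the patch mesh (hypothetical)

    // Draw a distant 32 by 32 heightmap patch as blocks. 'stride' grows
    // with distance, so far patches get coarser, but each sampled height
    // still lands on an exact block boundary.
    void drawPatchAsBlocks(int patchX, int patchZ, int stride)
    {
      for (int z = 0; z < 32; z += stride)
      {
        for (int x = 0; x < 32; x += stride)
        {
          // quantize the height to a whole block so distant terrain
          // matches the nearby editable blocks
          int height = (int) std::floor(sampleHeight(patchX + x, patchZ + z));

          // one stride-sized cube whose top face is the surface
          emitCube(patchX + x, height, patchZ + z, stride);
        }
      }
    }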
The bottom line is I'm still fussing over this.
Back at the beginning of the project, people told me to use instancing to draw all my cubes. Unfortunately, instancing isn't supported in OpenGL 2.1, so I couldn't use it and support the Mac (at that time) or tablets. Also, I didn't really understand how it worked.
I could see from the documentation that the gl_InstanceID variable is incremented for each instance drawn from the buffer, but I didn't know what you could do with that. The example I looked at just placed blades of grass on a grid. I knew that I needed to place cubes at arbitrary locations and wasn't sure how to get that data into the shader. I hadn't realized you could pass coordinate data or transforms through textures. (OpenGL 2.1 also doesn't have anything but 8-bit textures.)
What I did instead was pack my vertex data into bit fields with integers, instead of using floats. Part 16 and Part 17 cover my efforts on that. The result was that instead of 36 bytes per vertex (9 floats), I use 8 bytes. That let me leave a large number of chunks in the display without swapping them as you moved.
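This is a sketch of the kind of packing involved -- the exact field layout here is my guess, not necessarily the one from Part 16:

    #include <stdint.h>

    // A coordinate inside a 32-block chunk fits in 6 bits, so position,
    // normal, corner and texture indexes squeeze into two 32-bit words
    // (8 bytes) instead of 9 floats (36 bytes).
    struct PackedVertex
    {
      uint32_t position;    // x, y, z packed as bit fields
      uint32_t attributes;  // normal, corner and texture indexes
    };

    PackedVertex packVertex(int x, int y, int z, int normal, int corner, int texture)
    {
      PackedVertex v;
      v.position = (uint32_t) ((x & 0x3F) | ((y & 0x3F) << 6) | ((z & 0x3F) << 12));
      v.attributes = (uint32_t) ((normal & 0x7) | ((corner & 0x3) << 3) | ((texture & 0xFFF) << 5));
      return v;
    }

The shader then unpacks those fields with integer attributes and bit shifts -- exactly the two features OpenGL 2.1 shaders are missing.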
Unfortunately, this doesn't work at all under OpenGL 2.1, where integer vertex attributes are not supported and there are no bit shift operators in the shaders. So ironically, on the older displays with limited memory, I'm using a lot of memory, and on the newer displays with lots of memory, I compress vertexes and need less.
Things got worse in Part 34 when I introduced non-cube shapes. These have lots of triangles, and there are parts of the scenery where many blocks are non-cubes (grass, for example). This puts a huge amount of data into the vertex buffers. Again, under OpenGL 3.2, I compress the vertexes, and under OpenGL 2.1, I can't.
In fact, if you play with McrView on a complex world, you'll notice that it pages scenery chunks in and out a lot (fades them in again when you turn your head). This is because the entire region around the player just doesn't fit in display memory. Sometimes it also drops frames, because the chunks are now so huge that they can't be transferred to the display in time.
It can be done!
I understand how instancing works now, and have frequently thought I should rewrite the code to use it. Then complex objects like grass or rail tracks would take almost no room in the display. But that would require two completely different versions of the code for OpenGL 2.1 and 3.3. So I didn't do anything about it.
Then I realized there is a way to do instancing under OpenGL 2.1. It requires three things: an index stored on each vertex of the replicated shape, a baseIndex uniform set before each draw call, and texture reads in the vertex shader (vertex texture fetch) to get the per-instance positions.
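On the setup side, that might look like this -- all names here are illustrative, not from the real code:

    #include <vector>
    #include <GL/gl.h>

    struct Origin { int x, y, z; };  // block coordinates, 0..255

    // Store one cube origin per RGB pixel. OpenGL 2.1 texture values
    // come back to the shader as 0..1, which is why the shader scales
    // by 256 to recover coordinates.
    void buildPositionTexture(GLuint tex, const std::vector<Origin>& origins, int size)
    {
      std::vector<unsigned char> pixels(size * size * 3, 0);
      for (size_t i = 0; i < origins.size(); i++)
      {
        pixels[i*3+0] = (unsigned char) origins[i].x;
        pixels[i*3+1] = (unsigned char) origins[i].y;
        pixels[i*3+2] = (unsigned char) origins[i].z;
      }

      glBindTexture(GL_TEXTURE_2D, tex);
      glTexImage2D(GL_TEXTURE_2D, 0, GL_RGB, size, size, 0,
                   GL_RGB, GL_UNSIGNED_BYTE, &pixels[0]);

      // no filtering, or neighboring instances will bleed together
      glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_MIN_FILTER, GL_NEAREST);
      glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_MAG_FILTER, GL_NEAREST);
    }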
If I use instancing to render my cube data, I need to have a buffer for each different shape (grass, rails, etc.) and draw each one with a texture giving locations. To see what all those extra draw calls would cost, I called glDrawArrays 75 times, each time with a subsection of my larger buffer. To my surprise, this takes exactly the same amount of time as drawing the whole buffer in one call.
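The experiment was essentially this loop, with the buffer binding and shader state assumed to be set up already:

    // one big bound vertex buffer, drawn as 75 consecutive, in-order pieces
    int pieceSize = vertexCount / 75;
    for (int i = 0; i < 75; i++)
      glDrawArrays(GL_TRIANGLES, i * pieceSize, pieceSize);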
To check this again, I split the buffer into 5,625 calls (75*75) and drew that. This also took the same time as a single call! Then I got suspicious and tried drawing the chunks in reverse order, from high to low indexes. This should be the same amount of work, but it increases the time to 35 ms. Apparently, something in OpenGL notices when I draw consecutive pieces out of the same buffer and just enlarges the initial draw...
After that, I split the data into 75 actual buffers and drew those. This takes 25 ms: less than the 35 ms for drawing the single buffer in out-of-order pieces, but still longer than drawing it all with a single buffer. That's very odd, since it's the same amount of work, other than starting at a different point in the buffer.
Then I implemented my work-around for instancing. The shader combines the baseIndex uniform variable with the vertex index field, converts that into a row and column in a texture, and reads the pixel there. The x, y and z values, times 256, give me my coordinates, and I draw the vertex. That takes 33 ms.
This is considerably slower than doing it all as one buffer, which implies instancing is going to save a lot of space but cost a lot of time. I assume the extra time goes to accessing the texture in the vertex shader, but I haven't played with it enough yet to be sure.
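For reference, the 33 ms path looks roughly like this as a GLSL 1.20 vertex shader, so it would run under OpenGL 2.1 on hardware that supports vertex texture fetch; the names and layout are mine, not the actual code:

    // GLSL 1.20 vertex shader sketch of the instancing work-around
    uniform sampler2D positions;   // one RGB pixel per instance (see setup above)
    uniform float baseIndex;       // first instance index for this draw call
    uniform float textureSize;     // width and height of the position texture

    attribute vec3 corner;         // cube corner, relative to the cube origin
    attribute float index;         // which copy of the cube this vertex belongs to

    void main()
    {
      // combine the per-draw base with the per-vertex copy number
      float i = baseIndex + index;

      // turn the instance number into a row/column in the texture
      float row = floor(i / textureSize);
      float col = i - row * textureSize;
      vec2 uv = (vec2(col, row) + 0.5) / textureSize;

      // texture values are 0..1; scale by 256 to recover block coordinates
      vec3 origin = texture2D(positions, uv).xyz * 256.0;

      gl_Position = gl_ModelViewProjectionMatrix * vec4(origin + corner, 1.0);
    }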
Unfortunately, I realized this doesn't solve all my problems. If I rendered all cubes as instances of a single cube, I'd be drawing all six faces, which is more than I should draw for partially obscured cubes. I could handle this with 2**6 = 64 separate cube buffers, one for each possible combination of faces. If that's too many draw calls, I could add rotation info and cut the number of cases.
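The bookkeeping for that is simple -- a 6-bit mask of exposed faces selects one of the 64 buffers (isSolid is a hypothetical test of the neighboring block):

    bool isSolid(int x, int y, int z);  // hypothetical neighbor test

    int visibleFaceMask(int x, int y, int z)
    {
      int mask = 0;
      if (!isSolid(x-1, y, z)) mask |= 1;   // -x face exposed
      if (!isSolid(x+1, y, z)) mask |= 2;   // +x face exposed
      if (!isSolid(x, y-1, z)) mask |= 4;   // -y face exposed
      if (!isSolid(x, y+1, z)) mask |= 8;   // +y face exposed
      if (!isSolid(x, y, z-1)) mask |= 16;  // -z face exposed
      if (!isSolid(x, y, z+1)) mask |= 32;  // +z face exposed
      return mask;  // 0..63: which of the 64 instance buffers to use
    }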
Even this doesn't work with transparency though. Transparent cubes have to be drawn in order from back to front, and I can't draw any extra sides without changing the look. So I can't just draw them all as 6-faced cubes, and I can't draw all the ones with just tops, then the ones with just bottoms, etc.
To do transparency correctly, I'd have to make a cube face my instance, and draw them all in order from back to front. The only savings over what I do now would be in vertex size: instead of 24 vertexes (6 faces times 4 corners) at 8 bytes each (192 bytes per cube), I'd be drawing 6 faces of 4 bytes each (24 bytes). On OpenGL 2.1, this would be a much bigger savings, since there I use 36 bytes per vertex (864 bytes per cube!) and that would also become 24. So definitely worth doing, if I can afford the time.
A smart cube shader is what I was trying to write in Part 17, and it did not go well. I used a constant array in the shader code and it was 30 times slower than the simple case. Commenters had told me that uniforms might be faster than constants, but I was so disgusted with the situation that I didn't try it.
This time I coded it up with array uniforms for corners, normals and texture coordinates. To my surprise, it is actually a bit faster than my instanced cubes -- 30 ms vs. 33 ms. This can only be due to the smaller vertex. In the "face instance" case, the vertex is just the instance ID and a face number (0-5). In the instanced cube case, the vertex is the full 9 floats.
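As I read it, the shape of that shader is something like this. I'm assuming the vertex carries a combined number 0-23 (face times 4 plus corner) so the uniform arrays can be indexed; the names and layout are illustrative:

    // "Smart cube" vertex shader sketch: the vertex carries only an
    // instance number and a vertex number; uniform arrays supply the
    // unit-cube geometry.
    uniform vec3 corners[24];     // 6 faces * 4 corners of a unit cube
    uniform vec3 normals[6];      // one normal per face
    uniform vec2 texCoords[4];    // UV for each corner of a face

    uniform sampler2D positions;  // per-instance cube origins, as before
    uniform float baseIndex;
    uniform float textureSize;

    attribute float index;        // which cube
    attribute float vertexNum;    // 0..23: face*4 + corner

    varying vec3 vNormal;
    varying vec2 vTexCoord;

    void main()
    {
      // same texture lookup as the instancing work-around
      float i = baseIndex + index;
      float row = floor(i / textureSize);
      float col = i - row * textureSize;
      vec3 origin = texture2D(positions,
          (vec2(col, row) + 0.5) / textureSize).xyz * 256.0;

      int v = int(vertexNum);
      int face = v / 4;                     // integer divide: which face
      vNormal = normals[face];
      vTexCoord = texCoords[v - face * 4];  // v % 4 without the % operator

      gl_Position = gl_ModelViewProjectionMatrix
                  * vec4(origin + corners[v], 1.0);
    }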
I haven't tried this on any slow hardware or ATI vs. Nvidia, and I haven't turned it into full rendering code for my landscapes to get a final time. I haven't even coded a 2.1 shader for this, although I don't think I'm using any operators that aren't supported there.
If this works, I can at least do all the non-cube shapes with instancing, even under OpenGL 2.1. That will reduce the memory use dramatically without making things too complicated. Whether I can use it for all my cubes without a performance penalty remains to be seen.
That's all for this week.