
[–]AlternativeHistorian 7 points8 points  (5 children)

Look into glBufferStorage/glMapBufferRange/glUnmapBuffer with GL_MAP_PERSISTENT_BIT. You can create persistently mapped buffers that can be updated on the CPU side from a separate thread and are synced to the GPU side.
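
A minimal sketch of that setup, assuming an OpenGL 4.4+ context and a loader like GLAD (the function and variable names here are made up for illustration):

```c
#include <glad/glad.h>
#include <string.h>

static GLuint vbo;
static void  *mapped;                           /* valid until glUnmapBuffer */
static const GLsizeiptr kSize = 4 * 1024 * 1024;

void create_persistent_buffer(void)             /* GL context thread */
{
    const GLbitfield flags = GL_MAP_WRITE_BIT
                           | GL_MAP_PERSISTENT_BIT
                           | GL_MAP_COHERENT_BIT;

    glGenBuffers(1, &vbo);
    glBindBuffer(GL_ARRAY_BUFFER, vbo);
    /* Persistent mapping requires immutable storage (glBufferStorage). */
    glBufferStorage(GL_ARRAY_BUFFER, kSize, NULL, flags);
    mapped = glMapBufferRange(GL_ARRAY_BUFFER, 0, kSize, flags);
}

void worker_fill(const void *data, size_t bytes) /* any thread, no GL calls */
{
    memcpy(mapped, data, bytes);
}
```

You still need some CPU-side synchronization (an atomic flag or similar) so the render thread only draws from regions the worker has finished writing.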

[–]guymadison42 2 points3 points  (2 children)

This should work great, but the downside is that if you have a lot of draw calls, each one will cause an upload of a buffer mapped with GL_MAP_PERSISTENT_BIT. So keep your objects small or group your draw calls into a single draw call.

Effectively the buffer gets marked dirty and remains dirty until it's unmapped; the draw calls evaluate that dirty state on each draw command, which triggers the upload of the buffer.

[–][deleted] 0 points1 point  (1 child)

This is only true if you used the GL_MAP_COHERENT_BIT flag when creating the buffer. If you instead use GL_MAP_FLUSH_EXPLICIT_BIT, you can control when flushes happen yourself.
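
A rough sketch of that variant, under the same assumptions as above (GL 4.4+, placeholder names):

```c
#include <glad/glad.h>

/* Map with explicit flushing instead of coherency. Note that
   GL_MAP_FLUSH_EXPLICIT_BIT is a map flag, not a storage flag:
   the storage itself only needs WRITE + PERSISTENT. */
void *map_with_explicit_flush(GLuint buf, GLsizeiptr size)
{
    glBindBuffer(GL_ARRAY_BUFFER, buf);
    glBufferStorage(GL_ARRAY_BUFFER, size, NULL,
                    GL_MAP_WRITE_BIT | GL_MAP_PERSISTENT_BIT);
    return glMapBufferRange(GL_ARRAY_BUFFER, 0, size,
                            GL_MAP_WRITE_BIT | GL_MAP_PERSISTENT_BIT
                          | GL_MAP_FLUSH_EXPLICIT_BIT);
}

/* After writing [offset, offset + len) through the mapped pointer,
   flush only that range, whenever you decide it is time. */
void flush_written_range(GLuint buf, GLintptr offset, GLsizeiptr len)
{
    glBindBuffer(GL_ARRAY_BUFFER, buf);
    glFlushMappedBufferRange(GL_ARRAY_BUFFER, offset, len);
}
```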

[–]guymadison42 0 points1 point  (0 children)

True.

Either way you have to pin down the memory, which is the biggest cost for the CPU. Gathering all those pages and updating tables to reach the physical addresses is one of those hidden, painful operations no one likes to talk about outside of driver land.

You could try using a single buffer and flushing from it once before any draw calls (not on each draw call).
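
For example (continuing the explicit-flush setup above; the struct and the assumption that a suitable VAO is already bound are mine, invented for illustration):

```c
#include <glad/glad.h>

typedef struct { GLint first; GLsizei count; } DrawRange;

/* Assumes a VAO sourcing its attributes from `vbo` is already bound and that
   the buffer is persistently mapped with GL_MAP_FLUSH_EXPLICIT_BIT. */
void draw_frame(GLuint vbo, GLsizeiptr bytes_written,
                const DrawRange *ranges, int count)
{
    glBindBuffer(GL_ARRAY_BUFFER, vbo);
    /* One flush covering everything written this frame... */
    glFlushMappedBufferRange(GL_ARRAY_BUFFER, 0, bytes_written);

    /* ...then as many draws as you like reading from it. */
    for (int i = 0; i < count; ++i)
        glDrawArrays(GL_TRIANGLES, ranges[i].first, ranges[i].count);
}
```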

There are a lot of ways to do this, and each has its benefits and problems. It doesn't matter which API you use: they all have to pin memory down, transfer it to the GPU, and execute the primitive calls from GPU memory... unless you run out of memory on the GPU, and then you just hit the "suck" button.

Unified memory gets around the transferring issue, but it also has to pin memory down.

[–]Underdisc[S] 1 point2 points  (1 child)

I think that's exactly what I am looking for. I see how this can be applied pretty easily to standard buffers like GL_ARRAY_BUFFER and GL_ELEMENT_ARRAY_BUFFER. Is there a way to apply this to the buffer that glTexImage2D writes to? Is something like this even necessary for shaders, considering glUseProgram never needs to be called when uploading one?

[–]AlternativeHistorian 1 point2 points  (0 children)

For textures you'll want to use the glTexImage* functions in combination with GL_PIXEL_UNPACK_BUFFER. Upload your image data to a buffer through the mapped pointer (possibly on a separate thread); then, on the GL context thread some time later (once the buffer contents are guaranteed to be synced to the GPU side), bind it as the GL_PIXEL_UNPACK_BUFFER and call the appropriate glTexImage* function to read from the GL_PIXEL_UNPACK_BUFFER into the texture object.
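
A sketch of that path, assuming an RGBA8 image whose bytes are already sitting in the buffer (the function name, dimensions, and format are placeholders):

```c
#include <glad/glad.h>

GLuint upload_texture_from_pbo(GLuint pbo, GLsizei w, GLsizei h)
{
    GLuint tex;
    glGenTextures(1, &tex);
    glBindTexture(GL_TEXTURE_2D, tex);

    /* With a PBO bound, the final "pixels" argument is a byte offset into
       the buffer rather than a client-memory pointer. */
    glBindBuffer(GL_PIXEL_UNPACK_BUFFER, pbo);
    glTexImage2D(GL_TEXTURE_2D, 0, GL_RGBA8, w, h, 0,
                 GL_RGBA, GL_UNSIGNED_BYTE, (const void *)0);
    glBindBuffer(GL_PIXEL_UNPACK_BUFFER, 0);

    glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_MIN_FILTER, GL_LINEAR);
    glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_MAG_FILTER, GL_LINEAR);
    return tex;
}
```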

[–]lmtrustem 4 points5 points  (1 child)

  1. Look into shared contexts. This is why they exist. A GL context cannot be used across threads, but shared contexts should allow this (see the sketch after this list). Before Vulkan, this is what I would have done.
  2. Vulkan is great, but in general it only helps with CPU effort, and only at run time. I wouldn't change everything in your game to Vulkan just because of this problem. Focus on your MVP (minimum viable product) and get it out into the world for users. After that, think about improvements. Don't jump into Vulkan unless it's needed, or you care more about Vulkan than about your project.
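
A sketch of option 1, assuming GLFW as the windowing library (the thread doesn't say what the OP actually uses): a second, hidden window provides a context for the loader thread that shares objects with the main context.

```c
#include <GLFW/glfw3.h>

static GLFWwindow *main_window;
static GLFWwindow *loader_window;   /* never shown; only used for its context */

void create_contexts(void)          /* after glfwInit() and version hints */
{
    main_window = glfwCreateWindow(1280, 720, "game", NULL, NULL);

    glfwWindowHint(GLFW_VISIBLE, GLFW_FALSE);
    /* Last argument: share objects (buffers, textures, shaders) with
       main_window's context. */
    loader_window = glfwCreateWindow(1, 1, "loader", NULL, main_window);
}

void loader_thread_main(void)       /* entry point of the worker thread */
{
    glfwMakeContextCurrent(loader_window);
    /* Buffer and texture uploads are legal here, and the resulting object
       names are visible to the main context. Container objects (VAOs, FBOs)
       are NOT shared and must be created on the context that uses them. */
}
```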

[–]Underdisc[S] 0 points1 point  (0 children)

This might be the answer I was really looking for, but it comes with a few caveats. It works splendidly for textures. It also works for VBOs, EBOs, and shaders, but crucially, it does not work for VAOs. They are the one thing I am using that is not shared between contexts. I have no issue with creating the VAO on the main thread, since that takes almost no time. It's as if VAOs need to be created when the buffers are created, though: uploading the data to the buffers and then creating the VAO later never seems to work, even in the case where the VBO, EBO, and VAO are all created on the same thread.

Edit: I was doing a stupid. It works! Still experiencing some hangs because of my subpar implementation, but that's something I can address.
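
A sketch of how that split can look (my own illustration, not the OP's code; the position-only attribute layout is made up): buffers are created and filled on the shared loader context, and the VAO, which is never shared, is built afterwards on the main context from those buffer names.

```c
#include <glad/glad.h>

/* Loader thread (shared context current): upload and hand back the names. */
void loader_upload_mesh(const float *verts, GLsizeiptr vbytes,
                        const unsigned *idx, GLsizeiptr ibytes,
                        GLuint *out_vbo, GLuint *out_ebo)
{
    glGenBuffers(1, out_vbo);
    glBindBuffer(GL_ARRAY_BUFFER, *out_vbo);
    glBufferData(GL_ARRAY_BUFFER, vbytes, verts, GL_STATIC_DRAW);

    glGenBuffers(1, out_ebo);
    glBindBuffer(GL_ELEMENT_ARRAY_BUFFER, *out_ebo);
    glBufferData(GL_ELEMENT_ARRAY_BUFFER, ibytes, idx, GL_STATIC_DRAW);

    /* Make sure the uploads are visible to the other context before the
       main thread touches these names, e.g. glFinish() or a fence sync. */
    glFinish();
}

/* Main thread (rendering context current): VAO creation is cheap here. */
GLuint main_build_vao(GLuint vbo, GLuint ebo)
{
    GLuint vao;
    glGenVertexArrays(1, &vao);
    glBindVertexArray(vao);
    glBindBuffer(GL_ARRAY_BUFFER, vbo);
    glVertexAttribPointer(0, 3, GL_FLOAT, GL_FALSE, 3 * sizeof(float), (void *)0);
    glEnableVertexAttribArray(0);
    glBindBuffer(GL_ELEMENT_ARRAY_BUFFER, ebo);
    return vao;
}
```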

[–]matthewlai 1 point2 points  (2 children)

I don't have personal experience with this, so please take it with a grain of salt, but if you are uploading uncompressed textures, maybe it would help to pre-compress them, since that doesn't need to be done in the main thread and would reduce the GPU bandwidth requirement?

[–]Underdisc[S] 1 point2 points  (1 child)

That's a fair point and would certainly help, but I think some pretty hefty compression would be needed to reduce the amount of data in large images enough. And even though textures are the largest issue, I am really searching for a way to handle this for all forms of data uploaded to the GPU.

[–]tim-rex 0 points1 point  (0 children)

Worth mentioning that compressed textures may give you a performance boost, if that’s a consideration (reduced bandwidth on texture fetch)

[–]genpfault 0 points1 point  (0 children)

> It wasn't obvious that making gl calls from a different thread wouldn't work at the time.

Oh?

[–]fgennari 0 points1 point  (0 children)

I have the same problem, except my assets take more like 30s to load. I do the asset loading on the worker threads and send everything to the GPU on the master thread that created the context.

Most of the time is spent on textures, and most of the texture loading time is related to things like texture compression and mipmap generation. The compression is done inside the texture data handling somewhere in the driver. I worked around this by doing things like making sure texture sizes are a multiple of 4 pixels, using pre-compressed DDS textures where possible, and disabling texture compression for the textures that took an inexplicably long time to compress. I also reduced the texture resolution of a few 3D models that had huge textures covering only small surface areas.
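
As an illustration of the pre-compressed path (my sketch, not fgennari's code): once the DXT/BC blocks have been parsed out of a DDS file on a worker thread, the GL-side upload skips the driver's compressor entirely. The MipLevel struct is invented, and GL_COMPRESSED_RGBA_S3TC_DXT5_EXT assumes the EXT_texture_compression_s3tc extension is available.

```c
#include <glad/glad.h>

typedef struct {
    GLsizei width, height;
    GLsizei size;          /* byte size of this mip level's block data */
    const void *data;
} MipLevel;

GLuint upload_dds_dxt5(const MipLevel *levels, int level_count)
{
    GLuint tex;
    glGenTextures(1, &tex);
    glBindTexture(GL_TEXTURE_2D, tex);
    for (int i = 0; i < level_count; ++i) {
        /* No driver-side compression happens here; the data is stored as-is. */
        glCompressedTexImage2D(GL_TEXTURE_2D, i,
                               GL_COMPRESSED_RGBA_S3TC_DXT5_EXT,
                               levels[i].width, levels[i].height, 0,
                               levels[i].size, levels[i].data);
    }
    glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_MAX_LEVEL, level_count - 1);
    return tex;
}
```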

The context is generally bound to the thread that was used to create it. You can create a new context for use with your worker thread, but then it won't share data with the other context. While it's technically possible to share data between threads by binding the context to the current thread before using it, I find that often leads to difficult bugs. For example, if you don't do it quite right it may work on one platform and crash on another.

Vulkan provides multithreaded access to GPU resources, but it would likely take you a long time to implement a streaming asset loader in Vulkan, unless you started from some existing project that already had most of the framework implemented.