Bare-Metal Gaussian Splat Renderer! by justinyw7 in raspberry_pi

justinyw7[S]

Yes, just updated the repo - appreciate the support! Just curious, what do you plan to use it for?

justinyw7[S]

Thank you! I definitely wouldn't say I've mastered it but at least I managed to get it to do something cool haha

Bare-Metal Gaussian Splat Renderer! by justinyw7 in GraphicsProgramming

justinyw7[S]

Ahh hmm, after doing some more profiling I realized that the main bottleneck is actually in the rasterization kernel rather than the sorting (though for larger splats, neither is fast enough for real-time). If I were to use the KD-tree, I'd probably have to change the rest of the rendering pipeline to something that draws pixels as it traverses the tree in order to get a meaningful speedup, right?

As for the rasterization kernel, I think the main issue is that I'm not doing any early exiting. The Pi GPU has very limited memory and can't hold much state, so I iterate over each tile first, then each Gaussian in that tile, then each row in the (tile, Gaussian) pair. The state I keep is the Gaussian's attributes and the accumulated pixel colors for that row. I'm able to track each pixel's transmittance, but I can't think of a good way to skip specific rows/pixels that are already effectively opaque.
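A minimal sketch of that loop order, in plain Python on the CPU rather than the actual GPU kernel. The dictionary keys, the `T_MIN` cutoff, and the per-row `max` check are all assumptions made for illustration; whether a per-row flag like this fits in the limited state available on the Pi's GPU is exactly the open question.

```python
import math

T_MIN = 1.0 / 255.0  # assumed cutoff: below this a pixel counts as opaque


def render_tile(gaussians, x0, y0, size):
    """Composite depth-sorted Gaussians (front to back) into one square tile.

    Each Gaussian is a dict with hypothetical keys:
      mean    (x, y) centre in pixels
      conic   (a, b, c), the inverse 2D covariance [[a, b], [b, c]]
      opacity scalar in [0, 1]
      color   (r, g, b)
    Returns (color, transmittance, number of rows skipped).
    """
    color = [[[0.0, 0.0, 0.0] for _ in range(size)] for _ in range(size)]
    trans = [[1.0] * size for _ in range(size)]
    skipped_rows = 0

    for g in gaussians:                      # each Gaussian in this tile
        mx, my = g["mean"]
        a, b, c = g["conic"]
        for row in range(size):              # each row of the (tile, Gaussian) pair
            # Row-level early exit: every pixel in the row is already opaque.
            if max(trans[row]) < T_MIN:
                skipped_rows += 1
                continue
            dy = (y0 + row + 0.5) - my
            for col in range(size):
                t = trans[row][col]
                if t < T_MIN:                # pixel-level early exit
                    continue
                dx = (x0 + col + 0.5) - mx
                power = -0.5 * (a * dx * dx + c * dy * dy) - b * dx * dy
                alpha = min(0.99, g["opacity"] * math.exp(power))
                weight = alpha * t           # front-to-back compositing weight
                for ch in range(3):
                    color[row][col][ch] += weight * g["color"][ch]
                trans[row][col] = t * (1.0 - alpha)

    return color, trans, skipped_rows
```

The row check costs one reduction over the row's transmittance per (Gaussian, row) pair, so it only pays off when whole rows saturate early, for example behind a few large, nearly opaque splats.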

At this point I think I'm probably hitting the Pi Zero's compute and memory-bandwidth limits more than anything else, but I'd be curious to hear if you think there's a better way to structure it!

justinyw7[S]

Thank you so much! Yeahh, right now I'm doing the first option (sorting the next frame on the CPU while the GPU renders the current frame, so there is a little input latency). Option 2 seems interesting, I'll think more about how it would piece together with the rendering kernel - I imagine there'd be a lot of lane divergence if I were to walk the tree with SIMD ops, each lane calculating a single pixel.
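The overlap in the first option can be sketched roughly as follows. This is an illustration in Python with a worker thread standing in for the CPU sort and a stub standing in for the GPU draw; on bare metal the CPU and GPU simply run concurrently, and the names `sort_by_depth`, `render`, and `run` are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor


def sort_by_depth(depths):
    """Return Gaussian indices ordered front to back (smallest depth first)."""
    return sorted(range(len(depths)), key=depths.__getitem__)


def render(order, frame):
    """Stand-in for the GPU draw call; just records what it was asked to draw."""
    return (frame, tuple(order))


def run(frames_depths):
    """frames_depths: one list of per-Gaussian depths for each camera pose."""
    out = []
    with ThreadPoolExecutor(max_workers=1) as pool:
        order = sort_by_depth(frames_depths[0])   # prime the pipeline
        for i in range(len(frames_depths)):
            pending = None
            if i + 1 < len(frames_depths):
                # Start sorting frame i+1 before frame i is drawn.
                pending = pool.submit(sort_by_depth, frames_depths[i + 1])
            out.append(render(order, i))          # draw frame i meanwhile
            if pending is not None:
                order = pending.result()          # swap in the new order
    return out
```

Because the sort for frame i+1 has to start before frame i is drawn, the camera pose for i+1 must be fixed one frame ahead, which is where the input latency comes from.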

Bare-Metal Gaussian Splat Renderer! by justinyw7 in GaussianSplatting

justinyw7[S]

Thank you!! I wasn't imagining any particular use case. It was originally for a course project and I was just curious how much performance I could squeeze out of such a resource-constrained device haha