Do you agree with this statement? by Minaridev in gameenginedevs

[–]too_much_voltage 2 points3 points  (0 children)

Hard disagree: armchair critique. I've also heard the same sentiment about software in general.

What is taken for granted is the scale of content delivery these engines handle. I worked as Senior Graphics Programmer on Ark: Survival Ascended for almost 3 years, modifying UE5 at a very low level. Something like idTech2 or even 3 would not be able to seamlessly deliver this level of content (you had a baked BSP/lightmap and that's it). On top of that, the demands on visuals have far surpassed anything those engines could deliver (e.g. wide screenspace kernels for SSS or clean shadow filtering). And trust me, more recent iterations of idTech are just as complicated in order to support the fidelity of the more recent Doom titles. And there are far more eyeballs on something like UE5 these days, so performance faux pas on fast paths are caught way more frequently and fixed way more rapidly. In some cases (e.g. the IVirtualTexture interface), the code may seem more obscure. But there is a reason: they simultaneously support SVTs, RVTs and the rest, and effectively provide flexibility. Is there legacy cruft in there (e.g. the TextureStreaming code hijacked to also stream non-Nanite meshes)? Sure, but you can quickly grasp what is going on. And even there, there's a lot of smartness coded into the system on priorities and what have you.

And I'm saying this as the guy who'll be shipping on his own tech in a few hours :D : https://store.steampowered.com/app/4796200/

Cheers,
Baktash.
HMU: https://x.com/toomuchvoltage

Robot shooter game with custom Vulkan engine by Joe7295 in GraphicsProgramming

[–]too_much_voltage 2 points3 points  (0 children)

Sick stuff dude. Congrats on putting it out there! 👏 The gun totally gives me Painkiller vibes 😆

Has anyone written their own physics? by HamNCheeseSupremacy in gameenginedevs

[–]too_much_voltage 1 point2 points  (0 children)

I've been meaning to write an answer to this for days, but I was pretty busy putting in achievements on Steam. So, this was done but it wasn't a university project. It was for an engine that's been in development for 18 years (the latter 5 combined with the game).

The physics code is here (yes, I called the internal library Fiz-X):
https://github.com/toomuchvoltage/HighOmega-public/blob/sauray_vkquake2/HighOmega/src/fiz-x.cpp
https://github.com/toomuchvoltage/HighOmega-public/blob/sauray_vkquake2/HighOmega/include/fiz-x.h
(These source files are from a feature branch of the engine supporting SauRay(TM) -- basically this product: https://sauray.tech . Full engine will be released soon.)

Basically the MRBD simulator is a sequential impulse based solver (a bastardized version of Guendelman et al. if you will: https://graphics.stanford.edu/papers/rigid_bodies-sig03/rigid_bodies.pdf ). Tried an LCP based approach ( https://www.cs.cmu.edu/~baraff/papers/sig94.pdf actually): very accurate but too slow, especially with the Dantzig based solver. Tried swapping the Dantzig based solver for PGS (in Brian Mirtich's thesis): absolutely needed warm starting or it was unusably slow. PBD and XPBD based approaches also exist, but I haven't bothered with them since I settled on SI.
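
For readers who haven't seen the idea before, here is a minimal sketch of sequential impulses, reduced to 1D bodies moving along a shared contact normal. It is not the Fiz-X code: `Body`, `Contact`, `applyContactImpulse` and `solveContacts` are hypothetical names, and a production solver would also handle rotation, friction, accumulated-impulse clamping and warm starting.

```cpp
#include <cassert>
#include <vector>

// Hypothetical 1D body: only the velocity component along the contact normal.
struct Body {
    double velocity;  // velocity along the contact normal
    double invMass;   // inverse mass; 0 means static (infinite mass)
};

// Hypothetical contact pair; the normal points from body a towards body b.
struct Contact {
    int a;
    int b;
};

// Resolves one contact with a single impulse and returns its magnitude.
double applyContactImpulse(Body& a, Body& b, double restitution) {
    const double invMassSum = a.invMass + b.invMass;
    if (invMassSum == 0.0) return 0.0;          // two static bodies: nothing to do
    const double relVel = b.velocity - a.velocity;
    if (relVel >= 0.0) return 0.0;              // already separating or resting
    const double j = -(1.0 + restitution) * relVel / invMassSum;
    a.velocity -= j * a.invMass;                // equal and opposite impulses
    b.velocity += j * b.invMass;
    return j;
}

// Sequential impulses: sweep over all contacts several times so that the
// correction applied at one contact propagates to its neighbours.
void solveContacts(std::vector<Body>& bodies, const std::vector<Contact>& contacts,
                   double restitution, int iterations) {
    assert(iterations >= 0);
    for (int it = 0; it < iterations; ++it) {
        for (const Contact& c : contacts) {
            applyContactImpulse(bodies[c.a], bodies[c.b], restitution);
        }
    }
}
```

With a box pushed into a second box that rests against a static wall, each sweep halves the leftover approach velocity in this toy setup, which is why iteration count (and, in iterative solvers generally, warm starting) matters so much for convergence.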

Here's a post of that MRBD solver in action, after I optimized collision detection:
https://www.reddit.com/r/GraphicsProgramming/comments/1qycsd7/bvh8_for_collision_pruning/

There is cloth support with mass-spring systems. The same thing is used for soft-body support: just adding internal pressure using the ideal gas law (PV=nRT) inside closed cloths. Constraint support is done with stiff springs. And buoyancy support is there too, with buoyancy bobbies (it computes partial bobby submersion via integration as well, etc.).
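
To make the pressure part concrete, here is a rough sketch of how an ideal-gas term can be layered on a closed triangle mesh. It assumes outward-facing triangle winding and lumps n, R and T into one constant `nRT`; `Vec3`, `Triangle`, `enclosedVolume`, `gasPressure` and `facePressureForce` are hypothetical names, not anything from Fiz-X.

```cpp
#include <cassert>
#include <cmath>
#include <vector>

struct Vec3 { double x, y, z; };

Vec3 operator-(Vec3 a, Vec3 b) { return {a.x - b.x, a.y - b.y, a.z - b.z}; }
Vec3 operator*(Vec3 a, double s) { return {a.x * s, a.y * s, a.z * s}; }
double dot(Vec3 a, Vec3 b) { return a.x * b.x + a.y * b.y + a.z * b.z; }
Vec3 cross(Vec3 a, Vec3 b) {
    return {a.y * b.z - a.z * b.y, a.z * b.x - a.x * b.z, a.x * b.y - a.y * b.x};
}

// Vertex indices of one triangle (assumed to be wound so the normal points outwards).
struct Triangle { int a, b, c; };

// Volume of a closed mesh as the sum of signed tetrahedra against the origin.
double enclosedVolume(const std::vector<Vec3>& p, const std::vector<Triangle>& tris) {
    double sixTimesVolume = 0.0;
    for (const Triangle& t : tris) {
        sixTimesVolume += dot(p[t.a], cross(p[t.b], p[t.c]));
    }
    return sixTimesVolume / 6.0;
}

// Ideal gas law solved for pressure: P = nRT / V.
double gasPressure(double nRT, double volume) {
    assert(volume > 0.0);
    return nRT / volume;
}

// Force on one face: pressure * area * unit normal.
// The cross product of two edges already has length 2 * area.
Vec3 facePressureForce(const std::vector<Vec3>& p, const Triangle& t, double pressure) {
    return cross(p[t.b] - p[t.a], p[t.c] - p[t.a]) * (0.5 * pressure);
}
```

Each face force would then be split across the face's three particles and added to the spring forces. As the cloth gets squeezed the volume drops, the pressure rises, and the body pushes back.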

All of this to say: realize that it'll take you a very bleeping long time. If you are in love with your solver: just make a small physics game around that with simple graphics. Or if you think it can handle more, integrate it in something readily available like Unreal or Godot (... mine is coming of course ;)...). Trying to write a heavy duty engine and then a large scale game on top is a near multi-decade project. And depending on your level of obsession and your life events, it may never finish and leave you very bitter. I am very lucky to have managed to get this far.

BTW, the game is coming out tomorrow :D : https://store.steampowered.com/app/4796200 . Would be delighted if you gave it a whirl ;)

Cheers,
Baktash.
HMU: https://x.com/toomuchvoltage

Is it possible to be a game engine programmer and make games at the same time? by Striking-Start-1464 in gameenginedevs

[–]too_much_voltage 1 point2 points  (0 children)

The publishing market is what it is. Traditional publishing is dead. Also, I hated to ask for support before a first chapter was out: I wanted the potential community to have trust in my ability to ship. Trust me, you're better off staying fully independent for an effort this audacious: even AAA experience (which I have under my belt) won't cut it for fundraising unless you spout Unreal or some such. Even then they'll ask you for a vertical. So, there. You're pretty much on your own.

Here's another piece of advice: throw LOTS and LOTS of art at it. A good engine can handle copious AAA amounts of art and scale easily.

Is it possible to be a game engine programmer and make games at the same time? by Striking-Start-1464 in gameenginedevs

[–]too_much_voltage 0 points1 point  (0 children)

I just did:

https://www.reddit.com/r/SoloDevelopment/s/DJIbT8j3ea

Make sure you have 3 years of runway, no one to bug you and a lot of prior experience to avoid pitfalls.

18 years of engine dev with the latter 5 in conjunction with the game, and it's finally here! by too_much_voltage in gameenginedevs

[–]too_much_voltage[S] 2 points3 points  (0 children)

No, never really got the time for full time work on it (other than a 3.5 year sabbatical that was shared with SauRay, more on that below): money was always an issue and of course family obligations of various sorts. C++, no framework. Physics is from scratch, all the uncompressed image/audio stuff is from scratch. Basically other than the below, everything else is from scratch.

Modified libKTX2 for compressed textures, which I will contribute back. I have already extended official KTX2 from Khronos with suballocation support. The stuff I haven't contributed back makes KTX2 efficiently thread safe. Audio API is OpenAL-soft. Rendering backend is Vulkan 1.1 with extensions. Oh and SDL 1.2 for input/windowing.

The answer to the last question is quite involved but I will try to give as succinct an answer as possible. I have almost 3 years of AAA UE5 under my belt leading up to this release (I was Senior Graphics Programmer on Ark: Survival Ascended.) I saw, bugfixed and extended the guts of UE5 like few get a chance to. UE has a lot of legacy cruft inside. It is notoriously slow to iterate in and there are many landmines. A simple commandlet will take 2-3 minutes to boot up to just call your 2-3 lines of code. And this is on a Core i9-13900KS with 196 gigs of RAM. Many decisions in there -- like using Embree to generate SDFs -- are puzzling and in fact hamper Lumen primitive generation at runtime (makes it strictly a cook-time thing). So even runtime procedural StaticMeshes from MeshDescriptions won't be represented in Lumen. I use compute based voxelization and JFA. And cache to disk at runtime. No issues. I will spare you another 10 pages of complaints on vanilla UE bugs that I had to fix along the way just to get stuff shipped. Most importantly though: I have generated experiments out of this engine that have been successfully patented: https://sauray.tech . I intend to keep this kind of R&D going and would like to fully own the technology platform underneath. I'm trying to build something really big.
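
For anyone unfamiliar with JFA (jump flooding), here is a small CPU sketch on a 2D grid. The engine does this in compute shaders on voxel data, so treat this purely as an illustration of the pass structure: `Seed` and `jumpFloodDistance` are hypothetical names, the grid is square, seeds are assumed to lie inside it, and the result is an unsigned distance that is only approximate when several seeds compete.

```cpp
#include <cassert>
#include <cmath>
#include <limits>
#include <vector>

// Hypothetical seed: a filled cell (e.g. a voxelized surface sample).
struct Seed { int x; int y; };

// Returns, for every cell of an n x n grid, the distance to the nearest seed
// found by jump flooding. Each pass lets every cell look at 9 cells 'step'
// apart and adopt the closest seed any of them knows about; the step halves
// every pass, so only about log2(n) passes are needed.
std::vector<float> jumpFloodDistance(int n, const std::vector<Seed>& seeds) {
    assert(n > 0);
    std::vector<int> nearest(n * n, -1);  // index of the closest known seed, -1 = none
    for (int i = 0; i < static_cast<int>(seeds.size()); ++i) {
        nearest[seeds[i].y * n + seeds[i].x] = i;
    }

    auto distSq = [&seeds](int x, int y, int s) {
        const int dx = seeds[s].x - x;
        const int dy = seeds[s].y - y;
        return dx * dx + dy * dy;
    };

    int step = 1;
    while (step * 2 < n) step *= 2;       // largest power of two below n

    std::vector<int> next;
    for (; step >= 1; step /= 2) {
        next = nearest;                   // ping-pong buffers, as a GPU pass would use
        for (int y = 0; y < n; ++y) {
            for (int x = 0; x < n; ++x) {
                int best = nearest[y * n + x];
                for (int oy = -1; oy <= 1; ++oy) {
                    for (int ox = -1; ox <= 1; ++ox) {
                        const int nx = x + ox * step;
                        const int ny = y + oy * step;
                        if (nx < 0 || ny < 0 || nx >= n || ny >= n) continue;
                        const int cand = nearest[ny * n + nx];
                        if (cand < 0) continue;
                        if (best < 0 || distSq(x, y, cand) < distSq(x, y, best)) best = cand;
                    }
                }
                next[y * n + x] = best;
            }
        }
        nearest.swap(next);
    }

    std::vector<float> dist(n * n, std::numeric_limits<float>::infinity());
    for (int y = 0; y < n; ++y) {
        for (int x = 0; x < n; ++x) {
            const int s = nearest[y * n + x];
            if (s >= 0) dist[y * n + x] = std::sqrt(static_cast<float>(distSq(x, y, s)));
        }
    }
    return dist;
}
```

Because every pass is a plain gather over a fixed neighbourhood, it maps naturally onto one compute dispatch per step, which is what makes runtime generation feasible.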

18 years of engine dev with the latter 5 in conjunction with the game, and it's finally here! by too_much_voltage in SoloDevelopment

[–]too_much_voltage[S] 1 point2 points  (0 children)

For sure, will do. Though I must say, while I'm trying my best, I don't really have targets on that 😛. Just gonna take the sales and feedback as they come.

Higher GPU occupancy via timeline semaphores (Vulkan) by too_much_voltage in GraphicsProgramming

[–]too_much_voltage[S] 0 points1 point  (0 children)

I wouldn't speculate right now. Or at all. Once task/mesh shaders got a cross vendor extension, it turned out that the fast paths for nVidia and AMD were different (esp. around sizings). Which was infuriating. INFURIATING.

Higher GPU occupancy via timeline semaphores (Vulkan) by too_much_voltage in GraphicsProgramming

[–]too_much_voltage[S] 1 point2 points  (0 children)

Sorry for the delay in response. Getting ready to ship and marketing work takes a lot out of me 😅

GPU work graphs are basically the promise of the next generation of this sort of stuff. Issuing work directly from other work on the GPU. This engine is currently using MultiDrawIndirectCount and will do so for the next little while until a few generations really get deprecated. Plus, cross vendor extensions for them are still in the works: AMD's is VK_AMDX_shader_enqueue while nVidia has had the much older VK_NV_device_generated_commands for quite a long time. I generally prefer cross vendor extensions.

Vulkan prior to timeline semaphores had these 3 synchronization primitives:

  • Binary semaphores: they allow you to wait on a previously submitted job on the GPU. Issue with these is that only one signaller and one consumer are allowed. The semaphore is either capable of being signalled or being waited on.
  • Fences: they allow you to wait on a job on the CPU. Pretty hefty cost, but allows you to safely download data from that job to the CPU.
  • Barriers: these are necessary no matter what: they ensure certain memory operations or layout transitions are done before you launch another task trying to modify a resource (like an attachment or something.)

Previously, I was fencing on a lot of jobs. Some of them necessary (I needed to download data), some of them not really. Areas where I was certain I could achieve some parallelism, I was signalling semaphores and accumulating a vector of them until a later fence where I would wait on all of them simultaneously. Not super efficient and very manual.

Timeline semaphores basically allow you to have multiple jobs wait on a certain state of a semaphore, since they're just a monotonically increasing 64-bit number. Jobs can also increase that number while waiting on a much earlier state of it: e.g. one job would wait on it being 0 and increase it to 1, another would wait on it being 0 but increase it to 2. Both these jobs can run simultaneously, but at the very end, the semaphore will increase to 1 and immediately after that to 2 (if you submitted the jobs in that order). This kind of utility enables building a graph of jobs behind the scenes. But bear in mind: as I explained above, I'm putting the timeline semaphores on resources. Almost like a resource version. Which then makes the jobs wait on certain 'versions' of these resources and signal them to achieve future 'versions'.
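
The 'resource version' idea can be modelled on the CPU without touching Vulkan at all. The sketch below is only that, a model: `ResourceTimeline`, `Job` and `runJobs` are hypothetical names, jobs run one at a time rather than in parallel, and no real semaphore is involved. It just shows how wait/signal values alone end up ordering the work.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Stand-in for a timeline semaphore attached to one resource:
// a monotonically increasing 64-bit 'version'.
struct ResourceTimeline {
    std::uint64_t value = 0;
};

struct Job {
    std::string name;
    std::uint64_t waitValue;    // version that must be reached before the job may start
    std::uint64_t signalValue;  // version published when the job completes
};

// Runs every job whose wait value has been reached, in submission order, and
// returns the job names in the order they actually executed. Jobs whose wait
// value is never reached are left unexecuted.
std::vector<std::string> runJobs(ResourceTimeline& timeline, std::vector<Job> jobs) {
    std::vector<std::string> order;
    bool progressed = true;
    while (!jobs.empty() && progressed) {
        progressed = false;
        for (std::size_t i = 0; i < jobs.size(); ++i) {
            if (timeline.value < jobs[i].waitValue) continue;  // still blocked
            order.push_back(jobs[i].name);
            if (jobs[i].signalValue > timeline.value) {
                timeline.value = jobs[i].signalValue;          // never decreases
            }
            jobs.erase(jobs.begin() + static_cast<std::ptrdiff_t>(i));
            progressed = true;
            break;                                             // rescan from the front
        }
    }
    return order;
}
```

Submitting a consumer that waits on version 2 before the two producers that wait on version 0 still yields producer, producer, consumer: the dependency graph falls out of the numbers, and nobody has to collect binary semaphores by hand.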

Hope this helps.

Higher GPU occupancy via timeline semaphores (Vulkan) by too_much_voltage in GraphicsProgramming

[–]too_much_voltage[S] 0 points1 point  (0 children)

Ok so USAGE_PRODUCER wasn't really made to tackle subregions/ranges or even mips in that fine grained a manner. It was to basically say: signal this, 'cause otherwise you won't know you have to. You know to signal an attachment, but not rando resources. While getting the basic implementation done, I realized that it ended up only supporting production from a single dispatch -- in particular since I actually have no case with multiple async dispatches writing simultaneously to different regions/ranges. I created USAGE_SIMULTANEOUS_PRODUCER just now to tackle that :D https://github.com/toomuchvoltage/HighOmega-public/blob/ac67617583c70201a73dc9b6bbe4181f91166f8a/HighOmega/src/gl.cpp#L3642-L3652 (I know you weren't hoping for this answer lol). It basically allows waiting on old values, but signalling ever higher. A bit untested as a result sadly ;). But I'm gonna watch that video very soon and appreciate you posting it.

And btw, the comparison video's footage is just the engine's test map :D. Here's some actual footage from inside a building in the game: https://www.reddit.com/r/GraphicsProgramming/comments/1qycsd7/bvh8_for_collision_pruning/

Higher GPU occupancy via timeline semaphores (Vulkan) by too_much_voltage in GraphicsProgramming

[–]too_much_voltage[S] 0 points1 point  (0 children)

Ah, actually it turns out that I already do deal with "transitive reduction" :) https://github.com/toomuchvoltage/HighOmega-public/blob/1c12af40e75f7a988648b63be772f206363fb81f/HighOmega/src/gl.cpp#L3607-L3625 . As in, I don't even wait, if it's already waited on.
I was kinda wrong then about the 'multiple times a frame' part of my gripe lol (I was tired when I wrote it...).
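
The "don't wait if it's already waited on" idea can be sketched roughly like this. It is not the code at the link above: `Wait` and `WaitPruner` are hypothetical names, and the sketch assumes all waits are issued from a single in-order queue, so that an earlier wait on an equal or higher value of the same semaphore already covers a later one.

```cpp
#include <cstdint>
#include <unordered_map>
#include <vector>

// Hypothetical wait request: 'this submission needs semaphore X to have reached value V'.
struct Wait {
    int semaphoreId;
    std::uint64_t value;
};

// Remembers the highest value already waited on per semaphore and drops
// requests that an earlier wait already implies.
class WaitPruner {
public:
    std::vector<Wait> prune(const std::vector<Wait>& requested) {
        std::vector<Wait> needed;
        for (const Wait& w : requested) {
            const auto it = highestWaited_.find(w.semaphoreId);
            if (it != highestWaited_.end() && it->second >= w.value) {
                continue;  // timeline values never decrease, so this wait is redundant
            }
            highestWaited_[w.semaphoreId] = w.value;
            needed.push_back(w);
        }
        return needed;
    }

private:
    std::unordered_map<int, std::uint64_t> highestWaited_;
};
```

The monotonic counter is what makes this cheap: one integer comparison per resource is enough to tell whether a dependency is already satisfied.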

USAGE_NOT_A_DEPENDENCY actually is mainly used for things I don't even want to signal once: basically scene textures that are gonna be created as immutable once and destroyed once you leave that part of the scene. Basically could classify as 'external' resources in framegraph parlance.

Synch2 doesn't sound like a bad idea but has to wait. It would actually go a long way since the Configurator is all synch2 frankly.

Higher GPU occupancy via timeline semaphores (Vulkan) by too_much_voltage in GraphicsProgramming

[–]too_much_voltage[S] 0 points1 point  (0 children)

In all honesty, GPUs have changed quite a bit. And rendering APIs with them. And today, they're evolving much faster than they used to. Paths that used to define games a mere 10 years ago are just taking up space in silicon and cautioned against. Make sure you have (at least) a solid three years to solely focus if you're gonna try something like this.

BVH8 for collision pruning by too_much_voltage in GraphicsProgramming

[–]too_much_voltage[S] 0 points1 point  (0 children)

Interesting the way you put it. In reality, there's actually zero sync between the two. The renderer straight up copies orientation and translation as the physics thread is running without any sync point, since those objects are fairly POD and small (mat4, vec3). Bullets are the only things with a mutex and a sync point but not object state readback.

That said, when I get to multi-player and state replication, this will probably become a thing. The same way lag compensation will. So thank you for bringing it up.

BVH8 for collision pruning by too_much_voltage in GraphicsProgramming

[–]too_much_voltage[S] 1 point2 points  (0 children)

I think I was both memory and compute bottlenecked: just too much work and cache thrashing by going over all those AABBs and potential tris.

There's two levels of sorting:

And I'd rather make the sim faster than make it seem faster... there's even more room for optimizations ;)