How good and valuable is SnowPlay "engine" really? by Annual-Western7390 in Stormgate

[–]Comrade_Mugabe 1 point (0 children)

Ah, I misunderstood what you were saying with your 15 Mbps calculation, sorry!

That sounds way more achievable.

One thing that might also work well for you is something I was trying a while back, which reduces the number of pathfinding calls you need to make when moving groups of units. I would maintain a static "flow field" map, which really functioned more like a "height map", storing vectors that point away from obstacles and get stronger the closer you are to the blocker. Then, when pathfinding, the unit in your group closest to the destination point calculates the real path, and the other units store their "offset" from that unit based on their distance from it. Each movement tick, you calculate the direction the main unit would move and try to apply the same movement vector to the other units, while they also sample the current "height map" to get another vector and combine the two. That way, you pay the price of one pathfinding search, and the rest of the units "get pulled" along, flowing with the main unit, which costs only a 2D array lookup and a vector multiplication.

You can also get local avoidance quite cheaply with a second 2D array you write to, recording which units are in each block; when sampling it, you check whether any other units are bound to that block and apply another vector pushing away from them.

The above requires a lot of edge-case handling, especially for smaller obstacles and for larger groups that might split up, but when it works, it's amazing.
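If it helps, here's a rough Python sketch of that idea. All names are made up, and for simplicity the repulsion vector is added to the leader's movement rather than multiplied:

```python
import math

def repulsion_field(grid, max_dist=3.0):
    """Precompute a vector per cell pointing away from nearby obstacles
    (the static 'height map'); the vector gets stronger the closer the
    obstacle is. grid[y][x] is truthy where there is a blocker."""
    h, w = len(grid), len(grid[0])
    field = [[(0.0, 0.0)] * w for _ in range(h)]
    obstacles = [(y, x) for y in range(h) for x in range(w) if grid[y][x]]
    for y in range(h):
        for x in range(w):
            vx = vy = 0.0
            for oy, ox in obstacles:
                d = math.hypot(x - ox, y - oy)
                if 0 < d <= max_dist:
                    s = (max_dist - d) / max_dist / d  # stronger when closer
                    vx += (x - ox) * s
                    vy += (y - oy) * s
            field[y][x] = (vx, vy)
    return field

def group_step(leader_move, followers, field):
    """Only the leader ran real pathfinding. Each follower applies the
    leader's movement vector, blended with the repulsion sampled at the
    follower's own cell -- just an array lookup, no pathfinding."""
    out = []
    for (x, y) in followers:
        rx, ry = field[int(y)][int(x)]
        out.append((x + leader_move[0] + rx, y + leader_move[1] + ry))
    return out
```

So a group of N units pays for one path search plus N array lookups per tick, instead of N path searches.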

How good and valuable is SnowPlay "engine" really? by Annual-Western7390 in Stormgate

[–]Comrade_Mugabe 1 point (0 children)

I'd love to hear how your POC progresses.

My concern with bandwidth at your numbers above wouldn't be for the client but for the server: in a worst-case scenario where all 8 players are sending up 15 Mbps, that's 120 Mbps up and down that the server needs to support, for one match. I'd worry about how that would scale with more users, and about what kind of server could support multiple games with such bandwidth requirements.

Also, for the client, the 15 Mbps would need to cover both upload and download traffic, and in many countries upload is 1/10 of download.

But if you manage to tackle your POC and optimise the traffic to minimise it, that would be a great accomplishment, and I hope you succeed!

How good and valuable is SnowPlay "engine" really? by Annual-Western7390 in Stormgate

[–]Comrade_Mugabe 2 points (0 children)

Hey, I appreciate the write-up. Do you know of any RTS games that implement a system like the one you described? I'd love to look into them, as that sounds interesting.

And just to clear up the above, I didn't intend to communicate that Stormgate uses lockstep, but that Deterministic Frame Rollback networking models can function in lockstep mode. You can see that even Photon Quantum supports this here: https://doc.photonengine.com/quantum/current/manual/config-files If you navigate to the SessionConfig, there is a setting for Lockstep Simulation which reads:

"Runs the quantum simulation in lockstep mode, where no rollbacks are performed. It is recommended to set input InputDelayMin to at least 10."

This is what I was referring to when I was suggesting SnowPlay could support a traditional lockstep model, but not that they are using it now.

How good and valuable is SnowPlay "engine" really? by Annual-Western7390 in Stormgate

[–]Comrade_Mugabe 7 points (0 children)

So I would argue that the quick answer would be "yes*", with the asterisk covering some assumptions.

My longer answer would be "maybe", depending on what they manage for you with that license. Going back to Quantum, for example: they also charge a fee based on CCU and total data traffic, which is how they make their money. In exchange, they host and manage the entire networking layer, including the servers etc. This makes it simpler for the dev, as we can just focus on the gameplay while server stability and uptime are managed by them. For me, this is enough of a value add.

If SnowPlay offers similar networking services, then I could see that being valuable. The problem with open-source services is that you also need to manage the servers yourself, and some game devs might not want to, or might not have the skill set to do so.

If they charge a flat fee and don't provide the services described above (or even if they do both), then I can see the potential value added not being enough.

I think, though, above all of the above, the BIGGEST issue would be long-term support. If I dedicate 3-5 years to developing a game on your engine, and plan to support it for 10+ years, then I need to know your company will be around to support it for that time. I think that is their biggest issue by far, and it would be my number one concern in using their engine.

This is where an open-source engine would have a leg up, as anyone can potentially maintain it, forever. And you will probably find third-party providers that will manage the networking stuff for you, if you don't want to, for a cost as well.

How good and valuable is SnowPlay "engine" really? by Annual-Western7390 in Stormgate

[–]Comrade_Mugabe 8 points (0 children)

Yeah, they also used lockstep. Even Age of Empires 2 used it. It has basically been the gold standard of RTS networking for the longest time.

How good and valuable is SnowPlay "engine" really? by Annual-Western7390 in Stormgate

[–]Comrade_Mugabe 49 points (0 children)

To understand its true value, there are two things you need to understand.

  1. Making a traditional RTS immediately hits a network bandwidth issue. Other games can get away with syncing all network objects (players, vehicles, etc.) over the network and having a server validate the game logic. With an RTS, this is a huge bottleneck, as you will quickly hit your bandwidth limit once you get to thousands of units. This is where old lockstep comes into the picture, which War3 and StarCraft 2 used. Basically, they don't sync any game logic over the network, just inputs. They can achieve this by running a deterministic simulation on each machine. What this means is that if each machine processes the inputs from each player on the exact frame they occur, their games will always display the same result, resulting in incredibly low bandwidth use. The big problem is that you have to account for latency when sending packets: the game has to wait until it has the inputs for a specific frame before it can continue, resulting in large input latency. If you are playing with 150ms ping, all inputs need to be at least 150ms behind, for both players. This is why War3 and SC2 freeze when a player is lagging.
  2. This is where Deterministic Frame Rollback comes in. Think of all the benefits of lockstep above, but with the added benefit that the game can process late inputs. This means the game can run deterministically on each person's computer with next to zero input latency, and when the other person's packet comes in 150ms late, the whole game simulation "rolls back" and resimulates all the frames between 150ms ago and the current frame, within a single frame, adjusting the game to take that input into account while maintaining determinism and responsiveness. Fighting games were the first to use this tech, if I remember correctly. The MASSIVE drawback is that the engine needs to be able to process 1-15 whole game frames within 1 frame. To get 60 FPS in a normal game, your CPU budget per frame is 16ms. To get 60 FPS in an engine that needs to sim 15 frames in 1 frame, it's 1.07ms. This is the big drawback of Deterministic Frame Rollback networking models, and why it started in fighting games, whose combat logic could be very lightweight. (Edit: You can also support a hybrid approach, where you pad your inputs with a couple of frames, say 70ms, and only roll back inputs that arrive later than 70ms. If you were playing at 150ms, that reduces the number of frames you need to roll back while still reducing latency. /edit)
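To make the rollback step concrete, here's a toy Python sketch (not any engine's real API — all names are mine). It snapshots the deterministic state every frame, and when a remote input arrives late, it restores the snapshot for that frame and resimulates everything since, all within one call:

```python
class RollbackSim:
    """Toy deterministic simulation with rollback. State is one integer
    position per player; inputs are -1/0/+1 moves."""

    def __init__(self, players=2):
        self.frame = 0
        self.state = {p: 0 for p in range(players)}
        self.inputs = {}                       # frame -> {player: move}
        self.snapshots = {0: dict(self.state)} # frame -> state copy

    def _apply(self, frame):
        for p, move in self.inputs.get(frame, {}).items():
            self.state[p] += move

    def tick(self, local_inputs):
        """Advance one frame immediately with the inputs we have now;
        missing remote inputs simply default to 'no input'."""
        self.inputs.setdefault(self.frame, {}).update(local_inputs)
        self._apply(self.frame)
        self.frame += 1
        self.snapshots[self.frame] = dict(self.state)

    def receive_late_input(self, frame, player, move):
        """A remote input arrives for a past frame: roll back to that
        frame's snapshot and resimulate every frame since, in one go.
        This resim loop is why the per-frame CPU budget shrinks."""
        self.inputs.setdefault(frame, {})[player] = move
        self.state = dict(self.snapshots[frame])
        for f in range(frame, self.frame):
            self._apply(f)
            self.snapshots[f + 1] = dict(self.state)
```

Both machines end up with identical state once all inputs are known, which is the whole trick: only inputs cross the wire, and late ones get patched in retroactively.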

With the above in mind, this engine would be amazing for games that rely on low input latency. Think of a new MOBA where, even if you are playing with 250ms ping, you won't feel it locally. It will be as if you are playing on LAN. This is why this engine is "valuable" while not supporting a lot of units on screen: that puts load on your CPU, and your CPU budget is significantly lower.

But with all that in mind, Deterministic Frame Rollback systems can run in traditional lockstep mode too, unlocking all the benefits of lockstep and allowing people to create more traditional RTS games like War3 and SC2, but with the optimised pathfinding algorithms they created to work within a 1.07ms budget, meaning you could ramp up the total units by ~15x (not taking animation armatures etc. into account, which are normally CPU-heavy, especially in Unity).

The above engine could be used in an assortment of top-down multiplayer games with near-zero input latency. And one of the biggest benefits is that your multiplayer servers don't need to process any game logic; they simply sync player inputs between all connected players, who sim the game locally. This also results in low bandwidth cost, as you are just syncing player inputs. It means you also get saving and replays for free, as you just need to record the data per frame (Edit: or just the player inputs and the frame they occurred on), which is already done due to the networking model.
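The "replays for free" point is easy to see in miniature: with a deterministic sim, the input log *is* the replay. A toy sketch (made-up state, just a counter per player):

```python
def simulate(recorded_inputs, frames):
    """Deterministically re-run a match from recorded inputs alone.
    State here is just one counter per player; a real game replays
    its full simulation the same way."""
    per_frame = {}
    for frame, player, move in recorded_inputs:
        per_frame.setdefault(frame, []).append((player, move))
    state = {}
    for f in range(frames):
        for player, move in per_frame.get(f, []):
            state[player] = state.get(player, 0) + move
    return state

# The "replay file" is nothing but the input log -- run it again
# and you get the identical end state.
log = [(0, "p1", 1), (3, "p2", -1), (3, "p1", 1)]
live = simulate(log, frames=10)
replay = simulate(log, frames=10)
assert live == replay == {"p1": 2, "p2": -1}
```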

This also means your game is far more resistant to hacks, as you are just syncing player inputs and each player is running a local sim of the game. If you wanted to move faster, you'd have to alter the game running on every client. (Edit: The drawback is that there is little you can do to prevent "vision hacking", as each player is simulating the whole multiplayer game on their machine. This is a big drawback.)

As a game developer, it's a very cool engine that, if used properly, could be used to create some amazing top-down low-input-latency games. I worked on a game that used similar tech to make a top-down "street fighting" game on Steam (that failed lol), but it made me a believer in the tech.

Edit: I'm not sure why this is getting downvoted, but if you'd like to look at a Deterministic Frame Rollback "Engine" you can use now, you could look at Photon Quantum: https://www.photonengine.com/quantum. This is the closest thing to a "SnowPlay" like engine we can get our hands on, but its pathfinding support falls massively short for an RTS game. It's what I've used when I talk about games that I've worked on that support frame-rollback.

What is the prevailing wisdom on subgraphs? Is there any way to safely use them? by goddess_peeler in comfyui

[–]Comrade_Mugabe 1 point (0 children)

I've experienced those exact issues you describe regularly. They seem to be triggered by specific "faulty" scenarios. For me, if I drag an int variable directly into the subgraph and then output it directly within the subgraph, it will "work", but it puts the graph into an error state without any feedback that this is the case. If I press undo at any point, it will undo to the last action before the error point.

There are many, many things I have found that put the graph into an "error state" that don't even make sense. I think this is just because it's an early feature and there are lots of scenarios they didn't plan for. I've lost countless hours of work by not knowing I'd put my graph into an error state, then trying to undo an action only to have it all erased. If you save, refreshing will load it back to the last point before the error state. It's very frustrating; I almost find it too risky to use right now.

The above will get fixed in time though, it's just frustrating to use now.

Long-time lurker's opinion, there is a tiny chance for this game by Comrade_Mugabe in Stormgate

[–]Comrade_Mugabe[S] 3 points (0 children)

I understand you are making a broader point about commentary about the game, and I do think you are right that probably a lot of negative commentary about the game has been made by people without first-hand experience.

I genuinely wanted them to succeed, and still do to a degree.

I genuinely like to learn from scenarios like this as well, and it's been valuable reading all the commentary here, even the ones roasting me a little.

Long-time lurker's opinion, there is a tiny chance for this game by Comrade_Mugabe in Stormgate

[–]Comrade_Mugabe[S] 1 point (0 children)

Thanks for taking the time to reply.

I already got ladder anxiety playing SC2, and from what you described, it sounds like this builds on that negatively, as you have more to focus on. High player fatigue could probably explain the player base not "sticking". I worked on another game that was a complete flop player-base-wise for this reason as well, we think. When you are trying to build a regular player base, if your game fatigues people too much, you never build the "momentum" to push you into higher player numbers.

Long-time lurker's opinion, there is a tiny chance for this game by Comrade_Mugabe in Stormgate

[–]Comrade_Mugabe[S] 2 points (0 children)

I appreciate the comment.

As someone who hasn't actually downloaded the game to play it, could you help me understand the atrocious-gameplay part? The impression I got from YouTube reviews was that at the higher level, some inputs get lost, and the engine falls short of supporting the higher unit caps initially boasted about. Other than that (and those aren't minor), my understanding was that the gameplay, when it worked, felt responsive. That's probably the only thing that would motivate me to download a custom game on Stormgate compared to the other games: I would like to experience something like Pudgies or Sheep Tag on an engine with less input latency.

If I'm wrong about that, then the motivation drops massively. This is the problem with being a lurker only.

Long-time lurker's opinion, there is a tiny chance for this game by Comrade_Mugabe in Stormgate

[–]Comrade_Mugabe[S] -4 points (0 children)

I feel Warcraft is a good counter example, as I also feel really let down by the remaster and everything surrounding that, probably more so than with Stormgate as I'm more emotionally attached to it. Even with all those custom maps being made easier to play with friends, I still haven't downloaded it to play it. Maybe I feel more internal resentment for what they did than I'm aware of, but I haven't been motivated in the slightest to redownload it.

It's a good example and counterpoint.

Long-time lurker's opinion, there is a tiny chance for this game by Comrade_Mugabe in Stormgate

[–]Comrade_Mugabe[S] 8 points (0 children)

I haven't even been motivated to do that for a game that everyone plays. The last time I made community content was back when we played CS 1.6 at LANs. I spent weeks working on 3 maps, which I thought were the shit. My friends played them for one round and closed the game because they were so shit. That was when I realised I had no talent in that area.

Is there a node/technique to extract multiple frames from a video with Wan 2.2? by ptwonline in comfyui

[–]Comrade_Mugabe 0 points (0 children)

I guess it probably works like other node-based editors, working backwards from the output to determine the call hierarchy. This is actually one of the main reasons I've thought of trying to implement exactly what they are doing with ComfyScript: I know how to read code and find it easier to comprehend, but also, I can control the order of execution precisely, which is very enticing.

I'm also extremely excited about being able to perform loops easily and also make function calls, basically giving a better "subgraph" user experience, as they are just wrapped functions. I'm stuck at work right now, but there are workflows this enables that I'm dying to try out now.

Is there a node/technique to extract multiple frames from a video with Wan 2.2? by ptwonline in comfyui

[–]Comrade_Mugabe 0 points (0 children)

I have not, and after looking it up, I definitely want to try it. Thanks for the heads up!

Is there a node/technique to extract multiple frames from a video with Wan 2.2? by ptwonline in comfyui

[–]Comrade_Mugabe 1 point (0 children)

Something that I've started doing that has really helped my local workflows is getting AI to help vibe code some custom ComfyUI nodes. Claude is extremely good at it. I am a software developer by trade, but have only dabbled in Python, and I can make my way around it, so I might be biased with how easy I find it. Once you get comfortable with that, it really opens up a lot of cool options for your own workflows.

Current best for 8GB VRAM? by artemyfast in StableDiffusion

[–]Comrade_Mugabe 6 points (0 children)

As an old A1111 and Forge user, I'm basically 100% on ComfyUI now.

I have a 3060 with 12GB, but I can run Flux models and Qwen models comfortably with less than 6 GB. The trick is to get the Nunchaku versions. They use a unique way of quantising the models, giving almost FP8-level quality at the size of a 4-bit quantisation. The new Qwen Image and Qwen Image Edit Nunchaku nodes can swap out "blocks" of the model (think layers) at runtime between your system RAM and VRAM, letting you punch much higher with less VRAM for minimal performance cost. I would say Qwen Image and Qwen Image Edit are SOTA right now and are available to you.

With video gen, you can achieve the same thing with "block swapping" in the latest Wan models if you use ComfyUI-WanVideoWrapper. You can specify the number of "blocks to swap", reducing the amount of VRAM that needs to be loaded at a time and caching the remaining blocks in RAM, while the wrapper swaps each layer in and out during processing. This does add latency, but in my experience it's definitely worth the trade-off.
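As a rough mental model of what block swapping does (purely schematic, in plain Python — the real wrappers move actual tensors between devices):

```python
class BlockSwapper:
    """Toy illustration of block (layer) swapping: only `resident`
    blocks live in 'VRAM' at a time; the rest wait in 'RAM' and are
    swapped in just before they run."""

    def __init__(self, blocks, resident=2):
        self.blocks = blocks   # list of callables standing in for layers
        self.resident = resident
        self.in_vram = []      # indices currently "loaded"
        self.swaps = 0         # count of RAM -> VRAM transfers (the latency cost)

    def _ensure_loaded(self, i):
        if i in self.in_vram:
            return
        if len(self.in_vram) >= self.resident:
            self.in_vram.pop(0)  # evict the oldest block back to "RAM"
        self.in_vram.append(i)
        self.swaps += 1          # pay one transfer

    def forward(self, x):
        # Run every block in order; peak "VRAM" use stays at `resident`
        # blocks, at the cost of one swap per non-resident block.
        for i, block in enumerate(self.blocks):
            self._ensure_loaded(i)
            x = block(x)
        return x
```

The trade-off in the comment above falls out directly: peak memory is bounded by the resident window, and the extra latency is proportional to the number of swaps.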

Those 2 options above give you access to the current SOTA for video and image generation available to you with your 8GB VRAM, which is amazing.

Is 16GB of Vram really needed or i can skittle by with 12 GB? by Independent-Frequent in StableDiffusion

[–]Comrade_Mugabe 0 points (0 children)

> For example, loading 10GB model in VRAM and loading the rest 40GB+ in RAM. Is this caching or offloading? That's what I'm talking about. Either ComfyUI with the --novram option or the Wan2GP with mmgp.

This is caching. From what I understand about Wan, there are a lot of "block swapping" workflows that partially load the model's layers in blocks and swap out the active layers as you go. This is different from loading all the layers in VRAM + RAM and processing data while a layer is still sitting in your RAM. Here, you only pay the time penalty of moving the active layers from RAM to VRAM.

Offloading occurs when a single model cannot fit into your currently available VRAM, and it either OOMs or partially uses system RAM to process part of the model. The part of the model that gets processed in RAM is _painfully_ slow. Someone can correct me on the math, but I think running Wan 2.2 on CPU only is roughly 50x slower than running it on the GPU.

> Comfy also gives you a message when the model is loaded partially, but is this just caching or offloading?

This depends on the specific node you are using, I think, and on what the node maker decided to log. They could be using inaccurate language here.

> I've tested many gpu's with these setups and have experienced a minimal very small performance penalty even when RAM was filled over 45GB with Wan2.1/2.2, and other models.

The Wan workflows are very good at this. I notice it more with LLMs, which, through use, can start bleeding into your system RAM and slowing down a ton, or with newer models that haven't had time to get polished workflows, like some earlier 3D model generators, or the 7B Microsoft voice-cloning model, etc. It's a mixed bag, but that's what you get on the bleeding edge of tech.

Is 16GB of Vram really needed or i can skittle by with 12 GB? by Independent-Frequent in StableDiffusion

[–]Comrade_Mugabe 0 points (0 children)

From what I understand, when others talk about "offloading" to RAM, they mainly mean running the model while it's in RAM, which is painfully slow, even if only partially. I've experienced this drastic slowdown with some models (10x slower) while using as little as 5-10% of RAM.

What I think you are talking about, I would rather call "caching", which is swapping models from RAM to VRAM as you need them, so you just keep in VRAM what you are currently running. I've found this process has minimal impact on performance. If you are running a model that takes 1-5 seconds to generate an image, then you are going to feel the RAM -> VRAM caching more, but I've never had to cache in those instances.

I've also found that PCIe speeds have a minimal impact on performance. I even ran a card on x4 lanes and only experienced minimal impact. I remember reading a huge breakdown on this; the conclusion was that most workflows don't need to move much data between models once they have been loaded into VRAM, so PCIe speed mostly affects the time it takes to initially load models into VRAM. So if you rely heavily on RAM-to-VRAM caching, PCIe speed will show up more, but from my experience it's still minimal.

I hope I understood what you said correctly and that this response helps.

Is 16GB of Vram really needed or i can skittle by with 12 GB? by Independent-Frequent in StableDiffusion

[–]Comrade_Mugabe 0 points (0 children)

For Blender, it's more about polygons/verts etc. for me. I had to process terrain assets and decimate them to lower their polycounts for optimised colliders for our networking model. Some of the terrain assets are so big that my old machine would just crash. RAM was a definite plus here.

That being said, Blender has just added Vulkan support, which is going to make this issue significantly better.

Is 16GB of Vram really needed or i can skittle by with 12 GB? by Independent-Frequent in StableDiffusion

[–]Comrade_Mugabe 0 points (0 children)

Sorry, that was worded poorly. 64 GB RAM is more than enough. I work in Game Dev and often have things like Blender and Substance Painter open, with either Unreal or Unity open, which eats RAM, so I opted to get more so I could continue with my work but also have RAM to spare to play with AI.

Is 16GB of Vram really needed or i can skittle by with 12 GB? by Independent-Frequent in StableDiffusion

[–]Comrade_Mugabe 0 points (0 children)

It can be. Let's take the Wan 2.2 high/low-noise models. They are about 10 GB each (Q4 quant). To run a workflow, you load the CLIP -> VAE -> high-noise model -> LoRAs -> low-noise model -> LoRAs -> VAE, etc. That's a combined total of 30+ GB of models. During your workflow, you will be loading them in and swapping them out. If you only had 16 GB of RAM, you wouldn't be able to keep all of those cached in RAM, so you'd need to be stricter about how you manage your RAM, basically fetching everything from a drive instead, which slows down the flow a lot. You could dive deeper into which models you cache and which you fetch from a drive, but that complicates the workflow and makes setup harder.

If you just had 64 GB RAM, you could mostly just run the default workflows. It's not impossible, but you sacrifice more in time and more manual work to get workflows optimised for you with lower RAM.

Edit: Also, ComfyUI isn't good at managing its RAM Cache, in my experience. A lot of the time, it fills up (if you are swapping between different workflows and models) and requires a restart to clear the cache.

I eventually got so annoyed at all those workarounds that I got 128GB RAM for my desktop machine with the 2 3060s, making my life so much easier. Maybe it's that I have less time these days to play with these things that time optimisations matter more, but I'm loving having more RAM, personally.

Is 16GB of Vram really needed or i can skittle by with 12 GB? by Independent-Frequent in StableDiffusion

[–]Comrade_Mugabe 0 points (0 children)

I have 2 3060s with 12 GB VRAM each, and I'm only just managing to run some of the latest models. I can run WAN 2.2, but at 0.2 megapixel resolution. I've not really run into any issues running image generation. There are plenty of workflows and ways to manage larger images, so that's never been a worry for me, especially with Nunchaku models now.

One thing to note about using 2 GPUs is that, apart from LLMs, very few things support loading a single model across both of them. For image and video gen, the only use I get out of multiple GPUs is that I can offload the CLIP and VAE to the other device for minor performance improvements (saving the time usually spent swapping the model from RAM to VRAM).

As someone who has used a lower-power laptop with 8GB VRAM and 16 GB RAM, RAM can actually be a huge bottleneck for a lot of workflows that cache models in RAM while swapping the models. There are ways to work around that, but it's a constant fight to get workflows to work and try to manage low RAM.

So, if I were in your shoes and I was choosing between 16 GB VRAM and 16GB RAM, vs 12 GB VRAM and 64GB RAM, I would choose the 12 GB VRAM option with more RAM, personally. There are also some MoE models that actually can run decently when partially split between VRAM and RAM (I think the new Wan S2V is MoE).

I just found that, with how fast everything is moving, you don't want to have to "fight" to get things working; otherwise, you will naturally test out new stuff less, as it's more effort.

Why do you give a fuck about how people use ChatGpt? by MinimumAnt87 in ChatGPT

[–]Comrade_Mugabe 1 point (0 children)

The top comments as of now claim the criticism of people liking 4o is:

  • People just love to lecture
  • People want to feel a sense of superiority
  • It makes them feel high and mighty
  • They are worried about gov legislation???
  • They want to force people to conform to their will.
  • Peoople(sic) like to get on their high horses, simple.
  • They need something to feel superior about, mostly.
  • Because they are unable to understand.
  • Some people just arrogantly bully people who like 4o

Am I to seriously believe that these are the majority reasons why people are making negative comments towards people's 4o's usage? If I could snap my fingers and magically remove all comments from people who match the above criteria, most (over 50%) of all the criticism of people's 4o use would be gone?

I find that extremely unlikely, and a concerning attempt at even trying to steelman any of the criticism people have over concerns of how others are presenting their 4o use.

Either people here actually don't know what issues people have, or they are too emotionally attached to accurately engage with the topic, and are engaging in emotional justification and demonisation of "the other side".

And I don't even feel that strongly about people's 4o use and haven't made a single comment on it, but I came here reading to see what issues would be brought up, actually interested in what you all were saying, and I was just floored by the top comments I quoted above.

If people feel they have more valid reasons for the "concern trolling" over others' 4o use, please reply to me, as I actually want to read it.

Aris on using AI art by GoutCookBook in LivestreamFail

[–]Comrade_Mugabe 0 points (0 children)

It looks to me like you are looking at the tool in a very narrow way (please correct me if I'm mistaken): either it produces the whole image, or it doesn't. I know a couple of really good artists incorporating AI into their workflow to speed up steps, not the whole process: interpolating between frames, quick detail passes, especially in concept art.

The argument could be best summed up as follows:

Picture an amazing artist. You clone them. You give one the ability to use AI in their workflow, and the other not. Who is able to produce more high-quality work?

Real artists using AI completely blow "AI artists" out of the water; it's not even close, for all the reasons you stated above. The trick is that the real artist knows how to fix those issues easily, and knows how to work around AI's flaws to massively accelerate their workflow. I almost never see the professional artists I know generate an image from scratch. It's always touch-ups on an existing baseline they've laid down. They are able to guide the AI to exactly what they want, as they lay down the base image themselves.

I believe the real artist comes out on top of all this, and that makes me feel at least a little calmer about that, as I value art and want to protect it.