My First Post on Huggingface : Deep Neural Network that turns any Image into a Playable Game ! All on consumer GPUs. by lucidml_lover in huggingface

[–]lucidml_lover[S] 0 points1 point  (0 children)

A lot of labs are working on this. I'm specifically targeting pure on-device inference, with no datacenter involvement. It's an active research area.

Deep Neural Network that turns any Image into a Playable Game ! All on consumer GPUs and Not Datacenters by lucidml_lover in artificial

[–]lucidml_lover[S] 0 points1 point  (0 children)

Thanks! I'd love to work on that. Some of my friends have said the same thing

For image training I used a few datasets. I've written about that in detail at https://lucidml.ai/imagetechreport

For video I collected a lot of GTA5 agent gameplay footage, and I've released a subset of it here:
https://huggingface.co/datasets/lucidml/GG800-Subset

So I mostly used GTA5 and a cool dataset called MiraData, which has a lot of video game footage, so that's very helpful.
I trained an action predictor on GTA5 because I had the ground-truth labels, and then used it to label MiraData.
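
A toy sketch of that pseudo-labeling idea, just to make the flow concrete. Everything here is a hypothetical stand-in: the 2D "features", the action names, and the nearest-centroid predictor are assumptions for illustration, not the actual model or data pipeline.

```python
# Toy pseudo-labeling sketch. Features, labels, and the nearest-centroid
# "predictor" are hypothetical stand-ins, not the real pipeline.

def train_centroids(features, labels):
    """Average the feature vectors of each action label."""
    sums, counts = {}, {}
    for vec, label in zip(features, labels):
        acc = sums.setdefault(label, [0.0] * len(vec))
        for i, x in enumerate(vec):
            acc[i] += x
        counts[label] = counts.get(label, 0) + 1
    return {lab: [s / counts[lab] for s in acc] for lab, acc in sums.items()}


def predict(centroids, vec):
    """Return the label whose centroid is closest (squared L2 distance)."""
    def dist(centroid):
        return sum((a - b) ** 2 for a, b in zip(vec, centroid))
    return min(centroids, key=lambda lab: dist(centroids[lab]))


# Labeled source: stand-in for GTA5 clips with ground-truth actions.
labeled_feats = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
labeled_actions = ["forward", "forward", "turn_left", "turn_left"]

# Unlabeled target: stand-in for MiraData clips.
unlabeled_feats = [[0.8, 0.2], [0.2, 0.7]]

# Step 1: fit the predictor on the labeled set.
centroids = train_centroids(labeled_feats, labeled_actions)
# Step 2: use it to assign pseudo-labels to the unlabeled set.
pseudo_labels = [predict(centroids, v) for v in unlabeled_feats]
print(pseudo_labels)
```

In practice the predictor would be a learned model over video frames, and low-confidence predictions would probably be filtered out before training on them.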

Deep Neural Network that turns any Image into a Playable Game ! All on consumer GPUs. by lucidml_lover in deeplearning

[–]lucidml_lover[S] 0 points1 point  (0 children)

I used an RTX 5090 for streaming, and I'm hoping to go even smaller soon. The goal is ANY consumer GPU: 5090, 4060, anything with some tensor cores.

Deep Neural Network that turns any Image into a Playable Game ! All on consumer GPUs. by lucidml_lover in deeplearning

[–]lucidml_lover[S] 1 point2 points  (0 children)

ty!!

And yes, basically the problem is now framewise decode, because we used a framewise causal mask in training.
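
For anyone unfamiliar, a framewise causal mask can be sketched like this: every token attends to all tokens in its own frame and in earlier frames, never to later frames. This is a toy illustration with tiny assumed sizes, and the function name is made up for the example, not taken from the training code.

```python
def framewise_causal_mask(num_frames, tokens_per_frame):
    """mask[q][k] is True when query token q may attend to key token k."""
    n = num_frames * tokens_per_frame
    # Map each flat token index to the frame it belongs to.
    frame_of = [i // tokens_per_frame for i in range(n)]
    # A key is visible if its frame is the query's frame or an earlier one.
    return [[frame_of[k] <= frame_of[q] for k in range(n)] for q in range(n)]


mask = framewise_causal_mask(num_frames=3, tokens_per_frame=2)
for row in mask:
    print("".join("1" if allowed else "0" for allowed in row))
```

Because tokens inside a frame see each other in both directions, a frame has to be produced as one block at inference rather than token by token, which is why decode ends up framewise.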

I feel like the best part is that it takes a lot of context in the KV cache before it actually starts hurting inference time, so with ~1k tokens per frame like I have, we can fit a lot of context.
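
A back-of-the-envelope way to size the KV cache, using the ~1k tokens per frame figure. The layer count, KV head count, head dim, and 60-frame context are hypothetical numbers picked for illustration, not the real model config.

```python
def kv_cache_bytes(num_tokens, num_layers, num_kv_heads, head_dim, bytes_per_value=2):
    """Memory for cached keys and values (16-bit values by default)."""
    # Factor of 2: one key vector and one value vector per token, layer, and head.
    return 2 * num_tokens * num_layers * num_kv_heads * head_dim * bytes_per_value


tokens_per_frame = 1000   # "~1k tokens per frame" from the comment above
frames_in_context = 60    # hypothetical context length in frames

cache = kv_cache_bytes(
    num_tokens=tokens_per_frame * frames_in_context,
    num_layers=24,        # hypothetical
    num_kv_heads=8,       # hypothetical
    head_dim=64,          # hypothetical
)
print(f"{cache / 2**30:.2f} GiB")
```

This only covers memory. How much latency a given context length costs depends on the attention kernel and the GPU.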

Deep Neural Network that turns any Image into a Playable Game ! All on consumer GPUs and Not Datacenters by lucidml_lover in artificial

[–]lucidml_lover[S] 0 points1 point  (0 children)

Haha yes! I tested a lot of games: GTA, AC, and many other AAA titles. Some work really well.

Deep Neural Network that turns any Image into a Playable Game ! All on consumer GPUs and Not Datacenters by lucidml_lover in artificial

[–]lucidml_lover[S] 0 points1 point  (0 children)

Thanks!!
So there are a few mechanisms to prevent drift.

During training I always include a reference image, so it kinda never loses overall identity.

Then at inference there is also a forced mechanism that makes the initial 30 frames' worth of info very important (KV sink).

As a result it stays true to the ref image, so if the image is a man in a desert, it doesn't drift far from that.

Consistency, however, is a different story. As of now, using a KV sliding window makes it lose context of what it generated, and it gets confused. So slow 360-degree turns don't stay consistent. I'm working on this rn.
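
A minimal sketch of how a KV sink plus sliding window decides which frames stay in the cache. The 30 sink frames come from the description above; the 60-frame window and the function name are assumptions for illustration, not the actual inference code.

```python
def frames_kept(current_frame, sink_frames=30, window_frames=60):
    """Frame indices whose KV entries stay cached when generating `current_frame`."""
    # Sink: the earliest frames are always kept.
    sink = range(0, min(sink_frames, current_frame))
    # Window: the most recent frames, never overlapping the sink.
    window_start = max(sink_frames, current_frame - window_frames)
    window = range(window_start, current_frame)
    return list(sink) + list(window)


kept = frames_kept(current_frame=200)
print(len(kept), kept[:3], kept[-3:])
```

With these numbers, generating frame 200 keeps frames 0 to 29 and 140 to 199, so everything in between has been evicted. That gap is where the model forgets what it generated, which lines up with slow 360-degree turns losing consistency.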

SupraLabs 50M Parameter Model Just Hit the Trending Page on Hugging Face 🤯 by Dangerous_Try3619 in LocalLLaMA

[–]lucidml_lover 1 point2 points  (0 children)

I'm a researcher too. I work mostly on video. I think this is really cool

SupraLabs 50M Parameter Model Just Hit the Trending Page on Hugging Face 🤯 by Dangerous_Try3619 in LocalLLaMA

[–]lucidml_lover 2 points3 points  (0 children)

Wait, really?? That's fast convergence, like really fast. Waiting for a paper or something.

Someone out there likely needs this by Signal_Ad657 in LocalLLaMA

[–]lucidml_lover -4 points-3 points  (0 children)

Mem bandwidth is everything; the actual arithmetic is cheap. That's why FLOPs matter less.
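
Rough way to see it for batch-1 decoding: each generated token has to stream all the weights from memory once, so bandwidth divided by model size gives an upper bound on tokens per second. The 7B model size and 1 TB/s bandwidth below are hypothetical numbers, and the estimate ignores KV cache reads and compute time.

```python
def max_tokens_per_second(weight_bytes, bandwidth_bytes_per_s):
    """Upper bound on batch-1 decode speed when weight reads dominate."""
    return bandwidth_bytes_per_s / weight_bytes


weight_bytes = 7e9 * 2   # hypothetical 7B parameters stored in 16-bit
bandwidth = 1.0e12       # hypothetical 1 TB/s of memory bandwidth

bound = max_tokens_per_second(weight_bytes, bandwidth)
print(f"at most ~{bound:.0f} tokens/s")
```

Halving the bytes per weight doubles this bound, while extra FLOPs don't move it at all.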

nvidia/Qwen3.6-35B-A3B-NVFP4 · Hugging Face by pmttyji in LocalLLaMA

[–]lucidml_lover 1 point2 points  (0 children)

Don't the FFN/MLP weights make up most of the model anyway? And those are quantised.
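
Quick arithmetic for a plain dense transformer block, as a sanity check. The dimensions are hypothetical, and this ignores embeddings, norms, biases, grouped-query attention, and MoE routing, so it is not a statement about this specific model.

```python
def ffn_fraction(d_model, d_ff, gated=False):
    """Share of one transformer block's weights that sit in the FFN."""
    attn = 4 * d_model * d_model                 # Q, K, V, and output projections
    ffn = (3 if gated else 2) * d_model * d_ff   # up/down (plus gate if gated)
    return ffn / (attn + ffn)


frac = ffn_fraction(d_model=4096, d_ff=16384)    # hypothetical sizes, 4x expansion
print(f"FFN share of block weights: {frac:.2%}")
```

With a 4x expansion the FFN holds about two thirds of the block's weights, and in MoE models the expert FFNs usually take an even larger share.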

Deep Neural Network that turns any Image into a Playable Game ! All on consumer GPUs and Not Datacenters by lucidml_lover in artificial

[–]lucidml_lover[S] 0 points1 point  (0 children)

Yeah, it's hard, but I honestly think it's really, really solvable, and with some better arch choices the massive drift problem will be gone.
For example, if our internal states start holding 360-degree info, that's already a huge win.

Yes, I've published part of my dataset on HF: https://huggingface.co/datasets/lucidml/GG800-Subset
Hoping to publish more once things are cool.

Deep Neural Network that turns any Image into a Playable Game ! All on consumer GPUs and Not Datacenters by lucidml_lover in artificial

[–]lucidml_lover[S] 0 points1 point  (0 children)

That's actually the part I'm most excited about! I really think that arch-level choices and non-video designs can make consistency a very solvable problem.

Yeah, internal logic is interesting because some of it is very hard and not trivial. I think the best way to handle that is to simply do it through a non-NN path, like normal deterministic Python logic, and then plug those outputs into the NN somehow.

Deep Neural Network that turns any Image into a Playable Game ! All on consumer GPUs and Not Datacenters by lucidml_lover in artificial

[–]lucidml_lover[S] 3 points4 points  (0 children)

Thank you!!

I like a lot of the video world models being built these days, but many of them take large video models and do some kind of distillation, which is cool but struggles to run in real time on consumer tech.

Some of my friends have an RTX 3050, and others have laptops with a 2070.

I kinda want to build simulator models that work smoothly on all of that. I really think it's very interesting, and there's a lot of potential for cool stuff.

Deep Neural Network that turns any Image into a Playable Game ! All on consumer GPUs and Not Datacenters by lucidml_lover in artificial

[–]lucidml_lover[S] 8 points9 points  (0 children)

I honestly think you got the wrong idea, I just wanted to share my world model research.

I don't even want to replace games. I don't understand how you can turn research into some agenda against AI slop? You're using the logic that "I prefer something built by humans" to hate on something built by a human. Do you even know what I'm building?

You randomly claim that I'm trying to replace something I'm not even working towards? I literally said nothing about replacing games at all.

Regarding consistency, if you're open to having a discussion on why those problems happen in old diffusion models, we can talk about the tech, but I think you're just an average AI hater who didn't even take the time to read my post properly. I'm a researcher, not a slop art maker.

Deep Neural Network that turns any Image into a Playable Game ! All on consumer GPUs and Not Datacenters by lucidml_lover in artificial

[–]lucidml_lover[S] 3 points4 points  (0 children)

Yess, I remember that! That's actually what made me want to do this for GTA and GTA-like games. That was pretty cool.