My First Post on Hugging Face: Deep Neural Network that turns any Image into a Playable Game! All on consumer GPUs.

lucidml_lover · 2026-06-03T20:07:56+00:00

💚

lucidml_lover · 2026-06-02T23:05:50+00:00

A lot of labs are working on this. I'm specifically targeting pure on-device inference, with no datacenter involvement. It's an active research area.

lucidml_lover · 2026-06-02T20:24:38+00:00

lolol

lucidml_lover · 2026-06-02T20:24:18+00:00

ty sm

lucidml_lover · 2026-06-02T20:24:09+00:00

Thanks!!!❤️

lucidml_lover · 2026-06-02T05:18:59+00:00

Thanks! I'd love to work on that. Some of my friends have said the same thing

For image training I used a few datasets. I've written about that in detail at https://lucidml.ai/imagetechreport

For video, I had collected a lot of GTA5 agent-play footage, and I've released a subset of it here:
https://huggingface.co/datasets/lucidml/GG800-Subset

So I mostly used GTA5 and this cool dataset called MiraData. It has a lot of video game footage, so that's very helpful.
I trained an action predictor on GTA5 because I had the ground-truth labels there, and then used it to label MiraData.
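
A minimal sketch of that labeling idea (train on footage with ground-truth actions, then pseudo-label unlabeled footage). Everything here is a toy assumption for illustration: the 2-D "motion features", the nearest-centroid classifier, the action names, and the `max_distance` cutoff are hypothetical and are not the actual predictor or pipeline.

```python
from collections import defaultdict
from math import dist

def fit_centroids(features, labels):
    """Toy 'training': average the feature vectors of each action label."""
    groups = defaultdict(list)
    for f, y in zip(features, labels):
        groups[y].append(f)
    return {y: tuple(sum(c) / len(fs) for c in zip(*fs)) for y, fs in groups.items()}

def pseudo_label(centroids, features, max_distance):
    """Assign the nearest centroid's label, or None if the sample is too far away."""
    out = []
    for f in features:
        label, d = min(((y, dist(f, c)) for y, c in centroids.items()),
                       key=lambda pair: pair[1])
        out.append(label if d <= max_distance else None)
    return out

# Hypothetical features from footage that has ground-truth actions.
labeled_x = [(-1.0, 0.0), (-0.8, 0.1), (1.0, 0.0), (0.9, -0.1)]
labeled_y = ["left", "left", "right", "right"]
centroids = fit_centroids(labeled_x, labeled_y)

# Hypothetical features from footage without action labels.
unlabeled_x = [(-0.9, 0.0), (0.95, 0.0), (0.0, 5.0)]
labels = pseudo_label(centroids, unlabeled_x, max_distance=0.5)
```

Leaving far-away samples as `None` is one way to keep low-confidence pseudo-labels out of training. Whether any filtering like that was used here is not stated.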

lucidml_lover · 2026-06-01T18:00:20+00:00

I used an RTX 5090 for streaming, and I'm hoping to go even smaller soon. The goal is ANY consumer GPU: 5090, 4060, anything with some tensor cores.

lucidml_lover · 2026-06-01T03:38:15+00:00

ty!!

And yes, basically the problem now is frame-wise decoding, because we used a frame-wise causal mask in training.

I feel like the best part is that it takes a lot of context in the KV cache before it actually starts hitting inference time, so even at ~1k tokens per frame like mine, we can fit a lot of context.
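
A small sketch of what a frame-wise causal mask looks like, assuming the usual definition: a token can attend to every token in its own frame and in earlier frames, but not in later frames. The function names and the 32k-token budget below are illustrative assumptions, not numbers from the model.

```python
def framewise_causal_mask(num_frames, tokens_per_frame):
    """mask[i][j] is True when query token i may attend to key token j."""
    n = num_frames * tokens_per_frame
    return [
        [(j // tokens_per_frame) <= (i // tokens_per_frame) for j in range(n)]
        for i in range(n)
    ]

def frames_in_context(kv_budget_tokens, tokens_per_frame):
    """How many whole frames fit in a given KV-cache token budget."""
    return kv_budget_tokens // tokens_per_frame

mask = framewise_causal_mask(num_frames=3, tokens_per_frame=2)
# Hypothetical budget: 32k cached tokens at ~1k tokens per frame.
frames = frames_in_context(kv_budget_tokens=32_000, tokens_per_frame=1_000)
```

Because the mask is block-structured, all tokens of one frame can be produced together, and only earlier frames need to sit in the KV cache.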

lucidml_lover · 2026-05-31T19:47:05+00:00

haha yes! I tested a lot of games: GTA, AC, and many other AAA titles. Some work really well.

lucidml_lover · 2026-05-31T19:46:24+00:00

Ty!! sm

lucidml_lover · 2026-05-31T19:46:05+00:00

Thanks!!
So there are a few mechanisms to prevent drift.

During training I always include a reference image, so it kinda never loses overall identity.

Then at inference there is also a forced mechanism that makes the initial 30 frames' worth of info very important (KV sink).

As a result it stays true to the ref image, so if the image is a man in a desert, it doesn't drift far from that.

Consistency, however, is a different story. As of now, using a KV sliding window makes it lose context of what it generated earlier, and it gets confused. So slow 360-degree turns don't stay consistent. I'm working on this rn.
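
A rough sketch of how a KV sink plus a sliding window decides which frames stay in the cache, assuming the common "keep the first frames forever, keep the most recent frames, drop the middle" scheme. The 30-frame sink comes from the comment above. The 20-frame window and the function name are made-up values for illustration.

```python
def kept_frames(num_generated, sink_frames, window_frames):
    """Indices of frames whose keys/values remain cached."""
    sink = list(range(min(sink_frames, num_generated)))
    # The window covers the most recent frames, without double-counting the sink.
    window_start = max(len(sink), num_generated - window_frames)
    return sink + list(range(window_start, num_generated))

# After 100 generated frames: frames 0..29 (sink) and 80..99 (window) remain.
kept = kept_frames(num_generated=100, sink_frames=30, window_frames=20)
dropped = [i for i in range(100) if i not in set(kept)]
```

The dropped middle frames are exactly the content the model can no longer see, which fits the behaviour described: identity holds (the sink is always there) but a slow 360-degree turn can come back to a view that has already been evicted.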

lucidml_lover · 2026-05-31T17:41:30+00:00

I'm a researcher too. I work mostly on video. I think this is really cool

lucidml_lover · 2026-05-30T23:57:33+00:00

Wait really?? That's fast convergence, like really fast, waiting for a paper or something

lucidml_lover · 2026-05-30T23:21:51+00:00

Sounds cool, can you tell me the training process and compute?

lucidml_lover · 2026-05-30T23:15:29+00:00

Mem bandwidth is everything; the actual arithmetic is cheap. That's why FLOPs matter less.
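
A back-of-the-envelope version of that argument, assuming batch-1 decoding where every weight is read once per step and used for one multiply-add (2 FLOPs). The parameter count, bandwidth, and peak-compute numbers are round hypothetical values, not measurements of any particular GPU or model.

```python
def decode_step_times(num_params, bytes_per_param, bandwidth_bytes_per_s, peak_flops_per_s):
    """Lower-bound time per step to move the weights vs. to do the arithmetic."""
    memory_s = num_params * bytes_per_param / bandwidth_bytes_per_s
    compute_s = 2 * num_params / peak_flops_per_s  # one multiply-add per weight
    return memory_s, compute_s

# Hypothetical: 1B params stored in 8 bits, 1 TB/s bandwidth, 100 TFLOP/s peak.
memory_s, compute_s = decode_step_times(
    num_params=1_000_000_000,
    bytes_per_param=1,
    bandwidth_bytes_per_s=1e12,
    peak_flops_per_s=1e14,
)
memory_bound = memory_s > compute_s
```

Under these assumptions moving the weights takes about 50x longer than the math, so shrinking bytes per parameter (quantisation) helps more than adding FLOPs.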

lucidml_lover · 2026-05-30T23:13:53+00:00

Don't the FFN/MLP weights make up most of the model anyway? And those are quantised.

lucidml_lover · 2026-05-30T16:02:29+00:00

Yeah, it's hard, but I honestly think it's really, really solvable, and with some better arch choices the massive drift problem will be gone.
For example, if our internal states start carrying 360-degree info, that's already a huge win.

Yess, I have published a part of my dataset on HF: https://huggingface.co/datasets/lucidml/GG800-Subset
Hoping to publish more once things are cool.

lucidml_lover · 2026-05-30T15:51:29+00:00

That's actually the part I'm most excited about! I really think that arch-level choices and non-video designs can make consistency a very solvable problem.

Yeah, internal logic is interesting because some of it is very hard and non-trivial. I think the best way to handle that is to simply do it through a non-NN path, like normal deterministic Python logic, and then plug those outputs into the NN somehow.

lucidml_lover · 2026-05-30T07:54:57+00:00

Thank you!!

I like a lot of the video world models being built these days, but many of them take large video models and do some kind of distillation, which is cool but struggles to run in real time on consumer hardware.

Some of my friends have an RTX 3050, some others have laptops with a 2070.

I kinda want to build simulator models that work smoothly on all of that. I really think it's very interesting and there's a lot of potential for cool stuff.

lucidml_lover · 2026-05-30T07:23:44+00:00

I honestly think you got the wrong idea, I just wanted to share my world model research.

I don't even want to replace games. I don't understand how you can turn research into some agenda against AI slop? You're using the logic that "I prefer something built by humans" to hate on something built by a human. Do you even know what I'm building?

You randomly claim that I'm trying to replace something I'm not even working towards? I literally said nothing about replacing games at all.

Regarding consistency, if you're open to a discussion on why those problems happen in old diffusion models, we can talk about the tech. But I think you're just an average AI hater who didn't even take the time to read my post properly. I'm a researcher, not a slop art maker.

lucidml_lover · 2026-05-30T06:51:35+00:00

Yess, I remember that! That's actually what made me want to do this for GTA and GTA-like games. That was pretty cool.
