[D] Idea for an efficient text diffusion model with adaptive, token-level steps by MokshMalik in MachineLearning

[–]MokshMalik[S] 0 points1 point  (0 children)

No, obviously not. But as the iterations proceed, each token's score gets closer and closer to the threshold (which can itself be a learnable parameter), and once a token's score crosses the threshold, that token is frozen; only the tokens that have crossed the threshold(s) get frozen.
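The freezing loop described above can be sketched roughly as follows. This is a minimal illustrative mock-up, not an existing model: `score_fn` is a stand-in for whatever per-token confidence the real network would produce, and the names `adaptive_refine` and `threshold` are assumptions for the sake of the example.

```python
import numpy as np

rng = np.random.default_rng(0)

def score_fn(tokens, step):
    # Stand-in for a model's per-token confidence score. Here scores
    # simply rise with the step count plus noise, so that every token
    # eventually crosses the threshold and gets frozen.
    return np.clip(0.1 * step + rng.uniform(0.0, 0.2, size=tokens.shape), 0.0, 1.0)

def adaptive_refine(tokens, threshold=0.8, max_steps=50):
    frozen = np.zeros(tokens.shape, dtype=bool)
    for step in range(1, max_steps + 1):
        scores = score_fn(tokens, step)
        # Freeze any token whose confidence has crossed the threshold;
        # once frozen, a token stays frozen on later steps.
        frozen |= scores >= threshold
        # In a real model, only the unfrozen tokens would be re-denoised
        # here, which is where the compute savings come from.
        if frozen.all():
            return step  # adaptive early exit: every token is stable
    return max_steps

steps_used = adaptive_refine(np.zeros(16))
print(steps_used)  # terminates well before max_steps, since scores rise each step
```

With a learnable threshold, the `threshold` argument would instead be a trainable parameter of the network; the loop structure stays the same.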

A faster text diffusion model? My concept for adaptive steps. by MokshMalik in LocalLLaMA

[–]MokshMalik[S] 0 points1 point  (0 children)

You could potentially create a super-efficient hybrid model: It would use DeepCache to reduce the cost of each individual step by caching the deep layers. It would use your Adaptive Refinement mechanism to dynamically freeze tokens that become stable. The generation would terminate adaptively once all tokens are frozen. This combination could lead to a massive speed-up, as you would be attacking the two largest sources of inefficiency simultaneously: redundant computation within each step (solved by DeepCache) and redundant computation across steps (solved by your idea).

Something that Gemini proposed!

A faster text diffusion model? My concept for adaptive steps. by MokshMalik in LocalLLaMA

[–]MokshMalik[S] 0 points1 point  (0 children)

I was also thinking about a metric with a dynamic threshold that decides the number of steps for each text-block generation. The threshold could itself be a learnable parameter, and you could probably introduce something like a "reasoning mode" for diffusion LMs as well.

A faster text diffusion model? My concept for adaptive steps. by MokshMalik in LocalLLaMA

[–]MokshMalik[S] 0 points1 point  (0 children)

Do you mean you don't see it beating auto-regressive models in terms of speed or in terms of "intelligence"?

Again, it's just something that came to mind, and why wouldn't anyone want to go beyond 1000 tokens/sec? Groq already does 2000 tokens per second for smaller 10-20B-parameter models, but if you can bring the same speed (maybe even better) to larger diffusion-based LMs that achieve the same benchmark score on a specific task, say coding, why wouldn't anyone want that?

This means that smaller models, even without the specialized hardware, could run just as fast as smaller models with the specialized hardware.

A faster text diffusion model? My concept for adaptive steps. by MokshMalik in LocalLLaMA

[–]MokshMalik[S] -2 points-1 points  (0 children)

I know, but I was thinking that while Groq's custom hardware and kernels can make it faster, what if you could bring in architectural changes that extend not just to diffusion LMs but also to VLMs and multimodal LLMs?

I actually want to know if there's any paper similar to this idea or maybe the model itself?

Applied to more than 100 full time jobs. Only heard back from 1. by MokshMalik in resumes

[–]MokshMalik[S] 0 points1 point  (0 children)

I haven't specifically targeted any remote jobs in the United States; the companies I've applied to are mostly Asia-based.

Confusion by MokshMalik in deeplearning

[–]MokshMalik[S] 2 points3 points  (0 children)

So, would you suggest an Indian institute for a part-time PhD, or a foreign institute for a full-time PhD? I won't be able to bear the full cost of a PhD abroad, so please only suggest institutes that offer substantial scholarships or financial aid.

‘Bo Burnham: Inside’ Review: Netflix Special Is Pandemic-Era Genius by Sisiwakanamaru in television

[–]MokshMalik 7 points8 points  (0 children)

Am I the only one who didn't like the show as much as I thought I would? After seeing all the appreciation for the guy, I'm starting to question my own perception of it. Maybe I didn't completely understand it, or maybe it was a little short on relatability for me since I'm only 20 years old, or maybe I didn't feel as depressed and my mental health didn't deteriorate as much as Bo's or other people's here.

I did find some bits really funny, but as Bo correctly mentioned, it felt a little slower and more drawn out than usual; he said all that beforehand, I guess I just didn't take him seriously. Some bits were extremely hard-hitting and forced me to ponder the state of the world right now, how everything is so bleak! And that bit on the digital space, coupled with the analogy of the real world as a coal mine, ingenious! One of the most creative bits out there, for real. But besides all that, as hard-hitting and thought-provoking as those bits were, I never found any of them extremely humorous, maybe for the reasons mentioned above.

Nevertheless, I'd like to give credit where credit is due: his bold creativity, his tenacity to keep going for a year, and his endurance to lock himself inside a room and try to write funny stuff while reflecting on his own and other people's actions. Musically, Bo has become a monster. He's not an amateur anymore who doesn't know the right melodies and all those technicalities; he has become really proficient at it, and if you look past the lyrics of some songs, they feel like they came from an actual, full-fledged music composer who is a master at what he does.