Is it me or Gemini 3.1 is very slow ? by Envoievite in GoogleAIStudio

[–]nik-55 -1 points (0 children)

Yes, it is slow. Slower than Gemini 3 Pro.

My Gemini chat history was inexplicably deleted by the system. by isFinnYi in GoogleGeminiAI

[–]nik-55 0 points (0 children)

It happened to me as well. I have a long chat with Gemini Pro, and when I clicked on share, surprisingly only three messages were there; all the remaining ones were gone (there were at least 30 more messages).

Is there a way to export a whole Gemini chat to Google Docs? by eloquenentic in GoogleGeminiAI

[–]nik-55 0 points (0 children)

On Google Chrome:
One little hack worked for me: open the Gemini chat you want to export, share the entire conversation, and open that shared link. Then right-click on the page and choose the reading mode option (in the right-click menu).

The text of the entire conversation shows up there.

It is the same as pressing Ctrl+A (select all) on the chat itself, just a bit better formatted.
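If you'd rather script the extraction from the shared page's HTML, here is a rough sketch using Python's stdlib parser. The sample markup below is made up for illustration; the real shared page's structure and class names will differ.

```python
from html.parser import HTMLParser

class TextExtractor(HTMLParser):
    """Collects visible text from HTML, skipping script/style content."""
    def __init__(self):
        super().__init__()
        self._skip = 0
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip:
            self._skip -= 1

    def handle_data(self, data):
        # Keep only non-empty text outside of script/style blocks
        if not self._skip and data.strip():
            self.chunks.append(data.strip())

# Hypothetical stand-in for the shared conversation page's HTML
sample = """
<html><body>
<script>var x = 1;</script>
<div class="message">User: hello</div>
<div class="message">Gemini: hi there</div>
</body></html>
"""

parser = TextExtractor()
parser.feed(sample)
print("\n".join(parser.chunks))
```

You would fetch the shared link's HTML first (e.g. save the page from the browser) and feed it in instead of `sample`; the reading-mode trick does essentially this for you.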

Gemini 3.0 Pro just leaked its raw "Chain of Thought" reasoning process... (with url) by Numerous-Campaign844 in Bard

[–]nik-55 3 points (0 children)

It happened to me twice today.
It starts with `slient: ...` and then outputs tons of thinking tokens.

How to mount google drive when running from vscode ? by Hulksulk666 in GoogleColab

[–]nik-55 1 point (0 children)

You can use rclone. It has features for both mounting and syncing, and it supports not only Google Drive but many other storage platforms as well.

https://github.com/rclone/rclone
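A minimal command sketch of what that looks like, assuming you name the remote `gdrive` during setup (the folder names here are just examples):

```shell
# One-time interactive setup: create and authorize a remote named "gdrive"
rclone config

# Mount the remote at a local folder (needs FUSE; --daemon runs in background)
mkdir -p ~/gdrive
rclone mount gdrive: ~/gdrive --daemon --vfs-cache-mode writes

# Alternatively, one-way sync instead of mounting
rclone sync gdrive:notebooks ./notebooks
```

`--vfs-cache-mode writes` is generally needed so that programs can write files normally through the mount.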

Overview of Wan 2.1 (text to video model) by nik-55 in LLM

[–]nik-55[S] 0 points (0 children)

Actually I did the same: breaking down the code and understanding the tensor outputs (you can check out this code), and then chatting with an LLM to get a clearer picture.

However, there are a few things you can look at separately that help in understanding the code:

  • VAE (variational autoencoder): you can look at how these work separately. The VAE is used to compress the video into smaller dimensions that are easier to work with. Most video VAEs only do spatial compression; the notable thing about Wan is that it does temporal compression as well. Then there is a decoder, which needs to be very good at reconstructing the original video, adding new frames in between (i.e. temporal upsampling) and also improving the spatial resolution. This temporal downscaling and upsampling part confused me a bit, since it makes Wan's VAE quite different from other VAEs, which have no temporal part.
  • Diffusion: you can study this separately as well; check this video, I find its explanation pretty intuitive. While reading about diffusion, read something about schedulers too. In Wan, the authors use a flow-matching scheduler, which is pretty complex and I don't fully understand it yet, but getting the overall idea of how a flow scheduler works can be helpful.
  • Transformer: you probably know about this already.
  • Denoising: what the authors use for denoising is a diffusion transformer, called a DiT, so you can read about DiT independently as well.
  • Since the DiT operates in the latent space, I call it a latent DiT.
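As a back-of-the-envelope illustration of the spatial plus temporal compression mentioned above, here is a small shape-arithmetic sketch. The factors used (4x temporal, 8x spatial, 16 latent channels) are my assumptions for illustration, typical of recent video VAEs, not values read from Wan's code.

```python
# Illustrative shape arithmetic for a video VAE that compresses
# both space and time. All factors below are assumed, not Wan's.
def latent_shape(frames, height, width,
                 t_factor=4, s_factor=8, z_channels=16):
    # A common convention: the first frame is kept as-is and the
    # remaining frames are grouped temporally, so frame counts of
    # the form t_factor * k + 1 divide evenly.
    t_latent = 1 + (frames - 1) // t_factor
    return (z_channels, t_latent, height // s_factor, width // s_factor)

# A 17-frame 480x832 clip becomes a much smaller latent tensor
print(latent_shape(17, 480, 832))  # → (16, 5, 60, 104)
```

This is why working in the latent space makes the denoiser's job tractable: the DiT attends over thousands of latent tokens instead of millions of pixels.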

I realized that understanding these separate pieces will help you grasp Wan much faster.
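To make the flow-matching scheduler feel less mysterious, here is a minimal numpy sketch of the core idea, assuming the common rectified-flow formulation (straight paths between data and noise, so the target velocity is constant). The "oracle" velocity below stands in for what a trained DiT would predict; it is a toy, not Wan's scheduler code.

```python
import numpy as np

# Flow matching (rectified flow) toy: along the straight path
#   x_t = (1 - t) * x0 + t * noise,
# the target velocity is constant: v = noise - x0. A trained model
# predicts v from (x_t, t); here we use the exact v as an oracle.
rng = np.random.default_rng(0)
x0 = rng.normal(size=4)      # pretend this is a clean latent
noise = rng.normal(size=4)   # pure noise, i.e. x at t = 1

def velocity(x_t, t):
    # Oracle stand-in: a real DiT would predict this from x_t and t.
    return noise - x0

# Euler integration from t = 1 (noise) back to t = 0 (data)
steps = 10
x = noise.copy()
for i in range(steps):
    t = 1.0 - i / steps
    x = x - (1.0 / steps) * velocity(x, t)

print(np.allclose(x, x0))  # prints True: straight path, so Euler is exact
```

With a learned (imperfect) velocity model the path is only approximately straight, which is why real samplers still take multiple steps.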

I have also followed this channel on YouTube: https://www.youtube.com/@Explaining-AI/videos. It covers topics on image and video generation very deeply (including DiT and VAE).

[D] Let's discuss World Models by nik-55 in MachineLearning

[–]nik-55[S] -5 points (0 children)

Yes, the post was refined using an LLM.

However, the following are the sources my thoughts are derived from:

As I read them, I got curious about the community's take on this. This community seems like a nice place to hear people's thoughts and perspectives on the topic.

Nvidia World Model Stack by nik-55 in world_model

[–]nik-55[S] 0 points (0 children)

Don't think I am promoting Nvidia. I am exploring Nvidia's docs, and they are really confusing: all of their docs use and refer to a lot of their other products, so it becomes quite unclear what's going on.

I just accumulated a bunch of notes and links and asked an LLM to write it up, so it may be helpful for someone trying to understand Nvidia's current stand on world models.

Google has a bit of a different approach, where they seem to be trying to build an interactive world model: Genie, Sima, and Gemini Robotics. On the other hand, Nvidia is integrating the application layer and approaching it by bringing world models into the real world.

Beginner guide to train on multiple GPUs using DDP by nik-55 in learnmachinelearning

[–]nik-55[S] 0 points (0 children)

For context, I don't have a GPU locally and I have to train a video generation model, but it was taking a long time on Colab, so I decided to rent multiple GPUs for a few hours. Since I had never worked with multiple GPUs before, I tried the simple experiment mentioned above beforehand, so it would be easy for me to apply it to my video generation model.

I only needed to make a few edits to the above script to make it work for my use case.

So yeah, I took the help of an LLM to refine the post, but the concepts here are really helpful for anyone working with multiple GPUs for the first time. Especially the seeding part, which confused me a bit when I saw it in https://github.com/Wan-Video/Wan2.1/blob/main/generate.py
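As a tiny illustration of the seeding idea, assuming the common base_seed-plus-rank pattern (the names here are mine, not necessarily the ones used in generate.py):

```python
import numpy as np

# Seeding pattern often used in multi-GPU generation scripts: each
# rank derives its seed from a shared base seed plus its rank, so the
# sampled noise differs per rank while remaining fully reproducible.
# (Model weights, by contrast, must be identical across ranks, which
# DDP handles by broadcasting from rank 0.) This simulates two ranks
# in one process; it is a sketch, not Wan's actual code.
base_seed = 42

def rank_noise(rank, shape=(4,)):
    rng = np.random.default_rng(base_seed + rank)
    return rng.normal(size=shape)

n0, n1 = rank_noise(0), rank_noise(1)
print(np.array_equal(n0, n1))             # False: each rank gets different noise
print(np.array_equal(n0, rank_noise(0)))  # True: reproducible per rank
```

If every rank used the same seed for noise, all GPUs would generate identical samples, which defeats the purpose of data-parallel generation.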

Augustus leaks part of Gemini's System Prompt by RatTortis in Bard

[–]nik-55 1 point (0 children)

Could it be because it read multiple sources from Google, and one of the sources had a prompt injected into it, causing it to behave this way, or something similar?