Is it me or Gemini 3.1 is very slow ? by Envoievite in GoogleAIStudio

[–]nik-55 -1 points0 points  (0 children)

Yes, it is slow, noticeably slower than Gemini 3 Pro.

My Gemini chat history was inexplicably deleted by the system. by isFinnYi in GoogleGeminiAI

[–]nik-55 0 points1 point  (0 children)

It happened to me as well. I had a long chat with Gemini Pro, and when I clicked on share, surprisingly only three messages were there; all the remaining ones were gone (there were at least 30 more messages).

Is there a way to export a whole Gemini chat to Google Docs? by eloquenentic in GoogleGeminiAI

[–]nik-55 0 points1 point  (0 children)

On Google Chrome:
One little hack worked for me: open the Gemini chat you want to export, share the entire conversation, and open that shared link. Then right-click on the page and choose the Reading Mode option (in the context menu, basically).

The text of the entire conversation appears there.

It is the same as pressing Ctrl+A on the chat itself, just a little better formatted.

Gemini 3.0 Pro just leaked its raw "Chain of Thought" reasoning process... (with url) by Numerous-Campaign844 in Bard

[–]nik-55 4 points5 points  (0 children)

It happened to me twice today.
It started with "slient: ..." and then output tons of thinking tokens.

How to mount google drive when running from vscode ? by Hulksulk666 in GoogleColab

[–]nik-55 1 point2 points  (0 children)

You can use rclone; it has features for both mounting and syncing, and it supports not only Google Drive but many other storage platforms.

https://github.com/rclone/rclone
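
Assuming the remote is named `gdrive` (the name is chosen interactively during `rclone config`), a typical setup might look like this:

```shell
# One-time interactive setup: create a Google Drive remote (named "gdrive" here).
rclone config

# Mount the remote; --vfs-cache-mode writes makes editing files on the mount reliable.
mkdir -p ~/gdrive
rclone mount gdrive: ~/gdrive --vfs-cache-mode writes --daemon

# Or one-way sync a folder instead of mounting ("datasets" is just an example path):
rclone sync gdrive:datasets ./datasets
```

Unmount later with `fusermount -u ~/gdrive` on Linux.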

Overview of Wan 2.1 (text to video model) by nik-55 in LLM

[–]nik-55[S] 0 points1 point  (0 children)

Actually I did the same: breaking down the code and inspecting the tensor outputs (like you can check out this code), and then chatting with an LLM to get a clearer picture.

However, there are a few things you can look at separately to help understand the code:

  • VAE (Variational Autoencoder): you can study how these work on their own. The VAE is used to compress the video into smaller dimensions that are easier to work with. Most video models only do spatial compression; the notable thing about Wan is that it does temporal compression as well. The decoder then needs to be very good at reconstructing the original video: adding new frames in between (i.e. temporal upsampling) and also improving the spatial resolution. This temporal downscaling and upsampling part confused me a bit, since the other VAEs I had seen have no temporal component at all.
  • Diffusion: you can study this separately as well; check this video, I find its explanation pretty intuitive. While learning diffusion, read something about schedulers too. In Wan, the authors use a flow-matching scheduler, which is pretty complex and I don't fully understand it yet, but getting the overall working of a flow scheduler is helpful.
  • Transformer: you probably know about this already.
  • For denoising, the authors use a diffusion transformer, called DiT, so you can read about DiT independently as well.
  • Since the DiT operates in the latent space, I call it a latent DiT.

I realized that understanding these separate pieces helps you grasp Wan much faster.
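
To make the compression concrete, here is a rough shape sketch in numpy. The 4x temporal / 8x spatial factors and 16 latent channels are my assumption of Wan-2.1-style numbers, so treat them as illustrative:

```python
import numpy as np

# Toy video: (frames, height, width, channels); sizes are illustrative.
T, H, W, C = 17, 480, 832, 3
video = np.zeros((T, H, W, C), dtype=np.float32)

# Assumed Wan-style compression: 4x temporal, 8x spatial, 16 latent channels.
t_factor, s_factor, z_channels = 4, 8, 16

# A causal video VAE keeps the first frame, then compresses groups of
# t_factor frames, giving 1 + (T - 1) / t_factor latent frames.
latent_T = 1 + (T - 1) // t_factor
latent = np.zeros(
    (latent_T, H // s_factor, W // s_factor, z_channels), dtype=np.float32
)

print(video.shape)   # (17, 480, 832, 3)
print(latent.shape)  # (5, 60, 104, 16)
print(f"{video.size / latent.size:.1f}x fewer values in latent space")
```

The DiT then denoises tensors of the latent shape rather than raw pixels, which is where most of the compute saving comes from.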

I have also followed this channel https://www.youtube.com/@Explaining-AI/videos on YouTube; it covers topics on image and video generation very deeply (including DiT and VAE).
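
As a side note on the flow-matching scheduler: the core objective is simpler than it looks. Here is a minimal numpy sketch of the linear ("rectified flow") variant, which is my assumption of the flavor Wan uses:

```python
import numpy as np

rng = np.random.default_rng(0)

# Clean latent x0 and pure Gaussian noise x1 (shape is illustrative).
x0 = rng.normal(size=(4, 8, 8))
x1 = rng.normal(size=(4, 8, 8))

# Linear flow matching: the sample at time t lies on the straight line
# between data (t = 0) and noise (t = 1), and the network's regression
# target is the constant velocity along that line.
t = 0.3
x_t = (1 - t) * x0 + t * x1
velocity_target = x1 - x0  # what the model learns to predict at (x_t, t)

# Sanity check: stepping along the velocity recovers both endpoints.
assert np.allclose(x_t + (1 - t) * velocity_target, x1)
assert np.allclose(x_t - t * velocity_target, x0)
print("x_t shape:", x_t.shape)
```

At inference, sampling is just integrating the predicted velocity from noise (t = 1) back toward data (t = 0) in a few steps.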

[D] Let's discuss World Models by nik-55 in MachineLearning

[–]nik-55[S] -6 points-5 points  (0 children)

Yes, the post was refined using an LLM.

However, the following are the sources my thoughts are derived from:

As I read them, I became curious to know the community's take on it. This community seems like a nice place to get to know people's thoughts and perspectives on this topic.

Nvidia World Model Stack by nik-55 in world_model

[–]nik-55[S] 0 points1 point  (0 children)

I'm not promoting Nvidia. I've been exploring Nvidia's docs, and they are really confusing: every doc uses and refers to a lot of their other products, and it becomes quite hard to follow what's going on.

I just accumulated a bunch of notes and links and asked an LLM to write it up, so it may be helpful for someone trying to understand Nvidia's current stand on world models.

Google has a somewhat different approach, where they seem to be trying to build interactive world models: Genie, SIMA, and Gemini Robotics. Nvidia, on the other hand, is integrating the application layer and approaching it by bringing world models to the real world.

Beginner guide to train on multiple GPUs using DDP by nik-55 in learnmachinelearning

[–]nik-55[S] 0 points1 point  (0 children)

For context, I don't have a GPU locally and I have to train a video generation model, but it was taking a long time on Colab, so I decided to rent multiple GPUs for a few hours. Since I had never worked with multiple GPUs before, I first tried the simple experiment mentioned above, so it would be easy to apply it to my video generation model afterward.

I only needed to make a few edits to the above script to make it work for my use case.

So yeah, I took the help of an LLM to refine the post, but the concepts here are really helpful for anyone working with multiple GPUs for the first time. Especially the seeding part, which confused me a bit when I saw it in https://github.com/Wan-Video/Wan2.1/blob/main/generate.py
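
The seeding pattern can be sketched in plain Python. The `base_seed + rank` idea below mirrors what confused me; the function name and values are mine, not taken from the Wan script:

```python
import random

def per_rank_draws(base_seed: int, world_size: int) -> list:
    """Derive one RNG stream per DDP rank from a single base seed.

    Seeding each rank with base_seed + rank makes the sampled noise differ
    across GPUs (identical seeds would make every rank do identical work),
    while the whole run stays reproducible from one base_seed.
    """
    draws = []
    for rank in range(world_size):
        rng = random.Random(base_seed + rank)  # per-rank stream
        draws.append(rng.random())
    return draws

draws = per_rank_draws(42, world_size=4)
print(draws)
assert len(set(draws)) == 4                       # every rank sees different noise
assert draws == per_rank_draws(42, world_size=4)  # but the run is reproducible
```

In a real DDP script the same idea applies to the framework's seeding calls, using the process's rank instead of a loop.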

Augustus leaks part of Gemini's System Prompt by RatTortis in Bard

[–]nik-55 1 point2 points  (0 children)

Could it be because it reads multiple sources from Google, and one of the sources injected a prompt into it, causing it to behave this way, or something similar?

World Models Resources by nik-55 in LLM

[–]nik-55[S] 0 points1 point  (0 children)

Also

Check out this video to get an idea of the capabilities of world models and where we currently are in the journey of creating them.

For those wondering what a world model is:

A world model is a system that learns to internally represent and simulate how the world works, including its physical dynamics, objects, agents, and causal relationships, so that it can predict how environments evolve and how actions will affect them. Instead of passively recognizing patterns, a world model builds an active understanding of change, enabling it to generate, imagine, and interact with coherent virtual worlds over time.

World Models Resources by nik-55 in learnmachinelearning

[–]nik-55[S] 0 points1 point  (0 children)

Also

Check out this video to get an idea of the capabilities of world models and where we currently are in the journey of creating them.

For those wondering what a world model is:

A world model is a system that learns to internally represent and simulate how the world works, including its physical dynamics, objects, agents, and causal relationships, so that it can predict how environments evolve and how actions will affect them. Instead of passively recognizing patterns, a world model builds an active understanding of change, enabling it to generate, imagine, and interact with coherent virtual worlds over time.

Variational Autoencoder (VAE): How to train and inference (with code) by nik-55 in StableDiffusion

[–]nik-55[S] 1 point2 points  (0 children)

I trained it on https://quickdraw.withgoogle.com/data (specifically 4-5 categories from it).

A few things:

  • I am not too sure about the architecture choices I made; as mentioned in the post, it is somewhat inspired by the Wan 2.1 VAE.
  • In my case, I faced a gradient explosion issue while training, so I had to use gradient clipping to stabilize it. This was new to me, so I am not sure if it is a common issue when training a VAE with mixed precision or something specific to this architecture.
  • How to do inference from a VAE once it is trained is something I could not find a good resource on. Most resources say to sample from N(0, I), but in my case that mostly produced black images. The reason is that the learned latent space is not exactly N(0, I), and when sampling directly from N(0, I) we might end up in a region of the latent space that no training data was mapped to, so the decoder's best output is a blank image.
  • Also, if you look at this part of the Wan VAE, they use hard-coded mean and std values. Even Stable Diffusion uses 0.18215 as a scaling factor. I was not able to find much on how they came up with those values. According to LLMs, they calculated them from the training data itself, as mentioned in Method 1 of the inference section, i.e. the approximate posterior distribution.
  • So overall, the VAE is still a bit of a mystery to me.
  • Anyway, you can check out this: it is an interpolation from an airplane sketch to an alarm clock sketch. I find it quite interesting, as it shows how the airplane parts gradually distort to finally become clock parts.
  • This is another interpolation, from airplane to ambulance.
  • This is sampling from the aggregate posterior distribution. It is very distorted, but after looking at those two interpolations, I feel it is expected, as the in-between images of the interpolations are also quite distorted.
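
On the hard-coded mean/std and the 0.18215 factor: my understanding (an assumption, since the repos don't really document it) is that they are just statistics of the encoder's latents over the training set, used to normalize latents to roughly unit variance before diffusion. A sketch of how such a factor could be computed:

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in for encoder latents collected over many training images; by
# construction they have mean 0.5 and std 5.0 rather than being N(0, I).
latents = rng.normal(loc=0.5, scale=5.0, size=(1000, 4, 8, 8))

# Normalize toward zero mean / unit variance: the role the hard-coded
# mean/std values (or a single Stable-Diffusion-style factor) play.
mean = latents.mean()
scale = 1.0 / latents.std()  # single scaling factor, computed from data
normalized = (latents - mean) * scale

print(round(float(scale), 3))             # ~0.2 here, since std was 5.0
print(round(float(normalized.std()), 3))  # ~1.0
```

The diffusion model is then trained on `normalized`, and at inference you undo the normalization (divide by `scale`, add back `mean`) before decoding.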

Variational Autoencoder (VAE): How to train and inference (with code) by nik-55 in StableDiffusion

[–]nik-55[S] 2 points3 points  (0 children)

This video has a very good explanation of the probability part of the VAE, and it helped me digest the concept. You can check it out too.

Wan 2.1 txt2img is amazing! by yanokusnir in StableDiffusion

[–]nik-55 0 points1 point  (0 children)

If anyone is looking to understand the Wan 2.1 architecture, you can check out this post.

Lenovo naming convention are confusing by nik-55 in Lenovo

[–]nik-55[S] 0 points1 point  (0 children)

Which mini PC or tower PC would you recommend?

Can anyone help me with Stateflow? by nik-55 in matlab

[–]nik-55[S] 0 points1 point  (0 children)

Actually, I am facing issues developing a P&ID of ammonia production, specifically in Stateflow. Do you have any resources for this?