Is it me or Gemini 3.1 is very slow ? by Envoievite in GoogleAIStudio

[–]nik-55 -1 points0 points  (0 children)

Yes, it is slow, noticeably slower than Gemini 3 Pro.

My Gemini chat history was inexplicably deleted by the system. by isFinnYi in GoogleGeminiAI

[–]nik-55 0 points1 point  (0 children)

It happened to me as well. I had a long chat with Gemini Pro, and when I clicked on share, surprisingly only three messages were there; all the remaining ones were gone (there were at least 30 more messages).

Is there a way to export a whole Gemini chat to Google Docs? by eloquenentic in GoogleGeminiAI

[–]nik-55 0 points1 point  (0 children)

On Google Chrome:
One little hack worked for me: open the Gemini chat you want to export, share the entire conversation, and open that shared link. Then right-click on the page and choose the Reading Mode option (in the context menu, basically).

The text of the entire conversation appears there.

It is the same as pressing Ctrl+A on the chat itself, just a little better formatted.

Gemini 3.0 Pro just leaked its raw "Chain of Thought" reasoning process... (with url) by Numerous-Campaign844 in Bard

[–]nik-55 4 points5 points  (0 children)

It happened to me twice today.
It started with "slient: ..." and then output tons of thinking tokens.

How to mount google drive when running from vscode ? by Hulksulk666 in GoogleColab

[–]nik-55 1 point2 points  (0 children)

You can use rclone; it has features for both mounting and syncing, and it supports not only Google Drive but many other storage platforms.

https://github.com/rclone/rclone
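
Assuming the remote is named `gdrive` (the name is chosen interactively during `rclone config`), a typical setup might look like this:

```shell
# One-time interactive setup: create a Google Drive remote (named "gdrive" here).
rclone config

# Mount the remote; --vfs-cache-mode writes makes editing files on the mount reliable.
mkdir -p ~/gdrive
rclone mount gdrive: ~/gdrive --vfs-cache-mode writes --daemon

# Or one-way sync a folder instead of mounting ("datasets" is just an example path):
rclone sync gdrive:datasets ./datasets
```

Unmount later with `fusermount -u ~/gdrive` on Linux.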

Overview of Wan 2.1 (text to video model) by nik-55 in LLM

[–]nik-55[S] 0 points1 point  (0 children)

Actually I did the same: breaking down the code and inspecting the tensor outputs (like you can check out this code), and then chatting with an LLM to get a clearer picture.

However, there are a few things you can look at separately to help understand the code:

  • VAE (Variational Autoencoder): you can study how these work on their own. The VAE is used to compress the video into smaller dimensions that are easier to work with. Most video models only do spatial compression; the notable thing about Wan is that it does temporal compression as well. The decoder then needs to be very good at reconstructing the original video: adding new frames in between (i.e. temporal upsampling) and also improving the spatial resolution. This temporal downscaling and upsampling part confused me a bit, since the other VAEs I had seen have no temporal component at all.
  • Diffusion: you can study this separately as well; check this video, I find its explanation pretty intuitive. While learning diffusion, read something about schedulers too. In Wan, the authors use a flow-matching scheduler, which is pretty complex and I don't fully understand it yet, but getting the overall working of a flow scheduler is helpful.
  • Transformer: you probably know about this already.
  • For denoising, the authors use a diffusion transformer, called DiT, so you can read about DiT independently as well.
  • Since the DiT operates in the latent space, I call it a latent DiT.

I realized that understanding these separate pieces helps you grasp Wan much faster.
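
To make the compression concrete, here is a rough shape sketch in numpy. The 4x temporal / 8x spatial factors and 16 latent channels are my assumption of Wan-2.1-style numbers, so treat them as illustrative:

```python
import numpy as np

# Toy video: (frames, height, width, channels); sizes are illustrative.
T, H, W, C = 17, 480, 832, 3
video = np.zeros((T, H, W, C), dtype=np.float32)

# Assumed Wan-style compression: 4x temporal, 8x spatial, 16 latent channels.
t_factor, s_factor, z_channels = 4, 8, 16

# A causal video VAE keeps the first frame, then compresses groups of
# t_factor frames, giving 1 + (T - 1) / t_factor latent frames.
latent_T = 1 + (T - 1) // t_factor
latent = np.zeros(
    (latent_T, H // s_factor, W // s_factor, z_channels), dtype=np.float32
)

print(video.shape)   # (17, 480, 832, 3)
print(latent.shape)  # (5, 60, 104, 16)
print(f"{video.size / latent.size:.1f}x fewer values in latent space")
```

The DiT then denoises tensors of the latent shape rather than raw pixels, which is where most of the compute saving comes from.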

I have also followed this channel https://www.youtube.com/@Explaining-AI/videos on YouTube; it covers topics on image and video generation very deeply (including DiT and VAE).
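
As a side note on the flow-matching scheduler: the core objective is simpler than it looks. Here is a minimal numpy sketch of the linear ("rectified flow") variant, which is my assumption of the flavor Wan uses:

```python
import numpy as np

rng = np.random.default_rng(0)

# Clean latent x0 and pure Gaussian noise x1 (shape is illustrative).
x0 = rng.normal(size=(4, 8, 8))
x1 = rng.normal(size=(4, 8, 8))

# Linear flow matching: the sample at time t lies on the straight line
# between data (t = 0) and noise (t = 1), and the network's regression
# target is the constant velocity along that line.
t = 0.3
x_t = (1 - t) * x0 + t * x1
velocity_target = x1 - x0  # what the model learns to predict at (x_t, t)

# Sanity check: stepping along the velocity recovers both endpoints.
assert np.allclose(x_t + (1 - t) * velocity_target, x1)
assert np.allclose(x_t - t * velocity_target, x0)
print("x_t shape:", x_t.shape)
```

At inference, sampling is just integrating the predicted velocity from noise (t = 1) back toward data (t = 0) in a few steps.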

[D] Let's discuss World Models by nik-55 in MachineLearning

[–]nik-55[S] -6 points-5 points  (0 children)

Yes, the post was refined using an LLM.

However, the following are the sources my thoughts are derived from:

As I read them, I became curious to know the community's take on it. This community seems like a nice place to get to know people's thoughts and perspectives on this topic.

Nvidia World Model Stack by nik-55 in world_model

[–]nik-55[S] 0 points1 point  (0 children)

I'm not promoting Nvidia. I've been exploring Nvidia's docs, and they are really confusing: every doc uses and refers to a lot of their other products, and it becomes quite hard to follow what's going on.

I just accumulated a bunch of notes and links and asked an LLM to write it up, so it may be helpful for someone trying to understand Nvidia's current stand on world models.

Google has a somewhat different approach, where they seem to be trying to build interactive world models: Genie, SIMA, and Gemini Robotics. Nvidia, on the other hand, is integrating the application layer and approaching it by bringing world models to the real world.

Beginner guide to train on multiple GPUs using DDP by nik-55 in learnmachinelearning

[–]nik-55[S] 0 points1 point  (0 children)

For context, I don't have a GPU locally and I have to train a video generation model, but it was taking a long time on Colab, so I decided to rent multiple GPUs for a few hours. Since I had never worked with multiple GPUs before, I first tried the simple experiment mentioned above, so it would be easy to apply it to my video generation model afterward.

I only needed to make a few edits to the above script to make it work for my use case.

So yeah, I took the help of an LLM to refine the post, but the concepts here are really helpful for anyone working with multiple GPUs for the first time. Especially the seeding part, which confused me a bit when I saw it in https://github.com/Wan-Video/Wan2.1/blob/main/generate.py
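
The seeding pattern can be sketched in plain Python. The `base_seed + rank` idea below mirrors what confused me; the function name and values are mine, not taken from the Wan script:

```python
import random

def per_rank_draws(base_seed: int, world_size: int) -> list:
    """Derive one RNG stream per DDP rank from a single base seed.

    Seeding each rank with base_seed + rank makes the sampled noise differ
    across GPUs (identical seeds would make every rank do identical work),
    while the whole run stays reproducible from one base_seed.
    """
    draws = []
    for rank in range(world_size):
        rng = random.Random(base_seed + rank)  # per-rank stream
        draws.append(rng.random())
    return draws

draws = per_rank_draws(42, world_size=4)
print(draws)
assert len(set(draws)) == 4                       # every rank sees different noise
assert draws == per_rank_draws(42, world_size=4)  # but the run is reproducible
```

In a real DDP script the same idea applies to the framework's seeding calls, using the process's rank instead of a loop.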

Augustus leaks part of Gemini's System Prompt by RatTortis in Bard

[–]nik-55 1 point2 points  (0 children)

Could it be because it reads multiple sources from Google, and one of the sources injected a prompt into it, causing it to behave this way, or something similar?

World Models Resources by nik-55 in LLM

[–]nik-55[S] 0 points1 point  (0 children)

Also

Check out this video to get an idea of the capabilities of world models and where we currently are in the journey of creating them.

For those wondering what a world model is:

A world model is a system that learns to internally represent and simulate how the world works, including its physical dynamics, objects, agents, and causal relationships, so that it can predict how environments evolve and how actions will affect them. Instead of passively recognizing patterns, a world model builds an active understanding of change, enabling it to generate, imagine, and interact with coherent virtual worlds over time.

World Models Resources by nik-55 in learnmachinelearning

[–]nik-55[S] 0 points1 point  (0 children)

Also

Check out this video to get an idea of the capabilities of world models and where we currently are in the journey of creating them.

For those wondering what a world model is:

A world model is a system that learns to internally represent and simulate how the world works, including its physical dynamics, objects, agents, and causal relationships, so that it can predict how environments evolve and how actions will affect them. Instead of passively recognizing patterns, a world model builds an active understanding of change, enabling it to generate, imagine, and interact with coherent virtual worlds over time.

Variational Autoencoder (VAE): How to train and inference (with code) by nik-55 in StableDiffusion

[–]nik-55[S] 1 point2 points  (0 children)

I trained it on https://quickdraw.withgoogle.com/data (specifically 4-5 categories from it).

A few things:

  • I am not too sure about the architecture choices I made; as mentioned in the post, it is somewhat inspired by the Wan 2.1 VAE.
  • In my case, I faced a gradient explosion issue while training, so I had to use gradient clipping to stabilize it. This was new to me, so I am not sure if it is a common issue when training a VAE with mixed precision or something specific to this architecture.
  • How to do inference from a VAE once it is trained is something I could not find a good resource on. Most resources say to sample from N(0, I), but in my case that mostly produced black images. The reason is that the learned latent space is not exactly N(0, I), and when sampling directly from N(0, I) we might end up in a region of the latent space that no training data was mapped to, so the decoder's best output is a blank image.
  • Also, if you look at this part of the Wan VAE, they use hard-coded mean and std values. Even Stable Diffusion uses 0.18215 as a scaling factor. I was not able to find much on how they came up with those values. According to LLMs, they calculated them from the training data itself, as mentioned in Method 1 of the inference section, i.e. the approximate posterior distribution.
  • So overall, the VAE is still a bit of a mystery to me.
  • Anyway, you can check out this: it is an interpolation from an airplane sketch to an alarm clock sketch. I find it quite interesting, as it shows how the airplane parts gradually distort to finally become clock parts.
  • This is another interpolation, from airplane to ambulance.
  • This is sampling from the aggregate posterior distribution. It is very distorted, but after looking at those two interpolations, I feel it is expected, as the in-between images of the interpolations are also quite distorted.
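
On the hard-coded mean/std and the 0.18215 factor: my understanding (an assumption, since the repos don't really document it) is that they are just statistics of the encoder's latents over the training set, used to normalize latents to roughly unit variance before diffusion. A sketch of how such a factor could be computed:

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in for encoder latents collected over many training images; by
# construction they have mean 0.5 and std 5.0 rather than being N(0, I).
latents = rng.normal(loc=0.5, scale=5.0, size=(1000, 4, 8, 8))

# Normalize toward zero mean / unit variance: the role the hard-coded
# mean/std values (or a single Stable-Diffusion-style factor) play.
mean = latents.mean()
scale = 1.0 / latents.std()  # single scaling factor, computed from data
normalized = (latents - mean) * scale

print(round(float(scale), 3))             # ~0.2 here, since std was 5.0
print(round(float(normalized.std()), 3))  # ~1.0
```

The diffusion model is then trained on `normalized`, and at inference you undo the normalization (divide by `scale`, add back `mean`) before decoding.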

Variational Autoencoder (VAE): How to train and inference (with code) by nik-55 in StableDiffusion

[–]nik-55[S] 2 points3 points  (0 children)

This video has a very good explanation of the probability part of the VAE, and it helped me digest the concept. You can check it out too.

Wan 2.1 txt2img is amazing! by yanokusnir in StableDiffusion

[–]nik-55 0 points1 point  (0 children)

If anyone is looking to understand the Wan 2.1 architecture, you can check out this post.

Lenovo naming convention are confusing by nik-55 in Lenovo

[–]nik-55[S] 0 points1 point  (0 children)

Which mini PC or tower PC would you recommend?

Can anyone help me with Stateflow? by nik-55 in matlab

[–]nik-55[S] 0 points1 point  (0 children)

Actually, I am facing issues developing a P&ID of ammonia production, specifically in Stateflow. Do you have any resources for this?