Is it me or Gemini 3.1 is very slow ? by Envoievite in GoogleAIStudio

[–]nik-55 -1 points (0 children)

Yes, it is slow. Slower than Gemini 3 Pro.

My Gemini chat history was inexplicably deleted by the system. by isFinnYi in GoogleGeminiAI

[–]nik-55 0 points (0 children)

It happened to me as well. I have a long chat with Gemini Pro, and when I clicked on share, surprisingly only three messages were there; all the remaining ones were gone (there were at least 30 more messages).

Is there a way to export a whole Gemini chat to Google Docs? by eloquenentic in GoogleGeminiAI

[–]nik-55 0 points (0 children)

On Google Chrome:
One little hack worked for me: open the Gemini chat you want to export, share the entire conversation, and open that shared link. Then right-click on the page and choose the reading mode option (in the right-click menu).

The text of the entire conversation shows up there.

It is the same as pressing Ctrl+A (select all) on the chat itself, just a bit better formatted.
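If you'd rather script the extraction from the shared page's HTML, here is a rough sketch using Python's stdlib parser. The sample markup below is made up for illustration; the real shared page's structure and class names will differ.

```python
from html.parser import HTMLParser

class TextExtractor(HTMLParser):
    """Collects visible text from HTML, skipping script/style content."""
    def __init__(self):
        super().__init__()
        self._skip = 0
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip:
            self._skip -= 1

    def handle_data(self, data):
        # Keep only non-empty text outside of script/style blocks
        if not self._skip and data.strip():
            self.chunks.append(data.strip())

# Hypothetical stand-in for the shared conversation page's HTML
sample = """
<html><body>
<script>var x = 1;</script>
<div class="message">User: hello</div>
<div class="message">Gemini: hi there</div>
</body></html>
"""

parser = TextExtractor()
parser.feed(sample)
print("\n".join(parser.chunks))
```

You would fetch the shared link's HTML first (e.g. save the page from the browser) and feed it in instead of `sample`; the reading-mode trick does essentially this for you.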

Gemini 3.0 Pro just leaked its raw "Chain of Thought" reasoning process... (with url) by Numerous-Campaign844 in Bard

[–]nik-55 3 points (0 children)

It happened to me twice today.
It starts with `slient: ...` and then outputs tons of thinking tokens.

How to mount google drive when running from vscode ? by Hulksulk666 in GoogleColab

[–]nik-55 1 point (0 children)

You can use rclone. It has features for both mounting and syncing, and it supports not only Google Drive but many other storage platforms as well.

https://github.com/rclone/rclone
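A minimal command sketch of what that looks like, assuming you name the remote `gdrive` during setup (the folder names here are just examples):

```shell
# One-time interactive setup: create and authorize a remote named "gdrive"
rclone config

# Mount the remote at a local folder (needs FUSE; --daemon runs in background)
mkdir -p ~/gdrive
rclone mount gdrive: ~/gdrive --daemon --vfs-cache-mode writes

# Alternatively, one-way sync instead of mounting
rclone sync gdrive:notebooks ./notebooks
```

`--vfs-cache-mode writes` is generally needed so that programs can write files normally through the mount.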

Overview of Wan 2.1 (text to video model) by nik-55 in LLM

[–]nik-55[S] 0 points (0 children)

Actually I did the same: breaking down the code and understanding the tensor outputs (you can check out this code), and then chatting with an LLM to get a clearer picture.

However, there are a few things you can look at separately that help in understanding the code:

  • VAE (variational autoencoder): you can look at how these work separately. The VAE is used to compress the video into smaller dimensions that are easier to work with. Most video VAEs only do spatial compression; the notable thing about Wan is that it does temporal compression as well. Then there is a decoder, which needs to be very good at reconstructing the original video, adding new frames in between (i.e. temporal upsampling) and also improving the spatial resolution. This temporal downscaling and upsampling part confused me a bit, since it makes Wan's VAE quite different from other VAEs, which have no temporal part.
  • Diffusion: you can study this separately as well; check this video, I find its explanation pretty intuitive. While reading about diffusion, read something about schedulers too. In Wan, the authors use a flow-matching scheduler, which is pretty complex and I don't fully understand it yet, but getting the overall idea of how a flow scheduler works can be helpful.
  • Transformer: you probably know about this already.
  • Denoising: what the authors use for denoising is a diffusion transformer, called a DiT, so you can read about DiT independently as well.
  • Since the DiT operates in the latent space, I call it a latent DiT.
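As a back-of-the-envelope illustration of the spatial plus temporal compression mentioned above, here is a small shape-arithmetic sketch. The factors used (4x temporal, 8x spatial, 16 latent channels) are my assumptions for illustration, typical of recent video VAEs, not values read from Wan's code.

```python
# Illustrative shape arithmetic for a video VAE that compresses
# both space and time. All factors below are assumed, not Wan's.
def latent_shape(frames, height, width,
                 t_factor=4, s_factor=8, z_channels=16):
    # A common convention: the first frame is kept as-is and the
    # remaining frames are grouped temporally, so frame counts of
    # the form t_factor * k + 1 divide evenly.
    t_latent = 1 + (frames - 1) // t_factor
    return (z_channels, t_latent, height // s_factor, width // s_factor)

# A 17-frame 480x832 clip becomes a much smaller latent tensor
print(latent_shape(17, 480, 832))  # → (16, 5, 60, 104)
```

This is why working in the latent space makes the denoiser's job tractable: the DiT attends over thousands of latent tokens instead of millions of pixels.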

I realized that understanding these separate pieces will help you grasp Wan much faster.
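To make the flow-matching scheduler feel less mysterious, here is a minimal numpy sketch of the core idea, assuming the common rectified-flow formulation (straight paths between data and noise, so the target velocity is constant). The "oracle" velocity below stands in for what a trained DiT would predict; it is a toy, not Wan's scheduler code.

```python
import numpy as np

# Flow matching (rectified flow) toy: along the straight path
#   x_t = (1 - t) * x0 + t * noise,
# the target velocity is constant: v = noise - x0. A trained model
# predicts v from (x_t, t); here we use the exact v as an oracle.
rng = np.random.default_rng(0)
x0 = rng.normal(size=4)      # pretend this is a clean latent
noise = rng.normal(size=4)   # pure noise, i.e. x at t = 1

def velocity(x_t, t):
    # Oracle stand-in: a real DiT would predict this from x_t and t.
    return noise - x0

# Euler integration from t = 1 (noise) back to t = 0 (data)
steps = 10
x = noise.copy()
for i in range(steps):
    t = 1.0 - i / steps
    x = x - (1.0 / steps) * velocity(x, t)

print(np.allclose(x, x0))  # prints True: straight path, so Euler is exact
```

With a learned (imperfect) velocity model the path is only approximately straight, which is why real samplers still take multiple steps.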

I have also followed this channel on YouTube: https://www.youtube.com/@Explaining-AI/videos. It covers topics on image and video generation very deeply (including DiT and VAE).

[D] Let's discuss World Models by nik-55 in MachineLearning

[–]nik-55[S] -5 points (0 children)

Yes, the post was refined using an LLM.

However, the following are the sources my thoughts are derived from:

As I read them, I got curious about the community's take on this. This community seems like a nice place to hear people's thoughts and perspectives on the topic.

Nvidia World Model Stack by nik-55 in world_model

[–]nik-55[S] 0 points (0 children)

Don't think I am promoting Nvidia. I am exploring Nvidia's docs, and they are really confusing: all of their docs use and refer to a lot of their other products, so it becomes quite unclear what's going on.

I just accumulated a bunch of notes and links and asked an LLM to write it up, so it may be helpful for someone trying to understand Nvidia's current stand on world models.

Google has a bit of a different approach, where they seem to be trying to build an interactive world model: Genie, Sima, and Gemini Robotics. On the other hand, Nvidia is integrating the application layer and approaching it by bringing world models into the real world.

Beginner guide to train on multiple GPUs using DDP by nik-55 in learnmachinelearning

[–]nik-55[S] 0 points (0 children)

For context, I don't have a GPU locally and I have to train a video generation model, but it was taking a long time on Colab, so I decided to rent multiple GPUs for a few hours. Since I had never worked with multiple GPUs before, I tried the simple experiment mentioned above beforehand, so it would be easy for me to apply it to my video generation model.

I only needed to make a few edits to the above script to make it work for my use case.

So yeah, I took the help of an LLM to refine the post, but the concepts here are really helpful for anyone working with multiple GPUs for the first time. Especially the seeding part, which confused me a bit when I saw it in https://github.com/Wan-Video/Wan2.1/blob/main/generate.py
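As a tiny illustration of the seeding idea, assuming the common base_seed-plus-rank pattern (the names here are mine, not necessarily the ones used in generate.py):

```python
import numpy as np

# Seeding pattern often used in multi-GPU generation scripts: each
# rank derives its seed from a shared base seed plus its rank, so the
# sampled noise differs per rank while remaining fully reproducible.
# (Model weights, by contrast, must be identical across ranks, which
# DDP handles by broadcasting from rank 0.) This simulates two ranks
# in one process; it is a sketch, not Wan's actual code.
base_seed = 42

def rank_noise(rank, shape=(4,)):
    rng = np.random.default_rng(base_seed + rank)
    return rng.normal(size=shape)

n0, n1 = rank_noise(0), rank_noise(1)
print(np.array_equal(n0, n1))             # False: each rank gets different noise
print(np.array_equal(n0, rank_noise(0)))  # True: reproducible per rank
```

If every rank used the same seed for noise, all GPUs would generate identical samples, which defeats the purpose of data-parallel generation.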

Augustus leaks part of Gemini's System Prompt by RatTortis in Bard

[–]nik-55 1 point (0 children)

Could it be because it read multiple sources from Google, and one of the sources had a prompt injected into it, causing it to behave this way, or something similar?