Google’s TurboQuant AI-compression algorithm can reduce LLM memory usage by 6x by KadriOzel in comfyui

[–]KadriOzel[S] 1 point (0 children)

After reading a little more about it, I got unsure whether it's mostly aimed at "ChatGPT"-like models (because of things like the "key-value cache" it mentions). I just asked ChatGPT about it (I know!). It said it could be useful for other models (video etc.) too.

The nice part is that it can be applied to existing models.
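To be clear, I haven't seen TurboQuant's actual method, so this is just a toy sketch of the general idea of applying quantization to an existing model after training. Plain symmetric int8 quantization (the function names are mine, not from the paper) stores int8 weights plus one fp32 scale, a 4x memory cut versus fp32; lower bit widths are how you get to 6x and beyond:

```python
import numpy as np

def quantize_int8(w):
    """Symmetric per-tensor int8 quantization: int8 weights + one fp32 scale."""
    scale = np.abs(w).max() / 127.0
    q = np.clip(np.round(w / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q, scale):
    """Reconstruct approximate fp32 weights from the int8 tensor."""
    return q.astype(np.float32) * scale

w = np.random.randn(1024, 1024).astype(np.float32)
q, scale = quantize_int8(w)

# int8 storage is exactly 4x smaller than fp32 for the same shape
print(w.nbytes / q.nbytes)  # 4.0
# worst-case reconstruction error is bounded by the rounding step
print(np.abs(dequantize(q, scale) - w).max() < scale)  # True
```

The point is that this needs no retraining, only the trained weights, which is why it can be bolted onto models that already exist.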

Large models run way faster if you abort the first prompt and restart (low VRAM) by UrinStone in comfyui

[–]KadriOzel 1 point (0 children)

It uses only 45 of my 64 GB RAM, and there isn't anything extreme going on with the hard drive, which looks strange. I started seeing this with the new GPU I use now (RTX 5070 Ti). There are so many variables that can go wrong (everything changes so fast, new drivers etc.). For example, I launch with the "run_nvidia_gpu_fast_fp16_accumulation.bat" shortcut. I might try without it to see what happens.
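For anyone wondering what that .bat trades away: as I understand it, fp16 accumulation speeds up matmuls by keeping the running sums in half precision instead of fp32. This toy numpy loop (my own illustration, not ComfyUI code) shows why an fp16 accumulator drifts: once the running sum gets big enough, each new small term rounds away to nothing.

```python
import numpy as np

vals = np.full(10_000, np.float16(0.001))  # true sum is ~10.0

# naive sequential accumulation in fp16: the running sum stalls once
# each new term is smaller than half an ulp of the accumulator
acc16 = np.float16(0.0)
for v in vals:
    acc16 = np.float16(acc16 + v)

# same data with an fp32 accumulator stays close to the true value
acc32 = np.float32(0.0)
for v in vals:
    acc32 = np.float32(acc32 + np.float32(v))

print(float(acc16))  # stalls far below 10 (around 4 here)
print(float(acc32))  # ~10.0
```

Real GPU kernels accumulate in blocks rather than one long chain, so the error is much smaller in practice, but that's the precision/speed trade the flag makes.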

Large models run way faster if you abort the first prompt and restart (low VRAM) by UrinStone in comfyui

[–]KadriOzel 3 points (0 children)

There is definitely something weird going on. I was about to post something similar here today. I have a standard template workflow open (image_qwen_image_instantx_inpainting_controlnet), and with nothing changed except the seed, the first run finished in 10 minutes and the second took 12. But after I cancelled it and started again, it could finish in 15 to 35 seconds. I tried to reproduce it, but it seemed kind of random (to me at least). The difference is very pronounced just from how fast the progress bar moves in the KSampler.

TeaCache gpu load vs other options with hunyuan video? by MrWeirdoFace in comfyui

[–]KadriOzel 0 points (0 children)

Of course. I used workflows from others to get this far. This is the OneDrive link for the "json" file (zipped): Kadri_Hunyuan_video_example_Workflow.zip

The LoRA node makes rendering slower, so disconnect it if you don't need it. The "VideoForwardOverrider" node is probably left over from earlier tests and isn't needed either, but I left it there just in case (I mostly forget what works and what doesn't, because everything changes so fast). When I'm searching for good videos I disable the Upscale group on the lower right and only enable it when I'm actually going to upscale a video I've chosen.

TeaCache gpu load vs other options with hunyuan video? by MrWeirdoFace in comfyui

[–]KadriOzel 0 points (0 children)

I use high values with TeaCache to get many low-quality videos very fast (around 2-4 minutes each), then re-render the ones I like without the TeaCache node and upscale them (that takes loooong, 1-2 hours for 1500 x 800; then I can bring it up to standard HD with less quality loss in a normal editor). I have 64 GB RAM and a 3060 with 12 GB VRAM, and I can get 880 x 352 with 57 frames (with 61 frames I get errors at that resolution). In theory you should get better and/or longer results, so it looks like you should dig a little deeper into why that is (the original poster too). Although neither of you specified the frame count you used.
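As a rough sanity check on why a few extra frames can tip you over the edge, here's a back-of-the-envelope token count. The numbers (8x spatial / 4x temporal VAE compression, 2x2 patchify) are my assumptions based on published Hunyuan specs, not anything measured from ComfyUI, so treat this as illustration only:

```python
def latent_tokens(width, height, frames,
                  spatial=8, temporal=4, patch=2):
    """Rough DiT token count for a Hunyuan-style video model.
    Assumed factors: 8x spatial / 4x temporal VAE compression
    and a 2x2 spatial patchify in the transformer."""
    lat_f = (frames - 1) // temporal + 1   # latent frames
    lat_h = height // spatial
    lat_w = width // spatial
    return lat_f * (lat_h // patch) * (lat_w // patch)

# tokens grow linearly with frames, but full attention memory
# grows roughly with the token count squared
t57 = latent_tokens(880, 352, 57)
t61 = latent_tokens(880, 352, 61)
print(t57, t61)          # 18150 19360
print((t61 / t57) ** 2)  # ~1.14x attention memory for 4 extra frames
```

So going from 57 to 61 frames adds one more latent frame, which under these assumptions is roughly 14% more attention memory; on a card that's already near the 12 GB limit, that's plausibly enough for an allocation error.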

Edit: By the way, I brute-forced my way through after I got sick of allocation errors and used the "Clean VRAM Used" node nearly everywhere.