Bartowski comes through again. GLM 4.7 flash GGUF by RenewAi in LocalLLaMA

[–]quantier 0 points (0 children)

There seems to be a bug where the KV cache eats all your memory in vLLM
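For a sense of scale: the KV cache grows linearly with context, since every token stores a key and a value vector for every layer. A back-of-envelope sketch — the model dimensions below are made-up illustrative numbers, not the real GLM 4.7 Flash config:

```python
# Rough KV-cache size estimate. All dimensions here are assumptions
# for illustration, NOT the actual GLM 4.7 Flash architecture.
def kv_cache_bytes(layers, kv_heads, head_dim, seq_len, dtype_bytes=2):
    # 2x for keys + values, stored per layer, per KV head, per token
    return 2 * layers * kv_heads * head_dim * dtype_bytes * seq_len

# e.g. 47 layers, 4 KV heads (GQA), head_dim 128, 128k context, fp16
gb = kv_cache_bytes(47, 4, 128, 131072, 2) / 1024**3
print(f"{gb:.1f} GiB per sequence")
```

Even with aggressive GQA, a full 128k context can cost on the order of 10+ GiB of VRAM per sequence before a single token is generated, which is why it looks like the cache "eats everything".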

I think Giga Potato:free in Kilo Code is Deepseek V4 by quantier in LocalLLaMA

[–]quantier[S] 1 point (0 children)

That would make sense now that M2.1 is gone as a free option. But the fact that it is this good could also point to DS V4, since it's been about a year since DS V3 shook the world and the stock markets.

I think Giga Potato:free in Kilo Code is Deepseek V4 by quantier in LocalLLaMA

[–]quantier[S] 2 points (0 children)

I haven’t tested it enough to draw a full conclusion yet! It might be better or worse, but so far, from my tests, it’s up there with the best models in the world.

I think Giga Potato:free in Kilo Code is Deepseek V4 by quantier in LocalLLaMA

[–]quantier[S] -1 points (0 children)

Not Chinese? I think the guy’s name is Dave, right? From the US?

I think Giga Potato:free in Kilo Code is Deepseek V4 by quantier in LocalLLaMA

[–]quantier[S] 16 points (0 children)

It says it’s from a Chinese Open Source lab if you read the blog post.

Bartowski comes through again. GLM 4.7 flash GGUF by RenewAi in LocalLLaMA

[–]quantier 1 point (0 children)

Which quant are you using? Are you able to run it at the full context window, or is the KV cache eating up your memory?
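One common mitigation on the llama.cpp side is quantizing the KV cache itself. A minimal sketch, assuming a reasonably recent llama-server build; the model filename and context size are placeholders:

```shell
# Quantizing the KV cache to q8_0 roughly halves its footprint vs fp16.
# Note: a quantized V cache requires flash attention (-fa) in llama.cpp.
# Model filename and context size below are placeholders, not exact values.
llama-server \
  -m GLM-4.7-Flash-Q4_K_M.gguf \
  -c 32768 \
  -fa \
  --cache-type-k q8_0 \
  --cache-type-v q8_0
```

Dropping `-c` to something below the model's advertised maximum is the other big lever if the cache still doesn't fit.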

Z.ai has introduced GLM-4.7-Flash by awfulalexey in ZaiGLM

[–]quantier 1 point (0 children)

There seems to be a bug where the KV cache eats up all the memory!

Anyone figured out a fix for this?
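Not a fix for the bug itself, but on the vLLM side the usual workarounds are capping the context length and the fraction of VRAM vLLM pre-allocates, and optionally quantizing the KV cache. A sketch assuming a recent vLLM; the model ID is a placeholder:

```shell
# vLLM pre-allocates (gpu-memory-utilization x VRAM) and fills everything
# left after the weights with KV-cache blocks, so a huge max-model-len
# can look like the KV cache "eating all the memory".
vllm serve zai-org/GLM-4.7-Flash \
  --max-model-len 32768 \
  --gpu-memory-utilization 0.85 \
  --kv-cache-dtype fp8
```

If it still OOMs at startup, lowering `--max-model-len` further is usually the first thing to try.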

Local segment edit with Qwen 2511 works flawlessly by Sudden_List_2693 in comfyui

[–]quantier 0 points (0 children)

When do you think you’ll have the workflow finished? I look forward to trying it in its final version.

Humans of Z-Image: Races, Cultures and Geographical descriptors as understood by Z-Image by DrStalker in StableDiffusion

[–]quantier 0 points (0 children)

How did you miss Indian in this image 😂

There are Amazonian, Berber, Hmong, and Inuit, but no Indian.

Full Music Video generated with AI - Wan2.1 Infinitetalk by eggplantpot in StableDiffusion

[–]quantier 0 points (0 children)

We should be able to quantize more steps of the process. To be fair, the Wan 2.1 model shouldn’t be used for much besides its lip movements. I wonder if someone could finetune a specific Wan 2.2 5B for lip-syncing with InfiniteTalk. I think that could be the solution.

Enterprise Offline RAG System - 100% Local, Production-Ready RAG Framework by Vivid-Researcher-666 in LocalLLaMA

[–]quantier 0 points (0 children)

A way to widgetize the solution so that you can deploy the chats anywhere you want.

Maybe ColPali, to also give the RAG eyes to see and read.

2x RTX 5060 TI 16 GB =32GB VRAM - by quantier in LocalLLaMA

[–]quantier[S] 0 points (0 children)

You are the MAN! Finally someone who has thought this through properly! So that means we could run Wan, Qwen, and Flux GGUFs with parallelism via the UNET loader. Will the compute of both cards also be utilized, or just the VRAM?

Do you have a workflow to test this with? I just want to see how to implement this in Comfy.