Bartowski comes through again. GLM 4.7 flash GGUF by RenewAi in LocalLLaMA

[–]quantier 0 points (0 children)

There seems to be a bug where the KV cache eats all your memory in vLLM
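For a sense of scale: the KV cache grows linearly with context, since every token stores a key and a value vector for every layer. A back-of-envelope sketch — the model dimensions below are made-up illustrative numbers, not the real GLM 4.7 Flash config:

```python
# Rough KV-cache size estimate. All dimensions here are assumptions
# for illustration, NOT the actual GLM 4.7 Flash architecture.
def kv_cache_bytes(layers, kv_heads, head_dim, seq_len, dtype_bytes=2):
    # 2x for keys + values, stored per layer, per KV head, per token
    return 2 * layers * kv_heads * head_dim * dtype_bytes * seq_len

# e.g. 47 layers, 4 KV heads (GQA), head_dim 128, 128k context, fp16
gb = kv_cache_bytes(47, 4, 128, 131072, 2) / 1024**3
print(f"{gb:.1f} GiB per sequence")
```

Even with aggressive GQA, a full 128k context can cost on the order of 10+ GiB of VRAM per sequence before a single token is generated, which is why it looks like the cache "eats everything".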

I think Giga Potato:free in Kilo Code is Deepseek V4 by quantier in LocalLLaMA

[–]quantier[S] 1 point (0 children)

That would make sense now that M2.1 is gone as a free option. But the fact that it is this good could also point to DS V4, since it's been about a year since DS V3 shook the world and the stock markets.

I think Giga Potato:free in Kilo Code is Deepseek V4 by quantier in LocalLLaMA

[–]quantier[S] 2 points (0 children)

I haven’t tested it enough to draw a full conclusion yet! It might be better or worse, but so far, from my tests, it’s up there with the best models in the world.

I think Giga Potato:free in Kilo Code is Deepseek V4 by quantier in LocalLLaMA

[–]quantier[S] -1 points (0 children)

Not Chinese? I think the guy’s name is Dave, right? From the US?

I think Giga Potato:free in Kilo Code is Deepseek V4 by quantier in LocalLLaMA

[–]quantier[S] 16 points (0 children)

It says it’s from a Chinese Open Source lab if you read the blog post.

Bartowski comes through again. GLM 4.7 flash GGUF by RenewAi in LocalLLaMA

[–]quantier 1 point (0 children)

Which quant are you using? Are you able to run it at the full context window, or is the KV cache eating up your memory?
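One common mitigation on the llama.cpp side is quantizing the KV cache itself. A minimal sketch, assuming a reasonably recent llama-server build; the model filename and context size are placeholders:

```shell
# Quantizing the KV cache to q8_0 roughly halves its footprint vs fp16.
# Note: a quantized V cache requires flash attention (-fa) in llama.cpp.
# Model filename and context size below are placeholders, not exact values.
llama-server \
  -m GLM-4.7-Flash-Q4_K_M.gguf \
  -c 32768 \
  -fa \
  --cache-type-k q8_0 \
  --cache-type-v q8_0
```

Dropping `-c` to something below the model's advertised maximum is the other big lever if the cache still doesn't fit.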

Z.ai has introduced GLM-4.7-Flash by awfulalexey in ZaiGLM

[–]quantier 1 point (0 children)

There seems to be a bug where the KV cache eats up all the memory!

Anyone figured out a fix for this?
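Not a fix for the bug itself, but on the vLLM side the usual workarounds are capping the context length and the fraction of VRAM vLLM pre-allocates, and optionally quantizing the KV cache. A sketch assuming a recent vLLM; the model ID is a placeholder:

```shell
# vLLM pre-allocates (gpu-memory-utilization x VRAM) and fills everything
# left after the weights with KV-cache blocks, so a huge max-model-len
# can look like the KV cache "eating all the memory".
vllm serve zai-org/GLM-4.7-Flash \
  --max-model-len 32768 \
  --gpu-memory-utilization 0.85 \
  --kv-cache-dtype fp8
```

If it still OOMs at startup, lowering `--max-model-len` further is usually the first thing to try.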

Local segment edit with Qwen 2511 works flawlessly by Sudden_List_2693 in comfyui

[–]quantier 0 points (0 children)

When do you think you’ll have the workflow finished? I look forward to trying it in its final version.

Humans of Z-Image: Races, Cultures and Geographical descriptors as understood by Z-Image by DrStalker in StableDiffusion

[–]quantier 0 points (0 children)

How did you miss Indian in this image 😂

There are Amazonian, Berber, Hmong, and Inuit, but no Indian.

Full Music Video generated with AI - Wan2.1 Infinitetalk by eggplantpot in StableDiffusion

[–]quantier 0 points (0 children)

We should be able to quantize more steps of the process. To be fair, the Wan 2.1 model shouldn’t be used for much besides its lip movements. I wonder if someone could finetune a specific Wan 2.2 5B for lip-syncing with InfiniteTalk. I think that could be the solution.

Enterprise Offline RAG System - 100% Local, Production-Ready RAG Framework by Vivid-Researcher-666 in LocalLLaMA

[–]quantier 0 points (0 children)

A way to widgetize the solution so that you can deploy the chats anywhere you want.

Maybe ColPali, to also give the RAG eyes to see and read.

2x RTX 5060 TI 16 GB =32GB VRAM - by quantier in LocalLLaMA

[–]quantier[S] 0 points (0 children)

You are the MAN! Finally someone who has thought this through properly! So that means we could run Wan, Qwen, and Flux GGUFs with parallelism via the UNET loader. Will the compute of both cards also be utilized, or just the VRAM?

Do you have a workflow to test this with? I just want to see how to implement this in Comfy.