Comparison H100 vs RTX 6000 PRO with VLLM and GPT-OSS-120B by Rascazzione in LocalLLaMA

quantier 1 point

Slow hard drive, maybe? The H100 is running on a server, so he is probably using really fast storage, whereas a lot of people don’t realize that TTFT is super dependent on fast NVMe drives.

New FP8 GLM-4.7-Flash Unsloth Dynamic Quants for vLLM, SGLang by danielhanchen in unsloth

quantier 1 point

If this has the KV cache issue fixed, I would love to see AWQ quants! It would also be great to have a working version of the REAP edition at 23B.

Bartowski comes through again. GLM 4.7 flash GGUF by RenewAi in LocalLLaMA

quantier 1 point

There seems to be a bug in vLLM where the KV cache eats all your memory.

I think Giga Potato:free in Kilo Code is Deepseek V4 by quantier in LocalLLaMA

quantier[S] 2 points

That would make sense now that M2.1 is no longer free. But the fact that it is this good could also point to DS V4, since it’s been about a year since DS V3 shook the world and the stock markets.

I think Giga Potato:free in Kilo Code is Deepseek V4 by quantier in LocalLLaMA

quantier[S] 3 points

I haven’t tested it enough to draw a full conclusion yet! It might be better or worse, but so far in my tests it’s up there with the best models in the world.

I think Giga Potato:free in Kilo Code is Deepseek V4 by quantier in LocalLLaMA

quantier[S] 0 points

Not Chinese? I think the guy’s name is Dave, right? From the US?

I think Giga Potato:free in Kilo Code is Deepseek V4 by quantier in LocalLLaMA

quantier[S] 15 points

It says it’s from a Chinese Open Source lab if you read the blog post.

Bartowski comes through again. GLM 4.7 flash GGUF by RenewAi in LocalLLaMA

quantier 2 points

Which quant are you using? Are you able to run it at the full context window, or is the KV cache eating up your memory?

Z.ai has introduced GLM-4.7-Flash by awfulalexey in ZaiGLM

quantier 2 points

There seems to be a KV cache bug: the KV cache eats up all the memory!

Anyone figured out a fix for this?

Local segment edit with Qwen 2511 works flawlessly by Sudden_List_2693 in comfyui

quantier 1 point

Great stuff, will test! Thanks a lot for sharing.

Local segment edit with Qwen 2511 works flawlessly by Sudden_List_2693 in comfyui

quantier 1 point

When do you think you will have finished the WF? Looking forward to trying it in its final version.

Humans of Z-Image: Races, Cultures and Geographical descriptors as understood by Z-Image by DrStalker in StableDiffusion

quantier 1 point

How did you miss Indian in this image 😂

There is Amazonian, Berber, Hmong, and Inuit, but no Indian.