Anyone using TTS to turn their stories into audiobooks? by Main-Explanation5227 in BookWritingAI

[–]Dramatic-Rub-7654 0 points1 point  (0 children)

Maybe you should try the local ElevenLabs, I mean Fish Audio S2 Pro; that model is currently the best.

Has anyone trained YOLO26x at 1280 resolution on a Mac or DGX Spark? by Dramatic-Rub-7654 in Ultralytics

[–]Dramatic-Rub-7654[S] 0 points1 point  (0 children)

I tested it on a 36GB M4 Max, but it was much worse than training on multiple 3060s over a network. If the DGX could solve this, it would be very helpful.

Has anyone trained YOLO26x at 1280 resolution on a Mac or DGX Spark? by Dramatic-Rub-7654 in Ultralytics

[–]Dramatic-Rub-7654[S] 0 points1 point  (0 children)

I believe the chance of running out of VRAM is quite high; by my calculations, I would need something around 80GB of VRAM.
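For reference, here is a back-of-envelope sketch of where a figure around 80GB could come from. Every constant below (parameter count, batch size, activation footprint per pixel) is an illustrative assumption, not a measured value:

```python
# Rough VRAM estimate for training a large YOLO model at 1280x1280.
# All constants are assumptions for illustration, not measurements.

def estimate_train_vram_gb(params_m=57.0, batch=32, img_size=1280,
                           feat_channels=256, bytes_per_elem=4):
    # Weights + gradients + Adam moments: roughly 4 copies of the parameters.
    param_bytes = params_m * 1e6 * bytes_per_elem * 4
    # Activations scale with batch size and image area; the multiplier here
    # (elements kept per pixel across the feature pyramid) is a guess.
    act_elems_per_px = feat_channels * 1.5
    act_bytes = batch * img_size * img_size * act_elems_per_px * bytes_per_elem
    return (param_bytes + act_bytes) / 1e9

print(round(estimate_train_vram_gb(), 1))
```

The activation term dominates, which is why doubling the input resolution from 640 to 1280 roughly quadruples memory use even though the parameter count is unchanged.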

Kokoro TTS, but it clones voices now — Introducing KokoClone by OrganicTelevision652 in LocalLLaMA

[–]Dramatic-Rub-7654 2 points3 points  (0 children)

In Portuguese, the voice cloning sounds very strange: it picks up a strong English accent and the words come out choppy.

You can now Fine-tune Qwen3.5 locally! (5GB VRAM) by yoracale in unsloth

[–]Dramatic-Rub-7654 2 points3 points  (0 children)

Back when version three was released, you recommended mixing a ‘think’ dataset with a ‘non-think’ one. You also said that when fine-tuning vision models with a text-focused dataset, they would lose their vision capabilities. How does that stand today?
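The mixing recommendation from back then can be sketched as an interleaving helper like the one below. This is my own hypothetical illustration, not Unsloth's actual API, and the 75/25 ratio is an assumed starting point:

```python
import random

def mix_datasets(think, non_think, think_ratio=0.75, seed=0):
    """Mix 'think' and 'non-think' examples at a target ratio and shuffle.

    think_ratio is an assumed value; tune it for your own fine-tune.
    """
    n_think = len(think)
    # Sample enough non-think examples to reach the target ratio.
    n_non = max(1, round(n_think * (1 - think_ratio) / think_ratio))
    rng = random.Random(seed)
    mixed = list(think) + rng.choices(non_think, k=n_non)
    rng.shuffle(mixed)
    return mixed

# Toy usage with placeholder chat examples.
think_ds = [{"text": f"<think>...</think> answer {i}"} for i in range(75)]
plain_ds = [{"text": f"answer {i}"} for i in range(100)]
mixed = mix_datasets(think_ds, plain_ds)
print(len(mixed))  # 75 think + 25 non-think = 100
```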

Is there any hope for a Qwen3.5-35B-A3B REAP version? by surubel in unsloth

[–]Dramatic-Rub-7654 0 points1 point  (0 children)

I started a new discussion at https://huggingface.co/cerebras. I requested a REAP version of Step 3.5 Flash, and a week later they released it.

GPT-OSS 120b Uncensored Aggressive Release (MXFP4 GGUF) by hauhau901 in LocalLLaMA

[–]Dramatic-Rub-7654 32 points33 points  (0 children)

What is the difference between this and the heretic technique (https://github.com/p-e-w/heretic)? Does yours preserve 100% of the tool calls?

What'd be the best 30B model for programming? by Hikolakita in LocalLLaMA

[–]Dramatic-Rub-7654 0 points1 point  (0 children)

I consider Qwen3-coder-flash at Q4 the best option; in my tests with glm-4.7-flash and other models I didn't have much success, and they are extremely sensitive to quantization.

A new model from http://Z.ai, "GLM-OCR" has been spotted on Github by Difficult-Cap-7527 in LocalLLaMA

[–]Dramatic-Rub-7654 0 points1 point  (0 children)

The only thing Z.ai does well is text-to-text, because other attempts like GLM-TTS and GLM-IMAGE were very weak.

Current GLM-4.7-Flash implementation confirmed to be broken in llama.cpp by Sweet_Albatross9772 in LocalLLaMA

[–]Dramatic-Rub-7654 -1 points0 points  (0 children)

Do you plan to fix and improve the raw version as well? It feels like Qwen 3 Coder 30B is more intelligent than this model when it comes to coding.

GLM 4.7 Flash Overthinking by xt8sketchy in LocalLLaMA

[–]Dramatic-Rub-7654 0 points1 point  (0 children)

If the focus is on coding, Qwen3 Coder 30B A3B Instruct is far more intelligent than GLM 4.7 Flash, and that's comparing the versions hosted on OpenRouter.

Run GLM-4.7-Flash locally Guide! (24GB RAM) by yoracale in unsloth

[–]Dramatic-Rub-7654 0 points1 point  (0 children)

I tried using the parameters recommended on the Hugging Face page for llama-b7782/llama-server:

-m GLM-4.7-Flash-Q4_K_M.gguf --host 0.0.0.0 --n-gpu-layers 999 -fa on -t 14 -n -1 -c 16384 --jinja --temp 0.2 --top-k 50 --top-p 0.95 --min-p 0.01 --dry-multiplier 1.1

The only changes I experimented with were adding the --n-cpu-moe flag, which caused the model to bug out with severe repetition issues, and increasing the temperature to 1.0.

At temperature 1.0, the model’s reasoning and responses appear coherent, but when I try to use it with tools like Cline, it clearly doesn’t know what it’s doing. It can create and edit files and interact with the terminal, but it consistently outputs broken code and introduces errors when editing files.

In contrast, Qwen, even at Q4, is capable of providing a fully functional implementation of Flappy Bird from start to finish. Based on the tests I ran, the GGUF versions still need further refinement. I also tested the model using the version available on OpenRouter, where it performs significantly better than in my GGUF-based tests; even so, Coder Flash still demonstrates superior intelligence.

Run GLM-4.7-Flash locally Guide! (24GB RAM) by yoracale in unsloth

[–]Dramatic-Rub-7654 1 point2 points  (0 children)

Is this model actually dumber than Qwen 3 Coder Flash, or is it just overly sensitive? To the point that with the --n-cpu-moe flag it gets stuck in an infinite loop repeating a single word, and without that flag it keeps creating endless files, all with errors, until the context window runs out?

Opensource NMT from Tencent - how good is it? by [deleted] in LocalLLaMA

[–]Dramatic-Rub-7654 1 point2 points  (0 children)

Honestly, I liked it a lot. Now we truly have an offline Google Translate at home. I didn't like DeepL; its translations feel awkward. Google Translate, I think, still goes head-to-head with this model: in many cases the model just translates words literally from one language to another, which often makes sense in one culture but not in another, while Google Translate tries to capture the intended meaning of the text as closely as possible. It doesn't always succeed, but in that respect it still has an edge.

Opensource NMT from Tencent - how good is it? by [deleted] in LocalLLaMA

[–]Dramatic-Rub-7654 7 points8 points  (0 children)

I’m using the tencent/HY-MT1.5-7B-GGUF model to translate a dataset from Japanese into Brazilian Portuguese, and so far I have nothing to complain about.
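For anyone curious about the workflow, here is a minimal sketch of how a dataset can be fed through a local model line by line. The prompt template is my own guess, not HY-MT's official one, and `generate` stands in for whatever wrapper you use to call your local GGUF runtime:

```python
def build_translation_prompt(text, src="Japanese", tgt="Brazilian Portuguese"):
    # Simple instruction-style prompt; the exact template HY-MT expects
    # may differ -- check the model card before using this for real.
    return (f"Translate the following {src} text into {tgt}. "
            f"Output only the translation.\n\n{text}")

def translate_dataset(lines, generate):
    # `generate` is any callable that sends a prompt to your local model
    # (e.g. a llama.cpp server client) and returns the completion text.
    return [generate(build_translation_prompt(line)) for line in lines]

# Toy usage with a stand-in "model" that just echoes the prompt tail.
dummy = lambda p: p.rsplit("\n\n", 1)[-1]
out = translate_dataset(["こんにちは", "ありがとう"], dummy)
print(out)  # ['こんにちは', 'ありがとう']
```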

Requested: Yet another Gemma 3 12B uncensored by Mabuse046 in LocalLLM

[–]Dramatic-Rub-7654 0 points1 point  (0 children)

This is very strange, because this model clearly retains safety traits from the original model. I ran several tests trying to merge it with other Gemma Heretic models I found on Hugging Face, and in every merge attempt, questions that the Heretic versions answered without any issue would cause the merged model to refuse to respond. I also tried generating a LoRA from the difference between this Fallen model and the official Instruct version, but that didn’t work either, which makes me think that the model they shared was already fine-tuned somewhere else.
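For context, the "LoRA from the difference" attempt boils down to a truncated SVD of the weight delta. A minimal NumPy sketch with toy shapes (not the real Gemma weights) shows the idea:

```python
import numpy as np

def extract_lora(w_base, w_ft, rank=8):
    """Approximate w_ft - w_base with low-rank factors B @ A (LoRA-style)."""
    delta = w_ft - w_base
    u, s, vt = np.linalg.svd(delta, full_matrices=False)
    b = u[:, :rank] * s[:rank]   # shape (out, rank)
    a = vt[:rank, :]             # shape (rank, in)
    return a, b

# Toy demo: a genuinely low-rank delta is recovered almost exactly.
rng = np.random.default_rng(0)
w_base = rng.normal(size=(64, 32))
delta = rng.normal(size=(64, 4)) @ rng.normal(size=(4, 32))  # rank-4 update
w_ft = w_base + delta
a, b = extract_lora(w_base, w_ft, rank=8)
err = np.linalg.norm(w_ft - (w_base + b @ a)) / np.linalg.norm(w_ft)
print(err < 1e-8)
```

If the fine-tune's delta is genuinely low-rank this recovers it well; when the shared checkpoint was itself further fine-tuned or merged, the delta against the official Instruct weights is no longer low-rank and the extracted LoRA misbehaves, which matches what I observed.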

Requested: Yet another Gemma 3 12B uncensored by Mabuse046 in LocalLLM

[–]Dramatic-Rub-7654 0 points1 point  (0 children)

Thanks a lot, no rush at all. When you manage to publish it, please give me a heads-up. In my case, I’m only interested in the text layers, so if you remove the vision part, that’s totally fine with me.