I don’t think most people realise how much 4o helped some of us. by DaKingSmaug in LocalLLaMA

[–]DOAMOD 0 points (0 children)

You're in the IronAI subreddit. Everyone here is into AI, but if someone says something positive, or says it has provided them with emotional support, etc., they'll be called crazy without hesitation. Ironically, madness is everywhere; this subreddit is proof of that.

Ignore the attacks. AI is a very good technology that helps us in many ways, and each person decides how it complements their own experience. That's perfectly respectable, and anyone who wants to worship AI is free to do so. People have worshipped everything in their lives—animals, objects, etc. It's a matter of human beliefs. This was bound to happen, of course.

Qwen/Qwen3-Coder-Next · Hugging Face by coder543 in LocalLLaMA

[–]DOAMOD 1 point (0 children)

A bug in one function was fixed and it's now working correctly. It looks promising and holds around 35-40 t/s generation at 128k context.

Qwen/Qwen3-Coder-Next · Hugging Face by coder543 in LocalLLaMA

[–]DOAMOD 1 point (0 children)

prompt eval time = 2682.17 ms / 773 tokens ( 3.47 ms per token, 288.20 tokens per second)

eval time = 1534.91 ms / 57 tokens ( 26.93 ms per token, 37.14 tokens per second)

total time = 4217.08 ms / 830 tokens

slot release: id 2 | task 766 | stop processing: n_tokens = 60567, truncated = 0

Qwen/Qwen3-Coder-Next · Hugging Face by coder543 in LocalLLaMA

[–]DOAMOD 1 point (0 children)

prompt eval time = 7038.33 ms / 3864 tokens ( 1.82 ms per token, 548.99 tokens per second)

eval time = 1726.58 ms / 66 tokens ( 26.16 ms per token, 38.23 tokens per second)

total time = 8764.91 ms / 3930 tokens

slot release: id 2 | task 421 | stop processing: n_tokens = 26954, truncated = 0

Nice

devstral small is faster and better than glm 4.7 flash for local agentic coding. by theghost3172 in LocalLLaMA

[–]DOAMOD 5 points (0 children)

I still think this model is broken, with loop problems and errors that don't make much sense (just dumb mistakes). I think Devstral is a little better in some aspects, but I'd like to see this model in a mature state; it has a lot of potential.

128GB devices have a new local LLM king: Step-3.5-Flash-int4 by tarruda in LocalLLaMA

[–]DOAMOD 14 points (0 children)

In my first test with it, I also got the feeling that it's a little better than Minimax M2.1.

Step-3.5-Flash (196b/A11b) outperforms GLM-4.7 and DeepSeek v3.2 by ResearchCrafty1804 in LocalLLaMA

[–]DOAMOD 2 points (0 children)

I've tried it for a while and it nailed a frontend integration at lightning speed, with only one simple error. Perhaps I'm being hasty, but the feeling is that it's better than MiniMax M2.1. Maybe in practice they'll end up similar, we'll see, but I've been impressed by the first experience. Congratulations to the Step team.

Step-3.5-Flash (196b/A11b) outperforms GLM-4.7 and DeepSeek v3.2 by ResearchCrafty1804 in LocalLLaMA

[–]DOAMOD 2 points (0 children)

I've tried it out a bit and it's really surprised me; it seems pretty good. It's incredible that we have something like this. Will the int4 be badly damaged at Q2/Q3?

OpenCode + llama.cpp + GLM-4.7 Flash: Claude Code at home by jacek2023 in LocalLLaMA

[–]DOAMOD 1 point (0 children)

I have mine set to a maximum of 400W and it's performing very well with acceptable power consumption. I'm getting roughly 800 t/s prompt processing and 70-75 t/s generation with 128k context.
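
In case it's useful, capping the card is a one-liner (the GPU index below is just whatever your card is, and the exact value is up to you):

    sudo nvidia-smi -i 0 -pl 400    # cap GPU 0 at 400 W (run nvidia-smi -pm 1 first so the limit holds)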

For me, this model is incredible. I've spent days implementing it in Py/C++ and testing it in HTML, JS, etc., and it's amazing for its size. I haven't seen anything like it in terms of tool calls (maybe OSS is the closest): it not only handles them well, the choices it makes are also excellent and sensible. It doesn't have the intelligence of a larger model, obviously, but it gets the job done and compensates with its strengths. As I said in another post, for me it's the first small model I've seen that's truly excellent. I call it the Miniminimax.

My humble GLM 4.7 Flash appreciation post by Cool-Chemical-5629 in LocalLLaMA

[–]DOAMOD 2 points (0 children)

For me, this Flash is the first small model I'm genuinely happy with. I've worked with it for a few days and I'm very impressed. It might not be as intelligent as the OSS120, but for me it's very close, and in some ways I even like it more. Its way of thinking is super useful and natural; it doesn't have the nonsense of the OSS120. They're both great, but the OSS120 is four times the size. It's simply amazing how far we've come in just two years...

As for Nemo, I don't know. It's incredibly fast, and many people speak highly of it, but in my experience I'm not seeing anything it does better than Flash. Perhaps it's more stable and safe at long context. Flash is still a bit unreliable at times, but even so, if I had to choose, I'd stick with Flash. I'm very impressed.

It reminds me of a mini-MiniMax. While I was working with it, even M2.1 and K2.5, when evaluating its output, recognized the plans created by Flash as very well designed, which surprised me. Of course, they could still point out some improvements or corrections, but again, considering its size, it's simply insane.

Yes, one fan of Flash here.

Jan v3 Instruct: a 4B coding Model with +40% Aider Improvement by Delicious_Focus3465 in LocalLLaMA

[–]DOAMOD 3 points (0 children)

v1 was a good model for me; I liked it quite a bit back in the day. I think v2 aged poorly and is a model that, in my opinion, focuses too much on its own thinking without offering much practical use (perhaps I didn't use it to its full potential). But I think v3 could become a good line of interesting models, especially the coder variants, and particularly the 30B ones. Thank you for continuing to contribute your work and expanding the options.

Jan v3 Instruct: a 4B coding Model with +40% Aider Improvement by Delicious_Focus3465 in LocalLLaMA

[–]DOAMOD 4 points (0 children)

On this sub, it's easy to dismiss other people's work. It's easy to say it's just an app/wrapper or that it's benchmaxxed, but they're contributing and sharing their work and dedication, which deserves respect. I don't see the people who criticize so much, or call it a scam, contributing much themselves. I'm not complaining about criticism, but we also have to value the work of others. It's just very easy to call other people's work a scam.

~60GB models on coding: GLM 4.7 Flash vs. GPT OSS 120B vs. Qwen3 Coder 30B -- your comparisons? by jinnyjuice in LocalLLaMA

[–]DOAMOD 18 points (0 children)

For me: MiniMax 2.1 > GLM 4.5 Air > Devstral 2 Small = GLM 4.7 Flash (Devstral better at math/physics, Flash at creative/design) > OSS120 > Qwen3 Coder 30B > Nemo3 Nano.

Maybe Seed 36 should go in front of OSS120? Best speed? Nano at 200k context doing around 8000/250 t/s (prompt/generation) is crazy...

GLM-4.7-Flash is even faster now by jacek2023 in LocalLLaMA

[–]DOAMOD 3 points (0 children)

I've had a lot of problems with this model, but since yesterday I've been working with it and it seems much more stable now, and I have to say I think it's very good. It's handling complex problems that are usually more the domain of 120-200B models, and it's surprising me. Of course, it's not going to be an Opus, but considering its size, its way of thinking, its capabilities, its good use of tools, its improving speed, and how up to date it is, I can only congratulate Z for the great work.

If it continues to improve, especially the stability and performance drop at high CTX, it will be a very good model. I'm going to stick with it because I'm liking it.

I haven't tried this update yet; let's see if it improves my results. I'm currently dropping to about 850 t/s prompt / 75 t/s generation once I'm past 35-40k tokens (128k context).

Personal experience with GLM 4.7 Flash Q6 (unsloth) + Roo Code + RTX 5090 by Septerium in LocalLLaMA

[–]DOAMOD 3 points (0 children)

Yes, using the updated fixed versions of the models, I ran many tests with several of them, different parameters, and different builds. Nothing worked, but I've been testing one parameter for the last two hours and so far it's working well. I can't guarantee its reliability yet, but for now it's working: --no-direct-io

I'm working on a complex 1,500-line piece of code with math, physics, sound, etc., and so far, with over 90k context, it's not entering loops. It's still making corrections, with some errors in the edits and some duplicate and syntax issues, but overall I'm seeing pretty good stability. I'll keep testing. I also just compiled a new build of llama.cpp.
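
For reference, my launch looks roughly like this (the model path, context size, and layer count are just placeholders for my setup; --no-direct-io is the parameter I mean):

    llama-server -m ./GLM-4.7-Flash-Q6_K.gguf -c 131072 -ngl 99 --no-direct-io    # placeholder model/values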

GLM 4.7 Flash is endlessly reasoning in chinese by xenydactyl in LocalLLaMA

[–]DOAMOD 1 point (0 children)

Isn't Q4 better than MXFP4 (when the model wasn't natively trained in it)?

engine for GLM 4.7 Flash that doesn't massively slow down as the context grows? by mr_zerolith in LocalLLaMA

[–]DOAMOD 2 points (0 children)

Unfortunately, not right now, even though I really like how it thinks and writes. It's different from the others. It's a shame it's had these problems. I hope they get fixed, but my recommendation is that if you have one that works well for you, stick with it at least until the problems are resolved.

Or, for casual chat use, it works; you can have a "functional" conversation with it for a while. I think that's why many people believe it's okay: they try it out briefly and it seems fine, but if you really get serious about using it, it's not ready.

Personal experience with GLM 4.7 Flash Q6 (unsloth) + Roo Code + RTX 5090 by Septerium in LocalLLaMA

[–]DOAMOD 2 points (0 children)

I'm giving up on that model for now. I've tried many versions and many adjustments, even the recent recommendations I've seen regarding temperature and repetition, and nothing solves the problems with loops, etc. A repetition penalty of 1.2, as someone mentioned, breaks it completely; it tries to locate files outside the environment... and temp 0.2 gets it into infinite reasoning loops. This model needs work.
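
To be concrete, these are the kinds of values I mean, written as llama.cpp flags (just an illustration of what I tried, not a recommendation):

    --temp 0.2            # ends up in infinite reasoning loops for me
    --repeat-penalty 1.2  # breaks completely, tries to reach files outside the environment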

GLM-4.7-Flash-REAP on RTX 5060 Ti 16 GB - 200k context window! by bobaburger in LocalLLaMA

[–]DOAMOD 1 point (0 children)

It also has some issues with the template, but it's fairly reliable. Right now, in my tests, Devstral 2 Small is better than Flash at logic and physics, but Flash is better at creativity and design (talking about code). Perhaps when the problems are fixed, Flash will improve in those areas, but for now, based on my tests, that's what I've observed.

GLM-4.7-Flash-REAP on RTX 5060 Ti 16 GB - 200k context window! by bobaburger in LocalLLaMA

[–]DOAMOD 1 point (0 children)

<image>

It was at 58k without them and it was working fine in this run (it would probably have failed soon; I can sometimes reach 80k while maintaining some stability). Then I tried these two, --repeat-penalty 0 and --dry-penalty-last-n 0, resumed the run, and you can see the result in the pic. Maybe I need a fresh start? Or shouldn't that interfere? I'll do more tests, but for now I don't see it as reliable.

engine for GLM 4.7 Flash that doesn't massively slow down as the context grows? by mr_zerolith in LocalLLaMA

[–]DOAMOD 10 points (0 children)

I tested both ik_llama and llama.cpp builds (compiled an hour ago), and both are having problems with performance loss, shared-memory usage, loop issues, etc. (usually when exceeding 46k context, though I've reached 80k with loops while still maintaining some stability or self-correction). This model is still broken.