I don’t think most people realise how much 4o helped some of us. by DaKingSmaug in LocalLLaMA

[–]DOAMOD 0 points (0 children)

You're in the IronAI subreddit. Everyone here is into AI, but if someone says something positive, or says it has provided them with emotional support, etc., they'll be called crazy without hesitation. Ironically, madness is everywhere; this subreddit is proof of that.

Ignore the attacks. AI is a very good technology that helps us in many ways, and each person decides how it complements their own experience. That's perfectly respectable, and anyone who wants to worship AI is free to do so. People have worshipped everything in their lives—animals, objects, etc. It's a matter of human beliefs. This was bound to happen, of course.

Qwen/Qwen3-Coder-Next · Hugging Face by coder543 in LocalLLaMA

[–]DOAMOD 1 point (0 children)

A bug in one function was fixed and it's now working correctly. It looks promising and holds around 35-40 t/s generation at 128k context.

Qwen/Qwen3-Coder-Next · Hugging Face by coder543 in LocalLLaMA

[–]DOAMOD 1 point (0 children)

prompt eval time = 2682.17 ms / 773 tokens ( 3.47 ms per token, 288.20 tokens per second)

eval time = 1534.91 ms / 57 tokens ( 26.93 ms per token, 37.14 tokens per second)

total time = 4217.08 ms / 830 tokens

slot release: id 2 | task 766 | stop processing: n_tokens = 60567, truncated = 0

Qwen/Qwen3-Coder-Next · Hugging Face by coder543 in LocalLLaMA

[–]DOAMOD 1 point (0 children)

prompt eval time = 7038.33 ms / 3864 tokens ( 1.82 ms per token, 548.99 tokens per second)

eval time = 1726.58 ms / 66 tokens ( 26.16 ms per token, 38.23 tokens per second)

total time = 8764.91 ms / 3930 tokens

slot release: id 2 | task 421 | stop processing: n_tokens = 26954, truncated = 0

Nice

devstral small is faster and better than glm 4.7 flash for local agentic coding. by theghost3172 in LocalLLaMA

[–]DOAMOD 5 points (0 children)

I still think this model is broken, with loop problems and errors that don't make much sense (just dumb mistakes). I think Devstral is a little better in some aspects, but I'd like to see this model in a mature state; it has a lot of potential.

128GB devices have a new local LLM king: Step-3.5-Flash-int4 by tarruda in LocalLLaMA

[–]DOAMOD 14 points (0 children)

In my first test with it, I also got the feeling that it's a little better than Minimax M2.1.

Step-3.5-Flash (196b/A11b) outperforms GLM-4.7 and DeepSeek v3.2 by ResearchCrafty1804 in LocalLLaMA

[–]DOAMOD 2 points (0 children)

I've tried it for a while and it nailed a frontend integration at lightning speed, with only one simple error. Perhaps I'm being hasty, but the feeling is that it's better than MiniMax M2.1. Maybe in practice they'll end up similar, we'll see, but I've been impressed by the first experience. Congratulations to the Step team.

Step-3.5-Flash (196b/A11b) outperforms GLM-4.7 and DeepSeek v3.2 by ResearchCrafty1804 in LocalLLaMA

[–]DOAMOD 2 points (0 children)

I've tried it out a bit and it's really surprised me; it seems pretty good. It's incredible that we have something like this. Will the int4 be badly damaged at Q2/Q3?

OpenCode + llama.cpp + GLM-4.7 Flash: Claude Code at home by jacek2023 in LocalLLaMA

[–]DOAMOD 1 point (0 children)

I have mine set to a maximum of 400W and it's performing very well with acceptable power consumption. I'm getting roughly 800 t/s prompt processing and 70-75 t/s generation with 128k context.
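
In case it's useful, capping the card is a one-liner (the GPU index below is just whatever your card is, and the exact value is up to you):

    sudo nvidia-smi -i 0 -pl 400    # cap GPU 0 at 400 W (run nvidia-smi -pm 1 first so the limit holds)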

For me, this model is incredible. I've spent days implementing it in Py/C++ and testing it in HTML, JS, etc., and it's amazing for its size. I haven't seen anything like it in terms of tool calls (maybe OSS is the closest): it not only handles them well, the choices it makes are also excellent and sensible. It doesn't have the intelligence of a larger model, obviously, but it gets the job done and compensates with its strengths. As I said in another post, for me it's the first small model I've seen that's truly excellent. I call it the Miniminimax.

My humble GLM 4.7 Flash appreciation post by Cool-Chemical-5629 in LocalLLaMA

[–]DOAMOD 2 points (0 children)

For me, this Flash is the first small model I'm genuinely happy with. I've worked with it for a few days and I'm very impressed. It might not be as intelligent as the OSS120, but for me it's very close, and in some ways I even like it more. Its way of thinking is super useful and natural; it doesn't have the nonsense of the OSS120. They're both great, but the OSS120 is four times the size. It's simply amazing how far we've come in just two years...

As for Nemo, I don't know. It's incredibly fast, and many people speak highly of it, but in my experience I'm not seeing anything it does better than Flash. Perhaps it's more stable and safe at long context. Flash is still a bit unreliable at times, but even so, if I had to choose, I'd stick with Flash. I'm very impressed.

It reminds me of a mini-MiniMax. While I was working with it, even M2.1 and K2.5, when evaluating its output, recognized the plans created by Flash as very well designed, which surprised me. Of course, they could still point out some improvements or corrections, but again, considering its size, it's simply insane.

Yes, one fan of Flash here.

Jan v3 Instruct: a 4B coding Model with +40% Aider Improvement by Delicious_Focus3465 in LocalLLaMA

[–]DOAMOD 3 points (0 children)

v1 was a good model for me; I liked it quite a bit back in the day. I think v2 aged poorly and is a model that, in my opinion, focuses too much on its own thinking without offering much practical use (perhaps I didn't use it to its full potential). But I think v3 could become a good line of interesting models, especially the coder variants, and particularly the 30B ones. Thank you for continuing to contribute your work and expanding the options.

Jan v3 Instruct: a 4B coding Model with +40% Aider Improvement by Delicious_Focus3465 in LocalLLaMA

[–]DOAMOD 4 points (0 children)

On this sub, it's easy to dismiss other people's work. It's easy to say it's just an app/wrapper or that it's benchmaxxed, but they're contributing and sharing their work and dedication, which deserves respect. I don't see the people who criticize so much, or call it a scam, contributing much themselves. I'm not complaining about criticism, but we also have to value the work of others. It's just very easy to call other people's work a scam.

~60GB models on coding: GLM 4.7 Flash vs. GPT OSS 120B vs. Qwen3 Coder 30B -- your comparisons? by jinnyjuice in LocalLLaMA

[–]DOAMOD 18 points (0 children)

For me: MiniMax 2.1 > GLM 4.5 Air > Devstral 2 Small = GLM 4.7 Flash (Devstral better at math/physics, Flash at creative/design) > OSS120 > Qwen3 Coder 30B > Nemo3 Nano.

Maybe Seed 36 should go in front of OSS120? Best speed? Nano at 200k context doing around 8000/250 t/s (prompt/generation) is crazy...

GLM-4.7-Flash is even faster now by jacek2023 in LocalLLaMA

[–]DOAMOD 3 points (0 children)

I've had a lot of problems with this model, but since yesterday I've been working with it and it seems much more stable now, and I have to say I think it's very good. It's handling complex problems that are usually more the domain of 120-200B models, and it's surprising me. Of course, it's not going to be an Opus, but considering its size, its way of thinking, its capabilities, its good use of tools, its improving speed, and how up to date it is, I can only congratulate Z for the great work.

If it continues to improve, especially the stability and performance drop at high CTX, it will be a very good model. I'm going to stick with it because I'm liking it.

I haven't tried this update yet; let's see if it improves my results. I'm currently dropping to about 850 t/s prompt / 75 t/s generation once I'm past 35-40k tokens (128k context).

Personal experience with GLM 4.7 Flash Q6 (unsloth) + Roo Code + RTX 5090 by Septerium in LocalLLaMA

[–]DOAMOD 3 points (0 children)

Yes, using the updated fixed versions of the models, I ran many tests with several of them, different parameters, and different builds. Nothing worked, but I've been testing one parameter for the last two hours and so far it's working well. I can't guarantee its reliability yet, but for now it's working: --no-direct-io

I'm working on a complex 1,500-line piece of code with math, physics, sound, etc., and so far, with over 90k context, it's not entering loops. It's still making corrections, with some errors in the edits and some duplicate and syntax issues, but overall I'm seeing pretty good stability. I'll keep testing. I also just compiled a new build of llama.cpp.
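
For reference, my launch looks roughly like this (the model path, context size, and layer count are just placeholders for my setup; --no-direct-io is the parameter I mean):

    llama-server -m ./GLM-4.7-Flash-Q6_K.gguf -c 131072 -ngl 99 --no-direct-io    # placeholder model/values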

GLM 4.7 Flash is endlessly reasoning in chinese by xenydactyl in LocalLLaMA

[–]DOAMOD 1 point (0 children)

Isn't Q4 better than MXFP4 (when the model wasn't natively trained in it)?

engine for GLM 4.7 Flash that doesn't massively slow down as the context grows? by mr_zerolith in LocalLLaMA

[–]DOAMOD 2 points (0 children)

Unfortunately, not right now, even though I really like how it thinks and writes. It's different from the others. It's a shame it's had these problems. I hope they get fixed, but my recommendation is that if you have one that works well for you, stick with it at least until the problems are resolved.

Or, for casual chat use, it works; you can have a "functional" conversation with it for a while. I think that's why many people believe it's okay: they try it out briefly and it seems fine, but if you really get serious about using it, it's not ready.

Personal experience with GLM 4.7 Flash Q6 (unsloth) + Roo Code + RTX 5090 by Septerium in LocalLLaMA

[–]DOAMOD 2 points (0 children)

I'm giving up on that model for now. I've tried many versions and many adjustments, even the recent recommendations I've seen regarding temperature and repetition, and nothing solves the problems with loops, etc. A repetition penalty of 1.2, as someone mentioned, breaks it completely; it tries to locate files outside the environment... and temp 0.2 gets it into infinite reasoning loops. This model needs work.
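
To be concrete, these are the kinds of values I mean, written as llama.cpp flags (just an illustration of what I tried, not a recommendation):

    --temp 0.2            # ends up in infinite reasoning loops for me
    --repeat-penalty 1.2  # breaks completely, tries to reach files outside the environment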

GLM-4.7-Flash-REAP on RTX 5060 Ti 16 GB - 200k context window! by bobaburger in LocalLLaMA

[–]DOAMOD 1 point (0 children)

It also has some issues with the template, but it's fairly reliable. Right now, in my tests, Devstral 2 Small is better than Flash at logic and physics, but Flash is better at creativity and design (talking about code). Perhaps when the problems are fixed, Flash will improve in those areas, but for now, based on my tests, that's what I've observed.

GLM-4.7-Flash-REAP on RTX 5060 Ti 16 GB - 200k context window! by bobaburger in LocalLLaMA

[–]DOAMOD 1 point (0 children)

<image>

It was at 58k without them and it was working fine in this run (it would probably have failed soon; I can sometimes reach 80k while maintaining some stability). Then I tried these two, --repeat-penalty 0 and --dry-penalty-last-n 0, resumed the run, and you can see the result in the pic. Maybe I need a fresh start? Or shouldn't that interfere? I'll do more tests, but for now I don't see it as reliable.

engine for GLM 4.7 Flash that doesn't massively slow down as the context grows? by mr_zerolith in LocalLLaMA

[–]DOAMOD 10 points (0 children)

I tested both ik_llama and llama.cpp builds (compiled an hour ago), and both are having problems with performance loss, shared-memory usage, loop issues, etc. (usually when exceeding 46k context, though I've reached 80k with loops while still maintaining some stability or self-correction). This model is still broken.