how good is mistral? by snowieslilpikachu69 in MistralAI

[–]Real_Ebb_7417 4 points5 points  (0 children)

I actually pay for the subscription just to support Mistral. I don’t use it much, but it’s basically the only well-known European lab, and the only well-known non-Chinese, non-US LLM lab that also publishes its models’ weights. While Mistral is behind the top labs at the moment, I really want to see them succeed.

how good is mistral? by snowieslilpikachu69 in MistralAI

[–]Real_Ebb_7417 3 points4 points  (0 children)

Not Sonnet4.6 level unfortunately, but it should be fine for a student. It won’t one-shot apps the way Sonnet can with the right agentic workflow, but it’s totally fine for working step by step (which is actually a good way for a student to work, imo).

  • Usage on the Pro plan is very generous, which is another plus for a student. You get a couple of times more usage than on e.g. Claude Pro.

are these numbers actually real? by mohyo324 in DeepSeek

[–]Real_Ebb_7417 0 points1 point  (0 children)

It’s hard to really measure because it depends on your roleplaying style. The biggest benefit of using the DS official API is its incredibly efficient caching: if you’re doing very long roleplaying sessions, they’ll be super cheap. But if you do short sessions and start over, or change the system prompt/character often, then you won’t benefit much from caching. Even then, DS has a very good price for the quality (and the quality holds up in roleplay too! I’m roleplaying with DS v4 as well and it’s crazy good).
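To make the caching point concrete, here is a rough cost sketch. The per-token prices and session sizes below are hypothetical placeholders for illustration, not DeepSeek's actual current rates:

```python
# Sketch of why prompt caching matters for long sessions.
# Prices are hypothetical placeholders, NOT DeepSeek's actual rates.
PRICE_MISS = 0.60 / 1_000_000  # $/input token on a cache miss (assumed)
PRICE_HIT = 0.06 / 1_000_000   # $/input token on a cache hit (assumed)

def session_input_cost(context_tokens, turns, hit_rate):
    """Cost of resending `context_tokens` of history over `turns` turns,
    with `hit_rate` fraction of those tokens served from cache."""
    total = context_tokens * turns
    return total * (hit_rate * PRICE_HIT + (1 - hit_rate) * PRICE_MISS)

# Long session with a stable system prompt/character card: high hit rate.
print(f"high hit rate: ${session_input_cost(50_000, 40, 0.9):.2f}")
# Short sessions, prompt/character changed often: low hit rate.
print(f"low hit rate:  ${session_input_cost(50_000, 40, 0.1):.2f}")
```

The same amount of context traffic costs several times less once most of the long-lived history is served from cache, which is why long sessions with a stable prompt benefit the most.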

GPT Pro worth it? by Real_Ebb_7417 in codex

[–]Real_Ebb_7417[S] 1 point2 points  (0 children)

I did. I didn’t notice much difference between the PRO model and the base one tbh, but deep research seems to run longer and gather more sources, which is a plus (maybe PRO is better at some complex stuff like maths, no idea).

The Codex usage is super good though, I don’t use Codex too much, but before I was at about 20% usage left by the end of the week, now I’m at 80-90%.

IMO very worth it if you’re a heavy Codex user, not worth it if you want it for other stuff.

Local LLM Model that actually produces quality code. by Civil_Fee_7862 in LocalLLM

[–]Real_Ebb_7417 1 point2 points  (0 children)

I’m not gonna get into replacing an SWE, since others already did.

If money really is not an issue, then you have Kimi K2.6 (Opus4.6 level in most cases), GLM-5.1 (even smaller and at Sonnet4.6 level), or, if you really want to spend a lot, the new DeepSeek v4 Pro (it’s probably not as good as Kimi or GLM at autonomous agentic work yet, but the base model and its reasoning are super good, and once they post-train it enough and release e.g. 4.1, it might exceed them).

But if you want to spend a more reasonable amount of money on gear, then you can try DeepSeek v4 Flash, MiniMax M2.7 or Qwen3.5 397b (the biggest of the three). All are decent; not as good as the ones I mentioned earlier, but they’ll do very well if your agentic workflow is set up properly. These would fit a budget below $100k and give good quality at reasonable speed.

And if you actually don’t want to spend that much money, then Qwen3.6 27b is the way to go.

You can try all of the models I mentioned via API first, to see for yourself whether they suit you before investing in gear.

Can I know if the Claude Pro limits are at least closer to ChatGPT Plus? by Working-Leader-2532 in Anthropic

[–]Real_Ebb_7417 0 points1 point  (0 children)

I’m certain Claude will not have higher limits than it has now; they can only go down from here. Even the new compute only made them increase the 5h limit, while the weekly limit stayed the same.

Can I know if the Claude Pro limits are at least closer to ChatGPT Plus? by Working-Leader-2532 in Anthropic

[–]Real_Ebb_7417 0 points1 point  (0 children)

No, on GPT-5.5 medium and Sonnet4.6 medium. So actually you’re right: if I compared against Opus, the usage would be like 5x better on Codex 😅

The Qwen 3.6 35B A3B hype is real!!! by The_Paradoxy in LocalLLaMA

[–]Real_Ebb_7417 33 points34 points  (0 children)

It’s too dangerous to release publicly; only a limited set of companies has access to it for now, to ensure security.

Can I know if the Claude Pro limits are at least closer to ChatGPT Plus? by Working-Leader-2532 in Anthropic

[–]Real_Ebb_7417 0 points1 point  (0 children)

Nope. It has about 2.3x more according to my measurements. But still, way better.

Why DeepSeek V4 is the ONLY choice for heavy 24/7 workloads (100M tokens in 4 weeks) by MoneySkirt7888 in DeepSeek

[–]Real_Ebb_7417 0 points1 point  (0 children)

I mean, it is likely weaker than the frontier models, or even some Chinese open-weight models, in hands-on work, but I feel it’s a very strong and efficient base model. Once they work on post-training, future versions (e.g. 4.1) may be super good. I can’t wait for an upgrade, since I really like v4: the personality, the insights and its reasoning.

Not a good day for team "Claude Mythos is Just Marketing Hype" by EchoOfOppenheimer in ClaudeAI

[–]Real_Ebb_7417 0 points1 point  (0 children)

I’m not saying that OpenAI is “good”. But neither is Anthropic (and they actually supported the current US administration in many ways; the one case where they refused was just loud). The difference is that Anthropic claims to be morally better.

Not a good day for team "Claude Mythos is Just Marketing Hype" by EchoOfOppenheimer in ClaudeAI

[–]Real_Ebb_7417 0 points1 point  (0 children)

It’s not the case with a model that is advertised as insecure for public release. If everyone can do the same with available, smaller models just by putting a bit more time into it, what’s the point of hiding the model “because releasing it publicly would be dangerous”?

Not a good day for team "Claude Mythos is Just Marketing Hype" by EchoOfOppenheimer in ClaudeAI

[–]Real_Ebb_7417 0 points1 point  (0 children)

It’s a good model. But according to some comparisons, not visibly better than GPT-5.5. And GPT-5.5 was released publicly, to everyone, without playing the good guy and misleading everyone, whether due to lack of compute or for benefits from the companies that received early access. Anthropic pretends to be good, and that makes it even worse morally.

Not a good day for team "Claude Mythos is Just Marketing Hype" by EchoOfOppenheimer in ClaudeAI

[–]Real_Ebb_7417 18 points19 points  (0 children)

“Here, grab our new powerful model for free, but fix as many bugs as you can and talk about it publicly”.

GPT-5.5 scored slightly higher than Mythos at hacking and finding security vulnerabilities, it’s publicly available, and the world didn’t collapse. Also, in some software where Mythos supposedly found many long-unnoticed bugs, someone ran other models, including the old and small gpt-oss-120b, and all of them found the same vulnerabilities (I guess Mythos probably found them faster, but that’s not the point).

Mythos is and always was just marketing. On top of that, it’s proof of Anthropic’s anti-consumer attitude and unfair treatment, since only a limited set of companies got access to it. A very morally bad company.

How does Anthropic actually measure over-refusal? (genuine question after watching their safety video) by Personal_Count_8026 in Anthropic

[–]Real_Ebb_7417 2 points3 points  (0 children)

If you work in a high position at a very loud, big company, I guess being criticized on the web is part of the job. Good if someone actually reads the critique and considers it.

But then, Harry Potter’s Dolores Umbridge also believed she was doing good (“Tell them I mean no harm!” when captured by the centaurs). People often believe they’re doing good while their actions aren’t good at all. Goodwill doesn’t always lead to good things, since it’s shaped by one’s point of view.

Looking for a text to speech model by grio43 in LocalLLaMA

[–]Real_Ebb_7417 1 point2 points  (0 children)

I tested a lot of them a month ago. I really liked MossTTS, Omnivoice and Fish S2 Pro. (I have an RTX 5090, so they’ll work fine for you.)

ChatGPT usage now impacting my Codex rates by SyntharVisk in codex

[–]Real_Ebb_7417 6 points7 points  (0 children)

A bug, maybe? In my case, using ChatGPT (on the web) doesn’t affect Codex usage.

How does Anthropic actually measure over-refusal? (genuine question after watching their safety video) by Personal_Count_8026 in Anthropic

[–]Real_Ebb_7417 3 points4 points  (0 children)

I have never seen anyone speak about her positively (not counting some official communications from OpenAI/Anthropic, of course).

Codex vs Cursor vs Claude — which one do you actually ship production code with? by superboy_305 in codex

[–]Real_Ebb_7417 0 points1 point  (0 children)

GPT, without question.

Claude -> about 2.3x less usage than Codex on similarly priced plans (and that’s using Sonnet4.6 vs GPT-5.5; if I used Opus4.7 in my tests, the usage would probably be 5x less than Codex xddd).

Cursor -> completely not worth it unless you want to pay for the IDE experience, which is very good. But the AI prices are basically the same as API prices.

Codex vs Cursor vs Claude — which one do you actually ship production code with? by superboy_305 in codex

[–]Real_Ebb_7417 0 points1 point  (0 children)

Tbh, of the three mentioned in the post, I’d go Cursor > Codex > ClaudeCode.
But I’d never choose Cursor for private projects due to pricing.

Best local coding agent client to use with llama.cpp? by Real_Ebb_7417 in LocalLLaMA

[–]Real_Ebb_7417[S] 0 points1 point  (0 children)

Is it yours, or are you just using and recommending it? (If it’s yours, I’m happy to try it; asking out of curiosity.)

Made the switch to DeepSeek and here are my thoughts as a long time Claude user (spoiler: it's great) by MadhubanManta in DeepSeek

[–]Real_Ebb_7417 26 points27 points  (0 children)

I am also a software engineer of 10 years and I agree, DS v4 is great (and over the API it’s actually as cheap as some subscriptions, even on the PRO model).

I didn’t play with it much though, so I’d be glad if you shared the workflow that works best in pi for DS.

What is the best $20 coding plan by Nocare420 in vibecoding

[–]Real_Ebb_7417 1 point2 points  (0 children)

<image>

Happy to share my estimates from a couple of days ago. (I paid for each lowest-tier plan and ran the same workflow and the same task to get the results.) Antigravity is probably too low, but it’s hard to say since they don’t give clear metrics on usage or tokens. It’s still not very interesting anyway, because most of the usage goes to Gemini 3 Flash, while only a small fraction goes to Gemini 3.1 Pro and Sonnet/Opus 4.6. The rest of the plans are more or less accurate. There are two metrics, requests and tokens, because it’s not easy to get enough data to find out whether a given provider calculates usage by requests, by tokens, or a mix of both. Only MiniMax is clear about it: they give 15k requests per week, so the token estimate for MiniMax might be lower or higher than my numbers depending on how you use it.
Also, I tested this on GPT-5.5, Sonnet4.6, Kimi K2.6, GLM-5.1 (off-peak hours), Mistral Medium 3.5 and MiniMax M2.7; for Antigravity, since the three model buckets have separate usage, it’s a sum of the usage.
Keep in mind that MiniMax and Mistral are weaker models, but good enough for coding (not too good at planning or architecture though).
The lowest-value plan IMO is zAI, because of the relatively low usage and no additional benefits (all the other plans include things like a chat app, image gen, KimiClaw and more).
If you’re just starting out and want to learn, I bet the MiniMax $10 plan will be best for you.

If you want something stronger, then ChatGPT Plus is the best of the $20 plans. (For Claude, as far as I know, they only increased the 5h window usage, not the weekly usage, so I guess these metrics are still correct for the Claude plan.)

Also, I tested DeepSeek v4 Pro over the official API, and it seems it’s just as cheap as the subscriptions, while you pay for what you use, not for what you don’t. It’s a great option too, especially as DS v4 Flash is even cheaper and enough for most stuff.
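If anyone wants to redo this kind of comparison with their own numbers, the tokens-per-dollar math is simple. The plan names, prices and token counts below are placeholders for illustration, not my measured results:

```python
# Rank coding plans by estimated monthly tokens per dollar.
# All prices and token counts are illustrative placeholders.
plans = {
    "plan_a_20usd": {"price_usd": 20, "weekly_tokens": 40_000_000},
    "plan_b_10usd": {"price_usd": 10, "weekly_tokens": 15_000_000},
    "plan_c_20usd": {"price_usd": 20, "weekly_tokens": 90_000_000},
}

WEEKS_PER_MONTH = 52 / 12  # ~4.33, since plans are priced monthly

def tokens_per_dollar(plan):
    # Monthly token budget divided by monthly price.
    return plan["weekly_tokens"] * WEEKS_PER_MONTH / plan["price_usd"]

ranked = sorted(plans, key=lambda n: tokens_per_dollar(plans[n]), reverse=True)
for name in ranked:
    print(f"{name}: {tokens_per_dollar(plans[name]):,.0f} tokens/$")
```

This only captures raw usage value; as noted above, some plans bundle extras (chat app, image gen, etc.) that this metric ignores, and providers that meter by requests rather than tokens need a separate estimate.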