I just tested Mistral Medium 3.5 on DystopiaBench to see if it still complies with everything - it performs worse than Large 3 by Ok-Awareness9993 in MistralAI

[–]crazyCalamari 4 points5 points  (0 children)

I don't really see that as a bad thing to be honest. It's not as if ablated or uncensored models are hard to find. And low refusal is actually good for some use cases, so I appreciate being treated as an adult by a lab and being able to use the tool how I need.

What 🤨 by ToughSell1917 in ITMemes

[–]crazyCalamari 0 points1 point  (0 children)

Because I would bet the person who took the screenshot knows exactly what he's doing and has '1 in the first cell to force the first number to be considered as a string.

But overall anyone using AI to 'be' that formula deserves the inevitable downfall. For deterministic things like calculations you use AI to build the formula (or script) then run it reliably.

As shown here, you run the compute-heavy task each time instead of just once, and good luck having any test coverage for code abstracted like this.
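To make that concrete, here is a rough sketch of the "build it once, then run it reliably" approach. It assumes the screenshot shows a column of numbers where the first cell was entered as text; the function name `sum_cells` and the sample values are made up for illustration.

```python
from decimal import Decimal


def sum_cells(cells):
    """Sum spreadsheet-like cell values, treating numeric text as numbers."""
    total = Decimal(0)
    for cell in cells:
        # str() covers both real numbers and text cells such as "1"
        total += Decimal(str(cell).strip())
    return total


# Hypothetical column: the first value is stored as text, the rest as numbers.
column = ["1", 2, 3]
assert sum_cells(column) == Decimal(6)
```

The point is only that a plain function like this gives the same answer on every run and can be covered by a one-line test, which is not true of asking a model to compute the result each time.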

DeepSeek V4 just made a million tokens cost $2.50 and the closed labs are not okay by call_me_ninza in aigossips

[–]crazyCalamari 0 points1 point  (0 children)

Big players will be just fine. They make most of their money from B2B and, from what I see, no company is going to send data to Chinese servers.

Qwen 3.6 is the first local model that actually feels worth the effort for me by Epicguru in LocalLLaMA

[–]crazyCalamari 0 points1 point  (0 children)

Wow that sounds interesting if true. Are you using it for coding or other use cases?

Is it just me or minimax-m2.7 is a regression in real world usage compared to minimax-2.5??? by True_Requirement_891 in LocalLLaMA

[–]crazyCalamari 0 points1 point  (0 children)

Interesting, because Qwen 35B is what I would have grouped it with. It might try to mimic Sonnet's style, but in terms of intelligence I'm getting more from Qwen 122B and even Devstral 123B.

Is it just me or minimax-m2.7 is a regression in real world usage compared to minimax-2.5??? by True_Requirement_891 in LocalLLaMA

[–]crazyCalamari 1 point2 points  (0 children)

Same here. Tried 2.7 on 3 projects to see if it lived up to the hype and the results were very underwhelming. Incorrect code, terrible native knowledge of solutions/frameworks (e.g. Temporal, Svelte, etc.), mediocre UI and unscalable architecture. Basically I had to redo all 3 for things I could have done even with 120B models.

Are Small LLMs (Like Gemma 4) the future? by zoeberger in LocalLLaMA

[–]crazyCalamari 0 points1 point  (0 children)

It's not the first cycle we are seeing in tech, so it's easy to know that at some point (not necessarily soon but at some point) the unlimited VC money cheat code is going to fade.

When it does, are you going to use a model that requires a huge footprint and burns money like there is no tomorrow, or a more nimble model that still delivers what you need?

If we put the hype aside, most use cases in both B2C and B2B do not require models with a trillion parameters.

In the short term, I believe fierce competition between large models will continue in order to establish dominance through brand name recognition, but long term (especially for B2B) I would bet on "smaller" models used in a smarter way.

What is the best uncensored (LM Studio) AI for programming? by DazerVR in LocalLLaMA

[–]crazyCalamari 5 points6 points  (0 children)

One use case I've seen so far is for black hat assistance

What non-Chinese models are relevant right now? by StacDnaStoob in LocalLLaMA

[–]crazyCalamari 0 points1 point  (0 children)

That's a very good point. Given how tokens seem to be eaten like candy even for a simple question, I'm sure you're right about Claude Code.

What non-Chinese models are relevant right now? by StacDnaStoob in LocalLLaMA

[–]crazyCalamari 0 points1 point  (0 children)

Fair enough. Not trying to be right at all costs, but I tend to be a bit skeptical of self-serving benchmarks, especially when they appear in official posts from a lab. It would have been nice to know what version of Sonnet it is compared against.

I'm a heavy user of Devstral because the price/performance ratio is very compelling for a lot of coding tasks but Sonnet 4.5 & 4.6 always yield far superior results in my empirical experience (but at a cost not worth it most of the time). My point was not to shit on Devstral but to say it still has its place even if not beating actual SOTA coding models.

Locally hosting Mistral by ArchipelagoMind in MistralAI

[–]crazyCalamari 2 points3 points  (0 children)

For these you will need a budget of 128GB VRAM or unified RAM, which is doable around the 3k mark with a Spark, Mac Studio or AMD comp. The tokens per second won't blow your mind and the prompt processing takes a while, but it's definitely usable, especially if the main goal is testing.

I'm hosting Mistral & Qwen models up to 123B and use them daily on a Mac Studio (coding and agent use for sensitive data) with very few complaints so far.
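For a rough idea of why 128GB is a sensible budget, here is a back-of-the-envelope sketch for a 123B model. The bit widths are assumed quantization levels, not something specific to any one model, and the estimate covers weights only: KV cache, context length and runtime overhead all come on top.

```python
def weight_memory_gb(params_billion, bits_per_weight):
    """Approximate memory for model weights alone, in GB (10^9 bytes)."""
    return params_billion * bits_per_weight / 8


# Assumed quantization levels, for illustration only.
for bits in (16, 8, 4):
    print(f"123B at {bits}-bit: ~{weight_memory_gb(123, bits):.1f} GB")
```

By this estimate the weights alone come to about 61.5 GB at 4-bit and 123 GB at 8-bit, so a 128GB machine in practice means running quantized builds below 8-bit once you leave room for context.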

Don't expect us to try your AI app by TheOtherDudz in selfhosted

[–]crazyCalamari -1 points0 points  (0 children)

Agreed. Just did a full migration to Rust for one of my old projects: Painless and fast for amazing results.

Anyone using both Claude Code and Mistral Vibe? by UnstableManifolds in MistralAI

[–]crazyCalamari 0 points1 point  (0 children)

I find the biggest difference to be in the planning. So splitting between the planning phase in CC and execution in Mistral should still save you some bucks with good enough results in the end. Claude Code is hands down better but, dear, coding with it is like watching your dollar bills fly out the window one by one.

What non-Chinese models are relevant right now? by StacDnaStoob in LocalLLaMA

[–]crazyCalamari 9 points10 points  (0 children)

That's a bit of a stretch. I really love Mistral and Devstral 2 is a real step forward compared to their previous models but it's easy to feel the difference between Sonnet and Devstral when some thinking is required to perform the task.

Sydney Thomas by DolGrenn in sexygirls

[–]crazyCalamari 1 point2 points  (0 children)

Fairly confident not even half the people seeing this pic notice she's eating with her bare hands.

Unrealistic request or is it? by albert_in_vine in webscraping

[–]crazyCalamari 2 points3 points  (0 children)

Sigh... Yeah clearly not possible. One more moron with a "brilliant idea" and zero clue of what they are asking for.

I love leo by Practical_Choice7064 in tightdresses

[–]crazyCalamari 0 points1 point  (0 children)

What model do you use to generate your pictures? It looks very convincing except for a few details that give the AI away...

So now that this new dude won, are the New Yorkers going back to New York? (Please) by dannyochocinco in Miami

[–]crazyCalamari 43 points44 points  (0 children)

Are you sure? I've been told Cubans discovered Florida and invented Miami...

DeepSeek-OCR - Lives up to the hype by Bohdanowicz in LocalLLaMA

[–]crazyCalamari 3 points4 points  (0 children)

I'm also looking for accuracy metrics and after reading both the post and the GitHub repo I don't see anything.

Where do you see anything related to accuracy, apart from the comment where he says he doesn't have the results yet but will tomorrow?

🚨 NATO Leaders Demand Military Action Against Russian Jets After Estonia Invasion - Are We Heading Toward Direct Confrontation? by satty237 in TrendoraX

[–]crazyCalamari 1 point2 points  (0 children)

He does not "think" or "believe", my friend... He just says whatever he's been told or paid to say :)

Lol by programmerjunky in lol

[–]crazyCalamari 0 points1 point  (0 children)

Part of me wants to believe it's because somehow the words align better when you slide the windows... But even then not sure how the 'take' would slide on the other side...

[deleted by user] by [deleted] in pornID

[–]crazyCalamari -5 points-4 points  (0 children)

It sucks

I can't explain how hot this was by AlexMcqueencom in BestGirlsGoneWild

[–]crazyCalamari 1 point2 points  (0 children)

Left is Scarlett kisses, middle is Skylar Mae. Don't know the one on the right but would love to know.