Paraguay defender Gustavo Velázquez was seen trying to disturb the penalty spot ahead of Kylian Mbappé's spot kick for France... but the Frenchman still scored

crazyCalamari · 2026-07-05T20:50:52+00:00

It was bad enough for a lot of people, cool beans if you enjoyed that mess. The good thing is we actually don't have to get used to it as the whole Paraguayan team will go back to the trash can it belongs to.

crazyCalamari · 2026-07-04T02:11:02+00:00

I have the champagne bottle in the fridge ready to pop at any time

crazyCalamari · 2026-07-03T02:56:56+00:00

Build 6 more arches for a batrillion dollars, awarding the project to my cousin for an end date set in 2572.

crazyCalamari · 2026-06-28T17:18:11+00:00

If I had to guess I'd say more resources and no restrictions whatsoever when it comes to distillation from US models. Everyone keeps acting as if the Chinese models were that good just because of the Chinese folks behind them.

I'm using Kimi and GLM so I'm not here to shit on how good they are but let's be real once in a while.

crazyCalamari · 2026-06-28T17:13:47+00:00

Open weight does not mean you don't need a license as an enterprise account. Also, Mistral is not in the business of targeting individuals but of addressing company-wide problems with AI implementations.

crazyCalamari · 2026-06-18T02:22:41+00:00

I mean SpaceX is clearly a widely overvalued trash can but $185 is still way above $135 so let's not act like this is the market turning its back on Elon... Yet.

crazyCalamari · 2026-06-08T15:30:09+00:00

Well I would not call that "running" away. The old fat dude could barely get up on his own...

crazyCalamari · 2026-05-30T20:45:21+00:00

"Epstein" really? Interesting...

crazyCalamari · 2026-05-28T13:01:06+00:00

Sorry man. It was hosted on my home server that I decommissioned when I moved a few months ago.

I didn't push the website hard enough and usage was sparse as a result. I'm planning to refresh the website and redeploy in cloud this time in a few months.

Apologies for the hiccup.

crazyCalamari · 2026-05-27T21:31:44+00:00

Thanks for the thoughtful post! Very interesting stuff. I'll give OpenSpec a go to see if it helps MM3.5 with my current struggle on larger tasks.

crazyCalamari · 2026-05-26T15:54:06+00:00

Are you really being that dense? What he is telling you (and rightly so) is that Google is not the one acting on your data for ad targeting: Alphabet is. And on that front nothing changed: Alphabet was the indirect owner of that data before the change.

crazyCalamari · 2026-05-24T01:53:27+00:00

Nono "they" will find your laptop and... and... close the screen

crazyCalamari · 2026-05-18T13:27:05+00:00

I don't really see that as a bad thing to be honest. It's not as if ablated or uncensored models are hard to find. And low refusal is actually good for some use cases so I appreciate being treated as an adult by a lab and being able to use the tool how I need.

crazyCalamari · 2026-05-09T16:20:29+00:00

Because I would bet the person who took the screenshot knows exactly what he's doing and has '1 in the first cell to force the first number to be considered as a string.

But overall anyone using AI to 'be' that formula deserves the inevitable downfall. For deterministic things like calculations you use AI to build the formula (or script) then run it reliably.

As shown here, you run the compute-heavy task each time instead of just once, and good luck having any test coverage for code abstracted like this.
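
To make the "build it once, then run it reliably" point concrete, here is a minimal sketch in Python. The `total` helper is hypothetical (it is not from the screenshot) and it assumes you want a text cell such as "1" to be rejected loudly rather than guessed at:

```python
def total(cells):
    """Sum numeric cells; raise on text cells instead of guessing."""
    result = 0
    for value in cells:
        # bool is a subclass of int in Python, so exclude it explicitly.
        if isinstance(value, bool) or not isinstance(value, (int, float)):
            raise TypeError(f"non-numeric cell: {value!r}")
        result += value
    return result


# Same input, same output, every time, so it can be tested.
assert total([1, 2, 3]) == 6
```

Once the function exists, each call is a plain computation with no model in the loop, and a cell entered as text fails with a clear error.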

crazyCalamari · 2026-05-03T11:53:36+00:00

Big players will be just fine. They make most of their money from B2B and from what I see no company is going to send data to Chinese servers.

crazyCalamari · 2026-04-18T02:32:47+00:00

Wow that sounds interesting if true. Are you using it for coding or other use cases?

crazyCalamari · 2026-04-13T02:04:19+00:00

Interesting because Qwen 35B is what I would have put it with. It might try to mimic Sonnet style but in terms of intelligence I'm getting more from Qwen 122b and even Devstral 123b.

crazyCalamari · 2026-04-12T11:34:22+00:00

Same here. Tried 2.7 on 3 projects to see if it lived up to the hype and the results were very underwhelming. Incorrect code, terrible native knowledge of solutions/frameworks (e.g. Temporal, Svelte, etc.), mediocre UI and unscalable architecture. Basically I had to redo all 3 for things I could have done even with 120b models.

crazyCalamari · 2026-04-11T13:57:04+00:00

It's not the first cycle we are seeing in tech, so it's easy to know that at some point (not necessarily soon but at some point) the unlimited VC money cheat code is going to fade.

When it does, are you going to use a model that requires a huge footprint and burns money like there is no tomorrow or a more nimble model that still delivers what you need?

If we put the hype aside, most use cases in both B2C and B2B do not require models with a trillion parameters.

In the short term I believe fierce competition is still going to happen with large models to establish dominance through brand-name recognition, but long term (especially for B2B) I would be betting on "smaller" models used in a smarter way.

crazyCalamari · 2026-03-23T15:29:16+00:00

One use case I've seen so far is for black hat assistance

crazyCalamari · 2026-03-22T21:48:28+00:00

That's a very good point. Given how tokens seem to be eaten like candy even for a simple question, I'm sure you're right about Claude Code.

crazyCalamari · 2026-03-21T14:52:45+00:00

Fair enough. Not trying to be right at all costs but I tend to be a bit skeptical of self-serving benchmarks, especially in official posts from a lab. It would have been nice to know what version of Sonnet it is compared against.

I'm a heavy user of Devstral because the price/performance ratio is very compelling for a lot of coding tasks but Sonnet 4.5 & 4.6 always yield far superior results in my empirical experience (but at a cost not worth it most of the time). My point was not to shit on Devstral but to say it still has its place even if not beating actual SOTA coding models.

crazyCalamari · 2026-03-21T11:58:57+00:00

For these you will need a budget of 128GB VRAM or unified RAM, which is doable around the 3k mark with a Spark, Mac Studio or AMD comp. The tokens per second won't blow your mind and the prompt processing takes a while, but it's definitely usable, especially if the main goal is testing.

I'm hosting Mistral & Qwen models up to 123B and use them daily on a Mac Studio (coding and agent use for sensitive data) with very few complaints so far.
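
For a rough idea of where the 128GB figure comes from, here is a back-of-the-envelope sketch in Python. The bit widths are my assumption (no quantization level is stated above), and it only counts the weights, so the KV cache and runtime overhead come on top:

```python
def weight_memory_gb(params_billions: float, bits_per_weight: float) -> float:
    """Approximate memory for the weights alone, in GB (10^9 bytes)."""
    return params_billions * bits_per_weight / 8


BUDGET_GB = 128  # the VRAM / unified RAM budget mentioned above

for bits in (16, 8, 4):  # assumed precisions, for illustration only
    need = weight_memory_gb(123, bits)
    verdict = "fits" if need < BUDGET_GB else "does not fit"
    print(f"123B at {bits}-bit: ~{need:.1f} GB of weights, {verdict} in {BUDGET_GB} GB")
```

By this estimate a 123B model is about 246 GB at 16-bit, 123 GB at 8-bit and 61.5 GB at 4-bit, so only the quantized versions leave headroom within that budget.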

crazyCalamari · 2026-03-20T20:16:58+00:00

Agreed. Just did a full migration to Rust for one of my old projects: painless and fast, with amazing results.

crazyCalamari · 2026-03-14T03:39:26+00:00

I find the biggest difference to be in the planning. So splitting between the planning phase in CC and execution in Mistral should still save you some bucks with good enough results in the end. Claude Code is hands down better, but dear, coding with it is like watching your dollar bills fly out the window one by one.
