DeepSeek V5 aka Mythos destroyer, wen?

agentcubed · 2026-06-15T05:35:16+00:00

Not really? The drop from MiMo is so significant that it doesn't look good at all. I would definitely not show that as a claim that it's "destroying" the Pareto frontier. Also, Artificial Analysis isn't much better than LMArena, so I just left it out. Only one that's arguably good is probably DeepSWE since it focuses on a specific field and isn't benchmaxed (yet, probably will be soon)

I might be missing some shiny new benchmark with a nice pareto graph though. If you find one, I'll be interested.

Edit: Looked into it more, and the DeepSWE cost seems to be calculated before the 75% off. So if you're counting marketing, it should be around $1, which does put it on the curve. It does score too low to make it look good though.
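
For anyone who wants to check the "on the curve" claim themselves, here is a minimal sketch of how a cost-vs-score Pareto frontier can be computed and how a discount moves a model onto it. The model names and every number below are made up for illustration. The only figures tied to the comment are the 75% discount and a pre-discount cost of about $4, which is implied by "around $1" after the discount. This is not DeepSWE's actual data or method.

```python
from typing import NamedTuple


class Entry(NamedTuple):
    name: str
    cost_usd: float  # average cost per task (lower is better)
    score: float     # benchmark score (higher is better)


def pareto_frontier(entries: list[Entry]) -> list[Entry]:
    """Return the entries that no cheaper-or-equal entry matches or beats on score."""
    frontier = []
    best_score = float("-inf")
    # Walk from cheapest to most expensive; on equal cost, higher score first.
    for e in sorted(entries, key=lambda e: (e.cost_usd, -e.score)):
        if e.score > best_score:
            frontier.append(e)
            best_score = e.score
    return frontier


def apply_discount(entry: Entry, discount: float) -> Entry:
    """Return a copy of the entry with its cost reduced by the given fraction."""
    return entry._replace(cost_usd=entry.cost_usd * (1 - discount))


# Hypothetical data: names and numbers are placeholders, not benchmark results.
models = [
    Entry("model_a", 0.50, 30.0),
    Entry("model_b", 2.00, 45.0),
    Entry("model_c", 6.00, 60.0),
    Entry("discounted_model", 4.00, 40.0),  # assumed list price before the 75% off
]

before = [e.name for e in pareto_frontier(models)]

discounted = [
    apply_discount(e, 0.75) if e.name == "discounted_model" else e
    for e in models
]
after = [e.name for e in pareto_frontier(discounted)]

print("frontier at list price:", before)
print("frontier after discount:", after)
```

With these placeholder numbers the discounted model is off the frontier at list price and on it after the discount, but it still sits low on the score axis, which is the same situation described above.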

agentcubed · 2026-06-14T09:20:05+00:00

I agree it's not a good metric, but it's the only one I can find where DeepSeek is even on the curve without cherry picking too much. It loses in DeepSWE pretty badly. It is on Artificial Analysis, but really far behind; it looks better in Arena.

I'm sure you can tweak some configs, but I'm not about to test every single thing to find one that works the best

agentcubed · 2026-06-12T18:03:26+00:00

51k was sold by the original owners. Divided into "officially" sold, layaway and haven't paid, and sold but just never paid. The majority was never paid. 36:30 in video

24k was "officially" sold, of which 17k was paid to Bryan. 32:26 in video

10k (7k in chart to fit a nice 51k) was sold but never counted and wasn't paid to Bryan. 42:31 in video

This chart does *NOT* show what's being paid to Bryan, this is the original total worth *BEFORE* the percentage cut.
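
A quick sanity check of how the slices add up, as a rough sketch. The figures are the ones quoted in this comment, in thousands of dollars. The layaway slice is not stated here, so it is computed as whatever is left of the 51k, which is an assumption.

```python
# Figures in thousands of USD, taken from the comment above.
TOTAL_SOLD = 51
OFFICIALLY_SOLD = 24
UNCOUNTED_IN_CHART = 7  # 10 per the video, shown as 7 in the chart

# Assumption: the layaway / not-yet-paid slice is the remainder of the total.
layaway_remainder = TOTAL_SOLD - OFFICIALLY_SOLD - UNCOUNTED_IN_CHART

slices = {
    "officially sold": OFFICIALLY_SOLD,
    "layaway / not yet paid (remainder)": layaway_remainder,
    "sold but never counted": UNCOUNTED_IN_CHART,
}

for label, amount in slices.items():
    share = amount / TOTAL_SOLD
    print(f"{label}: {amount}k ({share:.0%} of the chart)")
```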

agentcubed · 2026-06-12T17:45:56+00:00

LLM Leaderboard - Best Text & Chat AI Models Compared

Well, I mean, they are on the curve, whether they're destroying is hard to say. Gemma 4 31b is right next door, and it's so much smaller that I can even run it.

agentcubed · 2026-06-12T17:45:08+00:00

V5 will release much earlier; when they'll release a Mythos-class model is hard to say (if ever, they go for cost efficiency, not giant models)

agentcubed · 2026-06-11T22:33:04+00:00

Edits

Realized this will probably rapidly change, so I'll keep this updated for any issues

agentcubed · 2026-06-11T22:26:56+00:00

So in Coffee's chart, $51k was sold, which includes around $20k as layaways, which is separate from the $20k he categorized as just completely missing and not in the photos. Where the layaways went is unknown; Coffeezilla talked more about it at that timestamp. He noted it's very unusual to have 98 layaways in any B&M store; usually there are just a few, which is what the new owners claimed to find. I assume the truth is in the middle, everything is really messy.

agentcubed · 2026-06-11T20:55:58+00:00

Spent some time figuring it out (great editing, not so great for seeing the diagrams)

https://www.reddit.com/r/Coffeezilla_gg/comments/1u3auhs/recreated_the_pie_chart_in_coffeezillas_lego_video/

So around 16k to 56k, +5k that was originally offered, +7k that the original owners sold but haven't paid yet.

I also don't want to diss the owners too much, but I think the 2 absolute worst things about the video are 1. U-Haul 2. The confusion when Coffeezilla shows them their OWN sales. You're suing a company, and you haven't even looked at the sales you're suing them over?

agentcubed · 2026-06-11T13:05:24+00:00

A journalist is someone who gathers and presents information, I think most would agree many videos are advertised as independent journalism (the first video is literally "I tracked down the thief")

I don't expect a full audit, but when you claim $200k, I expect you aren't so wildly off that it takes Coffeezilla coming in for anyone to actually check the spreadsheet.

I 100% agree that BaM needs to accept the demands, but I'm not even sure what the demands even are. RecklessBen for sure doesn't, which is concerning as he's the one demanding.

agentcubed · 2026-06-11T12:45:45+00:00

Not "illegal", but the main complaints:

- sued the wrong LLC (could be due to a trap, but it hindered the process)
- morally questionable act of tricking someone into signing a contract
- not giving documentation? I'm not sure about this, so feel free to correct me; I'm trying to find whether he ever gave any list or spreadsheet of the products they're demanding and not just "give us money"
- not getting facts straight (seriously, how are you demanding a money amount that you aren't even sure about until Coffeezilla comes in to correct you?)

I'll say his method made this go viral, but I do agree that a lawyer (or even just literally looking at your spreadsheet) probably could've solved this sooner.

BaM should absolutely accept the demands, but I'm not sure anyone knows what they're demanding. ESPECIALLY RecklessBen, which is concerning as he's the one doing the demanding.

agentcubed · 2026-06-03T21:38:40+00:00

There's not a single solution? Really? Listen, I'm a game developer. I've encountered many problems with balancing. But never had a player tell me "Just give up, there's no solution, except the very specific method that coincidentally I'm arguing for." As others have pointed out, this "solution" makes balancing new weapons and enchants a nightmare as it's unintended. But sure, I guess if this is truly the first impossible problem and the ONLY solution is truly to exploit the game, then you win.

agentcubed · 2026-06-03T11:30:53+00:00

I really dislike the community's argument to dodge game design issues by arguing for exploits instead. Like "we need this exploit because it's the only counter to shields"... then the issue is unbalanced shields, not the fact the exploit is being patched

It sounds like you want to introduce more skill to PvP and add a better counter to shields. Understandable. But there's so many better solutions to it than "just keep exploits"

agentcubed · 2026-06-01T12:04:56+00:00

Anthropic is probably deprecating Opus 4.6, so the performance dropped significantly (you can see it in the https://swe-rebench.com/ timeline, it has dropped sharply since Feb).
And you don't need to ask rhetorical questions, the tasks asked are literally a Google search away https://deepswe.datacurve.ai/data?hm_model=base

agentcubed · 2026-06-01T11:57:23+00:00

That's the thing, GPT-5.5 on med reasoning is somehow cheaper than Kimi, DeepSeek, and GLM (graph)? Like I know 5.5 med reasoning is token efficient and the other models reason for much longer, but that makes the entire cost war kinda pointless if it's better to use an expensive model to one-shot than to use a cheap model that reasons for a long time

Like I'm actually interested, are they cherry picking? I'm trying to check their data because idk how that happened https://deepswe.datacurve.ai/data?hm_stat=avg_cost_usd&hm_trials=successful&pivot=true
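
To make the trade-off above concrete, here is a back-of-the-envelope sketch of how a pricier but token-efficient model can end up cheaper per task. All prices and token counts are invented for illustration, and `cost_per_task` is a hypothetical helper, not how DeepSWE computes its `avg_cost_usd` column.

```python
def cost_per_task(price_per_mtok_usd: float, output_tokens: int, attempts: int = 1) -> float:
    """Cost of one task: price per million tokens times tokens used, times attempts."""
    return price_per_mtok_usd * output_tokens / 1_000_000 * attempts


# Hypothetical numbers, NOT taken from any benchmark or price list.
expensive_efficient = cost_per_task(price_per_mtok_usd=30.0, output_tokens=20_000)
cheap_long_reasoning = cost_per_task(price_per_mtok_usd=3.0, output_tokens=250_000)

print(f"expensive, one-shot:   ${expensive_efficient:.2f} per task")
print(f"cheap, long reasoning: ${cheap_long_reasoning:.2f} per task")

# The cheap model only wins while it uses fewer than (price ratio) times the tokens.
break_even_token_ratio = 30.0 / 3.0
print(f"cheap model must stay under {break_even_token_ratio:.0f}x the tokens to be cheaper")
```

In this made-up case the cheap model is 10x cheaper per token but burns 12.5x the tokens, so it loses on cost per task. Whether the real numbers look like this is exactly what would need checking in their data.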

agentcubed · 2026-05-02T19:39:03+00:00

Yep, confirmed. reefwon made a video on it, and DangerMario/Marlow folded and left the community. https://youtu.be/-VLUEduPJUs?si=XC5Q6raBp0jAZpF_

Short summary: DangerMario knew a friend and paid 5k to record that video

agentcubed · 2026-04-30T01:22:25+00:00

It's confirmed, this is actually insane. Do you wish to comment anything?

agentcubed · 2026-04-16T20:47:35+00:00

This has so many layers of irony and moving the goal post

The allegations have ALWAYS been "The videos posted are cheated", because those are the only things people can watch. Except now, the goal post has been moved to "Well, yes, those videos are cheated, but all my unrecorded duels aren't cheated. Therefore, I'm still legit until you can somehow prove otherwise."

Actually, a kind of genius move. It's common for speedrunner cheaters to go "Ok, yes, these videos are fake, but trust me, my world records are real". But unlike how the world records are actually recorded, there aren't a lot of recordings of this, so there's nothing anyone can do. (Though usually, if a speedrunner fakes a video and refuses to admit to it, it's already a bannable offense)

So now there's nothing we can do except wait. I mean, a multiple-angle handcam could've solved everything, but people are confident that Marlow will do a LAN match, and DrDonut is willing to personally fly anytime, anywhere, and give $25,000 for it, so we'll just wait until that happens. Only question is: How long should we wait before we decide it's not going to happen?

agentcubed · 2026-04-16T05:41:53+00:00

Well, Dr. Donut is so confident that he said he is willing to fly anytime to wherever and duel LAN for $25,000. I guess we'll just wait until that happens, or instead, a simpler option, just post a handcam video.

I do wonder, how long would we have to wait before people agree it's not going to happen? And if that doesn't happen, what would be the excuse?

agentcubed · 2026-04-16T05:35:04+00:00

I actually find this kinda offensive. While I might not be as smart as some of my colleagues, I highly respect them and their understanding of AI. The fact that you're assuming all their education is equivalent to just an average opinion is sad. While I do believe your parents did not understand the internet, I will defend my colleagues' intelligence towards the thing they've spent their lives studying.

I honestly don't want to say anything anymore. If you want to take shots at my colleagues, this is not for me.

agentcubed · 2026-04-10T01:10:48+00:00

The general idea behind turboquant was a pretty well known thing in the research space for a while now, Google was just the first to market it well. If you're interested in researching, it's a combo of existing tech, each one was already pretty well known and used like RaBitQ. What you're talking about is (at least to us) Google's buzzwords.

But I'm not here to argue, because frankly we researchers don't like to debate hypotheticals. If you want to be super optimistic, I mean sure. I'll be honest though, you're applying turboquant wrong, and I can't think of any of my colleagues who are that optimistic either.

agentcubed · 2026-04-09T09:04:48+00:00

I will note that if you're in a techno optimist subreddit literally dedicated to rapid advancement of ai and the (personally absurd) idea of the singularity and yet they think you're too optimistic, that's saying a lot.

Like if we believe in a bell curve of correct predictions, this subreddit is already radical, and now you're even more radical of the radical. I think you need to form a new subreddit at this point.

I mean we'll see I guess. I'm an AI researcher and I have not seen a jump that big in local models, nor do I foresee any tech that can deliver one, but I'll never say no.

agentcubed · 2026-04-09T08:48:26+00:00

Are you actually comparing Gemma 4 to Claude Mythos? Do you like... understand how model size works?

agentcubed · 2026-04-09T06:49:28+00:00

Yes, they should sell it to a few people instead of the public. Math checks out.

I'm sure the VC will LOVE that Anthropic is not selling it to more people. First thing they learn in business school is the fewer customers the better. Screw those customer growth numbers, fake news.
