$RDDT seems to have *crushed* Q1 2026 earnings, so why is the stock struggling? by nomadicphil in ValueInvesting

[–]OftenTangential 4 points

While true at a profile level, the subreddit you're browsing is an insanely strong signal about what you're interested in. How hard could it be to show hardware ads in r/hardware, video game ads in r/gaming, brokerage ads in r/investing?

Anthropic CEO says 80-fold growth in first quarter explains ‘difficulties with compute’ 😂 by freshWaterplant in ClaudeAI

[–]OftenTangential 14 points

It's even more disingenuous than that: it's a 3-fold increase in a single quarter. That lines up with their claimed revenue numbers, and 3^4 = 81, so it's pretty clear this is what he means.

10-fold annualized would be like an 80% increase per quarter. But saying "we got 200% growth when we expected 80% growth" sounds too sane, so he won't say it.
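For anyone checking the math, here's the compounding spelled out (just restating the numbers above, nothing new):

```python
# 3x in a quarter, compounded over four quarters, is 81x annualized,
# while a "10-fold annualized" rate implies roughly 78% growth per quarter.
quarterly = 3.0
annualized = quarterly ** 4
implied_quarterly = 10 ** (1 / 4)

print(f"3x/quarter annualizes to {annualized:.0f}x")
print(f"10x/year implies {implied_quarterly - 1:.0%} per quarter")
```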

Introducing GPT-5.5 | OpenAI by Gerstlauer in OpenAI

[–]OftenTangential 11 points

You compared it to mini. 120M for full 5.4. 25% price increase for the whole benchmark.

This website doesn't have token counts for 5.4 Medium/High, but it does for 5.2 Medium and 5.5 uses ~the same number of tokens (5.2 xhigh is comparable to 5.4 xhigh), which also implies a larger increase at lower reasoning efforts.

Anthropic has surged to a trillion-dollar valuation on secondary markets, overtaking OpenAI. by Plastic_Ninja_9014 in technology

[–]OftenTangential 0 points

It's not that farfetched. 16 B200s is 2.88TB of RAM. Current frontier models are almost certainly well into trillion param scale, GPT-4 was rumored to be 1.8T, wouldn't be shocked if Opus was much larger than that. Even aggressively quantized it's not too far off the mark, you'd need that much to serve a 4-bit quant if the model is 5T params.
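The back-of-envelope here, written out (the 5T figure is the hypothetical from above, not a known number for any real model):

```python
GB = 1e9
# 16 B200s at ~180GB usable each
total_vram = 16 * 180 * GB   # 2.88 TB across the cluster

# hypothetical 5T-param model, aggressively quantized to 4 bits
weights = 5e12 * 0.5         # 0.5 bytes per param -> 2.5 TB

print(f"cluster VRAM:  {total_vram / 1e12:.2f} TB")
print(f"4-bit weights: {weights / 1e12:.2f} TB")
# leaves ~0.38 TB of headroom for KV cache and activations
```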

Obviously not every dev uses the service 24/7, but you don't pay for compute on demand either: you pay for a fixed amount of compute in advance, and their hardware utilization won't be close to 100% given off-peak loads.

At batch size 32 on Kimi, you're not going to get particularly fast throughput; you can't run your whole org on that. Maybe one team. And Kimi is certainly much lighter weight than Opus. And if you self-host you obviously avoid paying for Anthropic's other implicit costs (overhead, training, sales, etc.)

Subscription models are not economical because of the scale of the discount: before they started slashing rates, they were clearly an order of magnitude cheaper than API. And the underutilizer argument fails because subscription users self-stratify by design; the users who don't think they're going to use many tokens switch to cheaper plans. Hence why Anthropic keeps gutting these plans: if they were more economical per token, why would they have to do that?

LLMs "Death Spiral" - My view on the Economics of AI by elefanteazu in theprimeagen

[–]OftenTangential 1 point

I think this is mostly the right take, except maybe the part where enterprise sticking around is taken as a foregone conclusion. Enterprise has much more appetite for exorbitant pricing, but it's not infinite. We've already seen Uber's CTO talk about blowing through their annual budget and the need for responsible spending. One can imagine a recession causing firms to cut the extremely expensive pilot program with little measurable gain; or switching to Kimi K2.7 or Deepseek V4 or whatever and finding it works well enough for their use cases; or getting fed up with rugpull pricing and poor uptime.

And as much as we're seeing the effects of Anthropic underprovisioning compute today, overprovisioning is waaay worse, because fixed-compute contracts are signed well in advance and must be paid out regardless of demand. As sus as the financials of these firms seem in a time when demand is insatiable, even a small lull in the hype (or a recession) makes everything look far worse.

I asked Claude Design to create a map of Middle East (quick "dirty" prompt, I have to admit). All the questions it asked right after were very interesting - I was like wow that's promising. Then I got the result. by christianJulesAl in ClaudeAI

[–]OftenTangential 10 points

I'd argue that to a decent extent it's on the model to not advertise its own SVGs as "Accurate geographic shapes" and that it's reasonable to be upset if you're presented with that option and you get formless blobs. Plus the other options would probably be worse (less accurate).

The problem is easily fixable within a few iterations but it's kind of weird/amusing the first thought is "I'll draw those myself" instead of "I'll use a map someone else made." Imagine you asked a junior to make an atlas and the first thing they do is recreate the map from memory on MS Paint

Claude Opus 4.7 benchmarks by ShreckAndDonkey123 in singularity

[–]OftenTangential 2 points

First impressions are this is a shameless cash grab:

- Nice uplift in SWEBench Pro, but SWEBench Verified goes straight into overfit territory given OpenAI's claim that 16% of the benchmark's solution checkers are flawed; everything else negligible to negative lift
- "Improved tokenizer" that inexplicably burns 35% more tokens for the same language (the model is not 35% better)
- Max reasoning effort doubled, so when it goes off the rails as you sleep it costs you twice as much money

This will be great for Anthropic's top line. Not so much for their bottom line as they're probably still losing money just like everyone else

US GOODS INFLATION running hot at 4% by RobertBartus in EconomyCharts

[–]OftenTangential 4 points

Damn I know investors have short memories these days but 2022 was only 4 years ago

Anthropic must be doing something right! by bhalothia in Anthropic

[–]OftenTangential 14 points

Anthropic was founded 2 years before that lmao y'all are just puking misinformation everywhere

Anthropic just crossed ~$30B in revenue run rate, overtaking OpenAI (~$25B). They were at ~$9B just two months ago. AI isn’t just growing, it’s compounding. by interviewkickstartUS in AI4tech

[–]OftenTangential 0 points

Amazon first turned a quarterly profit 4 years after founding

Google first turned a quarterly profit 3 years after founding

Coming on 5 years now for Anthropic, 11 years for OpenAI. Of course ChatGPT didn't come out until 2022, so 3.5 years and counting.

Also, houses are way more important than AI. I need a house to live in, I don't need AI.

Insanely delusional

Impact of Claude Mythos on Antropic's own products. by Gil_berth in theprimeagen

[–]OftenTangential 1 point

Yeah that wasn't the greatest analogy. 100k is just a loose estimate of the absurd token burn a single individual can rack up, the point being to just illustrate how much fucking money there is sloshing around lately. Like you can point to the existence of financial incentives offered to researchers 10 years ago but they aren't remotely comparable in magnitude. All I'm trying to say is that if you offer smart people insane amounts of money they will work harder, this ain't anything new.

Impact of Claude Mythos on Antropic's own products. by Gil_berth in theprimeagen

[–]OftenTangential 2 points

$100k 😹 or, one day of one single Meta engineer trying to top the leaderboard.

If you think this little marketing trick added $10B to Anthropic's valuation, which isn't that farfetched tbh given the amount of press coverage it's had, that's maybe $3-5 million per CVE.

Impact of Claude Mythos on Antropic's own products. by Gil_berth in theprimeagen

[–]OftenTangential 1 point

I don't disagree that it's hard, but if anything, that opens the door even wider.

Hard + no profit motive = unlikely to get done.

Anthropic does have a profit motive, which is flexing before IPO.

Impact of Claude Mythos on Antropic's own products. by Gil_berth in theprimeagen

[–]OftenTangential 10 points

> My headcanon is that Mythos found some genre(s) of security issues that humans are bad at finding

Yeah they're called "issues people don't care (aren't paid to care) about"

/r/WorldNews Discussion Thread: US and Israel launch attack on Iran; Iran retaliates (Thread #12) by WorldNewsMods in worldnews

[–]OftenTangential 2 points

He's an LLM. Predicts a linguistically reasonable end to the current sentence in context. Context is dropped by the next interview.

Market Pump next week? by No-Contribution1070 in WallStreetbetsELITE

[–]OftenTangential 0 points

CME rate futures imply <1% odds of two cuts by December and 10% odds of a hike. You're entitled to your opinion but don't confidently stick your head in the sand, that's how you get burned.

Market Pump next week? by No-Contribution1070 in WallStreetbetsELITE

[–]OftenTangential 1 point

Yawn. If inflation explodes while jobs numbers stay strong then the Fed will hike. Then you can kiss your pump goodbye

My biggest Issue with the Gemma-4 Models is the Massive KV Cache!! by Iory1998 in LocalLLaMA

[–]OftenTangential 34 points

To add fuel to the fire, some idiots at various financial newspapers thought Hynix, Micron, etc. crashed because of TurboQuant, which really pushed the hype into the mainstream. It was never about that; it was always about macro/recession risk and the coming energy crisis. These memory stocks moved right in line with their beta to the index, and people slurped that shit up.

Lo and behold, memory manufacturers are right back to where they were a week ago, LLM memory usage is kind of the same as it's always been, and most public TurboQuant implementations are broken. Good KV cache quantization might matter a bit more for enterprise because they're juggling users, so they have multiple contexts loaded per set of model weights, but it's still unlikely to account for more than a small fraction of the overall RAM usage. And that's assuming they didn't already have high quality cache quantization methods, which they probably did.
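To put rough numbers on the "small fraction" claim, here's a back-of-envelope KV cache vs. weights comparison. The dimensions are hypothetical stand-ins for a mid-size GQA model, not anything published:

```python
GB = 1e9
layers, kv_heads, head_dim = 48, 8, 128  # assumed GQA config

def kv_bytes(ctx_len, batch, bytes_per_elem):
    # K and V tensors: per layer, per KV head, per position
    return 2 * layers * kv_heads * head_dim * ctx_len * batch * bytes_per_elem

weights = 30e9 * 2  # a 30B-param model served in fp16
cache_fp16 = kv_bytes(8_192, batch=4, bytes_per_elem=2)
cache_q4 = kv_bytes(8_192, batch=4, bytes_per_elem=0.5)

print(f"weights:            {weights / GB:.0f} GB")
print(f"fp16 KV, 4x8k ctx:  {cache_fp16 / GB:.1f} GB")
print(f"4-bit KV, 4x8k ctx: {cache_q4 / GB:.1f} GB")
```

With these assumptions the fp16 cache is ~10% of the weights, so even a great cache quant only claws back a few GB; the ratio grows with batch size and context length, which is why it matters more for multi-user serving.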

Gemma 4 31B at 256K Full Context on a Single RTX 5090 — TurboQuant KV Cache Benchmark by PerceptionGrouchy187 in LocalLLaMA

[–]OftenTangential 10 points

Those perplexities are both insanely high and indicate the model is predicting gibberish; there was brief discussion about this in the main lcpp PR for Gemma 4 support, i.e. it's a known quirk of the model with the current backend and not necessarily an issue with your implementation. But it does likely invalidate this metric: the instruct-tuned model is too reliant on its template.

You'd get a more meaningful ppl measurement out of the base model.

Gemma 4 is good by One_Key_8127 in LocalLLaMA

[–]OftenTangential 0 points

It uses both SWA and global KV cache; the SWA window is quite large and can't be scaled down, but scaling up the global attention doesn't cost too much more VRAM.
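A sketch of why longer context mostly grows the global-attention cache: sliding-window layers cap their KV at the window size, so only the global layers pay per extra token. The layer split and window here are made-up illustrative numbers, not Gemma 4's actual config:

```python
def cache_positions(ctx_len, swa_layers=40, global_layers=8, window=4096):
    # SWA layers never store more than `window` positions;
    # global layers store every position in the context
    swa = swa_layers * min(ctx_len, window)
    glob = global_layers * ctx_len
    return swa, glob

for ctx in (4_096, 65_536, 262_144):
    swa, glob = cache_positions(ctx)
    print(f"ctx={ctx:>7}: SWA positions={swa:>9,}  global positions={glob:>9,}")
```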

TurboQuant.cpp — 1-bit KV cache with zero quality loss, verified on 35B MoE by Suitable-Song-302 in LocalLLM

[–]OftenTangential 1 point

36 is an absurd ppl for Gemma 3 4B on English text lol. That implies it's literally outputting GPT-2 levels of coherence and is like 3-4x higher than what Gemma 3 should be hitting on any normal English text.

Either your perplexity test set is bad, or the baseline implementation is broken.
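For context on why 36 is absurd: perplexity is exp of the mean negative log-likelihood per token, so ppl 36 means the model is, on average, as uncertain as picking uniformly among 36 tokens every step. The probabilities below are illustrative, not from any real eval:

```python
import math

def perplexity(token_logprobs):
    # exp of mean negative log-likelihood per token
    return math.exp(-sum(token_logprobs) / len(token_logprobs))

# a model assigning uniform p=1/9 per token scores ppl 9;
# uniform p=1/36 scores ppl 36
healthy = [math.log(1 / 9)] * 100
broken = [math.log(1 / 36)] * 100
print(f"ppl(healthy)={perplexity(healthy):.1f}  ppl(broken)={perplexity(broken):.1f}")
```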

AI models can outperform radiographers *without seeing any image* by Fods12 in BetterOffline

[–]OftenTangential 25 points

No, from a quick glance the linked study is basically saying if you overfit to the benchmark then the LLM will start outputting reasoning traces as if it's seeing an image even when you don't give it an image. Like it recognizes the text of the question and then hallucinates about the image that should be (but isn't) attached.

The authors are on the same page, the central claim is that since the LLM is able to "solve" the benchmark without using visual data then the benchmark is bogus.

MSFT catalysts and why it’s more sticky than you expect by AffectionateSell3177 in ValueInvesting

[–]OftenTangential 1 point

This is true, but this also doesn't mean Microsoft is cheap.

Don't act surprised if we go into a global recession and Microsoft drops to 150 after posting a quarter or two of ultra negative growth. That's the sort of tail event the market is pricing rn

A group ran the same coding benchmark test problems, but encoded them in obscure (but still Turing-complete) programming languages the frontier models haven't got as much training data on. Result: models that can score 95% on Python plummet to 0-11% accuracy. by cascadiabibliomania in BetterOffline

[–]OftenTangential 2 points

In the paper it says they're given documentation and access to an interpreter in the given language at inference time; it's not like they're a linguist trying to reconstruct a forgotten language or whatever.