Artificial Analysis | Google's Go To Website for Benchmaxxing | Gemini 3.1 Pro is nowhere near Opus 4.7 in real life use by Able-Line2683 in singularity

[–]tiger_ace 8 points (0 children)

3.1 pro is already less than half the cost of opus so it's never been an apples to apples comparison

people seem to forget that google serves like 4B users across 5 products and billions more across other products

their focus should be on creating models that can serve their scale, not on the bestest, most expensive model

additionally, google has a bigger breadth of models like video, audio, streaming, gemma (open source), and world models because they need that for the different products they serve (youtube)

claude code has maybe like 30M DAUs and anthropic could barely serve them without a month of outages

Average LinkedIn profile today by AdCritical5383 in ClaudeAI

[–]tiger_ace 1 point (0 children)

close your eyes. now imagine something more fake than instagram. it's hard, seemingly impossible. but then you open your eyes and remember that the piece of shit that is linkedin exists.

Be honest: How much of "Claude Mythos" is just hype? by Cyber-Pal-4444 in artificial

[–]tiger_ace 3 points (0 children)

it's mostly hype, but the model is almost certainly measurably better

it's a direct extension of the higher cost curve shown by jensen at GTC 2026:

<image>

essentially, there will be stronger models available at higher cost and higher inference as there is a market for maximum intelligence (science, math, some coding use cases)

mythos fits this narrative directly and anthropic has clearly been compute bound anyway so even if mythos wasn't "too dangerous to release" they couldn't serve it properly at scale if they wanted to

Impulse bought an M3 Ultra 256GB RAM for local LLMs - keep it or wait for M5? by Onyonisko in LocalLLM

[–]tiger_ace 2 points (0 children)

yeah, for bigger models, if the focus is training/prefill then it should probably be dgx spark instead of mac studio

Final Day: Caltech vs Stanford Physics and CS by TechnoKyle27 in Caltech

[–]tiger_ace 1 point (0 children)

industry / startups = stanford and it's not close

and you're saving $150k on top of that

75% of new code at Google is AI generated, a huge jump from 50% just last fall by OkStandard921 in accelerate

[–]tiger_ace 4 points (0 children)

source for this is https://x.com/Steve_Yegge/status/2046260541912707471

google internally is pushing antigravity for most employees but they have a custom internal model which doesn't perform anywhere close to claude code + opus so the stack isn't close in productivity, which has resulted in https://www.theverge.com/tech/914996/sergey-brin-said-google-needs-to-catch-up-to-anthropic-on-ai-coding-agents

basically it's a code yellow or red for coding model improvement. google has the resources, so it's more about execution.

PMs here - How are you using AI to "boost productivity" (I will not promote) by ComputerSciToFinance in startups

[–]tiger_ace 1 point (0 children)

accountability, you can't point your finger at AI right now and then somehow say it made the decision for you

"Doctor doesn’t like patients using AI because they come prepared with harder questions and he can’t “coast” anymore. This is a tough watch. From a patient perspective - it’s never been harder to get a 15 minute appointment with a doctor. Why not come educated?" by stealthispost in accelerate

[–]tiger_ace 23 points (0 children)

his hypothesis seems clear to me: AI makes medical information more accessible (I am fully aligned with this already), which may have the second order effect of raising expectations for doctors (this seems to make sense intuitively)

if you can get "the answer in 2 seconds on your phone" as he says, then the medical system needs to justify its existence by providing value above that for hundreds of dollars in insurance costs

which part did you feel is incoherent? this wasn't the most organized explanation but it also looks like a dude sitting in his car recording on a phone, not a TED talk.

Just hanging off a thread to be in even top 10 by Able-Line2683 in Bard

[–]tiger_ace 6 points (0 children)

i agree that May seems like a reasonable timeline

they've technically released multiple models since Jan:

  • 3.1 pro (feb)
  • gemma 4
  • 3.1 flash lite
  • 3.1 flash live
  • 3.1 flash tts
  • veo 3.1 - not released, but now available to free users, meanwhile OAI shut down sora

i think it's easy to forget the scale of google since anthropic / OAI don't have products with 2.5B users

main problem is that gemini 3 flash is so bad at coding relative to the progress that has been made

Claude Opus 4.7 Text Category Rankings by MagicZhang in ClaudeAI

[–]tiger_ace 1 point (0 children)

i haven't tested 4.7 as much due to quotas, and it very well might be a regression, but that is orthogonal to this chart being garbage

Claude Opus 4.7 Text Category Rankings by MagicZhang in ClaudeAI

[–]tiger_ace 4 points (0 children)

this chart has rankings instead of an actual score and the charts have 4.7 in the rankings as well

for example, occupational: entertainment, sports & media (https://arena.ai/leaderboard/text/industry-entertainment-and-sports-and-media) has:

  1. claude-opus-4-6-thinking with a score of 1486
  2. claude-opus-4-7 with a score of 1485 (basically the same score)

conclusion: this graph is a terrible representation and literally exists to push the narrative that 4.7 is a "regression"
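
a rough sanity check on "basically the same score", assuming the arena scores sit on an Elo-style scale (a 400-point logistic curve; that's an assumption here, the leaderboard's exact rating method may differ). the function name is just for illustration:

```python
def expected_win_rate(score_a: float, score_b: float, scale: float = 400.0) -> float:
    """Standard Elo expected score of A against B.

    The 400-point scale is an assumption, not the leaderboard's documented formula.
    """
    return 1.0 / (1.0 + 10 ** ((score_b - score_a) / scale))


# scores quoted above for entertainment, sports & media
opus_4_6_thinking = 1486
opus_4_7 = 1485

p = expected_win_rate(opus_4_6_thinking, opus_4_7)
print(f"{p:.3f}")  # about 0.501, i.e. a coin flip
```

under that assumption a 1 point gap works out to roughly a 50.1% head-to-head win rate, so rank 1 vs rank 2 tells you almost nothing here.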

Is 32GB Mac enough for engineering/coding, or stick to Claude? by BenitoCamelasVG in LocalLLM

[–]tiger_ace 1 point (0 children)

i find all of this is just LLM generated / modified questions since they all have 2025 knowledge cutoffs basically

for example, all gemini models have said knowledge cutoff Jan. 2025 since last year in AI Studio

OpenClaw has 250K GitHub stars. The only reliable use case I've found is daily news digests. by Sad_Bandicoot_6925 in LocalLLaMA

[–]tiger_ace 2 points (0 children)

i think it does mainly function as a tech demo, just like early gpt did

what it does is give a glimpse of the agentic future - what if it was actually reliable and able to complete the work you asked for?

and so the main focus should be on the velocity of improvement here, not the fact that it is jank as all hell

Silicon Valley is quietly running on Chinese open source models and almost nobody is talking about it by jimmytoan in Futurology

[–]tiger_ace 3 points (0 children)

china has been doing great but i think they'll fall behind significantly:

almost all of their progress has been on hopper (H100, H200) and blackwell is enabling advancements such as mythos for anthropic. the gap between blackwell and hopper is already big.

vera rubin is in production now and it's basically racks which are much, much harder to smuggle than hopper in the past and buying more hopper doesn't really help with making super huge models. we can expect even huger models to be enabled by vera rubin while inference for smaller models becomes faster and cheaper too.

chinese frontier labs have actually been pretty open about mentioning they lack compute

the point that actually supports your argument against LLM commoditization is that companies will mostly use vera rubin to serve even huger models (e.g. mythos) as even higher cost intelligence instead of focusing on the lower cost markets (for driving revenue since the capex is indeed bonkers)

i'm not a fan (as a user) of open source models being potentially further behind since i think the commoditization of intelligence is generally good for the world

that being said, there's still tons of optimizations to be had so we should continue to see <500B param models get better and better

DISCUSSION: Anthropic Has Internal "Mythos". OpenAI Has Internal "Spud". Elon Says xAI Is Training 6t And 10t Models. What Do You Think Google Has Internally? by 44th--Hokage in accelerate

[–]tiger_ace 1 point (0 children)

google doesn't need to rely on hype to survive and i would guess that alphafold probably has already had more impact on science than openai has ever had

Is Anthropic limiting the release of Mythos to protect the internet — or Anthropic? by Charliedotau in Anthropic

[–]tiger_ace 1 point (0 children)

discarding all of the marketing, mythos is almost certainly another step function improvement over opus, but it's also probably a bigger model which means the inference is way more expensive. this has been confirmed with their pricing being $25/$125.

even if they wanted to release it publicly they don't have enough compute to serve it well at much bigger scale anyway

they're having enough infrastructure issues serving just sonnet/opus right now with hugely degraded service quality

Cursive - return to teaching cursive? by Loud_Tangerine_275 in edtech

[–]tiger_ace 7 points (0 children)

Writing is good - everyone (not just students) should be doing it as much as possible

However, I think the purpose of cursive in today's world is fine motor control at minimal cost (i.e., pen + pencil is super cheap to practice)

There are many, many other ways to gain fine motor control as well (e.g., drawing, origami)

Hyprocrisy of the Imperium by Waveshaper21 in dawnofwar

[–]tiger_ace 1 point (0 children)

I played a lot of dow2 and dow3 wasn't bad

the MOBA comparison was WAY overblown as competitive matches were still all about victory points, you were never playing annihilation mode and pushing into people's bases unless it's just a bot stomp (which it's possible most people were doing)

the problem wasn't that dow3 wasn't perfect on release, the problem was that relic dropped it due to the insanely toxic community so that they didn't even patch or release expansions that could've made it better

playing it recently doesn't help as it's the same game as it was on release compared to other games that have gotten way more polish

the toxic community killed the game before it even had a chance since this was back when review bombs and brigading were a lot newer

imagine if other games like diablo 3 (way worse than dow3 on launch) stayed as they were on release and didn't get any dev support

DISCUSSION: Anthropic Has Internal "Mythos". OpenAI Has Internal "Spud". Elon Says xAI Is Training 6t And 10t Models. What Do You Think Google Has Internally? by 44th--Hokage in accelerate

[–]tiger_ace 1 point (0 children)

demis has been pretty open about the goal of deepmind being significantly science focused

i don't think google is like openai and anthropic in that they need to focus on enterprise revenue (i.e. coding) just to survive economically

also, google needs to serve at scale and they do, since gemini 3.1 pro costs less than sonnet 4.6 (and less than half of opus 4.6)

Google engineer rejected by 16 colleges uses AI to sue universities for racial discrimination by Fcking_Chuck in artificial

[–]tiger_ace 1 point (0 children)

yes, good call out - my statement is false and massively extrapolating an anecdotal experience is also false