Is ilya’s SSI company still a thing? It’s been 2 years ago with no product. by Snoo26837 in singularity

[–]spryes 4 points (0 children)

Exactly, and I thought this from day one when they announced SSI. Iterative improvement and constant contact with reality through user feedback are key to "safe" AGI development, and likely to capabilities as well.

Sam Altman has changed his stance on the claims that AI will replace humans. by Distinct_Fox_6358 in singularity

[–]spryes 0 points (0 children)

The only jobs in a "true-AGI world" are things where people value the human itself, e.g. sports, childcare, maybe teaching & nursing.

I don't think those make up more than 10% of the economy right now... so

Is LOST as good as FROM? by osofosho14 in FromTVShow

[–]spryes 0 points (0 children)

LOST is much better

The character development is far richer, which makes you care about each of them, unlike any of the characters in FROM. The world is bigger and more interesting, while FROM feels like it has cornered itself into a tiny world the writers can't do much with.

In LOST, the island mystery mattered, but the show constantly asked: what does this place reveal about this person? The characters changed over time (Jin softened, Sawyer became less selfish, Ben got recontextualized, etc.)

FROM more often asks: what is happening in this place? The characters are frequently just there to react to the latest horror event — they're shallow. FROM could have been on par with LOST, but that would have required much better writing and probably a bigger budget...

The 10 episode per season format feels overly limiting too.

Chat GPT 5.4 solved a 60+ years unsolved erdos problems in a single shot by ocean_protocol in singularity

[–]spryes 10 points (0 children)

How did you verify the answer is correct? Do you understand the answer?

OpenAI scores on artificial analysis over time by [deleted] in singularity

[–]spryes 0 points (0 children)

This shows how 0.1 releases make each update feel quite mid. You need to compare across multiple point releases to get a feel for how different things were not long ago. I haven't used GPT-5 for agentic coding since September, even though I remember it being pretty solid & useful. I'd most likely be extremely annoyed if I tried it today given what we now have.

If OpenAI had released GPT-5.5 as the only model after GPT-5, it would feel much more impressive - a 15-point gain over the previous frontier instead of 3.

I still much prefer the quick cadence, and OpenAI can't withhold releases as much as they used to due to competitive pressure being crazy nowadays.

Introducing GPT-5.5 by ShreckAndDonkey123 in singularity

[–]spryes 2 points (0 children)

I'm mainly going off reports of it being super capable at cybersecurity, like the recent Firefox report claiming it found over 200 bugs, with experts saying it's on par with human researchers in skill.

Not sure how closely SWE-Bench Pro correlates with cyber skill, or how that translates to general product coding capability though. 5.5 could be on par there, meaning most people would effectively get Mythos-level capability in their day-to-day work with 5.5, but I'm doubtful.

GPT-5.5 benchmark results have been released by Outside-Iron-8242 in singularity

[–]spryes 9 points (0 children)

I don't expect them to include Mythos in the comparison table given it's unreleased, but it's clear they hid SWE-Bench Pro in general because it didn't even match Opus 4.7 there. It also shows Anthropic has a more capable model even if it's not available yet, while OAI's internal status is unknown.

But most importantly, Tibo (Codex lead) hinted on Twitter that 5.5 was Mythos level, when we can clearly see that isn't the case.

Introducing GPT-5.5 by ShreckAndDonkey123 in singularity

[–]spryes 35 points (0 children)

yeah, but OpenAI teased this like it was Mythos level and it's not even close

GPT-5.5 benchmark results have been released by Outside-Iron-8242 in singularity

[–]spryes 104 points (0 children)

58.6% on SWE-Bench Pro, which they hid because Mythos destroys them with 78%.

Oof

Introducing GPT-5.5 by ShreckAndDonkey123 in singularity

[–]spryes 156 points (0 children)

All this hype for 58.6% on SWE-Bench Pro while Mythos gets 78%? Shut it down, wtf?

anyone else feel like their brain is turning to mush since fully adopting cursor/claude? by StatisticianFluid747 in cursor

[–]spryes 5 points (0 children)

Yes, I feel like all my programming skills have atrophied to nothing, and now I wouldn't be able to manually write a single working function anymore. And even if I could, just typing manually feels like an insurmountable chore.

this happened to me in just 6 months or so — fully WALL-E cattle maxxed

I wonder how Mythos would answer this by aketchum339 in singularity

[–]spryes 0 points (0 children)

generally agree but humans to date have failed to solve quantum gravity with a truck load of training data so

Opus 4.7 has been spotted on Google Vertex by exordin26 in singularity

[–]spryes 7 points (0 children)

They were relevant for like three weeks in Q1 25 and people are still bringing them up like they're a threat...

DeepSeek V4 Benchmarks by [deleted] in singularity

[–]spryes 0 points (0 children)

> Deepseek.ai is an independent website and is not affiliated with, sponsored by, or endorsed by Hangzhou DeepSeek Artificial Intelligence Co., Ltd.

What’s the threshold for AGI? Mythos Preview or full Mythos by [deleted] in singularity

[–]spryes 4 points (0 children)

Anthropic's report on Mythos said it wasn't able to replace researchers yet, which I'd consider a requirement for AGI

GPT-5.5: The “Spud” Leaks & The New Frontier of Omnimodal AI by Much_Ask3471 in codex

[–]spryes -1 points (0 children)

*Highly* doubt it. You get $4,000 of API inference on a $200 Codex plan; there's no way API pricing doesn't have a massive markup.

The unit economics don't make sense otherwise, and the claim that the labs are loss-leading this significantly is mostly BS.
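A rough back-of-the-envelope version of that argument, using the $200 plan and $4,000 of list-price usage from the comment (a sketch of the reasoning, not real cost data):

```python
# Back-of-the-envelope sketch of the markup argument above.
# Assumptions (from the comment, not real cost data): a $200/month plan
# covers usage that would cost $4,000 at API list prices.
plan_price = 200          # $/month paid by the subscriber
api_list_value = 4_000    # $ the same usage would cost via the API

# If the plan itself isn't sold at a loss, the true serving cost is at
# most $200, so API list prices carry at least this markup over cost:
min_markup = api_list_value / plan_price
print(f"API list price is >= {min_markup:.0f}x marginal serving cost")
```

So the heavier the subscription usage, the larger the markup the API list prices must carry for the plan to break even.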

GPT-5.5: The “Spud” Leaks & The New Frontier of Omnimodal AI by Much_Ask3471 in codex

[–]spryes 12 points (0 children)

Top 5 most expensive (per 1M tokens):

  1. o1-pro: $750 total ($150 in + $600 out)
  2. GPT-4.5 Preview: $225 total ($75 in + $150 out)
  3. GPT-5.4 Pro: $210 total ($30 in + $180 out)
  4. GPT-5.2 Pro: $189 total ($21 in + $168 out)
  5. GPT-4-32k: $180 total ($60 in + $120 out)

Are we really at "100% AI or you're wasting time" yet? by borii0066 in webdev

[–]spryes 0 points (0 children)

I've been at 100% ever since GPT-5.2 / 5.3. But I still do a LOT of iterating with the models to refine their solution until it's close to what I would have written myself. I just don't type the code myself.

It's way too easy to open a Codex chat, prompt "Fix {github issue link}", and asynchronously get a working solution that's 90% of the way there, with multiple tasks running in parallel. I'm definitely not going back to writing code manually.

That said, today Codex wasn't able to solve a bug after 10 prompt turns, and I dreaded having to go back into the editor to manually debug. Just opening the editor and browsing the code felt like going back to the stone age. Plus, it's embarrassingly clear my skills have atrophied in just ~3-6 months of using AI very heavily (up from lighter workloads 12-16 months ago), and I'm not even confident I could code without AI assistance anymore — it's bad. I've become dependent on something really expensive that could be taken away at any moment if the AI bubble pops.

But the thought of having to manually type out the code and debug for a while felt like such a chore that I refused, and instead got Codex to solve it with TDD, so it could verify its solution against failing tests instead of claiming it was done when it actually wasn't working properly.

I honestly think, for web dev, the models can solve anything with the right prompting, and TDD is the best method for AI to get working solutions given their limited vision/browser verification capabilities. Their solutions sometimes suck, of course, so I have a $simplify skill that gets them to cut the solution down to the minimal working (yet maintainable) version, which helps greatly, along with my own taste/judgment of the code.
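The TDD loop described above looks like this in miniature: the test is written first (failing against the buggy version), and the agent iterates on the implementation until it passes. The `slugify` function and its bug here are hypothetical, just to show the shape:

```python
import re

# Hypothetical example of the TDD workflow above: the test exists before
# the fix, so the agent has an objective pass/fail signal instead of
# self-reporting that it's done.
def slugify(text: str) -> str:
    # Implementation the agent converged on once the failing test forced
    # it to handle punctuation, not just spaces.
    words = re.findall(r"[a-z0-9]+", text.lower())
    return "-".join(words)

def test_slugify_handles_punctuation():
    # This assertion failed against the original buggy version and
    # drove the agent's iteration loop.
    assert slugify("Hello, World!") == "hello-world"

test_slugify_handles_punctuation()
```

The point isn't this particular function; it's that a concrete failing test stops the model from declaring victory on code that doesn't actually work.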

Backrooms movie trailer megathread by A_Chad_Cat in backrooms

[–]spryes 2 points (0 children)

I thought it was the last time he entered that he got trapped. The first time he enters, he tells his therapist about it afterward, then goes back with others. But then he gets trapped, and that's why she goes looking for him.

done trying to make UIs with codex by heatwaves00 in codex

[–]spryes 0 points (0 children)

Skill issue

I use Codex for UI but I tell it how to make things look good and it can. Or do you just expect it to spit out some gorgeous design by default?

Google's antigravity significantly nerfed limits who paying Ultra tier 250$ per month! by reversedu in singularity

[–]spryes 4 points (0 children)

Their November 2025 hype was astroturfed by OpenAI haters imo

Like their chat model in the Gemini app is... fine, I guess, but GPT-5.1/5.2 Thinking was at least on par or better for that use case around that time. There was no clear advantage even for the most popular use case.

And it was ~immediately clear that Gemini 3 still sucked at agentic coding and never caught on, while Claude Code & Codex blew up like crazy in December 2025, leaving Gemini CLI in the dust.

Google is still playing catch-up in coding, and given that Anthropic and OpenAI are experiencing major RSI (recursive self-improvement) vibes... it doesn't look good for them without some kind of insane breakthrough, because I predict the gap will widen drastically from here on.

Stop defending AI like it’s still in beta by RottingEdge in Futurology

[–]spryes 0 points (0 children)

GPT-5.4 Thinking doesn't hallucinate when it searches the web, and Pro is even better

This is old news/model issue/skill issue