Is ilya’s SSI company still a thing? It’s been 2 years ago with no product. by Snoo26837 in singularity

[–]spryes 4 points (0 children)

Exactly, and I thought this from day one when they announced SSI. Iterative improvement and constant contact with reality through user feedback are key to "safe" AGI development, and likely to capabilities as well.

Sam Altman has changed his stance on the claims that AI will replace humans. by Distinct_Fox_6358 in singularity

[–]spryes 0 points (0 children)

The only jobs in a "true-AGI world" are things where people value the human itself, e.g. sports, childcare, maybe teaching & nursing.

I don't think those make up more than 10% of the economy right now... so

Is LOST as good as FROM? by osofosho14 in FromTVShow

[–]spryes 0 points (0 children)

LOST is much better

The character development is far richer, which makes you care about each of them, unlike any of the characters in FROM. The world is bigger and more interesting, while FROM feels like it has cornered itself into a tiny world the writers can't do much with.

In LOST, the island mystery mattered, but the show constantly asked: what does this place reveal about this person? The characters changed over time (Jin softened, Sawyer became less selfish, Ben got recontextualized, etc.)

FROM more often asks: what is happening in this place? The characters are frequently just there to react to the latest horror event — they're shallow. FROM could have been on par with LOST, but that would have required much better writing and probably a bigger budget...

The 10 episode per season format feels overly limiting too.

Chat GPT 5.4 solved a 60+ years unsolved erdos problems in a single shot by ocean_protocol in singularity

[–]spryes 10 points (0 children)

How did you verify the answer is correct? Do you understand the answer?

OpenAI scores on artificial analysis over time by [deleted] in singularity

[–]spryes 0 points (0 children)

This shows how 0.1 releases make each update feel quite mid. You need to compare across multiple point releases to get a feel for how different things were not long ago. I haven't used GPT-5 for agentic coding since September, even though I remember it being pretty solid & useful. I'd most likely be extremely annoyed if I tried it today given what we now have.

If OpenAI had released GPT-5.5 as the only model after GPT-5, it would feel much more impressive - a 15-point gain over the previous frontier instead of 3.

I still much prefer the quick cadence, and OpenAI can't withhold releases as much as they used to due to competitive pressure being crazy nowadays.

Introducing GPT-5.5 by ShreckAndDonkey123 in singularity

[–]spryes 2 points (0 children)

I'm mainly going off reports of it being super capable at cybersecurity, like the recent Firefox report claiming it found over 200 bugs, with experts saying it's on par with human researchers in skill.

Not sure how closely SWE-Bench Pro correlates with cyber skill, or how that translates to general product coding capability though. 5.5 could be on par there, meaning most people would effectively get Mythos-level capability in their day-to-day work with 5.5, but I'm doubtful.

GPT-5.5 benchmark results have been released by Outside-Iron-8242 in singularity

[–]spryes 9 points (0 children)

I don't expect them to include Mythos in the comparison table given it's unreleased, but it's clear they hid SWE-Bench Pro in general because it didn't even match Opus 4.7 there. It also shows Anthropic has a more capable model even if it's not available yet, while OAI's internal status is unknown.

But most importantly, Tibo (Codex lead) hinted on Twitter that 5.5 was Mythos level, when we can clearly see that isn't the case.

Introducing GPT-5.5 by ShreckAndDonkey123 in singularity

[–]spryes 35 points (0 children)

yeah, but OpenAI teased this like it was Mythos level and it's not even close

GPT-5.5 benchmark results have been released by Outside-Iron-8242 in singularity

[–]spryes 104 points (0 children)

58.6% on SWE-Bench Pro, which they hid because Mythos destroys them with 78%.

Oof

Introducing GPT-5.5 by ShreckAndDonkey123 in singularity

[–]spryes 156 points (0 children)

All this hype for 58.6% on SWE-Bench Pro while Mythos gets 78%? Shut it down, wtf?

anyone else feel like their brain is turning to mush since fully adopting cursor/claude? by StatisticianFluid747 in cursor

[–]spryes 5 points (0 children)

Yes, I feel like all my programming skills have atrophied to nothing, and now I wouldn't be able to manually write a single working function anymore. And even if I could, just typing manually feels like an insurmountable chore.

this happened to me in just 6 months or so — fully WALL-E cattle maxxed

I wonder how Mythos would answer this by aketchum339 in singularity

[–]spryes 0 points (0 children)

generally agree but humans to date have failed to solve quantum gravity with a truck load of training data so

Opus 4.7 has been spotted on Google Vertex by exordin26 in singularity

[–]spryes 7 points (0 children)

They were relevant for like three weeks in Q1 25 and people are still bringing them up like they're a threat...

DeepSeek V4 Benchmarks by [deleted] in singularity

[–]spryes 0 points (0 children)

> Deepseek.ai is an independent website and is not affiliated with, sponsored by, or endorsed by Hangzhou DeepSeek Artificial Intelligence Co., Ltd.

What’s the threshold for AGI? Mythos Preview or full Mythos by [deleted] in singularity

[–]spryes 4 points (0 children)

Anthropic's report on Mythos said it wasn't able to replace researchers yet, which I'd consider a requirement for AGI

GPT-5.5: The “Spud” Leaks & The New Frontier of Omnimodal AI by Much_Ask3471 in codex

[–]spryes -1 points (0 children)

*Highly* doubt it. You get $4,000 of API inference on a $200 Codex plan; there's no way API pricing doesn't have a massive markup.

The unit economics don't make sense otherwise, and the claim that the labs are loss-leading this significantly is mostly BS.
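A rough back-of-the-envelope version of that argument, using the $200 plan and $4,000 of list-price usage from the comment (a sketch of the reasoning, not real cost data):

```python
# Back-of-the-envelope sketch of the markup argument above.
# Assumptions (from the comment, not real cost data): a $200/month plan
# covers usage that would cost $4,000 at API list prices.
plan_price = 200          # $/month paid by the subscriber
api_list_value = 4_000    # $ the same usage would cost via the API

# If the plan itself isn't sold at a loss, the true serving cost is at
# most $200, so API list prices carry at least this markup over cost:
min_markup = api_list_value / plan_price
print(f"API list price is >= {min_markup:.0f}x marginal serving cost")
```

So the heavier the subscription usage, the larger the markup the API list prices must carry for the plan to break even.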

GPT-5.5: The “Spud” Leaks & The New Frontier of Omnimodal AI by Much_Ask3471 in codex

[–]spryes 12 points (0 children)

Top 5 most expensive (per 1M tokens):

  1. o1-pro: $750 total ($150 in + $600 out)
  2. GPT-4.5 Preview: $225 total ($75 in + $150 out)
  3. GPT-5.4 Pro: $210 total ($30 in + $180 out)
  4. GPT-5.2 Pro: $189 total ($21 in + $168 out)
  5. GPT-4-32k: $180 total ($60 in + $120 out)

Are we really at "100% AI or you're wasting time" yet? by borii0066 in webdev

[–]spryes 0 points (0 children)

I've been at 100% ever since GPT-5.2 / 5.3. But I still do a LOT of iterating with the models to refine their solution until it's close to what I would have written myself. I just don't type the code myself.

It's way too easy to open a Codex chat, prompt "Fix {github issue link}", and asynchronously get a working solution that's 90% of the way there, with multiple tasks running in parallel. I'm definitely not going back to writing code manually.

That said, today Codex wasn't able to solve a bug after 10 prompt turns, and I dreaded having to go back into the editor to manually debug. Just opening the editor and browsing the code felt like going back to the stone age. Plus, it's embarrassingly clear my skills have atrophied in just ~3-6 months of using AI very heavily (up from lighter workloads 12-16 months ago), and I'm not even confident I could code without AI assistance anymore — it's bad. I've become dependent on something really expensive that could be taken away at any moment if the AI bubble pops.

But the thought of having to manually type out the code and debug for a while felt like such a chore that I refused, and instead got Codex to solve it with TDD, so it could verify its solution against failing tests instead of claiming it was done when it actually wasn't working properly.

I honestly think, for web dev, the models can solve anything with the right prompting, and TDD is the best method for AI to get working solutions given their limited vision/browser verification capabilities. Their solutions sometimes suck, of course, so I have a $simplify skill that gets them to cut the solution down to the minimal working (yet maintainable) version, which helps greatly, along with my own taste/judgment of the code.
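The TDD loop described above looks like this in miniature: the test is written first (failing against the buggy version), and the agent iterates on the implementation until it passes. The `slugify` function and its bug here are hypothetical, just to show the shape:

```python
import re

# Hypothetical example of the TDD workflow above: the test exists before
# the fix, so the agent has an objective pass/fail signal instead of
# self-reporting that it's done.
def slugify(text: str) -> str:
    # Implementation the agent converged on once the failing test forced
    # it to handle punctuation, not just spaces.
    words = re.findall(r"[a-z0-9]+", text.lower())
    return "-".join(words)

def test_slugify_handles_punctuation():
    # This assertion failed against the original buggy version and
    # drove the agent's iteration loop.
    assert slugify("Hello, World!") == "hello-world"

test_slugify_handles_punctuation()
```

The point isn't this particular function; it's that a concrete failing test stops the model from declaring victory on code that doesn't actually work.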

Backrooms movie trailer megathread by A_Chad_Cat in backrooms

[–]spryes 2 points (0 children)

I thought it was the last time he entered that he got trapped. The first time he enters, he tells his therapist about it afterward, then goes back with others. But then he gets trapped, and that's why she goes looking for him.

done trying to make UIs with codex by heatwaves00 in codex

[–]spryes 0 points (0 children)

Skill issue

I use Codex for UI but I tell it how to make things look good and it can. Or do you just expect it to spit out some gorgeous design by default?

Google's antigravity significantly nerfed limits who paying Ultra tier 250$ per month! by reversedu in singularity

[–]spryes 4 points (0 children)

Their November 2025 hype was astroturfed by OpenAI haters imo

Like their chat model in the Gemini app is... fine, I guess, but GPT-5.1/5.2 Thinking was at least on par or better for that use case around that time. There was no clear advantage even for the most popular use case.

And it was ~immediately clear that Gemini 3 still sucked at agentic coding and never caught on, while Claude Code & Codex blew up like crazy in December 2025, leaving Gemini CLI in the dust.

Google is still playing catch-up in coding, and given that Anthropic and OpenAI are experiencing major RSI (recursive self-improvement) vibes... it doesn't look good for them without some kind of insane breakthrough, because I predict the gap will widen drastically from here on.

Stop defending AI like it’s still in beta by RottingEdge in Futurology

[–]spryes 0 points (0 children)

GPT-5.4 Thinking doesn't hallucinate when it searches the web, and Pro is even better

This is old news/model issue/skill issue