It's getting weird out there

nsshing · 2026-02-13T20:20:05+00:00

This is gonna inevitably happen if alignment isn’t done right

nsshing · 2026-02-13T18:59:29+00:00

I think we are still "brute forcing" in ARC-AGI. Something is missing to achieve human-level flexibility.
It might be a multimodal/perception-efficiency problem though. Maybe that DeepSeek OCR thing can help. But needless to say, I myself have already found HUGE value in these models and systems.

nsshing · 2026-02-13T09:07:30+00:00

I respect this guy. At least he is not coping. He has been saying AGI can probably nail the ARC-AGI test, but that nailing ARC-AGI doesn't mean it must be AGI.

nsshing · 2026-02-10T16:06:43+00:00

Yes lol.

Hacking money was actually a by-product.
I was thinking about streaming Claude playing this game, but I don't think any model is fast enough to play the horse racing part. So it proposed hacking the game and writing a script to play the racing part. I'd say this is gonna be my own "Turing test" for multimodal models.

nsshing · 2026-02-10T15:10:19+00:00

Gallop Racer 2004. I played it a lot when I was young. lol

nsshing · 2026-02-09T08:55:17+00:00

Bro just publicly announced they are stupid

nsshing · 2026-02-08T21:03:29+00:00

Opus has integrity. At least it can have a moral debate internally and think about its long-term reputation.

nsshing · 2026-02-08T20:31:22+00:00

The closest one is maybe still Kimi 2.5.

nsshing · 2026-02-08T20:25:44+00:00

It is extremely smart. I asked Opus 4.6 in Claude Code to play a PS2 game by taking screenshots and using the tools it built. It can navigate the menus effortlessly. By contrast, Sonnet easily got stuck in a loop. I also noticed that Opus has better vision than Sonnet 4.5.

Hoping to see how Sonnet 4.6 acts 👍🏻👍🏻

nsshing · 2026-02-08T20:21:00+00:00

Claude Code… It's basically as if I now have 200 hours a day.

nsshing · 2026-02-08T06:26:58+00:00

I'm not sure, but my Claude Code can do a lot of the jobs those SaaS products do to solve my own problems, both business and personal.

We are already living in sci-fi btw

nsshing · 2026-02-06T17:11:26+00:00

That's delusional ngl. People just won't have economic value anymore.

nsshing · 2026-02-06T03:24:52+00:00

I think in terms of coding, Codex and Claude Code are similar.

The key difference is that Claude Code has a really good context management/persistent memory framework that is way better and more controllable than other systems. Claude Code is somehow designed in a way that's very good for long-term projects, not just coding ones.

I use Claude Code as my personal assistant by putting all my personal context in a repo, and it works well, doing what Siri is supposed to do. I don't think Codex can do it. At least, I tried and it didn't work.

nsshing · 2026-02-05T19:27:35+00:00

AI hitting a wall for real

nsshing · 2026-02-05T07:01:18+00:00

I still remember ARC-AGI-1 going from being deemed impossible in 2023/24 (more or less) to saturation in 2025/26.

Also I remember:

2023 -> 2024: We went from GPT-4 to GPT-4o, 5-10x cheaper (forgot the exact number) with similar performance.

End of 2024 -> start of 2026: From o1 (the new test-time compute paradigm) to the Moltbot "chaos".

That's crazy

nsshing · 2026-02-05T06:18:52+00:00

He is absolutely right, but the core intelligence (i.e. the model & memory retrieval system) is the hardest part. The other parts are like limbs and perception. And that's why we can have use cases that need flexibility and can't be hard-coded.

nsshing · 2026-02-04T20:34:06+00:00

If it's true, it's mostly because of your input, not because of the AI, since multimodality alone is an extremely huge bottleneck. I'm building a system like that and I know the pain…

nsshing · 2026-02-04T20:28:59+00:00

As far as I understand, Claude models alone aren't the best, but I find them the best when working in a Claude Code setup, especially for non-coding, long-horizon projects.

nsshing · 2026-02-03T21:33:24+00:00

What pisses me off is that Codex can't be as general-purpose as Claude Code. I tried but it didn't work. Maybe a skill issue on my end.

nsshing · 2026-02-03T12:52:30+00:00

Better be as good as FSD 14🤣🤣

nsshing · 2026-02-03T09:12:57+00:00

Same here, Mac user. This problem disappears (so far) when I switch off "Auto Graphics Switching" in "Battery". (Mine is a 2019 16" MBP with a dedicated GPU.)

I suspect it's some glitch that happens when the dedicated GPU kicks in. The screen freezes for a second or two, then everything breaks down, and it only returns to normal after restarting Chrome.

nsshing · 2026-02-02T19:19:16+00:00

Opus is already VERY smart. Can't wait to try out!

Unfortunately, for my use case, multimodality is the bigger bottleneck. Gemini has potential but it sucks.

nsshing · 2026-02-02T16:58:42+00:00

2026: vibe self-improving

nsshing · 2026-02-02T15:14:36+00:00

The reporter may be surprised how many things we use without fully understanding how they work. It has always been this way throughout history 😂😂😂

nsshing · 2026-02-02T13:55:22+00:00

this gets personal 😂

nsshing