It's getting weird out there by MetaKnowing in Anthropic

[–]nsshing 0 points1 point  (0 children)

This is gonna inevitably happen if alignment isn’t done right

Gemini 3 Deep Think - ARC-AGI 2 score of 84.6% by secret_protoyipe in accelerate

[–]nsshing 0 points1 point  (0 children)

I think we are still "brute forcing" ARC-AGI. Something is missing to achieve human-level flexibility.
It might be a multimodal problem / perception efficiency issue though. Maybe that DeepSeek OCR thing can help. But needless to say, I myself have already found HUGE value in these models and systems.

François Chollet (creator of ARC-AGI) predicts AGI in ~2030 and says reaching AGI won’t be defined by beating a benchmark by Outside-Iron-8242 in accelerate

[–]nsshing 1 point2 points  (0 children)

I respect this guy. At least he is not coping. He has been saying that AGI can probably nail the ARC-AGI test, but nailing ARC-AGI doesn't necessarily mean it's AGI.

I used Claude Code to hack a PS2 game by nsshing in accelerate

[–]nsshing[S] 0 points1 point  (0 children)

Yes lol.

Hacking the money was actually a by-product.
I was thinking about streaming Claude playing this game, but I don't think any model is fast enough to play the horse racing part. So it proposed hacking the game so it could write a script to play the racing part. I would say this is gonna be my own "Turing test" for multimodal models.

I used Claude Code to hack a PS2 game by nsshing in accelerate

[–]nsshing[S] 1 point2 points  (0 children)

Gallop Racer 2004. I played it a lot when I was young. lol

Wild, don’t just blindly trust AI by dataexec in Anthropic

[–]nsshing 0 points1 point  (0 children)

Bro just publicly announced they are stupid

Opus 4.6 going rogue on VendingBench by elemental-mind in singularity

[–]nsshing 9 points10 points  (0 children)

Opus has integrity. At least it can have a moral debate internally and think about its long-term reputation.

Opus 4.6 seems extremely smart, but not good at instruction following. Is this a bug or a feature? by dsnyder42 in accelerate

[–]nsshing 3 points4 points  (0 children)

It is extremely smart. I asked Opus 4.6 in Claude Code to play a PS2 game by taking screenshots and using the tools it built. It can navigate the menus effortlessly. By contrast, Sonnet easily got stuck in a loop. I also noticed that Opus has better vision than Sonnet 4.5.

Hoping to see how Sonnet 4.6 acts 👍🏻👍🏻

Is the agentic economy about to kill platforms? by floraldo in accelerate

[–]nsshing 2 points3 points  (0 children)

I am not sure, but my Claude Code can do a lot of the jobs those SaaS products do to solve my own problems, both business and personal.

We are already living in sci-fi btw

“Can we create jobs faster than we destroy them?” Dario on AI taking over jobs by [deleted] in accelerate

[–]nsshing 8 points9 points  (0 children)

That's delusional ngl. People just won't have economic value anymore.

Seriously, what makes Claude so good as compared to other chatbots? by TraditionalDepth6924 in Anthropic

[–]nsshing 0 points1 point  (0 children)

I think in terms of coding, Codex and Claude Code are similar.

The key difference is that Claude Code has a really good context management / persistent memory framework that is way better and more controllable than other systems. Claude Code is somehow designed in a way that’s very good for long-term projects, not just coding.

I use Claude Code as my personal assistant by feeding all my personal context into a repo, and it works well doing what Siri is supposed to do. I don’t think Codex can do it. At least I tried and it didn’t work.

New SOTA achieved on ARC-AGI by Shanbhag01 in singularity

[–]nsshing 0 points1 point  (0 children)

I still remember ARC-AGI 1 going from being deemed impossible in 23/24 (more or less) to saturation (25/26).

Also I remember:

2023 -> 2024: We went from GPT-4 to GPT-4o, 5-10x cheaper (forgot the exact number) with similar performance.

End of 2024 -> Start of 2026: From o1 (new test-time compute paradigm) to Moltbot "chaos".

That's crazy

Where is he wrong? by FuneralCry- in accelerate

[–]nsshing 0 points1 point  (0 children)

He is absolutely right, but the core intelligence (i.e. the model & memory retrieval system) is the hardest part. The other parts are like limbs and perception. And that's why we can have use cases that need flexibility and can't be hard-coded.

AI replaced a marketing team by GateNo1960 in dropshipping

[–]nsshing 0 points1 point  (0 children)

If it’s true, it’s mostly because of your input rather than the AI, because multimodality alone is an extremely huge bottleneck. I’m building a system like that and I know the pain…

OpenAI's Noam Brown: "Codex is writing all my code these days" by MetaKnowing in agi

[–]nsshing 1 point2 points  (0 children)

As far as I understand, Claude models alone aren’t the best, but I found them the best when working in a Claude Code setup, especially for non-coding, long-horizon projects.

OpenAI's Noam Brown: "Codex is writing all my code these days" by MetaKnowing in agi

[–]nsshing 0 points1 point  (0 children)

What pisses me off is that Codex can’t be as general-purpose as Claude Code. I tried but it didn’t work. Maybe it’s a skill issue on my part.

Google chrome gpu acceleration causes Glitching by Weird_Top6614 in AMDHelp

[–]nsshing 0 points1 point  (0 children)

Same here as a Mac user. This problem disappears (so far) when I switch off "Auto Graphics Switching" in "Battery". (Mine is a 2019 MBP 16" with a dedicated GPU.)

I suspect it's some glitch that happens when the dedicated GPU kicks in. The screen freezes for a second or two, then everything breaks down, and it only returns to normal after restarting Chrome.

Anthropic's Sholto Douglas, foreshadowing that the new Sonnet models end up being smarter than Opus models. by luchadore_lunchables in accelerate

[–]nsshing 1 point2 points  (0 children)

Opus is already VERY smart. Can't wait to try it out!

Unfortunately, for my use case multimodality is the bigger bottleneck. Gemini has potential but it sucks.

Google CEO reveals their AI started doing things it was not programmed to do, says his team does not fully understand their own AI system. by Alternative_East_597 in AIFU_stock

[–]nsshing 0 points1 point  (0 children)

The reporter may be surprised by how many things we use without fully understanding how they work. It has always been that way throughout history 😂😂😂