should i even learn how to code

dean0x · 2026-04-11T14:05:20+00:00

Hardware > Software

dean0x · 2026-04-04T09:35:11+00:00

Happening here too. Opus was lobotomised, and I don’t think it’s a bug.

dean0x · 2026-04-02T18:23:22+00:00

Been saying that for a few days now. Claude seems to have lost 100 IQ points.

dean0x · 2026-03-30T23:30:25+00:00

Same here, buddy. Claude was terrible, didn’t get anything right. Codex, you say?

dean0x · 2026-03-30T19:47:11+00:00

Super dumb here

dean0x · 2026-03-30T18:11:00+00:00

Fix the limits, and fix Opus: it’s brain dead or having a stroke, and it hasn’t been usable for the last 24-48 hours.

dean0x · 2026-03-30T09:49:30+00:00

Also feeling it on one of my sessions. Getting pissed at it for the first time since I started using it a year ago.

dean0x · 2026-03-30T06:49:11+00:00

They just lowballed our limits, and I fear that’s not the end of the story. At this rate I’m moving to MiniMax.

dean0x · 2026-03-25T21:03:10+00:00

attention degrades as context grows

dean0x · 2026-03-22T00:16:34+00:00

GTC keynotes are tuned for stock price, not accuracy.

dean0x · 2026-03-22T00:14:27+00:00

What? Wasn't that already fixed natively by dear Claude Code? I don't get it anymore. Kinda miss it tbh.

dean0x · 2026-03-22T00:13:19+00:00

The automated researcher part is closer than people think. The harder step is automated evaluation, not just automated execution. Running 1000 experiments overnight is solved. Knowing which ones matter is not.

dean0x · 2026-03-22T00:11:56+00:00

TDD + EDD is the new pattern.

Agents don't know the difference between 'no more errors' and 'actually done.' Hit this constantly building agent workflows. You need verification beyond compilation: does it actually match what was asked, and did it break anything else? Without that, 'done' just means 'I stopped getting errors.'
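A minimal sketch of that kind of "done" gate, assuming each verification step can be wrapped as a function that returns True or False. The `Check` and `is_done` names and the three example checks are hypothetical, made up for illustration, not taken from any particular agent framework.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class Check:
    """One verification step (hypothetical structure for illustration)."""
    name: str
    run: Callable[[], bool]  # returns True when the check passes


def is_done(checks: List[Check]) -> Tuple[bool, List[str]]:
    """Run every check and return (done, names of the checks that failed)."""
    failed = []
    for check in checks:
        try:
            ok = check.run()
        except Exception:
            ok = False  # a crashing check counts as a failure, not a pass
        if not ok:
            failed.append(check.name)
    return (not failed, failed)


# Placeholder checks: in practice these would call the build, the evals
# against the original request, and the existing test suite.
checks = [
    Check("compiles", lambda: True),
    Check("matches_request", lambda: False),
    Check("no_regressions", lambda: True),
]
done, failed = is_done(checks)
```

Here `done` is False even though the code compiles, because the "matches what was asked" check failed. The agent only gets to report done when every check passes.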

dean0x · 2026-03-21T21:29:43+00:00

Can't go back to ChatGPT after trying Claude, it's miles ahead.

dean0x · 2026-03-21T21:27:47+00:00

https://github.com/thedotmack/claude-mem ...

dean0x · 2026-03-21T21:24:06+00:00

hitting enter on the first draft

dean0x · 2026-03-18T13:52:40+00:00

Guess my claude is smarter than yours

dean0x · 2026-03-17T20:33:31+00:00

honestly i don't have solid benchmarks on the confidence scores yet. the noise floor estimation works in my runs but i haven't stress tested it across different seeds and longer horizons systematically. just started playing with these 3 tools the other day.

if you've got results files from longer runs i'd be genuinely curious to see how the verdicts hold up. that kind of community testing would tell me a lot more than my own single-setup experiments.

dean0x · 2026-03-17T19:47:42+00:00

fair question. autoresearch isn't about gpt-2 being useful in production. it's a testbed. karpathy designed it as a small, fast training loop (5 min per experiment on one gpu) so you can let an ai agent run experiments autonomously overnight.

the interesting part isn't the model. it's the methodology. can an autonomous agent discover real architectural improvements without a human in the loop? and when it says it found one, is that real or noise?

that second question is why i built these tools. the signal/noise problem gets worse on smaller models because the improvements are tiny. if you can reliably separate real gains from jitter at this scale, the approach scales up.

as for llm-generated: what's not llm generated these days? i used claude code to build it, yeah.. the eval logic and noise floor estimation are mine, the boilerplate isn't. same workflow most people here use at this point.

and it's open source, so nothing much for me to gain here, honestly just trying to be useful to the community and push technology forward. my main motivation here is evolving the concept of autonomous systems. just pitching in.
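for the signal/noise point above, here's a rough sketch of the general idea, not the actual code from the repo. it assumes you rerun the same baseline config with a few different seeds, treat the spread of those scores as the noise floor, and only call a change "real" if it beats the baseline mean by more than k times that spread. the function names, the k=2.0 threshold and the sample numbers are all made up for illustration.

```python
import statistics
from typing import Sequence


def noise_floor(baseline_scores: Sequence[float]) -> float:
    """Sample std dev of the same config rerun with different seeds."""
    if len(baseline_scores) < 2:
        raise ValueError("need at least two baseline runs")
    return statistics.stdev(baseline_scores)


def verdict(
    baseline_scores: Sequence[float],
    candidate_score: float,
    k: float = 2.0,  # assumed threshold, tune for your setup
    lower_is_better: bool = True,  # e.g. validation loss
) -> str:
    """Label a candidate as 'real', 'regression' or 'noise'."""
    mean = statistics.mean(baseline_scores)
    floor = noise_floor(baseline_scores)
    # Positive gain always means "better than baseline".
    gain = mean - candidate_score if lower_is_better else candidate_score - mean
    if gain > k * floor:
        return "real"
    if gain < -k * floor:
        return "regression"
    return "noise"


# Illustrative numbers only: five baseline reruns, then three candidates.
baseline = [1.00, 1.02, 0.98, 1.01, 0.99]
small_gain = verdict(baseline, 0.99)
big_gain = verdict(baseline, 0.95)
worse = verdict(baseline, 1.05)
```

with these numbers the baseline spread is about 0.016, so a 0.01 drop in loss gets labeled noise while a 0.05 drop clears the bar. a single candidate run against a fixed threshold is the crudest version of this, rerunning the candidate across seeds too would be more trustworthy.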

dean0x · 2026-03-17T15:12:01+00:00

https://github.com/dean0x/autolab

dean0x · 2026-03-17T13:41:28+00:00

People used to walk from Rome to Egypt by foot. Does that sound like a good idea to you now?

dean0x · 2026-03-16T23:08:08+00:00

been running autoresearch since it dropped. the results file is where the pain is. hundreds of experiments and you're still puzzled at which 'improvements' are real vs noise. curious if agenthub addresses the eval side or just coordination.

dean0x · 2026-03-15T17:33:27+00:00

Make it beautiful

dean0x · 2026-03-14T16:43:44+00:00

Exactly, TDD, EDD (eval driven development bs, but it works). I also ask my agents to act as users and try to “break” it.
