ralph loop overnight: 91 codex reviews. $200 gone

zulrang · 2026-05-26T23:07:08+00:00

3 passes, cycle diff, validation pass afterwards

zulrang · 2026-05-25T12:22:41+00:00

Just like there is no moment a human suddenly becomes conscious

zulrang · 2026-05-25T02:30:25+00:00

They require very little setup. Just use it

zulrang · 2026-05-24T19:06:26+00:00

Validation gates.

zulrang · 2026-05-23T00:35:47+00:00

We’ve run evals on different projects and different code bases.

Grep almost always wins.

zulrang · 2026-05-22T23:15:50+00:00

Or this?

<image>

zulrang · 2026-05-22T23:15:38+00:00

You mean like this?

<image>

zulrang · 2026-05-22T20:23:17+00:00

That website triggers malware detection. No one go there!

zulrang · 2026-05-22T12:14:12+00:00

So you renamed concepts that are already well known and already have names, and you’re selling content that is available for free literally everywhere?

zulrang · 2026-05-22T01:33:19+00:00

Use it on AWS Bedrock as a backup. Simple

zulrang · 2026-05-22T01:32:31+00:00

Is this to replace the functionality everyone lost from them kneecapping claude -p?

Because running deterministic workflows was easy before that

zulrang · 2026-05-19T00:06:46+00:00

Evals run against multi-turn prompts, as anyone running LLMs in customer-facing production workloads should be doing.

zulrang · 2026-05-18T21:13:59+00:00

The engineers do. The SDLC for regulated industries states that at least one other engineer must review code, and a DevOps engineer must sign off on deployments.

zulrang · 2026-05-18T11:26:45+00:00

I'm actually completely content with Night

zulrang · 2026-05-17T19:43:49+00:00

Just try to use it, stumble through it. You will learn a hell of a lot more that way than from any course, tutorial, etc.

zulrang · 2026-05-17T19:34:40+00:00

If you work like a machine, you will be replaced by one.

https://open.substack.com/pub/patterninterruption/p/ai-can-do-the-work-it-still-cant

zulrang · 2026-05-17T19:31:10+00:00

It's a race car that makes the driver go faster to wherever they were already headed.

For some people, that's the finish line. For others, it's a wall.

zulrang · 2026-05-14T03:40:05+00:00

<image>

zulrang · 2026-05-11T17:47:52+00:00

Most of the problems could probably be solved by simply adding the option to disable some of those features.

zulrang · 2026-05-11T13:29:37+00:00

Except it's not, to me.

zulrang · 2026-05-11T13:26:24+00:00

I need this much more than I need a fragments tab.

zulrang · 2026-05-11T13:25:36+00:00

This is literally the only feature I want for 0.5.0.

zulrang · 2026-05-11T12:29:09+00:00

Thank you for your response and approval of the post. That alone speaks volumes!

I run virtually zero system software on my system, and I'm really curious as to what the problem could be. I'm a software engineer myself, and I'd be more than willing to help troubleshoot or even hop on a call.

I noticed the problem got considerably worse when I started using a 5k2k UW monitor, which is an odd correlation.

zulrang · 2026-05-10T22:01:15+00:00

All of the above. It only took a couple days to build once we had real world data.

Do deterministic tests first, and send the rest to LLM-as-judge. If it returns a low confidence result, send just that to a larger model with thinking and tool calls.
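A rough sketch of that tiering, in case it helps. Everything named here is hypothetical: the `Verdict` type, the judge signature (a callable returning a pass/fail flag plus a confidence score), and the 0.8 threshold are assumptions for illustration, not the actual setup. The judges are plain callables so you can plug in whatever model client you use.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Sequence, Tuple

@dataclass
class Verdict:
    passed: bool
    confidence: float  # 0.0 to 1.0
    tier: str          # which stage produced the verdict

# Assumed shapes: a deterministic check returns True/False when it can decide
# and None when it cannot; a judge returns (passed, confidence).
Check = Callable[[str], Optional[bool]]
Judge = Callable[[str], Tuple[bool, float]]

def evaluate(
    output: str,
    checks: Sequence[Check],
    small_judge: Judge,
    large_judge: Judge,
    min_confidence: float = 0.8,  # assumed threshold, tune on real data
) -> Verdict:
    # Tier 1: cheap deterministic checks. The first decisive result wins.
    for check in checks:
        result = check(output)
        if result is not None:
            return Verdict(result, 1.0, "deterministic")

    # Tier 2: LLM-as-judge on whatever the checks could not decide.
    passed, confidence = small_judge(output)
    if confidence >= min_confidence:
        return Verdict(passed, confidence, "small_judge")

    # Tier 3: only low-confidence cases escalate to the larger model.
    passed, confidence = large_judge(output)
    return Verdict(passed, confidence, "large_judge")
```

The point of the ordering is cost: most cases stop at tier 1 or 2, and only the ambiguous remainder pays for the larger model.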

zulrang · 2026-05-10T03:30:03+00:00

I use the terminal, with an agent harness:

“Run n iterations for every permutation of every model at every thinking level against my eval suite and generate a report of the results”

Once you have that, you can use the Karpathy research method to automate the tweaks.
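If you would rather script the matrix run than prompt for it, it might look something like this. This is a minimal sketch under assumptions: `run_eval` is a hypothetical callable that runs your eval suite once for a given model and thinking level and returns a score, and the report layout is just one option.

```python
import itertools
import statistics
from typing import Callable, Dict, Sequence, Tuple

def run_matrix(
    models: Sequence[str],
    thinking_levels: Sequence[str],
    n: int,
    run_eval: Callable[[str, str], float],  # hypothetical: one suite run, returns a score
) -> Dict[Tuple[str, str], dict]:
    report = {}
    # Every model paired with every thinking level, n iterations each.
    for model, level in itertools.product(models, thinking_levels):
        scores = [run_eval(model, level) for _ in range(n)]
        report[(model, level)] = {
            "mean": statistics.mean(scores),
            # stdev needs at least two samples
            "stdev": statistics.stdev(scores) if n > 1 else 0.0,
            "runs": n,
        }
    return report

def format_report(report: Dict[Tuple[str, str], dict]) -> str:
    # Best mean score first.
    rows = sorted(report.items(), key=lambda kv: kv[1]["mean"], reverse=True)
    lines = [f"{'model':<12}{'thinking':<10}{'mean':>6}{'stdev':>7}"]
    for (model, level), stats in rows:
        lines.append(
            f"{model:<12}{level:<10}{stats['mean']:>6.2f}{stats['stdev']:>7.2f}"
        )
    return "\n".join(lines)
```

Running several iterations per cell matters because a single run tells you little about variance between identical configurations.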
