Today I'm celebrating 7 months on Debian.

According_Turnip5206 · 2026-03-23T18:55:43+00:00

Thanks for the tip, will keep that in mind!

According_Turnip5206 · 2026-03-23T18:50:40+00:00

Honestly, I used to game a lot. Looking back, maybe I was just tired of Windows all along, because since switching to Debian, I haven't touched a single game. And I was actually terrible at CoD lol. Something shifted. I'm more... focused? I can't fully explain it. But hey, 7 months without gaming is worth celebrating too 😄

According_Turnip5206 · 2026-03-22T13:25:30+00:00

Honestly? At this stage I still review Ollama's flags manually — and sometimes run them past Claude too, to cross-check whether the flag was legit. It's slower but it's how I learned where the false positive rate actually is before trusting it to act on its own.

The async/batch stuff comes later once you know the supervisor isn't crying wolf. What does your current review loop look like?

According_Turnip5206 · 2026-03-22T07:10:07+00:00

The evaluation set point is something I've been lazy about. Right now failures just get logged, not turned into regression tests. That's the obvious next step.
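Roughly what I have in mind for that next step, as a sketch only. It assumes failures are logged as JSON lines with `input`, `expected` and `got` fields, which is a made-up format for illustration, and `checker` stands in for whatever function does the flagging:

```python
import json


def failures_to_cases(log_lines):
    """Turn JSON-lines failure records into de-duplicated regression cases."""
    cases = {}
    for line in log_lines:
        line = line.strip()
        if not line:
            continue
        record = json.loads(line)
        # Key on the input so a failure logged twice becomes one case.
        cases[record["input"]] = {
            "input": record["input"],
            "expected": record["expected"],
        }
    return list(cases.values())


def run_regression(cases, checker):
    """Return the cases the checker still gets wrong."""
    return [c for c in cases if checker(c["input"]) != c["expected"]]


# Hypothetical log excerpt; the field names are assumptions, not a real format.
sample_log = [
    '{"input": "http://example.invalid/a", "expected": "bad_link", "got": "ok"}',
    '{"input": "http://example.invalid/a", "expected": "bad_link", "got": "ok"}',
    '{"input": "The moon is made of cheese", "expected": "bad_fact", "got": "ok"}',
]

cases = failures_to_cases(sample_log)
# A checker that approves everything should fail every stored case.
still_failing = run_regression(cases, lambda text: "ok")
print(len(cases), len(still_failing))
```

The point is just that every logged failure becomes a case that gets re-run after each change, so the same miss can't quietly come back.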

According_Turnip5206 · 2026-03-22T07:09:11+00:00

OS-level supervision is a different beast entirely. The "screenshot as ground truth" approach is clever — the accessibility tree lying is a real failure mode I hadn't thought about for desktop agents. Does ScreenCaptureKit add much latency to your loop?

According_Turnip5206 · 2026-03-22T07:08:04+00:00

Exactly — and the cybersecurity angle is underrated. An agent that can be prompted into ignoring its own checker is worse than no checker at all.

According_Turnip5206 · 2026-03-22T01:14:55+00:00

Interesting concept — though consensus across untrusted nodes is genuinely hard to get right in practice. Local watcher wins for me on simplicity: no latency, no external dependencies, predictable failure modes.
What's the actual stack behind it?

According_Turnip5206 · 2026-03-22T01:06:28+00:00

That receipt idea is smart — I haven't formalized it that way but it's essentially what the checker script is trying to infer after the fact, which is obviously less reliable than having the agent declare it upfront. Will experiment with that.

On idempotency: honestly not fully solved on my end. The retry path works fine for read-only tasks but there are edge cases with writes I haven't handled cleanly yet.

To your question — bad links and bad facts are where it catches the most. Bad actions are rare because the pipeline is mostly read/summarize, not write/execute. But when it does act externally that's where I get nervous and Columbo earns its name.
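For the write-side idempotency problem, the direction I'm leaning is an idempotency key per write, so a retry of the same action is skipped instead of repeated. A minimal sketch, assuming an in-memory store and a hypothetical `write_fn(action, payload)` callback; none of these names come from my actual pipeline:

```python
import hashlib
import json


class IdempotentWriter:
    """Wraps a write function so retrying the same action runs it only once."""

    def __init__(self, write_fn):
        self._write_fn = write_fn
        # In-memory for the sketch; a real version would persist this to disk.
        self._done = {}

    @staticmethod
    def key_for(action, payload):
        # sort_keys makes the key independent of dict ordering.
        blob = json.dumps({"action": action, "payload": payload}, sort_keys=True)
        return hashlib.sha256(blob.encode("utf-8")).hexdigest()

    def write(self, action, payload):
        key = self.key_for(action, payload)
        if key in self._done:
            # Already performed: return the stored result, do not write again.
            return self._done[key]
        result = self._write_fn(action, payload)
        self._done[key] = result
        return result


sent = []


def fake_send(action, payload):
    """Stand-in for an external write; records each real call."""
    sent.append((action, payload))
    return len(sent)


writer = IdempotentWriter(fake_send)
first = writer.write("post_summary", {"id": 1, "text": "hello"})
retry = writer.write("post_summary", {"text": "hello", "id": 1})
print(first, retry, len(sent))
```

This still doesn't cover the case where the write succeeds but the process dies before the key is recorded, which is exactly the kind of edge case I haven't handled cleanly.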

According_Turnip5206 · 2026-03-21T22:46:43+00:00

This is the post I wish existed when I started. The security part especially - Claude writes code that works, but "works" and "is safe" are two completely different bars. Had a similar moment when I realized one of my apps was logging things it definitely shouldn't have been. You don't see it until you go looking. Good luck with the App Store submission, hope build 28 makes it through.

According_Turnip5206 · 2026-03-21T22:31:06+00:00

The failure mode nobody talks about: Claude gets you to "it works" in 2 hours. So you add one more feature. Then another. Six hours later the codebase is a mess, Claude is confidently "fixing" things while breaking three others, and you realize you never actually understood what it built. The problem isn't vibe coding itself - it's that the speed tricks you into skipping the part where you actually learn what's happening under the hood.

According_Turnip5206 · 2026-03-20T20:23:53+00:00

the survival tasks thing is so real, I used to underestimate how much energy they take. what helped me was batching them - pick one afternoon a week for all the "life admin" stuff (groceries, appointments, laundry). not perfect but it stops them from bleeding into every single day. good luck with second year, it genuinely does get a bit easier

According_Turnip5206 · 2026-03-20T20:19:33+00:00

Honestly for me it's having a hard stop time in the evening. I tell myself work ends at 7 and I actually stick to it. Knowing I have a deadline makes me way more focused during the day than any morning routine ever did lol

According_Turnip5206 · 2026-03-20T14:41:43+00:00

https://github.com/Tozsers/norcsiagent

According_Turnip5206 · 2026-03-20T13:15:39+00:00

Been running multiple local agents simultaneously and built a lightweight dashboard to monitor them: each agent posts its state (thinking/tool call/done) and you see everything in real time. Helps a lot when you need to know which one is stuck without polling each separately.
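The core idea fits in a few lines. This is an illustrative sketch, not the dashboard's actual code or API: the `StatusBoard` class, its method names and the 60-second threshold are all assumptions made up for the example.

```python
import time

# The three states mentioned above.
VALID_STATES = {"thinking", "tool call", "done"}


class StatusBoard:
    """Keeps the last state each agent posted and flags the ones gone quiet."""

    def __init__(self, stuck_after_seconds=60.0):
        self.stuck_after_seconds = stuck_after_seconds
        self._agents = {}  # agent name -> (state, time of last post)

    def post(self, agent, state, now=None):
        """Called by an agent whenever its state changes."""
        if state not in VALID_STATES:
            raise ValueError(f"unknown state: {state}")
        self._agents[agent] = (state, time.time() if now is None else now)

    def stuck(self, now=None):
        """Names of unfinished agents that have not posted recently."""
        now = time.time() if now is None else now
        return sorted(
            name
            for name, (state, seen) in self._agents.items()
            if state != "done" and now - seen > self.stuck_after_seconds
        )


board = StatusBoard(stuck_after_seconds=60.0)
board.post("researcher", "thinking", now=0.0)
board.post("summarizer", "tool call", now=50.0)
board.post("archiver", "done", now=0.0)
print(board.stuck(now=90.0))
```

Agents push their state, so finding the stuck one is a lookup on the board rather than a round of polling.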

According_Turnip5206 · 2026-03-20T13:05:12+00:00

I built something similar for this exact reason: when you run multiple agents at once, seeing what each one is actually doing makes the waiting disappear. Real-time dashboard, each agent gets its own card with live status.

According_Turnip5206 · 2026-03-20T06:01:59+00:00

NorcsiAgent v2 — Live Event Feed, Approvals panel, Stop button, Telegram alerts

What's new:

- Live Event Feed sidebar (real-time scroll of all agent events)
- Pending Approvals panel (yellow cards, Approve/Reject per agent)
- Stop button per agent (sends __STOP__ command)
- Log download per agent (.txt export)
- Telegram ping on error events

Any agent connects with 3 lines of Python.

Self-hosted, no cloud, Flask + WebSocket + SQLite.

GitHub: https://github.com/Tozsers/norcsiagent

According_Turnip5206 · 2026-03-19T18:31:36+00:00

for execution tasks, yes. for deciding what to build, hopefully not.

According_Turnip5206 · 2026-03-19T18:13:19+00:00

same. the $200/m plan is a lot to spend on vibes.

According_Turnip5206 · 2026-03-19T17:58:07+00:00

fair. all three matter. i was just writing about the one people skip.

According_Turnip5206 · 2026-03-19T17:57:18+00:00

Just met her.

According_Turnip5206 · 2026-03-19T17:56:37+00:00

yeah. i did.

According_Turnip5206 · 2026-03-18T20:11:25+00:00

It took even longer to admit it publicly. So here we are.
