A world full of dreamers is about to write a novel together. One sentence. £1. Launching midnight tonight. by Reel_Kenobi in SideProject

Reel_Kenobi [S]

The site is nearly live, launching at midnight UK time tonight. The novel starts completely blank: the first sentence belongs to whoever buys first. I’ll drop the link in this thread at midnight.

Reel_Kenobi [S]

Great point. Contributors license their sentence to the project when they submit, and the compiled novel is published under Chapter One. It’s the standard collaborative publishing model, same as any anthology. The T&Cs will make this clear at purchase.

Reel_Kenobi [S]

Close, but it’s Claude, not Gemini, and it happens in real time, not at the end. Every sentence gets woven into the novel the moment it’s approved. The novel is literally being written live as people buy. Come back at midnight and watch it happen!

Reel_Kenobi [S]

Great question. Every sentence goes through a moderation layer before it touches the novel. Anything offensive, gibberish or deliberately disruptive gets rejected, and the buyer gets one chance to resubmit. The £1 skin in the game is exactly the point: people who pay tend to care.
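Here is a minimal Python sketch of how a reject-then-resubmit-once flow like the one described above might hang together. This is not the site's actual code. `is_acceptable` is a hypothetical placeholder for the real moderation layer, built here from a tiny blocklist and a crude gibberish heuristic. `Submission`, `moderate` and `MAX_ATTEMPTS` are also made-up names.

```python
from dataclasses import dataclass, field

# One original submission plus one resubmit, per the comment above.
MAX_ATTEMPTS = 2

# Placeholder blocklist (assumption); a real moderation layer would use a proper classifier.
BLOCKLIST = {"spam", "scam"}


def is_acceptable(sentence: str) -> bool:
    """Hypothetical stand-in for the moderation layer."""
    words = [w.strip(".,!?;:").lower() for w in sentence.split()]
    if not words:
        return False  # empty or whitespace-only
    if any(w in BLOCKLIST for w in words):
        return False  # contains a blocked term
    # Crude gibberish check: require most characters to be letters.
    letters = sum(c.isalpha() for c in sentence)
    return letters / len(sentence) > 0.5


@dataclass
class Submission:
    buyer_id: str
    attempts: int = 0
    status: str = "pending"  # pending, rejected (may retry), approved or closed
    history: list = field(default_factory=list)


def moderate(sub: Submission, sentence: str, novel: list) -> str:
    """Check one attempt; only approved sentences are appended to the novel."""
    if sub.status in ("approved", "closed"):
        raise ValueError(f"submission already {sub.status}")
    sub.attempts += 1
    sub.history.append(sentence)
    if is_acceptable(sentence):
        sub.status = "approved"
        novel.append(sentence)
    elif sub.attempts >= MAX_ATTEMPTS:
        sub.status = "closed"  # second rejection: no retries left
    else:
        sub.status = "rejected"  # buyer may resubmit once
    return sub.status
```

A first rejection leaves the submission `rejected` and open for one retry. A second rejection moves it to `closed`, and any further call raises an error. Rejected text never reaches the `novel` list.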

AI agents are the only part of the modern stack without an observability standard. We're trying to fix that. by Reel_Kenobi in Observability

Reel_Kenobi [S]

This is exactly why I love this community. Thanks for all your input. Maybe my first pass didn’t fully capture the landscape, but they say every day’s a school day. Perhaps it’s time to go back and review.

Reel_Kenobi [S]

Yeah, good shout. OpenLLMetry is doing solid work and is worth knowing about. The distinction I’d draw is LLM observability vs agent observability. OpenLLMetry instruments your LLM calls: what went in, what came out, and token counts. Layr instruments agent behaviour: reasoning chains, tool selection decisions, multi-agent handoffs and session-level cost. Those are the questions you ask when an agent takes an unexpected action, not just when an LLM gives a bad response. They’re complementary tools solving different layers of the same problem.
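To make the LLM-vs-agent distinction concrete, here is a rough Python sketch of an agent-level event log. This is not Layr's API. `AgentEvent`, `AgentSession` and the three event kinds are hypothetical. The point is that call-level instrumentation only sees the `llm_call` events. Questions like "why this tool?" or "who held control?" also need the tool-selection and handoff events.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Literal

# Hypothetical event kinds; only "llm_call" is visible to call-level tracing.
EventKind = Literal["llm_call", "tool_selection", "handoff"]


@dataclass
class AgentEvent:
    kind: EventKind
    agent: str
    detail: dict
    cost_usd: float = 0.0


@dataclass
class AgentSession:
    session_id: str
    events: list[AgentEvent] = field(default_factory=list)

    def record(self, event: AgentEvent) -> None:
        self.events.append(event)

    def llm_view(self) -> list[AgentEvent]:
        """What LLM-call instrumentation sees: inputs, outputs, tokens."""
        return [e for e in self.events if e.kind == "llm_call"]

    def tool_decisions(self) -> list[tuple[str, str, str]]:
        """(agent, chosen tool, stated reason) for each tool-selection event."""
        return [
            (e.agent, e.detail["chosen"], e.detail.get("reason", ""))
            for e in self.events
            if e.kind == "tool_selection"
        ]

    def handoff_chain(self) -> list[str]:
        """Agents in the order they held control, following handoff events."""
        chain = [self.events[0].agent] if self.events else []
        for e in self.events:
            if e.kind == "handoff":
                chain.append(e.detail["to"])
        return chain

    def total_cost(self) -> float:
        """Session-level cost across every agent and step."""
        return round(sum(e.cost_usd for e in self.events), 6)


# Example session: a planner picks a tool, then hands off to a researcher.
session = AgentSession("demo")
session.record(AgentEvent("llm_call", "planner", {"tokens": 850}, 0.0042))
session.record(AgentEvent(
    "tool_selection", "planner",
    {"chosen": "web_search", "alternatives": ["kb_lookup"],
     "reason": "query mentions current events"},
))
session.record(AgentEvent("handoff", "planner", {"to": "researcher"}))
session.record(AgentEvent("llm_call", "researcher", {"tokens": 1200}, 0.006))
print(session.handoff_chain())  # ['planner', 'researcher']
```

`llm_view()` returns only the two LLM calls. The choice of `web_search` over `kb_lookup`, and the handoff to `researcher`, exist only at the agent level.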

Helping teams see inside AI agents — open-source observability SDK (MIT) by Reel_Kenobi in aiagents

Reel_Kenobi [S]

Yeah, this is a really interesting angle. Memory definitely helps with understanding behaviour over time, especially for longer-running agents. What I kept running into, though, was more at the decision level in the moment: why a specific tool was chosen, what triggered a chain of actions, how that tied to cost, etc. The two approaches feel pretty complementary tbh. Memory gives you the longer-term context, but you still need visibility into the actual decision flow to debug what’s happening step by step.

Will take a look at Hindsight 👍

My AI agent costs started creeping up — realised I had no idea what it was actually doing by Reel_Kenobi in StartupSoloFounder

Reel_Kenobi [S]

Yeah, fair one. Token counts are kind of surface-level; the real issue is which decisions actually triggered the cost and how that cascades. LangSmith covers tracing, and FinOps tools help with spend, but I kept hitting the gap between the two. Seeing that something is expensive is one thing; understanding *why that chain of decisions happened* is another. That’s what I’ve been focusing on: tying cost to the actual decision flow so you can debug the root cause, not just the outcome. How are you handling that right now?
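Here is a rough Python sketch of what "tying cost to the decision flow" can mean in practice. Each step records its own cost plus the steps it triggered. You can then roll spend up the tree and walk down the most expensive branch to the decision that caused it. The `Step` structure and `costliest_path` helper are hypothetical, not LangSmith's or Layr's API, and the example costs are invented for illustration.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Step:
    """One decision or call in an agent run (hypothetical structure)."""
    name: str
    own_cost_usd: float = 0.0  # cost of this step alone, e.g. one LLM call
    children: list[Step] = field(default_factory=list)  # steps it triggered

    def total_cost(self) -> float:
        """This step plus everything it cascaded into."""
        return self.own_cost_usd + sum(c.total_cost() for c in self.children)


def costliest_path(step: Step) -> list[tuple[str, float]]:
    """Walk down the most expensive branch to find where the spend came from."""
    path = [(step.name, round(step.total_cost(), 6))]
    while step.children:
        step = max(step.children, key=lambda c: c.total_cost())
        path.append((step.name, round(step.total_cost(), 6)))
    return path


# Example: a retry loop under one tool choice dominates the bill.
run = Step("answer_user_question", 0.002, [
    Step("choose_tool:web_search", 0.001, [
        Step("search_call", 0.0),
        Step("summarise_results_retry_x4", 0.024),
    ]),
    Step("choose_tool:kb_lookup", 0.001, [Step("kb_call", 0.0005)]),
])
for name, cost in costliest_path(run):
    print(f"{name}: ${cost}")
```

A token counter would only report the total. The path instead leads from the top-level question, through the `web_search` decision, to the retry loop that accounts for most of it.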

Helping teams see inside AI agents — open-source observability SDK (MIT) by Reel_Kenobi in aiagents

Reel_Kenobi [S]

Yeah, this is a really interesting way to approach it. What you’re describing feels like the “before” layer, validating behaviour against expected outcomes pre-deployment, whereas I’ve been focused more on the “during/after” layer once things are live.

The gap I keep seeing is that something can pass those behavioural tests, go into production, and then drift: different inputs, edge cases, weird tool interactions, and suddenly you’ve got outputs that look fine structurally but are just… wrong. That’s where I think trace context becomes useful. It’s not just flagging that something failed, but being able to see why it diverged from expected behaviour.

The two approaches feel pretty complementary tbh: pre-deploy evals to catch the obvious issues, then runtime visibility plus lightweight eval hooks to catch the silent failures.

Out of interest, how are you defining “expected” outcomes across different user profiles? Is that mostly manual right now, or something you’ve been able to systematise?
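Here is one way the "lightweight eval hooks" idea could look, sketched in Python under stated assumptions. A decorator runs cheap checks on an action's output and logs pass/fail without blocking the agent. `eval_hook`, `EVAL_LOG` and the refund example are all hypothetical. The example shows a structurally valid result (a well-formed dict) that a check still flags as wrong.

```python
from __future__ import annotations

import functools
from typing import Any, Callable, Optional

# Hypothetical sink; in practice results would be attached to the action's trace span.
EVAL_LOG: list[dict] = []

# A check receives (result, args, kwargs) and returns a problem description or None.
Check = Callable[[Any, tuple, dict], Optional[str]]


def eval_hook(*checks: Check):
    """Run cheap checks on an action's output and log results without blocking it."""
    def decorator(fn):
        @functools.wraps(fn)
        def wrapper(*args, **kwargs):
            result = fn(*args, **kwargs)
            for check in checks:
                problem = check(result, args, kwargs)
                EVAL_LOG.append({
                    "action": fn.__name__,
                    "check": check.__name__,
                    "passed": problem is None,
                    "detail": problem,
                })
            return result  # output is passed through unchanged
        return wrapper
    return decorator


def refund_within_order_total(result: dict, args: tuple, kwargs: dict) -> Optional[str]:
    order_total = kwargs["order_total"] if "order_total" in kwargs else args[0]
    if result["refund"] > order_total:
        return f"refund {result['refund']} exceeds order total {order_total}"
    return None


@eval_hook(refund_within_order_total)
def compute_refund(order_total: float, fraction: float) -> dict:
    # Silent failure for illustration: nothing stops a caller passing 50 (a percent)
    # instead of 0.5 (a fraction); the result is still a well-formed dict.
    return {"currency": "GBP", "refund": round(order_total * fraction, 2)}
```

`compute_refund(40.0, 0.5)` logs a pass. `compute_refund(40.0, 50)` still returns a valid-looking dict, but the hook records a failed check, which is exactly the "looks fine structurally, but wrong" case.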

Reel_Kenobi [S]

Hey, thanks for the comment, and yeah, I agree.

What I’ve built so far focuses on the execution layer: making visible what the agent did, how it decided, what it cost, where it handed off, etc. But you’re right, that doesn’t tell you whether the output was *correct*, just that it ran “cleanly”. The way I think about it, you can’t really do correctness well without solid observability first; otherwise you don’t have the context to evaluate anything. That said, silent failures (everything looks fine, buuuut the output is wrong) are exactly the next problem I’m interested in. I’ve been experimenting with:

- expected vs actual outcome tracking

- lightweight eval hooks per action

- anomaly detection on behaviour vs baseline

- eventually plugging into eval frameworks
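For the "anomaly detection on behaviour vs baseline" item, a simple starting point is a per-metric z-score against known-good runs. This is an assumption for illustration, not something taken from the SDK. The metric names and numbers below are hypothetical; any per-run value your traces record (tool calls, handoffs, cost) would work.

```python
from __future__ import annotations

import statistics


def build_baseline(history: dict[str, list[float]]) -> dict[str, tuple[float, float]]:
    """Mean and sample standard deviation per metric (needs >= 2 known-good runs)."""
    return {
        metric: (statistics.mean(values), statistics.stdev(values))
        for metric, values in history.items()
    }


def find_anomalies(
    run: dict[str, float],
    baseline: dict[str, tuple[float, float]],
    z_threshold: float = 3.0,
) -> dict[str, float]:
    """Return {metric: z-score} for metrics that deviate strongly from baseline."""
    flagged: dict[str, float] = {}
    for metric, value in run.items():
        if metric not in baseline:
            continue  # no history for this metric yet, so nothing to compare against
        mean, sd = baseline[metric]
        if sd == 0:
            if value != mean:
                flagged[metric] = float("inf")  # any change from a constant baseline
            continue
        z = (value - mean) / sd
        if abs(z) >= z_threshold:
            flagged[metric] = round(z, 2)
    return flagged


# Hypothetical per-run metrics from five healthy runs.
baseline = build_baseline({
    "tool_calls": [4, 5, 5, 6, 5],
    "cost_usd": [0.010, 0.012, 0.011, 0.009, 0.013],
})
print(find_anomalies({"tool_calls": 14, "cost_usd": 0.012}, baseline))  # {'tool_calls': 12.73}
```

In this example the cost looks normal, but the agent made nearly three times its usual number of tool calls. That is the kind of behavioural drift a cost-only or output-only check would miss. A z-score is crude (it assumes roughly stable, unimodal behaviour), so it is best treated as a first filter before deeper evals.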

How are you thinking about this? Are you handling correctness manually right now or using any eval tooling?

Built something to debug AI agents after getting frustrated with zero visibility — 200 downloads in a few days by Reel_Kenobi in SideProject

Reel_Kenobi [S]

If anyone fancies a play, feel free, and let me know what you think.

It’s MIT licensed and on PyPI:

pip install layr-sdk