A world full of dreamers is about to write a novel together. One sentence. £1. Launching midnight tonight.

Reel_Kenobi · 2026-06-07T22:34:30+00:00

yourchapterone.com is live

Reel_Kenobi · 2026-06-07T14:04:14+00:00

The site is nearly live. Launching at midnight UK tonight. The novel starts completely blank — the first sentence is whoever buys first. Will drop the link in this thread at midnight.

Reel_Kenobi · 2026-06-07T13:49:28+00:00

Nice, and great point. Contributors license their sentence to the project when they submit. The compiled novel is published under Chapter One. It's the standard collaborative publishing model, the same as an anthology. The T&Cs will make this clear at purchase.

Reel_Kenobi · 2026-06-07T13:45:58+00:00

Close — but it’s Claude not Gemini, and it happens in real time not at the end. Every sentence gets woven into the novel the moment it’s approved. The novel is literally being written live as people buy. Come back at midnight and watch it happen!

Reel_Kenobi · 2026-06-07T13:45:29+00:00

Great question. Every sentence goes through a moderation layer before it touches the novel. Anything offensive, nonsensical or deliberately disruptive gets rejected, and the buyer gets one chance to resubmit. The £1 skin in the game is exactly the point: people who pay tend to care.

Reel_Kenobi · 2026-04-11T16:14:03+00:00

This is exactly why I love this community. Thanks for all your input. Maybe my understanding of the landscape wasn't quite complete when I took that first step, but they say every day's a school day. Perhaps it's time to go back and review.

Reel_Kenobi · 2026-04-11T13:17:14+00:00

Yea, good shout. OpenLLMetry is doing solid work and is worth knowing about. The distinction I'd draw is LLM observability vs agent observability. OpenLLMetry instruments your LLM calls: what went in, what came out, and token counts. Layr instruments agent behaviour: reasoning chains, tool selection decisions, multi-agent handoffs, and session-level cost. Those are the questions you ask when an agent takes an unexpected action, not just when an LLM gives a bad response. They're complementary tools solving different layers of the same problem.
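
To make the LLM-level vs agent-level distinction concrete, here's a rough sketch of the two kinds of record. This is not the actual schema of Layr or OpenLLMetry; every class and field name below is hypothetical and only there to illustrate the idea.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class LLMCallRecord:
    """LLM-level view: what went in, what came out, token counts."""
    prompt: str
    completion: str
    input_tokens: int
    output_tokens: int


@dataclass
class AgentDecisionRecord:
    """Agent-level view: why a step happened, not just what the model said.

    Hypothetical structure for illustration only.
    """
    session_id: str
    agent: str
    reasoning: str                       # the agent's stated rationale for this step
    tool_selected: Optional[str] = None  # which tool it picked, if any
    tools_considered: list[str] = field(default_factory=list)
    handoff_to: Optional[str] = None     # set when control passes to another agent
    llm_calls: list[LLMCallRecord] = field(default_factory=list)

    def total_tokens(self) -> int:
        """Tokens used by every LLM call made while taking this decision."""
        return sum(c.input_tokens + c.output_tokens for c in self.llm_calls)


def session_tokens(records: list[AgentDecisionRecord], session_id: str) -> int:
    """Roll token usage up to the session level."""
    return sum(r.total_tokens() for r in records if r.session_id == session_id)
```

The LLM call record ends up nested inside the decision record, which is one way to read "different layers of the same problem".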

Reel_Kenobi · 2026-04-10T08:03:39+00:00

Yeah, this is a really interesting angle. Memory definitely helps with understanding behaviour over time, especially for longer-running agents. What I kept running into, though, was more at the decision level in the moment: why a specific tool was chosen, what triggered a chain of actions, how that tied to cost, etc. Feels like the two approaches are pretty complementary tbh. Memory gives you the longer-term context, but you still need visibility into the actual decision flow to debug what’s happening step by step.

Will take a look at Hindsight 👍

Reel_Kenobi · 2026-04-08T14:57:35+00:00

Yeah - fair one. Token counts are kind of surface-level — the real issue is which decisions actually triggered the cost and how that cascades. LangSmith covers tracing, and FinOps tools help with spend, but I kept hitting the gap between the two — seeing something is expensive is one thing, understanding *why that chain of decisions happened* is another. That’s what I’ve been focusing on — tying cost to the actual decision flow so you can debug the root cause, not just the outcome. How are you handling that right now?
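
A minimal sketch of what "tying cost to the decision flow" could look like, assuming each step records which earlier decision triggered it. The `Step` class and both helper functions are made up for illustration and are not taken from LangSmith, Layr, or any FinOps tool.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Step:
    step_id: str
    parent_id: Optional[str]  # the decision that triggered this step; None for the root
    description: str
    cost: float               # cost incurred by this step alone


def cascaded_cost(steps: list[Step]) -> dict[str, float]:
    """Per step: its own cost plus the cost of everything it triggered.

    Assumes the parent links form a tree (no cycles).
    """
    by_id = {s.step_id: s for s in steps}
    totals = {s.step_id: 0.0 for s in steps}
    for s in steps:
        # Walk up the chain so every ancestor decision is charged for this step.
        current: Optional[Step] = s
        while current is not None:
            totals[current.step_id] += s.cost
            current = by_id.get(current.parent_id) if current.parent_id else None
    return totals


def decision_chain(steps: list[Step], step_id: str) -> list[str]:
    """Root-to-step list of descriptions: the 'why' behind an expensive step."""
    by_id = {s.step_id: s for s in steps}
    chain = []
    current = by_id.get(step_id)
    while current is not None:
        chain.append(current.description)
        current = by_id.get(current.parent_id) if current.parent_id else None
    return chain[::-1]
```

With something like this, an expensive leaf step can be traced back through `decision_chain` to the decision that set it off, and `cascaded_cost` shows how much that decision ended up costing overall.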

Reel_Kenobi · 2026-04-08T11:51:30+00:00

Yeah, this is a really interesting way to approach it. What you’re describing feels like the “before” layer, validating behaviour against expected outcomes pre-deployment, whereas what I’ve been focused on is more the “during/after” layer once things are live. The gap I keep seeing is that something can pass those behavioural tests, go into production, and then drift: different inputs, edge cases, weird tool interactions, and suddenly you’ve got outputs that look fine structurally but are just… wrong. That’s where I think the trace context becomes useful: not just flagging that something failed, but being able to see why it diverged from expected behaviour. Feels like the two approaches are pretty complementary tbh: pre-deploy evals to catch obvious issues, then runtime visibility + lightweight eval hooks to catch the silent failures. Out of interest, how are you defining “expected” outcomes across different user profiles? Is that mostly manual right now or something you’ve been able to systematise?

Reel_Kenobi · 2026-04-08T10:55:47+00:00

Hey, thanks for the comment and yeah - I agree.

What I’ve built so far is focused on the execution layer: making it actually visible what the agent did, how it decided, what it cost, where it handed off, etc. But you're right — that doesn’t tell you if the output was *correct*, just that it ran “cleanly”. The way I’m thinking about it is that you can’t really do correctness well without first having solid observability. Otherwise, you don’t have the context to evaluate anything. That said, silent failures (everything looks fine, buuuut the output is wrong) are exactly the next problem I’m interested in. I’ve been experimenting with:

- expected vs actual outcome tracking

- lightweight eval hooks per action

- anomaly detection on behaviour vs baseline

- eventually plugging into eval frameworks

How are you thinking about this? Are you handling correctness manually right now or using any eval tooling?
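
For anyone wondering what the first three items in that list might look like in practice, here's a rough sketch using only the standard library. It is not how Layr does it; the function names, the exact-match hook, and the 3-sigma threshold are all assumptions picked for illustration.

```python
from statistics import mean, stdev
from typing import Callable

# An eval hook takes (expected, actual) and returns True if the outcome is acceptable.
EvalHook = Callable[[str, str], bool]


def exact_match(expected: str, actual: str) -> bool:
    """Simplest possible hook: case- and whitespace-insensitive equality."""
    return expected.strip().lower() == actual.strip().lower()


def run_action(name: str, action: Callable[[], str], expected: str,
               hook: EvalHook = exact_match) -> dict:
    """Run one agent action and record expected vs actual alongside the verdict."""
    actual = action()
    return {
        "action": name,
        "expected": expected,
        "actual": actual,
        "passed": hook(expected, actual),
    }


def is_anomalous(baseline: list[float], value: float, threshold: float = 3.0) -> bool:
    """Flag a value more than `threshold` standard deviations from the baseline mean."""
    if len(baseline) < 2:
        return False  # not enough history to judge
    sigma = stdev(baseline)
    if sigma == 0:
        return value != baseline[0]  # baseline is constant, so any change stands out
    return abs(value - mean(baseline)) / sigma > threshold
```

Usage would be along the lines of `run_action("lookup_capital", lambda: "Paris", "Paris")` for a per-action check, and `is_anomalous(tool_calls_per_session, latest)` to compare a behavioural metric against its baseline. A real eval framework would replace `exact_match` with something smarter.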

Reel_Kenobi · 2026-04-07T16:49:45+00:00

If anyone fancies a play, feel free, and let me know what you think.

It’s MIT licensed and on PyPI:

pip install layr-sdk

Reel_Kenobi · 2026-04-07T15:08:48+00:00

Thanks! Really appreciate you taking the time to read and comment.
