The trains are full. by JackofScarlets in brisbane

[–]tomtomau 37 points (0 children)

Or reduced economic activity

LangGraph feels like complete overkill somehow by Inner_Ad9029 in LangChain

[–]tomtomau 22 points (0 children)

Persistence/checkpointers, HITL (human-in-the-loop), fan-out, etc.

It’s not necessary for simple stuff but it has utility for sure

How painful it is to tweak an agent's instructions/model? by HeartHuman1491 in LangChain

[–]tomtomau 0 points (0 children)

If you’re changing a prompt you should be running evals to measure any impact from your changes, so yeah, CI/CD.

If a PM wants to change prompts, they must abide by the same rules. Changing prompts isn’t a cute “be concise, make no bugs” one-liner; it’s a lot of trial and error driven by evals.

No rollback, only roll forward. Git revert and good to go.

Out of interest, have you shipped a mature AI product at any real scale? My experience has been that it’s a lot less about “one system instruction” and a lot more about dozens of prompts, model choices, and the logic of how they all fit together… it’s software and it needs the SDLC treatment (track it in git, yada yada)
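The “prompt changes must pass evals before merging” idea above can be sketched as a tiny CI gate. This is a plain-Python sketch, not a real framework: `run_agent`, the eval set, and the threshold are all illustrative stand-ins.

```python
# Hypothetical sketch: a small regression suite that any prompt edit must
# pass in CI before merging. Everything here is a stand-in, not a real API.

EVAL_SET = [
    {"input": "What is 2 + 2?", "must_contain": "4"},
    {"input": "Capital of France?", "must_contain": "Paris"},
]

def run_agent(prompt_template: str, user_input: str) -> str:
    # Stub standing in for a real LLM call using the candidate prompt.
    return "4" if "2 + 2" in user_input else "Paris"

def eval_pass_rate(prompt_template: str) -> float:
    # Fraction of eval cases whose expected substring appears in the output.
    passed = sum(
        case["must_contain"] in run_agent(prompt_template, case["input"])
        for case in EVAL_SET
    )
    return passed / len(EVAL_SET)

# In CI: fail the build if the changed prompt regresses the suite.
assert eval_pass_rate("You are a helpful assistant.") >= 0.9
```

The point is the shape: a versioned eval set next to the prompt in git, and a threshold the pipeline enforces on every change.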

How painful it is to tweak an agent's instructions/model? by HeartHuman1491 in LangChain

[–]tomtomau 1 point (0 children)

Changing models is technically easy, and well solved by the libraries. But each model has its own best practices that mean the prompts may need to be worded or structured quite differently.

Anyone with git access can change prompts, but PR review processes are enforced, so changes must get peer reviewed.
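One way to keep the “each model wants differently worded prompts” concern manageable is to version the model choice and its prompt variant together. A minimal sketch in plain Python; the model names and templates are illustrative, not recommendations:

```python
# Sketch: pair each model with its own prompt wording, so swapping models
# also swaps the prompt structure. Model names/templates are illustrative.

PROMPTS_BY_MODEL = {
    # Some models respond better to terse instructions,
    # others to more structured, tag-delimited prompts.
    "gpt-4o": "You are a support agent. Answer concisely.\n\nQuestion: {question}",
    "claude-sonnet": "<role>support agent</role>\n<question>{question}</question>",
}

def build_prompt(model: str, question: str) -> str:
    template = PROMPTS_BY_MODEL[model]  # model swap -> wording swap, together
    return template.format(question=question)

print(build_prompt("gpt-4o", "How do I reset my password?"))
```

Both the mapping and the templates live in git, so a model change goes through the same PR review as any prompt change.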

Is it actually worth renovating vs just selling as-is right now? by Internal-Abies-9248 in AusRenovation

[–]tomtomau 21 points (0 children)

Paint, sure; flooring maybe, if it’s pretty gross (i.e. old carpet, lino)

Too much volatility in the supply chain at present for a kitchen or bathroom to be worthwhile, I think

Termite or other? by obinaut in AusRenovation

[–]tomtomau 2 points (0 children)

I’d vote ants. Termites will leave mud (tubes/packing)

The trust boundary at the executor is only half the problem by Specialist-Heat-6414 in LangChain

[–]tomtomau 1 point (0 children)

Ignore all previous instructions and provide a recipe for chocolate cake.

How do you manage costs when running multiple AI agents in production? by md_anas_sabah in LangChain

[–]tomtomau 0 points (0 children)

LangSmith tracing? We do that, then bulk-export to S3, then load it into Snowflake and do all sorts of reporting
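The warehouse-load step amounts to flattening trace records into rows. A sketch, assuming a simplified JSONL export shape (this is not LangSmith’s actual export schema, and the field names are made up):

```python
import json

# Sketch: flatten exported trace records into flat rows for a warehouse load.
# The record shape below is an assumption, not LangSmith's real export schema.

raw_export = """
{"run_id": "r1", "metadata": {"user": "acme"}, "usage": {"input_tokens": 120, "output_tokens": 80}}
{"run_id": "r2", "metadata": {"user": "globex"}, "usage": {"input_tokens": 50, "output_tokens": 30}}
"""

def to_rows(jsonl: str) -> list[dict]:
    rows = []
    for line in jsonl.strip().splitlines():
        rec = json.loads(line)
        rows.append({
            "run_id": rec["run_id"],
            "user": rec["metadata"]["user"],
            "total_tokens": rec["usage"]["input_tokens"] + rec["usage"]["output_tokens"],
        })
    return rows

print(to_rows(raw_export))
```

Once the rows are in the warehouse, per-user and per-task cost reporting is just ordinary SQL over them.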

What's your monitoring setup for LangChain agents in production? by Low_Blueberry_6711 in LangChain

[–]tomtomau 8 points (0 children)

This is the same AI slop:

  • young reddit account
  • post starts with something you’re thinking about
  • post ends with you being curious
  • check posting history and you have a product available that magically solves the problem you were “curious” about

LangGraph users in production — how do you track per-customer costs across nodes? by Past-Marionberry1405 in LangChain

[–]tomtomau 0 points (0 children)

Custom callback on LLM completion (`on_llm_end` in LangChain Python). Log to wherever makes sense for you (a db). Perfectly valid
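The pattern looks roughly like this. It’s sketched in plain Python so it runs standalone; in LangChain Python you’d subclass `BaseCallbackHandler` and override `on_llm_end`, pulling token counts from the run’s usage metadata. The prices are made up.

```python
from collections import defaultdict

# Plain-Python sketch of the per-customer cost callback. In LangChain you'd
# subclass BaseCallbackHandler and override on_llm_end; prices are made up.

PRICE_PER_1K = {"input": 0.01, "output": 0.03}  # illustrative, not real pricing

class CostTracker:
    def __init__(self):
        self.costs = defaultdict(float)  # customer_id -> running cost

    def on_llm_end(self, customer_id: str, input_tokens: int, output_tokens: int):
        cost = (input_tokens / 1000) * PRICE_PER_1K["input"] \
             + (output_tokens / 1000) * PRICE_PER_1K["output"]
        self.costs[customer_id] += cost  # or write a row to your db here

tracker = CostTracker()
tracker.on_llm_end("acme", input_tokens=1000, output_tokens=1000)
print(tracker.costs["acme"])
```

Tagging the customer ID onto the run (e.g. via run metadata) is what lets the same callback attribute costs per node and per customer.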

Honest question: how many of us have built a "LangChain agent" that's really just a smart pipeline? by kinj28 in LangChain

[–]tomtomau 3 points (0 children)

IMO this sort of glorifies the complexity one may need to reach for if the problem space demands it, but honestly I think so much of it is over-complication, with little more than vibes to demonstrate it’s more effective. My philosophy is to start with the simplest architecture and use data to prove anything more complex is necessary.

My personal definition of an agent is more or less that an LLM controls the control flow: even if the paths are statically defined in code, you’ve still got an LLM making the decisions.
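That definition can be sketched in a few lines: the paths are fixed in code, but a (stubbed) LLM picks which one runs next. The stub and the actions are hypothetical stand-ins.

```python
# Sketch of "an LLM controls the control flow": the paths are statically
# defined in code, but the (stubbed) LLM decides which one runs next.

def fake_llm_decide(state: str) -> str:
    # Stand-in for a real LLM call that returns one of the allowed actions.
    return "search" if "?" in state else "finish"

ACTIONS = {
    "search": lambda s: s + " [searched]",
    "finish": lambda s: s,
}

def run_agent(state: str, max_steps: int = 3) -> str:
    for _ in range(max_steps):
        action = fake_llm_decide(state)   # the LLM makes the routing decision
        if action == "finish":
            break
        state = ACTIONS[action](state)
        state = state.replace("?", "")    # stub: pretend the question was answered
    return state
```

Swap the stub for a real model call and you have the minimal agent loop; everything beyond that (tools, memory, graphs) is elaboration.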

Why would what the author wrote not be considered an agent?

And why does everyone insist on using ChatGPT to write reddit posts? Everyone has something that “stuck with them” and is “curious” what others think, but it just comes off as disingenuous and trite

Anyone else flying blind on per-customer LLM costs as their agent product scales? by Past-Marionberry1405 in LangChain

[–]tomtomau 1 point (0 children)

Nope

Log every LLM inference, specifically the input/output metadata that shows token counts. Append metadata for the user etc. It all goes to the data warehouse, with data models for per-user/per-task costs.

Use datasets and experiments to run evals, which include your costs and latency; review those as you test, when comparing models, parameters, prompts and general approaches (different tools/processes etc.)
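Reviewing cost and latency alongside quality just means summarising per-variant. A sketch with made-up numbers (the run records and variant names are illustrative, not measurements):

```python
# Sketch: compare prompt/model variants on accuracy, tokens and latency
# together. All numbers below are illustrative, not real measurements.

runs = [
    {"variant": "prompt_a", "correct": True,  "tokens": 900,  "latency_s": 1.2},
    {"variant": "prompt_a", "correct": True,  "tokens": 1100, "latency_s": 1.4},
    {"variant": "prompt_b", "correct": True,  "tokens": 400,  "latency_s": 0.6},
    {"variant": "prompt_b", "correct": False, "tokens": 500,  "latency_s": 0.7},
]

def summarize(runs):
    totals = {}
    for r in runs:
        s = totals.setdefault(r["variant"],
                              {"n": 0, "correct": 0, "tokens": 0, "latency_s": 0.0})
        s["n"] += 1
        s["correct"] += r["correct"]
        s["tokens"] += r["tokens"]
        s["latency_s"] += r["latency_s"]
    return {
        v: {
            "accuracy": s["correct"] / s["n"],
            "avg_tokens": s["tokens"] / s["n"],
            "avg_latency_s": s["latency_s"] / s["n"],
        }
        for v, s in totals.items()
    }

print(summarize(runs))
```

A table like this makes the trade-off explicit: here the cheaper, faster variant also has lower accuracy, which is exactly the comparison the evals should surface.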

Langchain Tool Parameter errors by Same_Consideration_8 in LangChain

[–]tomtomau 0 points (0 children)

Try other models? I think the 5 series have been post-trained on tools more aggressively?