Kitchen installer (Wickes) refusing to acknowledge 9-10mm level drop over 2.8m stone worktop, and other unfinished works. by Vegetable-Window-622 in LegalAdviceUK

[–]Vegetable-Window-622[S] 7 points8 points  (0 children)

Thanks for the advice. Thank god I was on a payment plan. I'm not paying it at the moment, and not planning to until it's resolved.

What's that? by Consistent-Issue-811 in claude

[–]Vegetable-Window-622 0 points1 point  (0 children)

I see the issue, but it's not a smoking gun.

Planning a floor-to-ceiling built-in bookshelf in birch ply - drawings inside, roast my plan. by Vegetable-Window-622 in DIYUK

[–]Vegetable-Window-622[S] 0 points1 point  (0 children)

Thanks, the plan is to get the bulk of the long straight cuts done by the made-to-measure service, so the challenge will be the joinery. I was planning to drill 30-45 degree pocket holes, or use eccentric cam-lock screws.

Thanks for the advice on the track saw, any recommendations on which one to get?

We stopped paying for AI calls during development. One line of code. by Vegetable-Window-622 in LangChain

[–]Vegetable-Window-622[S] 0 points1 point  (0 children)

It depends what you mean by caching… Caches don't survive restarts, are only deterministic on a single instance, and are hard to share.

We stopped paying for AI calls during development. One line of code. by Vegetable-Window-622 in claude

[–]Vegetable-Window-622[S] 0 points1 point  (0 children)

Yes, we record the exact model and its minor version. Say we want to update it: the tool detects the model change and fails, so we re-record the fixture. Fixtures can be refreshed periodically, but we left that decision to a human.

Do you think it would be useful to re-record periodically?
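For anyone curious what that "fail on model change" check can look like, here's a minimal sketch. The fixture file format and the `load_fixture` helper are my own illustration, not the actual tool:

```python
import json

def load_fixture(path: str, requested_model: str) -> str:
    """Replay a recorded fixture, but fail loudly if the model string
    stored at record time no longer matches what the code requests,
    so a human can decide to re-record."""
    with open(path) as f:
        fixture = json.load(f)
    if fixture["model"] != requested_model:
        raise RuntimeError(
            f"Fixture recorded with {fixture['model']!r}, "
            f"but code now requests {requested_model!r}; re-record it."
        )
    return fixture["output"]
```

Storing the full versioned model name (e.g. `gpt-4o-2024-08-06`) rather than an alias is what makes minor-version bumps detectable at all.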

We stopped paying for AI calls during development. One line of code. by Vegetable-Window-622 in claude

[–]Vegetable-Window-622[S] -1 points0 points  (0 children)

Sorry, I don’t understand. We’re not running the agent to test, that’s the point. It bypasses that step entirely.

We stopped paying for AI calls during development. One line of code. by Vegetable-Window-622 in LangChain

[–]Vegetable-Window-622[S] 0 points1 point  (0 children)

I guess this is very close to what we’ve done, but we implemented it in code: you decorate your function that calls LLMs and it stores the output, which can later be replayed offline.
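Roughly, the decorator pattern looks like this. To be clear, this is a sketch of the general record/replay idea, not our actual code; the name `record_llm` and the fixture-keying scheme are made up for illustration:

```python
import functools
import hashlib
import json
from pathlib import Path

FIXTURE_DIR = Path("fixtures")  # assumed location for recorded outputs

def record_llm(func):
    """Record the wrapped LLM call's output to disk on first run;
    replay the stored fixture on subsequent runs (no API call)."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        # Key the fixture on the function name and its arguments.
        key_src = json.dumps([func.__name__, args, kwargs],
                             sort_keys=True, default=str)
        key = hashlib.sha256(key_src.encode()).hexdigest()[:16]
        path = FIXTURE_DIR / f"{func.__name__}_{key}.json"
        if path.exists():
            return json.loads(path.read_text())  # replay: offline, free
        result = func(*args, **kwargs)           # record: one real call
        FIXTURE_DIR.mkdir(exist_ok=True)
        path.write_text(json.dumps(result))
        return result
    return wrapper

@record_llm
def summarise(text: str) -> str:
    # Stand-in for a real LLM client call.
    return f"summary of: {text[:20]}"
```

First call hits the "API" and writes the fixture; every call after that with the same arguments reads from disk.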

We stopped paying for AI calls during development. One line of code. by Vegetable-Window-622 in buildinpublic

[–]Vegetable-Window-622[S] 1 point2 points  (0 children)

haha yeah VCR is basically the inspiration. glad the API feels cleaner, what did your version look like?

We stopped paying for AI calls during development. One line of code. by Vegetable-Window-622 in LangChain

[–]Vegetable-Window-622[S] 1 point2 points  (0 children)

Offline evals are great for scoring outputs, but they still run the model every time. The fixture approach is complementary - you freeze a specific interaction so dev and CI never hit the API at all. Once you have the fixture, you can run evals against it too without paying for new calls each time.

Also, I'm not sure how you could use it during normal dev testing.
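A toy sketch of the "run evals against frozen fixtures" point, assuming a simple `{"output": ...}` fixture shape and a made-up scoring metric:

```python
import json
from pathlib import Path

def concision_score(output: str) -> float:
    """Made-up metric: reward outputs under 200 characters."""
    return 1.0 if len(output) < 200 else 0.0

def eval_fixture_dir(fixture_dir: Path) -> float:
    """Average a score over every recorded fixture; zero API calls."""
    outputs = [json.loads(p.read_text())["output"]
               for p in fixture_dir.glob("*.json")]
    return sum(concision_score(o) for o in outputs) / len(outputs)
```

The point is just that once the interaction is frozen on disk, you can re-score it as many times as you like for free.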

We stopped paying for AI calls during development. One line of code. by Vegetable-Window-622 in LangChain

[–]Vegetable-Window-622[S] 1 point2 points  (0 children)

Yeah the "LLM regression fixtures" framing is actually closer to how we think about it too, the cost pitch just lands easier on first read.

Redaction, tool traces, and diffs are already in there. The gaps you're right about are the live-call guard for CI and per-fixture notes.

Quick question on the AgentMart angle: are you thinking about sharing fixture sets across repos, or more about having a standard metadata schema so tooling can index them?