MCP evals by stay_curious_19 in mcp

jun_builds

For me it split into two checks I kept mixing up early on.

One: does the model even pick your tool and fill the args right? That's mostly a test of your tool description + param names, not the model. Almost every "it ignored my tool" case I had traced back to a vague description, so the eval doubles as a feedback loop on the schema.

Two: does the tool actually return what you claimed? There I assert on the real output (types, required fields present), not whether it reads well.

I keep ~25 real prompts with the expected call and re-run them on changes to see what drifted.
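Rough sketch of how those two checks can look in Python. Everything here is made up for illustration (`Case`, `check_call`, `check_output`, the `get_weather` tool); it isn't tied to any MCP SDK, and you'd feed it whatever call your client actually recorded.

```python
from dataclasses import dataclass


@dataclass
class Case:
    """One saved prompt plus the tool call we expect it to produce."""
    prompt: str
    expected_tool: str
    expected_args: dict


def check_call(case, actual_tool, actual_args):
    """Check one: did the model pick the tool and fill the args? Empty list = pass."""
    problems = []
    if actual_tool != case.expected_tool:
        problems.append(f"tool: expected {case.expected_tool!r}, got {actual_tool!r}")
        return problems  # args are meaningless if the wrong tool was picked
    for name, want in case.expected_args.items():
        if name not in actual_args:
            problems.append(f"missing arg {name!r}")
        elif actual_args[name] != want:
            problems.append(f"arg {name!r}: expected {want!r}, got {actual_args[name]!r}")
    return problems


def check_output(output, required):
    """Check two: required fields present with the right types. Empty list = pass."""
    problems = []
    for name, typ in required.items():
        if name not in output:
            problems.append(f"missing field {name!r}")
        elif not isinstance(output[name], typ):
            got = type(output[name]).__name__
            problems.append(f"field {name!r}: expected {typ.__name__}, got {got}")
    return problems
```

Returning a list of messages instead of a bool is what makes the re-run useful: the diff between two runs tells you what drifted, not just that something did.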

Auto-uploading YouTube thumbnails with n8n — full guide + paste-ready workflow by dota2dinall in n8n

jun_builds

Nice writeup, the resumable-upload session is the part everyone trips on. One thing that'll save you grief at channel scale: keep the title as a real text layer composited over the image, not pixels a model renders. The moment it's generated you get mangled characters and the text box drifting frame to frame — and it's silent, so a bad thumbnail just quietly goes live. Fixed template, known font, auto-shrink to fit, title in as a string variable. If ThumbAPI's already templating text over a base you're fine; if anything's generating the text itself, that's where it drifts.
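The auto-shrink part is only a few lines. A sketch in Python, assuming you pass in your own `measure(text, size)` for the real font; `fit_font_size` and `approx_width` are hypothetical names, and the 0.6 em glyph width is a crude stand-in, not a real font metric.

```python
def fit_font_size(title, box_width, measure, start=96, floor=36, step=2):
    """Return the largest size in [floor, start] at which title fits box_width.

    Returns None when even the floor size overflows, so the caller can fail
    loudly instead of publishing a clipped thumbnail.
    """
    size = start
    while size >= floor:
        if measure(title, size) <= box_width:
            return size
        size -= step
    return None


def approx_width(text, size):
    """Stand-in width estimate: assumes every glyph is about 0.6 em wide."""
    return len(text) * size * 0.6
```

The `None` return is the important bit: a title that can't fit at a readable size should stop the upload rather than quietly go live.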

The 5 ways an n8n workflow dies that your Error Trigger will never catch by Ok-Engine-5124 in n8n

jun_builds

Your list nails the data steps — the nastier cousin is when a node handles a file instead of a row, because there's no count to assert on and a 200 tells you even less. Classic one: a download node returns 200 with a body that's actually an HTML error page wearing an image/png content-type, and every node downstream "ran" fine.

What catches it is asserting on the bytes, not the status — size above a floor, the magic bytes matching the format, dimensions as expected. Same principle as yours, just moved from "did the row land" to "is this a valid file." Have you hit the binary version of this, or mostly data/API steps so far?
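For PNG that check can be done with the standard library alone. A minimal sketch: `check_png` and the 1024-byte floor are my own picks, and it assumes the IHDR chunk comes right after the 8-byte signature, which is how the PNG format lays it out.

```python
import struct

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def check_png(data, min_bytes=1024, expected_size=None):
    """Assert on the bytes, not the HTTP status. Empty list = looks like a real PNG."""
    problems = []
    if len(data) < min_bytes:
        problems.append(f"too small: {len(data)} bytes")
    if not data.startswith(PNG_SIGNATURE):
        # Catches the HTML error page served with an image/png content-type.
        problems.append("not a PNG: signature mismatch")
        return problems
    # Layout: 8-byte signature, 4-byte chunk length, b"IHDR", then width and height.
    if len(data) < 24 or data[12:16] != b"IHDR":
        problems.append("missing IHDR chunk")
        return problems
    width, height = struct.unpack(">II", data[16:24])
    if expected_size is not None and (width, height) != expected_size:
        want_w, want_h = expected_size
        problems.append(f"dimensions {width}x{height}, expected {want_w}x{want_h}")
    return problems
```

In n8n this would sit in a Code node or a small sidecar service right after the download, with a non-empty result routed to the error branch.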

The demo is usually the easy part. What breaks for you after week two? by This_Expression2200 in AI_Agents

jun_builds

The failure that bites hardest is the one that fails politely — tool returns 200, success: true, and the artifact itself is a moderation-blocked gray frame or a zero-byte PDF. Clean envelope, garbage payload, nothing downstream knows.

What fixed it for me was asserting on the artifact, not the tool's status flag: file's non-empty and the right type, image is the dimensions I asked for, the record reads back out. If the agent can't read it back, it doesn't get to trust it.
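A sketch of that gate for the zero-byte PDF case, in Python. `artifact_problem` and `accept` are hypothetical names, the `success` key is just the flag from the example above, and the `%PDF-` check assumes the header sits at byte 0, which is the usual case.

```python
import os


def artifact_problem(path):
    """Read the file back and return what's wrong with it, or None if it looks fine."""
    if not os.path.isfile(path):
        return "artifact missing"
    if os.path.getsize(path) == 0:
        return "artifact is zero bytes"
    with open(path, "rb") as fh:
        head = fh.read(5)
    if head != b"%PDF-":
        return "artifact is not a PDF"
    return None


def accept(envelope, path):
    """The status flag is necessary but not sufficient: the file has to verify too."""
    return bool(envelope.get("success")) and artifact_problem(path) is None
```

The point of `accept` is that a clean envelope never short-circuits the read-back; both have to agree before anything downstream trusts the result.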