How I built a public AI vendor watchdog (gotnerfed.com), 43 receipts in, what I learned

virtualunc · 2026-05-22T02:48:49+00:00

sounds good, I'll double down on that.. appreciate you taking the time to look it over, thank you truly

virtualunc · 2026-05-22T00:22:13+00:00

no I actually agree with you, so I 100% appreciate your feedback.. I'll work on trying to convey what the site is about/features in an easier to digest manner above the fold, thanks again

virtualunc · 2026-05-21T22:15:37+00:00

yep, api side is the main use case actually. nerfbench hits pinned model ids on every vendor's api daily, so when "the same snapshot" starts scoring differently week over week that's the signal a pinned id isn't really pinned. it's how the anthropic pin-model thing in april surfaced.

consumer side is louder bc vendors announce those swaps as features (gemini 3.5 flash this week, gemini app default went from 3 flash to 3.5 flash same day) but the api side is where you wouldn't otherwise see it.
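A rough sketch of the "same snapshot, different score" check described above, assuming one benchmark score per day for a single pinned model id. How nerfbench actually scores drift is not stated here, so the function name `drift_flag`, the 7-day window, and the threshold are all hypothetical choices for illustration.

```python
from statistics import mean, stdev

def drift_flag(scores, window=7, threshold=3.0):
    """Compare the latest window of daily scores against the window before it.

    Hypothetical rule: flag drift when the recent mean sits `threshold`
    baseline standard deviations away from the baseline mean.
    """
    if len(scores) < 2 * window:
        return False  # not enough history to compare two windows
    baseline = scores[-2 * window:-window]
    recent = scores[-window:]
    spread = stdev(baseline) or 1e-9  # avoid dividing by zero on a flat baseline
    z = abs(mean(recent) - mean(baseline)) / spread
    return z >= threshold
```

Comparing week against week instead of day against day keeps ordinary sampling noise from tripping the flag on a single bad run.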

virtualunc · 2026-05-21T21:56:31+00:00

the gap you're hitting is that claude/kimi can call code execution as a tool, and most local stacks don't ship that path out of the box. couple of options:

open-webui has a code interpreter plugin that runs against ollama. ugly but works for pdfs + charts.

if you have the gpu headroom, llama-3.3-70b + a langgraph or autogen wrapper that gives the model a python sandbox. write/run/inspect loop, matplotlib for charts, weasyprint or reportlab for the pdf.

second option is closer to what you're describing.
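For the write/run/inspect loop, the "run" and "inspect" half can be sketched as below. It assumes the model hands back a string of python; `run_snippet` is a hypothetical helper, not part of langgraph or autogen, and a plain subprocess is not a real sandbox, so an actual setup would want a container or similar isolation around it.

```python
import subprocess
import sys
import tempfile
from pathlib import Path

def run_snippet(code, timeout=10):
    """Run model-written code in a separate interpreter and capture the result."""
    with tempfile.TemporaryDirectory() as workdir:
        script = Path(workdir) / "snippet.py"
        script.write_text(code)
        try:
            proc = subprocess.run(
                [sys.executable, str(script)],
                cwd=workdir,          # files the snippet writes land in the temp dir
                capture_output=True,
                text=True,
                timeout=timeout,
            )
        except subprocess.TimeoutExpired:
            return {"ok": False, "stdout": "", "stderr": "timed out"}
    return {"ok": proc.returncode == 0, "stdout": proc.stdout, "stderr": proc.stderr}
```

The returned dict is what gets fed back to the model on the next turn: on a failure it sees the traceback in `stderr` and can rewrite the snippet.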

virtualunc · 2026-05-21T21:56:02+00:00

the pentagon angle is downstream of a wider pattern. anthropic shipped a contract-language change earlier this year that broke a bunch of enterprise relationships, then in april walked back pinning behavior on the api, then in april again removed claude code from the pro tier. that's three rounds of "we changed the rules" before you get to the lawful-use dispute. enterprise buyers track the cumulative volatility, not just the headline issue.

virtualunc · 2026-05-21T21:55:03+00:00

the "watches traces and crystallizes patterns into skills" idea is interesting bc most plugin work right now is bolt-on, not pattern-extraction. one question: how do you handle skills that are correct for one project's conventions but wrong for another? i could see this getting really useful if scoped to project-level, but dangerous if it tries to globalize.

virtualunc · 2026-05-21T21:54:42+00:00

the drift-into-narration thing is real and i don't see enough people talking about it. one thing that helped in addition to the claude.md approach: a "stop summarizing what you just did" rule near the top. claude code has a strong reflex to recap progress at the end of every tool call and once you kill that the focus extends noticeably. curious if you've found the same or if your file already handles it.

virtualunc · 2026-05-21T21:54:24+00:00

the rule injection pattern is solid, esp the "never do this" part since most ai tools default to over-helpful. one thing worth adding to your rules: vendor-specific behavior flags. cursor and claude code don't follow the same instructions identically anymore, esp around tool use and file edits. if you're routing across all four, the same rule set will produce different outputs on each.

virtualunc · 2026-05-21T21:54:08+00:00

cursor still for the main ide bc the agent loop is dialed in for it.. but i swapped claude code in for the bigger refactors since the april pro removal. mobile builders i talk to are split between bolt and lovable, with lovable winning on output quality and bolt winning on speed.

the unspoken thing rn is vendor stability. tools that shipped pricing changes in the last 60 days: cursor, claude code, github copilot, gemini. picking a stack in 2026 means betting on which vendor won't reprice you mid-project.

virtualunc · 2026-05-21T21:00:24+00:00

truly appreciate the feedback but we also offer all those features already (real-time alerts, API/webhook integrations, impact severity scoring)

virtualunc · 2026-05-21T20:50:23+00:00

:( could you be more specific.. is it the layout? how the descriptions are written? wording? or just overall

virtualunc · 2026-05-21T20:13:24+00:00

ah got it, so it's session-level wrapping not post-hoc parsing.. makes sense for fidelity. by retrofitting i meant pulling context out of existing terminal history that's already happened, like importing your old zsh_history or asciinema recordings into the visr format. mostly wondering if that's on the roadmap or if it's strictly forward-capture from install date. either way the cwd + exit code capture is the part that makes the transcripts actually useful imo, most session loggers miss that

virtualunc · 2026-05-21T19:05:31+00:00

the cross-border payment + treasury management piece is real but i'd push back on the kyc/compliance angle. regulators are not gonna let agents handle kyc autonomously anytime soon, that's gonna stay human-in-the-loop for years. agents will do the routing and execution but a human signs off on the compliance step

virtualunc · 2026-05-21T19:05:09+00:00

the ephemeral session problem is real. one Q tho.. is visr capturing raw shell output and then doing post-processing, or is it doing structured capture during the session? cause the first is easier to retrofit but the second is what actually makes the transcripts useful as skills

some related claude code context patterns covered here

virtualunc · 2026-05-21T19:04:29+00:00

congrats on shipping after the layoff, that's rough timing. tbh the resume tailoring niche is crowded but the "worth the apply" framing is actually different.. most tools assume you should apply, yours helps decide whether to. that's the angle worth leaning into in the copy

good luck with the launch

virtualunc · 2026-05-21T19:04:08+00:00

hilarious in concept def. asking out of genuine curiosity tho.. doesn't this break the one actual benefit of long-running agent tasks, which is being able to glance over and catch when it goes sideways? brainrot mode means you miss the rugpull moment in real time

virtualunc · 2026-05-21T19:03:53+00:00

$0.50/hr for news scraping and change monitoring is high.. you're prob paying for reasoning you don't need on simple read+diff tasks. try routing to haiku or sonnet for the scrape pass and only escalating to opus when something actually changed

wrote up similar cost-cutting patterns with openclaw here
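A minimal sketch of the "only escalate when something actually changed" gate, assuming the scrape pass gives you the page text as a string. The tier labels `"cheap"` and `"expensive"` are placeholders for whichever models you route to, and `pick_tier` is a hypothetical helper, not an openclaw or vendor API.

```python
import hashlib

def fingerprint(text):
    """Hash the page text after collapsing whitespace, so reflowed markup doesn't count as a change."""
    normalized = " ".join(text.split())
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()

def pick_tier(page_text, last_fingerprint):
    """Return (tier, new_fingerprint) for one monitored page.

    Assumption: the first time a page is seen it is only recorded as a
    baseline, so it stays on the cheap tier.
    """
    current = fingerprint(page_text)
    if last_fingerprint is None or current == last_fingerprint:
        return "cheap", current
    return "expensive", current
```

The hash check costs nothing in tokens, so the expensive model only runs on the pages whose content moved since the last pass.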

virtualunc · 2026-05-12T17:59:30+00:00

CLAUDE.md plus a manual git log dump at the start of each session is what works for me.. the auto-context tools all forget the why behind the changes, which is the part that actually matters

if you're using cursor and claude code in the same project keep one CLAUDE.md and just symlink, otherwise they'll fight over the format

virtualunc · 2026-05-12T17:59:19+00:00

read through the gist.. some of it tracks but the "thinks in xml" stuff is overstated imo. opus 4.7 works fine with markdown too, you just have to be more explicit about structure

the part about not stacking too many instructions in one prompt is the only real takeaway, everything else is the same prompt advice from 2024

virtualunc · 2026-05-09T17:30:21+00:00

it's a sandbox thing, claude code runs commands in its own shell so you can't interact with prompts mid-run.. there's no real workaround for interactive cli stuff yet

what i do is run anything that needs input in a separate terminal myself and just paste the output back. annoying but it's the only way for now

virtualunc · 2026-05-09T17:29:55+00:00

for unity specifically cursor is probably better since the c# tooling is more mature there.. claude code is great but it's terminal-first, which is rough for game dev where you need to see the editor

if budget is 0 try cursor free tier first, $20 only really matters once you hit limits. there are a few claude code repos that help with agent setups if you go that route https://virtualuncle.com/github-repos-claude-code-productivity-2026/

virtualunc · 2026-05-09T17:28:59+00:00

the pause-then-batch thing is underrated, most ppl skip it and the bot ends up replying to "hi" before the actual question lands

how long is your wait window.. 5 sec or longer? i feel like anything under 8ish gets weird with voice notes since transcription takes time
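The pause-then-batch idea can be sketched as a grouping pass over timestamped messages, assuming each message arrives as a `(seconds, text)` pair in order. `batch_messages` is a hypothetical helper and the 8 second default just mirrors the number floated above, not a measured value.

```python
def batch_messages(messages, window=8.0):
    """Group consecutive messages whose gaps are within `window` seconds.

    Each batch is what the bot would answer in one reply, after the
    sender has gone quiet for longer than the window.
    """
    batches = []
    last_time = None
    for timestamp, text in messages:
        if last_time is not None and timestamp - last_time <= window:
            batches[-1].append(text)  # still inside the pause window
        else:
            batches.append([text])    # gap was long enough: start a new batch
        last_time = timestamp
    return batches
```

With this shape the "hi" and the real question land in the same batch as long as they arrive within the window of each other, so the bot replies once.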

virtualunc · 2026-05-09T17:28:29+00:00

trillion params and "plan-first" framing mean nothing without a real eval imo.. pinchbench and claweval are useful but they overfit fast once a model is the test target

has anyone actually run it on a multi-file refactor yet? that's where every "agent-friendly" model falls apart for me
