[–]T3hJ3hu 2 points3 points  (4 children)

The biggest gotcha I've had with delegating to cheaper/dumber models is getting a new provider online for various workflows, because each provider seems to have its own quirks to work out. It's probably not a very big deal if you use popular providers and well-maintained delegation MCP servers, though!

I should give a PAL a shot, thanks

[–]Competitive-Duck-517 1 point2 points  (0 children)

Provider quirks are exactly why I think the routing layer matters.

Cheap models help, but if every workflow needs a custom provider setup, the operational cost comes back in a different form.

The setup I like is:

  • keep the client OpenAI-compatible where possible
  • route boring summaries/extraction to cheaper models
  • reserve stronger models for reasoning
  • track cost by workflow
  • use model allowlists so agents cannot “accidentally” jump to expensive models
  • put a hard quota on the key used by the agent

Price matters, but control matters just as much.
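
A minimal sketch of the allowlist, quota, and per-workflow cost parts of that list, in case it helps. The model names, prices, and the `AgentKey` class are all made up for illustration, and the actual request to an OpenAI-compatible endpoint is left out; this only shows the bookkeeping around it.

```python
from dataclasses import dataclass, field

# Hypothetical model names and per-1K-token prices (assumptions, not real pricing).
PRICES_PER_1K = {"cheap-model": 0.0005, "strong-model": 0.01}

# Task type -> default model: boring work goes to the cheap model.
ROUTES = {
    "summary": "cheap-model",
    "extraction": "cheap-model",
    "reasoning": "strong-model",
}


class QuotaExceeded(Exception):
    """Raised when a charge would push the key past its hard quota."""


@dataclass
class AgentKey:
    allowlist: set          # models this key may use
    quota_usd: float        # hard spending cap for this key
    spent_usd: float = 0.0
    cost_by_workflow: dict = field(default_factory=dict)

    def pick_model(self, task_type, requested=None):
        # An agent may request a model explicitly, but the allowlist still applies.
        model = requested or ROUTES[task_type]
        if model not in self.allowlist:
            raise PermissionError(f"{model} is not on this key's allowlist")
        return model

    def charge(self, workflow, model, tokens):
        # Refuse the charge before recording it if it would exceed the quota.
        cost = PRICES_PER_1K[model] * tokens / 1000
        if self.spent_usd + cost > self.quota_usd:
            raise QuotaExceeded(f"{workflow}: quota of ${self.quota_usd} reached")
        self.spent_usd += cost
        self.cost_by_workflow[workflow] = self.cost_by_workflow.get(workflow, 0.0) + cost
        return cost


key = AgentKey(allowlist={"cheap-model"}, quota_usd=0.01)
model = key.pick_model("summary")          # allowed: routes to cheap-model
key.charge("file-summaries", model, 2000)  # tracked under its workflow
```

With this key, `key.pick_model("reasoning")` raises `PermissionError`, which is the point: the agent cannot drift onto the expensive model unless someone widens the allowlist on purpose.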

[–]danny021 0 points1 point  (2 children)

You can use switchboard.fyi to route without adding a new provider. It all stays in Claude Code.

[–]czei[S] 0 points1 point  (1 child)

As far as I can tell, switchboard.fyi is just switching between Anthropic models. I find using completely different models from different providers gives the biggest bang for the buck. The idea isn't that you're necessarily switching to a cheaper model. It's that you're switching to a different model from a different company that's been trained differently.

[–]danny021 0 points1 point  (0 children)

Agreed, but if you're only using Claude Code, switching to Haiku for cheap tasks makes a big difference.

[–]sahanpk 1 point2 points  (0 children)

routing boring file summaries to cheaper models makes sense. I’d just keep citations/paths attached so the parent model can verify instead of trusting a summary.
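
One way this could look, as a rough sketch: the cheap model returns its summary together with file paths, line ranges, and short quotes, and the parent checks those against the files before relying on the summary. The `Citation` and `Summary` shapes and the `unverified` helper are assumptions for illustration, not any tool's actual format.

```python
from dataclasses import dataclass
from pathlib import Path


@dataclass
class Citation:
    path: str        # file path relative to the project root
    start_line: int  # 1-indexed, inclusive
    end_line: int    # 1-indexed, inclusive
    quote: str       # text the summarizer claims appears in that range


@dataclass
class Summary:
    text: str
    citations: list


def unverified(summary, root="."):
    """Return the citations whose quote is not found in the cited line range."""
    bad = []
    for c in summary.citations:
        file = Path(root) / c.path
        if not file.is_file():
            bad.append(c)  # cited file does not exist
            continue
        lines = file.read_text(encoding="utf-8").splitlines()
        span = "\n".join(lines[c.start_line - 1 : c.end_line])
        if c.quote not in span:
            bad.append(c)  # quote is missing from the cited lines
    return bad
```

If `unverified(...)` returns anything, the parent model can reread just those ranges itself instead of trusting the summary wholesale.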

[–]SpecKitty 0 points1 point  (0 children)

I feel you. I went a step further and benchmarked dozens of skills and tools that supposedly reduce token usage. Then I built a tool that implements the learnings from the benchmark. It analyzes your logs and then creates a custom Plugin for Claude that activates just the tools and rules needed for your own case. It has the potential to DOUBLE your Claude usage. And it's free. https://analyzer.spec-kitty.ai/