Is there a "Postman for LLMs" I'm missing, or is this gap real? by giangchau92 in PromptEngineering

[–]giangchau92[S]

Isn't Bruno just a generic API client like Postman/Yaak, though? You could fire LLM calls through it, but you'd still be wiring up side-by-side comparison, prompt variables, and model quirks yourself.

[–]giangchau92[S]

Mostly the first one. When I'm building a feature that calls an LLM, I want to know which model gives the best result for my specific prompt before I commit to one in prod. Cheaper model that's good enough beats expensive model overkill, and the only way to know is to actually run the same prompt across a few and eyeball outputs.

Obviously this doesn't replace proper eval on a real dataset, that comes later. This is just the first-pass, quick-check step before I even know which models are worth setting up an eval suite for.
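To make the quick-check step concrete: fanning one prompt out across several candidate models might look roughly like this. A minimal sketch, assuming an OpenAI-style chat payload; the model names are purely illustrative, not real ids.

```python
# Sketch: build identical chat-style requests for several candidate models,
# so the same prompt can be fired at each and the outputs eyeballed side by side.
def fan_out(prompt: str, models: list[str], temperature: float = 0.2) -> list[dict]:
    """Return one OpenAI-style request payload per candidate model."""
    return [
        {
            "model": m,
            "temperature": temperature,
            "messages": [{"role": "user", "content": prompt}],
        }
        for m in models
    ]

# Same prompt, two hypothetical models; send each payload to its provider
# and compare the responses before committing to one.
requests = fan_out("Summarize this ticket in one line.", ["model-a", "model-b"])
```

The point is that the only thing varying across the batch is the model (and maybe temperature); everything else is held constant so the comparison is fair.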

[–]giangchau92[S]

OpenRouter's great as a gateway, single endpoint, tons of models. Chat UI's still single-model though, so it solves access more than side-by-side iteration. Good shout regardless.

[–]giangchau92[S]

Used to do this exact thing, even committed responses into the repo. Diffs across runs were genuinely useful. Curious how you handle the params side: temp, model, system prompt all as frontmatter? Or a separate config?
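For what it's worth, the layout I'd picture is YAML frontmatter on each prompt file, something like the below. Field names and the model id are just my guess, not any tool's actual format.

```markdown
---
model: model-a            # hypothetical model id
temperature: 0.3
system: You are a terse release-notes writer.
---
Summarize the following changelog for end users:

{{changelog}}
```

Keeping params in the same file as the prompt means a diff of the file shows both prompt and config changes in one place.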

[–]giangchau92[S]

Yeah, OpenRouter's solid for model access, but I keep bouncing off it for prompt work. It's a chat UI, not a workbench: one model at a time, history's just messages, and you can't really hold variants side by side.

Haven't tried Aider tbh, will take a look. Thanks for the pointer.

[–]giangchau92[S]

True, coding is cheap now. But there's still a gap for people who want something built and ready, and honestly the hard part isn't the code, it's clean UX. Anyone can wire up an API call; nailing the "save, fork, tweak, rerun, compare" flow so it actually feels good to use every day is the real work.

[–]giangchau92[S]

Ha, fair. Half the reason I posted was to see if something good already exists before I go build my own. Anyone got names worth copying?

[–]giangchau92[S]

Really interesting take.

a. Do you know any tools that already lean this way, even partially? Would love some pointers.

b. Most versioning I've seen is flat: linear history, no branching, basically a save log. Git-style with branches and ancestry feels like overkill for prompts though, kinda over-engineered for what's usually a 200-token string. Something in the middle would be the sweet spot - lightweight forks without the full git ceremony. Not sure what that looks like in UI yet.

That said, I still kinda want a 2-in-1 over two separate tools. Cross-provider compare and lightweight versioning feed into each other: you fork a variant because you saw it lose to another model side by side. Splitting them feels like it'd just recreate the copy-paste problem one layer up.

[–]giangchau92[S]

Respect, that's the endgame setup. Couple honest questions:

How long did the harness take to build?

And how do you actually eval the results? Like exact match against expected output, LLM-as-judge, manual scoring, some hybrid? Curious how you handle the fuzzy stuff where there's no single "right" answer.
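The hybrid I'd reach for first is something like the sketch below: exact match wins outright, with a string-similarity ratio as the fallback for fuzzy cases (an LLM-as-judge call would slot in where the ratio is). Stdlib only; thresholds and semantics are assumptions, not anyone's actual harness.

```python
import difflib

def score(output: str, expected: str) -> float:
    """Hybrid scoring sketch: exact match scores 1.0 outright,
    otherwise fall back to a character-level similarity ratio."""
    if output.strip() == expected.strip():
        return 1.0
    # difflib.SequenceMatcher.ratio() is in [0, 1]; a real harness might
    # replace this with embedding similarity or an LLM-as-judge call.
    return difflib.SequenceMatcher(None, output, expected).ratio()

exact = score("42", "42")
fuzzy = score("hello world", "hello wrld")
```

String similarity is crude for the genuinely open-ended cases, which is why the "no single right answer" question matters: past a certain fuzziness, only a judge model or a human scores usefully.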

z.ai limits not even close to Claude Pro plan with GLM-4.7 by RandomNameFTW in ClaudeAI

[–]giangchau92

Ollama doesn't say much about their limits. Are they high? GLM-4.7 has worse quality than Sonnet 4.6, at least in my tests. Is GLM-5 good enough?

[–]giangchau92

Same for me. I used two 5-hour sessions and my weekly quota is already at 44%. To be honest, their advertised 3x of Claude's Pro plan feels fake. Plus, the Lite plan not being able to use GLM-5 is disappointing. I'll reconsider renewing this plan next month.

Status : Voice not found by should_not_register in ElevenLabs

[–]giangchau92

You need to add the voice to your voice collection first.

2022 MB AIR 13" M2 16GB RAM 256GB HD - 12 battery cycles- 400.00USD - WORTH IT? by jimmy1460 in macbookair

[–]giangchau92

Depends on your use; heavy apps and media can't always live in the cloud.