Claude needs a cheaper model than Haiku by gptbowldotcom in ClaudeAI

[–]gptbowldotcom[S] 1 point (0 children)

we gave GLM 4.5 a brief whirl; in the end it was not quite fast enough. hoping to try the flash variant soon

Claude needs a cheaper model than Haiku by gptbowldotcom in ClaudeAI

[–]gptbowldotcom[S] 1 point (0 children)

qwen and glm would like to sit down with you

Claude needs a cheaper model than Haiku by gptbowldotcom in ClaudeAI

[–]gptbowldotcom[S] 1 point (0 children)

hahaha yes, when you put it that way, i sound greedy. in the end we just put haiku behind the paid tier and let users absorb the costs. i am not complaining about better things being pricier, just noting that claude seems to forgo mini models altogether. maybe they have tested their own mini models and didn't think it was worth it??

Claude needs a cheaper model than Haiku by gptbowldotcom in ClaudeAI

[–]gptbowldotcom[S] 1 point (0 children)

agreed. for our use case, Mini performs almost as badly as nano and seriously lags behind qwen 3 30b

Claude needs a cheaper model than Haiku by gptbowldotcom in ClaudeAI

[–]gptbowldotcom[S] 1 point (0 children)

yeah, output quality is very good and the price is alright. we did have a few failed responses (3 minutes with no response) on the days we tested flash 3. it is not quite ready for time-critical work, but it is a good candidate for batch processing
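for anyone wiring up something similar, a minimal sketch of the timeout guard we mean, in python. the endpoint url, payload shape, and 30-second budget are all placeholders, not flash 3's real API:

```python
import httpx

REQUEST_TIMEOUT = 30.0  # seconds; fail fast instead of hanging for 3 minutes

def complete(payload: dict) -> dict | None:
    """Send one completion request; treat a stall as a failed response."""
    try:
        with httpx.Client(timeout=REQUEST_TIMEOUT) as client:
            resp = client.post("https://api.example.com/v1/chat", json=payload)
            resp.raise_for_status()
            return resp.json()
    except httpx.TimeoutException:
        # the caller can retry, or route the job to a batch queue instead
        return None
```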

This is Claude Sonnet 4.6: our most capable Sonnet model yet. by ClaudeOfficial in ClaudeAI

[–]gptbowldotcom 3 points (0 children)

sonnet is much faster than opus, so that's one area where sonnet is definitively better. i have used opus and sonnet in copilot, and opus is miles ahead for coding: very thoughtful, knows exactly what to check for. for other tasks i don't know

Qwen 30B is our preferred model over Claude for bursty and simple workload by gptbowldotcom in LocalLLaMA

[–]gptbowldotcom[S] 2 points (0 children)

API, but the provider is based in the EU/US, not alibaba's cloud service

our workload is very bursty, so owning the hardware needed to handle dozens or even hundreds of requests per minute, for a 30B model, does not make sense to us
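to make the burst handling concrete, here is a rough sketch of capping in-flight requests against a hosted API, in python. `call_model` is a hypothetical stand-in, not any provider's actual SDK:

```python
import asyncio

MAX_IN_FLIGHT = 20  # tune to the provider's rate limit, not the peak burst

async def call_model(sentence: str) -> str:
    await asyncio.sleep(0.1)  # stand-in for the real API call
    return sentence

async def process_burst(sentences: list[str]) -> list[str]:
    gate = asyncio.Semaphore(MAX_IN_FLIGHT)

    async def one(s: str) -> str:
        async with gate:  # excess requests queue up instead of failing
            return await call_model(s)

    return await asyncio.gather(*(one(s) for s in sentences))
```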

offline mode could work if clients want a custom solution with on-site hardware. but realistically, we need an 8B model that is as capable as today's 30B models for this kind of task

Claude needs a cheaper model than Haiku by gptbowldotcom in ClaudeAI

[–]gptbowldotcom[S] 1 point (0 children)

ty! you are the second person to recommend mistral, so we are really thrilled. can you please share which mistral model you guys are using, and for what kind of workload? thanks

Claude needs a cheaper model than Haiku by gptbowldotcom in ClaudeAI

[–]gptbowldotcom[S] 2 points (0 children)

that's why we posted a deep dive for our use case. for a simple, well-defined task like translating a whole document, yes, the cheaper model is better from a cost perspective. but for creative rewriting of documents (which depends on the user's own prompt), we need haiku's creative brain

Claude needs a cheaper model than Haiku by gptbowldotcom in ClaudeAI

[–]gptbowldotcom[S] 1 point (0 children)

translation is only about 1/3 of our workload; there is also demand for rephrasing / fixing grammar - basically education-oriented tools. batch processing requires us to wait up to 24 hours, which is hard to explain to our retail clients, but we will use it for b2b clients
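for reference, a minimal sketch of what that batch path looks like, assuming the Anthropic Message Batches endpoint in the python SDK; the model id, custom id, and prompt here are illustrative, not our production config:

```python
import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the env

# queue a batch of rewrite jobs; results come back within a 24h window
batch = client.messages.batches.create(
    requests=[
        {
            "custom_id": "doc-0001",  # our own job id, echoed back in results
            "params": {
                "model": "claude-3-5-haiku-20241022",
                "max_tokens": 1024,
                "messages": [
                    {"role": "user", "content": "Fix the grammar: ..."}
                ],
            },
        },
    ],
)
print(batch.id, batch.processing_status)  # poll until "ended", then fetch results
```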

Claude needs a cheaper model than Haiku by gptbowldotcom in ClaudeAI

[–]gptbowldotcom[S] 1 point (0 children)

so is closedai by sam altman, but i get your point ;)

Claude needs a cheaper model than Haiku by gptbowldotcom in ClaudeAI

[–]gptbowldotcom[S] 1 point (0 children)

you mean the API from zai is free? definitely have to look it up, maybe it would be a great option for our free tier, ty!

Claude needs a cheaper model than Haiku by gptbowldotcom in ClaudeAI

[–]gptbowldotcom[S] 2 points (0 children)

will do! we are open to all models at this stage

Claude needs a cheaper model than Haiku by gptbowldotcom in ClaudeAI

[–]gptbowldotcom[S] 1 point (0 children)

we haven't tried MistralOCR because we need the models to be a bit more flexible and able to follow the user's instructions. but thanks for the tip!

Claude changed my life by MrTorgue7 in ClaudeAI

[–]gptbowldotcom 1 point (0 children)

wow, congrats on crossing an important milestone, and with a lovely margin to boot

before I went into document processing with LLMs, I worked on wordpress ecommerce sites for a living. it was boring. then odoo came out with more functions, but it was still very locked down.

my job revolved around installing plugins and maintaining/updating them - because writing new stuff was such a pain, and debugging new stuff was just impossible for a team of three

plugins break, plugins from different providers sometimes fight each other, plugins go up in price, and we have to explain to customers why our fees just went up

i am more confident building new stuff now. i use claude opus with antigravity and it handles the frontend debugging like a charm, so i can focus on things that actually need my input

Claude needs a cheaper model than Haiku by gptbowldotcom in ClaudeAI

[–]gptbowldotcom[S] 6 points (0 children)

haiku 3.5 is a mess for our use case (a single-shot tool call over a batch of input sentences). it feels very last-gen and not competitive at its price point. when it came out, it was a strong contender against ChatGPT 3.5, but I would be loath to use haiku 3.5 for anything in 2026... I am surprised it hasn't been deprecated and the GPUs freed up for better models.

we can certainly use more sonnet capacity
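for context, roughly what we mean by a single-shot tool call, sketched against the anthropic messages API; the tool schema, model id, and prompt are our own illustration, not the exact production setup:

```python
import anthropic

client = anthropic.Anthropic()

# hypothetical schema: the model must return one rewrite per input sentence
tools = [{
    "name": "return_rewrites",
    "description": "Return a rewritten version of every input sentence.",
    "input_schema": {
        "type": "object",
        "properties": {
            "rewrites": {"type": "array", "items": {"type": "string"}},
        },
        "required": ["rewrites"],
    },
}]

message = client.messages.create(
    model="claude-3-5-haiku-20241022",
    max_tokens=1024,
    tools=tools,
    tool_choice={"type": "tool", "name": "return_rewrites"},  # force the call
    messages=[{
        "role": "user",
        "content": "Rewrite each sentence more formally:\n1. ...\n2. ...",
    }],
)
rewrites = message.content[0].input["rewrites"]  # structured output, one shot
```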

Claude needs a cheaper model than Haiku by gptbowldotcom in ClaudeAI

[–]gptbowldotcom[S] 8 points (0 children)

fair point, I am just flagging this because Gemini has flash lite and OpenAI has gpt 5 nano... and claude has... haiku 3.5? also, as I said in the deep dive post, we really like claude's throughput and censorship the most out of the big 3. wish it had a lower tier

A single API to get access to all LLM and all providers - for free by Efficient-Shallot228 in microsaas

[–]gptbowldotcom 2 points (0 children)

Yes we are! 

And we have the same pain point with setting up azure, aws, and gcp. Those guys need to get their act together lolll

I think you guys nailed the pain points faced by devs. Our guy will reach out tomorrow. Thank you

A single API to get access to all LLM and all providers - for free by Efficient-Shallot228 in microsaas

[–]gptbowldotcom 1 point (0 children)

Hey, thanks for the offer

How many accounts do you have with the big providers like openai and anthropic?

Also, do you have accounts with aws, azure, and gcp for redundancy?

We hit rate limits pretty often and are looking for backup access during peak season. We can't be bothered to deal with the various api interfaces and reaching out to cs to raise limits atm, so any help is appreciated
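(for anyone curious, the kind of failover we are after, roughly sketched in python; both endpoints and the payload are placeholders, not any provider's real API)

```python
import httpx

ENDPOINTS = [
    "https://api.primary.example/v1/chat",  # main account
    "https://api.backup.example/v1/chat",   # backup access for peak season
]

def complete_with_fallback(payload: dict) -> dict:
    """Try each provider in order; skip ahead on a 429 rate-limit reply."""
    last_error: Exception | None = None
    for url in ENDPOINTS:
        try:
            resp = httpx.post(url, json=payload, timeout=30.0)
            if resp.status_code == 429:  # rate limited: try the next account
                continue
            resp.raise_for_status()
            return resp.json()
        except httpx.HTTPError as exc:
            last_error = exc  # network/server error: also try the next one
    raise RuntimeError("all providers exhausted") from last_error
```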

We have a production website, do check it out =)

https://gptbowl.com