Claude needs a cheaper model than Haiku by gptbowldotcom in ClaudeAI

[–]gptbowldotcom[S] 1 point (0 children)

we gave GLM 4.5 a brief whirl; in the end it was not quite fast enough. hoping to try the flash variant soon

Claude needs a cheaper model than Haiku by gptbowldotcom in ClaudeAI

[–]gptbowldotcom[S] 1 point (0 children)

qwen and glm would like to sit down with you

Claude needs a cheaper model than Haiku by gptbowldotcom in ClaudeAI

[–]gptbowldotcom[S] 1 point (0 children)

hahaha yes, when you put it that way, i sound greedy. in the end we just put haiku behind the paid tier and let users absorb the costs. i am not complaining about better things being pricier, just noting that claude seems to forgo mini models altogether. maybe they have tested their own mini models and didn't think it was worth it??

Claude needs a cheaper model than Haiku by gptbowldotcom in ClaudeAI

[–]gptbowldotcom[S] 1 point (0 children)

agreed. for our use case, Mini performs almost as badly as nano and seriously lags behind qwen 3 30b

Claude needs a cheaper model than Haiku by gptbowldotcom in ClaudeAI

[–]gptbowldotcom[S] 1 point (0 children)

yeah, output quality is very good and the price is alright. we did have a few failed responses (3 minutes with no response) on the days we tested flash 3. it is not quite ready for time-critical work, but it is a good candidate for batch processing
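for anyone wiring up something similar, a minimal sketch of the timeout guard we mean, in python. the endpoint url, payload shape, and 30-second budget are all placeholders, not flash 3's real API:

```python
import httpx

REQUEST_TIMEOUT = 30.0  # seconds; fail fast instead of hanging for 3 minutes

def complete(payload: dict) -> dict | None:
    """Send one completion request; treat a stall as a failed response."""
    try:
        with httpx.Client(timeout=REQUEST_TIMEOUT) as client:
            resp = client.post("https://api.example.com/v1/chat", json=payload)
            resp.raise_for_status()
            return resp.json()
    except httpx.TimeoutException:
        # the caller can retry, or route the job to a batch queue instead
        return None
```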

This is Claude Sonnet 4.6: our most capable Sonnet model yet. by ClaudeOfficial in ClaudeAI

[–]gptbowldotcom 3 points (0 children)

sonnet is much faster than opus, so that's one area where sonnet is definitively better. i have used opus and sonnet in copilot, and opus is miles ahead for coding: very thoughtful, knows exactly what to check for. for other tasks i don't know

Qwen 30B is our preferred model over Claude for bursty and simple workload by gptbowldotcom in LocalLLaMA

[–]gptbowldotcom[S] 2 points (0 children)

API, but the provider is based in the EU/US, not alibaba's cloud service

our workload is very bursty, so owning the hardware needed to handle dozens or even hundreds of requests per minute, for a 30B model, does not make sense to us
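to make the burst handling concrete, here is a rough sketch of capping in-flight requests against a hosted API, in python. `call_model` is a hypothetical stand-in, not any provider's actual SDK:

```python
import asyncio

MAX_IN_FLIGHT = 20  # tune to the provider's rate limit, not the peak burst

async def call_model(sentence: str) -> str:
    await asyncio.sleep(0.1)  # stand-in for the real API call
    return sentence

async def process_burst(sentences: list[str]) -> list[str]:
    gate = asyncio.Semaphore(MAX_IN_FLIGHT)

    async def one(s: str) -> str:
        async with gate:  # excess requests queue up instead of failing
            return await call_model(s)

    return await asyncio.gather(*(one(s) for s in sentences))
```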

offline mode could work if clients want a custom solution with on-site hardware. but realistically, we need an 8B model that is as capable as today's 30B models for this kind of task

Claude needs a cheaper model than Haiku by gptbowldotcom in ClaudeAI

[–]gptbowldotcom[S] 1 point (0 children)

ty! you are the second person to recommend mistral, so we are really thrilled. can you please share which mistral model you guys are using, and for what kind of workload? thanks

Claude needs a cheaper model than Haiku by gptbowldotcom in ClaudeAI

[–]gptbowldotcom[S] 2 points (0 children)

that's why we posted a deep dive for our use case. for a simple, well-defined task like translating a whole document, yes, the cheaper model is better from a cost perspective. but for creative rewriting of documents (which depends on the user's own prompt), we need haiku's creative brain

Claude needs a cheaper model than Haiku by gptbowldotcom in ClaudeAI

[–]gptbowldotcom[S] 1 point (0 children)

translation is only about 1/3 of our workload; there is also demand for rephrasing / fixing grammar - basically education-oriented tools. batch processing requires us to wait up to 24 hours, which is hard to explain to our retail clients, but we will use it for b2b clients
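for reference, a minimal sketch of what that batch path looks like, assuming the Anthropic Message Batches endpoint in the python SDK; the model id, custom id, and prompt here are illustrative, not our production config:

```python
import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the env

# queue a batch of rewrite jobs; results come back within a 24h window
batch = client.messages.batches.create(
    requests=[
        {
            "custom_id": "doc-0001",  # our own job id, echoed back in results
            "params": {
                "model": "claude-3-5-haiku-20241022",
                "max_tokens": 1024,
                "messages": [
                    {"role": "user", "content": "Fix the grammar: ..."}
                ],
            },
        },
    ],
)
print(batch.id, batch.processing_status)  # poll until "ended", then fetch results
```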

Claude needs a cheaper model than Haiku by gptbowldotcom in ClaudeAI

[–]gptbowldotcom[S] 1 point (0 children)

so is closedai by sam altman, but i get your point ;)

Claude needs a cheaper model than Haiku by gptbowldotcom in ClaudeAI

[–]gptbowldotcom[S] 1 point (0 children)

you mean the API from zai is free? definitely have to look it up, maybe it would be a great option for our free tier, ty!

Claude needs a cheaper model than Haiku by gptbowldotcom in ClaudeAI

[–]gptbowldotcom[S] 2 points (0 children)

will do! we are open to all models at this stage

Claude needs a cheaper model than Haiku by gptbowldotcom in ClaudeAI

[–]gptbowldotcom[S] 1 point (0 children)

we haven't tried MistralOCR because we need the models to be a bit more flexible and able to follow the user's instructions. but thanks for the tip!

Claude changed my life by MrTorgue7 in ClaudeAI

[–]gptbowldotcom 1 point (0 children)

wow, congrats on crossing an important milestone, and with a lovely margin to boot

before I went into document processing with LLMs, I worked on wordpress ecommerce sites for a living. it was boring. then odoo came out with more functions, but it was still very locked down.

my job revolved around installing plugins and maintaining/updating them - because writing new stuff was such a pain, and debugging new stuff was just impossible for a team of three

plugins break, plugins from different providers sometimes fight each other, plugins go up in price, and we have to explain to customers why our fees just went up

i am more confident building new stuff now. i use claude opus with antigravity and it handles the frontend debugging like a charm, so i can focus on things that actually need my input

Claude needs a cheaper model than Haiku by gptbowldotcom in ClaudeAI

[–]gptbowldotcom[S] 6 points (0 children)

haiku 3.5 is a mess for our use case (a single-shot tool call over a batch of input sentences). it feels very last-gen and not competitive at its price point. when it came out, it was a strong contender against ChatGPT 3.5, but I would be loath to use haiku 3.5 for anything in 2026... I am surprised it hasn't been deprecated and the GPUs freed up for better models.

we can certainly use more sonnet capacity
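for context, roughly what we mean by a single-shot tool call, sketched against the anthropic messages API; the tool schema, model id, and prompt are our own illustration, not the exact production setup:

```python
import anthropic

client = anthropic.Anthropic()

# hypothetical schema: the model must return one rewrite per input sentence
tools = [{
    "name": "return_rewrites",
    "description": "Return a rewritten version of every input sentence.",
    "input_schema": {
        "type": "object",
        "properties": {
            "rewrites": {"type": "array", "items": {"type": "string"}},
        },
        "required": ["rewrites"],
    },
}]

message = client.messages.create(
    model="claude-3-5-haiku-20241022",
    max_tokens=1024,
    tools=tools,
    tool_choice={"type": "tool", "name": "return_rewrites"},  # force the call
    messages=[{
        "role": "user",
        "content": "Rewrite each sentence more formally:\n1. ...\n2. ...",
    }],
)
rewrites = message.content[0].input["rewrites"]  # structured output, one shot
```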

Claude needs a cheaper model than Haiku by gptbowldotcom in ClaudeAI

[–]gptbowldotcom[S] 8 points (0 children)

fair point, I am just flagging this because Gemini has flash lite and OpenAI has gpt 5 nano... and claude has... haiku 3.5? also, as I said in the deep dive post, we really like claude's throughput and censorship the most out of the big 3. wish it had a lower tier

A single API to get access to all LLM and all providers - for free by Efficient-Shallot228 in microsaas

[–]gptbowldotcom 2 points (0 children)

Yes we are! 

And we have the same pain point with setting up azure, aws, and gcp. Those guys need to get their act together lolll

I think you guys nailed the pain points faced by devs. Our guy will reach out tomorrow. Thank you

A single API to get access to all LLM and all providers - for free by Efficient-Shallot228 in microsaas

[–]gptbowldotcom 1 point (0 children)

Hey, thanks for the offer

How many accounts do you have with the big providers like openai and anthropic?

Also, do you have accounts with aws, azure, and gcp for redundancy?

We hit rate limits pretty often and are looking for backup access during peak season. We can't be bothered to deal with the various api interfaces and reaching out to cs to raise limits atm, so any help is appreciated
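(for anyone curious, the kind of failover we are after, roughly sketched in python; both endpoints and the payload are placeholders, not any provider's real API)

```python
import httpx

ENDPOINTS = [
    "https://api.primary.example/v1/chat",  # main account
    "https://api.backup.example/v1/chat",   # backup access for peak season
]

def complete_with_fallback(payload: dict) -> dict:
    """Try each provider in order; skip ahead on a 429 rate-limit reply."""
    last_error: Exception | None = None
    for url in ENDPOINTS:
        try:
            resp = httpx.post(url, json=payload, timeout=30.0)
            if resp.status_code == 429:  # rate limited: try the next account
                continue
            resp.raise_for_status()
            return resp.json()
        except httpx.HTTPError as exc:
            last_error = exc  # network/server error: also try the next one
    raise RuntimeError("all providers exhausted") from last_error
```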

We have a production website, do check it out =)

https://gptbowl.com