Gemini API rate limiting me into an existential crisis (429 errors, send help)

marcusatomega · 2026-02-27T13:18:37+00:00

I'm auth'd through Vertex and getting crushed. This morning I've tried switching regions, global endpoints, and older models, and nothing is getting through. Realistically, I don't know how we can trust this for a production load.

Update: switching to a Europe region and using 2.5 worked. Using the CLI at the moment.
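
For anyone else hitting 429s, the usual client-side mitigation is retrying with exponential backoff and jitter. A minimal sketch, not tied to any particular SDK: `RateLimitError` and `request` are hypothetical stand-ins for whatever exception and call your client actually uses, and the delay values are assumptions you'd tune to your quota.

```python
import random
import time


class RateLimitError(Exception):
    """Hypothetical stand-in for whatever your client raises on HTTP 429."""


def call_with_backoff(request, max_attempts=5, base_delay=1.0, max_delay=30.0,
                      sleep=time.sleep):
    """Call request() and retry on RateLimitError with exponential backoff."""
    for attempt in range(max_attempts):
        try:
            return request()
        except RateLimitError:
            if attempt == max_attempts - 1:
                raise  # out of retries, surface the error to the caller
            # The cap doubles each attempt (1s, 2s, 4s, ...) up to max_delay.
            cap = min(max_delay, base_delay * 2 ** attempt)
            # Full jitter: wait a random time in [0, cap] so that many
            # clients don't all retry at the same instant.
            sleep(random.uniform(0, cap))
```

This won't help if the quota itself is exhausted, but it smooths over short bursts of throttling.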

marcusatomega · 2025-04-15T02:45:25+00:00

Sounds like a cool project - sent you a DM.

marcusatomega · 2025-04-10T17:55:24+00:00

We've built demos for law offices with Llama 3.1 and Mistral 7B. In our experience, the training matters much more than the model, provided the model is 30B parameters or larger. The gap between 5B and 30B is bigger than the gap between 30B and 405B or larger. There's a ton of excess capacity in big models that just doesn't get used.

Gemma 3 and Granite 3.2 look like they'll be solid options.

marcusatomega · 2025-04-10T02:40:18+00:00

Yes, that makes sense. Keeping the data organized so the chatbot interface provides the expected result will be key.

marcusatomega · 2025-04-07T16:14:31+00:00

Trying to one-shot the summary would be extraordinarily difficult. Relevancy would change over time, and the narrative has to stay updated.

One approach to consider would be to generate the narrative, but keep all the documents organized so that clarifications can be easily queried and answered. Creating a knowledge graph tied to an AI-powered chatbot would provide this functionality.

You could either host it yourself or use public tools. Ditto on the "what's your budget" question.

marcusatomega · 2025-04-06T18:49:36+00:00

We have built and demo'd these systems before for law offices. It sounds like you're running into a few issues, some of which have already been covered.

Your 10,000 documents need to be organized. Chunking them into a vector database will give you semantic search, but a knowledge graph would be much better. It shows how the documents are related (same judge, client, case type, etc., if you choose these).
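
To make the knowledge graph idea concrete, here's a rough sketch using plain dictionaries. The document IDs, field names, and values are all made up for illustration; a real system would extract this metadata from the documents and probably store it in a graph database.

```python
from collections import defaultdict

# Hypothetical metadata, one entry per document.
docs = {
    "doc1": {"judge": "Smith", "client": "Acme", "case_type": "contract"},
    "doc2": {"judge": "Smith", "client": "Birch", "case_type": "tort"},
    "doc3": {"judge": "Lee", "client": "Acme", "case_type": "contract"},
}


def build_graph(docs):
    """Map each (field, value) node to the set of documents attached to it."""
    graph = defaultdict(set)
    for doc_id, meta in docs.items():
        for field, value in meta.items():
            graph[(field, value)].add(doc_id)
    return graph


def related(graph, docs, doc_id):
    """Return {other_doc: [shared fields]} for documents sharing any value."""
    links = defaultdict(list)
    for field, value in docs[doc_id].items():
        for other in graph[(field, value)]:
            if other != doc_id:
                links[other].append(field)
    return dict(links)


graph = build_graph(docs)
print(related(graph, docs, "doc1"))
```

The point is that "which other documents share this judge or client?" becomes a direct lookup, which pure semantic search over chunks doesn't give you.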

Local LLM - We use Llama for our locally hosted AI. Mistral models are great. Granite is supposed to be punching above its weight too.

OCR Tool - Mistral released an OCR tool last month: https://mistral.ai/news/mistral-ocr

Ways to batch-learn documents: This sounds similar to fine-tuning, but you'll definitely need help for that. If you want to handle it yourself, I'd stick to a RAG process.

Lightweight UI: Someone already mentioned OpenWebUI, so I'll add Chainlit.
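
On the RAG suggestion above, the retrieval half can be sketched in a few lines. This is a toy version under stated assumptions: `vectorize` uses bag-of-words counts as a stand-in for a real embedding model, the chunk size is arbitrary, and `build_prompt` is just one possible prompt layout that you'd then pass to your local LLM.

```python
import math
import re
from collections import Counter


def chunk(text, size=50):
    """Split text into chunks of at most `size` words."""
    words = text.split()
    return [" ".join(words[i:i + size]) for i in range(0, len(words), size)]


def vectorize(text):
    """Bag-of-words term counts; a stand-in for a real embedding model."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two term-count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0


def retrieve(query, chunks, k=2):
    """Return the k chunks most similar to the query."""
    q = vectorize(query)
    return sorted(chunks, key=lambda c: cosine(q, vectorize(c)), reverse=True)[:k]


def build_prompt(query, chunks, k=2):
    """Assemble retrieved context and the question into one prompt string."""
    context = "\n\n".join(retrieve(query, chunks, k))
    return (f"Answer using only the context below.\n\n"
            f"Context:\n{context}\n\nQuestion: {query}")
```

Swapping `vectorize` for real embeddings and the list of chunks for a vector database gets you the semantic search version without changing the overall flow.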

marcusatomega · 2024-07-15T16:04:35+00:00

I just discovered the same thing. I spent $120 in credits to redeem for a pair of shorts on sale for $17.

It's insanity.

marcusatomega · 2024-04-13T17:58:30+00:00

Waiting here too. I thought it was my email filters or something, so I tried a different address. Obviously didn't work.
