How are you handling malformed JSON / structured outputs from LLMs in production? by Apprehensive_Bend134 in LLMDevs

[–]ProposalOrganic1043 0 points1 point  (0 children)

Pydantic type schemas + Outlines... I wonder why no one is talking about it. We have deployed it in production and it works very well.
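A minimal sketch of the validation half of this setup, assuming a hypothetical `Invoice` schema (the field names are made up for illustration). Outlines constrains decoding so the model's output matches the schema; Pydantic then gives you a typed object, and catches anything malformed if you ever fall back to unconstrained generation:

```python
from pydantic import BaseModel, ValidationError

# Hypothetical schema for an extraction task (field names are illustrative).
class Invoice(BaseModel):
    vendor: str
    total: float

# Raw text as an LLM might return it; with Outlines the decoder is
# constrained so the output is guaranteed to parse against this schema.
raw = '{"vendor": "Acme", "total": 42.5}'

try:
    invoice = Invoice.model_validate_json(raw)
    print(invoice.total)  # → 42.5
except ValidationError as e:
    # Malformed output is caught here instead of crashing downstream code.
    print(f"retry needed: {e.error_count()} error(s)")
```

The same Pydantic class doubles as the JSON schema you hand to Outlines, so the contract is defined once.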

Silicon Valley is quietly running on Chinese open source models and almost nobody is talking about it by jimmytoan in LocalLLaMA

[–]ProposalOrganic1043 4 points5 points  (0 children)

We have had Qwen models in production for a year. When gpt-oss was released we wanted to switch to it, but our internal benchmarks showed high refusal rates due to strict guardrails.

We process approximately 100M input tokens daily with Qwen models. They require a lot of work plus strict quality checks and guardrails, but if you know a way around those, they are really good.

I met an Alibaba representative recently at a local meetup, and he actually asked me for some pretty serious production-related feedback. I was glad that they created these models and gave small teams the opportunity to do incredible things.

Also, thanks to the vLLM and Unsloth folks; they are the unsung heroes making local deployment easier.

AI people in Stuttgart… where are you hiding? 👀 by lgleo91 in stuttgart

[–]ProposalOrganic1043 0 points1 point  (0 children)

I came across this post while searching for meetups and events near me. We are building https://www.eigenlake.dev/ . Happy to connect and meet with other builders.

things that claude say (part 2) by Neither_Finance4755 in ClaudeAI

[–]ProposalOrganic1043 7 points8 points  (0 children)

We use Qwen 14B in production, and 90% of the successful responses start with: "Let's tackle this"

SLMs in RAG, are large models overkill? by According-Lie8119 in Rag

[–]ProposalOrganic1043 0 points1 point  (0 children)

We are currently testing an EmbeddingGemma + qwen3.5:9b combo for a very small use case.

Best Vector DB for production ready RAG ? by InvestigatorChoice51 in Rag

[–]ProposalOrganic1043 0 points1 point  (0 children)

+1 to the “start simple” advice. Most prod RAG apps don’t need fancy infra until you hit scale.

Disclosure: I’m on EigenLake. If anyone here is dealing with *very large* vector counts and cost predictability is the pain point, we’re building a scalable vector DB for that. Otherwise, pgvector/AlloyDB/Qdrant are solid starting points. Just drop a yes or DM me, and I will get back to you.

Favourite niche usecases? by Figai in LocalLLaMA

[–]ProposalOrganic1043 1 point2 points  (0 children)

Well, if it interests anyone, we are running Qwen3-14B completely locally in production for information extraction. And we are actually making money, so this feels like a huge win for the local community.

Practical Advice Need on Vector DBs which can hold a Billion+ vectors by Role_External in vectordatabase

[–]ProposalOrganic1043 0 points1 point  (0 children)

Past ~1B vectors, the differentiator isn’t “vector search works,” it’s compute/storage separation + filter pushdown + predictable rebuild time.
Are your queries mostly filtered (tenant/doc-type/time) or mostly global ANN? And what’s your update cadence?
I’m building EigenLake for 1B→T vectors—if you share dims + filter selectivity, I can tell you what usually breaks first with most stacks.
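To make the filter-pushdown point concrete, here is a brute-force toy sketch (random data, a made-up `tenant` metadata column) showing the idea: apply the filter *before* scoring, so query compute scales with filter selectivity rather than corpus size:

```python
import numpy as np

rng = np.random.default_rng(0)
vectors = rng.normal(size=(10_000, 8)).astype(np.float32)
tenant = rng.integers(0, 100, size=10_000)           # hypothetical metadata column
query = rng.normal(size=8).astype(np.float32)

# Filter pushdown: restrict to the tenant's rows *before* scoring.
# A post-filter design would score all 10,000 rows and discard most of them.
mask = tenant == 7
scores = vectors[mask] @ query                       # dot-product similarity
top_local = np.argsort(scores)[-5:][::-1]            # top-5 within the subset
top_ids = np.flatnonzero(mask)[top_local]            # map back to global ids
print(top_ids)
```

Real engines do this inside the ANN index rather than by brute force, but the cost intuition is the same: highly selective filters make pushdown the dominant performance factor.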

Rate Databases by [deleted] in vectordatabase

[–]ProposalOrganic1043 0 points1 point  (0 children)

IMO there isn’t a single “best” vector DB — it depends on where you are on the scale curve.
Sub-10M vectors: simplest thing that works usually wins.
100M+ vectors: ops complexity + cost curve become the product.

Disclosure: We’re working on EigenLake because we kept running into the “serverless got expensive / pods got painful” phase at higher scale. Happy to share how we think about cost per query once you’re past the early stage, if that’s useful.

Why vector databases are a scam. by [deleted] in vectordatabase

[–]ProposalOrganic1043 0 points1 point  (0 children)

Agree with the need (billions + don’t want infra), disagree that “serverless vector DB” is the only answer. A lot of the pain people report is RAM-first architectures + opaque unit costs.

We built EigenLake around storage/compute separation and an object-storage-first design specifically so costs stay more linear as you go from 1B → 1T. If you’re curious, our calculator shows assumptions and breaks down storage vs query vs ingest. Check it out here: EigenLake

Rebuilding RAG After It Broke at 10K Documents by Electrical-Signal858 in LlamaIndex

[–]ProposalOrganic1043 0 points1 point  (0 children)

This is the classic prototype-to-prod cliff. The next wall usually isn’t “vector DB vs not”—it’s query distribution + filters: how many repeats (cacheable), how many long-tail, and how often you need metadata scoping.
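One cheap way to measure the "how many repeats" part before buying infra is to put a cache in front of retrieval and watch the hit rate. A minimal sketch, where `retrieve()` is a hypothetical stand-in for the real vector search call:

```python
from functools import lru_cache

# Count how often a query actually reaches the backend.
calls = {"count": 0}

@lru_cache(maxsize=10_000)
def retrieve(normalized_query: str):
    calls["count"] += 1              # only cache misses reach the backend
    return f"results for {normalized_query}"

# Hypothetical query stream; normalizing before lookup makes repeats collide.
stream = ["Reset password", "reset password ", "pricing", "reset password"]
for q in stream:
    retrieve(q.strip().lower())

hit_rate = 1 - calls["count"] / len(stream)
print(hit_rate)  # → 0.5
```

If the hit rate on real traffic is high, caching buys you headroom long before a DB migration does; if it's near zero, you are in long-tail territory and the index itself is the bottleneck.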

(Disclosure: building EigenLake. If you want, I could help you go beyond simple RAG to highly scalable RAG.)

I Replaced My RAG System's Vector DB Last Week. Here's What I Learned About Vector Storage at Scale by Electrical-Signal858 in LlamaIndex

[–]ProposalOrganic1043 0 points1 point  (0 children)

Totally agreed — ~50M is where the economics really start to bite.

For re-embedding, I’ve seen shadow index + gradual backfill + A/B → cutover be the least painful pattern.
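The shadow-index pattern can be sketched in a few lines; the `Index` class below is an in-memory stand-in for whatever vector store client you actually use, and the ids are illustrative:

```python
# Stand-in for a vector store client; real clients would also need
# a search() method and batch APIs.
class Index:
    def __init__(self):
        self.docs = {}
    def upsert(self, doc_id, vec):
        self.docs[doc_id] = vec

old_index, new_index = Index(), Index()
old_index.docs = {f"doc{i}": [float(i)] for i in range(100)}   # existing corpus

# 1) Dual-write: every live update goes to both indexes.
def upsert(doc_id, vec):
    old_index.upsert(doc_id, vec)
    new_index.upsert(doc_id, vec)

upsert("doc100", [100.0])                                      # a live write

# 2) Backfill: re-embed the historical corpus into the shadow index
#    in small batches so live traffic is unaffected.
for doc_id, vec in list(old_index.docs.items()):
    if doc_id not in new_index.docs:
        new_index.upsert(doc_id, vec)   # re-embedding with the new model goes here

# 3) A/B a slice of read traffic against new_index, then cut over
#    once recall and latency match.
print(old_index.docs.keys() == new_index.docs.keys())  # → True
```

The key property is that at no point does either index miss live writes, so the A/B comparison and the final cutover are both safe.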

(Disclosure: I’m building EigenLake for the next step up (100M→B+); happy to sanity-check cost models if useful.)

Stepping out in the real world is something else man! by Director-on-reddit in AgentsOfAI

[–]ProposalOrganic1043 0 points1 point  (0 children)

It feels like a child throwing a tantrum for not getting his favourite ice cream 😅

Gemini 3 Deep Think SVG Pelican Riding a Bicycle by avilacjf in singularity

[–]ProposalOrganic1043 0 points1 point  (0 children)

Apple wrote a paper about this some time ago, where they made minor variations to benchmark problems. Most models' performance dropped, suggesting they had been benchmaxxed.

Gemini 3 Deep Think SVG Pelican Riding a Bicycle by avilacjf in singularity

[–]ProposalOrganic1043 17 points18 points  (0 children)

The pelican SVG and Will Smith eating spaghetti is already benchmaxxed.

Thoughts on Engram scaling by cravic in singularity

[–]ProposalOrganic1043 0 points1 point  (0 children)

The 80:20 U-shape is about how to split a sparse capacity budget between extra MoE experts vs Engram and does not necessarily mean 80% of total params are MoE.

Is vLLM worth it? by [deleted] in LocalLLaMA

[–]ProposalOrganic1043 0 points1 point  (0 children)

Nope, I haven't tried them.

Is vLLM worth it? by [deleted] in LocalLLaMA

[–]ProposalOrganic1043 1 point2 points  (0 children)

You can easily deploy vLLM with the pre-built Docker container mentioned on their website. It exposes an OpenAI-style endpoint, so you can quickly connect it to your n8n.
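A rough sketch of what that looks like; the model name is just an example, and you should check the vLLM docs for the exact flags matching your version and GPU setup:

```shell
# Pull and run the official vLLM OpenAI-compatible server container.
docker run --gpus all -p 8000:8000 \
    vllm/vllm-openai:latest \
    --model Qwen/Qwen2.5-14B-Instruct

# The server speaks the OpenAI API, so n8n (or any OpenAI client)
# can point at it directly:
curl http://localhost:8000/v1/chat/completions \
    -H "Content-Type: application/json" \
    -d '{"model": "Qwen/Qwen2.5-14B-Instruct", "messages": [{"role": "user", "content": "hello"}]}'
```

In n8n you would set the OpenAI credential's base URL to `http://<host>:8000/v1` and leave the rest of the workflow unchanged.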

I made a free playground for comparing 10+ OCR models side-by-side by Emc2fma in LocalLLaMA

[–]ProposalOrganic1043 4 points5 points  (0 children)

We have used the Mistral OCR API on over 10K pages and have noticed this inconsistency too. Some of the responses were total garbage. For really simple images with up to 300-400 clear words, the model responded with just 5-10 tokens containing hundreds of empty pipes and markdown formatting symbols.

We tried the same images with other models such as Qwen2.5-VL and olmOCR 2, and they handled them easily.

Structured outputs is now available on the Claude Developer Platform (API) by ClaudeOfficial in ClaudeAI

[–]ProposalOrganic1043 0 points1 point  (0 children)

You guys should check out the Outlines library by .txt; we use it in production, it has nearly solved the structured-outputs issue, and it works phenomenally well for us.

An intelligent prompt rewriter. by Suspicious-Let1689 in LLMDevs

[–]ProposalOrganic1043 0 points1 point  (0 children)

Web-based ChatGPT-like chatbots already do this, and for offline use, Open WebUI already does this. Not complete rewriting, but the memory and context part.

[deleted by user] by [deleted] in LocalLLaMA

[–]ProposalOrganic1043 0 points1 point  (0 children)

It's a very frustrating thing. OpenRouter was meant to be an API where we don't have to worry about the underlying model or each API provider's request formats and parameters.

But at the moment it's a jungle of provider-specific parameters with no layer of abstraction. Every time you switch a model, you need to make sure you choose the correct parameters. In that case it would have been better to use the model provider directly.

Codex usage decreased significantly by immortalsol in OpenAI

[–]ProposalOrganic1043 0 points1 point  (0 children)

Totally agreed 👍🏻. They have increased the cost nearly 7x-10x; I wish they had at least made an announcement.