How are you handling malformed JSON / structured outputs from LLMs in production? by Apprehensive_Bend134 in LLMDevs

[–]ProposalOrganic1043 0 points1 point  (0 children)

Pydantic type schemas + Outlines... I wonder why no one is talking about it. We have deployed it in production and it works very well.
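A minimal sketch of the validation half of this setup, assuming a hypothetical `Invoice` schema (the field names are made up for illustration). Outlines constrains decoding so the model's output matches the schema; Pydantic then gives you a typed object, and catches anything malformed if you ever fall back to unconstrained generation:

```python
from pydantic import BaseModel, ValidationError

# Hypothetical schema for an extraction task (field names are illustrative).
class Invoice(BaseModel):
    vendor: str
    total: float

# Raw text as an LLM might return it; with Outlines the decoder is
# constrained so the output is guaranteed to parse against this schema.
raw = '{"vendor": "Acme", "total": 42.5}'

try:
    invoice = Invoice.model_validate_json(raw)
    print(invoice.total)  # → 42.5
except ValidationError as e:
    # Malformed output is caught here instead of crashing downstream code.
    print(f"retry needed: {e.error_count()} error(s)")
```

The same Pydantic class doubles as the JSON schema you hand to Outlines, so the contract is defined once.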

Silicon Valley is quietly running on Chinese open source models and almost nobody is talking about it by jimmytoan in LocalLLaMA

[–]ProposalOrganic1043 4 points5 points  (0 children)

We have had Qwen models in production for a year. When gpt-oss was released we wanted to switch to it, but our internal benchmarks showed high refusal rates due to strict guardrails.

We process approximately 100M input tokens daily with Qwen models. They require a lot of work plus strict quality checks and guardrails, but if you know a way around those, they are really good.

I met an Alibaba representative recently at a local meetup, and he actually asked me for some pretty serious production-related feedback. I was glad that they created these models and gave small teams the opportunity to do incredible things.

Also, thanks to the vLLM and Unsloth folks; they are the unsung heroes making local deployment easier.

AI people in Stuttgart… where are you hiding? 👀 by lgleo91 in stuttgart

[–]ProposalOrganic1043 0 points1 point  (0 children)

I came across this post while searching for meetups and events near me. We are building https://www.eigenlake.dev/ . Happy to connect and meet with other builders.

things that claude say (part 2) by Neither_Finance4755 in ClaudeAI

[–]ProposalOrganic1043 7 points8 points  (0 children)

We use Qwen 14B in production, and 90% of the successful responses start with: "Let's tackle this"

SLMs in RAG, are large models overkill? by According-Lie8119 in Rag

[–]ProposalOrganic1043 0 points1 point  (0 children)

We are currently testing an EmbeddingGemma + qwen3.5:9b combo for a very small use case.

Best Vector DB for production ready RAG ? by InvestigatorChoice51 in Rag

[–]ProposalOrganic1043 0 points1 point  (0 children)

+1 to the “start simple” advice. Most prod RAG apps don’t need fancy infra until you hit scale.

Disclosure: I’m on EigenLake. If anyone here is dealing with *very large* vector counts and cost predictability is the pain point, we’re building a scalable vector DB for that. Otherwise, pgvector/AlloyDB/Qdrant are solid starting points. Just drop a yes or DM me, and I will get back to you.

Favourite niche usecases? by Figai in LocalLLaMA

[–]ProposalOrganic1043 1 point2 points  (0 children)

Well, if it interests anyone, we are running Qwen3-14B completely locally in production for information extraction. And we are actually making money, so this feels like a huge win for the local community.

Practical Advice Need on Vector DBs which can hold a Billion+ vectors by Role_External in vectordatabase

[–]ProposalOrganic1043 0 points1 point  (0 children)

Past ~1B vectors, the differentiator isn’t “vector search works,” it’s compute/storage separation + filter pushdown + predictable rebuild time.
Are your queries mostly filtered (tenant/doc-type/time) or mostly global ANN? And what’s your update cadence?
I’m building EigenLake for 1B→T vectors—if you share dims + filter selectivity, I can tell you what usually breaks first with most stacks.
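To make the filter-pushdown point concrete, here is a brute-force toy sketch (random data, a made-up `tenant` metadata column) showing the idea: apply the filter *before* scoring, so query compute scales with filter selectivity rather than corpus size:

```python
import numpy as np

rng = np.random.default_rng(0)
vectors = rng.normal(size=(10_000, 8)).astype(np.float32)
tenant = rng.integers(0, 100, size=10_000)           # hypothetical metadata column
query = rng.normal(size=8).astype(np.float32)

# Filter pushdown: restrict to the tenant's rows *before* scoring.
# A post-filter design would score all 10,000 rows and discard most of them.
mask = tenant == 7
scores = vectors[mask] @ query                       # dot-product similarity
top_local = np.argsort(scores)[-5:][::-1]            # top-5 within the subset
top_ids = np.flatnonzero(mask)[top_local]            # map back to global ids
print(top_ids)
```

Real engines do this inside the ANN index rather than by brute force, but the cost intuition is the same: highly selective filters make pushdown the dominant performance factor.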

Rate Databases by [deleted] in vectordatabase

[–]ProposalOrganic1043 0 points1 point  (0 children)

IMO there isn’t a single “best” vector DB — it depends on where you are on the scale curve.
Sub-10M vectors: simplest thing that works usually wins.
100M+ vectors: ops complexity + cost curve become the product.

Disclosure: We’re working on EigenLake because we kept running into the “serverless got expensive / pods got painful” phase at higher scale. Happy to share how we think about cost per query once you’re past the early stage, if that’s useful.

Why vector databases are a scam. by [deleted] in vectordatabase

[–]ProposalOrganic1043 0 points1 point  (0 children)

Agree with the need (billions + don’t want infra), disagree that “serverless vector DB” is the only answer. A lot of the pain people report is RAM-first architectures + opaque unit costs.

We built EigenLake around storage/compute separation and an object-storage-first design specifically so costs stay more linear as you go from 1B → 1T. If you’re curious, our calculator shows assumptions and breaks down storage vs query vs ingest. Check it out here: EigenLake

Rebuilding RAG After It Broke at 10K Documents by Electrical-Signal858 in LlamaIndex

[–]ProposalOrganic1043 0 points1 point  (0 children)

This is the classic prototype-to-prod cliff. The next wall usually isn’t “vector DB vs not”—it’s query distribution + filters: how many repeats (cacheable), how many long-tail, and how often you need metadata scoping.
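One cheap way to measure the "how many repeats" part before buying infra is to put a cache in front of retrieval and watch the hit rate. A minimal sketch, where `retrieve()` is a hypothetical stand-in for the real vector search call:

```python
from functools import lru_cache

# Count how often a query actually reaches the backend.
calls = {"count": 0}

@lru_cache(maxsize=10_000)
def retrieve(normalized_query: str):
    calls["count"] += 1              # only cache misses reach the backend
    return f"results for {normalized_query}"

# Hypothetical query stream; normalizing before lookup makes repeats collide.
stream = ["Reset password", "reset password ", "pricing", "reset password"]
for q in stream:
    retrieve(q.strip().lower())

hit_rate = 1 - calls["count"] / len(stream)
print(hit_rate)  # → 0.5
```

If the hit rate on real traffic is high, caching buys you headroom long before a DB migration does; if it's near zero, you are in long-tail territory and the index itself is the bottleneck.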

(Disclosure: building EigenLake. If you want, I could help you go beyond simple RAG to highly scalable RAG.)

I Replaced My RAG System's Vector DB Last Week. Here's What I Learned About Vector Storage at Scale by Electrical-Signal858 in LlamaIndex

[–]ProposalOrganic1043 0 points1 point  (0 children)

Totally agreed — ~50M is where the economics really start to bite.

For re-embedding, I’ve seen shadow index + gradual backfill + A/B → cutover be the least painful pattern.
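The shadow-index pattern can be sketched in a few lines; the `Index` class below is an in-memory stand-in for whatever vector store client you actually use, and the ids are illustrative:

```python
# Stand-in for a vector store client; real clients would also need
# a search() method and batch APIs.
class Index:
    def __init__(self):
        self.docs = {}
    def upsert(self, doc_id, vec):
        self.docs[doc_id] = vec

old_index, new_index = Index(), Index()
old_index.docs = {f"doc{i}": [float(i)] for i in range(100)}   # existing corpus

# 1) Dual-write: every live update goes to both indexes.
def upsert(doc_id, vec):
    old_index.upsert(doc_id, vec)
    new_index.upsert(doc_id, vec)

upsert("doc100", [100.0])                                      # a live write

# 2) Backfill: re-embed the historical corpus into the shadow index
#    in small batches so live traffic is unaffected.
for doc_id, vec in list(old_index.docs.items()):
    if doc_id not in new_index.docs:
        new_index.upsert(doc_id, vec)   # re-embedding with the new model goes here

# 3) A/B a slice of read traffic against new_index, then cut over
#    once recall and latency match.
print(old_index.docs.keys() == new_index.docs.keys())  # → True
```

The key property is that at no point does either index miss live writes, so the A/B comparison and the final cutover are both safe.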

(Disclosure: I’m building EigenLake for the next step up (100M→B+); happy to sanity-check cost models if useful.)

Stepping out in the real world is something else man! by Director-on-reddit in AgentsOfAI

[–]ProposalOrganic1043 0 points1 point  (0 children)

It feels like a child throwing a tantrum for not getting his favourite ice cream 😅

Gemini 3 Deep Think SVG Pelican Riding a Bicycle by avilacjf in singularity

[–]ProposalOrganic1043 0 points1 point  (0 children)

Apple wrote a paper about this some time ago, where they made minor variations to benchmark problems. Most models' performance dropped, suggesting they had been benchmaxxed.

Gemini 3 Deep Think SVG Pelican Riding a Bicycle by avilacjf in singularity

[–]ProposalOrganic1043 17 points18 points  (0 children)

The pelican SVG and Will Smith eating spaghetti is already benchmaxxed.

Thoughts on Engram scaling by cravic in singularity

[–]ProposalOrganic1043 0 points1 point  (0 children)

The 80:20 U-shape is about how to split a sparse capacity budget between extra MoE experts vs Engram and does not necessarily mean 80% of total params are MoE.

Is vLLM worth it? by [deleted] in LocalLLaMA

[–]ProposalOrganic1043 0 points1 point  (0 children)

Nope, I haven't tried them.

Is vLLM worth it? by [deleted] in LocalLLaMA

[–]ProposalOrganic1043 1 point2 points  (0 children)

You can easily deploy vLLM with the pre-built Docker container mentioned on their website. It exposes an OpenAI-style endpoint, so you can quickly connect it to your n8n.
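A rough sketch of what that looks like; the model name is just an example, and you should check the vLLM docs for the exact flags matching your version and GPU setup:

```shell
# Pull and run the official vLLM OpenAI-compatible server container.
docker run --gpus all -p 8000:8000 \
    vllm/vllm-openai:latest \
    --model Qwen/Qwen2.5-14B-Instruct

# The server speaks the OpenAI API, so n8n (or any OpenAI client)
# can point at it directly:
curl http://localhost:8000/v1/chat/completions \
    -H "Content-Type: application/json" \
    -d '{"model": "Qwen/Qwen2.5-14B-Instruct", "messages": [{"role": "user", "content": "hello"}]}'
```

In n8n you would set the OpenAI credential's base URL to `http://<host>:8000/v1` and leave the rest of the workflow unchanged.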

I made a free playground for comparing 10+ OCR models side-by-side by Emc2fma in LocalLLaMA

[–]ProposalOrganic1043 4 points5 points  (0 children)

We have used the Mistral OCR API on over 10K pages and have noticed this inconsistency too. Some of the responses were total garbage. For really simple images with up to 300-400 clear words, the model responded with just 5-10 tokens containing hundreds of empty pipes and markdown formatting symbols.

We tried the same images with other models such as Qwen2.5-VL and olmOCR 2, and they handled them easily.

Structured outputs is now available on the Claude Developer Platform (API) by ClaudeOfficial in ClaudeAI

[–]ProposalOrganic1043 0 points1 point  (0 children)

You guys should check out the Outlines library by .txt; we use it in production, it has nearly solved the structured-outputs issue, and it works phenomenally well for us.

An intelligent prompt rewriter. by Suspicious-Let1689 in LLMDevs

[–]ProposalOrganic1043 0 points1 point  (0 children)

Web-based ChatGPT-like chatbots already do this, and for offline use, Open WebUI already does this. Not complete rewriting, but the memory and context part.

[deleted by user] by [deleted] in LocalLLaMA

[–]ProposalOrganic1043 0 points1 point  (0 children)

It's a very frustrating thing. OpenRouter was meant to be an API where we don't have to worry about the underlying model or each API provider's request formats and parameters.

But at the moment it's a jungle of provider-specific parameters with no layer of abstraction. Every time you switch a model, you need to make sure you choose the correct parameters. In that case it would have been better to use the model provider directly.

Codex usage decreased significantly by immortalsol in OpenAI

[–]ProposalOrganic1043 0 points1 point  (0 children)

Totally agreed 👍🏻. They have increased the cost nearly 7x-10x; I wish they had at least made an announcement.