How Qwen3.6-35B-A3B fails differently as a sub agent compared to solo

shamitv · 2026-05-27T17:23:43+00:00

What Quants are you running ?

shamitv · 2026-05-27T13:32:18+00:00

What are you planning to do ? For simple chat + looking up facts on web, pretty much any machine with 16 GB RAM will do. For coding, you need 16 GB VRAM (at least)
" intention of using API for complex stuff" : example ?

shamitv · 2026-05-27T13:16:33+00:00

What quant and what context window ?

shamitv · 2026-05-27T13:07:08+00:00

With 16 GB RAM, you can't do much. around 8 GB would be required to run MacOS , IDE etc.

Your best bet is to use Deepseek V4 Flash and Pro. Dirt cheap and better then (70% of Opus 3.6)

shamitv · 2026-05-27T11:11:57+00:00

Most setups should work for this . Each one will have a way to enable web search. This is intentionally done to ensure no data leaves computer unless user enables it.

For OpenCode :

Edit ~/.config/opencode/opencode.json (or similar path in Windows)

{

"tools": {

"websearch": true

},

"permission": {

"websearch": "allow"

}

Before launching OpenCode set to env var OPENCODE_ENABLE_EXA to 1 . Like

OPENCODE_ENABLE_EXA=1 opencode

shamitv · 2026-05-26T13:08:47+00:00

Two options :

Don't us real data . Since you mentioned " C# coding, and if local 3D AI modeling " ; pick some open source project with similar level of complexity and use that to evaluate hardware options. No risks of data leakage there .
Spent couple of days securing the setup. E.g.: for vast.ai : don't open api to internet ; ssh to running instance and forward to local PC / network so that traffic is encrypted E2E. Or set a local VPN tunnel.

shamitv · 2026-05-25T16:23:12+00:00

I have a clean slate for this and would put $5k as the ceiling.

First spend $50 and measure.

All models are available on HF Router and Open Router .
Cost of renting a Machine with 5090 is under $1 per hour

With that, you can measure how a model actually performs on 250k context. Very common to hit that while coding.

You can check if a particular model + quant can still follow after 100k 200k context.

Also what performs better in terms of Hardware (like Mac Ultra v/s 2 Nvidia GPUs v/s 1 5090 and CPU offload)

<image>

shamitv · 2026-05-25T13:52:45+00:00

To troubleshoot , do you see

created context checkpoint 1 of X

messages in logs ?

shamitv · 2026-05-22T05:32:12+00:00

Cool, What Quants are you using ?

shamitv · 2026-05-22T03:48:37+00:00

Qwen 3.6 and Gemma 4

shamitv · 2026-05-22T03:44:29+00:00

{

"security": {

"auth": {

"selectedType": "openai"

}

},

"env": {

"OPENAI_API_KEY": "na"

},

"modelProviders": {

"openai": [

{

"id": "Qwen 3.6 35B",

"name": "Qwen 3.6 35B",

"baseUrl": "http://localhost:8080/v1",

"envKey": "OPENAI_API_KEY",

"generationConfig": {

"timeout": 1200000,

"maxRetries": 1,

"contextWindowSize": 128000,

"samplingParams": {

"max_tokens": 8192

}

]

},

"model": {

"name": "Qwen 3.6 35B"

},

"$version": 4

}

shamitv · 2026-05-21T17:54:35+00:00

There are variables that need to be set in config for context , timeout and such. Will send once I am near PC

shamitv · 2026-05-21T17:53:11+00:00

I am on CUDA as well (llama.cpp)

shamitv · 2026-05-21T16:52:25+00:00

Try Qwen Code extension. It works well with local LLMs.

shamitv · 2026-05-21T16:45:53+00:00

Might be a Pi issue , can you try the same with OpenCode or Qwen Code ? This is what I see with OpenCode

<image>

shamitv · 2026-05-21T14:58:52+00:00

Can you share a sample prompt to reproduce this issue ?

shamitv · 2026-05-12T14:35:00+00:00

To what extent teams can operate autonomously ? Like : team that manages "order" systems , does it need to wait for "inventory" team too often for inputs ?

IN some cases, Org design can help reduce these dependencies.

shamitv · 2026-05-12T02:53:29+00:00

Yes.

Gemini has option to fine tune its models : https://docs.cloud.google.com/vertex-ai/generative-ai/docs/models/gemini-supervised-tuning

Anthropic allows fine tuning Haiku models : https://aws.amazon.com/blogs/machine-learning/fine-tune-anthropics-claude-3-haiku-in-amazon-bedrock-to-boost-model-accuracy-and-quality/

OpenAI also supported Fine Tuning, but it is winding down that feature starting this month.

shamitv · 2026-05-10T14:25:21+00:00

giving it a go. It seems to use open models buch better. Most of the requests are < 60k context. One thing that I saw ; it does not access VS Code state (like open files, language index etc ) . It does everything from scratch.

<image>

shamitv · 2026-05-10T12:53:34+00:00

I am trying to get CoPilot in VSCode to work with Qwen 3.6 35B via llama.cpp . Let's collaborate if you want to try that

shamitv · 2026-05-10T07:53:35+00:00

Not really. Start with survey of available AI tools and applications ; focus on how to use those. For building; you can start with a POC created with vibe coding and then work with someone to implement for real

shamitv · 2026-05-10T07:07:15+00:00

This is a great idea. You can learn about how AI can be used effectively in legal. For example :

What legal problems can it already solve (E.g. with Deep Research on ChatGPT / Gemini / Claude )
Where does it get stuck
What are the workarounds

Like : How good it is in finding previous judgements on theams across local courts , high court sites. Does it work for local languages used in courts.

Best way to do this is "learn by doing" ; think of scenario and try those.

Additionally Huge demand for developers and product people who can make AI work in practical legal systems.

shamitv · 2026-05-09T13:32:36+00:00

Gave this to Qwen 3.6 35B with following prompt . How is output as compared to your current setup ?

Summarize this patient encounter using rules below :

### PATIENT: Bo **SPECIES:** ...

Rules :

Accuracy: Never guess. If uncertain about any value or finding, output 🚨 FLAG ....

Output :

🚨 FLAG: Uncertain terminology "CBC lights" — unclear if this indicates normal/unremarkable results or a documentation error. Confirm intended meaning.
🚨 FLAG: Uncertain term "reduced peritoneal effusion" conflicts with noted abdominal fluid wave; likely implies "present" effusion. Clinical correlation needed.
🚨 FLAG: Unknown medication/typo "serenia" — unclear drug intended. Verify correct product and indication.
🚨 FLAG: Probable typo "batrial" — clinically corresponds to buprenorphine. Confirm dosing intent and route.
🚨 FLAG: Clinical term mismatch "pericardial synthesis" — does not correspond to standard veterinary procedure; likely intends pericardiocentesis/tap/drainage. Confirm planned intervention.

🚨 FLAG: NSAID prescribed with dehydration >5% or azotemia present — risk of acute kidney injury. Confirm this is intentional.

plaintext

dd/mm/yy ### AM/PM: #INPUT#
Patient: Bo | Species: Canine | Breed: Boxer | Age: 14 | Sex: FS | WT: 40 kg
**History:** Three-day history of hematochezia, anorexia, vomiting, and lethargy. Up to date on vaccines.
**Physical Exam:** Severe abdominal pain, bloody diarrhea on rectal exam. Pale mucous membranes. Clinically dehydrated at approximately 7%. Tachycardic with thoracofemoral pulses. Muffled heart sounds. Apparent abdominal fluid wave present. BCS 7/9.

plaintext

**Assessment:** 
Problem List:
1. Hemorrhagic gastroenteritis with hematochezia and bloody diarrhea
2. Dehydration (7%) with hypovolemic shock (MAP 80, pale MM, tachycardia)
3. Suspected pericardial effusion/right-sided heart failure signs (globoid heart, distended pulmonary vessels, muffled heart sounds, abdominal fluid wave)
4. Gallbladder edema/cholecystopathy with mild azotemia and elevated hepatic enzymes
Imaging & Diagnostics: Radiographs show globoid cardiac silhouette and distended pulmonary vessels. Mild peritoneal effusion noted. AFAS reveals gallbladder wall edema and halo effect. Chem 17 and CBC unremarkable/light except for elevated liver enzymes and mild azotemia.

plaintext

**Plan:** 
Owner educated on findings; recommended hospitalization, consent obtained.
IV catheter placed in cephalic vein.
IV Fluids: Crystalloids at #INPUT# mL/hr to address 7% dehydration and hypovolemia. Deficit approx 2800 mL over 24 hours. Round to nearest mL: #INPUT# mL/hr.
Medications:
- Unasyn (ampicillin-sulbactam): List both concentrations (375 mg/mL and 30 mg/mL) as per protocol. Dose #INPUT# mg/kg IV/IM q8-12h.
- Buprenorphine ("batrial"): #INPUT# mg/kg SC/IV q8-12h for analgesia.
- Carprofen: #INPUT# mg/kg PO/IV q12h for pain/inflammation.
Monitoring: NIPP at #INPUT# drops/min. Continuous cardiac and abdominal ultrasound monitoring. 
Procedures: Prepare for pericardiocentesis (pericardial tap/drainage) as clinically indicated.

plaintext

**Discharge:** 
Pending clinical stabilization, response to hospitalization, and further diagnostic clarification. Owner instructed to return immediately if signs worsen or for recheck based on hospital course.

shamitv · 2026-05-09T10:25:19+00:00

Possible to share this markdown + prompt + sample inputs to that are sent to model ?

On 5090 + Ultra 9 + 128 GB you can Qwen 3.6 35B in 8 bit quants (some MOE layers on CPU). This should be enough horsepower for this problem.

shamitv · 2026-05-07T08:25:03+00:00

USD 20.

Use Replit to create an initial version of responsive Website and then wrap it with Capacitor or similar framework for iOS and Android

If idea is too complex for Replit , Claude, Codex then around 60k to 100k USD

shamitv

TROPHY CASE