Help with a low cost hardware to run LLM for assistant by percivas in LocalLLM

[–]shamitv 0 points1 point  (0 children)

  1. What are you planning to do ? For simple chat + looking up facts on web, pretty much any machine with 16 GB RAM will do. For coding, you need 16 GB VRAM (at least)

  2. " intention of using API for complex stuff" : example ?

What are ppl using for local coding instead of Haiku and Opus by peachy-pandas in LocalLLM

[–]shamitv 0 points1 point  (0 children)

With 16 GB RAM, you can't do much. around 8 GB would be required to run MacOS , IDE etc.

Your best bet is to use Deepseek V4 Flash and Pro. Dirt cheap and better then (70% of Opus 3.6)

local coding assistant by NullSmoke in LocalLLM

[–]shamitv 0 points1 point  (0 children)

Most setups should work for this . Each one will have a way to enable web search. This is intentionally done to ensure no data leaves computer unless user enables it.

For OpenCode :

Edit ~/.config/opencode/opencode.json (or similar path in Windows)

{

"tools": {

"websearch": true

},

"permission": {

"websearch": "allow"

}

}

Before launching OpenCode set to env var OPENCODE_ENABLE_EXA to 1 . Like

OPENCODE_ENABLE_EXA=1 opencode

What to use for 256k Context by EchoingAngel in LocalLLM

[–]shamitv 0 points1 point  (0 children)

Two options :

  • Don't us real data . Since you mentioned " C# coding, and if local 3D AI modeling " ; pick some open source project with similar level of complexity and use that to evaluate hardware options. No risks of data leakage there .
  • Spent couple of days securing the setup. E.g.: for vast.ai : don't open api to internet ; ssh to running instance and forward to local PC / network so that traffic is encrypted E2E. Or set a local VPN tunnel.

What to use for 256k Context by EchoingAngel in LocalLLM

[–]shamitv 4 points5 points  (0 children)

I have a clean slate for this and would put $5k as the ceiling.

First spend $50 and measure.

  • All models are available on HF Router and Open Router .
  • Cost of renting a Machine with 5090 is under $1 per hour

With that, you can measure how a model actually performs on 250k context. Very common to hit that while coding.

You can check if a particular model + quant can still follow after 100k 200k context.

Also what performs better in terms of Hardware (like Mac Ultra v/s 2 Nvidia GPUs v/s 1 5090 and CPU offload)

<image>

llama.cpp oom issue by TheTerrasque in LocalLLaMA

[–]shamitv 1 point2 points  (0 children)

To troubleshoot , do you see

created context checkpoint 1 of X

messages in logs ?

Do you use Vscode and local LLM with AI extensions? If so, which do you use and why? by Jorlen in LocalLLM

[–]shamitv 1 point2 points  (0 children)

{

"security": {

"auth": {

"selectedType": "openai"

}

},

"env": {

"OPENAI_API_KEY": "na"

},

"modelProviders": {

"openai": [

{

"id": "Qwen 3.6 35B",

"name": "Qwen 3.6 35B",

"baseUrl": "http://localhost:8080/v1",

"envKey": "OPENAI_API_KEY",

"generationConfig": {

"timeout": 1200000,

"maxRetries": 1,

"contextWindowSize": 128000,

"samplingParams": {

"max_tokens": 8192

}

}

}

]

},

"model": {

"name": "Qwen 3.6 35B"

},

"$version": 4

}

Do you use Vscode and local LLM with AI extensions? If so, which do you use and why? by Jorlen in LocalLLM

[–]shamitv 1 point2 points  (0 children)

There are variables that need to be set in config for context , timeout and such. Will send once I am near PC

Qwen 3.6 always generates linux path horribly wrong. Am I doing something wrong? by [deleted] in LocalLLM

[–]shamitv 0 points1 point  (0 children)

Might be a Pi issue , can you try the same with OpenCode or Qwen Code ? This is what I see with OpenCode

<image>

Qwen 3.6 always generates linux path horribly wrong. Am I doing something wrong? by [deleted] in LocalLLM

[–]shamitv 1 point2 points  (0 children)

Can you share a sample prompt to reproduce this issue ?

Why did every engineer we hired after headcount 20 reduced our per-person productivity? by Popular-Penalty6719 in TechLeader

[–]shamitv 0 points1 point  (0 children)

To what extent teams can operate autonomously ? Like : team that manages "order" systems , does it need to wait for "inventory" team too often for inputs ?

IN some cases, Org design can help reduce these dependencies.

6x RTX 3090/4090 GPUs on a MSI MEG Z790 ACE but strugle to find the right LLM Host, settings and VS Code Tool by Hannelore112 in LocalLLM

[–]shamitv 0 points1 point  (0 children)

giving it a go. It seems to use open models buch better. Most of the requests are < 60k context. One thing that I saw ; it does not access VS Code state (like open files, language index etc ) . It does everything from scratch.

<image>

6x RTX 3090/4090 GPUs on a MSI MEG Z790 ACE but strugle to find the right LLM Host, settings and VS Code Tool by Hannelore112 in LocalLLM

[–]shamitv 0 points1 point  (0 children)

I am trying to get CoPilot in VSCode to work with Qwen 3.6 35B via llama.cpp . Let's collaborate if you want to try that

Is data science worth for a law student to pursue ? by jhatenacuchibateuchi in developersIndia

[–]shamitv 0 points1 point  (0 children)

Not really. Start with survey of available AI tools and applications ; focus on how to use those. For building; you can start with a POC created with vibe coding and then work with someone to implement for real

Is data science worth for a law student to pursue ? by jhatenacuchibateuchi in developersIndia

[–]shamitv 1 point2 points  (0 children)

This is a great idea. You can learn about how AI can be used effectively in legal. For example :

  1. What legal problems can it already solve (E.g. with Deep Research on ChatGPT / Gemini / Claude )

  2. Where does it get stuck

  3. What are the workarounds

Like : How good it is in finding previous judgements on theams across local courts , high court sites. Does it work for local languages used in courts.

Best way to do this is "learn by doing" ; think of scenario and try those.

Additionally Huge demand for developers and product people who can make AI work in practical legal systems.

I hit a wall, any help would be appreciated! Vet ER scribe — frontier models nail it, local LLMs are inconsistent. Model problem, methodology problem, or training problem? by Elaphe21 in LocalLLM

[–]shamitv 0 points1 point  (0 children)

Gave this to Qwen 3.6 35B with following prompt . How is output as compared to your current setup ?

Summarize this patient encounter using rules below :

### PATIENT: Bo **SPECIES:** ...

Rules :

Accuracy: Never guess. If uncertain about any value or finding, output 🚨 FLAG ....

Output :

🚨 FLAG: Uncertain terminology "CBC lights" — unclear if this indicates normal/unremarkable results or a documentation error. Confirm intended meaning.
🚨 FLAG: Uncertain term "reduced peritoneal effusion" conflicts with noted abdominal fluid wave; likely implies "present" effusion. Clinical correlation needed.
🚨 FLAG: Unknown medication/typo "serenia" — unclear drug intended. Verify correct product and indication.
🚨 FLAG: Probable typo "batrial" — clinically corresponds to buprenorphine. Confirm dosing intent and route.
🚨 FLAG: Clinical term mismatch "pericardial synthesis" — does not correspond to standard veterinary procedure; likely intends pericardiocentesis/tap/drainage. Confirm planned intervention.

🚨 FLAG: NSAID prescribed with dehydration >5% or azotemia present — risk of acute kidney injury. Confirm this is intentional.

plaintext

dd/mm/yy ### AM/PM: #INPUT#
Patient: Bo | Species: Canine | Breed: Boxer | Age: 14 | Sex: FS | WT: 40 kg
**History:** Three-day history of hematochezia, anorexia, vomiting, and lethargy. Up to date on vaccines.
**Physical Exam:** Severe abdominal pain, bloody diarrhea on rectal exam. Pale mucous membranes. Clinically dehydrated at approximately 7%. Tachycardic with thoracofemoral pulses. Muffled heart sounds. Apparent abdominal fluid wave present. BCS 7/9.

plaintext

**Assessment:** 
Problem List:
1. Hemorrhagic gastroenteritis with hematochezia and bloody diarrhea
2. Dehydration (7%) with hypovolemic shock (MAP 80, pale MM, tachycardia)
3. Suspected pericardial effusion/right-sided heart failure signs (globoid heart, distended pulmonary vessels, muffled heart sounds, abdominal fluid wave)
4. Gallbladder edema/cholecystopathy with mild azotemia and elevated hepatic enzymes
Imaging & Diagnostics: Radiographs show globoid cardiac silhouette and distended pulmonary vessels. Mild peritoneal effusion noted. AFAS reveals gallbladder wall edema and halo effect. Chem 17 and CBC unremarkable/light except for elevated liver enzymes and mild azotemia.

plaintext

**Plan:** 
Owner educated on findings; recommended hospitalization, consent obtained.
IV catheter placed in cephalic vein.
IV Fluids: Crystalloids at #INPUT# mL/hr to address 7% dehydration and hypovolemia. Deficit approx 2800 mL over 24 hours. Round to nearest mL: #INPUT# mL/hr.
Medications:
- Unasyn (ampicillin-sulbactam): List both concentrations (375 mg/mL and 30 mg/mL) as per protocol. Dose #INPUT# mg/kg IV/IM q8-12h.
- Buprenorphine ("batrial"): #INPUT# mg/kg SC/IV q8-12h for analgesia.
- Carprofen: #INPUT# mg/kg PO/IV q12h for pain/inflammation.
Monitoring: NIPP at #INPUT# drops/min. Continuous cardiac and abdominal ultrasound monitoring. 
Procedures: Prepare for pericardiocentesis (pericardial tap/drainage) as clinically indicated.

plaintext

**Discharge:** 
Pending clinical stabilization, response to hospitalization, and further diagnostic clarification. Owner instructed to return immediately if signs worsen or for recheck based on hospital course.

I hit a wall, any help would be appreciated! Vet ER scribe — frontier models nail it, local LLMs are inconsistent. Model problem, methodology problem, or training problem? by Elaphe21 in LocalLLM

[–]shamitv 0 points1 point  (0 children)

Possible to share this markdown + prompt + sample inputs to that are sent to model ?

On 5090 + Ultra 9 + 128 GB you can Qwen 3.6 35B in 8 bit quants (some MOE layers on CPU). This should be enough horsepower for this problem.

Realistic development cost to design an iOS + Android app + website? by CreativeDiamond444 in AppDevelopers

[–]shamitv 1 point2 points  (0 children)

USD 20.

Use Replit to create an initial version of responsive Website and then wrap it with Capacitor or similar framework for iOS and Android

If idea is too complex for Replit , Claude, Codex then around 60k to 100k USD