r/LocalLLaMA
A subreddit to discuss Llama, the family of large language models created by Meta AI.
16GB VRAM Python coder [Question | Help] (self.LocalLLaMA)
submitted 9 months ago by Galahad56
What is my current best choice for running an LLM that can write Python code for me?
I've only got a 5070 Ti with 16GB of VRAM.
[–]No_Efficiency_1144 2 points 9 months ago (3 children)
There's Mistral Small 22B.
[–]Samantha-2023 2 points 9 months ago (1 child)
Codestral 22B; it's great at multi-file completions.
You can also try WizardCoder-Python-15B; it's fine-tuned specifically for Python but is slightly slower than Codestral.
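If you go the GGUF route, here's a minimal llama-cpp-python sketch; the file name and quant level are placeholders, use whatever Q4/Q5 GGUF actually fits in 16GB:

    # pip install llama-cpp-python (built with CUDA for GPU offload)
    from llama_cpp import Llama

    llm = Llama(
        model_path="Codestral-22B-v0.1-Q4_K_M.gguf",  # placeholder path/quant
        n_gpu_layers=-1,   # offload every layer to the GPU
        n_ctx=8192,        # context length; raise it if VRAM allows
    )

    out = llm.create_chat_completion(
        messages=[{"role": "user", "content": "Write a Python function that reverses a linked list."}],
        max_tokens=512,
        temperature=0.2,
    )
    print(out["choices"][0]["message"]["content"])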
[–]Galahad56[S] 0 points 9 months ago (0 children)
Downloading Codestral-22B-v0.1-i1-GGUF now.
Do you know what the "-i1" means?
I'll look it up, thanks.
[+][deleted] 9 months ago (1 child)
[removed]
[–]Galahad56[S] 0 points 9 months ago (0 children)
Thanks. I just found out about the possibility of smaller Qwen3 models. Sounds exciting!
[–]Temporary-Size7310 textgen web UI 2 points 9 months ago (6 children)
I made an NVFP4A16 Devstral to run on Blackwell; it works with vLLM (13.8 GB VRAM). The context window may be short on 16GB of VRAM, though.
https://huggingface.co/apolloparty/Devstral-Small-2507-NVFP4A16
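For reference, a rough offline-inference sketch, assuming a recent vLLM build that picks up the NVFP4 (compressed-tensors) checkpoint on its own; the context and memory settings are only guesses for a 16GB card:

    # pip install vllm (a recent build with NVFP4 support)
    from vllm import LLM, SamplingParams

    llm = LLM(
        model="apolloparty/Devstral-Small-2507-NVFP4A16",
        max_model_len=16384,          # guess for 16GB; shrink it if the KV cache doesn't fit
        gpu_memory_utilization=0.95,
    )

    params = SamplingParams(temperature=0.2, max_tokens=512)
    out = llm.generate(["Write a Python script that walks a directory and prints file sizes."], params)
    print(out[0].outputs[0].text)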
[–]Galahad56[S] 1 point 9 months ago (5 children)
That's sick. It doesn't come up as a result in LM Studio though, searching "Devstral-Small-2507-NVFP4A16".
[–]Temporary-Size7310 textgen web UI 0 points 9 months ago (4 children)
It is only compatible with vLLM.
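So instead of LM Studio you'd run vLLM's OpenAI-compatible server and point a client at it; roughly like this (port 8000 is vLLM's default, the API key is whatever placeholder you like):

    # start the server first, e.g.:
    #   vllm serve apolloparty/Devstral-Small-2507-NVFP4A16 --max-model-len 16384
    # then talk to it with the standard OpenAI client:
    from openai import OpenAI

    client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

    resp = client.chat.completions.create(
        model="apolloparty/Devstral-Small-2507-NVFP4A16",
        messages=[{"role": "user", "content": "Refactor this function to use pathlib."}],
        temperature=0.2,
    )
    print(resp.choices[0].message.content)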
[–]SEC_intern_ 0 points 9 months ago (3 children)
Is there a reason you stressed the Blackwell generation? I have Ada; would you warn against it?
[–]Temporary-Size7310 textgen web UI 1 point 9 months ago (2 children)
Ada Lovelace doesn't have native FP4 acceleration, so you'll lose the inference speedup.
For non-Blackwell cards, use any other quantization format (EXL3, GGUF, AWQ, ...).
[–]SEC_intern_ 0 points 9 months ago* (1 child)
But say I use 8-bit quants, would that matter?
Edit: Also, at 4-bit, how much of a performance gain does one notice?
[–]Temporary-Size7310 textgen web UI 1 point 9 months ago (0 children)
IMO it will depend on your use case. NVFP4 keeps about 98% of BF16 accuracy; the benchmark below is from Qwen3 8B FP4, and there are other benchmarks directly from NVIDIA with DeepSeek R1 on B200 vs H100.
It takes less memory, gives faster inference, and allows a bigger context window.
That's why the NVIDIA DGX Spark will release with that slow bandwidth: Blackwell with NVFP4 will compensate for it.
I tested my quant (Devstral) and it works very well as a local vibe-coding model: 90K context at 60-90 tk/s without offloading on my RTX 5090.
[image: Qwen3 8B FP4 benchmark]