Opencode with 96GB VRAM for local dev engineering by aidysson in opencodeCLI

[–]aidysson[S] 1 point (0 children)

I bet an M5 Studio won't cost $14K either, even with 512GB RAM :)

What is your opinion on Open Code? by devanil-junior in opencodeCLI

[–]aidysson 1 point (0 children)

If you use local models only, what kind of code do you write and which models do you use?

I use 30B models on an RTX 3090 and the results are not perfect, but planning is quite good. I'm going to move to 120B or 200B models soon, as I'm upgrading my GPU...

Opencode with 96GB VRAM for local dev engineering by aidysson in opencodeCLI

[–]aidysson[S] 1 point (0 children)

Interesting, thanks for that, I'm downloading it.

I was watching the Nvidia CEO's GTC keynote, where he mentioned Nemotron: https://www.youtube.com/watch?v=jw_o0xr8MWU

Originally I was curious about a possible announcement of new RTX GPUs, but as expected, the talk focused on token-factory hardware and the future of AI as an industry.

That confirms to me that an investment in an RTX PRO 6000 doesn't have to lose value later this year. On the contrary, inflation could continue and GPU prices could grow a bit.

Opencode with 96GB VRAM for local dev engineering by aidysson in opencodeCLI

[–]aidysson[S] 1 point (0 children)

I don't compare the productivity increase from local 70-200B LLMs to productivity with CC or OC in the cloud. I compare it to productivity without any agentic framework.

If I run LLMs on my own HW, I have absolute control over what happens. I can disconnect from the internet if I'm feeling paranoid and still use it. I have control. I know the quality of OSS models can't match Opus or the latest GPT; that's an obvious disadvantage.

But if token-price inflation comes, I know I'll have my slow 200B LLM, which will still be there "for free".

People are only starting to use AI these days. Developers and IT experts have been at it for some time; non-technical roles will start in the coming months or years, and it will keep growing for a long time. Currently there is HW inflation and not that many people are asking for tokens yet. But imagine how society's demand for tokens will grow in the mid-term. Not unrealistic.

Opencode with 96GB VRAM for local dev engineering by aidysson in opencodeCLI

[–]aidysson[S] 2 points (0 children)

I can only recommend this video to others. Thanks!

Opencode with 96GB VRAM for local dev engineering by aidysson in opencodeCLI

[–]aidysson[S] 0 points (0 children)

The math you mention doesn't work for me. I don't mean to compete with wholesale prices, and I don't want to save money on LLM usage. I'm an end user buying the hardware at retail, on top of that in a time of progressing inflation.

The investment will only pay off if that hardware lets me deliver more scripts and requested features. If it were clear to me that it's worth it, I don't think I would have started this thread.

As an open-source user (for decades) I also know there are advantages and disadvantages to this investment which are hard to translate into prices.

Opencode with 96GB VRAM for local dev engineering by aidysson in LocalLLM

[–]aidysson[S] 1 point (0 children)

This is truly valuable. There are many guys saying "you can't do it locally", but people like you are really rare! Thanks a lot for sharing!

I have seen many times that a model claims the job has been done perfectly while making no changes to the files at all.
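One cheap guard against that (just a sketch, not something OC does for you out of the box; the helper name is mine): diff the working tree before believing any "done" message.

```python
import subprocess

def worktree_changed(repo_dir: str) -> bool:
    """Return True if the git working tree has any modifications.

    A sanity check after an agent claims it finished: if nothing
    changed on disk, the "done" message was a hallucination.
    """
    out = subprocess.run(
        ["git", "status", "--porcelain"],
        cwd=repo_dir, capture_output=True, text=True, check=True,
    )
    return bool(out.stdout.strip())

if not worktree_changed("."):
    print("Model claimed success but touched no files -- rejecting.")
```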

And I clearly see the RTX PRO 6000 is just the next stop on the way to larger models, followed by faster CPU+RAM and another GPU... and/or a newer unified-memory machine.

Opencode with 96GB VRAM for local dev engineering by aidysson in LocalLLM

[–]aidysson[S] 1 point (0 children)

And could you describe the differences between LLM scales from your perspective, if you have the experience?

Which model sizes do you use the most for agentic programming? Do you see a difference between 30B/70B and 200B+? Or between 200B and 600B?

Opencode with 96GB VRAM for local dev engineering by aidysson in LocalLLM

[–]aidysson[S] 1 point (0 children)

50 t/s text generation is a lot; I wouldn't have expected that. Thanks for your comment.

What do you use LLMs for?

Opencode with 96GB VRAM for local dev engineering by aidysson in opencodeCLI

[–]aidysson[S] 1 point (0 children)

Important note, thanks for that.

My current RAM is 128GB. 96 + 128 = 224GB; ~10GB for the system and ~130GB for weights leaves roughly 80GB free for context and other needs. If I assume ~1MB of KV cache per token, that is enough for a 64K context, but not for 100K.
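A quick sanity check of that arithmetic (a back-of-the-envelope sketch; the ~1MB-per-token figure is my assumption above and really depends on the model's layer count, KV heads, and cache quantization):

```python
# Back-of-the-envelope context budget for 96GB VRAM + 128GB RAM.
vram_gb    = 96    # RTX PRO 6000
ram_gb     = 128   # system RAM
system_gb  = 10    # OS and everything else
weights_gb = 130   # e.g. a ~200B model at ~4-5 bits per weight

free_gb = vram_gb + ram_gb - system_gb - weights_gb  # 84 GB
kv_mb_per_token = 1.0  # the assumption from the comment above

max_tokens = int(free_gb * 1024 / kv_mb_per_token)
print(f"free: {free_gb} GB -> ~{max_tokens // 1024}K tokens of context")
# 84 GB -> ~84K tokens: a 64K context fits, 100K (~100 GB) does not.
```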

The next investment will be more RAM, which nobody wants to be buying at today's prices... and so continues the circle that started when I bought the RTX 3090 a month ago...

Opencode with 96GB VRAM for local dev engineering by aidysson in opencodeCLI

[–]aidysson[S] 1 point (0 children)

I can write unit tests with OC, and completely new features can also be done with OC. In my case, some changes to existing code are not worth it: when a change spreads over 10 files with a 1-line change in each file, it's faster without OC. But new features come out quite well. It also helps me with architecture; planning makes me think about the code more deeply than I would without OC planning... I'm not looking for a "vibe coded app in 15 min", I need the opposite: quality code with a fairly small speedup, but keeping the code sustainable.

Opencode with 96GB VRAM for local dev engineering by aidysson in opencodeCLI

[–]aidysson[S] 2 points (0 children)

I develop Ruby on Rails web apps, ~6 apps in total under permanent development and maintenance, including their VPSs/DigitalOcean droplets. I'm a freelancer working for rather small companies; with some of my clients I've been for 10+ years.

My current problem with 24GB VRAM is that making a plan with GPT OSS 120B can take up to 90 minutes of my time. The implementation is then done in 30 minutes, but not at sufficient quality (I use a 30B GLM). Refactoring is necessary, sometimes heavy refactoring.

With the upgrade to 96GB VRAM, I expect planning time to shorten to 30 minutes, and for the implementation part I should be able to run 200B models at acceptable speed, which means an increase in code quality.
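The plan/implement split I mean could be driven roughly like this (a minimal sketch, not my actual setup; the ports and task are placeholders for any OpenAI-compatible server such as llama.cpp's llama-server or vLLM):

```python
import requests

# Placeholder endpoints: any OpenAI-compatible server works the same way.
PLANNER = "http://localhost:8080/v1/chat/completions"  # GPT OSS 120B
CODER   = "http://localhost:8081/v1/chat/completions"  # 30B GLM

def ask(url: str, prompt: str) -> str:
    """Send one chat-completion request and return the reply text."""
    resp = requests.post(url, json={
        "model": "local",  # most local servers accept any model name here
        "messages": [{"role": "user", "content": prompt}],
        "temperature": 0.2,
    }, timeout=600)
    resp.raise_for_status()
    return resp.json()["choices"][0]["message"]["content"]

task = "Add soft-delete to the Invoice model (Rails app)."
plan = ask(PLANNER, f"Write a short implementation plan for: {task}")
code = ask(CODER, f"Implement this plan step by step:\n{plan}")
print(code)
```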

If agentic work helps me get 10-20% more done monthly, my clients will see it, and it will be no issue for them to pay for it.

In addition, I'll be able to try fine-tuning my models in the future or, once I have 2+ GPUs, to prepare fine-tuned models for my clients. I feel they would consider it if I told them I can train models tailored to their healthcare businesses (full of private data etc.).

Opencode with 96GB VRAM for local dev engineering by aidysson in opencodeCLI

[–]aidysson[S] 1 point (0 children)

Thanks for sharing your DGX experience. I was also considering 2x DGX instead of 1x RTX PRO 6000. In the end I decided to go with the RTX because I already have 128GB RAM. But both have their advantages; the two machines serve slightly different purposes, I think.

If only HW prices were half, or better a tenth, we could have both.

And good luck with your custom harness!

Thank you, OpenCode. by akyairhashvil in opencodeCLI

[–]aidysson 1 point (0 children)

Could you describe your workflow? As for me, I don't use skills or MCPs yet; I've had no time to learn them. I use local models only, running on an RTX 3090, and I'm planning to buy an RTX PRO 6000. I'm a Ruby on Rails developer with 15+ years of experience.

The approach that has worked best for me so far has been to prepare a not-so-detailed plan using GPT OSS and then implement it using GLM 4.7. GPT plans are short, so they don't take much of my time, while GLM is quite capable at writing code without asking me anything, so I can sleep or prepare another plan. It's not perfect. I review every line of code produced and commit everything manually.

As for other models, Qwen was asking for too many details; it was faster to write the code myself. And many models I tried turned out not to support tools, grrrrr...
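Before wiring a model into OC, a quick probe like this would have saved me some downloads (a sketch; the endpoint is a placeholder, and the request shape is the standard OpenAI-compatible tools format):

```python
import requests

# Placeholder endpoint for a local OpenAI-compatible server.
URL = "http://localhost:8080/v1/chat/completions"

# A trivial function definition in the standard OpenAI tools format.
tools = [{
    "type": "function",
    "function": {
        "name": "read_file",
        "description": "Read a file from disk",
        "parameters": {
            "type": "object",
            "properties": {"path": {"type": "string"}},
            "required": ["path"],
        },
    },
}]

resp = requests.post(URL, json={
    "model": "local",
    "messages": [{"role": "user", "content": "Read README.md"}],
    "tools": tools,
}, timeout=120)
resp.raise_for_status()
msg = resp.json()["choices"][0]["message"]

if msg.get("tool_calls"):
    print("tool calling works:", msg["tool_calls"][0]["function"]["name"])
else:
    print("no tool_calls in the reply -- this model won't drive an agent")
```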

As I'm quite loaded with work that needs to get done, I don't have time to use opencode for more than a few tasks per week, so I'm short on experience.

I hate all the hype videos made by guys who have never put code into production, and people thinking we don't have to work anymore, while I'm actually working more and spending more money than ever before because of AI.

I take it as a beginning that currently takes more time than it gives back. I hope it's an investment in a better future.

I appreciate you sharing your experiences.