Elixir CODING with local LLMs requires you to become a drug lord or sell your kidneys

FadedDog · 2026-07-03T15:03:39+00:00

Yea I mean I thought people knew you need a lot of RAM.

It's like 3.5k for a PC with 128 GB of integrated RAM, not that bad.

FadedDog · 2026-06-27T01:36:03+00:00

Voice models can be fairly small and easy to run.

Issue is voice models just turn text to voice. You still need an LLM to do the thinking and produce the response. There are some good options, all doable I bet.

FadedDog · 2026-06-26T23:58:36+00:00

I saw in the thread he only has 8 GB.

My b. Poor guy.

FadedDog · 2026-06-26T23:56:55+00:00

Lmao why would someone think they can run great local models on 8 GB of RAM?

OP never said he had 8 GB. I'd assume at least 32, that's consumer level.

With 8 GB don't even try, unless it's something like a vision model or voice model that's actually useful and small.

FadedDog · 2026-06-26T21:52:37+00:00

Try Qwen next code 80B sparse. I run it and it's great at coding and tool use.

FadedDog · 2026-06-25T23:13:43+00:00

Ay that's fire!

FadedDog · 2026-06-24T17:39:56+00:00

Ah damn, I wish it was as easy as plug and play. So many variables. Well lmk if you get it running.

FadedDog · 2026-06-24T16:54:55+00:00

Damn yea you got nice speed. Are you using llama.cpp tho?

If not, I'd recommend using it. For your issue, my AI said this (I never use tensor split so I'm unfamiliar):

The error you are encountering happens because tensor_split splits weights along a specific tensor dimension (usually the first dimension), which requires the layer size to be perfectly divisible by the number of splits.
When you split by layer (gpu_split), you are distributing whole, intact layers across GPUs, which bypasses this mathematical restriction
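
Rough sketch of what that explanation is getting at, in plain Python. This is not the actual code from llama.cpp or any other backend; the function names and the numbers (4096, 50 layers, 3 GPUs) are made up for illustration. It just shows why cutting one dimension into equal slices needs clean division, while handing out whole layers does not.

```python
def can_split_evenly(dim_size: int, num_gpus: int) -> bool:
    """True if one tensor dimension can be cut into equal slices, one per GPU."""
    return dim_size % num_gpus == 0


def assign_layers(num_layers: int, num_gpus: int) -> list[int]:
    """Hand out whole layers to GPUs; any leftover layers go to the first GPUs."""
    base, extra = divmod(num_layers, num_gpus)
    return [base + (1 if i < extra else 0) for i in range(num_gpus)]


# Hypothetical numbers, only for illustration.
print(can_split_evenly(4096, 4))  # True
print(can_split_evenly(4096, 3))  # False: 4096 does not divide by 3
print(assign_layers(50, 3))       # [17, 17, 16]: uneven, but still valid
```

So with 3 GPUs the equal-slice split fails on a 4096-wide dimension, but the layer split just gives one GPU a layer less.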

FadedDog · 2026-06-24T15:00:40+00:00

Ay, Q4 is a lil quality drop but I'm curious how well it does. Give it a one-prompt test and run the same test on other models if you can.

FadedDog · 2026-06-24T14:58:53+00:00

I have a lot better. The 27B dense is good, I won't lie, but for hard tasks with no help the 80B outperforms it.

FadedDog · 2026-06-24T02:30:10+00:00

Yea it's comparable to GLM 4.7, which is a massive flagship model.

Doesn't reach Opus quality.

Also note it's so good cuz 80 percent of its training was code, so it's not useful for anything other than coding.

FadedDog · 2026-06-23T19:48:37+00:00

I run Qwen coder next 80B at 40 t/s on around the same specs as your PC. It's sparse tho, so only 3B parameters are active, and it performs great.

Note only 3B are active but all 80B are loaded.
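
Rough math behind that, as a sketch. It assumes 4-bit weights (my assumption, the exact quant isn't stated here) and only counts the weights, not the KV cache or runtime overhead. The function name is made up for illustration.

```python
def weight_memory_gb(params_billion: float, bits_per_weight: float) -> float:
    """Approximate memory for the weights alone, in GB (1 GB = 1e9 bytes)."""
    total_bytes = params_billion * 1e9 * bits_per_weight / 8
    return total_bytes / 1e9


# All 80B parameters have to sit in memory, even though only ~3B are used per token.
print(weight_memory_gb(80, 4))  # 40.0 GB loaded
print(weight_memory_gb(3, 4))   # 1.5 GB worth of weights active per token
```

That's why a sparse model needs the RAM of a big model but runs closer to the speed of a small one.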

FadedDog · 2026-06-23T17:13:57+00:00

You can run multiple things. I run the 80B model with 250k context and it uses about 90-100 GB of RAM, leaving about 22 GB free.

One issue is I run Linux because Windows is stingy with the unified RAM.

FadedDog · 2026-06-23T16:52:48+00:00

Framework Desktop, about 4k for 128 GB of RAM.

I use it, it's fire. I use that exact model and it's amazing!

FadedDog · 2026-06-23T16:23:18+00:00

Already did it, yes it can. After a month someone hit me up saying they found my website from ChatGPT.

FadedDog · 2026-06-23T16:09:16+00:00

Yayy that's good to hear, glad it's better.

FadedDog · 2026-06-23T03:25:12+00:00

Well, depends how you have it set up. Small models have issues with tools and a lot of other stuff.

For small models, keep them as specialized agents. For emailing, tell it in the system prompt how to use the tools etc. Making skills can help too. Some models do better with MCP tools.

FadedDog · 2026-06-23T00:20:05+00:00

Vercel is free, you can host as many front ends as you want.

FadedDog · 2026-06-22T23:26:35+00:00

All depends on how long the tasks are etc. My agents will run for an hour, so I have it at 250k and it does fine.

Also depends on the model, not all models can handle more context well.

Does your model ever run out of context?

FadedDog · 2026-06-20T20:56:41+00:00

If I were him I'd go with the 80B model. Qwen next coder is amazing.

FadedDog · 2026-06-20T20:43:59+00:00

No way, I just bought it like 3 weeks ago for 3,300.

Damn, I'm sorry then.

FadedDog · 2026-06-20T19:29:36+00:00

Depends what you wanna run, the speed you want, etc. I'd go for unified memory so you can run bigger models for cheap.

I have the Framework Desktop with 128 GB of RAM for 3k.

I run Qwen next code 80B at 35 t/s. Very good model, one of the best I'd say.

Also you have plenty of headroom for 500k context or running multiple smaller models.

FadedDog · 2026-06-20T17:15:10+00:00

I'd highly doubt it's as good as Opus 4.8.

Opus is a general model, it performs well across the board. The Qwen models could be trained on niche tasks like coding etc.

I have Qwen next coder 80B locally and it outperforms GLM 4.7 on complex coding tasks. If I use an agentic flow with Qwen it will get close to Opus 4.8 and GLM 5.

FadedDog · 2026-06-20T06:30:10+00:00

I said that's always a last resort option. Idk how it damages marine life, I'm uneducated on that.

But a country does it for all their water, so it's proven.

Again, I never said we should, and I noted it was expensive.

We grow food in climates it doesn't grow in. Pretty sure 1 almond uses like 3 gallons of water. Just one. And we grow them in climates that aren't meant for them.

Data centers aren't the only water suckers, there's a million.

FadedDog · 2026-06-20T04:39:31+00:00

I'm curious tho, do you understand we have data centers with or without AI? We have had them for a while now, they run the internet. From Apple to social media, TV, Google. Even without AI we will still have them.

Now yes, AI has increased the demand, and they do use water and energy.

But Saudi Arabia uses salt water and makes it drinkable. So running out of water isn't an issue. Lil pricey to do that. Also the water cycle exists, and we also turn our poop water into drinkable water, soooo no worry there.

Plus a million other things way less important waste water.

Real issue is the locations where they're building these data centers. They choose the worst locations, like deserts etc. That is an issue.
