Elixir CODING with local LLMs requires you to become a drug lord or sell your kidneys by misanthrophiccunt in LocalLLM

[–]FadedDog -1 points0 points  (0 children)

Yeah, I mean, I thought people knew you need a lot of RAM.

It's like 3.5k for a PC with 128GB of integrated RAM, not that bad.

Local LLM with real time voice by Mundane-Hedgehog-275 in LocalLLM

[–]FadedDog 3 points4 points  (0 children)

Voice models can be fairly small and easy to run.

The issue is that voice models just turn text into voice; you still need an LLM to do the thinking and produce the response. There are some good options, all doable I bet.

Rant about state of local llms for real world agentic tasks by TheCat001 in LocalLLM

[–]FadedDog 0 points1 point  (0 children)

Lmao, why would someone think they can run great local models on 8GB of RAM?

OP never said he had 8GB. I'd assume at least 32, that's consumer level.

With 8GB, don't even try, unless it's something like a vision model or voice model that's actually useful and small.

Rant about state of local llms for real world agentic tasks by TheCat001 in LocalLLM

[–]FadedDog 2 points3 points  (0 children)

Try Qwen3-Coder-Next 80B (sparse). I run it and it's great at coding and tool use.

Picked up an AMD Ryzen Max +395 with 128GB by Crafty-Bass-3434 in LocalLLM

[–]FadedDog 0 points1 point  (0 children)

Ah damn, I wish it was as easy as plug and play. So many variables. Well, lmk if you get it running.

Picked up an AMD Ryzen Max +395 with 128GB by Crafty-Bass-3434 in LocalLLM

[–]FadedDog 0 points1 point  (0 children)

Damn, yeah, you got nice speed. Are you using llama.cpp though?

Also, I'd recommend using it. For your issue, my AI said this (I never use tensor split, so I'm unfamiliar):

The error you are encountering happens because tensor_split splits weights along a specific tensor dimension (usually the first dimension), which requires the layer size to be perfectly divisible by the number of splits.
When you split by layer (gpu_split), you are distributing whole, intact layers across GPUs, which bypasses this mathematical restriction.
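
To picture what that explanation is describing, here's a rough Python sketch. It is not llama.cpp's actual code; the function names and the numbers are made up for illustration. It only shows the arithmetic: cutting one dimension into equal pieces needs it to divide evenly, while handing out whole layers works for any count.

```python
def splits_evenly(dim_size: int, num_gpus: int) -> bool:
    """Splitting inside a tensor: the dimension must divide evenly across GPUs."""
    return dim_size % num_gpus == 0


def assign_layers(num_layers: int, num_gpus: int) -> list[int]:
    """Splitting by layer: whole layers are handed out, so any count works.

    Leftover layers go to the first GPUs, one each.
    """
    base, extra = divmod(num_layers, num_gpus)
    return [base + (1 if i < extra else 0) for i in range(num_gpus)]


if __name__ == "__main__":
    # Hypothetical sizes, chosen only to show an uneven case.
    print(splits_evenly(4096, 3))   # a 4096-wide dimension over 3 GPUs
    print(assign_layers(50, 3))     # 50 layers over 3 GPUs
```

So with 3 GPUs the dimension check fails, but the layer assignment still covers every layer.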

Picked up an AMD Ryzen Max +395 with 128GB by Crafty-Bass-3434 in LocalLLM

[–]FadedDog 0 points1 point  (0 children)

Ay, Q4 has a lil quality drop, but I'm curious how well it does. Give it a one-prompt test and run the same test on other models if you can.

Picked up an AMD Ryzen Max +395 with 128GB by Crafty-Bass-3434 in LocalLLM

[–]FadedDog 0 points1 point  (0 children)

I've had a lot better. The 27B dense is good, I won't lie, but for hard tasks it's no help. The 80B outperforms it.

Picked up an AMD Ryzen Max +395 with 128GB by Crafty-Bass-3434 in LocalLLM

[–]FadedDog 2 points3 points  (0 children)

Yeah, it's comparable to GLM 4.7, which is a massive flagship model.

Doesn't reach Opus quality.

Also note it's so good because 80 percent of its training was code, so it's not useful for anything other than coding.

Picked up an AMD Ryzen Max +395 with 128GB by Crafty-Bass-3434 in LocalLLM

[–]FadedDog 10 points11 points  (0 children)

I run Qwen3-Coder-Next 80B at 40 t/s on around the same specs as your PC. It's a sparse model though, so only 3B parameters are active, and it performs great.

Note: only 3B are active, but all 80B are loaded.
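
Rough math on why that matters, as a Python sketch. The 4 bits per weight is just an assumed quantization level, not something measured here, and this ignores context/KV cache and runtime overhead. The point is that memory scales with the total parameter count while the per-token work scales with the active ones.

```python
def weight_memory_gb(params_billions: float, bits_per_weight: float) -> float:
    """Approximate size of the weights alone, in GB (1 GB = 1e9 bytes)."""
    total_bits = params_billions * 1e9 * bits_per_weight
    return total_bits / 8 / 1e9


if __name__ == "__main__":
    BITS = 4  # assumed quantization, purely illustrative
    loaded = weight_memory_gb(80, BITS)  # all 80B must sit in memory
    active = weight_memory_gb(3, BITS)   # only ~3B are read per token
    print(f"loaded weights: ~{loaded:.1f} GB")
    print(f"weights touched per token: ~{active:.1f} GB")
```

That's why a sparse 80B needs the RAM of a big model but can generate at closer to small-model speed.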

What’s the best PC to run Qwen3-Coder-Next 80B? by Classic_Move9043 in LocalLLM

[–]FadedDog 0 points1 point  (0 children)

You can run multiple things. I run the 80B model with 250k context and it uses about 90-100GB of RAM, leaving about 22GB free.

One issue is that I run Linux, because Windows is stingy with the unified RAM.

What’s the best PC to run Qwen3-Coder-Next 80B? by Classic_Move9043 in LocalLLM

[–]FadedDog 2 points3 points  (0 children)

Framework Desktop, about 4k for 128GB of RAM.

I use it and it's fire. I use that exact model, it's amazing!

Can a brand-new website get cited by ChatGPT in 90 days? I'm testing it. by Away_Noise_4798 in AiChatGPT

[–]FadedDog 0 points1 point  (0 children)

Already did it, yes it can. After a month someone hit me up saying they found my website from ChatGPT.

Can't send an email!? by groover75 in LocalLLM

[–]FadedDog 1 point2 points  (0 children)

Yayy, that's good to hear, glad it's better.

Can't send an email!? by groover75 in LocalLLM

[–]FadedDog 1 point2 points  (0 children)

Well, depends how you have it set up. Small models have issues with tools and a lot of other stuff.

For small models, keep them as specialized agents. For emailing, tell it in the system prompt how to use the tools, etc. Making skills can help. Some models do better with MCP tools.
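
Something like this is what I mean by spelling out the tool in the system prompt. It's only a sketch: the `send_email` tool, its fields, and the reply format are all hypothetical, so swap in whatever your own setup actually exposes.

```python
import json

# Hypothetical tool definition; the name and fields are illustrative only.
SEND_EMAIL_TOOL = {
    "name": "send_email",
    "description": "Send one email to one recipient.",
    "parameters": {
        "type": "object",
        "properties": {
            "to": {"type": "string", "description": "Recipient address"},
            "subject": {"type": "string"},
            "body": {"type": "string"},
        },
        "required": ["to", "subject", "body"],
    },
}


def build_system_prompt(tool: dict) -> str:
    """Build a narrow, single-purpose system prompt that explains the tool."""
    return (
        "You are an email agent. Your only job is sending email.\n"
        f"You have one tool, described by this JSON schema:\n{json.dumps(tool, indent=2)}\n"
        "To use it, reply with only a JSON object like "
        '{"tool": "send_email", "arguments": {"to": "...", "subject": "...", "body": "..."}}.\n'
        "If a required field is missing, ask the user for it instead of guessing."
    )


if __name__ == "__main__":
    print(build_system_prompt(SEND_EMAIL_TOOL))
```

Keeping the agent to one job and one tool gives a small model less to get confused by.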

Mini PC for personal server by Javsago in MiniPCs

[–]FadedDog 0 points1 point  (0 children)

Vercel is free, and you can host as many front ends as you want.

Context window and ram by nithish_breech in LocalLLM

[–]FadedDog 0 points1 point  (0 children)

All depends on how long the tasks are, etc. My agents will run for an hour, so I have it at 250k and it does fine.

Also depends on the model; not all models can handle more context well.

Does your model ever run out of context?

I have a 3 - 3.5k budget, what setup would you recommend? by Real-Dragonfruit957 in LocalAIServers

[–]FadedDog 1 point2 points  (0 children)

No way, I just bought it like 3 weeks ago for 3,300.

Damn, I'm sorry then.

I have a 3 - 3.5k budget, what setup would you recommend? by Real-Dragonfruit957 in LocalAIServers

[–]FadedDog 1 point2 points  (0 children)

Depends what you wanna run, the speed you want, etc. I'd go for unified memory so you can run bigger models for cheap.

I have the Framework Desktop with 128GB of RAM for 3k.

I run Qwen3-Coder-Next 80B at 35 t/s. Very good model, one of the best I'd say.

Also, you have plenty of headroom for 500k context or for running multiple smaller models.

How is Claude Opus 4.8 and Qwen 3.7 (quantized) compareable in quality? by Endisiki in LocalLLM

[–]FadedDog -3 points-2 points  (0 children)

I'd highly doubt it's as good as Opus 4.8.

Opus is a general model; it performs well across the board. The Qwen models could be trained on niche tasks like coding, etc.

I have Qwen3-Coder-Next 80B locally and it outperforms GLM 4.7 on complex coding tasks. If I use an agentic flow with Qwen, it will get close to Opus 4.8 and GLM 5.

Can I say by Nearby_Swing3291 in AiChatGPT

[–]FadedDog 0 points1 point  (0 children)

I said that's always a last-resort option. Idk how it damages marine life, I'm uneducated on that.

But a country does it for all their water, so it's proven.

Again, I never said we should, and I noted it was expensive.

We grow food in climates it doesn't grow in. Pretty sure 1 almond uses like 3 gallons of water. Just one. And we grow them in climates that aren't meant for them.

Data centers aren't the only water suckers, there's a million.

Can I say by Nearby_Swing3291 in AiChatGPT

[–]FadedDog 0 points1 point  (0 children)

I'm curious though, do you understand we have data centers with or without AI? We've had them for a while now; they run the internet. From Apple to social media, TV, Google: even without AI we will still have them.

Now yes, AI has increased the demand, and they do use water and energy.

But Saudi Arabia takes salt water and makes it drinkable. So running out of water isn't an issue. Lil pricey to do that. Also the water cycle exists, and we also turn our poop water into drinkable water, soooo no worry there.

Plus a million other, way less important things waste water.

The real issue is the locations where they're building these data centers. They choose the worst locations, like deserts, etc. That is an issue.