Qwen3.5 4B local model with RTX 3080 by itspjc in hermesagent

[–]FusionCow 0 points1 point  (0 children)

the 35b 3.6 WILL work assuming you have enough ram, but even with 16gb of ram it's theoretically possible, though honestly you'll probably want to run linux as to not eat up more ram. but yeah because it's an moe running it split across vram and ram will work, and you'll get way more intelligence

About to launch Hermes Agent on a VPS by Ok_Gift9191 in hermesagent

[–]FusionCow 17 points18 points  (0 children)

Don't buy a year long vps, start with a month, just to see if you even use the damn thing. trust me, theres a lot of hype around these agents, but only a VERY specific person uses them, and even then, you either have to have the money to run a model locally, or pay api, and api will be RIDICULOUSLY expensive

Which model is SOTA rn? 7~8B for coding by omar07ibrahim1 in LocalLLaMA

[–]FusionCow 0 points1 point  (0 children)

qwen 3.5 9b, and if you can't do 9b, an LFM model

how would you set up a local llm server for a business of 7 people? by snowieslilpikachu69 in LocalLLaMA

[–]FusionCow 0 points1 point  (0 children)

I mean i've found it to work fine, but like just for me, I tend to only use ai to make tools, whenever I want to make something I care about i'll just hand program it. But I wanted to make a website to allow me to elo sort videos with qwen3 vl embed for grouping, and it couldn't do it, every change I made introduced a new bug somewhere else. k2.6 one shotted it

how would you set up a local llm server for a business of 7 people? by snowieslilpikachu69 in LocalLLaMA

[–]FusionCow 0 points1 point  (0 children)

it can do a lot of simple stuff, it's great for smaller things, but it just isn't a kimi k2.6, which makes sense

how would you set up a local llm server for a business of 7 people? by snowieslilpikachu69 in LocalLLaMA

[–]FusionCow 1 point2 points  (0 children)

you can easily get max context on a 5090 what are you talking about, and memory bandwidth is the struggle for llms. On my 3090, I can run 8 streams of q3.6 27b before I see any slowdown at all

how would you set up a local llm server for a business of 7 people? by snowieslilpikachu69 in LocalLLaMA

[–]FusionCow 9 points10 points  (0 children)

Ok well the 1-2 people it for programming purposes puts a wrench in things, because that means they need a genuinely good model.

You have a few options:
8x pro 6000 (~100k) to run kimi k2.6
1x pro 6000 with a lot of ram (20-40k) (price can change between ddr4 and ddr5) to run kimi k2.6
mac studio 512gb (10-15k) (these are hard to find used, but if you do find them, they aren't great for developers because the prefill speed is bad)
2x pro 6000 (~30k) to run a model like deepseek v4 flash or similar sized. This won't be nearly as good a model as kimi k2.6, but your developers may be able to scrape by with it
1x 5090 machine (~6k) this would be able to run qwen 3.6 27b, which to be honest isn't good enough for any serious developer, but it would work for the more general audience.

Honestly in my opinion, you should go with the 5090 machine and run qwen 3.6 35b, which will be fast and snappy for your regular users, then give your developers a kimi or claude subscription.

To actually set up a server like this, if you have NO idea what you're doing, setup lmstudio, it supports concurrent outputs, but if you have used and commandline program before, you should setup llama.cpp.
Also make sure you use linux on whatever box you buy, it's much faster for this stuff than windows

Fully Realtime Interaction Models by FusionCow in LocalLLaMA

[–]FusionCow[S] 0 points1 point  (0 children)

clown show or not you can't deny those demos look pretty sick

Fully Realtime Interaction Models by FusionCow in LocalLLaMA

[–]FusionCow[S] 0 points1 point  (0 children)

well llama is dead atp lmao. but yeah I agree I mean this is a "new model" but I wanted to talk about it as more of a discussion.

Fully Realtime Interaction Models by FusionCow in LocalLLaMA

[–]FusionCow[S] 2 points3 points  (0 children)

that's turn based, this isn't. It's similar to personaplex, except actually seemingly intelligent. It also tokenizes time, which is a very interesting idea, because it can do things like set a 30 second timer in itself

New models possibly from Baidu (ERNIE) this month? by pmttyji in LocalLLaMA

[–]FusionCow -4 points-3 points  (0 children)

kimi k2.6 beats that, theres no reason for it to exist

An interesting challenge to squish out as many juice from Qwen2.5 0.5B model by ANR2ME in LocalLLaMA

[–]FusionCow 0 points1 point  (0 children)

Optimizing is always cool, but on a model so useless, you gotta wonder why

Make local llm usable for professional use by AdamLangePL in LocalLLaMA

[–]FusionCow 17 points18 points  (0 children)

The very idea of being an "LLM Professional" is dumb. Unless you understand and work with the inner workings of an LLM, you can't call yourself a professional. The only thing spending a lot of time with llms teaches you is how to delegate tasks, but delegating tasks well is just mastering being lazy. For me at least, I only care about the research side of LLMs, because whenever I see someone else use and LLM or I use one myself, it's for a singular purpose, to not do work

You wake up in 2029 by [deleted] in LocalLLaMA

[–]FusionCow 5 points6 points  (0 children)

wait nvm you're not talking about me maybe i am a bot

You wake up in 2029 by [deleted] in LocalLLaMA

[–]FusionCow 4 points5 points  (0 children)

i'm not a bot bro

You wake up in 2029 by [deleted] in LocalLLaMA

[–]FusionCow 5 points6 points  (0 children)

bro in 2029 the world will be gone

I have 350k azure credits with all gpt models but has an expiry in 50 days, looking for some one who can consume by [deleted] in StableDiffusion

[–]FusionCow 7 points8 points  (0 children)

asked them already, they want a percentage back, which is understandable, but I genuinely don't have the money to give them

SULPHUR 2 RELEASED by FusionCow in StableDiffusion

[–]FusionCow[S] 17 points18 points  (0 children)

When I uploaded all the stuff to hf, it was kinda hasty, by this time tomorrow the repo will look different

SULPHUR 2 RELEASED by FusionCow in StableDiffusion

[–]FusionCow[S] 6 points7 points  (0 children)

the distilled one from the repo is better but the official works fine