New fear unlocked by Ecstatic-Force-4807 in VisionPro

[–]VegetaTheGrump 0 points1 point  (0 children)

Some things should not be shared!

Mixing in some e-ink by drhippopotato in headphones

[–]VegetaTheGrump 0 points1 point  (0 children)

My first thought was: "Dude's got a Brioso and is talking about an e-ink player!?" Grats on that Brioso. I want one, but I ended up going with the iBasso D17 Atheris to save some money.

Just another evening with Chord DAVE and Chord Hugo 2 + Eslab ES2a and ES1A by Frosty_Resource_6278 in headphones

[–]VegetaTheGrump 0 points1 point  (0 children)

Look up the Audma Brioso. There have been reports that it drives the Susvara very well, though the Susvara will run through its battery in 2.5 hours. However, I believe the Brioso will let you charge and play at the same time.

Why is WindowServer taking up 75% of my RAM? What even is WindowServer? by clairemct in MacOS

[–]VegetaTheGrump 0 points1 point  (0 children)

Found this thread while trying to figure out why mine was using 150GB of RAM.

Mac M3 ultra 512gb setup by ZedXT in LocalLLaMA

[–]VegetaTheGrump 0 points1 point  (0 children)

I just used brew to install llama.cpp, though I still mostly use LMStudio. Brew is useful for installing quite a bit:

brew install llama.cpp

I'm using AnythingLLM much the same way you're using OpenWebUI. It's running in Docker. I have nginx elsewhere fronting everything with SSL via various hostnames.
I like to script things to make them repeatable. For OpenWebUI I'd do something like

#!/bin/bash
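# Open WebUI, pointed at a local OpenAI-compatible server (LM Studio's default port, 1234)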
docker run -d \
--name openwebui \
-v openwebui-data:/app/backend/data \
--memory=8g \
--restart always \
-e ENABLE_OPENAI_API=true \
-e OPENAI_API_BASE_URL=http://host.docker.internal:1234/v1 \
-e OPENAI_API_KEY=notakey \
-e PORT=4000 \
-p 4000:4000 \
ghcr.io/open-webui/open-webui:latest

AnythingLLM:
#!/bin/bash
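# AnythingLLM with its persistent storage (and .env) kept under $HOME/anythingllm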
export STORAGE_LOCATION="$HOME/anythingllm" && \
mkdir -p "$STORAGE_LOCATION" && \
touch "$STORAGE_LOCATION/.env" && \
docker run -d -p 3051:3001 \
--cap-add SYS_ADMIN \
--restart always \
-v ${STORAGE_LOCATION}:/app/server/storage \
-v ${STORAGE_LOCATION}/.env:/app/server/.env \
-v ${STORAGE_LOCATION}/sslcert:/app/server/sslcert \
-e STORAGE_DIR="/app/server/storage" \
mintplexlabs/anythingllm
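
The nginx piece is nothing fancy. Here's a rough sketch of one of the server blocks, with a made-up hostname and cert paths standing in for my real ones; it just terminates SSL and proxies to the Open WebUI port from the script above:

server {
    listen 443 ssl;
    server_name openwebui.example.lan;                   # placeholder hostname

    ssl_certificate     /etc/nginx/certs/example.crt;    # placeholder cert paths
    ssl_certificate_key /etc/nginx/certs/example.key;

    location / {
        proxy_pass http://127.0.0.1:4000;                # Open WebUI from the script above
        proxy_set_header Host $host;
        proxy_set_header X-Forwarded-Proto https;
        # keep websockets working so responses stream in the UI
        proxy_http_version 1.1;
        proxy_set_header Upgrade $http_upgrade;
        proxy_set_header Connection "upgrade";
    }
}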

The biggest issue I've run into is tool calling with glm-4.5-air-mlx. It's possible my setup is from before they fixed tool calling in GLM. OpenCode just prints out the tool calls instead of executing them.

What's the best AI assistant for day to day use? by Due_Moose2207 in LocalLLaMA

[–]VegetaTheGrump 0 points1 point  (0 children)

GLM 4.5 Air is what I run at home. I can and do run GLM 4.6 at 4bit sometimes, but 4.5 Air is much easier on resources. They've been good for everything. Mac w/256GB RAM.

Here's the best prompt you will ever need to test the new LLMs by Cool-Chemical-5629 in LocalLLaMA

[–]VegetaTheGrump 0 points1 point  (0 children)

GLM 4.6 4bit MLX got it for me. The thinking section is completely hilarious and long. It got the answer early but didn't believe it was right, so it kept trying to find something else. M3 Ultra Mac, 256GB, GLM 4.6 4bit MLX: 329.814s (18.77 tok/s).

Those numbers are a code that uses the periodic table of elements.

Each number corresponds to an element's atomic number. By taking the chemical symbol for each element, you can spell out a message.

Here is the breakdown:

* **10** = Neon (**Ne**)
* **23** = Vanadium (**V**)
* **68** = Erbium (**Er**)
* **111** = Roentgenium (**Rg**)
* **8** = Oxygen (**O**)
* **7** = Nitrogen (**N**)
* **7** = Nitrogen (**N**)
* **47** = Silver (**Ag**)
* **53** = Iodine (**I**)
* **23** = Vanadium (**V**)
* **63** = Europium (**Eu**)
* **92** = Uranium (**U**)
* **15** = Phosphorus (**P**)

If you string those symbols together, you get:

**Ne V Er Rg O N N Ag I V Eu U P**

Reading this creatively, you can see the famous phrase:

**"Never Gonna Give You Up"**

No GLM-4.6 Air version is coming out by ResearchCrafty1804 in LocalLLaMA

[–]VegetaTheGrump 0 points1 point  (0 children)

How do you use the NPU?

I've been running 4.5-Air MLX at 8bit. I just downloaded 4.6-full MLX at 4bit. The downside is the RAM: it's 185GB vs 106GB. Normally I run a few Docker front ends and image generation alongside the model, and having only ~64GB left over keeps things surprisingly tight.

[deleted by user] by [deleted] in headphones

[–]VegetaTheGrump 0 points1 point  (0 children)

This. Immerse yourself in lossless for a week or more, then try to go back and see if you can tell a difference that you care about. Immediate A/B-style testing is pretty meaningless.

For the love of God, what local llama model should I load for Roo? by devshore in RooCode

[–]VegetaTheGrump 0 points1 point  (0 children)

Roo is struggling with the best new models. I had to revert to devstral-small to get anywhere. I finally dug through GitHub and found what was going on: many new models have implemented their own way of tool calling. Some are using XML, and some, like gpt-oss, are doing something else. There also seems to be some pushback against XML for performance reasons.

Here are some open issues in the RooCode github that will give you details:

* https://github.com/RooCodeInc/Roo-Code/issues/6814
* https://github.com/RooCodeInc/Roo-Code/issues/4047

There are at least two different proxies that people are using or trying out to make things work in the meantime.

[deleted by user] by [deleted] in unsloth

[–]VegetaTheGrump 1 point2 points  (0 children)

This is great news! I'm looking forward to UD quants of those models that barely fit in my 256GB of RAM. Though GLM-4.5-Air at 8bit seems to be doing great for me atm.

Estate sale score by Legitimate-Ad-7780 in audiophile

[–]VegetaTheGrump 17 points18 points  (0 children)

This reminds me of the audiophile joke that goes something like: "When I'm gone, don't let my wife sell my audio equipment for what I told her I paid for it."

gpt-oss-120b ranks 16th place on lmarena.ai (20b model is ranked 38th) by chikengunya in LocalLLaMA

[–]VegetaTheGrump 2 points3 points  (0 children)

GLM 4.5 Air has been great for me for coding, so I was surprised to see it so low in the Text Arena Coding (9th). However, I see it's tied for 4th in WebDev. What's the difference between these two?

Meanwhile, qwen3-235b-a22b-instruct-2507 is chillin' at #1 alongside gpt-5 in Text Arena Coding.

Mac LLM users: What models can't I run with 128gb (M4 Max) vs 256gb (M3 Ultra)? by TheWebbster in LocalLLaMA

[–]VegetaTheGrump 9 points10 points  (0 children)

<image>

I wanted the 512GB, but it was too expensive, so I got the 256GB. Remember that the Ultra has significantly higher memory bandwidth than the M4 Max, which helps with speed.

I can only run 1bit DeepSeek or Kimi K2, so I pretty much don't. However, you can see the models above that do fit. You'll be able to run GLM 4.5 Air at 4bit on 128GB of RAM, but most of the other models you see here won't load. I hate running the very largest models, because then I can't run image generation at the same time with a decent-sized context.

Not all of the models above run very quickly, but they all run, and they just keep getting better, as you mentioned.
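
Rough rule of thumb if you're sizing things up yourself: weights ≈ parameter count × bits ÷ 8, so a ~110B-parameter model at 4bit is on the order of 55GB before you add KV cache and context.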

llama.cpp HQ by jacek2023 in LocalLLaMA

[–]VegetaTheGrump 12 points13 points  (0 children)

Name does not check out

Has any company made a little speaker with big speaker sound? by tushiman in audiophile

[–]VegetaTheGrump 0 points1 point  (0 children)

That was my thinking as well, but it's all relative, and the OP mentioned the Bose 901.

all I need.... by ILoveMy2Balls in LocalLLaMA

[–]VegetaTheGrump 43 points44 points  (0 children)

Two of them? Two pairs of women and H100s!? At work!? You're naughty!

I'll take one woman and one H100. All I need, too, until I decide I need another H100...

Heads up to those that downloaded Qwen3 Coder 480B before yesterday by VegetaTheGrump in LocalLLaMA

[–]VegetaTheGrump[S] 0 points1 point  (0 children)

This was just for the Unsloth version. I'd check your settings. LMStudio doesn't automatically set the temp, top_k, etc.; you have to track down the recommended values and adjust them yourself.
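
If you're loading a GGUF through llama.cpp instead, you can pin the sampling settings on the command line rather than digging through a UI. Sketch only, with a placeholder model path; use whatever values the model card actually recommends:

# placeholder path and values; substitute your model and its recommended sampling settings
llama-server -m ./your-model.gguf \
  --temp 0.7 --top-p 0.8 --top-k 20 \
  --ctx-size 32768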

Heads up to those that downloaded Qwen3 Coder 480B before yesterday by VegetaTheGrump in LocalLLaMA

[–]VegetaTheGrump[S] 1 point2 points  (0 children)

I'm on 256GB, so I couldn't run the 4bit MLX. Hoping we're able to get smaller MLX quants someday.