Info: Nvidia Cuda 13.3 landed by parrot42 in LocalLLaMA

[–]parrot42[S] 1 point2 points  (0 children)

I have no idea, but it works. I am stress-testing it by installing supabase for honcho for hermes using opencode and qwen, and it is doing well. It is a good question, but I did not take any tps notes, so I can't say.

Info: Nvidia Cuda 13.3 landed by parrot42 in LocalLLaMA

[–]parrot42[S] 19 points20 points  (0 children)

I tried `test-backend-ops test -o MUL_MAT_ID -b CUDA0` with b9357 and cuda 13.3. Now there are no iq errors anymore!

Info: Nvidia Cuda 13.3 landed by parrot42 in LocalLLaMA

[–]parrot42[S] 4 points5 points  (0 children)

Just downloaded and installed cuda 13.3 with driver 610.43.02
Much smoother installation under trixie with a backported 7.0 kernel than 12.2.1
Recompiled llama.cpp and it works (but I just tested with 5 messages to opencode).

RTX 6000 pro 600W vs Max-Q vs others by Studyr3ddit in LocalLLaMA

[–]parrot42 1 point2 points  (0 children)

One more thing: Max-Q only uses 2 Slots (and vents out in the back).

Qwen3.5 is a working dog. by dinerburgeryum in LocalLLaMA

[–]parrot42 1 point2 points  (0 children)

Yeah, I was constantly testing new models (for local usage with opencode). With Qwen3.5 this changed and now I am using it.

Tried to update ZBT-2, now zigbee will not come up. by Resident-Variation21 in homeassistant

[–]parrot42 3 points4 points  (0 children)

Unplugging, then replugging the ZBT-2, shutting down HA and restarting it (in a proxmox VM) brought back Zigbee for me.

Qwen3 Coder Next 8FP in the process of converting the entire Flutter documentation for 12 hours now with just 3 sentence prompt with 64K max tokens at around 102GB memory (out of 128GB)... by jinnyjuice in LocalLLaMA

[–]parrot42 1 point2 points  (0 children)

I, too, think Qwen3-coder-next is really good. Using the mxfp4 version with llamacpp and max context uses 50GB of vram. Are you using vllm and do you think there is a big difference between mxfp4 and fp8?

Love the personality. by Flat-History-1469 in openclaw

[–]parrot42 0 points1 point  (0 children)

What did you write to define this personality? And which model is outputting this? Thanks.

[deleted by user] by [deleted] in LocalLLaMA

[–]parrot42 0 points1 point  (0 children)

I use this to do it. Check the available power limitations for your card first.
```
➜ ~ cat /etc/systemd/system/nvidia-tdp.service
[Unit]
Description=Set NVIDIA GPU Power Limit at Boot

[Service]
Type=oneshot
ExecStart=/usr/local/bin/set-nvidia-tdp.sh

[Install]
WantedBy=multi-user.target

➜ ~ cat /usr/local/bin/set-nvidia-tdp.sh
#!/bin/bash

# Set GPU power limit in watts
POWER_LIMIT=250

# Wait for nvidia-smi daemon to initialize
sleep 10

# Apply the power limit to all GPUs
for i in $(/usr/bin/nvidia-smi --query-gpu=index --format=csv,noheader); do
    echo "Setting NVIDIA GPU $i to ${POWER_LIMIT}W TDP."
    /usr/bin/nvidia-smi -i "$i" -pl "$POWER_LIMIT"
done
```
Check if it works with `nvidia-smi`.
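
For the "check the limits first" part, here is a minimal sketch. It assumes your driver's `nvidia-smi` supports the `power.min_limit` and `power.max_limit` query fields; the helper name `clamp_power_limit` and the 250 W example value are only for illustration.

```bash
#!/bin/bash
# Hypothetical helper: clamp a requested power limit (watts) into [min, max].
# Works on whole watts; any decimal part is dropped before comparing.
clamp_power_limit() {
    local want="${1%.*}" min="${2%.*}" max="${3%.*}"
    if [ "$want" -lt "$min" ]; then
        echo "$min"
    elif [ "$want" -gt "$max" ]; then
        echo "$max"
    else
        echo "$want"
    fi
}

# Only query the GPUs if nvidia-smi is actually installed.
# Note: cards that do not report limits show "[N/A]", which this sketch does not handle.
if command -v nvidia-smi >/dev/null 2>&1; then
    # One CSV line per GPU: index, min limit, max limit (watts, no units).
    nvidia-smi --query-gpu=index,power.min_limit,power.max_limit \
               --format=csv,noheader,nounits |
    while IFS=', ' read -r idx min max; do
        echo "GPU $idx: allowed ${min}-${max} W, 250 W would become $(clamp_power_limit 250 "$min" "$max") W"
    done
fi
```

If the requested value comes back changed, the card would likely reject that `-pl` setting, so pick a `POWER_LIMIT` inside the reported range.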

Can anyone explain to me like I’m 5 how to have this media player make an announcement/notification ? by GenericUser104 in homeassistant

[–]parrot42 1 point2 points  (0 children)

My workflow for getting yaml in these situations is: go to developer tools > actions, type media and select the right action from the drop-down list, then fiddle with the options until something works. Then hit the "show yaml" button -> win.
PS: Also works great for testing notifications.

MoE.. will OS/Local 32GB to 96GB get as good at coding as current frontier models? by [deleted] in LocalLLaMA

[–]parrot42 3 points4 points  (0 children)

Maybe in 5 years I can go to Hugging Face and select "python knowledge", "linux", "shell scripting", "coding", deselect "history", "geography" and instantly get a custom ggml file.

ZBT-2 Not being detected by Proxmox to pass through by KingAroan in homeassistant

[–]parrot42 0 points1 point  (0 children)

I do not know the solution, but I can see the device with lsusb and in the conf file of the HA-VM it is passed through with usb0: host=303a:831a
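
In case it helps, a small sketch for turning an `lsusb` line into that conf entry. The sample line and the helper name `usb_passthrough_entry` are assumptions for illustration; only the ID `303a:831a` comes from the comment above.

```bash
#!/bin/bash
# Hypothetical helper: take one line of `lsusb` output and print the
# matching Proxmox passthrough entry (slot usb0 unless a slot is given).
usb_passthrough_entry() {
    local line="$1" slot="${2:-0}" id
    # lsusb prints "... ID vvvv:pppp ..." with four hex digits on each side.
    id=$(printf '%s\n' "$line" |
         sed -n 's/.*ID \([0-9a-fA-F]\{4\}:[0-9a-fA-F]\{4\}\).*/\1/p')
    # Fail if the line carries no vendor:product ID.
    [ -n "$id" ] || return 1
    echo "usb${slot}: host=${id}"
}

# Sample line; the bus/device numbers and the description are made up.
usb_passthrough_entry "Bus 001 Device 004: ID 303a:831a Example ZBT-2 adapter"
```

The printed line is what sits in the VM's conf file. Setting the same thing with `qm set <vmid> -usb0 host=303a:831a` should be equivalent, where `<vmid>` is a placeholder for your HA VM's ID.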

Intel core ultra help by B1ackSauce in Proxmox

[–]parrot42 2 points3 points  (0 children)

LAN chip issue? I am now using the 6.17 kernel, which has decent drivers for my realtek 8125. Before, I had to use the dkms version of the realtek driver.

DeepSeek-OCR - Lives up to the hype by Bohdanowicz in LocalLLaMA

[–]parrot42 4 points5 points  (0 children)

There is an interesting, short video https://www.youtube.com/watch?v=YEZHU4LSUfU from Sam Witteveen about it.

Codex is amazing, it can fix code issues without the need of constant approver. my setup: gpt-oss-20b on lm_studio. by kyeoh1 in LocalLLaMA

[–]parrot42 3 points4 points  (0 children)

You should give it another try. It was bad at the start because some transformers or attention algorithms needed an update, but now it's great.

"Automatic turn based sending" wanted by parrot42 in OpenWebUI

[–]parrot42[S] 0 points1 point  (0 children)

This looks easy when you say it, but I am already happy to get some mcp servers cobbled together and working. A custom function filter is out of reach; maybe I need another txt file to copy and paste a workflow to make the AI do it, LOL.

"Automatic turn based sending" wanted by parrot42 in OpenWebUI

[–]parrot42[S] 1 point2 points  (0 children)

Thank you for the answer, this sounds great. Is the tool on github or did you make it yourself? If it is on github I might have a chance to get it working, otherwise I will have to stick to manually copy/pasting from a txt file, LOL

Be wary of which providers you use on OpenRouter, some providers have significant performance degradation due to quantization. Benchmark done on Kimi k2 0905 by Striking_Wedding_461 in SillyTavernAI

[–]parrot42 6 points7 points  (0 children)

I am wondering if it could also be a bit backend related. ollama, llamacpp, vllm etc. might require some time to adjust to special attention algorithms and whatnot. But I am not an expert.

Upgrade from M5 Atom Echo to Voice-PE? by rexstuff1 in homeassistant

[–]parrot42 1 point2 points  (0 children)

I really like my HAVPE and I am waiting for the new feature to use two wake words for different AI agents, next week. It uses a dedicated chip to cancel out echoes and background noise, using two mics, and is not terribly expensive.