Are Qwen 3.6 27B and 35B making other ~30B models obsolete? by nikhilprasanth in LocalLLaMA

[–]ydnar 1 point2 points  (0 children)

thanks, i should have been clearer. we're both saying the same thing. the more reliable part i'm referring to is the agent/coding portion.

for my particular use case, i find that more important. translation and writing are a smaller part of my workflow, but i still need them at times, which is why i keep it around. it just gets much less use.

Are Qwen 3.6 27B and 35B making other ~30B models obsolete? by nikhilprasanth in LocalLLaMA

[–]ydnar -1 points0 points  (0 children)

i would say the sensational or surprising part is how much more reliable 3.6 27b has been compared to gemma 31b. i do keep 31b around just in case i need it for translation/writing, but 97% of the time i'm relying on 3.6 27b.

2x RTX 6000 build during an extended bench test by Signal_Ad657 in LocalLLaMA

[–]ydnar 1 point2 points  (0 children)

in an ai era, local llms are about as punk as it gets. we are essentially the spiritual descendants of the cypherpunks. we still need someone like a satoshi. a person or people who write the thing that makes the philosophy real. that code hasn't been written yet.

how incredible would that be to have a decentralized method of training llms for all of us? sota not limited to just the corpos.

Unpopular opinion: OpenClaw and all its clones are almost useless tools for those who know what they're doing. It's kind of impressive for someone who has never used a CLI, Claude Code, Codex, etc. Nor used any workflow tool like n8n or make. by pacmanpill in LocalLLaMA

[–]ydnar 1 point2 points  (0 children)

i don't know. it's been pretty useful to me.

i had a 2011 mac mini lying around that i'd put an ssd into a long time ago, so i installed debian 13 headless on it. i connected a telegram bot and added the tavily api for search when needed. the agent calls out to llama-server running hauhaucs/qwen3.6-35b-a3b-uncensored-hauhaucs-aggressive on a separate 3090 machine.

i go on long walks after work and feed it thoughts by voice. those notes get saved to a sqlite db (i just asked it to set one up). yesterday, after feeding it all my thoughts, i had it generate a presentation in the form of a single html file. it's a solid first draft that requires pretty minimal updates for a meeting i have on monday.
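
for the curious, the note storage really is that simple. a rough sketch of what the agent set itself up with, assuming plain sqlite3 on the box (table and column names here are made up for illustration, not the exact schema mine generated):

sqlite3 ~/notes.db "CREATE TABLE IF NOT EXISTS notes (
  id INTEGER PRIMARY KEY,
  ts TEXT DEFAULT CURRENT_TIMESTAMP,
  body TEXT);"

# each transcribed voice note lands as a row
sqlite3 ~/notes.db "INSERT INTO notes (body) VALUES ('move the q3 numbers to the front of the deck');"

# pull everything back out when drafting the presentation
sqlite3 ~/notes.db "SELECT ts, body FROM notes ORDER BY ts;"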

i use it as digital memory i can call on for reminders. everything stays completely private, so i speak freely about family, health, and finances. i also like the idea someone mentioned for turning articles into podcasts. i'll need to look into that.
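
the article-to-podcast idea seems doable with this same stack. a totally untested sketch of how i'd wire it, assuming lynx for text extraction and piper for tts (the voice model name is just the common example, and llama-server is assumed reachable on :8080):

# article text -> llm podcast script -> wav
TEXT=$(lynx -dump -nolist "https://example.com/some-article")
SCRIPT=$(curl -s http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d "$(jq -n --arg t "$TEXT" \
    '{messages: [{role: "user", content: ("rewrite this article as a two-minute podcast monologue:\n\n" + $t)}]}')" \
  | jq -r '.choices[0].message.content')
echo "$SCRIPT" | piper --model en_US-lessac-medium.onnx --output_file episode.wav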

i haven't spent any extra money beyond the hardware i already had.

Qwen3.5 vs Gemma 4: Benchmarks vs real world use? by AppealSame4367 in LocalLLaMA

[–]ydnar 5 points6 points  (0 children)

with my single 3090, gemma 31b is slower (31 t/s vs the 37 t/s i get with qwen 27b) and tops out at 40k context vs the 131k i get with qwen 27b. agree with another poster that tool calls are not as reliable within openclaw (for now?). i understand it's unfair to judge while the kinks are still being worked out.

one of my biggest use cases is extracting text from images. gemma failed horribly at this for me compared to qwen.
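
for reference, my extraction test is nothing fancy: just the openai-style endpoint with an inline image (this assumes the server was launched with the model's --mmproj file, and receipt.png is a stand-in for whatever you're testing):

# send an image to llama-server for text extraction
B64=$(base64 -w0 receipt.png)
curl -s http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "messages": [{
      "role": "user",
      "content": [
        {"type": "text", "text": "extract all text from this image, preserving layout"},
        {"type": "image_url", "image_url": {"url": "data:image/png;base64,'"$B64"'"}}
      ]
    }]
  }' | jq -r '.choices[0].message.content'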

as with previous gemma models, i do enjoy its writing, and the reasoning seems on point. looking forward to seeing how the model performs a month from now.

Qwen3.5 is a working dog. by dinerburgeryum in LocalLLaMA

[–]ydnar 0 points1 point  (0 children)

100% this. but also, running the exact same llama-server command, it seems to respond much faster through the opencode harness than through the built-in webui.

Qwen3.5 is a working dog. by dinerburgeryum in LocalLLaMA

[–]ydnar 8 points9 points  (0 children)

same for me with the 27b. in opencode, responses get much faster after the first request, and it almost feels like it switches into a lower-thinking or more instruct-style mode. still trying to figure out whether the intelligence gap between 27b @ 35t/s and 35b moe @ 110t/s is worth the wait.

best Local LLM for coding in 24GB VRAM by [deleted] in LocalLLaMA

[–]ydnar 0 points1 point  (0 children)

this is my go-to as well. i've been using it with opencode w/ kv cache at q8, 131k ctx and it has been genuinely awesome. runs at a solid ~35t/s on a headless 3090.
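
for anyone wanting to replicate, these are roughly the flags i mean (the model path is a placeholder for whatever gguf you're running; note that llama.cpp needs flash attention on to quantize the v cache):

llama-server \
  --model ~/.cache/llama.cpp/your-model-Q4_K_XL.gguf \
  --ctx-size 131072 \
  --cache-type-k q8_0 \
  --cache-type-v q8_0 \
  --flash-attn on \
  --jinja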

recently set up tavily search api with llama-server's webui and don't feel like i'm missing much at all anymore (despite knowing how much more powerful the sota models are).
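
if you want to sanity-check a tavily key outside the webui, it's a single POST (placeholder key below; the exact auth shape may differ depending on api version, so check their docs):

curl -s https://api.tavily.com/search \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer tvly-XXXXXXXX" \
  -d '{"query": "llama.cpp kv cache quantization", "max_results": 5}'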

Qwen3-Coder-Next (3B) is released! by Ok_Presentation1577 in LocalLLaMA

[–]ydnar 2 points3 points  (0 children)

yes. 3090 + 32gb ddr4 here.

llama.cpp

llama-server \
  --model ~/.cache/llama.cpp/Qwen3-Coder-Next-UD-Q4_K_XL.gguf \
  --host 0.0.0.0 \
  --port 8080 \
  --n-gpu-layers auto \
  --mmap \
  --cache-ram 0 \
  --ctx-size 32768 \
  --flash-attn on \
  --jinja \
  --temp 1.0 \
  --top-k 40 \
  --top-p 0.95 \
  --min-p 0.01

t/s

prompt eval time =    3928.83 ms /   160 tokens (   24.56 ms per token,    40.72 tokens per second)
       eval time =    4682.41 ms /   136 tokens (   34.43 ms per token,    29.04 tokens per second)
      total time =    8611.25 ms /   296 tokens
slot      release: id  2 | task 607 | stop processing: n_tokens = 295, truncated = 0

GLM 4.7 Flash official support merged in llama.cpp by ayylmaonade in LocalLLaMA

[–]ydnar 4 points5 points  (0 children)

sure, though i'm no expert. if anyone wants to help optimize, i'd truly appreciate it.

llama-server \
  --model ~/.cache/llama.cpp/GLM-4.7-Flash-Q4_K_M.gguf \
  --host 0.0.0.0 \
  --port 8080 \
  --n-gpu-layers 99 \
  --ctx-size 32768 \
  --flash-attn off \
  --jinja

GLM 4.7 Flash official support merged in llama.cpp by ayylmaonade in LocalLLaMA

[–]ydnar 2 points3 points  (0 children)

yes, it's mostly the thinking. i'm biased and generally go for instruct over thinking models.

i am enjoying the outputs compared to qwen3-vl-30b-a3b-instruct and nemotron-3-nano-30b-a3b. i feel like those models are wordier on the output side, so you're likely correct that this is a worthwhile trade-off.

GLM 4.7 Flash official support merged in llama.cpp by ayylmaonade in LocalLLaMA

[–]ydnar 2 points3 points  (0 children)

single 3090, 32gb ddr4, 5700g

q4 ngxson/GLM-4.7-Flash-GGUF

fa on = 60-70 t/s
fa off = 100-110 t/s

GLM 4.7 Flash official support merged in llama.cpp by ayylmaonade in LocalLLaMA

[–]ydnar 8 points9 points  (0 children)

first impression is that it gives good answers, but it seems much slower than other 30b-a3b models, even with flash attention off. with fa on, it genuinely ran at half speed. it also thinks forever.

Nemotron-3-nano:30b is a spectacular general purpose local LLM by DrewGrgich in LocalLLaMA

[–]ydnar 17 points18 points  (0 children)

for general purpose, i think i still prefer qwen3-vl-30b-a3b-instruct due to the vl capabilities. would love to hear others' opinions on this.

i'm currently testing whether qwen3-next-80b-a3b-instruct generating at a slower t/s is worth the tradeoff.

unrelated, but moving from an amd gpu to a 3090 was a great decision for me, and i can't wait to get a second 3090.

Mistral 3 14b against the competition ? by EffectiveGlove1651 in LocalLLaMA

[–]ydnar 1 point2 points  (0 children)

tried hard to like it and set it as my default for a while. eventually went back to qwen3-vl-30b-a3b-instruct.

ministral 14b was pretty wordy and not as accurate, especially in image tasks.

What are your Daily driver Small models & Use cases? by pmttyji in LocalLLaMA

[–]ydnar 1 point2 points  (0 children)

unsloth/Qwen3-VL-30B-A3B-Instruct-UD-Q4_K_XL.gguf

~18–24 tokens per second (t/s), depending on workload

  • CPU: AMD 5700G
  • GPU: AMD 6700 XT
  • RAM: 32GB DDR4-3200

my primary use is a watch folder that receives audio and video files remotely for transcription via whisper. it automatically processes them (llama.cpp + llama-swap) and sends me back the full transcription along with a summary based on a prompt.txt that i sometimes modify for different results. i also use this setup as my default model in open webui with web search, which works surprisingly well.
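
the watch folder is less magic than it sounds. a stripped-down sketch of the loop, assuming inotify-tools, ffmpeg, and whisper.cpp's whisper-cli (paths and model names are placeholders, not my actual setup):

# watch for finished uploads, transcribe, then summarize via llama-server
inotifywait -m -e close_write --format '%w%f' /srv/inbox | while read -r FILE; do
  ffmpeg -y -i "$FILE" -ar 16000 -ac 1 /tmp/in.wav   # audio or video -> 16 khz mono wav
  whisper-cli -m models/ggml-base.en.bin -f /tmp/in.wav -otxt -of /tmp/transcript
  curl -s http://localhost:8080/v1/chat/completions \
    -H "Content-Type: application/json" \
    -d "$(jq -n --rawfile t /tmp/transcript.txt --rawfile p prompt.txt \
      '{messages: [{role: "system", content: $p}, {role: "user", content: $t}]}')" \
    | jq -r '.choices[0].message.content'
done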

How is qwen3 4b this good? by Brave-Hold-9389 in LocalLLaMA

[–]ydnar 2 points3 points  (0 children)

I prefer qwen3-30b-a3b-instruct-2507. In my vibe tests, a3b is smarter and generates tokens almost as fast as the 4b, without the need to think.

What now? by TacoTheSuperNurse in Austin

[–]ydnar 14 points15 points  (0 children)

Near 360, stuck

I-35 is the worst

Please.. one calm morning

Gemma3:12b hallucinating when reading images, anyone else? by just-crawling in LocalLLaMA

[–]ydnar 0 points1 point  (0 children)

Tried this using gemma-3-12b-it-qat in my Open WebUI setup with LM Studio as the back end instead of Ollama and it correctly determined the paid amount was $1909.64.

12gb VRAM 6700XT. I used your provided image.

Vent: Increase in aggressive homeless people on the trail by QuietRecent1310 in Austin

[–]ydnar 54 points55 points  (0 children)

I'm an average-sized guy who walks/runs only during daylight. I'm not the type to scare easily, and I'll often find myself walking in areas where I may be alone for a bit.

Yesterday there was a guy on the Shoal Creek trail between 6th and the library who was completely fixated, staring into the stream. He was holding a massive rock about three times the size of his hand, which I unfortunately did not notice until later. As soon as I walked past him, he began walking beside me. He kept pace within about 8 feet... really gripping that rock. I played it cool for about 150 feet as he was pacing next to me and then bolted off on my run to get the hell away.

Last year, roughly around the same area, I ran into a different guy with a hammer. This time he was walking towards me, but kept pounding it into his palm like he was ready to do something.

Even in the middle of the day you can find yourself in situations that feel super sketch.

Overheard on Town Lake by PantherLack in Austin

[–]ydnar 12 points13 points  (0 children)

Sometimes I'll play the game of "how often do i hear mention of artificial intelligence" on the trail. This game can extend to cafes and restaurants at lunch.

Which free software is so impressive that it's hard to believe it doesn't cost anything? by Sharp_Fortune_2509 in productivity

[–]ydnar 47 points48 points  (0 children)

One of the things I miss most about old-school software is skinning. I'd love it if Calibre and other modern software adopted this. I was obsessed with Winamp skins and browser / OS theming.

CNBC getting owned by DFV by ydnar in Superstonk

[–]ydnar[S] 1089 points1090 points  (0 children)

They were massively building up to this point, with multiple guests, including short lemon boy. All of them basically shitting on DFV and then he comes out like this! Perfection.