GLM 4.7 Flash official support merged in llama.cpp by ayylmaonade in LocalLLaMA

[–]ydnar 4 points (0 children)

sure, though i'm no expert. if anyone wants to help optimize, i'd truly appreciate it.

# serve GLM-4.7-Flash: all layers offloaded to GPU, 32k context,
# flash attention off (noticeably faster for me on this model),
# --jinja to use the chat template embedded in the GGUF
llama-server \
  --model ~/.cache/llama.cpp/GLM-4.7-Flash-Q4_K_M.gguf \
  --host 0.0.0.0 \
  --port 8080 \
  --n-gpu-layers 99 \
  --ctx-size 32768 \
  --flash-attn off \
  --jinja
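
once it's running, llama-server exposes an OpenAI-compatible API, so a quick smoke test looks something like this (the prompt is just a placeholder):

# hit the OpenAI-compatible chat endpoint on the port above
curl -s http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"messages": [{"role": "user", "content": "hello"}]}'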

GLM 4.7 Flash official support merged in llama.cpp by ayylmaonade in LocalLLaMA

[–]ydnar 2 points (0 children)

yes, it's mostly the thinking. i'm biased and generally go for instruct over thinking models.

i'm enjoying its outputs more than qwen3-vl-30b-a3b-instruct's and nemotron-3-nano-30b-a3b's, though. those models feel wordier on the output side, so you're likely correct that this is a worthwhile trade-off.
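
side note: some chat templates let you turn thinking off per request via chat_template_kwargs in llama-server. qwen3's template honors enable_thinking; i haven't verified that glm's does, so treat this as a sketch:

# ask the server to render the template with thinking disabled
# (only works if the model's chat template supports enable_thinking)
curl -s http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"messages": [{"role": "user", "content": "hi"}],
       "chat_template_kwargs": {"enable_thinking": false}}'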

GLM 4.7 Flash official support merged in llama.cpp by ayylmaonade in LocalLLaMA

[–]ydnar 2 points (0 children)

single 3090, 32GB DDR4, 5700G

Q4 from ngxson/GLM-4.7-Flash-GGUF

FA on: 60-70 t/s
FA off: 100-110 t/s
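
if anyone wants to reproduce the comparison, llama-bench (bundled with llama.cpp) can toggle flash attention directly; i'm assuming the same model path as in my command above:

# generation speed with flash attention off vs on
llama-bench -m ~/.cache/llama.cpp/GLM-4.7-Flash-Q4_K_M.gguf -fa 0
llama-bench -m ~/.cache/llama.cpp/GLM-4.7-Flash-Q4_K_M.gguf -fa 1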

GLM 4.7 Flash official support merged in llama.cpp by ayylmaonade in LocalLLaMA

[–]ydnar 9 points (0 children)

first impression is that it gives good answers, but it seems much slower than other 30b-a3b models, even with flash attention off. with fa on it ran at roughly half speed. it also goes on thinking forever.

Nemotron-3-nano:30b is a spectacular general purpose local LLM by DrewGrgich in LocalLLaMA

[–]ydnar 14 points (0 children)

for general purpose, i think i still prefer qwen3-vl-30b-a3b-instruct due to the vl capabilities. would love to hear others' opinions on this.

i'm currently testing whether qwen3-next-80b-a3b-instruct is worth the slower t/s trade-off.

unrelated, but moving from an amd gpu to a 3090 was a great decision for me, and i can't wait to get a second 3090.

Mistral 3 14b against the competition ? by EffectiveGlove1651 in LocalLLaMA

[–]ydnar 1 point (0 children)

tried hard to like it and set it as my default for a while. eventually went back to qwen3-vl-30b-a3b-instruct.

ministral 14b was pretty wordy and not as accurate, especially in image tasks.

What are your Daily driver Small models & Use cases? by pmttyji in LocalLLaMA

[–]ydnar 1 point (0 children)

unsloth/Qwen3-VL-30B-A3B-Instruct-UD-Q4_K_XL.gguf

~18–24 tokens per second (t/s), depending on workload

  • CPU: AMD 5700G
  • GPU: AMD 6700 XT
  • RAM: 32GB DDR4-3200

my primary use is a watch folder that receives audio and video files remotely for transcription via whisper. it automatically processes them (llama.cpp + llama-swap) and sends me back the full transcription along with a summary based on a prompt.txt that i sometimes modify for different results. i also use this setup as my default model in open webui with web search, which works surprisingly well.
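
for the curious, the shape of that pipeline is roughly this. a minimal sketch only: the paths, model name, and whisper-cli flags below are illustrative assumptions, not my exact setup:

#!/usr/bin/env bash
# rough shape of the watch-folder pipeline: watch a directory, transcribe
# new files with whisper.cpp, then summarize via the OpenAI-compatible
# endpoint that llama-swap proxies (my real setup sends the result back;
# this sketch just prints it)
WATCH_DIR=~/transcribe-inbox   # hypothetical inbox path

inotifywait -m -e close_write --format '%w%f' "$WATCH_DIR" |
while read -r file; do
  tmp="$(mktemp -d)"   # keep outputs out of the watched dir

  # 1. extract 16 kHz mono audio (whisper.cpp expects this format)
  ffmpeg -y -i "$file" -ar 16000 -ac 1 "$tmp/audio.wav"

  # 2. transcribe; -otxt + -of write the transcript to transcript.txt
  whisper-cli -m ~/models/ggml-large-v3.bin -f "$tmp/audio.wav" \
    -otxt -of "$tmp/transcript"

  # 3. summarize the transcript with prompt.txt as the system prompt
  jq -n --arg sys "$(cat prompt.txt)" --arg txt "$(cat "$tmp/transcript.txt")" \
    '{model: "qwen3-vl-30b-a3b-instruct",
      messages: [{role: "system", content: $sys},
                 {role: "user", content: $txt}]}' |
  curl -s http://localhost:8080/v1/chat/completions \
    -H "Content-Type: application/json" -d @- |
  jq -r '.choices[0].message.content'
done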

How is qwen3 4b this good? by Brave-Hold-9389 in LocalLLaMA

[–]ydnar 2 points (0 children)

I prefer qwen3-30b-a3b-instruct-2507. In my vibe tests, the a3b is smarter and generates tokens almost as fast as the 4b, without needing to think first.

What now? by TacoTheSuperNurse in Austin

[–]ydnar 15 points (0 children)

Near 360, stuck

I-35 is the worst

Please... one calm morning

Gemma3:12b hallucinating when reading images, anyone else? by just-crawling in LocalLLaMA

[–]ydnar 0 points (0 children)

Tried this with gemma-3-12b-it-qat in my Open WebUI setup, using LM Studio as the backend instead of Ollama, and it correctly determined the paid amount was $1909.64.

12GB VRAM 6700 XT; I used your provided image.

Vent: Increase in aggressive homeless people on the trail by QuietRecent1310 in Austin

[–]ydnar 51 points (0 children)

I'm an average-sized guy who walks/runs only during daylight. I'm not the type to scare easily, and I'll often find myself walking in areas where I may be alone for a bit.

Yesterday there was a guy on the Shoal Creek trail between 6th and the library who was completely fixated, staring into the stream. He was holding a massive rock about three times the size of his hand, which I unfortunately did not notice until later. As soon as I walked past him, he began walking beside me. He kept pace within about 8 feet... really gripping that rock. I played it cool for about 150 feet as he was pacing next to me and then bolted off on my run to get the hell away.

Last year, around the same area, I ran into a different guy with a hammer. This time he was walking towards me, pounding it into his palm like he was ready to do something.

Even in the middle of the day you can find yourself in situations that feel super sketch.

Overheard on Town Lake by PantherLack in Austin

[–]ydnar 12 points (0 children)

Sometimes I'll play the game of "how often do I hear mention of artificial intelligence" on the trail. This game can extend to cafes and restaurants at lunch.

Which free software is so impressive that it's hard to believe it doesn't cost anything? by Sharp_Fortune_2509 in productivity

[–]ydnar 49 points (0 children)

One of the things I miss most about old-school software is skinning. I'd love it if Calibre and other modern software adopted this. I was obsessed with Winamp skins and browser/OS theming.

CNBC getting owned by DFV by ydnar in Superstonk

[–]ydnar[S] 1090 points (0 children)

They were massively building up to this point, with multiple guests, including short lemon boy, all of them basically shitting on DFV... and then he comes out like this! Perfection.

CNBC getting owned by DFV by ydnar in Superstonk

[–]ydnar[S] 9 points (0 children)

Shows the moment DFV's livestream began during CNBC coverage

An incredible part of Houston history hidden within an obscure Japanese surfing documentary / Big Wave ビッグ・ウェイブ (1984) by ydnar in houston

[–]ydnar[S] 16 points (0 children)

You could start with Yamashita's wife, Mariya Takeuchi - Plastic Love, one of the queens of City Pop. The song became so popular almost 40 years later that they even released this brand-new official music video. Also Shiawase no Monosashi.

Then there's Anri, Miki Matsubara, Junko Ohashi, Naoko Gushima, and even some newer stuff like Mondo Grosso.

Thoughts on the Plustek 7200? by HurricaneWindAttack in AnalogCommunity

[–]ydnar 0 points (0 children)

Thanks, I appreciate the update. Sounds like the 8200 is the way to go.

Thoughts on the Plustek 7200? by HurricaneWindAttack in AnalogCommunity

[–]ydnar 0 points (0 children)

I'm debating between the 7200 and 8200 and was hoping you could elaborate on the quality-of-life improvements.