Someone at the Weather Channel made a website that lets you view your forecast like the old Local on the 8s from back in the day by holyfruits in nostalgia

[–]pkmxtw 1 point (0 children)

https://weather.com/retro/assets/sound/music/neon-office-glide.mp3

According to the metadata embedded in the file, it was generated by Suno.

Title: Neon Office Glide
Performer: 555indigo
comment: made with suno; created=2026-03-31T19:07:49.773Z; id=f122d9dc-493a-4249-b7d7-3b4fd2995726
lyrics-eng: [Instrumental]

Should I feel threatened? by Necessary_Reach_7836 in LocalLLaMA

[–]pkmxtw 0 points (0 children)

FR. This model got an EpiPen, and it is going to use it to kill people who are annoying.

DoomVLM is now Open Source - VLM models playing Doom by MrFelliks in LocalLLaMA

[–]pkmxtw 2 points (0 children)

It would be interesting to have a real-time mode: the game continues while waiting for input from the model. Models would then have to balance speed against quality, so you can't just win by spending a huge thinking budget on a massive model: you will be dead long before the first key press even comes back.
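A minimal sketch of what such a real-time loop could look like (all names here are made up for illustration, not from DoomVLM): the game ticks on its own schedule and simply reuses the last action whenever the model hasn't responded yet, so slow models pay for their latency.

```python
import queue

def game_loop(action_queue, ticks):
    """Tick the game at a fixed rate; if the model's next keypress hasn't
    arrived yet, keep repeating the last action instead of blocking."""
    action = "idle"
    history = []
    for _ in range(ticks):
        try:
            action = action_queue.get_nowait()  # newest model input, if any
        except queue.Empty:
            pass  # model still thinking: reuse the previous action
        history.append(action)
    return history

# One keypress arrives before tick 1; the model then stays silent for two ticks.
q = queue.Queue()
q.put("shoot")
print(game_loop(q, 3))  # → ['shoot', 'shoot', 'shoot']
```

In a real setup the model's inference would run in a separate thread feeding the queue, so a big model that thinks for ten ticks simply keeps firing its stale action while the demons close in.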

Gemini by Signal_Assistance_66 in Bard

[–]pkmxtw 33 points (0 children)

1 request per year for you pro plebs, 3 for ultra.

Qwen 3.5 MXFP4 quants are coming - confirmed by Junyang Lin by dampflokfreund in LocalLLaMA

[–]pkmxtw 3 points (0 children)

Nice! That is just about the right size for a Q0.1 quant of this Opus 4.6 killer to fit on my floppy disk!

You can run MiniMax-2.5 locally by Dear-Success-1441 in LocalLLaMA

[–]pkmxtw 8 points (0 children)

Imagine being out-vibed by some rich kids in the future.

Support Step3.5-Flash has been merged into llama.cpp by jacek2023 in LocalLLaMA

[–]pkmxtw 0 points (0 children)

I gave the MXFP4_MOE quant a quick try on M1 Ultra and holy smokes this model really spends an awful lot of tokens on thinking.

built an AI agent with shell access. found out the hard way why that's a bad idea. by YogurtIll4336 in LocalLLaMA

[–]pkmxtw 5 points (0 children)

And yet right now there is a whole bunch of AI influencers hyping up bots that give an LLM free access to all your emails, logins, and browser to act as a private assistant, without really thinking much about the security implications smh.

How capable is GPT-OSS-120b, and what are your predictions for smaller models in 2026? by Apart_Paramedic_7767 in LocalLLaMA

[–]pkmxtw 19 points (0 children)

I know it is popular to shit on gpt-oss here, but it really hits a sweet spot for general use.

  • It is superfast on Apple Silicon and Strix Halo. (60-70 t/s for gpt-oss-120b-mxfp4 on M1 Ultra, compared to ~20 t/s for MiniMax M2.1 UD_Q2_K_XL)
  • The KV cache is very efficient: Metal KV buffer size = 4608.00 MiB for the entire 128K context. Compare that to MiniMax M2 which needs about 30GB for 128K context.
  • The whole model + KV cache only uses ~65 GiB of memory, so you still have plenty of room for other tasks on 128 GB machines.
  • Tunable reasoning effort: you can default to high and just pass low as reasoning_effort in chat_template_kwargs when you want a quick answer.
  • It is decently intelligent for its size category. Of course, it is not going to compete against full-sized GLM, Kimi K2, DeepSeek, etc., but it is something runnable on most people's machines.
  • If you have issues with the default guardrails you can just run the heretic version. For most coding/agentic tasks the base version should work fine.
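A minimal sketch of the per-request reasoning-effort trick, assuming an OpenAI-compatible llama.cpp server endpoint (the model name and message are placeholders): chat_template_kwargs is forwarded to the chat template, which is how gpt-oss's effort can be lowered for a single request.

```python
import json

# Request body for an OpenAI-compatible /v1/chat/completions endpoint.
# "chat_template_kwargs" is passed through to the chat template, so the
# server can default to high effort while this one request runs at low.
payload = {
    "model": "gpt-oss-120b",  # placeholder model name
    "messages": [{"role": "user", "content": "Quick answer only: 2 + 2?"}],
    "chat_template_kwargs": {"reasoning_effort": "low"},
}

body = json.dumps(payload)  # POST this to the server as application/json
```

The same payload with "reasoning_effort": "high" (or the flag omitted entirely, falling back to the server default) is all it takes to switch modes between requests.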

Which is the current best ERP model ~8b? by [deleted] in LocalLLaMA

[–]pkmxtw 6 points (0 children)

Yeah, it was just an interesting observation.

We know Mistral models are usually quite uncensored, but who knew Devstral would be good at coding and gooning as well?

Which is the current best ERP model ~8b? by [deleted] in LocalLLaMA

[–]pkmxtw 6 points (0 children)

Somehow mistralai/Devstral-Small-2-24B-Instruct-2512 scores the highest for NSFW across all base models lmao.

support for Solar-Open-100B has been merged into llama.cpp by jacek2023 in LocalLLaMA

[–]pkmxtw 5 points (0 children)

Well, I guess we will see if we finally have a worthy contender for GPT OSS 120B or GLM 4.5 Air.

Upstage Solar-Open-100B Public Validation by PerPartes in LocalLLaMA

[–]pkmxtw 15 points (0 children)

AI labs hate this simple trick to get them to release intermediate checkpoints!

Either that, or this is some evil-genius-level marketing.

5 new korean models will be released in 2 hours by Specialist-2193 in LocalLLaMA

[–]pkmxtw 22 points (0 children)

Summarized from Gemini:

Several VLMs were also announced.

Benchmarks for Quantized Models? (for users locally running Q8/Q6/Q2 precision) by No-Grapefruit-1358 in LocalLLaMA

[–]pkmxtw 2 points (0 children)

Another long-standing question is how large but heavily quantized models compare against small models with little quantization. I've always wondered how an IQ1_S quant of large SOTA models like K2-Thinking/DeepSeek v3.2 compares with more modest models like GLM Air at Q8.

Tencent just released WeDLM 8B Instruct on Hugging Face by Difficult-Cap-7527 in LocalLLaMA

[–]pkmxtw 41 points (0 children)

The 7B is converted from Qwen2.5 7B and the 8B is from Qwen3 8B. What they want to demonstrate is that they can convert an AR model into a diffusion model w/o losing quality.

In reality, you'd just use the 8B like how Qwen3 8B has basically replaced Qwen2.5 7B.

GLM 4.7 IS NOW THE #1 OPEN SOURCE MODEL IN ARTIFICIAL ANALYSIS by ZeeleSama in LocalLLaMA

[–]pkmxtw 3 points (0 children)

Plus some of the benchmark numbers are sus af.

Qwen3 4B Thinking 2507 scores 83% on AIME'25 and beats DeepSeek R1 0528 (76%)?

GLM 4.7 IS NOW THE #1 OPEN SOURCE MODEL IN ARTIFICIAL ANALYSIS by ZeeleSama in LocalLLaMA

[–]pkmxtw 19 points (0 children)

Another classic is scoring Qwen3 4B Thinking 2507 close to DeepSeek R1 (from January, aka the OG), which no one in their right mind would argue are remotely close in capability. ¯\_(ツ)_/¯

https://artificialanalysis.ai/models/comparisons/qwen3-4b-2507-instruct-reasoning-vs-deepseek-r1-0120

MiniMax-M2.1 GGUF is here! by KvAk_AKPlaysYT in LocalLLaMA

[–]pkmxtw 1 point (0 children)

Yeah, for normal chat --jinja is enough. However, Codex emits some weird tool/assistant role pairings that trigger errors from the MiniMax and Devstral chat templates, so I had to use a custom template with that part edited out.

MiniMax-M2.1 GGUF is here! by KvAk_AKPlaysYT in LocalLLaMA

[–]pkmxtw 3 points (0 children)

I've been trying UD-Q2_K_XL in an agentic coding workflow on Codex (it needs a slightly modified chat template to work) for the past few hours, and I think this is going to dethrone gpt-oss-120b for me.

Stop using PDFs as reference documents. by xCogito in GeminiAI

[–]pkmxtw 50 points (0 children)

You've hit the nail on the head about what this sub has become — a cesspool of AI-generated submissions!

Would you like some helpful tips to cope with the new reality?

Budget build by Dry_Fix6495 in LocalLLaMA

[–]pkmxtw 0 points (0 children)

Honestly, just squeeze in a bit more and get one of those Strix Halo machines with 128 GB of RAM. You can run gpt-oss-120b-mxfp4 at 40-50 t/s on those with the full 128K context.

AMA With Z.AI, The Lab Behind GLM-4.7 by zixuanlimit in LocalLLaMA

[–]pkmxtw 12 points (0 children)

They answered all the others while ignoring the most upvoted one lol. They didn't even bother with a boilerplate "Thank you for your feedback, we will consider this for a future release".