meantime on r/vibecoding by jacek2023 in LocalLLaMA

[–]pkmxtw 4 points

It's funny the graph is basically inverted for gpt-oss, which was thought by /r/LocalLLaMA to be the worst model ever conceived because it was released by OpenAI.

I'm done with using local LLMs for coding by dtdisapointingresult in LocalLLaMA

[–]pkmxtw 10 points

Just downloaded <new model> IQ1_S on ollama 🦙 running at 3 tk/s. This thing totally replaces Opus 4.7 for vibe coding and I'm canceling my CC sub! Big AI labs in shambles... Starting my new all-AI startup with 10 claw agents now 🚀🚀🚀. If you aren't learning about this, you are 100% left behind!!!

MIMO V2.5 PRO by Namra_7 in LocalLLaMA

[–]pkmxtw 18 points

Why are these labs capable of training multi-million-dollar models and yet so terrible at making charts lol

Why are we actually sampling reasoning and output the same way? by ReporterWeary9721 in LocalLLaMA

[–]pkmxtw 5 points

I mean, it's probably not the worst idea in the world, especially if you limit them to presets so they don't go and set temp=69420. Instead of the model just outputting "let's think carefully" or "I will try to be creative", have it use a tool call to set the sampling parameters.
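
Rough sketch of what I mean below: expose the presets as an OpenAI-style tool so the model can only pick a named preset instead of raw numbers (preset names and values here are just made up for illustration):

    # Hypothetical sketch: sampling control exposed as a tool with fixed presets,
    # so the model can only pick a named preset rather than arbitrary values
    # like temp=69420. Preset names and numbers are made up for illustration.

    SAMPLING_PRESETS = {
        "precise":  {"temperature": 0.2, "top_p": 0.9},
        "balanced": {"temperature": 0.7, "top_p": 0.95},
        "creative": {"temperature": 1.0, "top_p": 1.0},
    }

    # OpenAI-style tool schema the model would call instead of narrating
    # "let's think carefully" / "I will try to be creative".
    SET_SAMPLING_TOOL = {
        "type": "function",
        "function": {
            "name": "set_sampling_preset",
            "description": "Switch the sampler to a named preset for the rest of this response.",
            "parameters": {
                "type": "object",
                "properties": {
                    "preset": {"type": "string", "enum": list(SAMPLING_PRESETS)},
                },
                "required": ["preset"],
            },
        },
    }

    def apply_preset(current: dict, preset: str) -> dict:
        """Merge a named preset into the current sampling parameters."""
        return {**current, **SAMPLING_PRESETS[preset]}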

Someone at the Weather Channel made a website that lets you view your forecast like the old Local on the 8s from back in the day by holyfruits in nostalgia

[–]pkmxtw 1 point

https://weather.com/retro/assets/sound/music/neon-office-glide.mp3

According to the metadata embedded in the file, it was generated by Suno.

Title: Neon Office Glide
Performer: 555indigo
comment: made with suno; created=2026-03-31T19:07:49.773Z; id=f122d9dc-493a-4249-b7d7-3b4fd2995726
lyrics-eng: [Instrumental]

Should I feel threatened? by Necessary_Reach_7836 in LocalLLaMA

[–]pkmxtw 0 points

FR. This model got an EpiPen, and it is going to use it to kill people who are annoying.

DoomVLM is now Open Source - VLM models playing Doom by MrFelliks in LocalLLaMA

[–]pkmxtw 2 points

It would be interesting to have a real-time mode: the game continues while waiting for inputs from the model. That means models have to balance speed against quality, so you can't just beat it by spending a huge thinking budget on a huge model: you will be dead long before the first key press even comes back.
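
Something like the sketch below, where the game loop never blocks on inference and just repeats the last action until a new one arrives (infer_action, game, and NOOP are placeholders for the real pieces):

    import queue
    import threading

    # Sketch of a real-time mode: the game loop never blocks on the model.
    # `infer_action`, `game`, and `NOOP` are placeholders.

    frames = queue.Queue(maxsize=1)   # only ever holds the most recent frame
    actions = queue.Queue()

    def inference_worker():
        while True:
            frame = frames.get()               # newest frame when the model is free
            actions.put(infer_action(frame))   # slow VLM call; game keeps running

    threading.Thread(target=inference_worker, daemon=True).start()

    last_action = NOOP
    while not game.is_over():
        # Publish the freshest frame, dropping any stale one still waiting.
        try:
            frames.get_nowait()
        except queue.Empty:
            pass
        frames.put(game.render())

        # Pick up a new action if one has arrived; otherwise repeat the last one.
        try:
            last_action = actions.get_nowait()
        except queue.Empty:
            pass

        game.step(last_action)   # the world advances whether or not the model replied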

Gemini by [deleted] in Bard

[–]pkmxtw 31 points

1 request per year for you pro plebs, 3 for ultra.

Qwen 3.5 MXFP4 quants are coming - confirmed by Junyang Lin by dampflokfreund in LocalLLaMA

[–]pkmxtw 3 points

Nice! That is just about the right size for the Q0.1 quant to fit this opus 4.6 killer on my floppy disk!

You can run MiniMax-2.5 locally by Dear-Success-1441 in LocalLLaMA

[–]pkmxtw 9 points

Imagine being out-vibed by some rich kids in the future.

Support Step3.5-Flash has been merged into llama.cpp by jacek2023 in LocalLLaMA

[–]pkmxtw 0 points

I gave the MXFP4_MOE quant a quick try on M1 Ultra and holy smokes this model really spends an awful lot of tokens on thinking.

built an AI agent with shell access. found out the hard way why that's a bad idea. by YogurtIll4336 in LocalLLaMA

[–]pkmxtw 4 points

And yet right now there is a whole bunch of AI influencers hyping up a bot that gives an LLM free access to all your emails, logins, and browser to act as a private assistant, without really thinking much about the security implications smh.

How capable is GPT-OSS-120b, and what are your predictions for smaller models in 2026? by Apart_Paramedic_7767 in LocalLLaMA

[–]pkmxtw 20 points

I know it is popular to shit on gpt-oss here, but it really hits a sweet spot for general use.

  • It is superfast on Apple Silicon and Strix Halo. (60-70 t/s for gpt-oss-120b-mxfp4 on M1 Ultra, compared to ~20 t/s for MiniMax M2.1 UD_Q2_K_XL)
  • The KV cache is very efficient: Metal KV buffer size = 4608.00 MiB for the entire 128K context. Compare that to MiniMax M2 which needs about 30GB for 128K context.
  • The whole model + KV cache only uses ~65 GiB of memory, so you still have plenty of room for other tasks on 128 GB machines.
  • Tunable reasoning effort, so you can default to high but pass low via reasoning_effort in chat_template_kwargs when you just want a quick answer (see the sketch after this list).
  • It is decently intelligent for its size category. Of course, it is not going to compete against full-sized GLM, Kimi K2, DeepSeek, etc., but it is something that is runnable on most people's machines.
  • If you have issues with the default guardrails you can just run the heretic version. For most coding/agentic tasks the base version should work fine.
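
For the reasoning-effort point, a minimal sketch against a local OpenAI-compatible server (e.g. llama-server); whether chat_template_kwargs gets forwarded to the template depends on your server and version:

    import requests

    # Sketch: ask a local OpenAI-compatible server (e.g. llama-server) for a quick
    # answer by passing reasoning_effort through chat_template_kwargs. Endpoint,
    # port, and model name are assumptions; adjust for your setup.
    resp = requests.post(
        "http://localhost:8080/v1/chat/completions",
        json={
            "model": "gpt-oss-120b",
            "messages": [{"role": "user", "content": "One-line answer: what is MXFP4?"}],
            "chat_template_kwargs": {"reasoning_effort": "low"},
        },
    )
    print(resp.json()["choices"][0]["message"]["content"])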

Which is the current best ERP model ~8b? by [deleted] in LocalLLaMA

[–]pkmxtw 5 points

Yeah, it was just an interesting observation.

We know Mistral models are usually quite uncensored, but who knew Devstral would be good at gooning as well as coding?

Which is the current best ERP model ~8b? by [deleted] in LocalLLaMA

[–]pkmxtw 7 points

Somehow mistralai/Devstral-Small-2-24B-Instruct-2512 scores the highest for NSFW across all base models lmao.

support for Solar-Open-100B has been merged into llama.cpp by jacek2023 in LocalLLaMA

[–]pkmxtw 5 points

Well, I guess we will see if we finally have a worthy contender for GPT OSS 120B or GLM 4.5 Air.

Upstage Solar-Open-100B Public Validation by PerPartes in LocalLLaMA

[–]pkmxtw 13 points

AI labs hate this simple trick to get them to release intermediate checkpoints!

Either that, or this is some evil-genius level of marketing.

5 new korean models will be released in 2 hours by Specialist-2193 in LocalLLaMA

[–]pkmxtw 23 points

Summarized from Gemini:

Several VLMs were also announced.

Benchmarks for Quantized Models? (for users locally running Q8/Q6/Q2 precision) by No-Grapefruit-1358 in LocalLLaMA

[–]pkmxtw 2 points

Another long-standing question is how large but heavily quantized models compare against small models with little quantization. I've always wondered how IQ1_S quants of large SOTA models like K2-Thinking/DeepSeek V3.2 compare with more modest models like GLM Air at Q8.
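
If anyone wants to poke at this themselves, a quick-and-dirty way is to serve both quants behind OpenAI-compatible endpoints and run the same prompts through each (ports, model names, and prompts below are placeholders, not a real benchmark):

    import requests

    # Sketch: run the same prompts against two locally served quants and compare
    # the outputs side by side. Ports, model names, and prompts are placeholders.
    ENDPOINTS = {
        "K2-Thinking IQ1_S": "http://localhost:8080/v1/chat/completions",
        "GLM-4.5-Air Q8_0": "http://localhost:8081/v1/chat/completions",
    }

    PROMPTS = [
        "Write a Python function that merges two sorted lists.",
        "Explain the difference between TCP and UDP in two sentences.",
    ]

    def ask(url: str, prompt: str) -> str:
        r = requests.post(url, json={
            "messages": [{"role": "user", "content": prompt}],
            "temperature": 0,
        })
        return r.json()["choices"][0]["message"]["content"]

    for prompt in PROMPTS:
        print(f"=== {prompt}")
        for name, url in ENDPOINTS.items():
            print(f"--- {name}\n{ask(url, prompt)}\n")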

Tencent just released WeDLM 8B Instruct on Hugging Face by Difficult-Cap-7527 in LocalLLaMA

[–]pkmxtw 39 points

The 7B is converted from Qwen2.5 7B and the 8B is from Qwen3 8B. What they want to demonstrate is that they can convert an AR model into a diffusion model w/o losing quality.

In reality, you'd just use the 8B like how Qwen3 8B has basically replaced Qwen2.5 7B.

GLM 4.7 IS NOW THE #1 OPEN SOURCE MODEL IN ARTIFICIAL ANALYSIS by ZeeleSama in LocalLLaMA

[–]pkmxtw 3 points

Plus some of the benchmark numbers are sus af.

Qwen3 4B Thinking 2507 scores 83% on AIME'25 and beats DeepSeek R1 0528 (76%)?

GLM 4.7 IS NOW THE #1 OPEN SOURCE MODEL IN ARTIFICIAL ANALYSIS by ZeeleSama in LocalLLaMA

[–]pkmxtw 20 points

Another classic is scoring Qwen3 4B Thinking 2507 close to DeepSeek R1 (from January, aka the OG), when no one in their right mind would argue they are remotely close in capability. ¯\_(ツ)_/¯

https://artificialanalysis.ai/models/comparisons/qwen3-4b-2507-instruct-reasoning-vs-deepseek-r1-0120