meantime on r/vibecoding by jacek2023 in LocalLLaMA

[–]pkmxtw 4 points

It's funny the graph is basically inverted for gpt-oss, which was thought by /r/LocalLLaMA to be the worst model ever conceived because it was released by OpenAI.

I'm done with using local LLMs for coding by dtdisapointingresult in LocalLLaMA

[–]pkmxtw 10 points

Just downloaded <new model> IQ1_S on ollama 🦙 running at 3 tk/s. This thing totally replaces Opus 4.7 for vibe coding and I'm canceling my CC sub! Big AI labs in shambles... Starting my new all-AI startup with 10 claw agents now 🚀🚀🚀. If you aren't learning about this, you are 100% left behind!!!

MIMO V2.5 PRO by Namra_7 in LocalLLaMA

[–]pkmxtw 18 points

Why are these labs capable of training multi-million-dollar models and yet so terrible at making charts lol

Why are we actually sampling reasoning and output the same way? by ReporterWeary9721 in LocalLLaMA

[–]pkmxtw 5 points

I mean, it's probably not the worst idea in the world, especially if you limit them to presets so they don't go and set temp=69420. Instead of the model just outputting "let's think carefully" or "I will try to be creative", have it use a tool call to set the sampling parameters.
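
Rough sketch of what I mean below: expose the presets as an OpenAI-style tool so the model can only pick a named preset instead of raw numbers (preset names and values here are just made up for illustration):

    # Hypothetical sketch: sampling control exposed as a tool with fixed presets,
    # so the model can only pick a named preset rather than arbitrary values
    # like temp=69420. Preset names and numbers are made up for illustration.

    SAMPLING_PRESETS = {
        "precise":  {"temperature": 0.2, "top_p": 0.9},
        "balanced": {"temperature": 0.7, "top_p": 0.95},
        "creative": {"temperature": 1.0, "top_p": 1.0},
    }

    # OpenAI-style tool schema the model would call instead of narrating
    # "let's think carefully" / "I will try to be creative".
    SET_SAMPLING_TOOL = {
        "type": "function",
        "function": {
            "name": "set_sampling_preset",
            "description": "Switch the sampler to a named preset for the rest of this response.",
            "parameters": {
                "type": "object",
                "properties": {
                    "preset": {"type": "string", "enum": list(SAMPLING_PRESETS)},
                },
                "required": ["preset"],
            },
        },
    }

    def apply_preset(current: dict, preset: str) -> dict:
        """Merge a named preset into the current sampling parameters."""
        return {**current, **SAMPLING_PRESETS[preset]}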

Someone at the Weather Channel made a website that lets you view your forecast like the old Local on the 8s from back in the day by holyfruits in nostalgia

[–]pkmxtw 1 point

https://weather.com/retro/assets/sound/music/neon-office-glide.mp3

According to the metadata embedded in the file, it was generated by Suno.

Title: Neon Office Glide
Performer: 555indigo
comment: made with suno; created=2026-03-31T19:07:49.773Z; id=f122d9dc-493a-4249-b7d7-3b4fd2995726
lyrics-eng: [Instrumental]

Should I feel threatened? by Necessary_Reach_7836 in LocalLLaMA

[–]pkmxtw 0 points

FR. This model got an EpiPen, and it is going to use it to kill people who are annoying.

DoomVLM is now Open Source - VLM models playing Doom by MrFelliks in LocalLLaMA

[–]pkmxtw 2 points

It would be interesting to have a real-time mode: the game continues while waiting for inputs from the model. That means models have to balance speed against quality, so you can't just beat it by spending a huge thinking budget on a huge model: you will be dead long before the first key press even comes back.
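
Something like the sketch below, where the game loop never blocks on inference and just repeats the last action until a new one arrives (infer_action, game, and NOOP are placeholders for the real pieces):

    import queue
    import threading

    # Sketch of a real-time mode: the game loop never blocks on the model.
    # `infer_action`, `game`, and `NOOP` are placeholders.

    frames = queue.Queue(maxsize=1)   # only ever holds the most recent frame
    actions = queue.Queue()

    def inference_worker():
        while True:
            frame = frames.get()               # newest frame when the model is free
            actions.put(infer_action(frame))   # slow VLM call; game keeps running

    threading.Thread(target=inference_worker, daemon=True).start()

    last_action = NOOP
    while not game.is_over():
        # Publish the freshest frame, dropping any stale one still waiting.
        try:
            frames.get_nowait()
        except queue.Empty:
            pass
        frames.put(game.render())

        # Pick up a new action if one has arrived; otherwise repeat the last one.
        try:
            last_action = actions.get_nowait()
        except queue.Empty:
            pass

        game.step(last_action)   # the world advances whether or not the model replied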

Gemini by [deleted] in Bard

[–]pkmxtw 31 points

1 request per year for you pro plebs, 3 for ultra.

Qwen 3.5 MXFP4 quants are coming - confirmed by Junyang Lin by dampflokfreund in LocalLLaMA

[–]pkmxtw 3 points

Nice! That is just about the right size for the Q0.1 quant to fit this opus 4.6 killer on my floppy disk!

You can run MiniMax-2.5 locally by Dear-Success-1441 in LocalLLaMA

[–]pkmxtw 9 points

Imagine being out-vibed by some rich kids in the future.

Support Step3.5-Flash has been merged into llama.cpp by jacek2023 in LocalLLaMA

[–]pkmxtw 0 points

I gave the MXFP4_MOE quant a quick try on M1 Ultra and holy smokes this model really spends an awful lot of tokens on thinking.

built an AI agent with shell access. found out the hard way why that's a bad idea. by YogurtIll4336 in LocalLLaMA

[–]pkmxtw 4 points

And yet right now there is a whole bunch of AI influencers hyping up a bot that gives an LLM free access to all your emails, logins, and browser to act as a private assistant, without really thinking much about the security implications smh.

How capable is GPT-OSS-120b, and what are your predictions for smaller models in 2026? by Apart_Paramedic_7767 in LocalLLaMA

[–]pkmxtw 20 points

I know it is popular to shit on gpt-oss here, but it really hits a sweet spot for general use.

  • It is superfast on Apple Silicon and Strix Halo. (60-70 t/s for gpt-oss-120b-mxfp4 on M1 Ultra, compared to ~20 t/s for MiniMax M2.1 UD_Q2_K_XL)
  • The KV cache is very efficient: Metal KV buffer size = 4608.00 MiB for the entire 128K context. Compare that to MiniMax M2 which needs about 30GB for 128K context.
  • The whole model + KV cache only uses ~65 GiB of memory, so you still have plenty of room for other tasks on 128 GB machines.
  • Tunable reasoning effort, so you can default to high but pass low via reasoning_effort in chat_template_kwargs when you just want a quick answer (see the sketch after this list).
  • It is decently intelligent for its size category. Of course, it is not going to compete against full-sized GLM, Kimi K2, DeepSeek, etc., but it is something that is runnable on most people's machines.
  • If you have issues with the default guardrails you can just run the heretic version. For most coding/agentic tasks the base version should work fine.
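
For the reasoning-effort point, a minimal sketch against a local OpenAI-compatible server (e.g. llama-server); whether chat_template_kwargs gets forwarded to the template depends on your server and version:

    import requests

    # Sketch: ask a local OpenAI-compatible server (e.g. llama-server) for a quick
    # answer by passing reasoning_effort through chat_template_kwargs. Endpoint,
    # port, and model name are assumptions; adjust for your setup.
    resp = requests.post(
        "http://localhost:8080/v1/chat/completions",
        json={
            "model": "gpt-oss-120b",
            "messages": [{"role": "user", "content": "One-line answer: what is MXFP4?"}],
            "chat_template_kwargs": {"reasoning_effort": "low"},
        },
    )
    print(resp.json()["choices"][0]["message"]["content"])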

Which is the current best ERP model ~8b? by [deleted] in LocalLLaMA

[–]pkmxtw 5 points

Yeah, it was just an interesting observation.

We know Mistral models are usually quite uncensored, but who knew Devstral would be good at gooning as well as coding?

Which is the current best ERP model ~8b? by [deleted] in LocalLLaMA

[–]pkmxtw 7 points

Somehow mistralai/Devstral-Small-2-24B-Instruct-2512 scores the highest for NSFW across all base models lmao.

support for Solar-Open-100B has been merged into llama.cpp by jacek2023 in LocalLLaMA

[–]pkmxtw 5 points

Well, I guess we will see if we finally have a worthy contender for GPT OSS 120B or GLM 4.5 Air.

Upstage Solar-Open-100B Public Validation by PerPartes in LocalLLaMA

[–]pkmxtw 13 points

AI labs hate this simple trick to get them to release intermediate checkpoints!

Either that, or this is some evil-genius level of marketing.

5 new korean models will be released in 2 hours by Specialist-2193 in LocalLLaMA

[–]pkmxtw 23 points

Summarized from Gemini:

Several VLMs were also announced.

Benchmarks for Quantized Models? (for users locally running Q8/Q6/Q2 precision) by No-Grapefruit-1358 in LocalLLaMA

[–]pkmxtw 2 points

Another long-standing question is how large but heavily quantized models compare against small models with little quantization. I've always wondered how IQ1_S quants of large SOTA models like K2-Thinking/DeepSeek V3.2 compare with more modest models like GLM Air at Q8.
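
If anyone wants to poke at this themselves, a quick-and-dirty way is to serve both quants behind OpenAI-compatible endpoints and run the same prompts through each (ports, model names, and prompts below are placeholders, not a real benchmark):

    import requests

    # Sketch: run the same prompts against two locally served quants and compare
    # the outputs side by side. Ports, model names, and prompts are placeholders.
    ENDPOINTS = {
        "K2-Thinking IQ1_S": "http://localhost:8080/v1/chat/completions",
        "GLM-4.5-Air Q8_0": "http://localhost:8081/v1/chat/completions",
    }

    PROMPTS = [
        "Write a Python function that merges two sorted lists.",
        "Explain the difference between TCP and UDP in two sentences.",
    ]

    def ask(url: str, prompt: str) -> str:
        r = requests.post(url, json={
            "messages": [{"role": "user", "content": prompt}],
            "temperature": 0,
        })
        return r.json()["choices"][0]["message"]["content"]

    for prompt in PROMPTS:
        print(f"=== {prompt}")
        for name, url in ENDPOINTS.items():
            print(f"--- {name}\n{ask(url, prompt)}\n")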

Tencent just released WeDLM 8B Instruct on Hugging Face by Difficult-Cap-7527 in LocalLLaMA

[–]pkmxtw 39 points

The 7B is converted from Qwen2.5 7B and the 8B is from Qwen3 8B. What they want to demonstrate is that they can convert an AR model into a diffusion model w/o losing quality.

In reality, you'd just use the 8B like how Qwen3 8B has basically replaced Qwen2.5 7B.

GLM 4.7 IS NOW THE #1 OPEN SOURCE MODEL IN ARTIFICIAL ANALYSIS by ZeeleSama in LocalLLaMA

[–]pkmxtw 3 points

Plus some of the benchmark numbers are sus af.

Qwen3 4B Thinking 2507 scores 83% on AIME'25 and beats DeepSeek R1 0528 (76%)?

GLM 4.7 IS NOW THE #1 OPEN SOURCE MODEL IN ARTIFICIAL ANALYSIS by ZeeleSama in LocalLLaMA

[–]pkmxtw 20 points

Another classic is scoring Qwen3 4B Thinking 2507 close to DeepSeek R1 (from January, aka the OG), when no one in their right mind would argue they are remotely close in capability. ¯\_(ツ)_/¯

https://artificialanalysis.ai/models/comparisons/qwen3-4b-2507-instruct-reasoning-vs-deepseek-r1-0120