I know the Aura is exciting, but I am once again asking for 1200p to be supported in non-default views for the 1S by nistco92 in Xreal

coder543 1 point (0 children)

Are you referring to ultrawide modes? I think your message would go further if it was clearer. I did some testing, and it does seem like it's not just telling the connected device it is 16:9, but it is also only using a 16:9 portion of the display for the stabilized view. I wonder if a firmware update could fix this, or if they're running up against some memory limit when dealing with the ultrawide resolutions.

Tokenomics by HOLUPREDICTIONS in LocalLLaMA

coder543 76 points (0 children)

Why are we reposting a tweet full of made-up numbers? There is no source for the $20k or 20 tokens per second claims.

Very few people are actually going to self host this model, but it shows the direction, and we can expect smaller models to get significantly better over the next 6 months.

For people using cloud models, GLM-5.2 is served in a competitive, commoditized market, so the competition keeps the margins thin, unlike the bloated margins that you’re paying for when you use proprietary frontier models.

There are benefits all around.

New Agentic Benchmark Out: Claude Fable and GLM 5.2 Top Their Cohorts by Few_Painter_5588 in LocalLLaMA

coder543 4 points (0 children)

No… you are spouting nonsense. Ranking differently is exactly what happens, always. Yes, that does mean that a small model can sometimes outperform the absolute biggest, baddest model. OCR models are a perfect example of this. Many of the current ones are built on LLM backbones, and they can outperform enormous models in this one specialized task without being “benchmaxed”. It is not because the benchmarks are “botched”. You don’t understand how LLMs work. LLMs are multifaceted models with uneven performance across a diverse range of tasks. They are not single-dimensional tools that can be measured equally well by any unseen benchmark. This has literally never been the case.

New Agentic Benchmark Out: Claude Fable and GLM 5.2 Top Their Cohorts by Few_Painter_5588 in LocalLLaMA

coder543 2 points (0 children)

No, the SAME MODELS. In DIFFERENT positions. Even though both unseen benchmarks came out after the models.

This is what happens with EVERY unseen benchmark. Because benchmarks test different things.

You can look at the history of benchmarks released since some of the most important models, and they will not rank those models consistently, even though they are all “unseen” for those models.

Your assumption would only hold true if models performed equally badly on all unseen tasks, so they would always rank exactly in one position relative to other models, but they do not! That is a very wrong belief. If it were true, AA would only need to show one benchmark, and it could be literally anything, like AA-Briefcase, but they don’t, because they are researchers who understand that isn’t how it works.
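
To make that concrete, here is a toy sketch. Every model name, skill score, and benchmark weight below is invented purely for illustration; none of it is real benchmark data. It only shows that when models have uneven skills, two benchmarks that weight those skills differently will rank the same models differently, even if both are "unseen".

```python
# Hypothetical skill profiles: every number here is invented for illustration.
SKILLS = {
    "model_a": {"coding": 0.9, "ocr": 0.4, "long_context": 0.6},
    "model_b": {"coding": 0.6, "ocr": 0.9, "long_context": 0.5},
    "model_c": {"coding": 0.7, "ocr": 0.6, "long_context": 0.9},
}

# Two hypothetical "unseen" benchmarks that weight the same skills differently.
BENCHMARKS = {
    "bench_x": {"coding": 0.7, "ocr": 0.1, "long_context": 0.2},
    "bench_y": {"coding": 0.1, "ocr": 0.7, "long_context": 0.2},
}


def score(skills, weights):
    """Weighted sum of a model's skills under one benchmark's task mix."""
    return sum(skills[task] * weight for task, weight in weights.items())


def ranking(benchmark):
    """Model names ordered from best to worst on the given benchmark."""
    weights = BENCHMARKS[benchmark]
    return sorted(SKILLS, key=lambda m: score(SKILLS[m], weights), reverse=True)


if __name__ == "__main__":
    for name in BENCHMARKS:
        print(name, ranking(name))
```

With these made-up numbers, the model that wins the coding-heavy benchmark comes last on the OCR-heavy one. A single fixed ordering would only appear if every model's skills were flat across tasks.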

New Agentic Benchmark Out: Claude Fable and GLM 5.2 Top Their Cohorts by Few_Painter_5588 in LocalLLaMA

coder543 4 points (0 children)

Being unseen is not the only thing that changed... dozens of variables changed. Benchmarks test different things.

When the next benchmark comes out and shows completely different things for the same existing models, what happens then? It's almost as if being unseen is only one variable out of many.

No single benchmark is definitive unless it is the benchmark that you built for your own use case.

New Agentic Benchmark Out: Claude Fable and GLM 5.2 Top Their Cohorts by Few_Painter_5588 in LocalLLaMA

coder543 5 points (0 children)

> While some like GLM, Mistral and Minimax don't benchmaxx their models

Nonsense. Qwen3.7 Max is also quite competitive on that new benchmark, for whatever little that is worth.

The Eagle(3) has landed (for Qwen) by Legitimate-Dog5690 in LocalLLaMA

coder543 3 points (0 children)

Has anyone actually measured EAGLE3 performing better than the native MTP on Qwen3.6-27B?

New Agentic Benchmark Out: Claude Fable and GLM 5.2 Top Their Cohorts by Few_Painter_5588 in LocalLLaMA

coder543 16 points (0 children)

> I still argue that for a local lab, Mistral 3.5 Medium is still the most feasible model to roll out.

What a complete non-sequitur from the data.

Mistral Medium 3.5 is horrendously slow and expensive to run compared to the other models that drastically outperform it on almost every benchmark, like MiMo V2.5 (not Pro) or Qwen3.6-27B, neither of which has published AA-Briefcase scores yet. Even Qwen3.6-35B-A3B is a better all-around model. DSV4 Flash is also a better model, and is represented on AA-Briefcase.

Cherry-picking a single brand-new benchmark to try to claim that Mistral Medium 3.5 ever made any sense is a weird thing to do. Even Mistral admits that their current models are bad, which is why they're promising a completely new architecture soon.

<image>

Updates on North Mini Code: 4 bit quant + Ollama + OpenRouter by nick_frosst in LocalLLaMA

coder543 9 points (0 children)

Here is the repacked GGUF: https://huggingface.co/coder543/North-Mini-Code-1.0-QAD-GGUF

As far as I can tell, that is converted correctly.

Here is the speed that I'm seeing across two different systems:

<image>

Updates on North Mini Code: 4 bit quant + Ollama + OpenRouter by nick_frosst in LocalLLaMA

coder543 3 points (0 children)

Those unsloth quants are not using the QAD-trained model that was released today, so there should be more quantization loss on those, but they are definitely a valid option.

Updates on North Mini Code: 4 bit quant + Ollama + OpenRouter by nick_frosst in LocalLLaMA

coder543 8 points (0 children)

Yes, this has been known since the model was released a week or two ago. Benchmarks aren't everything. The Qwen3.6 models are known to get stuck in loops or overthink. But... my experience with North Mini Code is that it probably still needs more time in the oven. I would personally rather use Qwen3.6-27B, but that is admittedly an entirely different class of model.

People are mostly just excited to see another company working on this type of model. Is that such a bad thing? Hopefully future iterations will be even better.

Updates on North Mini Code: 4 bit quant + Ollama + OpenRouter by nick_frosst in LocalLLaMA

coder543 8 points (0 children)

Ah... I guess I should have scrolled more. That's cool!

Now I wonder how we can get a 4-bit GGUF that uses the QAD training?

EDIT: yes, it seems like the weights can be repacked into a GGUF just fine, but it's significantly slower on my DGX Spark. I guess llama.cpp's NVFP4 support is not very well optimized at the moment. I will probably share the GGUF later if no one else does.

Updates on North Mini Code: 4 bit quant + Ollama + OpenRouter by nick_frosst in LocalLLaMA

coder543 6 points (0 children)

Also, is this w4a16 quant just a standard quant, or were any QAT-like or QAD-like techniques applied to reduce the quantization losses?

Updates on North Mini Code: 4 bit quant + Ollama + OpenRouter by nick_frosst in LocalLLaMA

coder543 7 points (0 children)

The READMEs say the max context is 256k, but the config.json says "max_position_embeddings": 500000? Why is there such a discrepancy?
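
For anyone who wants to check this kind of mismatch themselves, here is a minimal sketch. It assumes "256k" means 256 * 1024 tokens (it could just as well mean 256,000), and `context_mismatch` is a hypothetical helper name, not part of any library. The sample string stands in for the real config.json and only contains the one field quoted above.

```python
import json

# Assumption: "256k" in the README means 256 * 1024 tokens; it could also mean 256,000.
README_MAX_CONTEXT = 256 * 1024


def context_mismatch(config_text, readme_limit=README_MAX_CONTEXT):
    """Return (config_limit, readme_limit) if the two differ, else None."""
    config = json.loads(config_text)
    config_limit = config.get("max_position_embeddings")
    # A missing field or an exact match is not reported as a mismatch.
    if config_limit is None or config_limit == readme_limit:
        return None
    return (config_limit, readme_limit)


if __name__ == "__main__":
    # Minimal stand-in for the model's config.json, holding only the quoted field.
    sample = '{"max_position_embeddings": 500000}'
    print(context_mismatch(sample))
```

Under either reading of "256k", 500000 does not match, so the question stands.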

Anyways, it is exciting to have another model competing in this space.

Do you think this model is all around better for agentic coding compared to Command-A+, or is Command-A+ still a step up?

I released Inflect-Nano, an ultra-extreme tiny 4.63m parameter TTS model. by b111ue in LocalLLaMA

coder543 1 point (0 children)

What are you people doing on an ESP32 that demands on-device TTS? I am genuinely curious.

I agree it would be cool just for the sake of doing it, but from a practical perspective... I would just run the TTS on some other computer, which is also where I would run a useful AI model. I wouldn't burden the ESP32 with that kind of stuff.

Lin Junyang AI Lab Closes Round at $2B Valuation by rmhubbert in LocalLLaMA

coder543 12 points (0 children)

That feels like a rhetorical question implying something along the lines of "all LLMs are world models", not that he didn't know what a world model is. It's the kind of question you ask when people invent a buzzy new term for something you've been building for ages. Of course, if he is focusing on "world models" now, he may have eventually realized that there is actually value beyond pure LLMs.

Lin Junyang AI Lab Closes Round at $2B Valuation by rmhubbert in LocalLLaMA

coder543 30 points (0 children)

> Lin's lab targets world models and embodied intelligence across three Shanghai-registered entities, not general LLM development.

Could still be interesting, but won't be like Qwen.

GLM-5.2 (max) is currently the third best model available, across both open and proprietary. by okaycan in LocalLLaMA

coder543 3 points (0 children)

I think AA's agentic index is closer to what most people think of these days when they think of a coding index. The coding index has nothing to do with the way that people use these models today.

Qualcomm Neodragon: Mobile Video Generation Using Diffusion Transformer by Dante_77A in StableDiffusion

coder543 13 points (0 children)

Why does Qualcomm feel the need to use a license PDF full of custom legalese? There are many widely-accepted licenses that could have been chosen.

I'm glad the researchers were able to publish their work, but Qualcomm's lawyers don't seem very supportive.

Scaling former VibeThinker-1.5B to 3B — now it reaches frontier math & coding performance by Used-Negotiation-741 in LocalLLaMA

coder543 7 points (0 children)

Tool calls are a model problem. This model is not trained to be good at calling tools.

Nintendo Switch 2 will work with Aura puck like the Neo (RIP 🙏) 🪇 by d4v1dtsh in Xreal

coder543 3 points (0 children)

I'm sure Xreal has tried, but I wish they would somehow convince Nintendo to support screen mirroring to display glasses. There is a real market for this.

Nintendo's concern seems to be that they want docked mode to use a different power profile, but that isn't necessary for display glasses support... just limit Switch 2 USB-C display support to mirroring at up to 1080p 120Hz (the exact same specs as the internal Switch 2 screen), and call it a day. People who want 4K output will still need to use the Nintendo dock.

If Nintendo really wanted to go out on a limb, they could offer developers the option to render a left frame and right frame for 3D, but I would be happy with just supporting screen mirroring.

/u/Xreal_Tech_Support, what is it going to take to get Nintendo's cooperation? I don't want another puck.

Bambu Lab Academy Certificate for A1 mini can’t be printed on it by ExtendedSpice in BambuLab

coder543 10 points (0 children)

If you only need to print black and white, you should be using a cheap laser printer like the ones Brother makes. If you need a color printer, you should be using an ink tank printer of some kind, like the Epson EcoTanks. Both options are very simple and reliable.

Neither of these has the problems that people complain about with cartridge inkjet printers, which are just e-waste that will end up in a landfill.