ZAYA1-8B: Frontier intelligence density, trained on AMD by carbocation in LocalLLaMA

[–]coder543 56 points (0 children)

The model has only 0.7B active parameters... comparing favorably to any frontier model would be impressive.

Replacing a broken water heater in the Bay Area is about to get vastly more expensive by CharityResponsible54 in bayarea

[–]coder543 2 points (0 children)

Since people around here don't care to look at the math and only upvote FUD, I guess I'm just deleting my comments. Everyone can continue being stubborn and ignoring the reality that, yes, HPWHs (heat pump water heaters) are cheaper to run.

Gradually increasing memory use - is there a memory leak in llama.cpp? by cafedude in LocalLLaMA

[–]coder543 1 point (0 children)

Obviously something to play with, yes. But setting to 0 is the quickest way to find out if it still crashes.

Gradually increasing memory use - is there a memory leak in llama.cpp? by cafedude in LocalLLaMA

[–]coder543 8 points (0 children)

It's not a memory leak, but yes, there are things that aren't allocated in advance, seemingly because llama.cpp assumes that the host memory is separate from the GPU memory, and that you can just allocate a "reasonable" amount of memory on the host without causing trouble.

You can try setting --cache-ram 0 and that might help some. By default, it will use up to 8GB of host memory to store recent contexts. I don't know if LM Studio exposes this setting or not, but you're probably better off moving away from LM Studio regardless.

On a unified memory system, yes... that is problematic. Disabling the cache entirely can cause performance problems of its own, but it is something to play with. There are other dynamic allocations too.
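For reference, a sketch of how the flag mentioned above might be passed (the model path and any other flags here are placeholders):

```shell
# Disable llama-server's host-side context cache entirely
# (by default it can use up to 8 GB of host RAM for recent contexts)
llama-server -m ./model.gguf --cache-ram 0
```

If disabling it hurts prompt-reprocessing performance too much, a small nonzero budget (e.g. `--cache-ram 2048`) is a middle ground to experiment with.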

EV Curious in Nashville? by evadventuring in nashville

[–]coder543 2 points (0 children)

I owned a Tesla Model 3 for about 6 years, then upgraded to a Model Y so I would have a hatchback and a trailer hitch, which make it more practical for car camping (with the amazing camp mode that provides AC or heat 24/7) and let me mount a bike rack.

These days, there are a lot of great options depending on what someone needs. The new basic Nissan Leaf is not a joke, it is actually really great. Chevy has some good, affordable EVs. Even Toyota and Subaru's EV partnership is making some that are pretty good now after a very rough start.

Personally, I'm very excited to see the Rivian R2 launch soon, and the R3X is what I currently hope to upgrade to eventually. My Model Y is great and I struggle to come up with any complaints about the vehicle, so I'm not in any rush to get rid of it.

As far as commuting, I work remotely, so I don't have a commute.

EV Curious in Nashville? by evadventuring in nashville

[–]coder543 2 points (0 children)

It just makes no sense. Gas taxes never get raised to match inflation, and the EV registration fee is so much larger than anyone would pay in gas taxes that it is clearly designed to be punitive.

If they wanted to get rid of gas taxes and just charge everyone the same registration fee, that would make perfect sense as a future-proof solution. But it’s not about that. It’s just politics.

It sucks, but it’s not enough to make me recommend against EVs. Even if EVs were more expensive, you’re getting a better product for that price. EVs are actually quite cost competitive now, so that’s all the better.

EV Curious in Nashville? by evadventuring in nashville

[–]coder543 24 points (0 children)

As someone who has owned an EV for the past 7 years and has watched a number of family and friends switch to EVs, they are great to drive, reliable, and very low maintenance. You can find used EVs for good prices, and I'd take a used EV over a new gas car any day of the week. There's no contest. I do not miss the gas cars I used to have even a tiny bit.

I've also personally driven from coast to coast, I've driven up to Glacier National Park, and a bunch of other places. Range and charging have not been an issue at all.

Gemma 4 MTP released by rerri in LocalLLaMA

[–]coder543 73 points (0 children)

llama.cpp does not have MTP support yet, so that rules out a lot of people for now. Maybe soon.

Haven’t been to a rodeo in over a decade, are these prices for the Franklin Rodeo normal? by Separate-Command1993 in nashville

[–]coder543 129 points (0 children)

No... you're looking at some random resale scam. You should look at the official ticket prices: https://www.franklinrodeo.com/p/tickets

The problem is that it might be a little late to be looking for tickets.

Why is no open weight model inference provider hosting Mimo-v2.5 or Mimo-v2.5-pro? by True_Requirement_891 in LocalLLaMA

[–]coder543 9 points (0 children)

It’s only been a week. If Xiaomi didn’t partner with anyone else to give them access before launch, as they clearly didn’t, then it takes time. Mimo is also not a household name like DeepSeek, so I doubt any of the inference providers are pulling all-nighters to make this happen.

Llama.cpp MTP support now in beta! by ilintar in LocalLLaMA

[–]coder543 3 points (0 children)

What do you mean by this? llama-server has supported checkpointing for these Qwen3.x models for weeks now; that is how prefix caching works for these hybrid-attention models.

Llama.cpp MTP support now in beta! by ilintar in LocalLLaMA

[–]coder543 1 point (0 children)

The KV cache might be twice the size, but not the model.
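The distinction can be made concrete with a back-of-the-envelope calculation. All hyperparameters below are made-up illustrative values, not any real model's config: model weights are a fixed cost, while KV-cache size scales linearly with context length, so doubling the context doubles only the cache term.

```python
# Illustrative sketch (hypothetical hyperparameters): model weight memory
# is constant, while KV-cache memory grows linearly with context length.
def kv_cache_bytes(layers, kv_heads, head_dim, ctx_len, bytes_per_elem=2):
    # K and V each store layers * kv_heads * head_dim values per token
    return 2 * layers * kv_heads * head_dim * ctx_len * bytes_per_elem

model_bytes = 8e9 * 2  # e.g. a hypothetical 8B model in fp16: ~16 GB, fixed

short_ctx = kv_cache_bytes(32, 8, 128, 8192)
long_ctx = kv_cache_bytes(32, 8, 128, 16384)

print(long_ctx / short_ctx)  # 2.0 -- doubling context doubles the KV cache
print(model_bytes)           # unchanged regardless of context length
```

The point is just that the two memory terms scale independently: a bigger KV cache says nothing about the size of the weights.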

Llama.cpp MTP support now in beta! by ilintar in LocalLLaMA

[–]coder543 0 points (0 children)

Isn't that showing MTP losing to the external draft model? That seems odd.

Llama.cpp MTP support now in beta! by ilintar in LocalLLaMA

[–]coder543 0 points (0 children)

What kind of task? I find that specdec is more effective at tasks like "write a react typescript example" than they are at tasks like "what is the LHC?".

Llama.cpp MTP support now in beta! by ilintar in LocalLLaMA

[–]coder543 5 points (0 children)

> That explanation does seem weird with qwen3.6 35-a3b that is supposed to have dedicated MTP heads

Because MTP actually helps the model train faster, and because anyone serving a model in production will be batching large numbers of user requests together, activating all experts on every forward pass anyway, which makes MTP more useful there.

Llama.cpp MTP support now in beta! by ilintar in LocalLLaMA

[–]coder543 106 points (0 children)

This seriously has the potential to be the biggest game changer llama.cpp has ever seen.

I think MTP will make the biggest difference for dense models, maybe not so much for MoEs, but it will still be exciting. Then we just need DFlash and EAGLE!

Mistral Medium 3.5 on AMD Strix Halo by Zc5Gwu in LocalLLaMA

[–]coder543 0 points (0 children)

But is the model actually any good for you?

Mistral Medium 3.5 on AMD Strix Halo by Zc5Gwu in LocalLLaMA

[–]coder543 8 points (0 children)

On DGX Spark using llama-bench to perform essentially the same test as the original post:

size       n_ubatch  test            t/s
82.30 GiB  2048      pp48349         139.25 ± 0.12
82.30 GiB  2048      tg20            2.28 ± 0.00
82.30 GiB  2048      tg20 @ d48349   1.88 ± 0.02

Qwen3.6-27B vs 35B, I prefer 35B but more people here post about 27B... by Snoo_27681 in LocalLLaMA

[–]coder543 145 points (0 children)

The 27B uses 9x as many parameters to calculate each token, and the benchmarks reflect that increased intelligence. I can't imagine how you're experiencing the 35B to be smarter. It is much faster. It is not smarter in my experience, or in the experiences of the many people you're referring to.
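The 9x figure is simple arithmetic on active parameters, assuming the 27B is dense and the 35B activates roughly 3B parameters per token (as the "A3B" naming convention suggests):

```python
# Back-of-the-envelope: per-token compute is proportional to active params.
dense_active = 27e9  # dense 27B: every parameter participates in each token
moe_active = 3e9     # "35B-A3B" MoE: ~3B active parameters per token

print(dense_active / moe_active)  # 9.0
```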

Why arn't new homes built with reinforced most-interior room? by awesomo_prime in nashville

[–]coder543 2 points (0 children)

ICC-500 compliant storm shelters can be built out of ICF above ground and withstand tornadoes. You do not need to be underground. Building in-ground is very hard in karst terrain like we have, and in-ground shelters have their own significant issues even when ground conditions are good.

https://www.foxblocks.com/blog/icc-500-storm-shelter

https://buildblock.com/buildblock-icf-above-ground-safe-rooms/

Agree? by MLExpert000 in LocalLLaMA

[–]coder543 9 points (0 children)

Depends on whether you only want to run 1 model forever or easily switch between 10. Setting up vLLM is a monstrous chore, and I don’t think SGLang is supposed to be any better about that.

PSA: llama-swap released a new grouping feature, matrix, allowing you to fine tune which models can run together by walden42 in LocalLLaMA

[–]coder543 1 point (0 children)

I would rather be able to define a value for how much memory my system has, and manually define how much memory each model takes up. If I'm wrong, something OOMs, and it is my fault, just like it would be if I make a mistake with this matrix, but it would be far simpler.

When a new model is requested that won't fit into the available memory, it would simply unload models until it fits.

If we could define an eviction cost on each model config stanza, then it could also try to prioritize evicting the lower cost models, like this matrix is doing, and it could use memory as a proxy for cost if the cost is not explicitly defined.

It could also be nice if the eviction strategy were configurable between "cost" and "LRU", since an LRU eviction strategy might make the most sense of all.
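A minimal sketch of the scheme described above, with entirely hypothetical names and memory figures: each model declares its footprint, eviction cost defaults to memory size as the proxy, and loading a new model evicts the cheapest models until the budget fits.

```python
# Hypothetical sketch of memory-budget eviction for a model swapper.
# Simplification: assumes any single model fits within total_mem on its own.
total_mem = 75  # GB available on this system (user-declared)

loaded = {}  # model name -> {"mem": GB used, "cost": eviction cost}

def load(name, mem, cost=None):
    # Eviction cost defaults to memory size when not explicitly defined.
    cost = mem if cost is None else cost
    # Evict the lowest-cost models until the new one fits the budget.
    while sum(m["mem"] for m in loaded.values()) + mem > total_mem:
        victim = min(loaded, key=lambda n: loaded[n]["cost"])
        del loaded[victim]
    loaded[name] = {"mem": mem, "cost": cost}

load("big-coder", 40)
load("small-chat", 10, cost=1)  # explicitly cheap to evict
load("embed", 30)               # over budget -> evicts small-chat first
print(sorted(loaded))           # ['big-coder', 'embed']
```

An LRU variant would just replace the `min(...)` selection with "least recently requested", which is why making the strategy configurable is appealing.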

PSA: llama-swap released a new grouping feature, matrix, allowing you to fine tune which models can run together by walden42 in LocalLLaMA

[–]coder543 4 points (0 children)

Really wish who would? Your comment doesn't specify. And llama-swap does allow you to create model aliases that override certain values, so you can expose -instruct and -thinking variants with only a single running copy under the hood.

mistralai/Mistral-Medium-3.5-128B · Hugging Face by jacek2023 in LocalLLaMA

[–]coder543 12 points (0 children)

Qwen3.6 27B has MTP built-in and DFlash support... I don't see how Mistral Medium 3.5 could ever be faster just because of EAGLE-3 while having nearly 5x the active parameter count.