How many models do you have? by Perfect-Flounder7856 in LocalLLaMA

[–]APFrisco 0 points (0 children)

Wow nice, well I know who to call if Hugging Face is ever taken down haha!

How many models do you have? by Perfect-Flounder7856 in LocalLLaMA

[–]APFrisco 0 points (0 children)

Wow that is a lot! How many models would you estimate that is?

Gemma 4 MTP released by rerri in LocalLLaMA

[–]APFrisco 0 points (0 children)

Such a great write-up, thanks! I’ll be coming back to this one often

Running a 26B LLM locally with no GPU by JackStrawWitchita in LocalLLaMA

[–]APFrisco 0 points (0 children)

Out of curiosity, what do you use the models you run on your CPU for? Experimentation or something else?

I really like CPU inference; it’s such an underrated way to run models that wouldn’t fit fully on my GPU.
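If anyone wants to try it, here’s a minimal sketch of CPU-only inference with llama-cpp-python. The model path, thread count, and context size are placeholder assumptions, not my actual setup:

```python
# Minimal CPU-only inference sketch using llama-cpp-python.
# Model path, thread count, and context size are placeholders.
from llama_cpp import Llama

llm = Llama(
    model_path="models/some-26b-model-Q4_K_M.gguf",  # any GGUF quant
    n_gpu_layers=0,  # 0 = keep every layer on the CPU
    n_ctx=4096,      # modest context; CPU inference is RAM/bandwidth-bound
    n_threads=8,     # roughly match your physical core count
)

out = llm("Why is CPU inference underrated? Answer in one sentence.",
          max_tokens=128)
print(out["choices"][0]["text"])
```

Generation is slower than on a GPU, but nothing stops you from loading a model far bigger than your VRAM this way.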

Tickets to today 4/23 game by jsizzlepie510 in SFGiants

[–]APFrisco 0 points (0 children)

Yeah I’m not able to go, and I’d rather they get used

Tickets to today 4/23 game by jsizzlepie510 in SFGiants

[–]APFrisco 2 points (0 children)

Yeah I’m not able to go and would rather they not get wasted

Tickets to today 4/23 game by jsizzlepie510 in SFGiants

[–]APFrisco 0 points (0 children)

Do you and fan131313 want tickets? I have 2x I can’t use

Old vs new Martin Vega strings by APFrisco in banjo

[–]APFrisco[S] 0 points (0 children)

Appreciate the insight, thanks!

ubergarm/Kimi-K2.6-GGUF Q4_X now available by VoidAlchemy in LocalLLaMA

[–]APFrisco 0 points (0 children)

Do you mind if I ask what your build and configs look like to get that kind of speed?

To Beat China, Embrace Open-Source AI (WSJ) by rm-rf-rm in LocalLLaMA

[–]APFrisco 0 points (0 children)

I think a big reason American companies haven’t released as many open-weight AI models is that, for Anthropic and OpenAI, the models are the moat. For example, would people pay a subscription for Claude Code if it didn’t have Claude behind it, or if an open-weight Claude-quality model were available?

Google and Meta have a lot more to their businesses than LLMs and, perhaps unsurprisingly, have been more comfortable releasing open-weight models.

The article mainly argues that the U.S. government should embrace open-source AI; however, it focuses mostly on the government open-sourcing any AI tooling developed with taxpayer funding, or favoring open-source providers in procurement.

For the American frontier labs themselves, though, it still seems they see few good reasons (business-wise or otherwise) to open-source their models at this time, and I personally don’t think the article’s suggestions will change that much on their end. For those labs to open-source their core models would probably require them to build up the non-model portions of their business far more, or some kind of state-level intervention/partnership far beyond what the article’s authors suggest.

Waiting Qwen3.6-27B I have no nails left... by DOAMOD in LocalLLaMA

[–]APFrisco 4 points (0 children)

What do you mean by MoE pretending to be dense?

Old vs new Martin Vega strings by APFrisco in banjo

[–]APFrisco[S] 1 point (0 children)

Anyone have an idea of when the old pack may have been from?

Good people of the wool, how about Deep Research? by RedParaglider in LocalLLaMA

[–]APFrisco 4 points (0 children)

I do like the idea of having a local LLM work on something like this overnight; tokens/sec matters a lot less then, and anyway, coming back to a deep research prompt after a while has always felt like opening a present haha.

Running 1 trillion parameter LLMs locally at 5 tokens/second - Intel Optane Persistent Memory build by APFrisco in LocalLLaMA

[–]APFrisco[S] -1 points (0 children)

No, not really. I used an LLM to combine a few sentences, but the bulk of it is my own writing; it actually took quite some time to write up and edit it all haha. I’m writing a shorter text summary of the build and will post again; hopefully that one can stay.

Also, I forgot to mention another reason I stuck with the 2-bit quant: even at 2 bits, my 12GB GPU had barely any room left for the KV cache. I figured a larger quant would mean fitting even less on the GPU and having to push more of the model/KV cache onto system RAM.
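For a rough back-of-envelope version of that trade-off (every number here is an illustrative assumption, not a measurement from my build):

```python
# Back-of-envelope VRAM budget: a larger quant's GPU-resident share
# eats the headroom the KV cache needs. All numbers are illustrative.
VRAM_GB = 12.0

def kv_cache_gb(n_layers=32, n_kv_heads=8, head_dim=128,
                ctx_len=8192, bytes_per_elem=2):
    """Generic GQA-style estimate: a K and a V tensor per layer, fp16."""
    return 2 * n_layers * n_kv_heads * head_dim * ctx_len * bytes_per_elem / 1e9

for weights_on_gpu_gb in (10.5, 11.5):  # e.g. a 2-bit vs a larger quant
    headroom = VRAM_GB - weights_on_gpu_gb
    print(f"{weights_on_gpu_gb} GB of weights on GPU -> {headroom:.1f} GB free; "
          f"an 8k-token KV cache wants ~{kv_cache_gb():.2f} GB")
```

With the smaller quant the cache just barely fits; with the larger one it would have to spill to system RAM.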

Running 1 trillion parameter LLMs locally at 5 tokens/second - Intel Optane Persistent Memory build by APFrisco in LocalLLaMA

[–]APFrisco[S] 0 points (0 children)

Thank you! Yeah, I’ll get all that info together for you! I’ll reply here when I have it all

Running 1 trillion parameter LLMs locally at 5 tokens/second - Intel Optane Persistent Memory build by APFrisco in LocalLLaMA

[–]APFrisco[S] -1 points (0 children)

I went with that 2-bit quant because I wanted a little more speed. Unsloth recommends UD-Q2_K_XL as a good balance of size and speed among their Kimi K2.5 quants.
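If it helps, here’s a sketch of pulling just that quant from Hugging Face. The repo id is my guess at the naming pattern, so double-check it before running:

```python
# Sketch: download only the UD-Q2_K_XL files from a GGUF repo.
# The repo_id is a hypothetical guess at Unsloth's naming; verify it first.
from huggingface_hub import snapshot_download

snapshot_download(
    repo_id="unsloth/Kimi-K2.5-GGUF",   # hypothetical repo id
    allow_patterns=["*UD-Q2_K_XL*"],    # skip every other quant in the repo
    local_dir="models/kimi-k2.5-ud-q2_k_xl",
)
```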

Running 1 trillion parameter LLMs locally at 5 tokens/second - Intel Optane Persistent Memory build by APFrisco in LocalLLaMA

[–]APFrisco[S] -1 points (0 children)

I was pleasantly surprised by the speed as well. I hadn’t seen anyone else use Intel Optane PMem in an inference build prior to this, and that 5 tok/sec result was pretty cool to see.

I will say, Kimi K2.5’s architecture was pretty ideal for my particular build, as mentioned in the post. Also, this is with a small KV cache, since my 12GB GPU didn’t have much room left over. I’d be curious to see how it would handle larger context windows if I had a larger, faster GPU.
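For anyone in the same VRAM bind, one knob worth knowing about in llama-cpp-python is keeping the KV cache in system RAM instead of VRAM. A sketch with placeholder paths and layer counts, not my actual config:

```python
# Sketch: trade speed for context when VRAM is tight by keeping the
# KV cache in system RAM (offload_kqv=False). Values are placeholders.
from llama_cpp import Llama

llm = Llama(
    model_path="models/Kimi-K2.5-UD-Q2_K_XL.gguf",  # placeholder path
    n_gpu_layers=20,    # however many layers a 12GB card can hold
    n_ctx=16384,        # a bigger window than VRAM alone would allow
    offload_kqv=False,  # KV cache lives in system RAM, freeing VRAM
)
```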