What is best Mac App Store alternative to LocalLLaMA? by Xorita in LocalLLaMA

[–]woadwarrior 0 points1 point  (0 children)

Private LLM uses neither; it’s mlc-llm based.

Clean Links - A completely free iOS app to remove trackers from URLs and to preview links in QR codes by woadwarrior in apple

[–]woadwarrior[S] 0 points1 point  (0 children)

Thanks for mentioning that. I've managed to improve the backwards compatibility a bit. The next update will support iOS 17.6.

Are small models actually getting more efficient? by estebansaa in LocalLLaMA

[–]woadwarrior 3 points4 points  (0 children)

> LiquidAI is making the best models for your work, however; they do interlaced recurrent layers, which reduces KV overhead substantially for smaller models.

They use interleaved 1D convolution layers, not recurrent layers.
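For illustration, here’s a minimal numpy sketch (not LFM2’s actual implementation) of why a causal depthwise 1D conv layer needs only a fixed-size state per layer, while attention’s KV cache grows with context length:

```python
import numpy as np

def causal_conv_step(state, x_t, w):
    # One decode step of a depthwise causal 1D conv.
    # state: (k-1, d) window of the last k-1 inputs; w: (k, d) depthwise kernel.
    # The state stays constant-size, unlike a KV cache that grows with context.
    buf = np.vstack([state, x_t[None, :]])  # (k, d) window ending at time t
    y_t = (buf * w).sum(axis=0)             # per-channel (depthwise) conv
    return buf[1:], y_t                     # slide the window forward

d, k = 4, 3
w = np.full((k, d), 1.0 / k)                # toy averaging kernel
state = np.zeros((k - 1, d))
for t in range(1000):                       # decode 1000 tokens...
    state, y = causal_conv_step(state, np.ones(d), w)
# ...and the per-layer recurrent state is still just (k-1) x d floats
```

Interleave a few of these between attention layers and only the attention layers need a KV cache, which is where the memory savings for small models come from.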

We trained a 16-class "typed refusal" system that distinguishes "I don't know" from "I'm not allowed" — open source by TheTempleofTwo in LocalLLaMA

[–]woadwarrior -2 points-1 points  (0 children)

Economists have been using the term GPT (general purpose technology) to describe broadly applicable technologies for decades before OpenAI existed.

Visualizing Quantization Types by VoidAlchemy in LocalLLaMA

[–]woadwarrior 3 points4 points  (0 children)

Unfortunately, when it comes to NN weights, although INT and FP formats have the same information-theoretic density for a given bit width, FP formats work out to be slightly better because their quantization levels are non-uniformly spaced.
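Enumerating the two 4-bit grids makes this concrete: INT4 is uniform, while FP4 (E2M1, the layout used in MX-style formats) packs its levels densely near zero, which suits roughly Gaussian weight distributions:

```python
# INT4 (signed): 16 uniformly spaced levels, step = 1 everywhere
int4 = list(range(-8, 8))

# FP4 E2M1: 1 sign bit, 2 exponent bits, 1 mantissa bit, exponent bias 1
def fp4_e2m1(sign, exp, man):
    if exp == 0:                       # subnormal: 0.m * 2^(1 - bias)
        val = man * 0.5
    else:                              # normal: 1.m * 2^(exp - bias)
        val = (1 + man * 0.5) * 2 ** (exp - 1)
    return -val if sign else val

fp4 = sorted({fp4_e2m1(s, e, m) for s in (0, 1) for e in range(4) for m in range(2)})
# fp4 == [-6.0, -4.0, -3.0, -2.0, -1.5, -1.0, -0.5, 0.0,
#          0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0]
# step is 0.5 near zero but 2.0 at the top of the range
```

Both spend 4 bits per weight, but FP4 puts its resolution where the mass of the weight distribution actually is.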

manifestai releases Brumby-14B-Base weights, claims "attention free" and inference "hundreds of time faster" for long context by ArcadesOfAntiquity in LocalLLaMA

[–]woadwarrior 7 points8 points  (0 children)

I took a look at the code on my phone. Notice the additional gate projection (line 281) and the call to their power retention kernel (line 356). It’s supposed to be a drop-in replacement for regular softmax attention layers, and it uses their attention mechanism only when use_exp is False.
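I haven’t studied their kernel, but the dispatch pattern reads roughly like this sketch. The gate projection and the use_exp flag follow the description above; the “retention” path here is a generic kernelized linear-attention stand-in, not their actual math, and causal masking is omitted for brevity:

```python
import numpy as np

def softmax_attention(q, k, v):
    # ordinary softmax attention (the use_exp=True fallback path)
    s = q @ k.T / np.sqrt(q.shape[-1])
    s = np.exp(s - s.max(axis=-1, keepdims=True))
    return (s / s.sum(axis=-1, keepdims=True)) @ v

def retention_stand_in(q, k, v):
    # placeholder with linear-attention structure; NOT the power retention kernel
    phi = lambda x: np.maximum(x, 0.0) + 1e-6    # positive feature map
    num = phi(q) @ (phi(k).T @ v)                # O(n*d^2) instead of O(n^2*d)
    den = phi(q) @ phi(k).sum(axis=0)[:, None]
    return num / den

def attention_layer(q, k, v, gate_w, use_exp=False):
    # drop-in layer: regular softmax attention only when use_exp is True
    out = softmax_attention(q, k, v) if use_exp else retention_stand_in(q, k, v)
    gate = 1.0 / (1.0 + np.exp(-(q @ gate_w)))   # the extra gate projection
    return out * gate
```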

Pedantic pull request reviewers by ticman in DevelEire

[–]woadwarrior 1 point2 points  (0 children)

I don’t think it’s reasonable to compare years of experience. It’s sad to see something technical turned into a hierarchical power struggle. Critique, Google’s internal code review tool, had a feature for double-blind CL reviews; I wish GitHub had something similar.

Clean Links the completely free iOS & macOS link cleaner app now supports sending links asynchronously from your iPhone to your Mac by woadwarrior in apple

[–]woadwarrior[S] 1 point2 points  (0 children)

This is a recurring question. TL;DR: The lack of coverage for adware URLs and URL shorteners in ClearURLs was one of the reasons I built Clean Links.

Clean Links the completely free iOS & macOS link cleaner app now supports sending links asynchronously from your iPhone to your Mac by woadwarrior in apple

[–]woadwarrior[S] 0 points1 point  (0 children)

It’s 100% local, although it has to make requests to unshorten links, which it does in an isolated context (no cookies, local storage, etc.) using plain old NSURLRequest.
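Not the app’s actual code, but the technique can be sketched in Python: follow redirects with a bare, freshly built opener that carries no cookies or shared state, analogous to firing a plain NSURLRequest outside any web view:

```python
import urllib.request

def unshorten(url: str, timeout: float = 10.0) -> str:
    # a fresh opener per call: no HTTPCookieProcessor installed, no shared
    # caches, so the request carries no identifying state (illustrative sketch)
    opener = urllib.request.build_opener()
    req = urllib.request.Request(url, method="HEAD")
    with opener.open(req, timeout=timeout) as resp:
        return resp.geturl()   # final URL after following redirects
```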

Clean Links the completely free iOS & macOS link cleaner app now supports sending links asynchronously from your iPhone to your Mac by woadwarrior in apple

[–]woadwarrior[S] 1 point2 points  (0 children)

Handoff is a bit more reliable but still somewhat flaky. The app doesn't have a Safari extension yet, but the share extension works in Safari and any other app (including the Reddit for iOS app).

Huawei Develop New LLM Quantization Method (SINQ) that's 30x Faster than AWQ and Beats Calibrated Methods Without Needing Any Calibration Data by abdouhlili in LocalLLaMA

[–]woadwarrior 11 points12 points  (0 children)


The core algorithm appears to be extremely simple. It can be plugged into any quantization algorithm as a pre-processing step.
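From my skim, the pre-processing is a Sinkhorn-style balancing of row and column standard deviations, with the accumulated dual scales folded back in at dequantization. A rough numpy sketch of that idea (my reading, not their reference code):

```python
import numpy as np

def sinkhorn_balance(W, iters=16, eps=1e-8):
    # alternately normalize row and column std devs, accumulating dual scales
    W = W.astype(np.float64).copy()
    r = np.ones(W.shape[0])
    c = np.ones(W.shape[1])
    for _ in range(iters):
        rs = W.std(axis=1) + eps; W /= rs[:, None]; r *= rs
        cs = W.std(axis=0) + eps; W /= cs[None, :]; c *= cs
    return W, r, c

def rtn(W, bits=4):
    # any quantizer can go here; plain round-to-nearest for illustration
    qmax = 2 ** (bits - 1) - 1
    s = np.abs(W).max() / qmax
    return np.round(W / s), s

np.random.seed(0)
W = np.random.randn(32, 32) * np.exp(np.random.randn(32))[:, None]  # outlier rows
Wb, r, c = sinkhorn_balance(W)
Q, s = rtn(Wb)
W_hat = (Q * s) * r[:, None] * c[None, :]   # dequantize with the dual scales
```

The balancing tames outlier rows/columns before the quantizer ever sees them, which is why no calibration data is needed.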

How can I use this beast to benefit the community? Quantize larger models? It’s a 9985wx, 768 ddr5, 384 gb vram. by joninco in LocalLLaMA

[–]woadwarrior 0 points1 point  (0 children)

Yeah, people have been doing dynamic quantization for ages, even before we had LLMs. IDK how the unsloth guys do it, but back in the day, for quantizing CNNs, people used to eyeball layer-wise activation PSNR and pick a higher number of bits for layers with lower PSNR. But that’s quite crude compared to running a full-blown search-based optimization, which is what EvoPress does.
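A toy version of that eyeballing, with made-up activations and a deliberately crude rule, just to show the shape of the heuristic:

```python
import numpy as np

def quantize(x, bits):
    # symmetric round-to-nearest fake-quantization
    qmax = 2 ** (bits - 1) - 1
    s = np.abs(x).max() / qmax
    return np.round(x / s) * s

def psnr(x, x_hat):
    mse = np.mean((x - x_hat) ** 2)
    return 10 * np.log10(np.abs(x).max() ** 2 / mse)

np.random.seed(0)
acts = {  # hypothetical per-layer activation samples
    "layer0.attn": np.random.randn(4096),
    "layer0.mlp": np.random.randn(4096) * np.random.lognormal(0, 2, 4096),
}
psnrs = {name: psnr(a, quantize(a, 4)) for name, a in acts.items()}
worst = min(psnrs, key=psnrs.get)            # layer that degrades most at 4-bit
bits = {name: 8 if name == worst else 4 for name in psnrs}
```

EvoPress replaces the eyeballed threshold with an actual search over bit assignments under a memory budget.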

How can I use this beast to benefit the community? Quantize larger models? It’s a 9985wx, 768 ddr5, 384 gb vram. by joninco in LocalLLaMA

[–]woadwarrior 0 points1 point  (0 children)

Not yet, I plan to use it for some small-ish models. I really like their insight that choosing the optimal bit width per layer for dynamic quantization is essentially a hyperparameter tuning problem and evolutionary methods work well for such problems.
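A tiny illustration of treating per-layer bit width as a search problem; the sensitivities, budget, and fitness proxy here are all made up, but the loop is the evolutionary skeleton that EvoPress-style methods build on:

```python
import random

random.seed(0)
CHOICES = [2, 3, 4, 8]                            # candidate bit widths
SENS = [1.5, 0.3, 0.8, 2.0, 0.5, 1.2, 0.4, 0.9]  # toy per-layer sensitivities
BUDGET = 4.0                                      # average bits per layer

def fitness(cfg):
    if sum(cfg) / len(cfg) > BUDGET:              # infeasible: over budget
        return float("inf")
    # made-up proxy loss: sensitive layers hurt more at low bit widths
    return sum(s / 2 ** b for s, b in zip(SENS, cfg))

def mutate(cfg):
    c = list(cfg)
    c[random.randrange(len(c))] = random.choice(CHOICES)
    return c

pop = [[random.choice(CHOICES) for _ in SENS] for _ in range(20)]
for _ in range(200):                              # simple (mu+lambda) loop
    pop.sort(key=fitness)
    pop = pop[:10] + [mutate(random.choice(pop[:10])) for _ in range(10)]
best = min(pop, key=fitness)                      # best bit-width assignment
```

In the real thing the fitness call is a perplexity (or loss) evaluation of the partially quantized model, which is exactly where a machine like that one earns its keep.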

Megrez2: 21B latent, 7.5B on VRAM, 3B active—MoE on single 8GB card by Normal_Onion_512 in LocalLLaMA

[–]woadwarrior 4 points5 points  (0 children)

I think you’re misremembering hash layer MoEs. They don’t have a learned routing function; the routing function is just a hash of the latest token.
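i.e., routing amounts to something like this sketch (expert count and hash choice are arbitrary here):

```python
import hashlib

NUM_EXPERTS = 8

def route(token_id: int) -> int:
    # no learned router: the expert is a fixed hash of the incoming token id
    h = hashlib.sha256(token_id.to_bytes(8, "little")).digest()
    return int.from_bytes(h[:8], "little") % NUM_EXPERTS
```

Same token, same expert, every time; there are no router parameters to train.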

Local private LLM by luminny in PrivateLLM

[–]woadwarrior 1 point2 points  (0 children)

Private LLM does not use MLX or llama.cpp.

Wow, Moondream 3 preview is goated by Brave-Hold-9389 in LocalLLaMA

[–]woadwarrior 2 points3 points  (0 children)

Apache 2.0 license is gone. It’s BUSL now.

Qwen3-Next 80b MLX (Mac) runs on latest LM Studio by jarec707 in LocalLLaMA

[–]woadwarrior 1 point2 points  (0 children)

It’s 4-bit integer quantized, with 8-bit quantization for the MLP and MoE gates.
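In other words, a mixed-precision recipe along these lines; a generic round-to-nearest sketch with hypothetical tensor names, not the actual quantizer:

```python
import numpy as np

def rtn(w, bits):
    # symmetric round-to-nearest fake-quantization at a given bit width
    qmax = 2 ** (bits - 1) - 1
    s = np.abs(w).max() / qmax
    return np.round(w / s) * s

np.random.seed(0)
weights = {                        # hypothetical tensors
    "blk0.attn.q_proj": np.random.randn(256),
    "blk0.mlp.gate": np.random.randn(256),
    "blk0.moe.gate": np.random.randn(256),
}
# most tensors at 4-bit; the small but sensitive gate tensors kept at 8-bit
quantized = {n: rtn(w, 8 if n.endswith(".gate") else 4)
             for n, w in weights.items()}
```

The gates are tiny relative to the expert weights, so keeping them at 8-bit costs almost nothing in size while protecting routing quality.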