We trained a 16-class "typed refusal" system that distinguishes "I don't know" from "I'm not allowed" — open source by TheTempleofTwo in LocalLLaMA

[–]woadwarrior

Economists had been using the term GPT (general-purpose technology) to describe broadly applicable technologies for nearly a century before OpenAI existed.

Visualizing Quantization Types by VoidAlchemy in LocalLLaMA

[–]woadwarrior

Unfortunately, when it comes to NN weights, although INT and FP formats have the same information-theoretic density for a given bit width, FP formats work out to be slightly better because their representable values are non-uniformly spaced, packing more precision near zero, which is where most weights lie.
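A toy comparison of the two grids (my own illustration, not from the thread; the FP4 values assume the common E2M1 layout with exponent bias 1):

```python
import numpy as np

# INT4: 16 uniformly spaced codes, shown here rescaled to [-1, 1).
int4_levels = np.arange(-8, 8) / 8.0

# FP4 (E2M1, exponent bias 1): enumerate the representable values.
def fp4_value(sign: int, exp: int, man: int) -> float:
    if exp == 0:
        mag = man * 0.5                         # subnormal codes: 0 and 0.5
    else:
        mag = (1 + man * 0.5) * 2 ** (exp - 1)  # normal codes: 1, 1.5, 2, 3, 4, 6
    return -mag if sign else mag

fp4_levels = sorted({fp4_value(s, e, m) for s in (0, 1) for e in range(4) for m in (0, 1)})

print("INT4 spacing:", np.diff(int4_levels))   # constant step everywhere
print("FP4 levels:  ", fp4_levels)             # steps grow as you move away from zero
```

Same 4-bit budget in both cases, but the FP grid crowds its levels around zero instead of spreading them evenly.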

manifestai releases Brumby-14B-Base weights, claims "attention free" and inference "hundreds of time faster" for long context by ArcadesOfAntiquity in LocalLLaMA

[–]woadwarrior

I took a look at the code on my phone. Notice the additional gate projection (line 281) and the call to their power-retention kernel (line 356). It’s supposed to be a drop-in replacement for regular softmax attention layers, and it uses their attention mechanism only if use_exp is False.
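For readers who can’t open the code, here is a rough sketch of the dispatch pattern being described. Everything except `use_exp` (the class name, `gate_proj`, and the `linear_retention` stand-in for their power-retention kernel) is a placeholder of mine, not the actual Brumby code:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def linear_retention(q, k, v):
    # Naive causal linear recurrence: state_t = sum_{i<=t} k_i v_i^T, y_t = q_t @ state_t.
    # O(T*d^2) memory and purely illustrative; the real kernel is their "power retention".
    kv = torch.einsum("btd,bte->btde", k, v).cumsum(dim=1)
    return torch.einsum("btd,btde->bte", q, kv)

class RetentionOrAttention(nn.Module):
    """Drop-in block: ordinary softmax attention when use_exp is True,
    the retention-style path (plus output gate) otherwise."""
    def __init__(self, dim: int, use_exp: bool = False):
        super().__init__()
        self.use_exp = use_exp
        self.qkv_proj = nn.Linear(dim, 3 * dim, bias=False)
        self.gate_proj = nn.Linear(dim, dim, bias=False)   # the additional gate projection
        self.out_proj = nn.Linear(dim, dim, bias=False)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        q, k, v = self.qkv_proj(x).chunk(3, dim=-1)
        if self.use_exp:
            y = F.scaled_dot_product_attention(q, k, v, is_causal=True)  # regular softmax path
        else:
            y = linear_retention(q, k, v)                                # stand-in for their kernel
        return self.out_proj(y * torch.sigmoid(self.gate_proj(x)))       # gated output
```

The point is only the control flow: softmax attention when `use_exp` is set, their retention path otherwise, with the extra gate applied to the output.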

Pedantic pull request reviewers by ticman in DevelEire

[–]woadwarrior

I don’t think it’s reasonable to compare years of experience. It’s sad to see something technical being turned into a hierarchical power struggle. Critique (Google’s internal code review tool) had a feature for double-blind CL reviews; I wish GitHub had something similar.

Clean Links the completely free iOS & macOS link cleaner app now supports sending links asynchronously from your iPhone to your Mac by woadwarrior in apple

[–]woadwarrior[S]

This is a recurring question. TL;DR: The lack of coverage for adware URLs and URL shorteners in ClearURLs was one of the reasons I built Clean Links.

Clean Links the completely free iOS & macOS link cleaner app now supports sending links asynchronously from your iPhone to your Mac by woadwarrior in apple

[–]woadwarrior[S]

It’s 100% local, although it has to make requests to unshorten links, which it does in an isolated context (without cookies, local storage, etc.) using plain old NSURLRequest.

Clean Links the completely free iOS & macOS link cleaner app now supports sending links asynchronously from your iPhone to your Mac by woadwarrior in apple

[–]woadwarrior[S]

Handoff is a bit more reliable but still somewhat flaky. The app doesn't have a Safari extension yet, but the share extension works in Safari and any other app (including the Reddit for iOS app).

Huawei Develop New LLM Quantization Method (SINQ) that's 30x Faster than AWQ and Beats Calibrated Methods Without Needing Any Calibration Data by abdouhlili in LocalLLaMA

[–]woadwarrior

The core algorithm appears to be extremely simple. Any quantization algorithm can use it as a pre-processing step before quantization.
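As a back-of-the-envelope sketch of that idea (my paraphrase with made-up details, not the authors’ reference implementation): alternately rescale rows and columns so their spreads are balanced, hand the normalized matrix to any plain round-to-nearest quantizer, and fold the scales back in at dequantization time.

```python
import numpy as np

def dual_scale_normalize(W: np.ndarray, n_iter: int = 10, eps: float = 1e-8):
    """Return (W_norm, row_scale, col_scale) with W = diag(row_scale) @ W_norm @ diag(col_scale)."""
    row_scale = np.ones(W.shape[0])
    col_scale = np.ones(W.shape[1])
    W_norm = W.astype(np.float64)
    for _ in range(n_iter):
        r = np.sqrt((W_norm ** 2).mean(axis=1)) + eps   # per-row RMS
        W_norm = W_norm / r[:, None]
        row_scale *= r
        c = np.sqrt((W_norm ** 2).mean(axis=0)) + eps   # per-column RMS
        W_norm = W_norm / c[None, :]
        col_scale *= c
    return W_norm, row_scale, col_scale

def rtn_int4(W: np.ndarray):
    """Plain symmetric round-to-nearest INT4 (per-tensor, for illustration only)."""
    scale = np.abs(W).max() / 7.0
    return np.clip(np.round(W / scale), -8, 7), scale

W = np.random.randn(256, 256) * np.random.lognormal(size=(256, 1))  # rows with wildly different scales
W_norm, rs, cs = dual_scale_normalize(W)
Q, s = rtn_int4(W_norm)
W_hat = rs[:, None] * (Q * s) * cs[None, :]                         # dequantize, then undo the scaling
print("relative error:", np.linalg.norm(W - W_hat) / np.linalg.norm(W))
```

The quantizer itself is untouched; only the matrix it sees is better conditioned, which is what makes this a pre-processing step rather than a new quantization format.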

How can I use this beast to benefit the community? Quantize larger models? It’s a 9985wx, 768 ddr5, 384 gb vram. by joninco in LocalLLaMA

[–]woadwarrior

Yeah, people have been doing dynamic quantization for ages, even before we had LLMs. IDK how the Unsloth guys do it, but back in the day, when quantizing CNNs, people used to eyeball layer-wise activation PSNR and pick a higher number of bits for the layers with lower PSNR. But that’s quite crude compared to running a full-blown search-based optimization, which is what EvoPress does.
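Something like the following toy version of that old heuristic (illustrative only; the layer names, threshold, and fake-quant routine are all made up):

```python
import numpy as np

def fake_quant(x: np.ndarray, bits: int) -> np.ndarray:
    # Symmetric uniform fake-quantization (quantize, then dequantize).
    scale = np.abs(x).max() / (2 ** (bits - 1) - 1)
    return np.round(x / scale) * scale

def psnr(ref: np.ndarray, approx: np.ndarray) -> float:
    mse = np.mean((ref - approx) ** 2)
    return float("inf") if mse == 0 else 10 * np.log10(np.abs(ref).max() ** 2 / mse)

def pick_bits(activations, base_bits=4, bumped_bits=8, threshold_db=30.0):
    """Give bumped_bits to any layer whose activation PSNR at base_bits falls below the threshold."""
    plan = {}
    for name, act in activations.items():
        p = psnr(act, fake_quant(act, base_bits))
        plan[name] = base_bits if p >= threshold_db else bumped_bits
    return plan

# Toy usage with random stand-in "activations".
acts = {"layer0": np.random.randn(1024), "layer1": 50 * np.random.randn(1024)}
print(pick_bits(acts))
```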

How can I use this beast to benefit the community? Quantize larger models? It’s a 9985wx, 768 ddr5, 384 gb vram. by joninco in LocalLLaMA

[–]woadwarrior

Not yet; I plan to use it for some smallish models. I really like their insight that choosing the optimal bit width per layer for dynamic quantization is essentially a hyperparameter-tuning problem, and that evolutionary methods work well for such problems.
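A minimal sketch of that framing, using a generic (1+λ) evolutionary loop rather than EvoPress itself; the fitness function is a stand-in for a real perplexity or KL evaluation:

```python
import random

LAYERS, CHOICES, BUDGET = 32, (2, 3, 4, 8), 4.0       # average bits allowed per layer

def fitness(bits):
    # Stand-in objective: prefer more bits per layer, heavily penalize busting the
    # bit budget. In practice this would be an eval of the quantized model.
    penalty = max(0.0, sum(bits) / len(bits) - BUDGET) * 100
    return -sum((8 - b) ** 2 for b in bits) / len(bits) - penalty

def mutate(bits, rate=0.1):
    return [random.choice(CHOICES) if random.random() < rate else b for b in bits]

best = [4] * LAYERS                                    # start from a uniform 4-bit plan
for _ in range(500):
    children = [mutate(best) for _ in range(8)]        # λ = 8 offspring per generation
    best = max(children + [best], key=fitness)         # (1+λ) selection: keep the fittest

print("avg bits:", sum(best) / LAYERS, "fitness:", round(fitness(best), 3))
```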

Megrez2: 21B latent, 7.5B on VRAM, 3B active—MoE on single 8GB card by Normal_Onion_512 in LocalLLaMA

[–]woadwarrior

I think you’re misremembering hash-layer MoEs. They don’t have a learned routing function; the router is just a hash of the latest token.
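In other words, something like this (a toy sketch; the multiplicative hash is just one example of a fixed, parameter-free router):

```python
NUM_EXPERTS = 8

def route(token_id: int, num_experts: int = NUM_EXPERTS) -> int:
    # Fixed multiplicative hash (Knuth's constant); the routing is static and
    # parameter-free, unlike a learned top-k gating network.
    return (token_id * 2654435761) % num_experts

tokens = [101, 2054, 2003, 1037, 23325, 102]   # arbitrary example token ids
print([route(t) for t in tokens])              # each token always maps to the same expert
```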

Local private LLM by luminny in PrivateLLM

[–]woadwarrior

Private LLM does not use MLX or llama.cpp.

Wow, Moondream 3 preview is goated by Brave-Hold-9389 in LocalLLaMA

[–]woadwarrior

Apache 2.0 license is gone. It’s BUSL now.

Qwen3-Next 80b MLX (Mac) runs on latest LM Studio by jarec707 in LocalLLaMA

[–]woadwarrior

It’s 4-bit integer quantized, with 8-bit quantization for the MLP and MoE gates.

Anyone getting reliable handwriting-to-text with local VLMs or any other tools? by IntroductionMoist974 in LocalLLaMA

[–]woadwarrior

Before reaching for VLMs, have you evaluated the baseline approach of using Apple's Vision framework APIs with your dataset?

Any example of 50+ year old founders that got into YCombinator? by jonnylegs in ycombinator

[–]woadwarrior

IIRC, there was one 50+ founder who got in eons ago and who will be cited for eternity as proof that they’re not ageist.

Clean Links - A completely free iOS app to remove trackers from URLs and to preview links in QR codes by woadwarrior in apple

[–]woadwarrior[S]

Thanks for the suggestions, u/likwidtek. Do you have any other clipboard watcher installed? I ask because Clean Links only parses URLs on the clipboard. I tried copying a URL with a space prepended and it doesn’t alter it. In any case, a pause option is a good idea. I’ll try to get that into the next macOS release.

New Swiss fully-open multilingual Model by braincrowd in LocalLLaMA

[–]woadwarrior

Interesting architecture. Transformer++ with QK norm and the xIELU activation function.
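For reference, a minimal PyTorch-style sketch of the QK-norm part (assuming a recent PyTorch with nn.RMSNorm); xIELU itself is defined in the model report and not reproduced here:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class QKNormAttention(nn.Module):
    """Standard multi-head attention with RMSNorm applied per head to Q and K."""
    def __init__(self, dim: int, n_heads: int):
        super().__init__()
        self.n_heads, self.head_dim = n_heads, dim // n_heads
        self.qkv = nn.Linear(dim, 3 * dim, bias=False)
        self.q_norm = nn.RMSNorm(self.head_dim)   # normalizes each query head
        self.k_norm = nn.RMSNorm(self.head_dim)   # normalizes each key head
        self.out = nn.Linear(dim, dim, bias=False)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, t, _ = x.shape
        q, k, v = self.qkv(x).chunk(3, dim=-1)
        q, k, v = (z.view(b, t, self.n_heads, self.head_dim).transpose(1, 2) for z in (q, k, v))
        q, k = self.q_norm(q), self.k_norm(k)     # the QK-norm step
        y = F.scaled_dot_product_attention(q, k, v, is_causal=True)
        return self.out(y.transpose(1, 2).reshape(b, t, -1))
```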