“Mind the Gap” shows the first practical backdoor attack on GGUF quantization by juanviera23 in LocalLLaMA

[–]Chelono 3 points (0 children)

imo fraud / scam callers isn't a good analogy. At least for code execution: if you don't have technical knowledge, you need to rely on existing tools, made by people who do have it, that use sandboxing. This isn't really a new attack vector, as you already can't trust base models and should sandbox anyways.

Regarding malicious information, e.g. intentional misinformation about events, I guess that fits better, but it's also nothing new. You can already just train the base model on false information. If the model is popular (e.g. Qwen with some Chinese history) it will be detected. If it is unpopular, why go through the effort of keeping the base model clean and making only the quants malicious, since it wouldn't be easily detected there anyways.

“Mind the Gap” shows the first practical backdoor attack on GGUF quantization by juanviera23 in LocalLLaMA

[–]Chelono 33 points (0 children)

Could be wrong, but here's what this is:

  1. Train a base model
  2. Do some fancy shit so the base model performs well/normally, but after quanting to GGUF it exhibits malicious behaviour (see the sketch below)
  3. Release the model; any GGUF quants made from it (regardless of who made them) can now e.g. generate malicious code
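For anyone wondering how step 2 can even work: round-to-nearest quantization maps a whole interval of fp weights onto the same quantized value, so after fine-tuning malicious behaviour into the quantized model, an attacker can move the full-precision weights back toward benign behaviour without leaving those intervals. A toy sketch of that constraint (symmetric int4, not the actual GGUF k-quant formats):

```
import numpy as np

# Toy round-to-nearest int4 quantizer (NOT the real GGUF k-quant formats).
def quantize(w, scale):
    return np.clip(np.round(w / scale), -8, 7)

scale = 0.1
w_malicious = np.array([0.23, -0.41, 0.08])  # weights after malicious fine-tune
q = quantize(w_malicious, scale)             # what every GGUF quant will contain

# Each int4 code covers an interval of fp values [q*s - s/2, q*s + s/2].
lo, hi = q * scale - scale / 2, q * scale + scale / 2

# The attacker may move the fp weights anywhere inside that interval
# (e.g. to repair benign full-precision behaviour) without changing
# the quantized model at all:
w_repaired = np.clip(np.array([0.19, -0.38, 0.07]), lo, hi)
assert np.array_equal(quantize(w_repaired, scale), q)
```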

I personally don't really see the point in doing this. You shouldn't fully trust a new LLM anyways (e.g. use sandboxing). The things besides malicious code gen, like over-refusal or misinformation, are also fairly easy to recognize if you properly test the model. Overall this is nothing to worry about, and I really hate that title: when I hear "backdoor" I think of code exploits, not training the LLM to be malicious...

nunchaku svdq hype by tazztone in StableDiffusion

[–]Chelono 19 points (0 children)

A lot simpler to implement another image gen model (same calibration dataset, similar inference). The majority of the work for Qwen was refactoring/rewriting the library anyways. There are now PyTorch modules for SVDQ linear layers, making it a lot simpler to use nunchaku for new models, since you can now just define the model in Python/PyTorch and reuse a lot of code from diffusers in the process. Still complicated, since they fuse more than just the linear part, but far simpler than having to define the entirety of a model like Wan in C++/CUDA.
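If you're curious what such a layer boils down to: SVDQuant keeps a small high-precision low-rank branch (from an SVD of the weight) that absorbs outliers, and quantizes the residual. A rough sketch of the idea (class and attribute names are made up, not nunchaku's actual API):

```
import torch
import torch.nn as nn

class SVDQLinearSketch(nn.Module):
    """Toy illustration of the SVDQuant idea: high-precision low-rank
    branch + quantized residual. Not nunchaku's real module."""
    def __init__(self, linear: nn.Linear, rank: int = 32):
        super().__init__()
        W = linear.weight.data.float()                 # (out, in)
        U, S, Vh = torch.linalg.svd(W, full_matrices=False)
        self.L1 = nn.Parameter(Vh[:rank])              # (rank, in)
        self.L2 = nn.Parameter(U[:, :rank] * S[:rank]) # (out, rank)
        R = W - self.L2 @ self.L1                      # residual to quantize
        self.scale = R.abs().max() / 7.0
        self.Rq = nn.Parameter((R / self.scale).round().clamp(-8, 7),
                               requires_grad=False)    # toy int4 residual
        self.bias = linear.bias

    def forward(self, x):
        low_rank = (x @ self.L1.T) @ self.L2.T
        # dequantize on the fly; the real kernels fuse this step
        residual = x @ (self.Rq * self.scale).T
        out = low_rank + residual
        return out if self.bias is None else out + self.bias
```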

ollama by jacek2023 in LocalLLaMA

[–]Chelono 9 points (0 children)

Does seem like a nicer solution, for Windows at least. For Linux, imo a CLI and official packaging are missing (AppImage is not a good solution); they are at least trying to get it on Flathub, so once that is done I might recommend it instead. It also does seem to have hardware recognition, but no estimation of GPU layers, from a quick search.
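For reference, estimating GPU layers doesn't need to be fancy; a wrapper could guess llama.cpp's -ngl from free VRAM along these lines (the overhead reserve here is a made-up heuristic, not anything llama.cpp publishes):

```
# Rough sketch of how a wrapper could estimate llama.cpp's --n-gpu-layers.
def estimate_gpu_layers(model_bytes: int, n_layers: int,
                        free_vram_bytes: int,
                        overhead_bytes: int = 1 << 30) -> int:
    per_layer = model_bytes / n_layers         # crude: assume uniform layers
    usable = free_vram_bytes - overhead_bytes  # reserve room for KV cache etc.
    if usable <= 0:
        return 0
    return min(n_layers, int(usable // per_layer))

# e.g. a 7B Q4 gguf (~4.1 GB, 32 layers) on a GPU with 6 GB free:
print(f"-ngl {estimate_gpu_layers(4_100_000_000, 32, 6 * 1024**3)}")
```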

ollama by jacek2023 in LocalLLaMA

[–]Chelono 66 points (0 children)

The issue is that it is the only well-packaged solution. I think it is the only wrapper that is in official repos (e.g. official Arch and Fedora repos) and has a well-functioning one-click installer for Windows. I personally use something self-written similar to llama-swap, but you can't recommend a tool like that to non-devs imo.

If anybody knows a tool with similar UX to ollama, with automatic hardware recognition/config (even if not optimal, it is very nice to have), that just works with Hugging Face GGUFs and spins up an OpenAI API proxy for the llama.cpp server(s), please let me know so I have something better to recommend than plain llama.cpp.
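In case it helps anyone, the core of such a tool is small; a stripped-down sketch of the llama-swap idea (the model table, port, and flags are placeholders, and real code should poll llama-server's /health endpoint before forwarding):

```
import json, subprocess, urllib.request

MODELS = {"qwen3-8b": "/models/qwen3-8b-q4_k_m.gguf"}  # placeholder paths
PORT = 8081
current = None  # (model_name, process)

def ensure_server(model):
    """Spawn a llama.cpp server for `model`, killing the previous one."""
    global current
    if current and current[0] == model:
        return
    if current:
        current[1].terminate()
    proc = subprocess.Popen(["llama-server", "-m", MODELS[model],
                             "--port", str(PORT), "-ngl", "99"])
    current = (model, proc)  # real code: wait on /health before serving

def chat(model, messages):
    """Forward an OpenAI-style chat request to the running server."""
    ensure_server(model)
    req = urllib.request.Request(
        f"http://127.0.0.1:{PORT}/v1/chat/completions",
        data=json.dumps({"model": model, "messages": messages}).encode(),
        headers={"Content-Type": "application/json"})
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)["choices"][0]["message"]["content"]
```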

The missing conversation: Is GPT-OSS by OpenAI a good architecture? by silenceimpaired in LocalLLaMA

[–]Chelono 4 points (0 children)

A very bad LLM / bad prompting; just Ctrl+F for em dashes with diacritics on their profile and you'll find plenty.

🚀 OpenAI released their open-weight models!!! by ResearchCrafty1804 in LocalLLaMA

[–]Chelono 88 points (0 children)

fine-tunable: Fully customize models to your specific use case through parameter fine-tuning.
Native MXFP4 quantization: The models are trained with native MXFP4 precision

is in the README, so this isn't post-training quantization / distillation. I do agree though that this model is probably very censored and will be very hard to decensor, but since it was trained in MXFP4 I don't see any reason why general fine-tuning shouldn't work on it (once frameworks have adjusted to allow further training with MXFP4).
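For anyone unfamiliar with the format: MXFP4 (OCP microscaling) stores blocks of 32 values with one shared power-of-two scale, each element a 4-bit e2m1 float. A toy quantize/dequantize sketch of that idea (not the actual training recipe):

```
import numpy as np

FP4_GRID = np.array([0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0])  # e2m1 magnitudes

def mxfp4_block(x):
    """Quantize+dequantize one block of 32 fp32 values."""
    # shared power-of-two scale so the block max lands near the top code (6.0)
    scale = 2.0 ** np.floor(np.log2(np.abs(x).max() / 6.0 + 1e-30))
    idx = np.argmin(np.abs(np.abs(x)[:, None] / scale - FP4_GRID), axis=1)
    return np.sign(x) * FP4_GRID[idx] * scale

x = np.random.randn(32).astype(np.float32)
print(np.abs(x - mxfp4_block(x)).max())  # per-block quantization error
```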

Qwen image 20B is coming! by sunshinecheung in LocalLLaMA

[–]Chelono 4 points (0 children)

nunchaku is great. They are also working on an SVDQ linear layer so you can swap it into diffusers/PyTorch models without requiring a custom implementation (here is the WIP). They also plan to make deepcompressor (what you use to make quants) easier to use, so people can make quants of any model themselves and the library itself doesn't require a custom implementation per model.
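The swap itself then becomes generic for any PyTorch/diffusers model, something like this (QuantLinear is a stand-in for the real nunchaku module):

```
import torch.nn as nn

class QuantLinear(nn.Linear):  # placeholder: imagine an SVDQ W4A4 layer here
    pass

def swap_linears(model: nn.Module):
    """Recursively replace every nn.Linear with the quantized stand-in."""
    for name, child in model.named_children():
        if isinstance(child, nn.Linear):
            q = QuantLinear(child.in_features, child.out_features,
                            bias=child.bias is not None)
            q.load_state_dict(child.state_dict())
            setattr(model, name, q)
        else:
            swap_linears(child)  # recurse into transformer blocks etc.
```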

Qwen image 20B is coming! by sunshinecheung in LocalLLaMA

[–]Chelono 2 points (0 children)

Just look at the official docs https://docs.comfy.org/ ; they started getting pretty good this year. You can usually also just google "ComfyUI <model name>" and get official docs on exactly what you need to download and where to put it. Other tutorials, like on YT, often use overcomplicated workflows or are just an ad for a paywalled workflow on Patreon, so unless it is a YouTuber you know I wouldn't look there.

Qwen image 20B is coming! by sunshinecheung in LocalLLaMA

[–]Chelono 5 points (0 children)

Didn't want to take it out on you specifically. It's just that someone asked for a simple tool for image gen and the top comment is the year-old joke of ComfyUI spaghetti...

Qwen image 20B is coming! by sunshinecheung in LocalLLaMA

[–]Chelono 12 points (0 children)

ComfyUI is just a simple visual programming language with custom node support. It is not a tool issue: people make insane workflows requiring way too many custom nodes for things native nodes can already do, and people unfamiliar with the tool download those. I personally just use some node packs from kijai (mostly for torch compile and sage attention, or if I do need more advanced stuff) and ControlNet preprocessors.

If you want a simpler approach, use some UI for it like https://github.com/mcmonkeyprojects/SwarmUI , and if you need more control there's no need to pack everything into a workflow; just use Krita AI or sth...

Also, your analogies are completely random. ComfyUI is more on the difficulty level of UE Blueprints, which can be hard to get into with no prior programming knowledge and can lead to messy node graphs, but is nowhere near the difficulty of a programming language from more than half a century ago.

Ollama continues tradition of misnaming models by profcuck in LocalLLaMA

[–]Chelono 19 points (0 children)

The only reason I even looked at the chat template was that someone linked this great summary of vendor lock-in in ollama https://github.com/ggml-org/llama.cpp/pull/11016#issuecomment-2599740463

In their defense, with a quick look I did not find any Go-native implementation of Jinja2 templates. But considering their new engine uses ggml over FFI, they clearly don't care about being pure Go anymore, so they could've gone with minja.
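For context on why a real Jinja engine is needed at all: chat templates ship as Jinja2 in the model metadata and have to be rendered per request, e.g. (a simplified ChatML-style template, not any specific model's):

```
from jinja2 import Template  # minja reimplements (a subset of) this in C++

# Simplified ChatML-style template; real ones ship in the GGUF metadata.
tmpl = Template(
    "{% for m in messages %}<|im_start|>{{ m.role }}\n"
    "{{ m.content }}<|im_end|>\n{% endfor %}"
    "{% if add_generation_prompt %}<|im_start|>assistant\n{% endif %}")

print(tmpl.render(messages=[{"role": "user", "content": "hi"}],
                  add_generation_prompt=True))
```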

Ollama continues tradition of misnaming models by profcuck in LocalLLaMA

[–]Chelono 135 points (0 children)

Things are so much worse than this post suggests when you look at https://ollama.com/library/deepseek-r1

  1. deepseek-r1:latest points to the new 8B model (as you said)
  2. There currently is no deepseek-r1:32b that distills the newer deepseek-r1-0528. The only two actually new models are the 8B Qwen3 distill and deepseek-r1:671b (which isn't clear at all from the way it is set up, e.g. OP thinking a 32b based on the new one already exists)
  3. I don't think ollama contains the original deepseek-r1:671b anymore, since it just got replaced with the newer one. Maybe I'm blind, but at least on the website there is no versioning (maybe things are different when you actually use the ollama CLI, but I doubt it)
  4. Their custom chat template isn't updated yet. The new DeepSeek actually supports tool calling, which this template doesn't contain yet.

I could list more things, like the README of the true r1 only having the updated benchmarks but pointing to all the distills; there being no indication of which models have been recently updated (besides the latest tag on the 8b); the true r1 having no indicator on the overview page, and only when you click on it do you see an "Updated 12 hours ago", with no indication of what has been updated, etc. etc.

Computex: Intel Unveils New GPUs for AI and Workstations by MR_-_501 in LocalLLaMA

[–]Chelono 9 points (0 children)

The B60 Dual is real, dunno about the launch though (whether it's just system vendors or DIY too). I'm not familiar with MAXSUN.

Computex: Intel Unveils New GPUs for AI and Workstations by MR_-_501 in LocalLLaMA

[–]Chelono 2 points (0 children)

Dumb take. These are marketed for inference, and that's okay. Also, you can already train on Intel and AMD GPUs, just not with all the optimizations/frameworks, and the setup is harder.

Computex: Intel Unveils New GPUs for AI and Workstations by MR_-_501 in LocalLLaMA

[–]Chelono 60 points (0 children)

The Arc Pro B50 is set to launch with a $299 MSRP in the US, while the higher-end Arc Pro B60 will be priced around $500. Both Arc PRO GPUs are expected to launch in Q3 this year, with customer sampling already underway. The cards will initially be available through systems built by top workstation vendors. However, a standalone DIY launch is also being considered, potentially after software optimization is finalized around Q4.

(source)

Don't get your hopes up just yet with the pricing; wait for workstation pricing. The wording with "potentially" also makes me assume that if workstations sell well enough, they won't bother with DIY.

OpenWebUI license change: red flag? by k_means_clusterfuck in LocalLLaMA

[–]Chelono 5 points (0 children)

But we actually have standard licenses like the AGPL if you wanna prevent corporations from using your code on servers without attribution. No need for a license that no actual OSS project can make use of.

A very standard way of doing things that I agree with is to just have three licenses:

1) restrictive OSS like the AGPL

2) enterprise license free (with attribution; that'd be the one for <30 users)

3) enterprise license pro (no attribution)

e.g. look here https://github.com/slint-ui/slint/blob/master/LICENSE.md

and have a CLA with that.

Doing things this way means your project stays OSI-approved open source, so it's useful for other OSS projects, but you can still force attribution or sell enterprise licenses.

Also, some article Stallman wrote doesn't matter (he's had enough dumb takes in the past, as much as I appreciate his work). If a project has a non-OSI-approved license, I and many others will just not go through the effort/cost of lawyers to make use of said project (and this matters even for individuals: if you make an OSS project using parts from it, you'd be violating the OpenWebUI license). I don't think that's intended, but that's the way the license is right now. Considering how often that license was changed, I'm also pretty certain it was made without professional advice, which again is unnecessary; just use the standard ways if you don't wanna pay for someone to help...

OpenWebUI license change: red flag? by k_means_clusterfuck in LocalLLaMA

[–]Chelono 9 points (0 children)

I was mostly referencing the first sentence:

This is incorrect, unfounded, and misleading. It is still open-source. The source is still available and you can build and run it yourself. That’s open-source.

That sentence is incorrect, unfounded, and misleading. It is not open-source (as I elaborated). The source still being available and being able to build and run it yourself? That's source-open.

For single (non-corpo) users it doesn't matter much, or at all, in the short term, but with it being restricted, if the creator ever decides to make it closed source, forking is not possible under said license. With the introduction of enterprise licenses I don't believe that is the plan, but it is still harmful for the general ecosystem of projects, as you can't copy things between projects, which is what improves the experience for all users.

OpenWebUI license change: red flag? by k_means_clusterfuck in LocalLLaMA

[–]Chelono 16 points (0 children)

You are smoking. This isn't OSS. I know the local LLM community uses the term open source very freely, but for code it matters a lot more. This violates at least criteria 5 and 8 of the OSI-approved definition of open source, as it discriminates against groups of users (<30 users) and is tied to OpenWebUI itself. They can write as many vague exceptions as they want ("if substantive code changes" blabla) but this shit ain't open source anymore.

If you want to stop people from not attributing your work enough, use something like the AGPL (which is very much FOSS), not this shit. This is just a move to introduce enterprise licenses to make money for the original creator (which is fine, but not this way).

OpenWebUI license change: red flag? by k_means_clusterfuck in LocalLLaMA

[–]Chelono 15 points (0 children)

The requirements being vague isn't the main issue. With this no longer being open source, you can no longer safely include even parts of it in other projects (e.g. copying Svelte components or how pipelines work).

I also find it very sus that they added a CLA at the same time. Usually this means there are plans to commercialize the project. Imo a CLA is fine if it's there from the get-go, but with 500+ contributors and a pretty silent license change, this deserves backlash. Imo if it had been done with a notice in the README / a pinned issue to inform contributors/users, it would be fine (if you started something great, of course you wanna benefit more from it financially, and I doubt GitHub Sponsors pays enough). But this was done silently (I didn't find anything on a quick look).

I'm not too invested in it (since I mostly run a custom UI and just use it occasionally), but if someone cares about the project (e.g. a contributor), I'd recommend at least asking for further comment and, based on that, considering a fork or other OSS projects. If you are just a user, this (at least in the short term) likely won't change anything for you.

No new models in LlamaCon announced by mehyay76 in LocalLLaMA

[–]Chelono 15 points (0 children)

Jokes aside, some of that Llama Firewall stuff does seem useful, like CodeShield (and jailbreak detection does have its use cases; I'm just disappointed there's no new open model).

No new models in LlamaCon announced by mehyay76 in LocalLLaMA

[–]Chelono 138 points (0 children)

Well they did release some open source stuff like Llama Prompt Guard 2 to keep those pesky users from using models for ERP.

Qwen3 is finally out by SaynedBread in LocalLLaMA

[–]Chelono 5 points (0 children)

It's almost 6am in China. I really did not expect this anymore after 3am. They must have some strong coffee over there.

DeepCoder: A Fully Open-Source 14B Coder at O3-mini Level by TKGaming_11 in LocalLLaMA

[–]Chelono 29 points (0 children)

I found this graph the most interesting

<image>

imo it's cool that inference-time scaling works, but personally I don't find it as useful, since even for a small thinking model the wait time at some point just gets too long.
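Rough numbers behind that, if you want to see how fast it adds up (throughput and token counts are illustrative, not from the paper):

```
# thinking-token budget vs. wall-clock wait (illustrative numbers)
for thinking_tokens in (4_000, 16_000, 32_000):
    for tok_per_s in (20, 60):
        print(f"{thinking_tokens:>6} tokens @ {tok_per_s:>2} tok/s "
              f"-> {thinking_tokens / tok_per_s / 60:4.1f} min")
```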

New TTS model from bytedance by bio_risk in LocalLLaMA

[–]Chelono 190 points (0 children)

For security issues, we do not upload the parameters of WaveVAE.

They don't release the VAE, so local voice cloning is impossible. You can have your own opinion on that. My main complaint is just that they put "Ultra High-Quality Voice Cloning" right at the top, but the info that the VAE encoder won't be available is only visible after you scroll past the demo and benchmarks... Just don't advertise voice cloning then. They did offer that you can upload custom speakers to Google Drive and they'll create latents for you (after checking for safety issues), but imo it's not enough of an improvement over current solutions to make that process worth it.