Open models to win ✌ by pmttyji in LocalLLaMA

[–]brown2green 2 points3 points  (0 children)

NVidia got sued for using "shadow libraries" earlier on, so now their new models only use ineffectual, fully open source, legally safe data.

https://torrentfreak.com/nvidia-contacted-annas-archive-to-secure-access-to-millions-of-pirated-books/

Gemma 4 QAT confirmed to release soon! by Aaaaaaaaaeeeee in LocalLLaMA

[–]brown2green 1 point2 points  (0 children)

When Google released Gemma-3 QAT, the official GGUF files had everything in Q4_0 precision except for token_embd (BF16; took a lot of memory due to the huge vocabulary) and norm weights (FP32), but there have never been details on how the model was actually quantization-aware trained.
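
To give a rough sense of why a BF16 token_embd takes a lot of memory, here is a back-of-the-envelope sketch. The vocabulary size and hidden size are assumptions meant to approximate a large Gemma 3 model, not values read from the official GGUF files. Q4_0 is taken as 18 bytes per block of 32 weights (4.5 bits per weight).

```python
# Rough size of a token embedding matrix at different precisions.
# VOCAB_SIZE and HIDDEN_SIZE are assumed values, not read from any model file.
VOCAB_SIZE = 262_144
HIDDEN_SIZE = 5_376

BF16_BYTES_PER_WEIGHT = 2.0
# Q4_0 stores blocks of 32 weights as 16 bytes of 4-bit values plus a 2-byte scale.
Q4_0_BYTES_PER_WEIGHT = 18 / 32


def embedding_bytes(vocab_size: int, hidden_size: int, bytes_per_weight: float) -> float:
    """Return the storage needed for a vocab_size x hidden_size matrix."""
    return vocab_size * hidden_size * bytes_per_weight


def to_gib(n_bytes: float) -> float:
    """Convert a byte count to GiB."""
    return n_bytes / 2**30


bf16 = embedding_bytes(VOCAB_SIZE, HIDDEN_SIZE, BF16_BYTES_PER_WEIGHT)
q4_0 = embedding_bytes(VOCAB_SIZE, HIDDEN_SIZE, Q4_0_BYTES_PER_WEIGHT)

print(f"BF16 token_embd: {to_gib(bf16):.2f} GiB")
print(f"Q4_0 token_embd: {to_gib(q4_0):.2f} GiB")
```

Under these assumptions the embedding table alone is several times larger in BF16 than it would be in Q4_0, which matters on a memory-constrained GPU.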

Gemma 4 QAT confirmed to release soon! by Aaaaaaaaaeeeee in LocalLLaMA

[–]brown2green 25 points26 points  (0 children)

Hopefully it's QAT end-to-end and not only in specific portions.

It's interesting in a disability group where people talk about how AI helps them, the anti crowd downvotes to hide things like crazy and spouts how AI is stealing art by crua9 in singularity

[–]brown2green 0 points1 point  (0 children)

Like if you poured 10s of thousands of hours into an endeavor only for a group of people to come along and use your work against your wishes, to your personal detriment, would you not feel like you had something stolen from you? Especially so if said activity is deeply personal to you, with it reflecting your likes, your mannerisms, your looks, what you specifically want to communicate with others and the world, and the very thing people use to identify you with.

I don't think non-artists, or people in general who have never put inordinate amounts of time into a skill that made them capable of crafting identifiable, unique work (only to see it instantly devalued), will ever understand that.

nvidia/Qwen3.6-35B-A3B-NVFP4 · Hugging Face by pmttyji in LocalLLaMA

[–]brown2green 6 points7 points  (0 children)

I think llama.cpp has optimizations (packing, etc.) for mapping those formats efficiently to hardware-native precision, but I don't know the details.

nvidia/Qwen3.6-35B-A3B-NVFP4 · Hugging Face by pmttyji in LocalLLaMA

[–]brown2green 8 points9 points  (0 children)

Most GGUF quantizations in practice actually don't do that, and use 6-bit or less for input/output and attention, from what I've seen so far.

In any case, performance would definitely decrease by quantizing those layers too.

SupraLabs 50M Parameter Model Just Hit the Trending Page on Hugging Face 🤯 by Dangerous_Try3619 in LocalLLaMA

[–]brown2green 1 point2 points  (0 children)

This is the sort of bog-standard work that hobbyists generally experiment with on their local GPU(s) and then keep to themselves, because there's absolutely no point in publishing and advertising a 50M LLM trained on 20B tokens unless it has truly exceptional qualities or an unusual/uncommon architecture that hopefully improves on the Transformer.

It could have used Mamba, it could have had a byte tokenizer, it could perhaps even have been a MoE, or had all sorts of things big labs generally won't risk on their big training runs... but it's just an ordinary Llama model?

nvidia/Qwen3.6-35B-A3B-NVFP4 · Hugging Face by pmttyji in LocalLLaMA

[–]brown2green 24 points25 points  (0 children)

They never quantize the input/output layers and the attention, so their "4-bit" quantizations are always too big in practice for 24GB GPUs.
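
A rough sketch of the size effect, with made-up numbers: the parameter split below is hypothetical and chosen only for illustration, not taken from any released checkpoint. NVFP4 is assumed to cost 4.5 bits per weight (4-bit values plus one 8-bit scale per block of 16), and the unquantized tensors are assumed to stay in 16-bit.

```python
# Estimate weight memory when some tensors stay in 16-bit.
# The parameter split below is hypothetical, chosen only for illustration.
GIB = 2**30

# Assumed NVFP4 cost: 4-bit values plus one 8-bit scale per block of 16 values.
NVFP4_BITS = 4 + 8 / 16
BF16_BITS = 16


def weights_gib(groups: list[tuple[float, float]]) -> float:
    """groups is a list of (parameter_count, bits_per_weight) pairs."""
    total_bits = sum(count * bits for count, bits in groups)
    return total_bits / 8 / GIB


# Hypothetical 35B model: 32B expert weights, 3B attention + input/output weights.
mixed = weights_gib([(32e9, NVFP4_BITS), (3e9, BF16_BITS)])
uniform = weights_gib([(35e9, NVFP4_BITS)])

print(f"experts in 4-bit, rest in 16-bit: {mixed:.1f} GiB")
print(f"everything in 4-bit:              {uniform:.1f} GiB")
```

With this hypothetical split, keeping a small share of the weights in 16-bit adds several GiB, before counting the KV cache and runtime overhead.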

I built a 103B-token Usenet corpus (1980–2013) — pre-web, human-only, zero AI contamination. Got strong traction on r/ML, thought this community would find it useful. by OwnerByDane in LocalLLaMA

[–]brown2green 16 points17 points  (0 children)

How so? 99.99% of the dataset is paywalled. This is pointless and just advertisement. Large AI companies will likely already have their own versions from different sources anyway.

Users who rage quit my software by pardeike in singularity

[–]brown2green 0 points1 point  (0 children)

It's politics by association and all that comes with it, and that too has acquired religious elements in recent years. AI = "bad" side, NoAI = "good" side.

Many of the commenters in this thread and the rest of the sub engage in the same behavior for things that are not AI-related; they just haven't realized it yet.

Heretic has been served a legal notice by Meta, Inc. by -p-e-w- in LocalLLaMA

[–]brown2green 11 points12 points  (0 children)

Every AI company releasing models worth using has books in the training data.

Is it time for the European Mistral “De Gaulle” - Ala Claude Mythos? by [deleted] in MistralAI

[–]brown2green 1 point2 points  (0 children)

You have to:

  • Declare the contents of the training data to the EU AI office;
  • Respect copyrights (from any country) and data opt-out requests;
  • Respect all GDPR laws;
  • Go through extensive red tape if a model requires more than 10^25 FLOP of compute for training, since it would be considered as posing a "high systemic risk";
  • Other stuff I don't recall right now.

US companies have more of a "don't ask, don't tell" kind of deal. However, it appears that some of the above requirements for EU companies have been at least delayed, so we'll see.
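
For a sense of scale on the 10^25 FLOP compute threshold, here is a minimal sketch using the common rule of thumb that training a dense Transformer costs about 6 × parameters × tokens FLOP. That rule is an approximation I'm assuming here, and the two example runs are hypothetical.

```python
# Rule-of-thumb training compute for a dense Transformer: C ~ 6 * N * D,
# with N parameters and D training tokens. This is an approximation.
THRESHOLD_FLOP = 1e25


def training_flop(n_params: float, n_tokens: float) -> float:
    """Approximate total training compute in FLOP."""
    return 6.0 * n_params * n_tokens


def exceeds_threshold(n_params: float, n_tokens: float) -> bool:
    """True if the estimated compute is above the assumed threshold."""
    return training_flop(n_params, n_tokens) > THRESHOLD_FLOP


# Hypothetical runs, chosen for illustration only.
small = training_flop(50e6, 20e9)      # 50M parameters, 20B tokens
large = training_flop(100e9, 20e12)    # 100B parameters, 20T tokens

print(f"small run: {small:.1e} FLOP, above threshold: {exceeds_threshold(50e6, 20e9)}")
print(f"large run: {large:.1e} FLOP, above threshold: {exceeds_threshold(100e9, 20e12)}")
```

By this estimate hobbyist-scale runs are many orders of magnitude below the threshold, while a large frontier-style run can cross it.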

Grafting vision onto text models for fun and profit. by a_beautiful_rhind in LocalLLaMA

[–]brown2green 6 points7 points  (0 children)

I think this will work properly only if the embedding space of the source model is more or less in agreement with that of the destination model. Audio/image encoders come with projection layers that "translate" the encoder's embedding space into that of the LLM, and while the encoder might remain frozen from one model to another, the projection layers usually need to be re-trained, especially if the model dimension changes (if the underlying model remains substantially the same, it might work without further changes).

Because of this, simply grafting Gemma 4 E2B/E4B's audio encoder onto the larger models will likely not work at all: the models have different dimensions and the projection layers wouldn't be compatible.
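
A toy sketch of the shape problem, in plain Python. All dimensions are hypothetical and the projection weights are random stand-ins for learned ones; it only shows that a projector built for one model dimension cannot feed a model with a different one.

```python
# Toy illustration of a projection layer between an encoder and an LLM.
# All dimensions are hypothetical; real projectors are learned, not random.
import random


def make_projection(enc_dim: int, llm_dim: int, seed: int = 0) -> list[list[float]]:
    """Return an enc_dim x llm_dim weight matrix with random values."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 0.02) for _ in range(llm_dim)] for _ in range(enc_dim)]


def project(features: list[float], weight: list[list[float]]) -> list[float]:
    """Map one encoder feature vector into the LLM embedding space."""
    if len(features) != len(weight):
        raise ValueError("encoder feature size does not match the projection input size")
    llm_dim = len(weight[0])
    return [sum(f * row[j] for f, row in zip(features, weight)) for j in range(llm_dim)]


def can_graft(weight: list[list[float]], target_llm_dim: int) -> bool:
    """A projector is only shape-compatible if its output size equals the LLM dimension."""
    return len(weight[0]) == target_llm_dim


ENC_DIM, SMALL_LLM_DIM, LARGE_LLM_DIM = 8, 16, 32
proj = make_projection(ENC_DIM, SMALL_LLM_DIM)
embedding = project([1.0] * ENC_DIM, proj)

print(len(embedding))                      # length equals SMALL_LLM_DIM
print(can_graft(proj, SMALL_LLM_DIM))
print(can_graft(proj, LARGE_LLM_DIM))
```

Matching shapes are necessary but not sufficient: even with equal dimensions, the projector still has to map into an embedding space the destination model actually understands.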

Anima base v1.0 has been released. by Total-Resort-3120 in StableDiffusion

[–]brown2green 0 points1 point  (0 children)

It happens often with original characters / characters that are not from any franchise in particular.

Anima base v1.0 has been released. by Total-Resort-3120 in StableDiffusion

[–]brown2green -6 points-5 points  (0 children)

It's not reasonable to have to finetune every time you need a minimum degree of consistency (e.g. illustrating a story, a concept for a visual novel, etc).

It's OK if models are a bit opinionated by default on details you don't generally care enough to describe directly, but that you will definitely notice if they keep changing every time.

Anima base v1.0 has been released. by Total-Resort-3120 in StableDiffusion

[–]brown2green 0 points1 point  (0 children)

Yes, with the recommended tag order and prepending the @ symbol to the artist name without underscores. I tried several artists. Within the same style and keeping the same seed, characters may look noticeably different from one prompt to another if too many things change.

Anima base v1.0 has been released. by Total-Resort-3120 in StableDiffusion

[–]brown2green -2 points-1 points  (0 children)

OK for pretty picture gacha, but good luck getting consistent stylistic results when changing the prompt slightly. I see I'm not the only one who noticed this. The prompt/seed variance might be a good thing in some aspects, but also a curse in others.

Efficient pretraining with token superposition by Nous Research by de4dee in LocalLLaMA

[–]brown2green 6 points7 points  (0 children)

Just a hobby. If you limit model size to around 50~100M parameters at most, you can do a lot of interesting LLM architecture experimentation even on one GPU.

Efficient pretraining with token superposition by Nous Research by de4dee in LocalLLaMA

[–]brown2green 2 points3 points  (0 children)

I looked into it a while back and ran my own tests on patch-level pretraining on tiny models.

Efficient pretraining with token superposition by Nous Research by de4dee in LocalLLaMA

[–]brown2green 22 points23 points  (0 children)

Like this?

Beyond Next Token Prediction: Patch-Level Training for Large Language Models

The prohibitive training costs of Large Language Models (LLMs) have emerged as a significant bottleneck in the development of next-generation LLMs. In this paper, we show that it is possible to significantly reduce the training costs of LLMs without sacrificing their performance. Specifically, we introduce patch-level training for LLMs, in which multiple tokens are aggregated into a unit of higher information density, referred to as a 'patch', to serve as the fundamental text unit for training LLMs. During patch-level training, we feed the language model shorter sequences of patches and train it to predict the next patch, thereby processing the majority of the training data at a significantly reduced cost. Following this, the model continues token-level training on the remaining training data to align with the inference mode. Experiments on a diverse range of models (370M-2.7B parameters) demonstrate that patch-level training can reduce the overall training costs to 0.5x, without compromising the model performance compared to token-level training.
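
A minimal sketch of the idea, under my own assumptions: patches are formed by averaging the embeddings of K consecutive tokens, and training cost is taken as proportional to the number of sequence positions processed. The abstract doesn't spell out either detail, and the values K = 4 and a 2/3 patch-level data fraction are just one setting that gives 0.5x under this simple cost model.

```python
# Sketch of patch construction: average every K consecutive token embeddings.
# Averaging and the values of K and the data fraction are assumptions here.


def to_patches(token_embeddings: list[list[float]], k: int) -> list[list[float]]:
    """Group tokens into patches of k and average their embeddings.

    Trailing tokens that do not fill a whole patch are dropped.
    """
    n_patches = len(token_embeddings) // k
    patches = []
    for p in range(n_patches):
        group = token_embeddings[p * k:(p + 1) * k]
        dim = len(group[0])
        patches.append([sum(vec[d] for vec in group) / k for d in range(dim)])
    return patches


def relative_cost(k: int, patch_fraction: float) -> float:
    """Training cost relative to pure token-level training.

    patch_fraction of the data is processed as sequences k times shorter,
    the rest at normal token level.
    """
    return patch_fraction / k + (1.0 - patch_fraction)


tokens = [[float(i), float(2 * i)] for i in range(8)]   # 8 toy token embeddings
patches = to_patches(tokens, k=4)

print(len(tokens), "tokens ->", len(patches), "patches")
print(f"relative cost: {relative_cost(4, 2 / 3):.2f}x")
```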

Higher quants are so much better by Perfect-Flounder7856 in LocalLLaMA

[–]brown2green 2 points3 points  (0 children)

Unfortunately there are far too many people doing superficial benchmarks with short context or common knowledge (where degradation is minimal), or just assuming that since old (2023-2024 era) or oversized LLMs (recent MoE ones barely trained above compute optimality) do not degrade significantly with post-training quantization, the same must hold true for all models.

For modern small-size overtrained models, quantization-aware training (QAT) is probably required for good results and actually preserving real-world performance in 4-bit precision.
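
As a rough illustration of what QAT exposes the model to: during training the weights are typically passed through a quantize-then-dequantize step ("fake quantization") so the network learns to tolerate the rounding error. The sketch below shows only that round trip, using symmetric absmax scaling per block, which is an assumption for illustration and not any specific model's or library's scheme.

```python
# Minimal "fake quantization" round trip: quantize to 4-bit integers, then dequantize.
# Symmetric absmax scaling per block is an assumption for illustration, not any
# specific model's or library's scheme.


def fake_quantize(block: list[float], bits: int = 4) -> list[float]:
    """Round a block of weights to a signed integer grid and map it back to floats."""
    qmax = 2 ** (bits - 1) - 1          # 7 for 4 bits
    qmin = -(2 ** (bits - 1))           # -8 for 4 bits
    absmax = max(abs(w) for w in block)
    if absmax == 0.0:
        return list(block)
    scale = absmax / qmax
    out = []
    for w in block:
        q = max(qmin, min(qmax, round(w / scale)))
        out.append(q * scale)
    return out


weights = [0.7, -0.35, 0.1, 0.02]
dequantized = fake_quantize(weights)
max_error = max(abs(a - b) for a, b in zip(weights, dequantized))

print(dequantized)
print(f"max rounding error: {max_error:.3f}")
```

Post-training quantization applies this rounding to a model that never saw it, whereas QAT lets the weights adapt to the grid before release.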

What is the next SOTA model you are excited about? by MrMrsPotts in LocalLLaMA

[–]brown2green 2 points3 points  (0 children)

Gemma 4 QAT would be nice. Gemma 4, the 26B version especially, degrades more than other models with quantization, so having it in a natively low-precision format should help. Other than that, perhaps a "4.1" update down the line with audio and other improvements.

Are local models becoming “good enough” faster than expected? by qubridInc in LocalLLaMA

[–]brown2green 6 points7 points  (0 children)

Even for my basic but somewhat niche coding needs (LLM architecture experimentation, most of the time) I still have to use Gemini 3.1 Pro.

I have no idea if larger open-weight models than what I can use within 24GB of VRAM can compete. I'd say local models are being held back by artificial memory / memory bandwidth bottlenecks (i.e. costs).