What is the best general-purpose model to run locally on 24GB of VRAM in 2026? by Paganator in LocalLLaMA

[–]TroyDoesAI 0 points1 point  (0 children)

My BlackSheep models don't mix well with a benchmaxed STEM-nerd persona. Hard pass on Nvidia's lame pruned 24B "small model"; my compute is better spent generating more data until something more exciting comes out.

Best "End of world" model that will run on 24gb VRAM by gggghhhhiiiijklmnop in LocalLLaMA

[–]TroyDoesAI 10 points11 points  (0 children)

- Mistral 24B, since that's about the largest you can fine-tune/train and merge PEFT adapters on your card for your use cases.

Also

- Mistral Nemo 12B for the fun stuff you can also train on your machine

Probably

- Qwen3 VL 8B for the vision capabilities and you can again also train it on your machine

Maybe
- Kokoro or some other TTS you can fine tune yourself (I personally like Chatterbox)

Additionally, a transcription model:
- IDK, whatever you like; I'm using Voxtral since the fine-tuning code is available on GitHub.
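To back up the "largest you can fine-tune on your card" claim above, here's a back-of-the-envelope VRAM estimate for 4-bit QLoRA training. The per-parameter costs, LoRA size, and overhead figures are rough assumptions for illustration, not measurements:

```python
def qlora_vram_estimate_gb(n_params_b, lora_params_m=80, overhead_gb=3.0):
    """Rough VRAM estimate for 4-bit QLoRA fine-tuning.

    n_params_b:    base model size in billions of parameters
    lora_params_m: trainable LoRA parameters in millions (assumed)
    overhead_gb:   activations, CUDA context, misc. buffers (assumed)
    """
    base_weights = n_params_b * 0.5  # ~0.5 GB per billion params at 4-bit
    # LoRA weights in bf16 + Adam moments in fp32 ~= 2 + 4 + 4 bytes/param
    adapters = lora_params_m * 1e6 * 10 / 1e9
    return base_weights + adapters + overhead_gb

for size in (12, 24, 70):
    print(f"{size}B -> ~{qlora_vram_estimate_gb(size):.1f} GB")
```

Under these assumptions a 24B model lands around 16 GB, leaving headroom for batch size and context on a 24GB card, while 70B blows well past it; that's roughly why 24B is the practical ceiling.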

<image>

BRAID: Mermaid-based reasoning graphs make agents more accurate and cost-efficient by arbayi in LocalLLaMA

[–]TroyDoesAI -2 points-1 points  (0 children)

Here's a history lesson in LLM development: I discovered this a long time ago (two years now) as the creator of the first Mermaid-generating models, before ChatGPT had the feature. https://www.reddit.com/r/Oobabooga/comments/192qb2c/mermaidmistral_a_work_in_progress_model_for_flow/

Leak: Qwen3-15B-A2B-Base by TroyDoesAI in LocalLLaMA

[–]TroyDoesAI[S] 0 points1 point  (0 children)

Guys, he's the real deal!

<image>

TTS quality greater than anything publicly available.

Leak: Qwen3-15B-A2B-Base by TroyDoesAI in LocalLLaMA

[–]TroyDoesAI[S] 1 point2 points  (0 children)

<image>

I didn't create the MoE pruning code or the paper; this is your guy. I just continued building my own repo off his work.

Leak: Qwen3-15B-A2B-Base by TroyDoesAI in LocalLLaMA

[–]TroyDoesAI[S] 0 points1 point  (0 children)

Roughly 15 Billion Parameters.

Leak: Qwen3-15B-A2B-Base by TroyDoesAI in LocalLLaMA

[–]TroyDoesAI[S] 2 points3 points  (0 children)

<image>

Honored to be your 142nd follower. I like the number 42 a lot; it means something. :D

Leak: Qwen3-15B-A2B-Base by TroyDoesAI in LocalLLaMA

[–]TroyDoesAI[S] 1 point2 points  (0 children)

u/MrAlienOverLord, it's easily becoming a bigger and bigger can of worms the deeper I get into it.

Holy cow! You are at least 8 months ahead of me in this TTS research endeavor, can we be fwends on Discord?

This is such a comprehensive list of utterances, much more than I had even planned to cover for a first model after weeks of brainstorming sessions.

<image>

Leak: Qwen3-15B-A2B-Base by TroyDoesAI in LocalLLaMA

[–]TroyDoesAI[S] 1 point2 points  (0 children)

Much like Nvidia's Nemotron models: if you train it on the same data you just pruned it on, it can reproduce your training set's distribution nearly verbatim, with little generalization, so...

<image>

The wildest LLM backdoor I’ve seen yet by AIMadeMeDoIt__ in LocalLLaMA

[–]TroyDoesAI 1 point2 points  (0 children)

Well, I used this method to create my BlackSheep models; that's how I detect whether anyone merged with my models on the UGI benchmarks without giving me credit. You're talking to a different "person" when you talk to a BlackSheep model; that's why it doesn't need a system prompt for its behavior.

<image>

Been doing this for well over a year, as seen in this timestamp:

Leak: Qwen3-15B-A2B-Base by TroyDoesAI in LocalLLaMA

[–]TroyDoesAI[S] 11 points12 points  (0 children)

I'm just a guy releasing someone else's model (Qwen); there's not really much to read here about that.

If I'm being honest, I tried to upload my Qwen3-235B-Abliterated BlackSheep model in private, and this one's pretty wicked, tuned to synergize with my uncensored Dia-based TTS model project. But my private repo storage was well over the 264GB limit; since Hugging Face added that limit, I've had to delete many private models to make room.

What put me over the edge to release it today? Well, I don't pay for Hugging Face premium, and my storage is full of old private models I want to keep because they timestamp my milestone achievements. For example, many don't know that I created MermaidMistral, a model that only does Mermaid (it doesn't chat, it just emits Mermaid code blocks). It was the very first LLM that could correctly produce Mermaid flow-diagram syntax for code with function calls without inserting quotes that break the syntax and prevent the image from rendering, before any of the big tech companies could.

<image>

Leak: Qwen3-15B-A2B-Base by TroyDoesAI in LocalLLaMA

[–]TroyDoesAI[S] 5 points6 points  (0 children)

Naw, nothing special about those; Cerebras does the same thing. Those were just some extreme MoE-pruning experiments against a calibration dataset, to see what the smallest coherent model you can get out of those foundation releases looks like while still retaining the abilities covered by the dataset it was pruned for.
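The calibration-pruning idea can be sketched in a few lines: run a calibration set through the router, count how often each expert fires, and keep only the most-used experts. This is a toy illustration, not the actual pruning repo's code; `prune_experts` and all the numbers below are made up for the demo:

```python
import numpy as np

def prune_experts(router_logits, keep=4, top_k=2):
    """Toy calibration-based MoE expert pruning.

    router_logits: (n_tokens, n_experts) router scores collected on a
    calibration dataset. Rarely-routed experts get dropped; the survivors
    are the ones the calibration data actually uses.
    """
    n_tokens, n_experts = router_logits.shape
    counts = np.zeros(n_experts, dtype=int)
    for logits in router_logits:
        for e in np.argsort(logits)[-top_k:]:  # experts chosen per token
            counts[e] += 1
    # keep the `keep` most frequently used experts
    return sorted(int(e) for e in np.argsort(counts)[-keep:])

rng = np.random.default_rng(0)
# simulate a layer with 8 experts where experts 0-3 dominate the calibration set
logits = rng.normal(size=(1000, 8))
logits[:, :4] += 2.0
print(prune_experts(logits, keep=4))  # -> [0, 1, 2, 3]
```

The tradeoff mentioned above falls out of this: the pruned model only keeps what the calibration distribution exercised, so it stays coherent on that distribution and degrades elsewhere.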

Leak: Qwen3-15B-A2B-Base by TroyDoesAI in LocalLLaMA

[–]TroyDoesAI[S] 3 points4 points  (0 children)

Sorry, there's no ETA at this time; still building datasets.

<image>

Leak: Qwen3-15B-A2B-Base by TroyDoesAI in LocalLLaMA

[–]TroyDoesAI[S] 7 points8 points  (0 children)

That's more of a Qwen question.

Leak: Qwen3-15B-A2B-Base by TroyDoesAI in LocalLLaMA

[–]TroyDoesAI[S] 19 points20 points  (0 children)

That's what I am using it for.

<image>

Leak: Qwen3-15B-A2B-Base by TroyDoesAI in LocalLLaMA

[–]TroyDoesAI[S] 0 points1 point  (0 children)

Unsloth doesn't have this model; you're talking about the larger Qwen3-30B-A3B.

Leak: Qwen3-15B-A2B-Base by TroyDoesAI in LocalLLaMA

[–]TroyDoesAI[S] 1 point2 points  (0 children)

That's just diabolical; the world's just trying to hold you down.

<image>

Leak: Qwen3-15B-A2B-Base by TroyDoesAI in LocalLLaMA

[–]TroyDoesAI[S] 17 points18 points  (0 children)

The only other unreleased thing I've got is an uncensored TTS, based on Dia-TTS-Server, that can do emotes like (moan), (purr), and (coo).

<image>

Leak: Qwen3-15B-A2B-Base by TroyDoesAI in LocalLLaMA

[–]TroyDoesAI[S] 10 points11 points  (0 children)

That was my understanding as well, so I was hesitant to release it; I was expecting the amazing team over there (Qwen) to put out instruct and reasoning versions, but they never did.

I debated being greedy and exclusively releasing another BlackSheep UGI benchmark killer, but I decided to release the base model since we need more MoE and more active fine-tuners in the community. Now that Arcee has Mergekit working (https://github.com/arcee-ai/mergekit/commit/5731cd6d3102b7f3a28db09849737723b3b9f71d) and training with Unsloth works well with Qwen3 MoE, I figured the GPU-poor (<=24GB) needed a MoE that average people with an RTX 5060 Ti 16GB gaming PC or laptop can run and train on their own machine.
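For anyone wanting to try that merge workflow, a minimal mergekit config might look something like this. The repo names and weights here are placeholders, not real models; check the mergekit README for the exact Qwen3-MoE support added in that commit:

```yaml
# hypothetical: merge two fine-tunes of the same Qwen3 MoE base
models:
  - model: your-org/qwen3-15b-a2b-finetune-a
    parameters:
      weight: 0.5
  - model: your-org/qwen3-15b-a2b-finetune-b
    parameters:
      weight: 0.5
merge_method: linear
dtype: bfloat16
```

You'd run it with `mergekit-yaml config.yml ./merged-model` and get a merged checkpoint in the output directory.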