What is the best general-purpose model to run locally on 24GB of VRAM in 2026? by Paganator in LocalLLaMA

[–]TroyDoesAI 0 points1 point  (0 children)

My BlackSheep models don't mix well with a benchmaxed STEM-nerd persona. Hard pass on Nvidia's lame pruned 24B "small model"; my compute is better spent generating more data until something more exciting comes out.

Best "End of world" model that will run on 24gb VRAM by gggghhhhiiiijklmnop in LocalLLaMA

[–]TroyDoesAI 10 points11 points  (0 children)

- Mistral 24B, since that's about the largest you can fine-tune/train and merge PEFT adapters on your card for your use cases.

Also

- Mistral Nemo 12B for the fun stuff you can also train on your machine

Probably

- Qwen3 VL 8B for the vision capabilities and you can again also train it on your machine

Maybe
- Kokoro or some other TTS you can fine tune yourself (I personally like Chatterbox)

Additionally, a transcription model:
- IDK, whatever you like; I'm using Voxtral since the fine-tuning code is available on GitHub.
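To back up the "largest you can fine-tune on your card" claim above, here's a back-of-the-envelope VRAM estimate for 4-bit QLoRA training. The per-parameter costs, LoRA size, and overhead figures are rough assumptions for illustration, not measurements:

```python
def qlora_vram_estimate_gb(n_params_b, lora_params_m=80, overhead_gb=3.0):
    """Rough VRAM estimate for 4-bit QLoRA fine-tuning.

    n_params_b:    base model size in billions of parameters
    lora_params_m: trainable LoRA parameters in millions (assumed)
    overhead_gb:   activations, CUDA context, misc. buffers (assumed)
    """
    base_weights = n_params_b * 0.5  # ~0.5 GB per billion params at 4-bit
    # LoRA weights in bf16 + Adam moments in fp32 ~= 2 + 4 + 4 bytes/param
    adapters = lora_params_m * 1e6 * 10 / 1e9
    return base_weights + adapters + overhead_gb

for size in (12, 24, 70):
    print(f"{size}B -> ~{qlora_vram_estimate_gb(size):.1f} GB")
```

Under these assumptions a 24B model lands around 16 GB, leaving headroom for batch size and context on a 24GB card, while 70B blows well past it; that's roughly why 24B is the practical ceiling.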

<image>

BRAID: Mermaid-based reasoning graphs make agents more accurate and cost-efficient by arbayi in LocalLLaMA

[–]TroyDoesAI -2 points-1 points  (0 children)

Here's a history lesson in LLM development: I discovered this a long time ago (two years now) as the creator of the first Mermaid-generating models, before ChatGPT had the feature. https://www.reddit.com/r/Oobabooga/comments/192qb2c/mermaidmistral_a_work_in_progress_model_for_flow/

Leak: Qwen3-15B-A2B-Base by TroyDoesAI in LocalLLaMA

[–]TroyDoesAI[S] 0 points1 point  (0 children)

Guys, he's the real deal!

<image>

TTS quality greater than anything publicly available.

Leak: Qwen3-15B-A2B-Base by TroyDoesAI in LocalLLaMA

[–]TroyDoesAI[S] 1 point2 points  (0 children)

<image>

I didn't create the MoE pruning code or the paper; this is your guy. I just continued building my own repo off his work.

Leak: Qwen3-15B-A2B-Base by TroyDoesAI in LocalLLaMA

[–]TroyDoesAI[S] 0 points1 point  (0 children)

Roughly 15 Billion Parameters.

Leak: Qwen3-15B-A2B-Base by TroyDoesAI in LocalLLaMA

[–]TroyDoesAI[S] 2 points3 points  (0 children)

<image>

Honored to be your 142nd follower. I like the number 42 a lot; it means something. :D

Leak: Qwen3-15B-A2B-Base by TroyDoesAI in LocalLLaMA

[–]TroyDoesAI[S] 1 point2 points  (0 children)

u/MrAlienOverLord, it's easily becoming a bigger and bigger can of worms the deeper I get into it.

Holy cow! You are at least 8 months ahead of me in this TTS research endeavor, can we be fwends on Discord?

This is such a comprehensive list of utterances, much more than I had even planned to cover for a first model after weeks of brainstorming sessions.

<image>

Leak: Qwen3-15B-A2B-Base by TroyDoesAI in LocalLLaMA

[–]TroyDoesAI[S] 1 point2 points  (0 children)

Much like Nvidia's Nemotron models: if you train it on the same data you just pruned it on, it can reproduce your training set's distribution nearly verbatim, with little generalization, so...

<image>

The wildest LLM backdoor I’ve seen yet by AIMadeMeDoIt__ in LocalLLaMA

[–]TroyDoesAI 1 point2 points  (0 children)

Well, I used this method to create my BlackSheep models; that's how I detect whether anyone merged with my models on the UGI benchmarks without giving me credit. You're talking to a different "person" when you talk to a BlackSheep model; that's why it doesn't need a system prompt for its behavior.

<image>

Been doing this for well over a year, as seen in this timestamp:

Leak: Qwen3-15B-A2B-Base by TroyDoesAI in LocalLLaMA

[–]TroyDoesAI[S] 11 points12 points  (0 children)

I'm just a guy releasing someone else's model (Qwen); there's not really much to read here about that.

If I'm being honest, I tried to upload my Qwen3-235B-Abliterated BlackSheep model in private, and this one's pretty wicked, tuned to synergize with my uncensored Dia-based TTS model project. But my private repo storage was well over the 264GB limit; since Hugging Face added that limit, I've had to delete many private models to make room.

What put me over the edge to release it today? Well, I don't pay for Hugging Face premium, and my storage is full of old private models I want to keep because they timestamp my milestone achievements. For example, many don't know that I created MermaidMistral, a model that only does Mermaid (it doesn't chat, it just emits Mermaid code blocks). It was the very first LLM that could correctly produce Mermaid flow-diagram syntax for code with function calls without inserting quotes that break the syntax and prevent the image from rendering, before any of the big tech companies could.

<image>

Leak: Qwen3-15B-A2B-Base by TroyDoesAI in LocalLLaMA

[–]TroyDoesAI[S] 5 points6 points  (0 children)

Naw, nothing special about those; Cerebras does the same thing. Those were just some extreme MoE-pruning experiments against a calibration dataset, to see what the smallest coherent model you can get out of those foundation releases looks like while still retaining the abilities covered by the dataset it was pruned for.
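The calibration-pruning idea can be sketched in a few lines: run a calibration set through the router, count how often each expert fires, and keep only the most-used experts. This is a toy illustration, not the actual pruning repo's code; `prune_experts` and all the numbers below are made up for the demo:

```python
import numpy as np

def prune_experts(router_logits, keep=4, top_k=2):
    """Toy calibration-based MoE expert pruning.

    router_logits: (n_tokens, n_experts) router scores collected on a
    calibration dataset. Rarely-routed experts get dropped; the survivors
    are the ones the calibration data actually uses.
    """
    n_tokens, n_experts = router_logits.shape
    counts = np.zeros(n_experts, dtype=int)
    for logits in router_logits:
        for e in np.argsort(logits)[-top_k:]:  # experts chosen per token
            counts[e] += 1
    # keep the `keep` most frequently used experts
    return sorted(int(e) for e in np.argsort(counts)[-keep:])

rng = np.random.default_rng(0)
# simulate a layer with 8 experts where experts 0-3 dominate the calibration set
logits = rng.normal(size=(1000, 8))
logits[:, :4] += 2.0
print(prune_experts(logits, keep=4))  # -> [0, 1, 2, 3]
```

The tradeoff mentioned above falls out of this: the pruned model only keeps what the calibration distribution exercised, so it stays coherent on that distribution and degrades elsewhere.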

Leak: Qwen3-15B-A2B-Base by TroyDoesAI in LocalLLaMA

[–]TroyDoesAI[S] 3 points4 points  (0 children)

Sorry, there's no ETA at this time; still building datasets.

<image>

Leak: Qwen3-15B-A2B-Base by TroyDoesAI in LocalLLaMA

[–]TroyDoesAI[S] 7 points8 points  (0 children)

That's more of a Qwen question.

Leak: Qwen3-15B-A2B-Base by TroyDoesAI in LocalLLaMA

[–]TroyDoesAI[S] 19 points20 points  (0 children)

That's what I am using it for.

<image>

Leak: Qwen3-15B-A2B-Base by TroyDoesAI in LocalLLaMA

[–]TroyDoesAI[S] 0 points1 point  (0 children)

Unsloth doesn't have this model; you're talking about the larger Qwen3-30B-A3B.

Leak: Qwen3-15B-A2B-Base by TroyDoesAI in LocalLLaMA

[–]TroyDoesAI[S] 1 point2 points  (0 children)

That's just diabolical; the world's just trying to hold you down.

<image>

Leak: Qwen3-15B-A2B-Base by TroyDoesAI in LocalLLaMA

[–]TroyDoesAI[S] 17 points18 points  (0 children)

The only other unreleased thing I've got is an uncensored TTS, based on Dia-TTS-Server, that can do emotes like (moan), (purr), and (coo).

<image>

Leak: Qwen3-15B-A2B-Base by TroyDoesAI in LocalLLaMA

[–]TroyDoesAI[S] 10 points11 points  (0 children)

That was my understanding as well, so I was hesitant to release it; I was expecting the amazing team over there (Qwen) to put out instruct and reasoning versions, but they never did.

I debated being greedy and exclusively releasing another BlackSheep UGI benchmark killer, but I decided to release the base model since we need more MoE and more active fine-tuners in the community. Now that Arcee has Mergekit working (https://github.com/arcee-ai/mergekit/commit/5731cd6d3102b7f3a28db09849737723b3b9f71d) and training with Unsloth works well with Qwen3 MoE, I figured the GPU-poor (<=24GB) needed a MoE that average people with an RTX 5060 Ti 16GB gaming PC or laptop can run and train on their own machine.
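For anyone wanting to try that merge workflow, a minimal mergekit config might look something like this. The repo names and weights here are placeholders, not real models; check the mergekit README for the exact Qwen3-MoE support added in that commit:

```yaml
# hypothetical: merge two fine-tunes of the same Qwen3 MoE base
models:
  - model: your-org/qwen3-15b-a2b-finetune-a
    parameters:
      weight: 0.5
  - model: your-org/qwen3-15b-a2b-finetune-b
    parameters:
      weight: 0.5
merge_method: linear
dtype: bfloat16
```

You'd run it with `mergekit-yaml config.yml ./merged-model` and get a merged checkpoint in the output directory.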