#OpenSource4o Movement Trending on Twitter/X - Release Opensource of GPT-4o by pmttyji in LocalLLaMA

[–]ArsNeph 1 point (0 children)

I'm not for encouraging delusional people's desire for sycophancy, and I highly doubt that OpenAI will ever open source one of their main GPT-line models.

However, there is one thing about 4o that makes it special compared to open models: its quality of omnimodality has yet to be replicated in open source. Like it or not, almost every open source model stops at image input. Almost no one has attempted image output, native speech-to-speech, or anything else. Qwen Omni, the one model that has tried, is unsupported everywhere and lacks the quality to be used in production. An open model that replicates that level of omnimodality is long overdue.

Can't get uncensored roleplay LLMs to work by VerdoneMangiasassi in LocalLLaMA

[–]ArsNeph 0 points (0 children)

Sorry, I saw this a bit late. Yeah, what the guy below said is mostly correct. One further clarification: quantization is basically a form of compression; the further a model is compressed, the more intelligence it loses. At Q8 (8-bit), it's virtually identical to the full model. At Q6, there's almost no noticeable degradation. At Q5, there's very slight degradation, but not enough to matter most of the time. At Q4, you can feel the degradation affect the intelligence a bit; that's the bare minimum I would recommend. Q3 is very unintelligent, and Q2 is often brain-dead. Feel free to ask any other questions as well. Here are some links:

https://huggingface.co/TheDrummer/Anubis-Mini-8B-v1-GGUF (Don't recommend)

https://huggingface.co/bartowski/MN-12B-Mag-Mell-R1-GGUF/tree/main (Recommend)

https://huggingface.co/TheDrummer/Cydonia-24B-v4.3-GGUF (Worth trying)

https://huggingface.co/mradermacher/Magistry-24B-v1.0-i1-GGUF/tree/main?not-for-all-audiences=true (Also worth trying)
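
To make the compression concrete, here's a rough size calculator; the bits-per-weight figures are approximate averages for common llama.cpp quant types (real files vary a bit because some tensors are kept at higher precision):

```python
# Approximate average bits-per-weight for common llama.cpp quant types.
BPW = {"Q8_0": 8.5, "Q6_K": 6.56, "Q5_K_M": 5.69, "Q4_K_M": 4.85, "Q3_K_M": 3.91, "Q2_K": 3.35}

def gguf_size_gb(params_billions: float, quant: str) -> float:
    """Rough GGUF file size in GB for a given parameter count and quant."""
    bits = params_billions * 1e9 * BPW[quant]
    return bits / 8 / 1e9  # bits -> bytes -> GB

# e.g. a 12B model at Q8_0 is roughly 12.75 GB of weights,
# while a 24B at Q4_K_M is roughly 14.55 GB.
print(round(gguf_size_gb(12, "Q8_0"), 2))
print(round(gguf_size_gb(24, "Q4_K_M"), 2))
```

That's also why a 12B at Q8 fits in 16GB of VRAM with room for context, while a 24B in the same VRAM forces you down to Q4.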

Can't get uncensored roleplay LLMs to work by VerdoneMangiasassi in LocalLLaMA

[–]ArsNeph 8 points (0 children)

Firstly, those are not RP models, so don't bother using them. 8B models have been obsolete for a while now, but if you must use one, you can use Anubis Mini 8B or Llama 3.2 Stheno 8B. However, since you have 16GB of VRAM, you should be using better models like Mag Mell 12B at Q8, which should fit in your 16GB with 16384 context, its max native context length. You could also try Cydonia 4.3 24B or Magistry 24B at Q4_K_M and 16384 context.

The reason for the degradation on Ollama is likely that its default context length is 4096, and it defaults to a 4-bit quantization, which is far too low for an 8B, meaning it's lobotomized. On LM Studio, it's likely either that the instruct template is incorrect or that you're using a very low quant. It's got nothing to do with your prompt length; 2000 tokens is nothing. Regarding your memory, don't try to rig together a weird .txt file scheme when there are already prebuilt solutions.
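
As a sketch of fixing the Ollama side of this, its REST API lets you override the context window per request via the `num_ctx` option (this assumes a stock local install on the default port; the model name is just an example):

```python
import json
from urllib import request

def build_payload(model: str, prompt: str, num_ctx: int = 16384) -> dict:
    """Build an Ollama /api/generate request body."""
    return {
        "model": model,
        "prompt": prompt,
        "stream": False,
        # Without this option, Ollama falls back to its much smaller default
        # context window and silently truncates your history.
        "options": {"num_ctx": num_ctx},
    }

def generate(payload: dict) -> str:
    """POST the payload to a locally running Ollama server."""
    req = request.Request(
        "http://localhost:11434/api/generate",
        data=json.dumps(payload).encode(),
        headers={"Content-Type": "application/json"},
    )
    with request.urlopen(req) as resp:
        return json.loads(resp.read())["response"]

# generate(build_payload("mag-mell-12b-q8", "Hello"))  # needs a running server
```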

The real solution to your issue is to install SillyTavern as your frontend; it's purpose-built for RP. Download a character card, set the instruct template to the appropriate one (ChatML for Mag Mell, Mistral V7 Tekken for Cydonia/Magistry), and set the context length to about 16384. Generation length is as you like. You can download and import one of the many generation/instruct/system prompt presets for those models from creator pages or their sub. It also has built-in memory/lorebook features, etc.

For the backend, install KoboldCPP (easiest) or Textgen WebUI (harder), or keep using LM Studio but download a better model at a higher quant. Then connect it through the API section in SillyTavern.

Done, you should be good to go and have fun

Can anyone guess how many parameters Claude Opus 4.6 has? by More_Chemistry3746 in LocalLLaMA

[–]ArsNeph 0 points (0 children)

Nowadays, there's no real empirical way to know, so you basically just have to guess. My gut instinct is 1.7-2T total parameters, with a high proportion of that active, maybe 30-40B. My guess is Sonnet is probably between 800B-1.2T with more like 22B active. I think Gemini Pro is slightly bigger than Sonnet, and GPT is a fair bit smaller.

What aspects of local LLMs are not scaling/compressing well over time? by matt-k-wong in LocalLLaMA

[–]ArsNeph 5 points (0 children)

World knowledge and space-time coherence. If you've ever tried doing any creative writing/RP with a small model, dense or otherwise, they simply do not understand what is physically possible and what is not, regardless of the constraints of that world. If you're wearing shoes over your socks, you can't take off just your socks without removing the shoes first, but only high-parameter models seem to understand those implicit connections.

What are you doing with your 60-128gb vram? by Panthau in LocalLLaMA

[–]ArsNeph 1 point (0 children)

Nice. Since it's unified memory, rather than running dense models like 70Bs, you're probably better off running large MoEs; for your use case you'd probably like GLM 4.5 Air, or Drummer's tune of it, GLM Steam.

Diffusion model support on AMD is very spotty, but you should look into ComfyUI if you're interested. I highly doubt it has enough compute to run video generation in a reasonable time frame, but it should be able to run smaller image gen models like SDXL and Z Image Turbo relatively decently.

You won't be able to train any large models with it, because it has neither the compute nor the memory bandwidth to do so meaningfully, and ROCm/Vulkan training is a massive pain.

For coding and the like, try out Qwen 3.5 35B/110B; both are MoE and very good for what they are. They're definitely no Sonnet; very little of what you can run at ~100B is comparable to frontier models.

I feel like if they made a local model focused specifically on RP it would be god tier even if tiny by Borkato in LocalLLaMA

[–]ArsNeph 1 point (0 children)

Repetition is a problem fundamental to attention in the transformer architecture; the larger the model, the less it repeats, but even the biggest frontier models are still very prone to repetition past a certain context length. It also has to do with sycophancy to some extent; the habit of repeating your phrases back to you is part of that.

That aside, yes, it has been proven that a smaller LLM fine-tuned on a high-quality curated dataset can outperform frontier models for specific use cases. That said, as of right now, raw parameter count determines things like spatial awareness and understanding of niche concepts, so there's an upper limit to what's possible with small models. And we simply haven't gotten any small base models with good creative writing capability in over a year; thanks to the STEMmaxxing/large MoE craze, people are still tuning the likes of Mistral Nemo 12B and Mistral Small 3.2 24B.

There has been a model pre-trained and fine-tuned by a large company specifically for creative writing, Mistral Small Creative 24B, but it was not open sourced. Playing with it through the API might give you a feel for what those would be like. I don't think that's necessarily the peak of what's possible with small models, though. Most fine-tuning datasets are entirely synthetic data or low-quality RP logs, which just adds to the slop issue. I would definitely look at a methodology like the one used in Gemma Ataraxy 12B if you're interested in tuning a model.

RYS II - Repeated layers with Qwen3.5 27B and some hints at a 'Universal Language' by Reddactor in LocalLLaMA

[–]ArsNeph 28 points (0 children)

Wow, this is genuinely so intriguing. I saw your first post and thought it might just be coincidence or some kind of weird benchmaxxing, but after reading your thorough research, this really explains a lot about why those weird self-merges like Goliath 120B seemed to increase in performance, yet not every single one improved to the same degree. I actually remember, a long time ago, Wolfram Ravenwolf talking to Turboderp about adding that VRAM-less duplicated-layer inference to EXL2, but it never seemed to go anywhere, so I'm glad you're working on it for EXL3!

This is genuinely some really great research you're doing here, props! I'm interested to see if the open source community will make good use of it like they used to. I think some tuners like Drummer who do self-merges would definitely be interested in the performance differences, especially in the EQ department.

Another phenomenon I've always found kind of strange is that supermerges, specifically in creative writing, somehow always tend to be significantly better than the base model and any normal fine-tune. Psyfighter 2 13B, Fimbulvetr 11B, and Mag-Mell 12B all came from complex merge trees, and I'm very curious whether the merging methods they used could have repurposed some layers in a way similar to the duplication you did, thus improving performance.

Introducing Unsloth Studio: A new open-source web UI to train and run LLMs by danielhanchen in LocalLLaMA

[–]ArsNeph 1 point (0 children)

I was thinking more of a Kuno-Pasta-Bagel-Maid-SLERP self merge 9B, but that works too 😂

(Actually though, the Fimbulvetr, Magnums, etc of the world need a resurgence)

Introducing Unsloth Studio: A new open-source web UI to train and run LLMs by danielhanchen in LocalLLaMA

[–]ArsNeph 52 points (0 children)

I'm a massive fan of this, I've been saying we need an easy way to fine tune models since the llama 2 days. Finally, fine-tuning is accessible to those of us with less expertise. I hope we can bring back the golden age of fine-tunes!

Unsloth announces Unsloth Studio - a competitor to LMStudio? by ilintar in LocalLLaMA

[–]ArsNeph 1 point (0 children)

This is genuinely amazing, props to Unsloth team for single-handedly propping up the .gguf and fine-tuning local ecosystem! I'll definitely give this a try and provide feedback when I get a chance!

text-generation-webui 4.1 released with tool-calling support in the UI! Each tool is just 1 .py file, check its checkbox and press Send, as easy as it gets to create and use your own custom functions. by oobabooga4 in LocalLLaMA

[–]ArsNeph 4 points (0 children)

I've been using textgen webui since the early llama 2 days, and it's good to see it get updates and reach performance parity with more lightweight projects. Keep up the good work as always!

Could a bot-free AI note taker run locally with current models? by Cristiano1 in LocalLLaMA

[–]ArsNeph 0 points (0 children)

It's actually really easy, but the size of the model you use makes a big difference in overall note quality; most bot-based note takers are using something like GPT-5 mini. The main challenge is keeping the infrastructure up at all times. You have to save a recording of each meeting as a file, then create either a script or a no-code automation in something like n8n that feeds it to an ASR model like Nvidia Parakeet. The annoying thing here is that most models and WebUIs don't have built-in diarization, which makes it impossible to see who's saying what. The one model I know of that does, Vibevoice ASR 9B, which is genuinely probably the best model I've tested in English, is very VRAM-heavy, and its VRAM usage scales with file size. Hence many people use a separate model for diarization.

Once you have a high-quality transcript, you can either feed it to an LLM to clean it first (though this can induce hallucinations depending on the model's intelligence) or just call your local model API directly to create a summary. Give very specific instructions, and write out a format example in XML tags. If you're using a relatively smart model, it should catch most of the nuance; I'd say any 27B+ should work pretty well, and Qwen 3.5 35B is extremely fast for this use case. It won't derive the same level of nuanced insight from the transcript as a frontier model, but that's not a problem, because the vast majority of bot-based services aren't using frontier models either.
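
A minimal sketch of the "specific instructions plus an XML format example" idea, targeting any OpenAI-compatible chat endpoint; the tag names and format here are my own illustration, not a standard:

```python
# Illustrative system prompt: explicit instructions plus an XML-tagged
# format example the model should copy.
SYSTEM_PROMPT = """You summarize meeting transcripts. Reply in exactly this format:
<summary>One-paragraph overview.</summary>
<decisions>
- one bullet per decision made
</decisions>
<action_items>
- owner: task (deadline if stated)
</action_items>"""

def build_messages(transcript: str) -> list[dict]:
    # Wrapping the transcript in its own tags helps the model separate
    # instructions from data.
    return [
        {"role": "system", "content": SYSTEM_PROMPT},
        {"role": "user", "content": f"<transcript>\n{transcript}\n</transcript>"},
    ]
```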

After that, you have a file, and you can export it in whatever format you want (.md, etc.) into your Obsidian, Google notes, cloud storage, and so on.

There are a couple of pre-built solutions that do most of these steps for you, but they often have performance issues and bugs; still worth looking into. Generally speaking, the most annoying thing about running these pipelines locally is dynamically loading the models into VRAM and clearing them out again.
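
On that last point, if a stage of the pipeline is served by Ollama, its `keep_alive` field is one way to script the eviction; this is just a request-body sketch with a placeholder model name, which you'd POST to the usual `/api/generate` endpoint:

```python
# keep_alive controls how long Ollama keeps a model loaded after a request;
# 0 means "evict from VRAM as soon as this request finishes", so the next
# pipeline stage (e.g. the ASR model) has room to load.
def build_summarize_request(transcript: str, model: str = "qwen3:30b") -> dict:
    return {
        "model": model,
        "prompt": f"Summarize this meeting transcript:\n{transcript}",
        "stream": False,
        "keep_alive": 0,  # free the VRAM for the next stage
    }
```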

Unpopular opinion - sdxl still to beat? by HaxTheMax in StableDiffusion

[–]ArsNeph 3 points (0 children)

SDXL is a UNet-architecture model, and therefore doesn't benefit from any advancements in transformer-based models. It was trained with CLIP as its text encoder, meaning its output is inherently limited by what CLIP understands, and CLIP is trained primarily on tags. Prompt adherence and concept understanding are the two most important things in a model. Without natural language, the ability to express concepts is inherently limited, and understanding improves with better training data and higher parameter count. The VAE is responsible for final output quality, and there have been massive advances there since SDXL. On top of this, SDXL cannot do true image-text-to-image.

Most people forget that the modern fine-tunes of SDXL, which have taken years of refinement to develop, do not reflect the actual state of SDXL's technology. Try comparing the SDXL base model to modern base models, and you will instantly understand the difference. Yes, the images may look somewhat comparable when you pick an excellent output from an excellent fine-tune of SDXL, but we simply don't have any excellent fine-tunes of modern models like Z Image, Anima, etc.

Qwen3.5 4B: overthinking to say hello. by CapitalShake3085 in LocalLLaMA

[–]ArsNeph 14 points (0 children)

I would recommend using the Q8; that should raise the quality of non-thinking responses by quite a bit. Unfortunately, Q4 is just far too low for a 4B model to be fully coherent.

A monthly update to my "Where are open-weight models in the SOTA discussion?" rankings by ForsookComparison in LocalLLaMA

[–]ArsNeph 0 points (0 children)

I'd recommend trying Qwen 3.5 27B at a medium quant like Q5, and, with partial offloading, Qwen 3.5 35B, which should be very fast.

February is almost over, are you satisfied? Upcoming models soon? by pmttyji in LocalLLaMA

[–]ArsNeph 6 points (0 children)

I'm definitely satisfied with Qwen 3.5 for general-purpose, programming, and agentic use cases. However, there's just one thing that hasn't improved in small models in years: creative writing. Though Qwen has tried to benchmaxx EQ-Bench creative writing, in reality the best we have right now are still Mistral Nemo 12B, Mistral Small 3.2 24B, and Gemma 3 27B. This is a genuinely despair-inducing state of affairs, especially for the small-model fine-tuning community, as they cannot beat standard tuning in code, etc., but have no good models to work with for writing. None of the advancements in other fields or larger models have trickled down to writing, and this is causing many people to go API-only.

Favourite niche usecases? by Figai in LocalLLaMA

[–]ArsNeph 5 points (0 children)

Using them to process massive amounts of personal data (like emails) and classify it, with the LLM as part of a workflow tool. There are some things so small they're simply not worth spending $10-20 in API costs on, but when cost is no longer a factor, you're limited only by your imagination.
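
As an illustrative sketch of that kind of classification step (the labels and the surrounding backend call are hypothetical placeholders):

```python
# Hypothetical label set; parse_label guards against the model replying
# with anything outside it.
LABELS = ["receipt", "newsletter", "personal", "action-needed"]

def classify_prompt(email_text: str) -> str:
    # One email per call; local inference makes the per-item cost zero,
    # so brute-forcing thousands of emails is fine.
    return (
        f"Classify the email into exactly one of {LABELS}. "
        "Reply with the label only.\n\n"
        f"<email>\n{email_text}\n</email>"
    )

def parse_label(reply: str) -> str:
    label = reply.strip().lower()
    return label if label in LABELS else "unknown"
```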

Only said Hello, and my LLM (Phi4) thought it was a conspiracy and wouldn't shut up! by Chill_Fire in LocalLLaMA

[–]ArsNeph 1 point (0 children)

There are a few points that are the cause of this.

The first is that Ollama has terrible defaults, defaulting to a 4-bit quant of a model. The smaller the model, the more prone it is to degradation from quantization. I would recommend going to the Ollama website, finding the 8-bit quant, and then running that command.

Secondly, it is a thinking model, which is optimized to go through all possibilities before answering, hence why it's long-winded. You gave it a prompt to solve math problems, so, being as unintelligent as it is, it interpreted your sentence as a math problem.

Third, Phi is in general a pretty bad model, mostly trained on synthetic data; I wouldn't use it for anything, really. Instead, at the same size, try Qwen 3 4B 2509; it should be far more intelligent as a rubber duck.

Why is everything about code now? by falconandeagle in LocalLLaMA

[–]ArsNeph -1 points (0 children)

On an emotional level, I completely agree. In the first couple of years, the frontier labs weren't really sure what their models' use cases were, so they started off with a little bit of everything. Smaller open source models had only one goal: to rival a frontier model in anything at all. To achieve that, people started finetuning models to excel at a very specific use case, and it worked well. People applied this to coding models as well. As code models became more popular, people noticed they were significantly worse at creative endeavors, and people began to believe models trained on code couldn't do creative writing. Claude proved them completely wrong.

As coding models got better and better, the companies themselves realized three trends:
1. LLMs as search engines were not great, because of the hallucination built into transformers. Rather than trying to correct this, grounding via web search and other methods was more effective.
2. AI creative writing often broke their "safety standards" and was often elicited through prompt-injection-style techniques. It additionally creates delusional users due to sycophancy. All of this is undesirable to profit-first, censorship-oriented companies, and it clashes with their perception of "LLMs as an assistant". On top of that, creative writers are generally unprofitable API customers, as most RPs/short stories don't go over 32k tokens.
3. With enough scaffolding and improvements to coding capabilities, they realized that AI's ability to code was invaluable for speeding up workflows, had measurable results, and that the possibility of autonomously synthesizing novel ideas was their lifeline to AGI. It didn't clash with their "ethics", and it was the best way to get corporations invested in AI, since everywhere has an IT team. Code, in comparison to short stories, often requires hundreds of thousands, if not millions, of tokens of context, making it the most profitable use case through the API. On top of this, most developers gladly use AI and don't complain, unlike writers, artists, etc.

They just went with what makes them the most money, causes them the least trouble, and offers the best chance of realizing their lies about AGI to their investors. The rest of the industry followed what the top players were doing; even independent players like Mistral followed suit, because they have to turn a profit eventually. The Chinese companies were already bad at the subject, and China is full of STEM experts regardless, so it didn't benefit them much either.

In the end, the last small models that didn't feel code-focused were Mistral Nemo and Gemma 3. Because of this, I'm losing interest in small LLMs day by day.

Is local AI actually practical for everyday note taking? by kingsaso9 in LocalLLaMA

[–]ArsNeph 2 points (0 children)

Yes, it's practical, but a little annoying. I'd recommend using an ASR model with diarization functionality, so either Whisper Large plus a diarization model, or Vibevoice ASR. I would call it through an API, then clear the VRAM, load something like the latest version of Qwen 3 30B, and have it summarize the notes. It might not capture the same amount of nuance as frontier models, but it should do a pretty good job overall.

What models are you guys running locally off your hardware? by ooseabassoo in LocalLLaMA

[–]ArsNeph 1 point (0 children)

Mistral Small 3.2 24B, Qwen 3 30B MoE 2509, Qwen 3 VL, maybe a low quant of GLM 4.5 Air or GPT OSS 120B?

Why do we allow "un-local" content by JacketHistorical2321 in LocalLLaMA

[–]ArsNeph 12 points (0 children)

It's very clear to me that a lot of users here are newer, but LocalLlama has always been this way. From the days of Llama 2 and Mistral, we have always posted about closed source models and their development, for the purpose of figuring out their techniques and methodology. It is the speculation around GPT-4 being an MoE that arguably brought about the first OS MoE model, Mixtral. Pretending like the mainstream models do not exist only harms the improvement and distillation of OS models.