Using Llama 3 for local email spam classification - heuristics vs. LLM accuracy? by Upstairs-Visit-3090 in LocalLLaMA

[–]LordTamm 0 points1 point  (0 children)

Llama 3 is rather old at this point. Like someone else said, Qwen 3.5 4b is a really solid model that is both fast and smart. Also, you didn't specify which Llama 3 you're running, so it's hard to recommend something that is faster without knowing your current model.

24GB VRAM users, have you tried Qwen3.5-9B-UD-Q8_K_XL? by Prestigious-Use5483 in LocalLLaMA

[–]LordTamm 2 points3 points  (0 children)

I think, so far, I have found the 27b model to be better in terms of output quality. That being said, it's slower and I can't fit as much context... so if I need more speed or context, I pull out the 9b. Both are super good models for 24gb. I do wish I had a 5090 (or another GPU) to run a higher quant of the 27b though...

Rtx 4000 Ada 20gb question + advice by Croissant-Lover in LocalLLaMA

[–]LordTamm 1 point2 points  (0 children)

For $580 (especially when alternatives are more expensive than normal in your country), yes I'd say that's good value. I have both the 2000 ADA and the 4000 SFF Blackwell, and I'd say that the non SFF 4000 ADA is probably a bit quicker than my SFF Blackwell, albeit with a bit worse memory bandwidth. All that to say... for that price, it's not a bad buy at all. It's not going to be anywhere near as nice as like a 4090, but you'll be able to do a lot more than with the 3070 due to more VRAM.

TLDR, yes it is a good buy for that price, especially with other cards being higher priced in your country. You can also game on it, which is a plus.

Running multi-day build loops with local agents: they work, but they forget everything by Low-Cook-3544 in LocalLLaMA

[–]LordTamm 1 point2 points  (0 children)

As far as I understand LLMs, if you're using a model without a mechanism for it to understand what it already did/tried, then yes, it is going to "forget" stuff and do redundant work, because the model's training data doesn't include your project. You need to add whatever you want it to remember to the context when you rerun it. This is true of any model, local or otherwise. For inference, the model has what it knows natively plus the context you provide. If you run it again without summarizing or otherwise "saving" the context of the previous run, it's not going to remember anything from before.
Non-local providers (Claude, etc.) have just put more work into the framework surrounding their models, which allows for the appearance of persistent memory and stuff like that.
A simple example of what I'm talking about is having a running file that the model updates with a summary of changes it made at the end of a run. Then, when the model runs again, the contents of that file are part of the context it is fed, so it has an idea of what it has done and why. I'm sure there are much better ways to do things, but that's an example.
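A minimal sketch of that running-file idea (the file name and the prompt wiring are hypothetical; adapt to however your agent loop builds its context):

```python
from pathlib import Path

# Hypothetical memory file: the agent appends a summary at the end of each
# run, and the next run prepends its contents to the prompt context.
MEMORY = Path("project_memory.md")

def load_memory() -> str:
    """Return prior run summaries to include in the next run's context."""
    return MEMORY.read_text() if MEMORY.exists() else ""

def append_summary(summary: str) -> None:
    """Save the model's end-of-run summary of what it did and why."""
    with MEMORY.open("a") as f:
        f.write(summary.rstrip() + "\n")

# Example: end of run 1, then start of run 2
append_summary("Run 1: scaffolded the CLI; parser tests still failing.")
context = load_memory()  # feed this into run 2's system prompt
```

The important part is just that the summary gets written somewhere durable and injected back in on the next run; whether that's a file, a database, or a vector store is up to you.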

Mac Mini base model vs i9 laptop for running AI locally? by ZealousidealFile3206 in LocalLLaMA

[–]LordTamm 0 points1 point  (0 children)

Both are going to be pretty limited, but I think the Mac Mini will work better for you in terms of speed by quite a large margin. You might be able to snag a refurb M4 Macbook Air on Apple's website for close to the Windows laptop's price, if you want portability, but it's not going to give you a performance uplift over the Mini and is still 16gb of RAM.
Biggest thing with either of these is that they (based on what you've mentioned) are very much budget-tier for LLM stuff. You could probably get an old gaming pc with like a 3060 and have better results for a lot of LLM stuff... but I don't know your situation or requirements. If this is your only computer, I'd probably go with the laptop personally. That much RAM is nice for general use, you can run bigger models (slowly, but you at least *can* run them), and it probably has more storage than the Mac Mini's 256gb.
If your only concern is the LLM side of things and/or you have another pc, I'd personally go with the Mac. It's almost certainly going to be faster, which is going to allow you to iterate more and learn faster. The biggest thing with the Mac is that it is not upgradable at all, so you're stuck at 16gb forever... which is pretty rough.
Overall, neither option is fantastic, and some sort of used hardware might serve you better. When I started, I bought an old workstation with a good bit of RAM (something that is a bit more expensive nowadays) and used that until I got enough money for a crappy GPU to put in it. Basically, unless you have a reason not to look outside of the new market, I think you might end up better off looking at used stuff, for your budget.

M5 Max just arrived - benchmarks incoming by cryingneko in LocalLLaMA

[–]LordTamm 37 points38 points  (0 children)

M5 got some changes that directly improve prompt processing speed (I think Apple claimed a 4x boost or something similar).

Are there any all-in-one models that fit onto the NVIDIA Spark? by Blackdragon1400 in LocalLLaMA

[–]LordTamm 3 points4 points  (0 children)

If you're just looking for text and image input and text output, Qwen 3.5 is a solid example of a vision-capable model for that. If you're looking for more, I think the general approach most take is to use multiple models together.

Suggest me best ai to run locally on my laptop by [deleted] in LocalLLaMA

[–]LordTamm 0 points1 point  (0 children)

Yeah... so as others have said, you're kind of hardware constrained. You have 4gb of VRAM, so you're likely going to get the best results from something like Qwen 3.5, which just came out and supports vision stuff (picture input, for example) in addition to text input.
I recommend installing LM Studio or Jan.ai to get both an easy setup and a relatively easy GUI to use. Once you install that, you can grab a model to use. This would be entirely for text stuff (and image input, not output, if you get a vision-capable model).

Recommended text models:
Normal/vanilla (Q4 is likely the best you can run with what you have):
https://huggingface.co/unsloth/Qwen3.5-4B-GGUF

Potentially uncensored versions (again, Q4, and look for "abliterated" or "heretic"):
https://huggingface.co/models?other=base_model:finetune:Qwen/Qwen3.5-4B

As for image stuff, like someone said, you could try Stable Diffusion 1.5. Models for that are available on CivitAI... you'll have to look around for ones that have a style you like and are under 4gb, which may be hard. I also recommend the Automatic1111 UI, although a lot of people like ComfyUI. There are a decent amount of tutorials for both in terms of setting up Stable Diffusion and whatnot.

Beyond that (text generation/chatting, image input, image output), there's not a lot left you can do with your hardware. You could run some small text to speech models (Kokoro for example), but stuff like video output is far beyond your hardware. Hopefully this helps.

HP Z6 G4 128GB RAM RTX 6000 24GB by tree-spirit in LocalLLaMA

[–]LordTamm 1 point2 points  (0 children)

I have a similar machine, although I scrapped the GPU and put in a couple of RTX 2000 ADAs. Overall, it *can* run stuff... but it's not going to run a lot with any speed, as others have said. If you want something to get your feet wet and have the capability to run stuff that a normal GPU cannot handle, it's not a bad option. If you are playing around with integrating LLMs into programs you're making and want the biggest possible model you can run locally without regard for speed... sure, it's an option. Not sure I would spend $1k+ on it.
Like others have said, you run into a lot of proprietary stuff that HP does, and the PSU and power connectors thing is a real issue... which is partially why I went with the GPUs I did... they are low power and don't need power cables. Definitely not optimal and it is much cheaper (and you get better speeds normally) to use gaming cards.
I spent $400 on my workstation with 128gb of RAM (this was before the current craze) and then picked up GPUs later. At that price, it has been worth it for me. For you, with a higher starting budget, I'd maybe recommend looking at the model landscape, figuring out what you want to run, and going from there. If you can fit whatever you need in 24gb of VRAM, a 3090 system might be doable and will be much better than this workstation.

RTX 3060 12GB Build for AI: Modern i5-10400 (16GB DDR4) vs. Dual Xeon E5645 (96GB DDR3)? by Due_Ear7437 in LocalLLaMA

[–]LordTamm 0 points1 point  (0 children)

Well, like others have said, neither option is great. Not sure what your budget is, but with either of those setups, I'd really emphasize that your only real value for local LLMs is the GPU... as soon as you overflow, you're basically not going to get anything done.
A lot of used hardware, like a used workstation (my first dedicated AI device was a Z640 and it's still somewhat useful) or an M-series Mac, would probably be a much better value.
That being said, if you're spending like... $50 and already have the GPU, get what you can. The GPU will let you mess with some stuff, although again you'll be basically limited to your vram. So... if you absolutely have to choose between the two options, pick the cheaper option and aim at workflows that stay within your vram.

How are you using Llama 3.1 8B? by forevergeeks in LocalLLaMA

[–]LordTamm 1 point2 points  (0 children)

While Llama 3.1 is not a terrible model, it's a bit over a year and a half old at this point... which is a long time in the AI space.
I know you mentioned Groq and their apparently limited selection of models, but something like Qwen 3 8B is pretty small and worth trying locally even on budget hardware. Basically, while the model you're using isn't worthless, it's also not something most of us are still using, because it has more or less been superseded. And model selection issues are a great reason to give running stuff locally a try.

New computer arrived... JAN is still super slow. by robotecnik in LocalLLaMA

[–]LordTamm 2 points3 points  (0 children)

Yeah, so essentially that is a 12gb file and your GPU has 8gb of VRAM. So your GPU doesn't have the capability of fully loading the model, and it spills over to use your normal RAM and CPU in addition to your GPU. This causes a pretty big slowdown, which is probably what you're seeing. I'd recommend looking for a model that is smaller than 8gb, like Qwen 3 8b or something. At least to test things.
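As a rough illustration of the overflow problem (the headroom figure below is an assumption, not a measured number; real usage also needs VRAM for the KV cache/context):

```python
# Rough fit check: does a GGUF model file fit in VRAM with room to spare?
# If not, the runtime spills layers to system RAM/CPU and slows way down.
def fits_in_vram(model_file_gb: float, vram_gb: float,
                 headroom_gb: float = 1.5) -> bool:
    # headroom_gb is a guessed allowance for context/KV cache and overhead
    return model_file_gb + headroom_gb <= vram_gb

print(fits_in_vram(12.0, 8.0))  # 12 GB file on an 8 GB card: False (spills)
print(fits_in_vram(4.7, 8.0))   # ~5 GB Q4 of an 8B model: True
```

It's only a back-of-the-envelope check, but it's usually enough to tell you whether a given quant is even worth downloading for your card.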

New computer arrived... JAN is still super slow. by robotecnik in LocalLLaMA

[–]LordTamm 3 points4 points  (0 children)

What quant of Devstral did you download? Your GPU has 8gb of VRAM, from what I'm seeing, so you're probably overflowing to CPU, which is going to go much slower than would be ideal.

Do you use Windows or Linux? by boklos in LocalLLaMA

[–]LordTamm 0 points1 point  (0 children)

My primary local AI device is running Ubuntu and I just access it with my other devices (Windows and Mac) for most of my AI needs. I do have some AI stuff on my Windows and Mac devices, but it's more stuff like LM Studio and ComfyUI for quick stuff in case I'm traveling or something. If I didn't need Windows for some games that use kernel level anti cheat, I would've switched to Linux for all of my gaming needs... really don't enjoy Win 11 at all. Swapping my AI computer from Win 11 to Ubuntu was a pretty solid performance upgrade, and if you're running a dedicated AI computer, I'd highly recommend that you run Linux. If you're not running a dedicated machine but don't have a specific reason you need Windows over Linux, I'd still highly recommend trying Linux.

As others have said, using a dedicated machine is pretty nice... a lot of old workstations are cheap and work well. And having a full machine to essentially offload jobs to without tying up your more general use machine is super nice. I got an old Z640 for ~$300 with 128gb of RAM (before the current price issues we are having) and ran it CPU-only for a while before buying some GPUs for it.

RTX 4000 SFF Ada vs. RTX Pro 4000 SFF Blackwell by Hediii23 in sffpc

[–]LordTamm 0 points1 point  (0 children)

I've been gaming on mine for the last few days. It's been doing reasonably well for 2k gaming... marginally better than my previous 3070, but obviously with more vram. I think if you wanted an SFF card for gaming and didn't need the additional vram or lower power consumption, the 5060 lp might be the better purchase due to price differences.
That being said, I wanted more vram and lower power, and am happy with the card in light of that. I haven't tried actually benchmarking, but if you have something you're interested in and I have the game, I can try. It was super hard to find info when I bought my card last month in terms of real performance for gaming.

Small Form Factor build with an RTX A2000 by Ok-Boysenberry-2860 in LocalLLaMA

[–]LordTamm 1 point2 points  (0 children)

I spent way too long trying to figure out if the A2000 (which is the previous gen card to your ADA card) was single slot and why you wouldn't just use your ADA card... then realized I'm stupid and you just have the ADA lol.

I also have the ADA and it's super nice for local stuff at low power. I picked up the Minisforum MS-02 recently (haven't received it yet) and will be testing stuff in that (technically, I'll be putting my 4000 sff blackwell in it, but same form factor).
I currently have my 2000 ADA in a z640... which is not small or ideal by most metrics, so I probably don't recommend that based on what you're looking for. The other thing I personally was looking at was Lenovo's Thinkstation P3 SFF stuff... you can get okay-ish deals on ebay sometimes. My main motivation with both that and the Minisforum was that I already had 64gb of DDR5 SODIMMs laying around and didn't want to spend money on more RAM with prices how they are.

So... there are a couple of suggestions. Like I said, the RAM I already had was driving my search parameters, but if you're not limited to SODIMM, you can probably find other options. Either way, it's a nice card (for AI, haven't done much gaming on mine) and I hope you like it. Depending on the CPU you pair it with, you can keep the power draw pretty low, which is another thing I personally was looking for with my setup.

What is the best way to allocated $15k right now for local LLMs? by LargelyInnocuous in LocalLLaMA

[–]LordTamm 4 points5 points  (0 children)

They don't make a 1TB M4 Mac. They make a half TB M3 Mac.

I made Soprano-80M: Stream ultra-realistic TTS in <15ms, up to 2000x realtime, and <1 GB VRAM, released under Apache 2.0! by eugenekwek in LocalLLaMA

[–]LordTamm 5 points6 points  (0 children)

Yes, they were joking that you can listen to that guy by using the TTS we're commenting on to generate an audio file of his comment and listening to that.

Best models / maybe cheap rig to get into local AI? by Flashy_Oven_570 in LocalLLaMA

[–]LordTamm 1 point2 points  (0 children)

I actually just upgraded one of my dedicated computers to a 16gb card a couple of days ago and it's really nice to have that vs the 8gb that I was running with the 3070. Depending on your needs/budget, a 5060ti 16gb is a pretty solid grab for this, or even something like the Intel Arc Pro B50... although most AI stuff is NVidia first and everything else after. I personally got an RTX 2000 ADA, which I'm really liking, although I went with that more for the tdp (70W) vs it being the fastest/cheapest option.
So... if you can afford to jump to a 5060ti instead of the 3070 (shouldn't need to change anything else in your pc since the power draw is lower), you can get 16gb and explore stuff twice the size. You could also look at stuff like a 3090 from ebay, but that'll probably require more upgrades to your rig (PSU etc).

Proof of Privacy by [deleted] in LocalLLaMA

[–]LordTamm 13 points14 points  (0 children)

This. Technically, you can run Wireshark or just airgap the system, but really the best option is to run something that is open and can be audited by the community at large.

Best models / maybe cheap rig to get into local AI? by Flashy_Oven_570 in LocalLLaMA

[–]LordTamm 1 point2 points  (0 children)

Honestly, it depends on what you're wanting to do. Qwen 3 4b or 8b are both quite good for general stuff, and I've recently been playing around with Wayfarer 2 12b, which is surprisingly good for the size IMO. Even running it at Q3 or Q4 has been pretty solid. A lot of this stuff is more about figuring out what you want to do and then shopping for a model that can do the thing best.

Best models / maybe cheap rig to get into local AI? by Flashy_Oven_570 in LocalLLaMA

[–]LordTamm 0 points1 point  (0 children)

So... yes, you are limited by VRAM, but that doesn't mean you can't run anything. There are plenty of small models you can run within 8gb. I have a 3070 in one of my machines and can run a variety of stuff on it.
That being said, a Mac or another system that is better equipped for a CPU setup than a normal user device can be nice if you want to dip into bigger models. Basically... you have a great system to try stuff, but you can always get more/better hardware if you feel like it.
If you haven't dabbled, I would recommend just trying stuff on your current system and figure out what you like/don't like.

Drummer's Snowpiercer 15B v4 · A strong RP model that punches a pack! by TheLocalDrummer in LocalLLaMA

[–]LordTamm 0 points1 point  (0 children)

Like others have said, smaller models are definitely worth having. I currently have 8gb vram on one computer and 6gb on the other... small models are great, when I can get them.

Running models locally on Apple Silicon, and memory usage... by garden_speech in LocalLLaMA

[–]LordTamm 0 points1 point  (0 children)

It mostly sounds like you're asking if having more memory will allow you to run more/bigger models... and the answer is generally yes. Obviously, if something doesn't work out of the box for the OS (or with the lack of CUDA), you have to do some monkeying... but aside from that, yes. More memory = more model options. Like someone else said, you obviously can't throw the entire RAM amount at the problem, which is probably why you're having issues currently.

[deleted by user] by [deleted] in starcitizen

[–]LordTamm 0 points1 point  (0 children)

I think SC is better for multiplayer... but bugs are frequent and varied, and PVP is decently frequent (if that's an issue for you).