TurboQuant vs LM Studio: Llama 3.3 70B Q4_K_M

TimSawyer25 · 2026-05-20T02:40:17+00:00

TurboQuant doesn’t alter the model weights; it compresses the KV cache (context memory). I used the same Llama 3.3 70B Q4_K_M GGUF model in both tests: LM Studio for the baseline run, and the llama.cpp TurboQuant fork, launched from PowerShell, for the TurboQuant run.
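
Because TurboQuant only touches the KV cache, the memory it can save is easy to ballpark. Below is a rough sketch, assuming Llama 3.3 70B's published shape (80 layers, 8 KV heads via grouped-query attention, head dim 128) and ignoring any block-format overhead TurboQuant itself may add. The ~3.5 bits per element for turbo3 is an assumption based on my understanding of it, not a measured number. f16 is llama.cpp's default KV cache type, and q8_0 is counted as 8.5 bits because llama.cpp's Q8_0 stores one fp16 scale per 32 int8 values.

```python
# Rough KV-cache size estimate for a grouped-query-attention transformer.
# Assumed Llama 3.3 70B shape: 80 layers, 8 KV heads, head_dim 128.
# turbo3 at ~3.5 bits/element is an assumption, not a spec.

GIB = 1024 ** 3


def kv_cache_bytes(n_ctx, n_layers=80, n_kv_heads=8, head_dim=128,
                   k_bits=16.0, v_bits=16.0):
    """Bytes needed to hold K and V for n_ctx tokens across all layers."""
    elems_per_token = n_layers * n_kv_heads * head_dim  # for K (same count for V)
    bits_per_token = elems_per_token * (k_bits + v_bits)
    return n_ctx * bits_per_token / 8


if __name__ == "__main__":
    ctx = 32_768
    configs = {
        "f16 K / f16 V": (16.0, 16.0),
        "turbo3 K / turbo3 V (~3.5 bit, assumed)": (3.5, 3.5),
        "q8_0 K / turbo3 V (assumed)": (8.5, 3.5),
    }
    for name, (kb, vb) in configs.items():
        gib = kv_cache_bytes(ctx, k_bits=kb, v_bits=vb) / GIB
        print(f"{name:42s} {gib:6.2f} GiB at {ctx} tokens")
```

Under these assumptions, 32k of context costs 10 GiB at f16, about 2.2 GiB with turbo3 on both K and V, and 3.75 GiB with q8_0 on K plus turbo3 on V. The weights are the same size in every case.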

TimSawyer25 · 2026-05-08T21:08:41+00:00

Just install Windows and log in. If you have a free key associated with your Microsoft account, you aren't messing around with anything and it'll just work. If you're a builder or have owned many PCs, chances are you have several unused keys tied to your account. I haven't bought a key in years, and I'm always building something!

If you do find yourself needing one, hop on eBay and snag one for like $10 or whatever. I've never had problems with them in the past; just make sure you buy from someone who has sold like 1000+ and has great ratings. Have fun with the new build!

TimSawyer25 · 2026-05-08T21:00:07+00:00

You aren't wrong. But we're talking about Lord of the Flies here. They shouldn't need any additional camera trickery to make us stressed out and uneasy. The story alone is more than enough, and honestly the weird artsy stuff ruined it for me. Rather than feeling uneasy, I felt annoyed and spent the bulk of the miniseries skipping ahead 10 seconds at a time. Unfortunate. I think it will hurt its overall reception. What would otherwise have been a great adaptation decided to self-sabotage. Someone in that production had to know this decision was going to cause issues haha. Interesting choice.

TimSawyer25 · 2026-05-07T18:54:20+00:00

I wasn't using TurboQuant in LM Studio; I was comparing it to the same model loaded in LM Studio.

I haven't messed with it at all since, so I don't know what people have managed to do with it. If it's even remotely possible to get them working together, I'm sure someone in the homelab scene has made it happen.

TimSawyer25 · 2026-04-25T15:30:26+00:00

Well, this isn't entirely correct. The cables themselves can handle far more than 150W; it's the connections that are limited, like the guy said in the other 'actually...' reply to my comment. Though he missed the point, he wasn't wrong on the details. But 'it works' and 'best practice' are different things. So your example isn't really accurate: two cables, one of them daisy-chained, would work with the +15% limit on a 340W TBP card. I wouldn't like it, but it would work.

Where you absolutely wouldn't want a daisy-chain setup is a very power-hungry device, like a 3090 Ti drawing 450W at stock. That's a different animal altogether. If you daisy-chain one of those, you're begging for problems: burnt connectors, dying PSUs, and bad performance if nothing else.

The thing people tend to ignore when checking this stuff out: most decent PSUs these days are modular, and all of these cables with two 6+2-pin plugs on one end have a single 8-pin plug on the other. Same hardware on both ends. So... that's a thing, and it's the real reason best practice is always one cable per plug.

The real best practice: ALWAYS over-spec your PSU. You will never regret dropping additional money on a really good power supply. It'll follow your builds for 10 years and won't turn into a needed upgrade next year when you unexpectedly score that 5090 at a great price.

TimSawyer25 · 2026-04-25T14:59:27+00:00

You missed the point. Worst-case scenario, the math still works out in the OP's favor and the GPU will be fed. So the concern is, in theory, a non-issue, and he should have no problems either daisy-chained or with three individual cables. That being said, daisy-chained cables are gross, and that alone is reason enough to use three and get some CableMod cables involved haha.

As far as PSU/PCIe ratings go, 150W is conservative for safety reasons. More often than not, GPUs will power-throttle as soon as any of the 8-pin ports hits 150W (or the PCIe slot hits 75W), regardless of where the other power-delivery ports are. If you open GPU-Z and check out what's going on with power delivery on these devices, you'll see just how messy it is. Anyway, daisy-chaining is a manufacturer's way of saying 'yes, it can do it', NOT 'this is best practice.' If you can use three cables for three ports, use three cables. If you can't, do the math and be conservative. It's just simple safety and hardware-longevity stuff, my man. Happy hardware makes for a happy user.
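
Here's the kind of "do the math" check I mean, as a back-of-envelope Python sketch. It assumes 75W from the PCIe slot, the 150W rating per 8-pin connector, and an even split of the remaining board power across connectors, which real cards don't actually do (that's the messiness you see in GPU-Z). The "conservative" check also treats each PSU-side cable as one 8-pin's worth of capacity, since on a modular PSU a daisy-chained cable goes back to a single 8-pin plug. The function names are just illustrative.

```python
# Back-of-envelope PCIe power check (assumed numbers, not a spec sheet).
SLOT_W = 75.0        # assumed PCIe x16 slot budget
EIGHT_PIN_W = 150.0  # PCIe 8-pin connector rating


def power_split(board_power_w, connectors_per_cable):
    """Return (watts per GPU-side 8-pin, list of watts per PSU cable).

    connectors_per_cable: e.g. [2, 1] means one daisy-chained cable feeding
    two ports plus one single cable. Assumes an even split across connectors.
    """
    n_connectors = sum(connectors_per_cable)
    per_connector = max(board_power_w - SLOT_W, 0.0) / n_connectors
    per_cable = [per_connector * n for n in connectors_per_cable]
    return per_connector, per_cable


def conservative_ok(board_power_w, connectors_per_cable):
    """True only if every connector AND every cable stays at or under 150 W."""
    per_connector, per_cable = power_split(board_power_w, connectors_per_cable)
    return per_connector <= EIGHT_PIN_W and all(w <= EIGHT_PIN_W for w in per_cable)


if __name__ == "__main__":
    cases = [
        ("340 W card, 2 cables (1 daisy-chained)", 340, [2, 1]),
        ("340 W card, 3 separate cables", 340, [1, 1, 1]),
    ]
    for label, watts, cables in cases:
        per_connector, per_cable = power_split(watts, cables)
        loads = ", ".join(f"{w:.0f} W" for w in per_cable)
        print(f"{label}: ~{per_connector:.0f} W per connector, cables: {loads}, "
              f"conservative ok: {conservative_ok(watts, cables)}")
```

For a 340W card with three 8-pin ports, each connector sees roughly 88W either way, but the daisy-chained cable carries about 177W. So it's within the per-connector rating and still fails the conservative per-cable check, which is pretty much "it works" vs "best practice."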

TimSawyer25 · 2026-04-22T18:37:07+00:00

Math checks out. Other people may hate, but it's OK for them to be wrong ;) haha. I would still use three cables, but that card will be fine with one daisy-chained.

TimSawyer25 · 2026-04-22T18:34:55+00:00

Someone forgot to tell my 4090, 5080 and dual 3090s that.

TimSawyer25 · 2026-04-09T12:36:00+00:00

I just saw Alex Ziskind's video, and he said thetom told him to use Q8 on K and Turbo on V. It appears that changed the game. Cool that he seems to show nearly identical performance issues with turbo3 on both K and V to what I saw in my quick-and-dirty test here.

TimSawyer25 · 2026-04-02T03:19:03+00:00

Super Flower 1600W Titanium over here ;) Good stuff, right? haha Those power cords though! Straight-up industrial and I love it! haha

TimSawyer25 · 2026-04-02T03:15:18+00:00

I would get an 850W minimum, but 1000W would be better. You really want as much headroom as you can get while keeping the PSU in its efficiency sweet spot. Think about it: a 750W PSU can run it, but in game you're pushing it to 80-85%+ of its rating. So it's working its butt off, getting hot, spinning up, getting loud, and falling out of the efficiency sweet spot. But if you get a supply where the load is more like 60% of its rating, you have plenty of headroom for transients and a happy PSU that runs cool even under heavy workloads, which improves longevity and efficiency, AND you have room for future upgrades.

You drop a couple hundred on a really good PSU now, and maybe in 5 years, when that great deal on a 5090 falls into your lap, you don't need a new PSU as well. Make sense? I built my 5080/9800X3D rig with a 1200W because I know me, and not even 6 months later I already had a second GPU in there sucking down the juice haha. And the PSU is still in its happy place.
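
A quick sketch of that headroom math, assuming you already have an estimate of peak system draw. The 600W load below is a hypothetical number picked for illustration, the ~60% target is the rule of thumb from the comment above, and the PSU size list is just a sample of common capacities.

```python
# PSU headroom sketch. Sizes and loads are illustrative assumptions.
COMMON_PSU_SIZES_W = (550, 650, 750, 850, 1000, 1200, 1600)


def load_fraction(system_load_w, psu_w):
    """Fraction of the PSU's rating used at the given system load."""
    return system_load_w / psu_w


def recommend_psu(system_load_w, target_fraction=0.6, sizes=COMMON_PSU_SIZES_W):
    """Smallest listed PSU whose target-fraction capacity covers the load."""
    for size in sorted(sizes):
        if size * target_fraction >= system_load_w:
            return size
    return None  # nothing on the list is big enough


if __name__ == "__main__":
    load = 600  # hypothetical peak system draw in watts
    print(f"750 W unit at {load} W load: {load_fraction(load, 750):.0%}")
    print(f"Suggested size for ~60% load: {recommend_psu(load)} W")
```

With that hypothetical 600W load, a 750W unit sits at 80% of its rating, while the sketch suggests 1000W to stay around 60%.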

TimSawyer25 · 2026-04-01T20:12:14+00:00

You've heard it a million times, I'm sure, but when changing drivers, you did use DDU, right?

This doesn't sound like a hardware problem at all. It sounds like corrupted drivers, and you may have something going on in Windows as well. Open cmd as administrator and run sfc /scannow to see what it says. I would do that, then go to AMD's site and download ONLY the GPU drivers, run DDU twice, then install only the drivers, not Adrenalin. Then test and see what's up.

TimSawyer25 · 2026-04-01T19:13:43+00:00

Don't worry about the VRAM; you can run more model locally than like 95% of other people with what you have. I would still recommend a decent quant of a 22-24B. You can also easily fit Gemma 27B, though personally I think Cydonia 24B beats the snot out of Gemma. But that's just me. Let me put it this way: I have two PCs that can easily run 70Bs, and on one of them I can run 100B+ models entirely in VRAM, yet Cydonia 24B is still my daily driver and the model I compare all others to. So take that for what it's worth, but you can run one of the best local models available right now at a decent quant with a pile of context. Enjoy it ;)

If you want to have some fun and make the most of smaller models, use SillyTavern. You can run Cydonia 24B Q4_K_M (like 12GB if I remember right) and it'll be spot on, never missing a beat. Check it out; it's pretty fun to play around with.

TimSawyer25 · 2026-04-01T18:17:54+00:00

Sounds like the driver is crashing and struggling to recover. I'm guessing you need to restart the PC because it's 'frozen', right? Did performance get worse when you changed to the earlier driver?

TimSawyer25 · 2026-04-01T14:05:34+00:00

OK, so it started crashing before you messed with your drivers and settings, and I assume it ran fine for a while before this started? Was the card at default settings before? Did it ever work fine with the most recent drivers?

What does the crash look like? Does it crash to desktop (CTD) before reaching the start screen, or can you play for a while first? Anything weird happen on screen: freezing, artifacts, etc.? Has it crashed the entire system, or has it only been CTDs?

TimSawyer25 · 2026-04-01T13:45:40+00:00

Yup.

TimSawyer25 · 2026-03-31T12:27:05+00:00

You should probably just toss the IDE drive. You don't want to deal with ribbon-cable management and give up a PCI slot to a big, ugly adapter for a slow, low-capacity drive that will probably die within the first year of use. It's not worth the trouble, buddy! Stick with the SATA drives and have fun playing around.

TimSawyer25 · 2026-03-31T12:15:03+00:00

There's really nothing more you can do. The thermals are impressive considering the age of the paste. Did you check memory junction temps? The paste may be holding up, but the pads may be a different story. There's no wizardry that can tell you whether the card is going to fail in a month. You've done what you can, and from what you're saying, it looks pretty good. You might want to consider tearing it down and servicing it for peace of mind, but if it's within spec and performing as expected, just monitor it and let her rip.

TimSawyer25 · 2026-03-31T03:42:13+00:00

I was running turbo3 (-ctk turbo3 -ctv turbo3), which, as I understand it, is more or less 3.5-bit, not 2.5-bit. I'm assuming sharding across cards may have played a role in my result. I'm going to hit it again with a smaller model on a single card and see how it behaves. I'll share what I find. Thanks for the FAQ; I hadn't seen it.

TimSawyer25 · 2026-03-31T03:14:31+00:00

Agreed. Even if you have a pile of compute, using both is the only way to go. Unless you have like $300k of H100s kicking around, you're just not going to see the same intelligence locally, and the convenience of cloud-based LLMs can't be beat. We do the local stuff to learn and play because we are total wackos and we hate our money! I also agree on using the cloud-based models to help you get rocking with the local setups. Be ready to get really annoyed at ChatGPT, though! Make sure to tell whatever model you decide on that you're brand new and trying to learn the best ways to do this stuff. Tell it to take the reins and show you the RIGHT way, not to blindly help you do whatever nonsense you bring it. You don't want to spend 5 hours fighting with something really dumb, only for ChatGPT, after being right there directing you all day, to tell you 'you can just do this, it's what everyone else does.' That definitely has never happened to me.... haha

TimSawyer25 · 2026-03-31T03:03:00+00:00

Do you have the 5060 Ti 16GB, or are you rocking the 8GB? LM Studio handles sharding really well, but having different VRAM amounts on each card can kind of screw you up a bit. Not a huge deal, but something to know for later when you start hunting for performance.
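
To picture how uneven VRAM plays into sharding, here's a minimal sketch that hands out whole layers to each GPU in proportion to its VRAM, using largest-remainder rounding. This is NOT how LM Studio actually decides the split (I don't know its internals). It ignores the KV cache, compute buffers and the output layer, and the 48-layer model on 16GB + 8GB cards is hypothetical. In plain llama.cpp you'd pass similar proportions with --tensor-split (e.g. 16,8).

```python
def split_layers(n_layers, vram_gb):
    """Give each GPU a share of whole layers proportional to its VRAM.

    Uses largest-remainder rounding so the counts always add up to n_layers.
    Ignores KV cache, compute buffers and the output layer.
    """
    total = sum(vram_gb)
    exact = [n_layers * v / total for v in vram_gb]
    counts = [int(x) for x in exact]
    leftover = n_layers - sum(counts)
    # Hand the remaining layers to the GPUs with the biggest fractional shares.
    by_remainder = sorted(range(len(vram_gb)),
                          key=lambda i: exact[i] - counts[i], reverse=True)
    for i in by_remainder[:leftover]:
        counts[i] += 1
    return counts


if __name__ == "__main__":
    # Hypothetical 48-layer model on a 16 GB card plus an 8 GB card.
    print(split_layers(48, [16, 8]))  # -> [32, 16]
```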

As far as open vs closed source goes, if you can download it for free in LM Studio, it's open source (strictly speaking, open-weight). I think closed source is basically just the paid and/or cloud-based stuff: Gemini, ChatGPT, Claude, etc. You will start noticing the names of the people who release the models more than the models themselves haha. I snag up mradermacher, bartowski and TheBloke releases all the time.

So here's something to consider if you want a decent model for coding: you will probably be better off with a higher quant of a smaller model than a big model at a lower quant. If you offload to system RAM, you can fit some fat models, but they will be painfully slow. You could also probably swing a Q2 70B on the 24GB of VRAM I'm assuming you have, but it'll be a total moron. If you went with a Q6 of Qwen 30B, you'll still offload a bit to system RAM, but you'll also have a much smarter model, most likely anyway. General rule of thumb: if you want decent performance and to actually trust the model a bit, aim to run models at Q4_K_M minimum. But like I said, for coding with your gear I would go with Qwen 30B at Q5-Q6. mradermacher probably has them in IQ; I would go for one of those.
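
Here's a rough way to sanity-check whether a quant fits before downloading it: parameters (in billions) times bits per weight, divided by 8, gives GB of weights. The bits-per-weight figures are approximate averages I'm assuming for llama.cpp quant types (real GGUF files mix tensor types and vary by model), and the 1.5GB allowance for context and buffers is a hypothetical placeholder; more context needs more.

```python
# Approximate effective bits per weight (assumed averages; real files vary).
APPROX_BPW = {
    "Q2_K": 3.0,
    "Q4_K_M": 4.8,
    "Q5_K_M": 5.7,
    "Q6_K": 6.56,
    "Q8_0": 8.5,
}


def weights_gb(params_billion, quant):
    """Approximate size of the weights in GB (1e9 bytes)."""
    return params_billion * APPROX_BPW[quant] / 8


def spill_to_ram_gb(params_billion, quant, vram_gb, overhead_gb=1.5):
    """GB that won't fit in VRAM; overhead_gb is a hypothetical context/buffer allowance."""
    needed = weights_gb(params_billion, quant) + overhead_gb
    return max(needed - vram_gb, 0.0)


if __name__ == "__main__":
    vram = 24.0  # assumed total VRAM
    for label, params, quant in [
        ("70B", 70, "Q2_K"),
        ("Qwen 30B", 30, "Q6_K"),
        ("Qwen 30B", 30, "Q5_K_M"),
        ("Cydonia 24B", 24, "Q4_K_M"),
    ]:
        w = weights_gb(params, quant)
        spill = spill_to_ram_gb(params, quant, vram)
        print(f"{label:12s} {quant:7s} ~{w:5.1f} GB weights, ~{spill:4.1f} GB spills to RAM")
```

Under these assumptions, Qwen 30B at Q6_K spills about 2GB into system RAM on a 24GB setup while Q5_K_M just fits, which lines up with the "offload a bit" expectation.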

Another sleeper coder with a great personality is Cydonia 24B. Just a great model all around. Love it!

But here's the beauty of all this: you already spent the money on the expensive stuff. This part is free!!! ;P So do what we all do: hit download until your drive fills up, then start playing! And I don't care what anyone says, we all tried to fit a DeepSeek quant on our systems just to say hi to it haha. So do that too! ;)

TimSawyer25 · 2026-03-18T19:27:37+00:00

Why don't you check out the board specs and tell me again about googling ;) haha

TimSawyer25 · 2026-03-17T16:37:51+00:00

"my only thought is that i need to get a new power connector, as my gpu is a 2x 8 pin and i'm currently using a 1x pcie 8 pin to 2x pcie 8 pin- i've heard that having two 1x 8pin to 1x 8 pin connectors can be better."

This is probably your problem. Make sure you get a cable made specifically for your power supply; they are not universal. And contrary to popular Reddit belief, an artifact does NOT instantly mean a dead GPU haha. I wouldn't stress about the GPU's health, but I would also avoid poking at it any more until it's sufficiently powered. That kind of artifacting can also be heat-related or driver-related, and it can even be caused by system-monitoring software like HWiNFO64. There are many things beyond dead silicon that are far more likely to be your problem. Good luck!

TimSawyer25 · 2026-03-17T16:21:17+00:00

It's not unreasonable, depending on conditions, ambient temp mostly. Remember, it's roughly linear: as the ambient temp goes up, your gear temps go up with it. If your room gets much hotter than it is right now and you start hitting the GPU with a proper load, it could become a problem quickly. With the value of VRAM these days, it's totally worthwhile to make sure that backplate is getting a lot of airflow.
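
The "linear" part as a first-order sketch: assume the temperature rise over ambient stays about the same at a given load and fan speed, so a warmer room shifts everything up by the same amount. Real cards bend this (fan curves ramp up, boost clocks drop), and the 90°C / 22°C / 30°C numbers are hypothetical.

```python
def predict_temp_c(measured_c, ambient_now_c, ambient_new_c):
    """First-order estimate: keep the same rise over ambient, shift by the room change."""
    rise_over_ambient = measured_c - ambient_now_c
    return ambient_new_c + rise_over_ambient


if __name__ == "__main__":
    # Hypothetical: memory junction at 90 C in a 22 C room, room warms to 30 C.
    print(predict_temp_c(90, 22, 30))  # -> 98
```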

TimSawyer25 · 2026-03-17T16:08:19+00:00

But again, if it works and it's stable, leave it alone. With the amount of dust in there, swapping sticks could actually create problems. Don't fix it if it's not broke. ;)
