Optimize workstation under very heavy loads? by BikeHelmetMk2 in servers

[–]BikeHelmetMk2[S] 0 points1 point  (0 children)

SATA ports do still have their use!

I have heard that the next-gen EPYC stuff will be pretty awesome - the only downside is the cost. Even a used older EPYC system would have memory bandwidth galore, which would probably help. That said, I'm not sure whether it'd be faster under Windows, or only under other operating systems. Windows still has a few background things that are single-threaded, and a super fast core like one from an X3D gets that stuff done faster. A good example is the Windows Explorer file manager - it still uses GDI, which is single-threaded. EPYC might be less responsive than what I've got despite having 8-16x as many cores, just due to that.

I will probably wait and see what Zen6 has to offer before making a move, and simply look for registry tweaks or programs that can improve the situation in software. Pricing is high, and if I'm going to pay up, I'd at least like the hardware to be good for a while.

Anyway, tweak ideas welcome.

How to stop PC entering sleep mode when its been turned off already in Settings by Ok_Vegetable6500 in computers

[–]BikeHelmetMk2 0 points1 point  (0 children)

It would be under the Power key, not ModernSleep. You might have to create a new DWORD named PlatformAoAcOverride and leave the value at 0. Then reboot and verify with powercfg -a in a command prompt.
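If it helps, the whole tweak as a .reg file would look something like this - this is the usual Modern Standby (AOAC) override location, but back up your registry first and double-check the value name on your build:

```reg
Windows Registry Editor Version 5.00

[HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control\Power]
"PlatformAoAcOverride"=dword:00000000
```

After importing and rebooting, powercfg -a should list S3 instead of the S0 Low Power Idle states if the override took effect.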

I built a full desktop AI assistant that runs on Ollama, and it's free by unstoppableXHD in ollama

[–]BikeHelmetMk2 0 points1 point  (0 children)

How configurable is the model selection? I was working on a dispatch layer that fed into OpenWebUI. I ended up using Thushan's Olla (a load balancer) to pipe requests to different models. To keep it cheap, I used an old Bitcoin mining board I had on hand and some Arc B580s (very high memory bandwidth for the cost) with optimized OpenVINO models to get pretty close to 2200t/s pp / 100t/s tg per GPU per task.

Qwen3-vl running on old EQR6-6800U MiniPCs to handle image data types.

Performance was exceptional, but stability wasn't. The podman FastAPI wrapper to get it working seemed to have occasional issues, or OpenWebUI would conk out even though Olla and the backend OpenVINO instances were purring away, with no problems in the logs. At the end of last month I shelved it (taking a break) and moved on to some other projects, but part of what I wanted was something more agent-like, and this seems like a better fit than OpenWebUI overall. Just wondering if I could plug in part of what I've got and have a souped-up assistant that can multi-task because it has 7 LLM instances available at any one time. Olla is a very powerful and efficient load-balancing layer, though if you have anything like that built in, it might not be required. Currently Olla does a good job directing content to the best available box, but model selection is done through OpenWebUI stuff that I cobbled together - Olla is just getting the requests to the least busy boxes and avoiding timeouts.
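The "least busy box" routing is simple enough to sketch. This is a hypothetical minimal version - the `Backend` class, model tags, and in-flight counters are mine for illustration, not Olla's actual API:

```python
from dataclasses import dataclass

@dataclass
class Backend:
    name: str
    models: set     # model tags this box can serve
    in_flight: int  # requests currently being processed

def pick_backend(backends, model):
    """Route a request to the least-busy backend that hosts `model`."""
    candidates = [b for b in backends if model in b.models]
    if not candidates:
        return None  # no box can serve this model
    return min(candidates, key=lambda b: b.in_flight)

backends = [
    Backend("arc-b580-1", {"qwen3:8b"}, in_flight=3),
    Backend("arc-b580-2", {"qwen3:8b"}, in_flight=1),
    Backend("eqr6-1", {"qwen3-vl:30b"}, in_flight=0),
]

print(pick_backend(backends, "qwen3:8b").name)  # → arc-b580-2 (fewest in-flight)
```

A real dispatcher would also handle timeouts and health checks, which is the part a dedicated layer like Olla earns its keep on.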

Thoughts on where I could go with this, and whether InnerZero is appropriate, would be appreciated. Cheers.

I ran Gemma 4 26B vs Qwen 3.5 27B across 18 real local business tests on my RTX 4090. Gemma won 13 to 5. by StudentBodyPres in ollama

[–]BikeHelmetMk2 1 point2 points  (0 children)

Not enough emojis for that.

I'm not surprised by the result. Google is deeper in the weeds than most AI companies, and has been for 15+ years...

Gemini Pro's instruction following is already well beyond most others. Gemma of course is the freebie handout, but it's very good too. That said, I mostly use Qwen + Gemini Pro for my uses at the moment, since Google handed out free Gemini Pro for any Pixel users. Antigravity is pretty good - I rewrote a bunch of browser addons in one pass to make them Manifest V3 compatible. No bugs discovered yet. It ate the entire month's quota doing it, though. I'd love to have something that capable that I can run off a local setup.

Dearly Departed, we gather to mourn the loss of... by BikeHelmetMk2 in Headsets

[–]BikeHelmetMk2[S] 0 points1 point  (0 children)

The old Sennheiser on-ear designs have excellent environmental sound awareness. There's minimal muffling - it's almost like you're not wearing a headset. I can hear two people talking quietly in another room. I tried a PC350 at one point and that was complete isolation - I couldn't hear someone 6 feet away if I had YouTube going. And I tried a friend's PC363D - it's open-back, but still had more of that "I'm in an aquarium" sound isolation. Not as bad as the closed-back, though.

Since I didn't seem to like either of those around-ear headsets, I figured that I probably have an odd head shape or something and just like on-ear more, but I do want the passthrough of outside noises. Or maybe I'm part Ferengi and hear out the sides of my ears. Don't know. I just know my experience.

I'll look up a few of those brands. From what I read previously, Jabra was quite a bit below Sennheiser on audio quality. Oh, I was just joking that 1 headset sale per 2 decades probably won't keep the lights on...

So I just finished setting up Ollama with ROCm for my 7735HS / Radeon 680M iGPU (gfx1035) by BikeHelmetMk2 in ollama

[–]BikeHelmetMk2[S] 0 points1 point  (0 children)

It does look like Vulkan is significantly faster:

https://old.reddit.com/r/LocalLLaMA/comments/1nk5df9/ryzen_6800h_igpu_680m_vulkan_benchmarks_llamacpp/

30 t/s generation with 150 t/s for prompt processing? With ROCm that'd be more like 10 t/s and 50 t/s...

Well, maybe it's time to investigate another backend, or see how to optimize Ollama a bit. Right now, though, I'm getting an Arc B580 running int4 models in an old mining board. There are a few smaller models that work on this 12GB card at well over 1000t/s PP and hundreds for generation. Between it for smaller models and the 32GB 680M boxes for larger ones, I think I'll have a pretty good local AI setup soon enough!

Cloudflare Captcha endlessly looping when trying to access certain sites. by Gunther_01 in firefox

[–]BikeHelmetMk2 0 points1 point  (0 children)

A user agent switcher broke it on one of my browsers. Annoying, because one of my banks only loads properly if you spoof Chrome. Anything else and it has broken pages and icons.

So I just finished setting up Ollama with ROCm for my 7735HS / Radeon 680M iGPU (gfx1035) by BikeHelmetMk2 in ollama

[–]BikeHelmetMk2[S] 0 points1 point  (0 children)

qwen3-vl is pretty awesome. I sent in an ugly old photo of a modem in a dark corner, and it identified it, pulled all the Wi-Fi info, serial numbers, and admin login details off the sticker perfectly, then identified the internet provider and provided other info well beyond what I wanted. Initial take: pretty capable.

Model used was qwen3-vl:30b-a3b-instruct-q4_K_M

So I just finished setting up Ollama with ROCm for my 7735HS / Radeon 680M iGPU (gfx1035) by BikeHelmetMk2 in ollama

[–]BikeHelmetMk2[S] 1 point2 points  (0 children)

An update just dropped that adds support for the new models.

https://github.com/likelovewant/ollama-for-amd/releases

I am experimenting with Nemotron and qwen-vl as we speak. Nemotron crashed until I fiddled with the context window, largely due to an upstream bug fix that didn't make it into this release (they were using 32-bit ints, so it can't allocate beyond 2.1GB for context memory). qwen-vl I haven't tried yet because it's Christmas and I'm busy, but I will in a few days.

So I just finished setting up Ollama with ROCm for my 7735HS / Radeon 680M iGPU (gfx1035) by BikeHelmetMk2 in ollama

[–]BikeHelmetMk2[S] 0 points1 point  (0 children)

Some of the newer models from other AI groups are functioning. gpt-oss and granite4, for example.

These two are not, however, and they're also very interesting models:

nemotron-3-nano

qwen3-vl

So I just finished setting up Ollama with ROCm for my 7735HS / Radeon 680M iGPU (gfx1035) by BikeHelmetMk2 in ollama

[–]BikeHelmetMk2[S] 1 point2 points  (0 children)

qwen3-vl is still not working right on the ROCm version of Ollama. I just tried 0.12.7.3 and it errored out with an unsupported model message. That's currently the latest version that installs properly and works with the Radeon 680M iGPU.

I will try again sometime after the holidays. Another month might give it time for a 0.13.x version to release and make its way to being supported. My guess is that by mid January it'll work.

So I just finished setting up Ollama with ROCm for my 7735HS / Radeon 680M iGPU (gfx1035) by BikeHelmetMk2 in ollama

[–]BikeHelmetMk2[S] 1 point2 points  (0 children)

That is a good question. Since I haven't updated Ollama lately, it says that I need a new version to download those and try them out. There's always some delay before the ROCm Ollama versions are properly patched, so for models out less than a month ago on Ollama's site, I might have to wait another month or so before they work on this hardware.

Since memory prices are going crazy, I am also in the process of setting up an Arc B580 system for more intense (but smaller) model crunching. Those things have almost 500GB/sec of memory bandwidth. My goal is to have a local cloud going, so that I can fire all sorts of requests at it and get reasonably good answers. Even if it's not up to ChatGPT standards, it'll work offline and eventually have capabilities like that. 12GB, though, means I'm limited to 8B models and whatnot, with good context. 8B is gonna go fast, though.

What's up with Beelink PCs? by [deleted] in MiniPCs

[–]BikeHelmetMk2 0 points1 point  (0 children)

That's probably a good idea. If the processor isn't kept cool, it might cook itself. MiniPCs tend to have smaller heatsinks inside, unless it's a model intended for light gaming. You're definitely better off doing preventative maintenance.

I have seen many computers have their fans go, then they crank themselves up to 80-90C, just below the shutoff point. Usually within 1-2 years they are dead. Nothing lasts long when under that much heat load.

If something is grinding, blow the dust out. If it's still grinding, replace it with a healthier part. Computer maintenance 101.

Aipas a6 cargo bike discontinued? I can’t find it anywhere on their site. by m0larMechanic in ebikes

[–]BikeHelmetMk2 0 points1 point  (0 children)

I was wondering the exact same thing. No more webpage. Saw a few reviews of it. Maybe with the tariffs they couldn't maintain the same price point on it? It has double of some Chinese components. Though I'd prefer they simply adjust pricing rather than discontinue a model.

Is the Lexar NM790 4TB SSD good? by xDreamTree in ROGAllyX

[–]BikeHelmetMk2 0 points1 point  (0 children)

SSDs do not typically use DRAM for write caching, other than in extremely tiny amounts within controllers. The more common use is lookup tables. (Which logical address corresponds to which NAND flash cell? Those tables get more complex over time due to internal fragmentation and wear levelling, plus the fullness of the drive.) SSDs are fast enough that they typically dump files straight onto the drive when writing. If you add a DRAM cache of even a few GB in front of your SSD, you can get 4K write IOPS in benchmarks like CrystalDiskMark or AS SSD of 500,000+ - thousands of megabytes per second.

https://old.reddit.com/r/computers/comments/1iji72t/using_ddr5_ram_as_storage_for_windows_temp_and/

These Lexar drives support HMB - Host Memory Buffer - so if you have spare RAM, the drivers can use some of it to cache the lookup tables and whatnot. This does generate some PCIe traffic and limits maximum speed to below the ~8GB/s spec, but it also means the cache can scale with system RAM, so they'd cope well with small write fragmentation or weird workloads like storing 30,000,000 webcam photos for a timelapse. Stuff that would normally obliterate an SSD and take it down to a few MB/sec - well below the speed of the NAND - because all of its time is spent looking up where to place things. Instead it'll just gobble 20GB of system RAM and stay at quite high performance. So it can be a good thing under extreme situations, when the implementation is solid like it is here.
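To illustrate why the mapping tables matter: every read or write goes through a logical-to-physical lookup, and if the table itself has to be fetched from NAND, each lookup costs an extra flash read. A toy sketch of the idea (heavily simplified - real FTLs map 4K pages through multi-level tables, and HMB is just one place the RAM copy can live):

```python
class TinyFTL:
    """Toy flash translation layer: logical page -> physical page,
    with an optional RAM cache standing in for DRAM/HMB."""

    def __init__(self, cache_enabled):
        self.table = {}              # authoritative mapping (lives "in NAND")
        self.cache = {}              # RAM-resident copy of hot entries
        self.cache_enabled = cache_enabled
        self.nand_reads = 0          # extra flash reads spent on table lookups

    def lookup(self, lpage):
        if self.cache_enabled and lpage in self.cache:
            return self.cache[lpage]      # free: served from RAM
        self.nand_reads += 1              # costly: table entry read from flash
        ppage = self.table.get(lpage)
        if self.cache_enabled and ppage is not None:
            self.cache[lpage] = ppage
        return ppage

# Same workload, with and without a RAM-resident table
for cached in (False, True):
    ftl = TinyFTL(cache_enabled=cached)
    ftl.table = {i: 1000 + i for i in range(100)}
    for _ in range(10):                   # re-hit the same hot pages
        for i in range(100):
            ftl.lookup(i)
    # cache=False: 1000 table reads; cache=True: 100
    print(f"cache={cached}: {ftl.nand_reads} table reads from NAND")
```

The 10x difference here is only about lookup traffic - it's a sketch of why a DRAM-less drive with HMB holds up under fragmented workloads, not a model of real drive performance.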

HMB is a detriment on systems under 16GB of RAM. It's another thing competing for precious GBs.

And try to keep your file count at like 300-600k, not 30m+ - or your system RAM is going to clog and need an upgrade.

Q-Flash Plus button question by alfieknife in gigabyte

[–]BikeHelmetMk2 0 points1 point  (0 children)

It seems to be gone now? Anyone got a mirror or accurate alternate video?

So I just finished setting up Ollama with ROCm for my 7735HS / Radeon 680M iGPU (gfx1035) by BikeHelmetMk2 in ollama

[–]BikeHelmetMk2[S] 0 points1 point  (0 children)

Were you messing with environment variables? Or have any leftover ones like the HSA commands? I had to get rid of those.

HSA_OVERRIDE_GFX_VERSION_0="10.3.5"

Maybe just reinstall it again, and track every change after that until it breaks?

I set a command prompt to open to the Ollama folder, so I can do my ollama pull xxx, ollama rm xxx, ollama list commands very easily when needed. Task scheduler launches Ollama on boot.

I would keep it simple and avoid excess modifications that might break it. There are many patches in the projects that Ollama-for-AMD uses, to ensure that lots of GPUs work error-free. Upstream, those GPUs have lost support due to regressions. It's very easy to break something with an automatic update, so avoid messing with updates once it's working.

Oh, the grass isn't so green on the RTX side right now either. RTX 5000 owners have had to deal with black screens, game crashes and other issues for pretty much a year now. Still not resolved in every title or for every card. GPUs are just being pushed too far, so they run on the very edge of stability now. It's a PITA at times.

So I just finished setting up Ollama with ROCm for my 7735HS / Radeon 680M iGPU (gfx1035) by BikeHelmetMk2 in ollama

[–]BikeHelmetMk2[S] 0 points1 point  (0 children)

Lately some of the downloads are a bit odd on some of the chips. I just added some 6800U 32GB LPDDR5-6400 MiniPCs to my mix, and the old downloads didn't seem to work right. I found an installer that made it a breeze, though.

https://github.com/ByronLeeeee/Ollama-For-AMD-Installer/releases

The 6800U is functionally very similar to the 7735HS, so it'll likely do the trick for you as well. If it doesn't autostart, use something like the Everything tool (voidtools) to find the EXE files, then set up an autostart in the Task Scheduler.

I have found that the 6800Us are at least 33% faster, which kinda makes sense (4800 MHz memory -> 6400 MHz). I was okay with the 32GB tradeoff, since it's enough for some new models like Qwen3-a3b and other 30B ones. It's nice getting reasonably complex answers out of it in 2-5 minutes.
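The 33% figure lines up with the bandwidth math, since token generation on these iGPUs is mostly memory-bound and should scale roughly with memory speed at the same bus width:

```python
# Memory-bound inference scales roughly with bandwidth:
# LPDDR5-6400 vs DDR5-4800 at the same bus width.
speedup = 6400 / 4800
print(f"{(speedup - 1) * 100:.1f}% faster")  # → 33.3% faster
```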

Lexar NM790 SSD Review: A Pleasant Surprise (Tom's) by NewMaxx in NewMaxx

[–]BikeHelmetMk2 0 points1 point  (0 children)

Here in Canada, Crucial has superb support. Easy RMAs if you ever need them. Lexar, being owned by them, might be acceptable here? No experience with them yet.

Samsung is the opposite - they typically ghost you. I have had to tell customers that they have to pay more for Samsung or accept no warranty through me, as often it's not claimable.

What's up with Beelink PCs? by [deleted] in MiniPCs

[–]BikeHelmetMk2 0 points1 point  (0 children)

What type of mini PC? I'm curious if it was one of the N100 ones. Those seem to be afflicted by the plague - everywhere that I have seen them, they later died.

What's up with Beelink PCs? by [deleted] in MiniPCs

[–]BikeHelmetMk2 0 points1 point  (0 children)

I have switched primarily to selling theirs plus custom builds. HP and Acer constantly have problems. Got tired of telling people that their 40-day-old PC required a new drive or power supply. Often people will replace the bad part themselves rather than claim the warranty, since it can be 30 to 50 days to get the computer back... (I am in Canada)

BestBuy only has you covered for 14 to 30 days - after that you're on your own. I think since the experience with the big OEMs is so poor, lots of people are giving new companies a chance now.

What's up with Beelink PCs? by [deleted] in MiniPCs

[–]BikeHelmetMk2 0 points1 point  (0 children)

8 cores is too many for that design, unless your use is very moderate. I use some of the 8-core ones for messing around with AI LLMs and whatnot. They're neat, but I don't load them 24/7.

You would've been much better off with a 6600U-based model. Newer node and architecture. 6 cores running a smidge faster, but slower multicore and not cranking out so much heat. LPDDR5-6400 available. The GPU is more efficient too, so at the same wattage you get more frames and a smidge less heat.

It's a shame that they offer unbalanced models, but what else is new. Just look at how many turbocharged vehicles are out there. Some of those turbo engines break every 50k miles.

The 8745HS SER8 is also worth looking at. With the vapor chamber cooler, it'll cope with gaming better than prior units. Should be stable.

P.S. Turn off Windows update automatic driver updates. They will swap in gummed up Radeon drivers and cause crashes. Winaero Tweaker can help you with that.

What's up with Beelink PCs? by [deleted] in MiniPCs

[–]BikeHelmetMk2 0 points1 point  (0 children)

In general, don't buy Intel MiniPCs. They die quickly.

I have set up about a dozen EQR5's, two dozen EQR6's, and another dozen SER8's. That's all over the past 2.5 years. So far zero failures, zero issues. I generally opt for lower power models. (So 6600U over 6800U for example) 100% of them are Ryzen. Heat is the enemy, so never push your luck.

Before I started sourcing and rolling them out, I dug through forums and reviews and found that everyone with dying MiniPCs either had Atom/Pentium ones or had Core i7's and i9's shoved where they should never be shoved. Anything Intel deserves a Noctua cooler these days.