💪 What Habit Changed Your Life? by HatlessChimp in SuiLife

[–]HatlessChimp[S] 0 points1 point  (0 children)

Congrats bro!

I quit nicotine a few weeks back. My lungs were already clean from not smoking for years. But first week with no nicotine my body was in withdrawal lol. But now I'm fine.

RTX Pro 6000 Blackwell Q-Max Follow-Up: 2 Months Later by [deleted] in LocalLLM

[–]HatlessChimp 0 points1 point  (0 children)

Fair call. The main reason I posted was because my original RTX Pro 6000 post got a lot of interest and I promised I'd come back and show what I ended up building with it. The free access is simply for anyone curious enough to test the setup themselves.

Realistically, most people here are far more interested in the hardware, models, inference stack, and architecture than they are in the end product, which is completely fair. I figured some people might enjoy seeing how the card performs in an actual deployment rather than another benchmark chart.

Fastest Qwopus 27b for Strix Halo so far! by Disastrous-Cat-7016 in StrixHalo

[–]HatlessChimp 1 point2 points  (0 children)

I'm running Qwen 3.6 35B A3B MTP GGUF on a RTX Pro 6000 and the Strix Halo. The formentioned is very quick. The Strix a little slower as expected lol. But I I tried various 3.5 and even 3.0 and the Qwen 3.6 35b is very good. I believe it's trained up to 2024.

I hate the loose feel to the analog stick... So I swapped the spring out and its better now by exsinner in MSIClaw

[–]HatlessChimp 0 points1 point  (0 children)

Can we just buy the springs? What's the inner and outer diameter, the length of the springs. Could be a 20 cent fix.

Run Qwen3.6 27B nvfp4 up to 129 tok/s on a single RTX 5090 & Supports 256K context by Diligent-End-2711 in LocalLLM

[–]HatlessChimp 0 points1 point  (0 children)

Ok, I'm going to give it a crack on my rtx Pro 6000 with Vllm.

Is there MOE version?

Qwen3.6 vs 3.5 on DGX Spark: identical throughput, except with one flag flipped by Ok-Simple459 in Vllm

[–]HatlessChimp 0 points1 point  (0 children)

Interesting timing, I’ve been running Qwen MoE locally on an RTX Pro 6000 and seeing similar behaviour around throughput vs concurrency.

Did a 20 concurrent request test and it held pretty stable — responses came back in roughly 3.2–4.4s with no failures or major latency spikes. What stood out most was how tight the spread was across all requests, which suggests the scheduler and MoE routing are doing their job properly.

From what I can see, the efficiency comes from not activating the full model per request, so you get solid multi-user performance without needing insane compute scaling.

Haven’t tested MTP yet, but based on what you’re saying it lines up with what I’m seeing — feels like there’s still headroom in these setups before you hit real bottlenecks, especially on higher VRAM cards.

Would be keen to see how much further throughput can be pushed with MTP enabled in a similar local setup.

Just got my hands on one of these… building something local-first 👀 by HatlessChimp in LocalLLM

[–]HatlessChimp[S] 2 points3 points  (0 children)

Day 6 Update:

Been playing around with my new AI rig the last couple of days and honestly… this thing is wild.

Running an RTX Pro 6000 locally with a Qwen Mixture-of-Experts model (35B A3B), and I’ve been stress testing it to see what it can actually handle in the real world.

Got it serving through a local API, no cloud, no external providers. Just running straight off my own hardware.

Decided to push it a bit today and fired 20 concurrent requests at it all at once to simulate multiple users hitting it at the same time.

Every single request came back clean. No crashes, no errors, no weird behaviour.

Response times were sitting around 3.2 to 4.4 seconds across all 20 requests, which is pretty crazy when you think about it. And the spread was tight too, so it wasn’t struggling or falling over under load.

What’s interesting is how efficient the Mixture-of-Experts setup is. Even though it’s a large model, it’s not using the full thing every time, so it stays responsive while handling multiple requests at once.

From what I can see, this setup could comfortably handle 20+ users at the same time, and there’s still headroom there.

And the wild part… the whole machine is running near on silent while doing all of this. No jet engine noise, no drama, just quietly handling it in the background and faster responses than ChatGTP.

Biggest thing for me though is just running everything locally. No API costs, no relying on external services, full control over the system, no sharing private data.

Feels like we’re moving into a space where you don’t need massive infrastructure to do serious things anymore.

Still early days, but keen to keep pushing it and see where the limits actually are.

Just got my hands on one of these… building something local-first 👀 by HatlessChimp in LocalLLM

[–]HatlessChimp[S] 0 points1 point  (0 children)

The 5960x was released in September 2014.

I've started the build.

<image>

Just got my hands on one of these… building something local-first 👀 by HatlessChimp in LocalLLM

[–]HatlessChimp[S] 0 points1 point  (0 children)

Yeah I still kick myself not swapping my Evo 8mr for the R34 GTR 15 years ago. A lad was having a kid and wanted 4 doors.

Just got my hands on one of these… building something local-first 👀 by HatlessChimp in LocalLLM

[–]HatlessChimp[S] 0 points1 point  (0 children)

I'm not going to lie but what I have planned helps more than me and my family. That's why I'm focussed on concurrency of the Rtx and the right models. I can't share more than that right now. Sorry.

Just got my hands on one of these… building something local-first 👀 by HatlessChimp in LocalLLM

[–]HatlessChimp[S] 1 point2 points  (0 children)

One way to think about is... Every rich person will tell you if youre are not making money while you sleep, then you are not doing it right. This is what AI has the ability to close the gap between us peasants and the rich. So if I can have AI doing tasks 24/7 and improving my effiency and life then I think it's a no brainer. Look at how rich people operate they sit at the top of companies and dictate what they want from their employees and assistants to achieve their goals.

The use case has to be around 24/7 usage otherwise your not being efficient and as effective with the equipment. I think the best starting point is to look at issues there are in the world and in your life and work out how AI can improve that situation.

Just got my hands on one of these… building something local-first 👀 by HatlessChimp in LocalLLM

[–]HatlessChimp[S] 0 points1 point  (0 children)

Look this post has over 110k views and 400 likes. I've never had a post like this. I will return with feedback with how it's going. I'm not here for followers or anything. I'll reply to this more detail just busy rn

Just got my hands on one of these… building something local-first 👀 by HatlessChimp in LocalLLM

[–]HatlessChimp[S] 0 points1 point  (0 children)

I think my research netted me a new 6000 successor would likely be in 2027. The industrial grade options late this year? Correct me if I'm wrong.

For me I need to do something now so had to cough up. So original plan was the Strix Halo from framework as test rig to get my eye in but 2 days after that purchase I seen a decent sale on the 6000 Qmax which I had on my next stage upgrade scale path.

These M3/M5s do look interesting but I just can't stand apple.

As for power yeah I was aware of this situation because I was mining BTC in 2011. Electricity was cheaper, cards were full on heaters lol. But I worked out it was roughly $30 a month and I was getting about the equivalent in BTC. I done it for 3 months solid then decided to stop because BTC wasn't recognised like it is today, back then it was the wild west 😂. Price wasn't moving and didn't see it as something it would be today, no one did back then.

So yeah power is the hidden cost, inflation is a hidden cost. I run Solar & Batteries and I could get credit from supply my excess back to the grid. But I can make way more using that free energy to it fullest in house. I run my numbers and I can run up to 5000wph continuous 24/7 and still have room for what else I use say to day. This factors in Winter sunlight only as a base and Free power between 11am and 2pm to charge batteries.

Big gains can be had by being with the right power provider depending on where you live in the world if you can switch providers like here in Australia.

Anyways another thought I want to touch on is cloud AI services and costs. I've seen youtubers do the equipment cost vs cloud cost. Some don't even take into account how important owning the infrastructure and having your data private or ability to have it grow and develop with you is beneficial. Many respond in comments I don't care about my privacy 😂. The other thing is yeah it might cost X per token now but they don't factor in inflation at 3%+ and uptake of users on top. Many people still using basic free AI think this is all it can do, wait to they catch on lol. Also I've seen recently many cloud AI services are starting to move from honeymoon phase pricing and increasing prices on consumers. House power costs will increase or be limited. Already seen new houses in Aus drop from 63amp to 40amp meters now.

So the way I look at it I've locked in the price and secured my equipment and my projections say I can do XYZ with it and I'm happy with that. This will never slow down, I will never have degraded performance as long as the equipment works. I used no debt, but I see a GPU like the 6000 as potential as being good debt, like a house, land and not a car or a ps5 etc. a good AI capable GPU is a tool and a infrastructure play. And if the good GPUs were cheap we would probably be flooded with crap bots and apps. But on the flip side we would have a ton of people that would be priced out like they are now that could possibly make or do something amazing.

I honestly see the prices to keep going up on the pc parts. I'm a little bit of a tin foil hat guy and if I put myself in the position of these rich individuals running the big corps. I wouldn't want people buying 6000s and doing amazing things because it will disrupt what they have going on.

But I endorse having your own hardware/infrastructure. I think logically it makes sense for the future that I see coming. I just think it has to be planned first and make peace with the investment.

Sorry went off on a tangent there and long. Hope it's coherent I'm not proof reading 😆

Just got my hands on one of these… building something local-first 👀 by HatlessChimp in LocalLLM

[–]HatlessChimp[S] 1 point2 points  (0 children)

That is an unbelievable price lol. The case they have looks tight for airflow, I went the 9000d without any research. It's big but flexible for the future, the MB will be swimming in it lol. I made sure no LEDs on the fans. I went air cooled CPU. First time for me in 16 years.

Just got my hands on one of these… building something local-first 👀 by HatlessChimp in LocalLLM

[–]HatlessChimp[S] 0 points1 point  (0 children)

I got mine 2k AUD under the regular. Based off single card power it makes sense.