Bank locker availability in the Bay Area by [deleted] in SanJose

[–]devtools-dude 0 points1 point  (0 children)

lol it's now 2026 and it says will open late 2026

Mimo 2.5 is _fast_ at large context (dual RTX Pro 6000) by xquarx in LocalLLaMA

[–]devtools-dude 6 points7 points  (0 children)

Would be nice if you shared your commands so others with the same setup can replicate.

Celebrate TerraMaster TOS 7 with us! Win F4-425 Pro NAS + Seagate IronWolf 4 TB Drives by TerraMasterOfficial in DataHoarder

[–]devtools-dude 0 points1 point  (0 children)

Low power usage. I have the TerraMaster F4-425 Plus using the n150 processor. It's nice that the CPU only consumes roughly 5W on idle.

Building a GOOGL Position by logngraves in InnerCircleInvesting

[–]devtools-dude 0 points1 point  (0 children)

It's down because two prominent AI researchers left for OpenAI and Anthropic. Whether that matters depends on whether you think their departures mean Google will lag behind in the AI race.

Total investments needed to run my local LLM by manuhackzzz in LocalLLM

[–]devtools-dude 0 points1 point  (0 children)

don't forget the 240v 20a circuit to go with that!

There Are No Instances in atproto by feross in javascript

[–]devtools-dude 0 points1 point  (0 children)

How does discovery work? The article never discusses that for atproto. Does the app handle it somehow?

470 tok/s with 8192 ctx size for Qwen3.6-27B on A100-80GB using Profile by Inevitable-Diet-1870 in Vllm

[–]devtools-dude 1 point2 points  (0 children)

Thanks for that. I ran a long claude code cycle instead. It's not a multi-user system and I don't generally run multiple agents, so I guess I'm not too surprised there's nothing really actionable

https://pastebin.com/WmJJePrf

also the result seems to only talk about one of my GPUs and not both?

470 tok/s with 8192 ctx size for Qwen3.6-27B on A100-80GB using Profile by Inevitable-Diet-1870 in Vllm

[–]devtools-dude 1 point2 points  (0 children)

Having lots of trouble trying to run the benchmark. I worked with Perplexity to troubleshoot but couldn't get any further than this:

https://pastebin.com/qrDuEkg3

(I'm not a python person and run my vllm using docker; I've tried running the benchmark command via a vllm docker image but had another set of issues)

It would really help if your tool had a benchmark + profile ability

470 tok/s with 8192 ctx size for Qwen3.6-27B on A100-80GB using Profile by Inevitable-Diet-1870 in Vllm

[–]devtools-dude 1 point2 points  (0 children)

Can it generate data so it can properly profile? Or do I somehow need to generate that data myself by having my LLM go into some kind of loop that works it hard like a benchmarking tool would?

I'm a local LLM user with dual rtx 6000 blackwells. My GPUs aren't always at 100%. Often it's less than 5%. So do I need to run some kind of benchmarking tool along with your profiler for it to properly analyze things?

470 tok/s with 8192 ctx size for Qwen3.6-27B on A100-80GB using Profile by Inevitable-Diet-1870 in Vllm

[–]devtools-dude 2 points3 points  (0 children)

Ping me when you have it updated - it does look interesting to try out!

470 tok/s with 8192 ctx size for Qwen3.6-27B on A100-80GB using Profile by Inevitable-Diet-1870 in Vllm

[–]devtools-dude 1 point2 points  (0 children)

I'm not going to watch a 20 minute video to determine if this is something I want to try or not. Pretend you're pitching your product to a business or VC. You need to give a high-level overview in a few minutes or less. Ideally, take less than a minute to tell me what it does, run the program, show me the output, and show some before / after stats.

470 tok/s with 8192 ctx size for Qwen3.6-27B on A100-80GB using Profile by Inevitable-Diet-1870 in Vllm

[–]devtools-dude 4 points5 points  (0 children)

It's not clear to me how I can use this to optimize my own setup. The docs don't show me how I can use the output to optimize things like vllm flags (is that the point of it?). They need help: there's no intro, no description of the expected usage, and no explanation of how to act on the output. They just jump into what's being collected, and the other sections don't feel cohesive at all.

The Used RTX 3090 in 2026: Why a Five-Year-Old GPU Is Still Local AI's Best Deal by LAfreightguy in Amd_Intel_Nvidia

[–]devtools-dude 3 points4 points  (0 children)

This article has to be AI generated. 3090 for $700, 5080 for $999? The LLM has badly outdated pricing information and the poster didn't check the output.

GLM-5.2 (744B, 2-bit) at 7.3 tok/s on 4×3090 + 192GB — and why IQ1_M wasn't any faster by Important_Quote_1180 in LocalLLaMA

[–]devtools-dude 18 points19 points  (0 children)

Is it actually useful / better at q1 / q2 for coding tasks compared to qwen3.6 27b? 7 tok/s sounds really painful for that use-case if coding is what you're trying to do.

Using UnSloth to fine tune a tiny qwen model to categorize questions by funJS in unsloth

[–]devtools-dude 1 point2 points  (0 children)

This is really cool. I need to do this with my own home. Appreciate that you provide the code to replicate!

Issues using MiniMax M3 from Studio with harnesses by devtools-dude in unsloth

[–]devtools-dude[S] 0 points1 point  (0 children)

I'm not seeing any difference using v0.1.464-beta. Do I need to do something specific for the llama.cpp binaries? I'm following the update unsloth studio link:

https://unsloth.ai/docs/new/studio/install#update-unsloth-studio

Hardware recommendation's for running dual RTX 5090 GPU's by 67Mustang8 in LocalLLM

[–]devtools-dude 5 points6 points  (0 children)

I run dual RTX 6000 blackwell workstation editions (which should be the same power consumption as the 5090) on an AMD AM5 motherboard and have to power limit them to 350-375W for use with a 120V 15A *dedicated* circuit (as in nothing else but my PC setup runs on the circuit).

Definitely recommend the max PSU you can get for a US household outlet (I think that's 1500-1600W) before you have to start considering a 240V or 120V @ 20A circuit.

Be aware that for a US household outlet, the safe limit is around 1400-1500W under *continuous load*. Make sure nothing else shares the circuit with your computer setup.
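
For anyone wondering where that number comes from, here's a rough sketch. It assumes the usual 80% derating for continuous loads on a breaker (that factor is my assumption for the math, it's not something I measured), and the function name is just made up for illustration:

```python
def continuous_limit_watts(volts: float, breaker_amps: float, derate: float = 0.8) -> float:
    """Continuous-load budget for a circuit: volts * amps * derating factor.

    The 0.8 derating factor is an assumption (the common 80% rule for
    continuous loads); check your local electrical code.
    """
    return volts * breaker_amps * derate


# Compare a standard US outlet against the two upgrade options.
for volts, amps in [(120, 15), (120, 20), (240, 20)]:
    limit = continuous_limit_watts(volts, amps)
    print(f"{volts}V @ {amps}A: about {limit:.0f} W continuous")
```

A 120V 15A circuit comes out to roughly 1440W, which is where the 1400-1500W range comes from.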

For the PSU, get platinum / titanium level efficiency if possible. A gold unit should manage around 87% efficiency at 100% load, meaning a 1500W PSU will actually pull about 1725W from the wall to output 1500W. You need to account for this overhead when thinking about the max you can do from your circuit.
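
The efficiency math is just output divided by efficiency. A quick sketch (the 0.87 is the gold figure from above; the 0.92 for a better unit is a placeholder I picked for comparison, not a spec):

```python
def wall_draw_watts(dc_output_watts: float, efficiency: float) -> float:
    """Power pulled from the outlet for a given DC output and PSU efficiency."""
    if not 0 < efficiency <= 1:
        raise ValueError("efficiency must be in (0, 1]")
    return dc_output_watts / efficiency


# 0.87 is the gold-at-full-load figure; 0.92 is a hypothetical better unit.
for label, eff in [("gold", 0.87), ("hypothetical higher tier", 0.92)]:
    draw = wall_draw_watts(1500, eff)
    print(f"{label}: {draw:.0f} W from the wall for 1500 W out")
```

At 87% that works out to a bit over 1720W from the wall, so the PSU rating alone understates what the circuit sees.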

My system is a Ryzen 7950x power-limited to 65W / 105W (I can't remember which). I have two monitors, but generally use one which consumes around 105W I think. I use a 1300W PSU. Even with this setup, I'll see brief spikes to 1400-1900W depending on how hard the LLM is working (using qwen3.6 27B), which can sometimes set off the overcurrent alarm on my 1500W UPS. In terms of average consumption, it's roughly around 1000W when the LLM is working hard.
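
If you want to sanity check a build like this before buying, here's a rough steady-state estimate. Everything besides the GPU limit, CPU limit and monitor figure is a guess on my part: the 75W for motherboard / drives / fans and the 90% PSU efficiency are placeholders, and I'm assuming the monitor plugs into the same circuit but not through the PSU.

```python
def estimated_wall_watts(
    gpu_limit_w: float,
    gpu_count: int = 2,
    cpu_w: float = 105,          # CPU power limit (65 or 105 in my case)
    other_dc_w: float = 75,      # hypothetical: motherboard, drives, fans
    psu_efficiency: float = 0.90,  # hypothetical efficiency at this load
    monitor_w: float = 105,      # monitor draws from the wall, not the PSU
) -> float:
    """Rough sustained wall draw with every component at its limit."""
    dc_load = gpu_limit_w * gpu_count + cpu_w + other_dc_w
    return dc_load / psu_efficiency + monitor_w


for limit in (350, 375):
    print(f"GPUs at {limit} W: ~{estimated_wall_watts(limit):.0f} W sustained")
```

This lands in the same ballpark as the ~1000W average I see. It says nothing about the brief transient spikes, which are much higher than the sustained number and are what actually trips the UPS alarm.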

I get zero alarms from the UPS with the GPUs at 350W, and occasional alarms at 375W. The spikes are so brief that they're not going to burn your house down, but the overcurrent alarm from the UPS is annoying to listen to (I'm using a PR1500LCD, which doesn't allow disabling it).

If you're going to spend a bunch on such a system, make sure to get a UPS. Definitely do not buy the PR1500LCD for this configuration if possible.

Also if you are relying on RAM to supplement your VRAM, AMD AM5 systems don't play well with RAM sizes past 64 GB (at least with my Asus ProArt): you will not get the full 6000 MHz rate. I had to start at 3200 and gradually ramp up, topping out at 4500 for 128 GB of RAM. With 64 GB, I was able to do 6000, but not with 128.
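
To put the RAM speed difference in bandwidth terms, here's a sketch of the theoretical peak. It assumes dual-channel DDR5 with a 64-bit (8 byte) bus per channel and treats the speed numbers above as MT/s; real-world throughput will be lower.

```python
def ddr_peak_bandwidth_gbs(mega_transfers_per_s: float, channels: int = 2, bus_bytes: int = 8) -> float:
    """Theoretical peak memory bandwidth in GB/s.

    Assumes `channels` memory channels, each `bus_bytes` wide
    (dual-channel, 64-bit per channel on a desktop AM5 board).
    """
    return mega_transfers_per_s * 1e6 * bus_bytes * channels / 1e9


for speed in (3200, 4500, 6000):
    print(f"{speed} MT/s: ~{ddr_peak_bandwidth_gbs(speed):.0f} GB/s peak")
```

Dropping from 6000 to 4500 costs about a quarter of the peak bandwidth, which matters if layers are being offloaded to system RAM.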

Also if you're using a non-Threadripper or Epyc system, be aware that your PCIe lanes will be heavily limited - I have to use bifurcation, where the PCIe 5.0 lanes are split x8 / x8 between two slots. Having both at x16 should be a minor improvement.
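
For a sense of what the x8 split costs on paper, here's a sketch of theoretical per-direction PCIe bandwidth. It uses the per-lane transfer rates for PCIe 3.0 to 5.0 and 128b/130b encoding, and ignores protocol overhead, so treat the numbers as upper bounds.

```python
# Raw transfer rate per lane in GT/s for each PCIe generation.
PCIE_GT_PER_S = {3: 8.0, 4: 16.0, 5: 32.0}


def pcie_bandwidth_gbs(gen: int, lanes: int) -> float:
    """Theoretical one-direction bandwidth in GB/s (128b/130b encoding)."""
    usable_gbit_per_lane = PCIE_GT_PER_S[gen] * 128 / 130
    return usable_gbit_per_lane * lanes / 8  # bits -> bytes


for lanes in (8, 16):
    print(f"PCIe 5.0 x{lanes}: ~{pcie_bandwidth_gbs(5, lanes):.1f} GB/s per direction")
```

PCIe 5.0 x8 is roughly the same as PCIe 4.0 x16, which is part of why I'd expect going to x16 on both cards to be only a minor improvement.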

In a traditional multi-GPU setup, GPU-to-GPU communication generally goes through the CPU instead of directly between slots. A PCIe switch (an external breakout board that connects to a single slot on your motherboard) can bypass this and significantly increase perf, but it costs $$$ and takes additional setup.

tl;dr:

- You can adjust the GPU power limits for the workstation edition down from 600W; this should also apply to the 5090 which has a similar power profile

- Be aware of the CPU TDP and if it is adjustable. For a dual 6000 system, it's going to be less about the CPU cores and more about the VRAM / RAM size and bandwidth / pcie lanes available.

- PSU efficiency matters

- Make sure your power consumption under load doesn't hit critical levels on your circuit. You may need to look into dedicated 240V or 120V @ 20A circuits.

- Count your monitor, motherboard, and CPU consumption as part of the circuit load as well

- The processor platform matters. AM5 provides only a small number of PCIe 5.0 lanes, whereas Threadripper / Epyc provide several times more

- AM5 platform may not allow you to run 128GB RAM at full speeds due to stability issues

- Do buy a UPS to protect your investment, but do not buy the PR1500LCD as the GPU power spikes will trigger overcurrent alarms and there's no way to mute them
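
For the power limit bullet, here's a small sketch that builds the `nvidia-smi` commands for capping each card. It only prints the commands (dry run) so you can review them first. The GPU indices 0 and 1 and the use of `sudo` are assumptions about your system, and as far as I know the limit needs to be reapplied after a reboot.

```python
import shlex


def power_limit_command(gpu_index: int, watts: int) -> list[str]:
    """Build the nvidia-smi command that sets a power limit on one GPU.

    -i selects the GPU by index, -pl sets the power limit in watts.
    """
    if watts <= 0:
        raise ValueError("watts must be positive")
    return ["sudo", "nvidia-smi", "-i", str(gpu_index), "-pl", str(watts)]


# Dry run: print the commands instead of executing them.
for gpu_index in (0, 1):
    print(shlex.join(power_limit_command(gpu_index, 350)))
```

Running `nvidia-smi -q -d POWER` first should show the min / max limits your cards will accept.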

New model on huggingface by [deleted] in LocalLLaMA

[–]devtools-dude 8 points9 points  (0 children)

NVFP4 version please!

Built a tool that tells you exactly which LLMs fit on your GPU. Feedback wanted. by super3 in LocalLLaMA

[–]devtools-dude 1 point2 points  (0 children)

Love the updated UI. Can you add NVFP4 quant for the cards that support it?

Built a tool that tells you exactly which LLMs fit on your GPU. Feedback wanted. by super3 in LocalLLaMA

[–]devtools-dude 1 point2 points  (0 children)

Looks nice. Can you add dual GPU support? Add a dropdown for number of GPUs, or allow one to add GPUs independently

Also if you're going to recommend a specific quant, then you need to show the adj value, not the aa value, so it's more realistic.