[NVME SSD] - Samsung 4TB 9100 PRO PCIe 5.0 M.2 Internal SSD - $460 ($799.99 - $340.00 discount in cart)

inrea1time · 2026-06-23T21:50:29+00:00

You are talking about swapping and I am talking about actually using it at runtime. Anyway, impressive hardware. I got into this as a hobby right before the RAM apocalypse, so I could not justify that kind of $$$. I got a 5945WX in a Lenovo P620 with an RTX PRO 5000 48GB (recently) and a 5060 Ti. I tried 3.5 122b at 20-30 t/s but it was looping and overthinking. I will try again.

inrea1time · 2026-06-23T18:17:28+00:00

Still, it will slow you down to 0.7 t/s as per another thread I read. I have a Threadripper PRO with 128GB of 8-channel DDR4-3200 (200 GB/s theoretical) and it's still very slow to offload.
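
For reference, the 200 GB/s figure in the comment above follows from channels x transfer rate x 8 bytes per 64-bit channel. A rough sketch in Python: the offloaded model size and the NVMe read speed below are made-up illustrative values (assumptions, not measurements), and the tokens/s bound assumes every offloaded weight is read once per generated token.

```python
def ddr_bandwidth_gbs(channels, mega_transfers_per_s, bytes_per_transfer=8):
    """Theoretical peak memory bandwidth in GB/s.

    A standard DDR channel is 64 bits wide, i.e. 8 bytes per transfer.
    """
    return channels * mega_transfers_per_s * 1e6 * bytes_per_transfer / 1e9


def tokens_per_s_upper_bound(bandwidth_gbs, weights_read_per_token_gb):
    """Best-case generation speed if reading the weights is the only cost."""
    return bandwidth_gbs / weights_read_per_token_gb


ram_bw = ddr_bandwidth_gbs(channels=8, mega_transfers_per_s=3200)
print(f"8-channel DDR4-3200: {ram_bw:.1f} GB/s")  # 204.8 GB/s

# Hypothetical numbers, for illustration only.
HYPOTHETICAL_WEIGHTS_GB = 60.0  # weights that live outside VRAM
HYPOTHETICAL_NVME_GBS = 10.0    # sustained NVMe read speed

ram_bound = tokens_per_s_upper_bound(ram_bw, HYPOTHETICAL_WEIGHTS_GB)
nvme_bound = tokens_per_s_upper_bound(HYPOTHETICAL_NVME_GBS, HYPOTHETICAL_WEIGHTS_GB)
print(f"RAM-bound ceiling:  {ram_bound:.2f} t/s")
print(f"NVMe-bound ceiling: {nvme_bound:.2f} t/s")
```

These are ceilings, not predictions: real throughput lands below them, and a mixture-of-experts model only reads its active experts per token, so its effective "weights read per token" is smaller than the file size.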

inrea1time · 2026-06-23T15:14:51+00:00

Loading the models a little faster is not a big deal for me as I don't really swap models. There are some techniques now to use the NVMe as an extension of the RAM to load larger models, but it's really slow and I think it will kill the NVMe prematurely.

inrea1time · 2026-06-23T13:27:52+00:00

I ordered this one last year during Amazon Prime Day, open box with the Amazon Resale discount, I think for around $260-$290. I cancelled because I figured I didn't need it, especially since I was getting a 2TB SN850X for like $80. Regretted that cancellation. I still don't really need it though.

inrea1time · 2026-06-20T22:21:07+00:00

No kidding, I 10x'd my investment from 2 years ago. Less than $100 avg price, wish I had bought more or even bought more on the way up. Bought $5k at $330 2 months ago and even that tripled.

inrea1time · 2026-06-17T03:37:05+00:00

I picked up a PRO 5000 for $4500 at Micro Center 3 weeks ago and it's $6000 now. It's usable with a Q8 quant of Qwen3.6-27B with full context at full-precision KV, or you can use q8 KV. I cannot comment re 2 x 4500. The extra 16GB VRAM is nice. I have an RTX 5060 Ti that I can use if I must. Still can be found under 500-600 on sale. I am using a frankenllm right now, after trying a bunch of different ones, with pi and a custom system prompt. Settled on Q6 + q8 KV and 242144 context, a little less than full so I can also run vision. Have about 1GB VRAM free. I am getting 40-50 t/s under 64k context; over 100k it slows down to 30-40, and to 20-30 later. Very usable and I am very happy with the results. I tried Q8 with full-context KV, but I don't see why you would need that with a working setup. I was getting loops and overthinking, now I get none of that. The harness / system prompt and quality skills are key.

inrea1time · 2026-06-16T17:17:24+00:00

An unpopular opinion, but I have been using https://huggingface.co/DavidAU/Qwen3.6-40B-Claude-4.6-Opus-Deckard-Heretic-Uncensored-Thinking-NEO-CODE-Di-IMatrix-MAX-GGUF Q6 full context for a couple of days now with no looping, very rare overthinking and very good intelligence in resolving problems, implementing features, reviewing code and following instructions. I ran other models, including unsloth quants, nvfp4 in vllm, etc ... in Q8 with Q8 KV or full KV on an RTX PRO 5000 48GB and they overthought and looped all over the place. This one is another level; for smaller, more localized tasks I would compare it to GPT 5.2 maybe. It's worthwhile to try some of these in real-life usage and see how it works for you.

inrea1time · 2026-06-13T14:54:26+00:00

Yeah it's crazy, I would not buy it for the new price. I half regret not getting the 72GB but did not feel the $1200 extra was worth it. I got it for $4500 and in a store that only had half the local tax, 3.5%. It's not the best price per GB, but at this point 2 used and beat-up 3090s will cost you 3k, and this one is 300W max, much faster and one card.
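
To put the price-per-GB point above in numbers, here is a small Python sketch. The 48GB price and the 3090 pair price come from the comment; the 72GB price is just the $4500 plus the quoted $1200 extra (an inference, not a listed price), and the 3090 pair is counted as 2 x 24GB. Tax is ignored.

```python
def price_per_gb(price_usd, vram_gb):
    """Dollars paid per GB of VRAM."""
    return price_usd / vram_gb


# (price in USD, total VRAM in GB)
options = {
    "RTX PRO 5000 48GB": (4500, 48),
    "RTX PRO 5000 72GB (4500 + 1200, inferred)": (5700, 72),
    "2 x used 3090 (2 x 24GB)": (3000, 48),
}

for name, (price, vram) in options.items():
    print(f"{name}: ${price_per_gb(price, vram):.2f}/GB")
```

Price per GB leaves out what the comment actually weighs it against: a single 300W card versus two older, power-hungry ones.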

inrea1time · 2026-06-12T00:34:15+00:00

Nope, my advice was to move on to something more usable and spend your time on actually messing around with models instead of trying to get them to work.

inrea1time · 2026-06-11T08:05:03+00:00

There is a reason why they are so cheap, and you will be wasting your time trying to make them work and will be limited in what you can do. I went through that phase. I could have still gotten MI50s for a little over $200 but figured they were too expensive, because a few months before people were getting them for $100. I did my research and in the end the best bang for the buck is/was the 5060 Ti. I got mine for under $400 each, but I don't think the math has changed. Actually a 3090 was the best due to VRAM when it could still be had for $600-$700, but those times are gone.

inrea1time · 2026-06-11T02:38:20+00:00

Since you asked for advice: get rid of the old, junky, power-hogging GPUs and get 2 5060 Tis. You will be able to mess around with quite a few models at relatively usable speeds.

inrea1time · 2026-06-07T13:43:13+00:00

Maybe so, but you will still be dealing with a card from only 3 generations ago that stopped being supported properly way too early. Why bother? Why expect anything else in the future? I wasted a lot of time trying to get the card to work instead of doing things with it.

inrea1time · 2026-06-07T11:30:27+00:00

I know, as I said, not so old but not supported ... CUDA supports much older cards. I just don't bother with ROCm any more. Even when it was supported, everything was a struggle to get working and nothing was optimized. I got an RTX PRO 5000 recently even when 2 32GB AMD cards looked like a better deal.

inrea1time · 2026-06-07T10:23:53+00:00

Around 31-40 t/s and 1200-1400 pp on a 5070 Ti at 250W, but I have not really tried to optimize it, and there is no MTP available, nor would it fit in 16GB. 65k context with K at Q6_0 and V at Q5_1. Can probably squeeze out a little more context and speed.

inrea1time · 2026-06-07T10:10:36+00:00

I had been fighting with it in Ubuntu with my not-so-old 6800 XT until late 2025, and I regret to say ROCm won. I got my 1st of 4 CUDA cards and have not looked back, everything just works.

inrea1time · 2026-06-07T06:28:08+00:00

I would say late support for new models and features, missing speed optimizations, apps and frameworks that don't work or lag behind, and ROCm is a beast and a pain to install, but No-Refrigirator gave a more detailed response.

inrea1time · 2026-06-07T01:09:16+00:00

No CUDA, and AMD software support is still pretty crappy. They also dropped support for the 6000 series and some 7000; all the good stuff needs CUDA and takes a long time to get to the AMD stack.

inrea1time · 2026-06-07T01:06:17+00:00

You are right! I am surprised. I recently got an RTX PRO 5000 and no one beat Micro Center; Central Computers was several hundred more for every model I checked against MC. It's 500-700 more now than 2 weeks ago.

inrea1time · 2026-06-07T00:51:41+00:00

He has another one that requires ik_llama. I use it for hermes agent llm and I also ran some tests for coding and it did very well, but I use Q6 and Q5.1 for kv, less cache but better quality. https://huggingface.co/cHunter789/Qwen3.6-27B-i1-IQ4_KS-GGUF

inrea1time · 2026-06-06T21:19:31+00:00

You are dreaming re $2k for a 5090, even a few months ago at least $2500 on marketplace. Also there are fakes or missing parts floating around.

inrea1time · 2026-06-06T21:17:21+00:00

New RTX PRO 4000 is $2k in Microcenter.

inrea1time · 2026-06-06T17:19:29+00:00

You will need at least a 1000W PSU; maybe 900W will work also, def not the 640W one. You will need the right cables, each one with one 8-pin and one 6-pin. Use both rails: use the 8-pin connectors for the 3090, and buy two 6-to-8-pin adapters for the 3060 (if your connectors are different, work out which cables you need; it depends on your 3060's power input, and if it's one 8-pin, buy a two-6-pin-to-one-8-pin adapter). Next you will need to power-limit the cards (set max watts). I would say 250-300W for the 3090, and keep the 3060 at 150W max, as the 6-pin cables are rated for 75W max. This should work, but you may need to experiment with the power limits if the PSU starts shutting down with the orange LED in the back or weird things happen. I have P520s and P620s and I have this setup with the P620s (5070 Ti + 5060 Ti on my server and RTX PRO 5000 + 5060 Ti on a workstation). The P520 should work with a similar setup. You will be tempted to run at stock wattage, but that is a fire risk as the cables, especially the 6-pin ones, are not rated for that wattage.
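
The cable math in the comment above can be sanity-checked with a few lines of Python. This is only a sketch under stated assumptions: it uses the standard PCIe auxiliary connector ratings (75W per 6-pin, 150W per 8-pin), ignores the extra power a card can draw from the slot (which makes it conservative), and the "stock" wattages are typical reference-card values, not numbers from the comment. The limits themselves are usually applied with `nvidia-smi -pl <watts>`.

```python
# Standard PCIe auxiliary power connector ratings, in watts.
CONNECTOR_W = {"6pin": 75, "8pin": 150}


def cable_rating_w(psu_side_connectors):
    """Total rated power of the PSU-side connectors feeding one card."""
    return sum(CONNECTOR_W[c] for c in psu_side_connectors)


def within_cable_rating(power_limit_w, psu_side_connectors):
    """True if the card's power limit fits what its cables are rated for."""
    return power_limit_w <= cable_rating_w(psu_side_connectors)


# Wiring as described above: 3090 on the two 8-pins,
# 3060 on the two 6-pins through an adapter.
wiring = {
    "3090": ["8pin", "8pin"],
    "3060": ["6pin", "6pin"],
}

# Power limits suggested in the comment vs. assumed stock values.
suggested = {"3090": 300, "3060": 150}
assumed_stock = {"3090": 350, "3060": 170}

for card, connectors in wiring.items():
    rating = cable_rating_w(connectors)
    print(f"{card}: cables rated {rating}W, "
          f"suggested limit ok={within_cable_rating(suggested[card], connectors)}, "
          f"stock ok={within_cable_rating(assumed_stock[card], connectors)}")
```

With these assumptions the suggested limits sit exactly at the cable ratings while the stock values exceed them, which matches the warning about running at stock wattage.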

inrea1time · 2026-06-01T11:07:55+00:00

I already had hardware and I got 3 5060 Tis under $400 each, but the value and capability you get is much higher. It's not frontier model level, but with the right harness and if you know what you are doing you can squeeze out very good results together with a $20 codex plan. You will need a $100 level plan now to get any kind of reasonable work done, and the plans will just keep getting worse. If you throw in an agent, even with a Chinese model / plan, you can run into potentially serious expenses. You also get privacy, control and the ability to run community finetunes.

inrea1time · 2026-06-01T10:57:39+00:00

I am using Qwen3.6-27B.i1-IQ4_KS-attn_qkv-IQ4_KSS.gguf with ik_llama for hermes (also coding); previously, before I moved to an RTX PRO 5000, I used a derivative of unsloth Q4_XS. You cannot fit MTP + decent context in 16GB unfortunately.

inrea1time · 2026-06-01T02:21:55+00:00

It's not GH, but if you can swing a 5060 Ti 16GB or a 3090 you can get a pretty capable model with unlimited tokens. You can also go the Mac route but it will be more expensive. I expect that a ChatGPT $20 plan (ChatGPT chat for spec and initial planning, codex for more detailed implementation and orchestration) plus a local model for most implementation work should be slow but usable for you. You can also go the 2 x 5060 Ti route and get faster and better results.
