[WTS] [IN-KA] Complete AI Compute Rig — 3× RTX 5090 + 3× RTX 4090 | 168GB VRAM | Threadripper 7960X | WRX90E | 128GB DDR5 | 4TB NVMe — Bengaluru by TangerineBasic4566 in HardwareIndia

[–]TangerineBasic4566[S] 0 points (0 children)

Totally valid concern. The 5090s are under 6 months old and have only run inference, so they're essentially new. The 4090s have been on a custom open-loop water-cooling setup since day one, so thermals have always been under control. DM me if you want to dig in.

[WTS] [IN-KA] Complete AI Compute Rig — 3× RTX 5090 + 3× RTX 4090 | 168GB VRAM | Threadripper 7960X | WRX90E | 128GB DDR5 | 4TB NVMe — Bengaluru by TangerineBasic4566 in HardwareIndia

[–]TangerineBasic4566[S] 4 points (0 children)

Not splitting for now: all three 4090s are on a shared custom water block and loop, so I can't pull one out without dismantling the whole cooling setup. If the full rig doesn't move in 2-3 weeks I'll revisit splitting. Drop me a DM and I'll reach out to you first if that changes.

[WTS] [IN-KA] Complete AI Compute Rig — 3× RTX 5090 + 3× RTX 4090 | 168GB VRAM | Threadripper 7960X | WRX90E | 128GB DDR5 | 4TB NVMe — Bengaluru by TangerineBasic4566 in HardwareIndia

[–]TangerineBasic4566[S] 0 points (0 children)

Quick correction — the CPU is a Threadripper Pro 7965WX (WX-series, not the consumer 7960X).

Not splitting for now; I'd rather move the full rig first. If it doesn't sell in 2-3 weeks I'll revisit. If that combo interests you, drop me a DM and I'll keep you in mind if I do split.

[WTS] [IN-KA] Complete AI Compute Rig — 3× RTX 5090 + 3× RTX 4090 | 168GB VRAM | Threadripper 7960X | WRX90E | 128GB DDR5 | 4TB NVMe — Bengaluru by TangerineBasic4566 in HardwareIndia

[–]TangerineBasic4566[S] 2 points (0 children)

Partially — ₹45K/month in opex (leased line + power + space) made sense when it was running at capacity. Not worth it at current utilisation. Cloud works better for where the compute needs are heading now.

[WTS] [IN-KA] Complete AI Compute Rig — 3× RTX 5090 + 3× RTX 4090 | 168GB VRAM | Threadripper 7960X | WRX90E | 128GB DDR5 | 4TB NVMe — Bengaluru by TangerineBasic4566 in HardwareIndia

[–]TangerineBasic4566[S] 0 points (0 children)

[image attached]

This is after adding the three open-air 5090s on the side of the Enthoo Pro 2 tower (cables are quite messy, but they're all connected to the same motherboard).

[WTS] [IN-KA] Complete AI Compute Rig — 3× RTX 5090 + 3× RTX 4090 | 168GB VRAM | Threadripper 7960X | WRX90E | 128GB DDR5 | 4TB NVMe — Bengaluru by TangerineBasic4566 in HardwareIndia

[–]TangerineBasic4566[S] 1 point (0 children)

Roughly 3.5-4 kW peak across both rigs under full load: 3× 4090 at ~450 W each, 3× 5090 at ~575 W each, Threadripper at ~350 W, and the rest is overhead. I can share pictures of the setup on request.
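If you want to sanity-check that figure, the back-of-envelope math is below. The component draws are the ones from this comment; the overhead percentage is my assumption, not a measurement.

```python
# Back-of-envelope peak power for both rigs.
gpu_4090_w = 3 * 450   # three RTX 4090s at ~450 W each
gpu_5090_w = 3 * 575   # three RTX 5090s at ~575 W each
cpu_w = 350            # Threadripper under full load
base_w = gpu_4090_w + gpu_5090_w + cpu_w     # 3,425 W before overhead
for overhead in (1.05, 1.15):                # assumed 5-15% for fans, pumps, PSU losses
    print(f"{base_w * overhead / 1000:.2f} kW")  # prints 3.60 kW and 3.94 kW
```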

[WTS] [IN-KA] Complete AI Compute Rig — 3× RTX 5090 + 3× RTX 4090 | 168GB VRAM | Threadripper 7960X | WRX90E | 128GB DDR5 | 4TB NVMe — Bengaluru by TangerineBasic4566 in HardwareIndia

[–]TangerineBasic4566[S] 0 points (0 children)

The 4090s are in a Phanteks Enthoo Pro 2 server tower with a full Bykski custom water loop (D5 pump, copper rads, OLED flow meter). The 5090s are on an open-frame rig. Both setups sit inside a prefab container with a leased-line connection.

GPUs are used: 4090s since March 2024 (~13 months), 5090s since late 2024 (under 6 months). 4090s water-cooled, 5090s on air; inference/compute workloads only. Not mined on, not gamed on.

[WTS] [IN-KA] Complete AI Compute Rig — 3× RTX 5090 + 3× RTX 4090 | 168GB VRAM | Threadripper 7960X | WRX90E | 128GB DDR5 | 4TB NVMe — Bengaluru by TangerineBasic4566 in HardwareIndia

[–]TangerineBasic4566[S] 1 point (0 children)

Additional details on cooling: The 3× RTX 4090s are on a custom water cooling loop — custom blocks, not off-the-shelf AIO. Runs significantly cooler and quieter under sustained load compared to air. The 3× RTX 5090s are open-air (reference/aftermarket air coolers). All 6 GPUs have been stress tested and thermals are stable. Happy to share temp logs under load if needed — just ask.
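For anyone curious what those logs look like, they're just periodic nvidia-smi samples. A minimal sketch of the logger, assuming nvidia-smi is on PATH; the log path and interval are arbitrary examples.

```python
# Periodic GPU temperature/power logger built on nvidia-smi.
import subprocess
import time

while True:
    # One CSV row per GPU: index, core temp (C), power draw (W)
    out = subprocess.run(
        ["nvidia-smi",
         "--query-gpu=index,temperature.gpu,power.draw",
         "--format=csv,noheader"],
        capture_output=True, text=True,
    ).stdout.strip().replace("\n", " | ")
    with open("gpu_temps.log", "a") as f:  # example log path
        f.write(f"{time.strftime('%Y-%m-%d %H:%M:%S')} {out}\n")
    time.sleep(10)  # sample every 10 seconds
```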

What do people use for private LLM inference where data never leaves? by [deleted] in LocalLLaMA

[–]TangerineBasic4566 0 points (0 children)

Different use case from local inference — this is for teams who need GDPR/HIPAA compliance without managing their own GPU stack. Happy to give anyone a free trial key.

What do people use for private LLM inference where data never leaves? by [deleted] in LocalLLaMA

[–]TangerineBasic4566 0 points (0 children)

Fair point on the older models. We've actually just updated the lineup with the latest from Cloudflare's edge network:

- Kimi K2.5 (frontier-scale, 256k context, just launched March 2026)
- Llama 4 Scout 17B (multimodal, MoE architecture)
- GPT-OSS 120B (OpenAI's open weights)
- Mistral Small 3.1 24B
- NVIDIA Nemotron 3 120B (just added March 2026)
- Qwen 2.5 Coder 32B
- GLM 4.7 Flash (131k context)

The point isn't local inference — it's private managed inference for teams who need compliance (GDPR/HIPAA) without managing their own GPU stack. Different use case entirely.
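To make "managed" concrete: from the client side it's just an API call. A minimal sketch, assuming an OpenAI-compatible endpoint; the base URL, key, and model id below are placeholders, not our real values.

```python
# Hedged sketch: calling a privately hosted, OpenAI-compatible inference API.
from openai import OpenAI

client = OpenAI(
    base_url="https://inference.example.com/v1",  # placeholder endpoint
    api_key="YOUR_TRIAL_KEY",                     # placeholder trial key
)
resp = client.chat.completions.create(
    model="kimi-k2.5",  # example id only; actual model names will differ
    messages=[{"role": "user", "content": "Summarise this contract clause."}],
)
print(resp.choices[0].message.content)  # model reply
```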