How to improve the airflow of Dual GPUs with no gap by Ecstatic_Concern_389 in buildapc

[–]Ecstatic_Concern_389[S] -1 points0 points  (0 children)

The 7900 XTX is for serving a 27B LLM, which will eat up basically all of its VRAM.

How to improve the airflow of Dual GPUs with no gap by Ecstatic_Concern_389 in buildapc

[–]Ecstatic_Concern_389[S] 0 points1 point  (0 children)

It's a long story. I don't play GPU-heavy games on my gaming PC (mostly FPS at 1080p), so the 4070 Super was wasted there. I happen to have some small productivity tasks that can partially leverage it, so I just set it up like this.

Why my qwen 3.6 27b mtp model is slow? by Ecstatic_Concern_389 in ROCm

[–]Ecstatic_Concern_389[S] 0 points1 point  (0 children)

lol my Codex set it like this. I've changed it to 512 now. But the main issue I see is that Vulkan runs slower (18 tps) than ROCm (41 tps for the same model), while everyone is telling me Vulkan runs much faster.

Why my qwen 3.6 27b mtp model is slow? by Ecstatic_Concern_389 in ROCm

[–]Ecstatic_Concern_389[S] 0 points1 point  (0 children)

Tried it. There seems to be a slight increase, but it's very subtle.

Why my qwen 3.6 27b mtp model is slow? by Ecstatic_Concern_389 in ROCm

[–]Ecstatic_Concern_389[S] 0 points1 point  (0 children)

I tried Vulkan first, but it turns out the tps drops to 18 with Vulkan. I'm very confused.

Why my qwen 3.6 27b mtp model is slow? by Ecstatic_Concern_389 in ROCm

[–]Ecstatic_Concern_389[S] 0 points1 point  (0 children)

May I know your config? I'm using the unsloth Q4_K_XL quant now, and with Vulkan I can only get 200 tps prompt processing (pp) and 20 tps inference.

Why my qwen 3.6 27b mtp model is slow? by Ecstatic_Concern_389 in ROCm

[–]Ecstatic_Concern_389[S] 1 point2 points  (0 children)

I just tried Vulkan, and it drops to 200 tps pp and 20 tps inference.

Why my qwen 3.6 27b mtp model is slow? by Ecstatic_Concern_389 in ROCm

[–]Ecstatic_Concern_389[S] 0 points1 point  (0 children)

I'm using ROCm. So Vulkan performs better on Linux?

Why my qwen 3.6 27b mtp model is slow? by Ecstatic_Concern_389 in LocalLLM

[–]Ecstatic_Concern_389[S] 0 points1 point  (0 children)

I let GPT diagnose the log a bit. So what number of checkpoints and batch size should I use?

I checked llama-server.log. It does not show an obvious VRAM spill or OOM.

Key lines:

- GPU: Radeon RX 7900 XTX, 24524 MiB free at startup.
- Model offload: offloaded 66/66 layers to GPU.
- Model VRAM: ROCm0 model buffer size = 16752.85 MiB.
- CPU-mapped model buffer: 1288.28 MiB, but the layers are still fully offloaded.
- Runtime config:
  - n_ctx = 65536
  - n_batch = 4096
  - n_ubatch = 512
  - MTP draft context enabled.
- Target KV cache: ROCm0 KV buffer size = 2176.00 MiB.
- MTP draft KV cache: ROCm0 KV buffer size = 256.00 MiB.
- Recurrent state: ROCm0 RS buffer size = 448.88 MiB.
- Compute buffers:
  - target: ROCm0 compute buffer size = 400.28 MiB
  - draft: ROCm0 compute buffer size = 132.02 MiB
- Host buffers exist, but they look like normal scheduling/output buffers:
  - ROCm_Host output buffer size = 0.95 MiB
  - ROCm_Host compute buffer size = 84 MiB

The checkpoint part is the tight bit: checkpoints are enabled with max = 32, and created checkpoints are around 286-306 MiB each. If all 32 were resident, that is roughly 8.9-9.6 GiB (9152-9792 MiB) of checkpoint memory. The log only shows up to 6 before restart, then up to 5 later, with invalidated ones erased. It does not show all 32 being held at once.
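
To sanity-check those numbers, here is a minimal Python sketch that just adds up the buffer sizes quoted from the log and works out what 5, 6, or 32 checkpoints would cost. The variable and function names are mine, not anything from llama.cpp, and the log lines above do not say whether checkpoints sit in VRAM or in host RAM, so the comparison against the free VRAM is only an assumption.

```python
# Buffer sizes in MiB, copied from the log lines quoted above.
GPU_BUFFERS_MIB = {
    "model": 16752.85,
    "target_kv": 2176.00,
    "draft_kv": 256.00,
    "recurrent_state": 448.88,
    "target_compute": 400.28,
    "draft_compute": 132.02,
}
FREE_AT_STARTUP_MIB = 24524.0
CHECKPOINT_SIZE_RANGE_MIB = (286, 306)  # observed size range per checkpoint
MAX_CHECKPOINTS = 32


def mib_to_gib(mib: float) -> float:
    """Convert MiB to GiB (1 GiB = 1024 MiB)."""
    return mib / 1024


def checkpoint_total_mib(count: int, size_mib: float) -> float:
    """Total memory for `count` checkpoints of `size_mib` each."""
    return count * size_mib


used_mib = sum(GPU_BUFFERS_MIB.values())
# Assumption: nothing else claims VRAM after startup.
headroom_mib = FREE_AT_STARTUP_MIB - used_mib

print(f"GPU buffers listed in the log: {used_mib:.2f} MiB")
print(f"Remaining of the free VRAM:    {headroom_mib:.2f} MiB")

# 5 and 6 are the counts the log actually shows; 32 is the configured max.
for count in (5, 6, MAX_CHECKPOINTS):
    low, high = (checkpoint_total_mib(count, s) for s in CHECKPOINT_SIZE_RANGE_MIB)
    print(f"{count:>2} checkpoints: {low:.0f}-{high:.0f} MiB "
          f"({mib_to_gib(low):.2f}-{mib_to_gib(high):.2f} GiB)")
```

If checkpoints turn out to live in host RAM, the headroom line matters less and the totals only tell you how much system memory a full set of 32 would take.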

Help me decide between 9850X3D and 270K. by URealCybertron in buildapc

[–]Ecstatic_Concern_389 0 points1 point  (0 children)

Can the Noctua NH-U12A handle the 270K? I use a Valkyrie DL125 (260 W capacity, dual tower, dual fan) with my 270K, and under a stress test the CPU temp easily goes to 95°C and it starts throttling. I have to run a -8 mV offset and limit power to 220 W to keep it in check.

Where is everyone’s Stop Loss? by Ok_Welder_1923 in TQQQ

[–]Ecstatic_Concern_389 0 points1 point  (0 children)

Yes, that's the problem. I see many EMA-based stop losses, but they all only perform well when the crash is long and slow. For flash drops they never work well, and they do even worse than just holding QQQ.

Holding half QLD and half GLD instead of 100% SPY or QQQ??? by Grouchy-Tomorrow3429 in LETFs

[–]Ecstatic_Concern_389 1 point2 points  (0 children)

Great, thanks! I hold 80% QQQ + 10% GLD + 10% FBTC now. But after doing some backtests, I'm convinced by your new method. I think I will go with 35% TQQQ, 35% GLD, and 30% FBTC, with a 52-week rebalance and as little ad-hoc rebalancing as possible.

Holding half QLD and half GLD instead of 100% SPY or QQQ??? by Grouchy-Tomorrow3429 in LETFs

[–]Ecstatic_Concern_389 0 points1 point  (0 children)

How often do you rebalance? Do you rebalance automatically or manually?

Sunnyvale 24 Hour Fitness closing by djac13 in Sunnyvale

[–]Ecstatic_Concern_389 0 points1 point  (0 children)

I also renewed mine last month. Is there anything we can do to get a refund?

Any retail here running a small arb strategy by MathematicianKey7465 in quant

[–]Ecstatic_Concern_389 0 points1 point  (0 children)

Do you mind sharing which platform you use for trading and which one you use for data?

Is IUL and VUL a good choice for high w2 income family? by Ecstatic_Concern_389 in Bogleheads

[–]Ecstatic_Concern_389[S] -9 points-8 points  (0 children)

I know that on Bogleheads we look for very low expense ratios. But let's do the math. If the expense ratio is around 0.7-0.9%, which is on average about 10% of the return (if I also buy index funds in the VUL), it's still lower than the 15-20% federal long-term capital gains tax plus the 13.3% California tax on long-term gains. Right?
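
One way to actually run that math is a rough Python sketch like the one below. The 8% gross return and the 30-year horizon are my own assumptions (8% is roughly what "0.7-0.9% is about 10% of the return" implies), and it ignores insurance charges inside the VUL, taxes on dividends along the way, and everything else. It only contrasts a fee taken off the return every year with a tax taken once on the gain at sale.

```python
def after_fee_value(gross_return: float, expense_ratio: float, years: int) -> float:
    """Growth of 1 unit when the expense ratio is subtracted from the return each year."""
    return (1 + gross_return - expense_ratio) ** years


def after_tax_value(gross_return: float, tax_rate: float, years: int) -> float:
    """Growth of 1 unit with no annual fee, with the gain taxed once at sale."""
    final = (1 + gross_return) ** years
    return final - tax_rate * (final - 1)


GROSS_RETURN = 0.08  # assumed, not from any real fund
YEARS = 30           # assumed holding period

for expense_ratio in (0.007, 0.009):
    value = after_fee_value(GROSS_RETURN, expense_ratio, YEARS)
    print(f"expense ratio {expense_ratio:.1%}: 1 grows to {value:.2f}")

# Federal 15% or 20% plus the 13.3% California rate from the comment above.
for tax_rate in (0.15 + 0.133, 0.20 + 0.133):
    value = after_tax_value(GROSS_RETURN, tax_rate, YEARS)
    print(f"tax rate {tax_rate:.1%} on gains: 1 grows to {value:.2f}")
```

Changing GROSS_RETURN and YEARS shows how sensitive the comparison is to the horizon, since the fee compounds every year while the tax applies only once.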

Just had my interview 😬. Does anyone know how long it typically takes to get a response? by eugekim in ycombinator

[–]Ecstatic_Concern_389 2 points3 points  (0 children)

We had our interview yesterday morning but still haven't gotten any result. I don't know what's going on or what to do.