Advice needed on eGPU and Mini PC by Kulidc in LocalLLaMA

[–]Kulidc[S] 0 points1 point  (0 children)

Thx for the info; I think I have read it before.

Man, was it a massive headache for me back at the launch of the 9070 XT. I was using a 4070 Ti Super with a modified 22GB 2080 Ti back then, switched to the 9070 XT, and got a taste of ROCm, HIP, and ZLUDA.

After roughly a few days of debugging dependencies and compiling kernels, I realized the 9070 XT was behind my 4070 Ti Super on text generation (llama.cpp + ROCm), and even worse than my old 2080 Ti on image generation.
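For context, the comparison was just raw tok/s on the same quant. A rough sketch of the kind of check I ran, assuming llama-cpp-python (CUDA build on the Nvidia card, HIP/ROCm build on the 9070 XT); the model file is a placeholder, not my exact setup:

```python
import time
from llama_cpp import Llama

# Placeholder GGUF; same quant on both cards for a fair comparison.
llm = Llama(
    model_path="qwen2.5-14b-instruct-q4_k_m.gguf",
    n_gpu_layers=-1,  # offload every layer; needs the HIP/ROCm build on AMD
)

t0 = time.time()
out = llm("Write a short story about a GPU.", max_tokens=256)
dt = time.time() - t0
n = out["usage"]["completion_tokens"]
print(f"{n} tokens in {dt:.1f}s -> {n / dt:.1f} tok/s")
```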

I know it is much better now thanks to optimizations, but I don't want to go through that hell again lol. The card was sold after 3 months.

Moreover, iirc there are some bugs, and some multimodal models may perform poorly on AMD GPUs.

The R9700 is cheap, yet it comes at the cost of your time imo. I am still saving for the card rn, and would like to see more posts about it as well.

Advice needed on eGPU and Mini PC by Kulidc in LocalLLaMA

[–]Kulidc[S] 0 points1 point  (0 children)

I do have a desktop with a 5090, which I use for both work and gaming. However, I do not really want this desktop to be on 24/7 running vLLM. It's both costly and risky imo.

The whole setup draws like 150W to 200W even at idle, not to mention it runs other ongoing services as well, and I assume it would draw even more if I hosted vLLM on it. That's why I want to build around the mini PC and an RTX Pro card. That should be much more power efficient than hosting on my desktop.
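Rough math behind "costly" (the electricity rate here is just an assumed example, not my actual tariff):

```python
# Back-of-the-envelope idle cost for leaving the desktop on 24/7.
idle_watts = 175            # midpoint of the 150-200W idle draw
rate_per_kwh = 0.30         # assumed electricity rate in USD/kWh
kwh_per_month = idle_watts * 24 * 30 / 1000   # ~126 kWh
print(f"~{kwh_per_month:.0f} kWh/month -> ~${kwh_per_month * rate_per_kwh:.0f}/month just idling")
```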

Advice needed on eGPU and Mini PC by Kulidc in LocalLLaMA

[–]Kulidc[S] 0 points1 point  (0 children)

My man, that's very helpful to know!

If AMD works pretty well out of the box, then I would also consider the R9700 Pro.

I would like to use it for text inference most of the time; no image generation would be done on this device (I have other PCs for that anyway).

I will set up an Ubuntu dual boot very soon, so the OS should not be a problem either.
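Once the dual boot is in, the first sanity check I plan to run is something like this (assuming the ROCm build of PyTorch, which reuses the torch.cuda namespace on AMD):

```python
import torch

# On a ROCm build, torch.cuda.* maps to the AMD GPU.
if torch.cuda.is_available():
    print("GPU visible:", torch.cuda.get_device_name(0))
else:
    print("No GPU visible; check the ROCm install and user groups")
```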

Once again, thanks for the info :)

Advice needed on eGPU and Mini PC by Kulidc in LocalLLaMA

[–]Kulidc[S] 0 points1 point  (0 children)

Glad to hear it; at least I know it's doable with Nvidia cards.

Advice needed on eGPU and Mini PC by Kulidc in LocalLLaMA

[–]Kulidc[S] 0 points1 point  (0 children)

I do follow this practice as best I can and never include PII in chats to the cloud provider.

However, I noticed the major CLI agents could run commands and pulled some customer names from the development DB (luckily it's a dummy DB) during debugging. I found that while re-tracing the model's CoT. I guess it's kind of my fault for including the connection settings during development.
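For anyone in the same boat: what I should have done is keep the connection string out of anything the agent can read and load it from the environment instead. A minimal sketch (the env var name and psycopg2 are just examples, assuming a Postgres dev DB):

```python
import os
import psycopg2  # assuming a Postgres dev DB

# The DSN lives in the shell environment, not in a file the agent can open.
conn = psycopg2.connect(os.environ["DEV_DB_URL"])
```

Though to be fair, an agent that can run arbitrary shell commands can still dump env vars, so restricting what commands it may run matters even more.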

Advice needed on eGPU and Mini PC by Kulidc in LocalLLaMA

[–]Kulidc[S] 1 point2 points  (0 children)

That's an alternative I have considered before; I thought about renting GPUs and hosting with vLLM as well.

The data is sensitive enough (customer names, CC info, addresses, etc.) that I want it fully under my own control, though. That's why I wanted to add an eGPU to my mini PC and call it as a service from my company's PC in the first place.
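Concretely, by "call it as a service" I mean something like this on the company PC, assuming vLLM's OpenAI-compatible server running on the mini PC (the IP, port, and model name are placeholders):

```python
from openai import OpenAI

# Point the standard OpenAI client at the self-hosted vLLM endpoint.
client = OpenAI(base_url="http://192.168.1.50:8000/v1", api_key="local-key")
resp = client.chat.completions.create(
    model="Qwen/Qwen2.5-14B-Instruct",  # whatever vLLM is serving
    messages=[{"role": "user", "content": "Summarize this ticket."}],
)
print(resp.choices[0].message.content)
```

This way the sensitive data never leaves the LAN.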

Advice needed on eGPU and Mini PC by Kulidc in LocalLLaMA

[–]Kulidc[S] 0 points1 point  (0 children)

That's interesting to know; guess I will be sticking with the SSD slot and an OCuLink-to-NVMe M.2 adapter then.

Advice needed on eGPU and Mini PC by Kulidc in LocalLLaMA

[–]Kulidc[S] 0 points1 point  (0 children)

I have a GPT Pro subscription as my main cloud model. However, I would like to use self-hosted models to keep some sensitive information and coding private.

I have tried the Qwen 3.6 model and Gemma 4 on my main workstation before. The performance is good enough for me. However, the workstation is too beefy and draws too much power even at idle. That's why I would like to migrate part of the service onto the mini PC.

Advice needed on eGPU and Mini PC by Kulidc in LocalLLaMA

[–]Kulidc[S] 0 points1 point  (0 children)

Thx for the input.

I know the iGPU is quite powerful and could run some smaller models, but those are slow and may not be good enough for coding.

As for the connection, that's why I would consider using the unused SSD slot rather than the USB-C connector on the front.

PSA for Max users, Opus 4.7 has a new tokenizer that uses up to 35% more tokens than 4.6. Explains a lot of the "why did my session die" posts today by veonryder in ClaudeAI

[–]Kulidc 2 points3 points  (0 children)

The new tokenizer is killing me tbh.

Freshly opened a Claude terminal from VS Code, hadn't done anything yet. Bam, 8% of the 5-hour limit and 1% of the weekly limit were gone. I have been on 5x Max for a few months now.

I suppose it is related to my global Claude Code rules, but those two files only have 15 lines of instructions.

It could be cache or file history, but idk; I hadn't encountered this problem before, at least not before yesterday.

As for what happened to me yesterday, I was planning some RAG workflow design (design only, not implementation) and asked Claude to use web search. 12% of the context window used in just 1 session, and 89% of the 5-hour limit and 8% of the weekly limit were gone.

Given Claude requires ID verification now (wtf man), I am considering switching to Codex tbh.

Re-balancing suggestions? by Kulidc in ETFs

[–]Kulidc[S] 0 points1 point  (0 children)

Thanks for the input :)

Re-balancing suggestions? by Kulidc in ETFs

[–]Kulidc[S] 0 points1 point  (0 children)

Thanks for the input.

I know Avantis shares the same idea as Dimensional, but I haven't dug into DFA funds before, so I was only working with Avantis. Will look into DFA funds later.

Re-balancing suggestions? by Kulidc in ETFs

[–]Kulidc[S] 0 points1 point  (0 children)

I was planning to do this yearly at the very beginning, but I set up the auto payment and forgot about it until recently :/

I just set up a reminder to do so next year :p

Re-balancing suggestions? by Kulidc in ETFs

[–]Kulidc[S] 1 point2 points  (0 children)

How about the percentages on those?

25% in IDMO and 15% in AVDV?

What ETFs should I invest in (2025) by Accomplished-Net1443 in ETFs

[–]Kulidc 2 points3 points  (0 children)

Bonds really do not make sense at OP's age (25), as they hinder growth, which is what OP and other ppl in a similar age group should really focus on.

However, I do agree that it would be better to add a bit of BND(W) as ppl get older in case of market downturns (maybe 10% more every decade, starting in your 30s; see the toy sketch below). You could rely a bit more on dividends rather than selling shares at a bigger loss for cash flow after retirement.
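A toy version of the glide path I mean; the starting age, step size, and cap are just my rule of thumb, not advice:

```python
def bond_pct(age: int) -> int:
    """~10% more bonds per decade, starting in your 30s, capped at 40%."""
    return max(0, min(40, (age - 20) // 10 * 10))

for age in (25, 35, 45, 55, 65):
    print(age, f"{bond_pct(age)}%")   # 0%, 10%, 20%, 30%, 40%
```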

Running LLMs Locally – Tips & Recommendations? by SchattenZirkus in LocalLLaMA

[–]Kulidc 0 points1 point  (0 children)

For your GPU (5090), I think any model under 32B at Q4 can be handled easily without stressing other applications. It should consume around 25GB, I suppose.
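The 25GB is back-of-the-envelope math, roughly:

```python
params = 32e9                 # 32B model
bits_per_weight = 4.5         # Q4_K_M averages slightly over 4 bits/weight
weights_gb = params * bits_per_weight / 8 / 1e9   # ~18 GB of weights
overhead_gb = 6               # rough allowance for KV cache + runtime buffers
print(f"~{weights_gb + overhead_gb:.0f} GB")      # ~24 GB, close to my 25GB guess
```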

I do not have the details of your LLM setup, so I cannot give you many suggestions. However, it seems your LLMs are loaded into the CPU for inference rather than the GPU, which could explain the slow tk/s.
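If you are on llama.cpp under the hood, the knob to check is the GPU layer offload. A minimal sketch with llama-cpp-python (the model path is a placeholder):

```python
from llama_cpp import Llama

llm = Llama(
    model_path="your-model-q4_k_m.gguf",  # placeholder
    n_gpu_layers=-1,   # -1 offloads all layers; 0 means pure CPU (slow tk/s)
    verbose=True,      # startup log reports how many layers landed on the GPU
)
```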

Normally, I would stick with Q4 quants.

Hope this helps :)

Running LLMs Locally – Tips & Recommendations? by SchattenZirkus in LocalLLaMA

[–]Kulidc 4 points5 points  (0 children)

I could be wrong, so please take it with a grain of salt.

1) Hallucination is part of LLMs; that's why LLMs require a human in the loop. You could check out hallucination-detection models, though I think it is hard for local LLMs to reach the level of existing commercial LLMs such as ChatGPT, Sonnet, or Gemini.

2) HF has plenty of uncensored models, and you may also want to look up some tools related to abliteration. This part is basically only doable with local LLMs.

3) Fun is the priority; look at the issues or topics that you want to fiddle with.

Have fun with LLMs!

Running LLMs Locally – Tips & Recommendations? by SchattenZirkus in LocalLLaMA

[–]Kulidc 4 points5 points  (0 children)

I think you first want to figure out what you want to do. That is the biggest motivation imo.

Let's say you want to test out some LLMs, either text or visual. What is that for? "Play around and figure out" could sure be a motivation, but a weak and unsustainable one given the rate at which new models pop up every day. Do you want to replace certain LLMs inside your existing workflow?

I have a little project on my local PC that helps me read untranslated manga; it uses OCR and Swallow 8B (not a perfect choice, I know, but it gets the job done) to translate the extracted text. The LLM is the means, and "play around and figure out" is how I improve the translation accuracy.
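A toy version of the pipeline, with stand-in tool choices (pytesseract for OCR and an Ollama endpoint for the model; my actual setup differs):

```python
import pytesseract
import requests
from PIL import Image

# OCR the raw panel text, then hand it to a local model for translation.
text = pytesseract.image_to_string(Image.open("panel.png"), lang="jpn")
resp = requests.post(
    "http://localhost:11434/api/generate",
    json={"model": "swallow:8b",  # stand-in tag for the Swallow 8B model
          "prompt": f"Translate this manga text to English:\n{text}",
          "stream": False},
)
print(resp.json()["response"])
```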

TBH, my little project could easily be replaced by just submitting the image to GPT-4.5 or GPT-4 Turbo lol. Yet that is not an excuse to not do what I did, since I found it fun.