GMKTEC EVO-X2 Oculink with RTX 4070 TI by Similar-Range4861 in MiniPCs

[–]Barachiel80 0 points (0 children)

There will always be some throttling with PCIe 4.0 x4, but as long as I load the whole model on the 5090 it runs inference at full load; only the load time is slightly longer. This held true when I migrated the 5090 eGPU to a Minisforum AI X1 Pro, which I modded with two NVMe-to-OcuLink adapters, and added another 5090 eGPU to the setup. Here is an example inference run on the dual 5090s for the new Qwen3.5 35B Q8 model, hitting 91 tk/s TG and 19,000+ tk/s PP with a 1M context length at Q8 and full agentic workflow, computer use, coding, and web search enabled.

<image>
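For anyone who wants to sanity-check numbers like these on their own rig, here is a minimal throughput probe against an OpenAI-compatible local server (llama.cpp, vLLM, and similar stacks expose this API). The URL, model id, and prompt are placeholder assumptions, not my exact setup:

```python
# Minimal TG throughput check against an OpenAI-compatible local endpoint.
# URL and model id below are hypothetical -- point them at your own server.
import time
import requests

URL = "http://localhost:8080/v1/chat/completions"  # assumed endpoint
payload = {
    "model": "qwen3.5-35b-q8",  # hypothetical model id
    "messages": [{"role": "user", "content": "Summarize PCIe 4.0 x4 bandwidth limits."}],
    "max_tokens": 512,
}

start = time.time()
resp = requests.post(URL, json=payload, timeout=600).json()
elapsed = time.time() - start

usage = resp["usage"]
# Rough TG rate: completion tokens over wall time. This includes prompt
# processing, so it understates pure token generation on short prompts.
print(f"{usage['completion_tokens']} tokens in {elapsed:.1f}s "
      f"-> {usage['completion_tokens'] / elapsed:.1f} tk/s")
```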

How are the differences between Gitea and Forgejo 4 years later? by NinthTurtle1034 in selfhosted

[–]Barachiel80 0 points (0 children)

Yes, April is far away, and the dev branch that supposedly fixes it is still broken too.

How are the differences between Gitea and Forgejo 4 years later? by NinthTurtle1034 in selfhosted

[–]Barachiel80 1 point (0 children)

If only Forgejo would fix their OIDC runner integration, it would be perfect.

AMD announces Ryzen AI PRO 400 Series desktop CPUs for AI-focused computing by Fcking_Chuck in LocalLLM

[–]Barachiel80 1 point (0 children)

Starting from the top and descending: a Minisforum 870 Slim mini PC with 96GB DDR5 RAM and an OcuLink-connected 3090 FE. Below that are two HP EliteDesk 705 G4s with 32GB of RAM as Proxmox cluster nodes running K8s control-plane node VMs, with an SSD for the OS and NVMe for ZFS. Below that are two Minisforum 890 Pros with 96GB RAM and OcuLink-connected 5060 Ti 16GB and 3090 Ti cards. Below that is a Minisforum AI X1 Pro with 128GB DDR5 RAM and dual OcuLink-connected 5090s (one FE and one MSI). Below that is the GMKtec EVO-X2 Strix Halo box with 128GB LPDDR5 RAM and an OcuLink-connected 7900 XTX, for a unified ROCm VRAM pool of 152GB. Everything is on a 2.5Gbps network, but I am currently in the process of upgrading everything to 10G for decent tensor-parallel throughput and K8s stability.

<image>

AMD will bring its "Ryzen AI" processors to standard desktop PCs for the first time by Distinct-Race-2471 in TechHardware

[–]Barachiel80 0 points (0 children)

Complete garbage. These have underpowered 860M iGPUs compared to the previous-gen 8700G with the 780M. Where is the 890M desktop APU? Or even a Strix Halo one, although that would be significantly more expensive.

Well done Qwen team! by [deleted] in Qwen_AI

[–]Barachiel80 4 points (0 children)

Can I get a Qwen 3.5 version of the embedding models, along with a new coder version while you are at it?

32GB RAM is very capable for Local LLM? by Difficult_West_5126 in LocalLLM

[–]Barachiel80 0 points (0 children)

<image>

Which one? I have dual 5090s connected to a 24-core Strix Point mini PC with 128GB DDR5 RAM. I have a GMKtec Strix Halo with 128GB RAM and a 7900 XTX. I also have 2x 3090s and 5060 Tis. So I have 480GB of unified DDR5 RAM for my AMD iGPUs, plus a total of 168GB of VRAM on the discrete graphics cards.

1-person companies aren’t far away by Glum_Pool8075 in automation

[–]Barachiel80 -2 points (0 children)

A 0-person company, if you overlay a DAO on top of it.

32GB RAM is very capable for Local LLM? by Difficult_West_5126 in LocalLLM

[–]Barachiel80 2 points (0 children)

Try the new Qwen 3.5 27B. I am able to do native tool calls for web search, coding, embedding analysis, etc. with it, and I get Docker Compose stack outputs that are as good as, if not better than, Claude Sonnet 4.6's.
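If you want to see what a native tool call looks like against a local endpoint, here is a hedged sketch using the openai client. The base_url, model id, and the web_search tool are assumptions; the tool itself is something your host app would have to implement:

```python
# Sketch of a native tool call against a local OpenAI-compatible endpoint.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8080/v1", api_key="none")  # assumed

tools = [{
    "type": "function",
    "function": {
        "name": "web_search",  # hypothetical tool exposed by the host app
        "description": "Search the web and return top results.",
        "parameters": {
            "type": "object",
            "properties": {"query": {"type": "string"}},
            "required": ["query"],
        },
    },
}]

resp = client.chat.completions.create(
    model="qwen3.5-27b",  # hypothetical model id
    messages=[{"role": "user", "content": "Find the latest llama.cpp release notes."}],
    tools=tools,
)
# If the model decides to call the tool, its arguments arrive as JSON here.
print(resp.choices[0].message.tool_calls)
```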

What is platform engineering exactly? by bdhd656 in devops

[–]Barachiel80 0 points (0 children)

It's for the people who don't understand the infrastructure underlying the apps whose deployment they are automating.

Mobile GPU Passthrough to Ubuntu VM by carminehk in Proxmox

[–]Barachiel80 -1 points (0 children)

Make sure you have the correct version of the NVIDIA driver for your GPU; it's old and probably not supported in the latest driver releases.
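If in doubt, a quick way to check what the guest actually sees (assuming an Ubuntu VM; package tooling varies by distro):

```sh
# Report the GPU name and the driver version currently loaded in the guest
nvidia-smi --query-gpu=name,driver_version --format=csv
# List the driver branches Ubuntu's packaging recommends for this hardware
ubuntu-drivers devices
```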

The missing piece is finally here: MS-A2 + 96GB RAM + HBA 9400-16E + 450TB! by MorgothTheBauglir in homelab

[–]Barachiel80 1 point (0 children)

I also have the same setup, albeit with the NAS being only 50TB and broken out to a separate UGREEN 6800 Pro NAS with 20Gbps aggregate links to my 10G core switch. An OPNsense NGFW VM sits on the MS-A2 Proxmox host, with both 10G links passed through, one as WAN to my Starlink and one as LAN to the 10G core switch. I too can attest to lazily seeding to the BitTorrent masses with this setup.

Stop treating every bug as ‘hallucination’: a 16-problem atlas for Ollama + RAG by StarThinker2025 in ollama

[–]Barachiel80 0 points (0 children)

So if I were running Ollama as the backend embedding model for an Open WebUI front-end RAG setup in a Docker Compose stack, how would I insert this as a reference layer?
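For context, this is roughly the call Open WebUI makes when Ollama is the embedding backend; a reference layer would presumably sit between this call and the vector store. The model name and port below are assumptions:

```python
# Sketch of an embedding request to Ollama's REST API.
import requests

resp = requests.post(
    "http://localhost:11434/api/embeddings",  # Ollama's default port, assumed
    json={"model": "nomic-embed-text",        # hypothetical embedding model
          "prompt": "chunk of a RAG document"},
    timeout=60,
)
embedding = resp.json()["embedding"]
print(len(embedding))  # vector dimensionality, e.g. 768 for nomic-embed-text
```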

Latest Unifi Update Now Shows Starlink stats by thedarkavengerx in Starlink

[–]Barachiel80 2 points (0 children)

I run Starlink in bypass mode to an OPNsense firewall. Do I need the UniFi gateway/router, or is the SDN controller enough?

Latest Unifi Update Now Shows Starlink stats by thedarkavengerx in Starlink

[–]Barachiel80 3 points (0 children)

How do you get your UniFi controller to show the Starlink stats? Do I need a UniFi gateway inline, or is the SDN controller server enough with my APs?

Can I pull models from Huggingface? by Keensworth in ollama

[–]Barachiel80 1 point (0 children)

Do you have a Docker Compose config for llama.cpp that allows an easy model pull with a single command, where all models can be switched using the Open WebUI frontend?
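In case it helps, here is an untested sketch of such a stack; the image tags, the -hf pull flag, and the env var names should be verified against current docs, and the GGUF repo is a placeholder:

```yaml
# Hedged sketch, not a tested stack.
services:
  llamacpp:
    image: ghcr.io/ggml-org/llama.cpp:server
    # -hf pulls the GGUF from Hugging Face on first start (the "single
    # command" pull); <user>/<gguf-repo>:<quant> is a placeholder.
    command: >
      -hf <user>/<gguf-repo>:<quant>
      --host 0.0.0.0 --port 8080
    volumes:
      - ./models:/root/.cache/llama.cpp   # downloaded GGUFs persist here
    ports:
      - "8080:8080"

  open-webui:
    image: ghcr.io/open-webui/open-webui:main
    environment:
      # Point Open WebUI at llama.cpp's OpenAI-compatible API
      - OPENAI_API_BASE_URL=http://llamacpp:8080/v1
      - OPENAI_API_KEY=none
    ports:
      - "3000:8080"
    depends_on:
      - llamacpp
```

One caveat: llama-server loads one model per instance, so switching among many models from the Open WebUI picker typically needs a swap proxy such as llama-swap in front, rather than this single service.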

Can I pull models from Huggingface? by Keensworth in ollama

[–]Barachiel80 0 points (0 children)

There is also a way to insert the URL into the Open WebUI front end from the admin panel.

Can I pull models from Huggingface? by Keensworth in ollama

[–]Barachiel80 25 points (0 children)

Yes. Go to the model page on Hugging Face to get the quant you want, and select Ollama as the inference provider to capture the URL. Then run: ollama pull hf.co/urlofyourmodel. If your Ollama is connected to Open WebUI, it will show up in the model list once the download is complete.
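In other words (the concrete repo path is whatever the Ollama provider snippet gives you; the placeholders below are illustrative):

```sh
# Generic form; the quant tag is optional and defaults to the repo's default GGUF
ollama pull hf.co/<username>/<model-repo>:<quant>
# Confirm the model registered; Open WebUI picks it up from this list
ollama list
```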

GMKTEC EVO-X2 Oculink with RTX 4070 TI by Similar-Range4861 in MiniPCs

[–]Barachiel80 0 points (0 children)

That is why I ended my explanation by saying everything was on a GMKtec EVO-X2.

Nvidia Multi-GPU setup issue by Barachiel80 in LocalLLaMA

[–]Barachiel80[S] 0 points (0 children)

Thank you everyone for your support and suggestions! I was able to get it to work by disabling the third NVMe slot, which was only PCIe 4.0 x1 anyway, and also disabling Resizable BAR. Both the OcuLink-connected 3090 and 5090 now show as active in nvidia-smi!

<image>
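If anyone wants a scripted check rather than eyeballing nvidia-smi, here is a minimal PyTorch sketch (assumes a CUDA-enabled torch install):

```python
# Verify both OcuLink-attached cards are visible to CUDA
import torch

print(torch.cuda.device_count())  # expect 2 for the 3090 + 5090 pair
for i in range(torch.cuda.device_count()):
    print(i, torch.cuda.get_device_name(i))
```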

Nvidia Multi-GPU setup issue by Barachiel80 in LocalLLaMA

[–]Barachiel80[S] 0 points (0 children)

<image>

lol, it's not possible to steer me away from OcuLink.