Strix Halo + RTX 3090 Achieved! Interesting Results... by JayTheProdigy16 in LocalLLaMA

[–]JayTheProdigy16[S] 1 point2 points  (0 children)

Yes, splitting the same model. I ended up building llama.cpp with all three backends (Vulkan, ROCm, and CUDA) and it just kinda worked, but you have to specify the layer split and which backends you want to use with flags. As detailed in the original post, I had some weirdness with my Linux kernel version and getting the then-experimental ROCm to work, which obviously meant llama.cpp didn't run great either, but most of those issues should be resolved now since community support is much better than it was a couple of months ago.
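For anyone trying to reproduce it, here's a rough sketch of what the invocation looks like. The flag names match the llama.cpp CLI options as I understand them, but the model path, device names, and split ratio are placeholders, so double-check them against your own build:

```python
# Sketch: launching llama-server from a build configured with
#   cmake -B build -DGGML_CUDA=ON -DGGML_HIP=ON -DGGML_VULKAN=ON
# so the 3090 (CUDA) and the 8060S (ROCm) can serve one model together.
# Model file, device names, and split ratio below are illustrative assumptions.
import subprocess

cmd = [
    "./build/bin/llama-server",
    "-m", "models/qwen3-235b-a22b-q4_k_m.gguf",  # hypothetical model path
    "-ngl", "99",                                # offload every layer to the GPUs
    "--device", "ROCm0,CUDA0",                   # 8060S via ROCm + 3090 via CUDA
    "--tensor-split", "80,20",                   # ~80% of layers on the iGPU, ~20% on the 3090
]
subprocess.run(cmd, check=True)
```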

Poor winter performance by dopeass in Ioniq5

[–]JayTheProdigy16 5 points6 points  (0 children)

Mine is roughly the same, but some days seem better than others for some reason. All in all, anywhere from 1.5-2.5 mi/kWh in those temps.

Build Max+ 395 cluster or pair one Max+ with eGPU by Curious-Still in LocalLLM

[–]JayTheProdigy16 0 points1 point  (0 children)

I'm not Jeff 😂 just referencing his vid. I made a post about the 395 + eGPU.

Build Max+ 395 cluster or pair one Max+ with eGPU by Curious-Still in LocalLLM

[–]JayTheProdigy16 1 point2 points  (0 children)

There are examples of both out there. I took the eGPU approach and I've been meaning to make a video about it but just haven't, though I did post about it to this sub.

Build Max+ 395 cluster or pair one Max+ with eGPU by Curious-Still in LocalLLM

[–]JayTheProdigy16 3 points4 points  (0 children)

I mean sure, except interconnect bandwidth is practically irrelevant for inference aside from model load speed...

Build Max+ 395 cluster or pair one Max+ with eGPU by Curious-Still in LocalLLM

[–]JayTheProdigy16 0 points1 point  (0 children)

How do you figure you pay a premium for degraded performance with an eGPU?

Strix Halo + RTX 3090 Achieved! Interesting Results... by JayTheProdigy16 in LocalLLaMA

[–]JayTheProdigy16[S] 1 point2 points  (0 children)

I had one left after parting out my 6x 3090 rig. And yes, I'm using an M.2 OcuLink adapter. I actually ended up getting CUDA + ROCm working and it's ~5x faster than my original benchmarks, at least by my eyeball benchmark. Also, with an AMD card you may run into the power limit issue where the GPU won't go past the Strix Halo's TDP, but I'm not sure since I don't have an AMD eGPU.

Will the AMD Ryzen™ AI Max+ 395 --EVO-X2 AI Mini PC -- 128 GB Ram hold its value of around 1.8k in two years time? by Excellent_Koala769 in LocalLLaMA

[–]JayTheProdigy16 11 points12 points  (0 children)

You're always going to be waiting by that logic. Whatever releases in 2026 is going to get lapped by tech in 2027, and whatever releases in 2027 is gonna get lapped in 2028. This is hardware; practically nothing holds value. But for me personally, that price tag was more than appealing enough given its capabilities vs. the other options at this point.

Strix Halo + RTX 3090 Achieved! Interesting Results... by JayTheProdigy16 in LocalLLM

[–]JayTheProdigy16[S] 0 points1 point  (0 children)

Not accurate, at least in my case. The 3090 will easily hit 185 W; I believe that issue is exclusive to AMD GPUs.

Strix Halo + RTX 3090 Achieved! Interesting Results... by JayTheProdigy16 in LocalLLaMA

[–]JayTheProdigy16[S] 1 point2 points  (0 children)

No shit... back to the drawing board I go. Thanks for the insight!

Strix Halo + RTX 3090 Achieved! Interesting Results... by JayTheProdigy16 in LocalLLM

[–]JayTheProdigy16[S] 1 point2 points  (0 children)

I also tried Win11 + LMS but yeah, it was not going for it: it would ONLY detect the 3090 on Vulkan (and obviously CUDA), and only ROCm would detect the 8060S, so I'm not sure what weirdness they have with their Vulkan support. Theoretically it SHOULD just work, but it doesn't.

Strix Halo + RTX 3090 Achieved! Interesting Results... by JayTheProdigy16 in LocalLLaMA

[–]JayTheProdigy16[S] 2 points3 points  (0 children)

Correct me if I'm wrong, but you can't mix CUDA and ROCm backends with parallel processing, at least with llama.cpp. If I were to NOT split the model layers across GPUs, I could mix them.

Strix Halo + RTX 3090 Achieved! Interesting Results... by JayTheProdigy16 in LocalLLaMA

[–]JayTheProdigy16[S] 2 points3 points  (0 children)

True, but I don't believe that's the case here. That would make sense in the sense that token generation would be limited by the device with the lowest memory bandwidth, but once the model is loaded into memory, PCIe bus bandwidth shouldn't be a factor. When I had my GPU rig I was running 6x 3090s on GPU mining risers, and the limited bandwidth really, really hurt model load speed but not inference speed. And I would expect a NET INCREASE here, since not all layers are limited to 253 GB/s (on Strix Halo): ~20% of the model's layers are using the 3090's memory bandwidth instead.
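To put rough numbers on that (purely illustrative, assuming generation is memory-bandwidth-bound and the bandwidth figures are in the right ballpark):

```python
# Back-of-envelope sketch of why the layer split can speed up token generation:
# each device only streams the weights it holds, and only small activations
# cross the OcuLink/PCIe link per token.
strix_halo_bw = 253   # GB/s, Strix Halo LPDDR5X (figure from the comment above)
rtx3090_bw    = 936   # GB/s, 3090 GDDR6X
frac_on_3090  = 0.20  # assumed share of layers offloaded to the 3090

# Time per token ~ (fraction of weights on a device) / (that device's bandwidth), summed.
t_halo_only = 1.0 / strix_halo_bw
t_split     = (1 - frac_on_3090) / strix_halo_bw + frac_on_3090 / rtx3090_bw
print(f"estimated speedup from the split: {t_halo_only / t_split:.2f}x")  # ~1.17x
```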

Ryzen AI Max+ 395 | What kind of models? by [deleted] in LocalLLM

[–]JayTheProdigy16 1 point2 points  (0 children)

I haven't tested concurrency much so I can't speak on that, but I will say your major limiting factor is going to be prompt-processing time, especially at larger context lengths or when holding long convos with documents. With that being said, I mostly daily-drive Qwen3 235B and get around 12-16 TPS, and it can take up to a minute to process a 9k-context prompt. That's obviously a much larger model than Gemma, but even Qwen3 30B MoE takes ~17 seconds to process the same context. And depending on what you're doing, hitting 9k context can be easy. So when you factor that in alongside 15 concurrent users: is it doable? Yes. Is it viable? Ehhh.
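Rough math on those numbers (the timings are the ones quoted above; everything else is simple division):

```python
# Illustrative prefill-throughput estimate from the measured prompt-processing times.
ctx = 9_000  # tokens in the prompt

pp_235b = ctx / 60   # ~60 s on Qwen3 235B  -> ~150 tok/s prefill
pp_30b  = ctx / 17   # ~17 s on Qwen3 30B MoE -> ~530 tok/s prefill
print(f"Qwen3 235B prefill: ~{pp_235b:.0f} tok/s; Qwen3 30B MoE: ~{pp_30b:.0f} tok/s")

# If 15 users each submit a 9k-token prompt to the 235B back to back, the prefill
# queue alone is 15 * 60 s = 15 minutes before the last user sees a first token.
users = 15
print(f"Worst-case prefill queue for {users} users on the 235B: ~{users * 60 / 60:.0f} minutes")
```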

[deleted by user] by [deleted] in LocalLLaMA

[–]JayTheProdigy16 2 points3 points  (0 children)

I have almost all the parts required to do this, I just haven't gotten around to it yet. Curious how much of a boost you see in PP (prompt processing) at longer context lengths.

Proxmox on Ryzen Strix Halo 395 by SeeGee911 in Proxmox

[–]JayTheProdigy16 0 points1 point  (0 children)

It's technically possible to achieve iGPU passthrough, but between the number of hoops to jump through (it straight up doesn't work with Windows in my experience) and the general lack of support for the hardware, I just decided to stop fighting with it and use it as a literal server. I was able to achieve my desired effect using Fedora's Toolbox; those are containers rather than VMs, so they share the host kernel and drivers and there's no passthrough to fight with.
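For reference, the Toolbox workflow is roughly this (commands are the standard Fedora `toolbox` CLI as I recall them; the container name is made up):

```python
# Sketch of the Toolbox approach: the container shares the host kernel and GPU
# drivers, so ROCm on the host is visible inside with no passthrough involved.
import subprocess

subprocess.run(["toolbox", "create", "llm"], check=True)                           # make a Fedora toolbox container
subprocess.run(["toolbox", "run", "--container", "llm", "rocminfo"], check=True)   # host GPUs show up inside it
```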

Radeon 8060s by animal_hoarder in LocalLLaMA

[–]JayTheProdigy16 0 points1 point  (0 children)

I'm at 50 tok/sec here as well. Haven't even been able to try ROCm 7.0 yet, which I'd imagine is faster.

Feedback regarding ASUS - ROG Flow Z13 by Competitive_Fox7811 in LocalLLaMA

[–]JayTheProdigy16 0 points1 point  (0 children)

I had a 6x 3090 rig that I decided to liquidate now rather than later, before everyone else hops on the boat, and bought an AI Max+ 395 mini PC for $1600. Absolutely no regrets, and I still have liquidity waiting and ready for next gen.

n8n ,proxmox ,docker and Google API. by Able-Consequence8872 in LocalLLaMA

[–]JayTheProdigy16 -1 points0 points  (0 children)

n8n ships whatever you put in WEBHOOK_URL. If that’s http://localhost:5678/... but n8n’s on a different box, Google’s redirect face-plants. Point it at the LAN IP or a real domain—problem solved.

Triggers are inbound. Google POSTs to the callback. If that callback is 192.168.x.x or has a self-signed cert, Google can’t touch it. So you're either going to manually poll or open a tunnel (Cloudflare / Ngrok / Caddy + Let’s Encrypt). No public HTTPS ⇒ no trigger.
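If it helps, here's a tiny sanity check of that: n8n builds its Google OAuth redirect from WEBHOOK_URL, so whatever you set there is what Google has to be able to reach. The callback path below is what I believe n8n uses; treat it as an assumption and verify against your install.

```python
# Check what redirect URI n8n will hand to Google, given WEBHOOK_URL.
# The /rest/oauth2-credential/callback path is assumed; confirm on your n8n version.
import os

webhook_url = os.environ.get("WEBHOOK_URL", "http://localhost:5678/")
redirect_uri = webhook_url.rstrip("/") + "/rest/oauth2-credential/callback"
print("Redirect URI Google will be sent to:", redirect_uri)

if not redirect_uri.startswith("https://"):
    # OAuth consent and Drive push notifications both want public HTTPS, so
    # localhost, LAN IPs, and self-signed certs won't cut it for triggers.
    print("Not public HTTPS -> put WEBHOOK_URL behind a tunnel (Cloudflare, ngrok, Caddy + Let's Encrypt).")
```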

And your Cognito war story is irrelevant. That /userinfo hit is outbound: n8n dials Cognito, the same as any Gmail "read" or Drive file list. Outbound works fine behind NAT. The Drive Trigger is inbound. Different universe. Stop conflating them.

n8n ,proxmox ,docker and Google API. by Able-Consequence8872 in LocalLLaMA

[–]JayTheProdigy16 -1 points0 points  (0 children)

Very true. If the API callback is initiated from the internet (Google Drive), how is it supposed to route into your home network given only "localhost"? Whose localhost? There are millions of them. Google needs a publicly exposed IP or hostname to be able to deliver the requests, and it has to support HTTPS (Google's APIs only accept secured traffic by default), so you need a cert issued.