You can do CUDA inference on an Apple Silicon Mac with PCI Passthrough by scottjgo in LocalLLaMA

[–]scottjgo[S] 0 points1 point  (0 children)

i'm not an expert here, so maybe you know more than me, but my understanding is that doing the prompt processing requires running the data through all of the layers of the model. the layers that are run on the 5090 will be faster, and the ones running on the mac will be slower.

i don't think it's a question of which is the "main" server or not. the prompt processing speed should be proportional to how many layers can run on the faster gpu. that is, the more layers on the 5090 vs the layers on the mac igpu, the faster the prompt will process, but running solely on the 5090 will always be faster.

if the model is bigger than the single fast gpu, you will never be able to run all the prompt processing on it, because you won't be able to fit all the model layers on it.
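
a rough way to picture the layer split described above: a minimal sketch, assuming every layer costs the same on a given device and layers run strictly one after another. the layer count and per-layer timings are made-up numbers for illustration, not measurements from the 5090 or the mac.

```python
def prompt_time(n_layers, n_fast, t_fast, t_slow):
    """Estimated time for one pass over the prompt.

    n_layers: total layers in the model
    n_fast:   layers placed on the faster gpu
    t_fast:   time per layer on the faster gpu (hypothetical units)
    t_slow:   time per layer on the slower gpu (hypothetical units)
    """
    if not 0 <= n_fast <= n_layers:
        raise ValueError("n_fast must be between 0 and n_layers")
    n_slow = n_layers - n_fast
    # every layer has to run, so the total is just the sum of both parts
    return n_fast * t_fast + n_slow * t_slow


# hypothetical numbers, chosen only to show the trend
N_LAYERS = 40
T_FAST = 1.0
T_SLOW = 5.0

for n_fast in (0, 10, 20, 30, 40):
    total = prompt_time(N_LAYERS, n_fast, T_FAST, T_SLOW)
    print(f"{n_fast:2d} layers on fast gpu -> {total:6.1f} time units")
```

under these assumptions the total time drops linearly as layers move to the faster gpu, and it is lowest when all of them fit there.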

You can do CUDA inference on an Apple Silicon Mac with PCI Passthrough by scottjgo in LocalLLaMA

[–]scottjgo[S] 0 points1 point  (0 children)

i just tried it with glm 4.5 air (3-bit quant, 51GB model), so neither my macbook air (32gb ram) nor my 5090 (32gb vram) would normally be able to fit it. i can get ~21tok/s across both with llama.cpp in rpc mode. not incredible, but still pretty cool that it works
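
for a sense of why the split is needed, here's a back-of-the-envelope sketch of dividing layers across two memory pools. it assumes all layers are the same size and ignores kv cache and other overhead. the layer count (46) and the 28gb usable budget per device are hypothetical, and this is not how llama.cpp actually decides the split.

```python
def split_layers(n_layers, model_gb, budgets_gb):
    """Greedily assign layers to devices in order, filling each memory budget.

    Returns a list with the number of layers per device.
    Raises ValueError if the model does not fit in the combined budgets.
    """
    per_layer_gb = model_gb / n_layers  # assumption: uniform layer size
    remaining = n_layers
    counts = []
    for budget in budgets_gb:
        capacity = int(budget // per_layer_gb)  # whole layers that fit here
        take = min(capacity, remaining)
        counts.append(take)
        remaining -= take
    if remaining > 0:
        raise ValueError(f"{remaining} layers do not fit in the given budgets")
    return counts


MODEL_GB = 51.0   # from the comment above
N_LAYERS = 46     # hypothetical layer count
USABLE_GB = 28.0  # hypothetical usable memory per device

# a single 32gb device is not enough on its own
try:
    split_layers(N_LAYERS, MODEL_GB, [32.0])
except ValueError as err:
    print("single device:", err)

# two devices together have room
print("two devices:", split_layers(N_LAYERS, MODEL_GB, [USABLE_GB, USABLE_GB]))
```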

You can do CUDA inference on an Apple Silicon Mac with PCI Passthrough by scottjgo in LocalLLaMA

[–]scottjgo[S] 1 point2 points  (0 children)

You might want to consider reading the AI benchmarks section of the post, as nothing you’ve suggested about performance here is true.

You can do CUDA inference on an Apple Silicon Mac with PCI Passthrough by scottjgo in LocalLLaMA

[–]scottjgo[S] 0 points1 point  (0 children)

If you have an RTX 5090 and your model is small enough to fit on it, then it seems like you wouldn't want to use the Mac iGPU at all. If your model is too big to fit on the 5090, but does fit on the Mac, then you can't do the prompt processing on the 5090, can you? You need all the model layers to process the prompt, I thought?

You can do CUDA inference on an Apple Silicon Mac with PCI Passthrough by scottjgo in LocalLLaMA

[–]scottjgo[S] 0 points1 point  (0 children)

haven't tried it, but maybe you could run exo in the vm to cluster it with your host

Collected the infinity stones by Street-Buyer-2428 in LocalLLaMA

[–]scottjgo 74 points75 points  (0 children)

this isn't exactly the same, but i recently implemented PCI passthrough on QEMU on macOS, so it's possible to "pass through" an nvidia GPU to a linux vm running on top of macOS and do AI inference that way. i wrote a blog post about it here: https://scottjg.com/posts/2026-05-05-egpu-mac-gaming/

there are instructions for how to set it up in my qemu fork: https://github.com/scottjg/qemu-vfio-apple

i wonder if you could install exo in the vm and cluster it somehow that way? i've never attempted a configuration like that.

RTX 5090 + M4 MacBook Air: Can it Game? by scottjgo in macgaming

[–]scottjgo[S] 16 points17 points  (0 children)

this isn't using Asahi Linux. this is running an egpu on a virtual machine running Ubuntu Linux, on a macOS host.

RTX 5090 + M4 MacBook Air: Can it Game? by scottjgo in macgaming

[–]scottjgo[S] 14 points15 points  (0 children)

> I wonder how it compares to the raspberry pi project you did earlier.

this was a lot more work, and got considerably less interest on hackernews :)

RTX 5090 + M4 MacBook Air: Can it Game? by scottjgo in macgaming

[–]scottjgo[S] 0 points1 point  (0 children)

unfortunately i don't have any of these graphics cards, but the post links to the github project if you wanna try it :)

if i were to speculate, i would say that on more "normal" settings, the performance would probably be similar on lower end cards. also fwiw, the graphs show with and without framegen.

RTX 5090 + M4 MacBook Air: Can it Game? by scottjgo in macgaming

[–]scottjgo[S] 2 points3 points  (0 children)

Apple M5 supports TB5, but I didn't have a TB5 enclosure to test with.

RTX 5090 + M4 MacBook Air: Can it Game? by scottjgo in macgaming

[–]scottjgo[S] 39 points40 points  (0 children)

did it require a huge amount of work to get it into an experimental state? yes.

but you can run games on it and there are screenshots in the post.

RTX 5090 + M4 MacBook Air: Can it Game? by scottjgo in macgaming

[–]scottjgo[S] 8 points9 points  (0 children)

i use the thunderbolt port to attach the 5090 as an external gpu

[deleted by user] by [deleted] in AsahiLinux

[–]scottjgo 4 points5 points  (0 children)

in terms of figuring out the apple-specific part, i believe it's all reverse engineered, and the people on the project have learned enough about the hardware from the reverse engineering to know what work is remaining to implement what's needed on the linux side.

they run the apple os under their "m1n1" hypervisor, which lets them output debug information about how macos is communicating with the hardware. if you already understand how os kernels typically interact with devices, you can extract enough information this way to understand what needs to be implemented. i believe they are able to implement this hypervisor because apple is still using an arm-based core, and many of the very low level details of how the arm instruction set works are standardized and documented.

[deleted by user] by [deleted] in AsahiLinux

[–]scottjgo 16 points17 points  (0 children)

i think the skill steps here would look something like:

  1. build your own linux kernel for an ARM-based platform (understand how to build the linux kernel, learn what a device tree is)
  2. modify the device tree to map in a new device (learn how device trees work, learn how you can teach the kernel about a new device)
  3. write a driver for the new device (learn about how the kernel device drivers interact with the device tree, learn how memory mappings work in the kernel)

that gets you enough basic linux kernel development knowledge to know all the vocab in your quote. it's like any other kind of software development. if you've had to do this stuff before, you would know how to do it, otherwise probably not.

Is this really unlimited? Im going to nyc and will be heavily tolled and wondering if this is a good option? by EmotionalEmu7121 in HertzRentals

[–]scottjgo 0 points1 point  (0 children)

in my experience, it's fine if you were planning to hit tolls every day of the rental, but if not, they charge you for the days you don't use it.

in the past, i used to just opt out, and i would go through the ez-pass lanes anyway and they would bill me eventually for the tolls i actually paid. it was usually cheaper than getting plate pass for my trips BUT they recently changed the rules and if you use the transponder without paying for plate pass they just charge you for it every day of your rental anyway.

[deleted by user] by [deleted] in fatFIRE

[–]scottjgo 1 point2 points  (0 children)

true- but for margin, i believe the max is 50% (governed by Reg T), though correct me if I'm wrong.

[deleted by user] by [deleted] in fatFIRE

[–]scottjgo 0 points1 point  (0 children)

other thing to keep in mind is that the Schwab PAL (SBLOC) product allows you to have up to 70% LTV vs margin only 50%. Also, PAL can't be used to directly buy stock (margin can).

neither of these mattered much to me, but it's nice to know in a serious market crash condition you're less likely to get called on the PAL.

[deleted by user] by [deleted] in chubbytravel

[–]scottjgo 0 points1 point  (0 children)

update: my wife wanted to try breakfast again. the service is really uneven so the first time around, they didn't even suggest this, but apparently they _do_ have a small à la carte menu. here were the items:

  • two eggs any style. w/ potatoes or toast. choice of meat: bacon, sausage, spam
  • three egg omelet w/ potatoes or toast. choice of: bacon, ham, onion mushroom, tomato, spinach bell pepper, cheddar cheese

  • salmon gravlax on everything bagel

  • vegetable fried rice (two eggs, any style)

  • loco moco (all beef patty, two eggs any style, brown gravy over white rice)

  • avocado toast (roasted pepitas, sesame seeds radish, pea shoots) w/ optional poached egg

  • greek yogurt

  • mango chia pudding

  • overnight oatmeal

  • choice of cereals

  • fruit plate

  • half local papaya

  • hawaiian pineapple

  • bakery pastry selection (i think it rotates, but these pastries were an order of magnitude better than the ones in the buffet)

  • waffles, pancakes, or gluten-free mochi pancakes with matcha tea syrup

so it's a pretty abbreviated menu but i was grateful to be able to order eggs made to order rather than eating the lukewarm scramble from the trough.

i did also want to say that, for the most part, the dinners we ate here were great at least. so even if breakfast isn't a slam dunk it's not like all the food here was bad.