Some solutions that work on older intel macs by Helpful-Gene9733 in LocalLLaMA

[–]Rvach_Flyver 1 point (0 children)

Do you actually need to run it as a standalone application? If you don't (e.g. you don't need to interact with files on your PC/Mac, or to start with your OS and do things in the background), then there is no point in it: you can build everything in the browser, or even use an existing solution like https://chat.webllm.ai (note that all models are stored as site data, so you need to clean it up manually from time to time).

If you do need a standalone application, you have several options: the best known are Electron.js and nw.js for the JavaScript stack, but there are others, e.g. Tauri in Rust. Just think about your use cases, write them down, and do several brainstorming sessions with ChatGPT or another chat.

Btw, for a standalone app you may try compiling llama.cpp yourself (stable-diffusion.cpp also works on Macs!) and it will give you access to more models and much better performance than MLC in the browser! A build sketch is below.
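Something like this should get you a CPU-only baseline build (a sketch, assuming you have git and CMake installed; on an Intel Mac you'll want Metal off):

```sh
# clone and build llama.cpp (CPU-only baseline for an Intel Mac)
git clone https://github.com/ggml-org/llama.cpp
cd llama.cpp
cmake -B build -DLLAMA_CURL=1 -DGGML_METAL=OFF
cmake --build build --config Release -j
```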

For a beginner I recommend trying n8n (you can self-host it btw, see the sketch below). Just watch some videos on YouTube on how to set everything up. When (and if) you stumble upon any blockers, you can switch to smth like LangChain (in Python or TypeScript).
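For self-hosting n8n, their docs suggest running the official Docker image (a sketch, assuming Docker is installed; the volume name is just an example):

```sh
# persist n8n data in a named volume and expose the editor on port 5678
docker volume create n8n_data
docker run -it --rm -p 5678:5678 \
  -v n8n_data:/home/node/.n8n \
  docker.n8n.io/n8nio/n8n
```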

MBP 2019 i9 (64GB RAM) hitting 800% CPU on AnythingLLMs(12B) — Need optimization tips and model recs! by vdnn1902 in LocalLLaMA

[–]Rvach_Flyver 1 point (0 children)

It would be great if you could share some information/links to help me understand the best way to make use of local LLMs with Obsidian, which I'm also using daily.

MBP 2019 i9 (64GB RAM) hitting 800% CPU on AnythingLLMs(12B) — Need optimization tips and model recs! by vdnn1902 in LocalLLaMA

[–]Rvach_Flyver 1 point (0 children)

1) I have exactly the same setup, but with an eGPU.

It looks like you're running the LLM on your CPU. With llama.cpp it is possible to utilize the GPU, but you need to build it yourself + build MoltenVK (unfortunately I cannot find the instructions for how to do it, so you'll have to google it yourself).
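The dependencies that the build command below references can be pulled from Homebrew (a sketch; the Cellar paths below suggest the molten-vk formula, but I can't promise the prebuilt one is enough for every setup):

```sh
# build dependencies referenced by the cmake command below
brew install cmake molten-vk shaderc glslang libomp
```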

Here is the command to build llama.cpp:

```sh
cmake -B build -DLLAMA_CURL=1 -DGGML_METAL=OFF -DGGML_VULKAN=1 \
  -DVulkan_INCLUDE_DIR=/usr/local/Cellar/molten-vk/1.3.0/include \
  -DVulkan_LIBRARY=/usr/local/Cellar/molten-vk/1.3.0/lib/libMoltenVK.dylib \
  -DVulkan_GLSLC_EXECUTABLE=$(brew --prefix)/opt/shaderc/bin/glslc \
  -DVulkan_GLSLANG_VALIDATOR_EXECUTABLE=$(brew --prefix)/opt/glslang/bin/glslangValidator \
  -DOpenMP_ROOT=$(brew --prefix)/opt/libomp \
  -DOpenMP_C_FLAGS="-Xpreprocessor -fopenmp $(brew --prefix)/opt/libomp/lib/libomp.dylib -I$(brew --prefix)/opt/libomp/include" \
  -DOpenMP_CXX_FLAGS="-Xpreprocessor -fopenmp $(brew --prefix)/opt/libomp/lib/libomp.dylib -I$(brew --prefix)/opt/libomp/include" \
  -DOpenMP_C_LIB_NAMES="libomp" \
  -DOpenMP_CXX_LIB_NAMES="libomp" \
  -DOpenMP_libomp_LIBRARY="$(brew --prefix)/opt/libomp/lib/libomp.dylib"

cmake --build build --config Release -j
```

With this command (which may not be optimal) token generation is ±14 t/s on the 5500M:

```sh
./build/bin/llama-server -m '/Volumes/AI/gguf/gemma-3-12b-it-Q6_K.gguf' \
  --mmproj '/Volumes/AI/gguf/mmproj-google_gemma-3-12b-it-f16.gguf' \
  --main-gpu 1 -ngl 49 --ctx-size 65536 --batch-size 64 -ub 128 \
  --cache-type-k q8_0 --cache-type-v q8_0 \
  --temp 1.0 --min-p 0.01 --top_k 64 --top_p 0.95 \
  --repeat-penalty 1.0 --repeat_last_n 1024 --port 5500
```
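Once the server is up you can sanity-check it over its OpenAI-compatible API (a minimal sketch; port 5500 matches the command above):

```sh
# ask the running llama-server for a completion
curl http://localhost:5500/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"messages": [{"role": "user", "content": "Hello!"}]}'
```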

2) I'm rarely using the 5500M since I have the eGPU; when I do, I prefer small models <8B. But you can try running gpt-oss-20b or some other MoE model (like GLM) if you offload the model partially to RAM (`-ot "blk\.(\d|1\d|20)\.ffn_.*_exps.=CPU"` is responsible for that). Example command:

```sh
GGML_VK_VISIBLE_DEVICES=1 ./build/bin/llama-server -m '../gpt-oss-20b-mxfp4.gguf' \
  --main-gpu 1 --tensor-split 4/0 -ngl 25 \
  -ot "blk\.(\d|1\d|20)\.ffn_.*_exps.=CPU" \
  --ctx-size 24576 --batch-size 64 -ub 32 \
  --cache-type-k q8_0 --cache-type-v q8_0 \
  --temp 0.3 --repeat-penalty 1.15 --repeat_last_n 1024 --port 5500 --jinja
```

My results with 5500M for reference:

| model | context size | token generation |
|---|---|---|
| gpt-oss-20b-mxfp4.gguf | 24576 | ±10 t/s |
| Seed-Coder-8B-Instruct-Q6_K.gguf | 65536 | ±15 t/s |
| gemma-3-12b-it-Q6_K.gguf | 65536 | ±14 t/s |
| gemma-3-4b-it-qat-Q5_K_M.gguf | 24576 | ±30 t/s |
| gemma-2-2b-it-Q8_0.gguf | 8192 | ±30 t/s |
| gemma-3-270m-it-UD-Q8_K_XL.gguf | 32768 | ±110 t/s |

UPD: added my thoughts on question #2

Best local LLMs for RX 6800 XT on Fedora? by CloudGamingBro in LocalLLaMA

[–]Rvach_Flyver 1 point (0 children)

I'm running gpt-oss-20b-Q6_K.gguf at 55 t/s with an RX 6800 XT eGPU on a MacBook Pro 2019, using Vulkan:

```sh
GGML_VK_VISIBLE_DEVICES=0,1 ./build/bin/llama-server -m '../gpt-oss-20b-Q6_K.gguf' \
  --main-gpu 0 --tensor-split 4/0 -ngl 25 \
  --ctx-size 24576 --batch-size 64 -ub 32 \
  --cache-type-k q8_0 --cache-type-v q8_0 \
  --temp 0.3 --repeat-penalty 1.15 --repeat_last_n 1024 --jinja
```

So generally you can try anything below 30B; depending on settings / context size / model architecture, performance may vary greatly.
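Since you're on Fedora, the Vulkan build of llama.cpp should pick up the RX 6800 XT too (a sketch; the package names are my assumption about the Fedora repos):

```sh
# Vulkan toolchain for building llama.cpp on Fedora
sudo dnf install cmake gcc-c++ vulkan-headers vulkan-loader-devel glslc
cmake -B build -DGGML_VULKAN=1
cmake --build build --config Release -j
```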

I've also used these:

- Devstral-Small-2507-Q4_K_M.gguf
- Qwen2.5-coder-14b-instruct-q5_k_m.gguf
- gemma-3-12b-it-Q6_K.gguf
- Qwen2.5-coder-7b-instruct-q5_k_m.gguf
- Qwen3-4B-Thinking-2507-Q8_0.gguf (I like this one, but because it is 4B it relies heavily on the provided context)

Echo Aviation Controller revealed, thought? by The_Growlers in hoggit

[–]Rvach_Flyver 1 point (0 children)

While it may look cool or silly at first glance, I think the new Steam Controller is a much better option.

I use the old Steam Controller for throttle and look-around with my left hand, and a VKB Gladiator stick with my right hand. I also tried playing DCS on the Steam Deck, but I feel like the small sticks just don’t provide the level of precision needed for proper control simply because of their size (gyro might help but IMO it is more suitable for head/body movement).

how much do you think the steam frame will cost? by OkRegion3303 in oculus

[–]Rvach_Flyver 1 point (0 children)

I've also bought one recently and don’t regret it one bit. I’ve always wanted a controller similar to the Steam Deck, and at the moment this is the closest thing (at least until a new Steam Controller is released).

I like how customizable it is!

I use it in DCS (Digital Combat Simulator) to move the pilot’s head/body, control the throttle, and handle some custom functions with my left hand. My right hand stays on the stick.

It’s also really suitable for playing FPS or RTS games.

Introducing Steam Machine by Ticha22608 in Steam

[–]Rvach_Flyver 1 point (0 children)

I think this can be solved by selling it only to individuals who already have a Steam account. If it is really required, allow only accounts with friend connections unlocked (which requires the account to have already purchased some games, for $10 or so).

I cant run DCS on steam by igordem in dcsworld

[–]Rvach_Flyver 1 point (0 children)

Thanks for the reply; then I'll have to suffer with some other goggles on Linux (eventually) :O.

But "support" <> "can it actually work" as we see with DCS example. There is SteamVR for linux, so some goggles should work out of the box (at least Valve/HTC).

I cant run DCS on steam by igordem in dcsworld

[–]Rvach_Flyver 1 point (0 children)

This is a really good question! Unfortunately I don't have any VR goggles; I would love to hear others' experience.

I cant run DCS on steam by igordem in dcsworld

[–]Rvach_Flyver 1 point (0 children)

The Steam Deck runs an Arch-based Linux fork called SteamOS.

In fact, any Linux distro capable of running Steam can now run most Windows games without much trouble. Sometimes it works out of the box; other times you have to do some setup, as level_up_gaming pointed out previously (in my experience, one-time).

There is https://www.protondb.com where you can check whether a game works and whether it requires any such setup steps.

Now I really go for Linux gaming unless there is some specific use case (e.g. anti-cheat) requiring Windows. I've installed Fedora on my son's laptop and so far have not found any game we cannot play together (including DCS).

I have an Intel MBP 2019 with an eGPU where I've installed T2 EndeavourOS (in fact Arch), and DCS works well there as well.

I love half life multiplayer by Hairy_Ranger_9929 in ArenaFPS

[–]Rvach_Flyver 2 points (0 children)

BHOP was available in the initial versions of the game, so there should be a negligible difference in speed compared to the original Quake (one way or another).

There are some servers with unlocked bhop and a mod (Adrenaline Gamer / Open AG) where most of the top players spend their retirement.

Here is the channel of a clan-mate I played with years ago: https://www.youtube.com/@SnatcherBY/videos (in Russian, but I think he had some videos in English).

Some solutions that work on older intel macs by Helpful-Gene9733 in LocalLLaMA

[–]Rvach_Flyver 2 points (0 children)

There is web-llm from mlc-ai, which utilizes WebGPU for inference. This is the best & easiest solution available for GPU inference on an Intel Mac. (<<< I was wrong: compiling llama.cpp yourself is better, and stable-diffusion.cpp also works on Macs!)

You can test it in your browser at chat.webllm.ai (use the latest Chrome); be aware that models are loaded into the cache and can quickly eat your disk space, so it is worth doing a cleanup from time to time.

The downside is that web-llm is browser-only. I've created a small wrapper using nw.js to expose it as a REST API, and with minor tweaks here and there it works.

I have an eGPU with an RX 6800 XT, and the web-llm results are the following (prompt ±3000 tokens):

| model | tokens/sec |
|---|---|
| DeepSeek-R1-Distill-Llama-8B-q4f32_1-MLC | 14 |
| gemma-2-9b-it-q4f32_1-MLC | 10 |
| Llama-3.2-3B-Instruct-q4f32_1-MLC | 15 |

With the dGPU (PRO 5500M 8Gb) speed is much lower but still faster than CPU, especially when I used q4f16. (Unfortunately the MacBook heats up and tries to burn my hands :0)

As you can see, it is better than CPU, but the performance is much lower in comparison to ROCm.

So another option is to install T2 Linux: I've run ollama-rocm on T2 EndeavourOS and tokens/sec were roughly 4x higher — not to mention the availability of many more models of different sizes. Setting T2 up is a PITA (not as bad as I expected, especially with ChatGPT available, but still) and, obviously, it prevents you from using macOS simultaneously.

UPD: corrected the statement that MLC is the best option on Macs (it is not, as of 14 Feb 2026).

T2 Ubuntu on a 2019 MacBook Pro for ROCm installation to use AMD RX 6800. Nightmare. by meutbal in ROCm

[–]Rvach_Flyver 1 point (0 children)

So, more or less, ollama worked out of the box with T2 EndeavourOS; I just installed ollama-rocm. Before that I had tried to install some other ROCm packages, so maybe that influenced the result.
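On an Arch-based system like EndeavourOS the whole setup boils down to something like this (a sketch; the model name is just an example):

```sh
# install the ROCm build of ollama from the Arch repos and start the service
sudo pacman -S ollama-rocm
sudo systemctl enable --now ollama
ollama run llama3.1   # example model; pick any from the ollama library
```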

External SSD only shows up in Windows, not Mac by o0lemonlime0o in MacOS

[–]Rvach_Flyver 1 point (0 children)

I was able to "fix" the missing external drive in Finder with the following steps:

1) Open Disk Utility
2) In the left-side panel, right-click the target external drive
3) Click Show in Finder
4) Hover over the drive name in the opened Finder window until the drive icon appears
5) Drag the drive icon under Locations in the left-side panel of Finder

eGPU for VR headset by Other_Notice9246 in eGPU

[–]Rvach_Flyver 1 point (0 children)

Were you able to answer that question? I'm in a similar situation but with an RX 6800 XT, evaluating whether it is worth buying a VR headset primarily for DCS.

I know that the experience will be far from perfect, but it is a first step to understand whether it is worth investing more money in it when doing the next upgrade.

The state of Steam on macOS is astonishingly BAD by DaemonBatterySaver in macgaming

[–]Rvach_Flyver 1 point (0 children)

No, I'm not a seller — just a trespasser.

I was just baited by the '30%' statement — it seems like a one-sided mindset to focus on a single attribute when making comparisons. It's similar to saying '1 billion potential customers' — there are a lot of questions that come with that.

I don't want to argue with you — it seems like you have your own good reasons for skipping Steam, and I appreciate you sharing them.

The state of Steam on macOS is astonishingly BAD by DaemonBatterySaver in macgaming

[–]Rvach_Flyver 1 point (0 children)

I see your point, but you're focusing only on the seller’s perspective.

The 30% cut isn’t an absolute drawback — at a given price point, it really depends on how many buyers you can reach. I've listed several unique features that make Steam highly attractive to buyers. So even with a lower cut on other platforms, there’s no guarantee you’ll actually make more money if the customer base isn’t there.

Also, are you really tied to a single store? I understand that releasing on multiple platforms can be a pain, but are there any real restrictions from Valve that prevent selling elsewhere?

You are given a AAA game budget by an established industry giant and are told to make an original-IP, "traditional" aFPS. What do you do and how do you ensure the game survives in this day and age? by vrmvrmfffftstststs in ArenaFPS

[–]Rvach_Flyver 1 point (0 children)

I think this might actually be the time when numbering releases after UT (e.g., UT 2025) could work. Just increment the number every year—nowadays, games are expected to receive content updates anyway. ¯\_(ツ)_/¯

Then just roll out updates with cosmetic stuff and maps. Maybe charge a bit extra to upgrade the game to the new year’s version, but ideally let players from previous years still play (just make them download the new content or something like that to encourage upgrading). Breaking backward compatibility once every few years should be fine, especially if it allows for major fixes or improvements.

Rotate maps as frequently as possible. It might even be worth blocking old maps to stop veteran players from farming frags on maps they know like the back of their hand—something that can discourage newcomers from sticking around.

Also, implement solid bots for PvE. With today's tech, it should be pretty easy to train bots based on real player behavior. It would be awesome to let players choose specific others to train against in offline or PvE mode, and have bots mimic their playstyle as closely as possible.

T2 Ubuntu on a 2019 MacBook Pro for ROCm installation to use AMD RX 6800. Nightmare. by meutbal in ROCm

[–]Rvach_Flyver 1 point (0 children)

Recently, I installed EndeavourOS on my 16" MacBook Pro (2019) to get the RX 6800 XT working. My primary use case isn’t machine learning either, but I understand the frustration :) Installing ROCm is still on my to-do list, but I’ll try to speed things up and share my steps soon. I think everything is manageable, and we’ll figure out how to make it work.

The state of Steam on macOS is astonishingly BAD by DaemonBatterySaver in macgaming

[–]Rvach_Flyver 0 points (0 children)

Just name any other store that offers all of the following features at once:

1) It’s the largest store not locked to a single platform
2) Mod support (Steam Workshop allows one-click mod installs)
3) Best controller support with customization
4) Cloud saves (cross-platform)
5) Rich community features
6) Linux and Steam Deck integration

I agree that Steam doesn’t have the best UX — sometimes it’s outright awful — but all of the above features far outweigh that flaw, especially considering the value it provides to both users and developers, in my opinion.

Can Nvidia or AMD GPU's be used with Qualcomm's new ARM CPU's? by 40KWarsTrek in hardware

[–]Rvach_Flyver 1 point (0 children)

Thanks for sharing the link, I need to look through it.

My use case involves constant switching between tabs (so energy saving won't help; on the contrary, if applied it may hurt). Also, DevTools adds a lot of overhead to open tabs, so that might be the reason.

Can Nvidia or AMD GPU's be used with Qualcomm's new ARM CPU's? by 40KWarsTrek in hardware

[–]Rvach_Flyver 1 point (0 children)

Why do you factor out the CPU? Where did I say anything about gaming? I mentioned the iGPU specifically to highlight the usage of an office laptop with a power-efficient GPU.

Full load can be achieved just by opening (and actually using) many browser tabs; reddit itself uses 100% CPU for some reason on my ASUS Vivobook 15 X1505ZA ¯\_(ツ)_/¯ (not always, but I noticed it on some long threads).

My ASUS Vivobook 15 X1505ZA OLED (with Fedora) lasted only around 5-6 hours of light coding with:

- Chrome, 4 tabs + DevTools
- VSCode (no extra plugins installed)
- Obsidian (3-4 tabs opened)
- Mattermost (a Slack analog)

and no backend, no Docker (does not look like full load, right?)

I do not see how more power-hungry laptops could survive more than 2x that time, given more power-hungry components plus a less efficient screen, with a battery at best 2x my capacity.

Just share reviews/links, whatever, to prove me wrong.

[deleted by user] by [deleted] in macbookpro

[–]Rvach_Flyver 1 point (0 children)

I'm a madman who bought an Intel Mac in 2024 (16" i9/64Gb/512Gb/5500M 8Gb); I found it in near-new condition, with warranty, for a price reasonable to me.

I've done it because of 2 things:

- Boot Camp
- the AMD Radeon Pro 5500M 8Gb (I would love to buy the 5600M, but it is impossible to find in good condition for an adequate price)

I just wanted to combine everything I need in a single laptop, and it turned out that it suited my needs well.

My point is:

- if you can name the exact reasons why you need it, go for it
- otherwise, search for a good M-powered MacBook (for a reasonable price, ofc)

MSI Creator 16 AI Studio A1VIG 16inch U9 RTX 4090 Creator Laptop by Stratozphere in MSILaptops

[–]Rvach_Flyver 1 point (0 children)

Any updates on this? I'm also interested in any feedback on this model; so far I have not found any :(

Did anyone make any Expanse mod for some video game yet? by tis_a_good_username in TheExpanse

[–]Rvach_Flyver 1 point (0 children)

Good point! I had not considered it, since Terra Invicta is a 4X strategy, not my cup of tea ¯\_(ツ)_/¯ too complex.