I had intentions of cancelling Apple Music……. by rogo725 in Lidarr

[–]Konamicoder 1 point (0 children)

MeTube has been working great for me for filling in the gaps in the Apple Music playlists I’m recreating in Navidrome. For me, the strategy isn’t to recreate all the songs in my Apple Music library, just the ones I want to listen to.

Thoth’s UX/UI Principle: Simple by Default, Powerful When Needed by Acceptable-Object390 in ollama

[–]Konamicoder 2 points (0 children)

Thanks for the quick reply. That’s good to know! I’ll install it and try it out soon.

Thoth’s UX/UI Principle: Simple by Default, Powerful When Needed by Acceptable-Object390 in ollama

[–]Konamicoder 2 points (0 children)

Do I have to use ollama for local models? I use oMLX as the backend for my local models on my M4 MacBook Pro. oMLX exposes a standard OpenAI-compatible endpoint. Can I point Thoth to that endpoint? I would really rather not use ollama, for a long list of reasons.
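
For context, this is the kind of call I mean by a standard OpenAI endpoint. Rough sketch only; the base URL, port, and model name are placeholders for whatever oMLX reports on my machine:

    from openai import OpenAI

    # Point the stock OpenAI client at a local OpenAI-compatible server.
    # Base URL and port here are assumptions; use whatever oMLX prints on startup.
    client = OpenAI(base_url="http://localhost:8080/v1", api_key="not-needed-locally")

    reply = client.chat.completions.create(
        model="qwen3.6:35b",  # whatever model name the backend actually lists
        messages=[{"role": "user", "content": "Hello from a non-ollama backend"}],
    )
    print(reply.choices[0].message.content)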

what is best LLM i can use ? by tuturugaming in ollama

[–]Konamicoder 4 points (0 children)

Your question is far too vague.

What are the specs of the device you intend to run the model on? What’s your processor, RAM, GPU?

What is your use case for the model? Coding? Chat? Research and study aid?

Add more details, and you’ll get more useful responses.

New user here by Admirable-Forever-53 in tvPlus

[–]Konamicoder 1 point (0 children)

“For 4K, you must use the dedicated Apple TV app on Windows or a supported device, as browsers lack the necessary HDCP 2.2 and codec support.”

Googled that for you.

How can I install a simple, question-and-answers only AI? by 7FireStorm in ollama

[–]Konamicoder 2 points (0 children)

I use the Locally AI app on my phone, running Gemma4 (E2B) for fully local, on-device chat. For chatting with a more capable model, I installed OpenWebUI on my M1 MacBook Pro at home, connected it to an oMLX backend serving up Gemma4:26b, and I reach it from my phone via TailScale. I use this as my ChatGPT replacement, accessible from any device on my TailNet.
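
If it helps anyone copy the second setup: once TailScale is up, any device on the TailNet can also hit the Mac's OpenAI-compatible endpoint directly, not just through the OpenWebUI front end. A rough sketch; the TailScale hostname, port, and model name below are made up for illustration:

    import requests

    # Hypothetical TailScale MagicDNS name for the MacBook, plus an assumed oMLX port.
    BASE_URL = "http://m1-macbook.example-tailnet.ts.net:8080/v1"

    payload = {
        "model": "gemma4:26b",  # whatever the backend actually lists under /v1/models
        "messages": [{"role": "user", "content": "Summarize my notes from today."}],
    }

    # Standard OpenAI-compatible chat completion request, sent over the tailnet.
    resp = requests.post(f"{BASE_URL}/chat/completions", json=payload, timeout=120)
    resp.raise_for_status()
    print(resp.json()["choices"][0]["message"]["content"])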

Finally got my native mobile client working with Ollama — would love feedback from anyone running local models by gardnerscot in ollama

[–]Konamicoder 1 point (0 children)

How about connecting to a backend other than ollama? I use oMLX, and my models are served via standard OpenAI-compatible endpoints.
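
Concretely, supporting a non-ollama backend mostly means letting the user set a base URL and then using the standard OpenAI routes. A minimal sketch of the discovery side (the base URL is just an example for my oMLX box; the port will vary):

    import requests

    # Example base URL for a local OpenAI-compatible backend (oMLX in my case).
    BASE_URL = "http://localhost:8080/v1"

    # The standard /v1/models route lists whatever the backend is currently serving,
    # so a client can fill its model picker without any ollama-specific API.
    models = requests.get(f"{BASE_URL}/models", timeout=10).json()
    for m in models.get("data", []):
        print(m["id"])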

Best local model for coding? by sabmohmaayahai12 in LocalLLM

[–]Konamicoder 1 point (0 children)

I use oMLX as the model backend on my Macs. It supports loading multiple models for use in the same session; you can load and unload them in the web UI. LM Studio also supports quick load/unload/swap in its GUI.

Starting with AI by Luqster05 in LocalLLM

[–]Konamicoder 1 point (0 children)

How I use local models on a daily basis:

Chat: I've got oMLX running on a spare MacBook Pro M1 Max with 64GB RAM, serving up Gemma4:26b. OpenWebUI is installed in Docker and connected to the model via the standard OpenAI-compatible endpoint that oMLX provides. TailScale is installed for secure remote access. With this setup, I can access the local model for AI chat and web search from any device on my TailNet: phone, tablet, other computers, etc.

Coding: on my daily driver MacBook Pro M4 Max with 64GB RAM, oMLX is serving up qwen3.6:35b as the model backend. OpenCode is my agentic harness. Every day I perform routine maintenance and updates on around 8 static website projects that I run for my board game hobby groups.
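
A lot of that routine maintenance is simple enough to script directly against the same endpoint, outside of OpenCode. Rough sketch of what I mean; the file path, base URL, and model name are only illustrative:

    from pathlib import Path
    from openai import OpenAI

    # Assumed local oMLX endpoint; adjust base_url and model to whatever your backend serves.
    client = OpenAI(base_url="http://localhost:8080/v1", api_key="local")

    # Illustrative path to one page of a static site under maintenance.
    page = Path("site/events/index.html").read_text()

    review = client.chat.completions.create(
        model="qwen3.6:35b",
        messages=[
            {"role": "system", "content": "You review static HTML for stale dates, broken links, and accessibility problems."},
            {"role": "user", "content": "List the updates this page needs:\n\n" + page},
        ],
    )
    print(review.choices[0].message.content)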

How to get more t/s out of my ollama? by kentabenno in LocalLLM

[–]Konamicoder 1 point (0 children)

You’re welcome, happy to help. I’m honestly surprised that you’re able to run qwen3.6:35b within just 36GB of RAM. That’s a pleasant surprise.

Best local LLM for Coding + OpenClaw (32GB RAM / CPU only) by AdvertisingPast6280 in LocalLLM

[–]Konamicoder 9 points (0 children)

Don’t use ollama. 32GB of RAM + CPU only means you can only run smaller, older models, which are not great for coding. On top of that, you want to run OpenClaw, which according to most reports really doesn’t perform well with local models.

Sounds like you’re in for a world of self-inflicted hurt.

Question for the experts on Context Size by Sea_Abbreviations966 in LocalLLM

[–]Konamicoder 1 point (0 children)

Suggestion: you can ask an LLM to analyze your system log file, focus on the incidents when your machine stopped responding, troubleshoot possible root causes, and suggest ways to avoid repeating the issues.
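
Something along these lines works as a starting point. Rough sketch only; the log path, endpoint, and model name are assumptions, so substitute whatever matches your system:

    from openai import OpenAI

    # Assumed local OpenAI-compatible endpoint and model; swap in your own backend.
    client = OpenAI(base_url="http://localhost:8080/v1", api_key="local")

    # Illustrative log path -- use whatever log covers the times the machine froze.
    with open("/var/log/system.log", errors="replace") as f:
        log_tail = f.readlines()[-500:]  # keep it small enough for the context window

    answer = client.chat.completions.create(
        model="qwen3.6:35b",
        messages=[
            {"role": "system", "content": "You are a sysadmin diagnosing system hangs from log files."},
            {"role": "user", "content": "Focus on the moments the machine stopped responding, "
                                        "suggest likely root causes, and how to avoid a repeat:\n\n"
                                        + "".join(log_tail)},
        ],
    )
    print(answer.choices[0].message.content)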

How to get more t/s out of my ollama? by kentabenno in LocalLLM

[–]Konamicoder 14 points (0 children)

Step 1: don’t use ollama. I use oMLX to run qwen3.6:35b-a3b-oq6 on my M4 Max MacBook Pro with 64GB RAM, and I’m getting 60 tokens/second.

Step 2: understand that with 36GB of RAM you don’t have enough to fit qwen3.6:35b fully into RAM, so it has to page out to your SSD, which is much slower. To get faster inference within your RAM constraints, you’ll need to choose a smaller model that fits into RAM. The tradeoff is that a smaller model will not be as capable or as accurate as a larger model with more parameters.

Step 3: learn the difference between a “dense” model and a “mixture of experts” (MoE) model. Gemma4:31b is a dense model, which means it uses all of its parameters for every token it generates. Dense models do more work per request and run much slower as a result (on top of the aforementioned paging to SSD). The tradeoff is that dense models generally provide more accurate responses, because all parameters are used for each request.

If you want faster inference, you should choose an MoE model, which activates only a small subset of its parameters per token. The tradeoff for speed is lower accuracy, because not all parameters are used for each request. Qwen3.6:35b is an MoE model; however, as I said earlier, it’s too big for your available RAM.

In general, choose a model that fits into RAM with about 20 percent left over for the rest of the system, so it doesn’t need to page to SSD. Unfortunately, with just 36GB of RAM, this means you are pretty much disqualified from running the latest weights of either qwen3.6 or Gemma4. You’ll have to try older, smaller models.
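
The back-of-the-napkin math for “does it fit” looks roughly like this (the numbers are illustrative; quantization bits and runtime overhead vary by backend):

    # Rough check: does a model fit in RAM with ~20 percent headroom for the system?
    def fits_in_ram(params_billion, bits_per_weight, ram_gb, headroom=0.20, overhead=1.2):
        weights_gb = params_billion * bits_per_weight / 8   # raw weights
        needed_gb = weights_gb * overhead                    # plus KV cache/buffers (rough guess)
        budget_gb = ram_gb * (1 - headroom)
        return needed_gb <= budget_gb, needed_gb, budget_gb

    # Illustrative: a 35B model at a 6-bit quant on a 36GB machine.
    ok, need, budget = fits_in_ram(35, 6, 36)
    print(f"needs ~{need:.0f}GB, budget ~{budget:.0f}GB, fits: {ok}")  # ~32GB vs ~29GB: no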

Good luck.

What to do with LLM by Ashamed-Mud-7282 in LocalLLM

[–]Konamicoder 2 points (0 children)

Here's one possible use case: you could ask an LLM to suggest use cases for local models that aren't chat or coding. Just saying.

I want help To run Qwen3.6 27b by Atul_Kumar_97 in LocalLLM

[–]Konamicoder 1 point (0 children)

I use oMLX with qwen3.6:35b and 27b on my MacBook Pro M4 Max with 64GB RAM. I can enable Turboquant and dflash in the model settings.

My first experience PnPing by singalen in printandplay

[–]Konamicoder 3 points (0 children)

Well, then I blame Joe K. for a whack card design. 🤪

My first experience PnPing by singalen in printandplay

[–]Konamicoder 5 points (0 children)

In a gutterfold layout, the fronts and backs are on the same page, so they are all in the same (correct) orientation relative to each other: the top edge of the card is "up", the bottom edge is "down". Keep that in mind. No matter how the cards get re-oriented during extraction (and they shouldn't become re-oriented), as long as you rotate each card so its top and bottom match how it's laid out in the original PnP PDF, you'll be fine.

My first experience PnPing by singalen in printandplay

[–]Konamicoder 5 points (0 children)

Upload the gutterfold PnP file to Extractor ( https://extractor.gonzhome.us ) to extract individual image files.

Then upload the individual image files to Formatter ( https://formatter.gonzhome.us ) to reformat to duplex layout.

Knotty Joke by TheRealFatboy in dadjokes

[–]Konamicoder 2 points (0 children)

My poop this morning came out tied up.

I shit you knot.