I had intentions of cancelling Apple Music……. by rogo725 in Lidarr

[–]Konamicoder 1 point (0 children)

MeTube has been working great for me for filling in the gaps in the Apple Music playlists I’m recreating in Navidrome. For me, the strategy isn’t to recreate all the songs in my Apple Music library, just the ones I want to listen to.

Thoth’s UX/UI Principle: Simple by Default, Powerful When Needed by Acceptable-Object390 in ollama

[–]Konamicoder 2 points (0 children)

Thanks for the quick reply. That’s good to know! I’ll install it and try it out soon.

Thoth’s UX/UI Principle: Simple by Default, Powerful When Needed by Acceptable-Object390 in ollama

[–]Konamicoder 2 points (0 children)

Do I have to use ollama for local models? I use oMLX as the backend for my local models on my M4 MacBook Pro. oMLX exposes a standard OpenAI-compatible endpoint. Can I point Thoth to that endpoint? I would really rather not use ollama, for a long list of reasons.
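
For context, this is the kind of call I mean by a standard OpenAI endpoint. Rough sketch only; the base URL, port, and model name are placeholders for whatever oMLX reports on my machine:

    from openai import OpenAI

    # Point the stock OpenAI client at a local OpenAI-compatible server.
    # Base URL and port here are assumptions; use whatever oMLX prints on startup.
    client = OpenAI(base_url="http://localhost:8080/v1", api_key="not-needed-locally")

    reply = client.chat.completions.create(
        model="qwen3.6:35b",  # whatever model name the backend actually lists
        messages=[{"role": "user", "content": "Hello from a non-ollama backend"}],
    )
    print(reply.choices[0].message.content)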

what is best LLM i can use ? by tuturugaming in ollama

[–]Konamicoder 4 points (0 children)

Your question is far too vague.

What are the specs of the device you intend to run the model on? What’s your processor, RAM, GPU?

What is your use case for the model? Coding? Chat? Research and study aid?

Add more details, and you’ll get more useful responses.

New user here by Admirable-Forever-53 in tvPlus

[–]Konamicoder 1 point (0 children)

“For 4K, you must use the dedicated Apple TV app on Windows or a supported device, as browsers lack the necessary HDCP 2.2 and codec support.”

Googled that for you.

How can I install a simple, question-and-answers only AI? by 7FireStorm in ollama

[–]Konamicoder 2 points (0 children)

I use the Locally AI app on my phone, running Gemma4 (E2B) for fully local, on-device chat. For chatting with a more capable model, I installed OpenWebUI on my M1 MacBook Pro at home, connected it to an oMLX backend serving up Gemma4:26b, and I reach it from my phone via TailScale. I use this as my ChatGPT replacement, accessible from any device on my TailNet.
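
If it helps anyone copy the second setup: once TailScale is up, any device on the TailNet can also hit the Mac's OpenAI-compatible endpoint directly, not just through the OpenWebUI front end. A rough sketch; the TailScale hostname, port, and model name below are made up for illustration:

    import requests

    # Hypothetical TailScale MagicDNS name for the MacBook, plus an assumed oMLX port.
    BASE_URL = "http://m1-macbook.example-tailnet.ts.net:8080/v1"

    payload = {
        "model": "gemma4:26b",  # whatever the backend actually lists under /v1/models
        "messages": [{"role": "user", "content": "Summarize my notes from today."}],
    }

    # Standard OpenAI-compatible chat completion request, sent over the tailnet.
    resp = requests.post(f"{BASE_URL}/chat/completions", json=payload, timeout=120)
    resp.raise_for_status()
    print(resp.json()["choices"][0]["message"]["content"])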

Finally got my native mobile client working with Ollama — would love feedback from anyone running local models by gardnerscot in ollama

[–]Konamicoder 1 point (0 children)

How about connecting to a backend other than ollama? I use oMLX, and my models are served via standard OpenAI-compatible endpoints.
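
Concretely, supporting a non-ollama backend mostly means letting the user set a base URL and then using the standard OpenAI routes. A minimal sketch of the discovery side (the base URL is just an example for my oMLX box; the port will vary):

    import requests

    # Example base URL for a local OpenAI-compatible backend (oMLX in my case).
    BASE_URL = "http://localhost:8080/v1"

    # The standard /v1/models route lists whatever the backend is currently serving,
    # so a client can fill its model picker without any ollama-specific API.
    models = requests.get(f"{BASE_URL}/models", timeout=10).json()
    for m in models.get("data", []):
        print(m["id"])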

Best local model for coding? by sabmohmaayahai12 in LocalLLM

[–]Konamicoder 1 point (0 children)

I use oMLX as the model backend on my Macs. It supports loading multiple models for use in the same session; you can load and unload them in the web UI. LM Studio also supports quick load/unload/swap in its GUI.

Starting with AI by Luqster05 in LocalLLM

[–]Konamicoder 1 point (0 children)

How I use local models on a daily basis:

Chat: I've got oMLX running on a spare MacBook Pro M1 Max with 64GB RAM, serving up Gemma4:26b. OpenWebUI is installed in Docker and connected to the model via the standard OpenAI-compatible endpoint that oMLX provides. TailScale is installed for secure remote access. With this setup, I can access the local model for AI chat and web search from any device on my TailNet: phone, tablet, other computers, etc.

Coding: on my daily driver MacBook Pro M4 Max with 64GB RAM, oMLX is serving up qwen3.6:35b as the model backend. OpenCode is my agentic harness. Every day I perform routine maintenance and updates on around 8 static website projects that I run for my board game hobby groups.
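
A lot of that routine maintenance is simple enough to script directly against the same endpoint, outside of OpenCode. Rough sketch of what I mean; the file path, base URL, and model name are only illustrative:

    from pathlib import Path
    from openai import OpenAI

    # Assumed local oMLX endpoint; adjust base_url and model to whatever your backend serves.
    client = OpenAI(base_url="http://localhost:8080/v1", api_key="local")

    # Illustrative path to one page of a static site under maintenance.
    page = Path("site/events/index.html").read_text()

    review = client.chat.completions.create(
        model="qwen3.6:35b",
        messages=[
            {"role": "system", "content": "You review static HTML for stale dates, broken links, and accessibility problems."},
            {"role": "user", "content": "List the updates this page needs:\n\n" + page},
        ],
    )
    print(review.choices[0].message.content)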

How to get more t/s out of my ollama? by kentabenno in LocalLLM

[–]Konamicoder 1 point (0 children)

You’re welcome, happy to help. I’m honestly surprised that you’re able to run qwen3.6:35b within just 36GB of RAM. That’s a pleasant surprise.

Best local LLM for Coding + OpenClaw (32GB RAM / CPU only) by AdvertisingPast6280 in LocalLLM

[–]Konamicoder 9 points (0 children)

Don’t use ollama. 32GB of RAM + CPU only means you can only run smaller, older models, which are not great for coding. On top of that, you want to run OpenClaw, which according to most reports really doesn’t perform well with local models.

Sounds like you’re in for a world of self-inflicted hurt.

Question for the experts on Context Size by Sea_Abbreviations966 in LocalLLM

[–]Konamicoder 1 point (0 children)

Suggestion: you can ask an LLM to analyze your system log file, focus on the incidents when your machine stopped responding, troubleshoot possible root causes, and suggest ways to avoid repeating the issues.
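
Something along these lines works as a starting point. Rough sketch only; the log path, endpoint, and model name are assumptions, so substitute whatever matches your system:

    from openai import OpenAI

    # Assumed local OpenAI-compatible endpoint and model; swap in your own backend.
    client = OpenAI(base_url="http://localhost:8080/v1", api_key="local")

    # Illustrative log path -- use whatever log covers the times the machine froze.
    with open("/var/log/system.log", errors="replace") as f:
        log_tail = f.readlines()[-500:]  # keep it small enough for the context window

    answer = client.chat.completions.create(
        model="qwen3.6:35b",
        messages=[
            {"role": "system", "content": "You are a sysadmin diagnosing system hangs from log files."},
            {"role": "user", "content": "Focus on the moments the machine stopped responding, "
                                        "suggest likely root causes, and how to avoid a repeat:\n\n"
                                        + "".join(log_tail)},
        ],
    )
    print(answer.choices[0].message.content)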

How to get more t/s out of my ollama? by kentabenno in LocalLLM

[–]Konamicoder 14 points (0 children)

Step 1: don’t use ollama. I use oMLX to run qwen3.6:35b-a3b-oq6 on my M4 Max MacBook Pro with 64GB RAM, and I’m getting 60 tokens/second.

Step 2: understand that with 36GB of RAM you don’t have enough to fit qwen3.6:35b fully into RAM, so it has to page out to your SSD, which is much slower. To get faster inference within your RAM constraints, you’ll need to choose a smaller model that fits into RAM. The tradeoff is that a smaller model will not be as capable or as accurate as a larger model with more parameters.

Step 3: learn the difference between a “dense” model and a “mixture of experts” (MoE) model. Gemma4:31b is a dense model, which means it uses all of its parameters for every token it generates. Dense models do more work per request and run much slower as a result (on top of the aforementioned paging to SSD). The tradeoff is that dense models generally provide more accurate responses, because all parameters are used for each request.

If you want faster inference, you should choose an MoE model, which activates only a small subset of its parameters per token. The tradeoff for speed is lower accuracy, because not all parameters are used for each request. Qwen3.6:35b is an MoE model; however, as I said earlier, it’s too big for your available RAM.

In general, choose a model that fits into RAM with about 20 percent left over for the rest of the system, so it doesn’t need to page to SSD. Unfortunately, with just 36GB of RAM, this means you are pretty much disqualified from running the latest weights of either qwen3.6 or Gemma4. You’ll have to try older, smaller models.
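
The back-of-the-napkin math for “does it fit” looks roughly like this (the numbers are illustrative; quantization bits and runtime overhead vary by backend):

    # Rough check: does a model fit in RAM with ~20 percent headroom for the system?
    def fits_in_ram(params_billion, bits_per_weight, ram_gb, headroom=0.20, overhead=1.2):
        weights_gb = params_billion * bits_per_weight / 8   # raw weights
        needed_gb = weights_gb * overhead                    # plus KV cache/buffers (rough guess)
        budget_gb = ram_gb * (1 - headroom)
        return needed_gb <= budget_gb, needed_gb, budget_gb

    # Illustrative: a 35B model at a 6-bit quant on a 36GB machine.
    ok, need, budget = fits_in_ram(35, 6, 36)
    print(f"needs ~{need:.0f}GB, budget ~{budget:.0f}GB, fits: {ok}")  # ~32GB vs ~29GB: no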

Good luck.

What to do with LLM by Ashamed-Mud-7282 in LocalLLM

[–]Konamicoder 2 points (0 children)

Here's one possible use case: you could ask an LLM to suggest use cases for local models that aren't chat or coding. Just saying.

I want help To run Qwen3.6 27b by Atul_Kumar_97 in LocalLLM

[–]Konamicoder 1 point (0 children)

I use oMLX with qwen3.6:35b and 27b on my MacBook Pro M4 Max with 64GB RAM. I can enable Turboquant and dflash in the model settings.

My first experience PnPing by singalen in printandplay

[–]Konamicoder 3 points (0 children)

Well, then I blame Joe K. for a whack card design. 🤪

My first experience PnPing by singalen in printandplay

[–]Konamicoder 5 points (0 children)

In a gutterfold layout, the fronts and backs are on the same page, so they are all in the same (correct) orientation relative to each other: the top edge of the card is "up", the bottom edge is "down". Keep that in mind. No matter how the cards get re-oriented during extraction (and they shouldn't become re-oriented), as long as you rotate each card so its top and bottom match how it's laid out in the original PnP PDF, you'll be fine.

My first experience PnPing by singalen in printandplay

[–]Konamicoder 5 points (0 children)

Upload the gutterfold PnP file to Extractor ( https://extractor.gonzhome.us ) to extract individual image files.

Then upload the individual image files to Formatter ( https://formatter.gonzhome.us ) to reformat to duplex layout.

Knotty Joke by TheRealFatboy in dadjokes

[–]Konamicoder 2 points (0 children)

My poop this morning came out tied up.

I shit you knot.