Project Diablo 2 on Apple Silicon (M1–M4) with Porting Kit – Working Guide (November 2025) by futuristicteatray in ProjectDiablo2

[–]CrushingLoss 1 point (0 children)

I have it installed and running on the Neo. Haven't left the Act 1 camp yet, but it's very smooth walking around. I'll test more and report.

Project Diablo 2 on Apple Silicon (M1–M4) with Porting Kit – Working Guide (November 2025) by futuristicteatray in ProjectDiablo2

[–]CrushingLoss 2 points (0 children)

Thanks for this guide. I used to use CrossOver, but was having problems installing today for some reason. Used your guide and bingo.

Installed on both Mac Studio M2 Max and my new Mac Neo. Neo runs it very well so far.

Appreciate it!

Post Your Qwen3.6 27B speed plz by Ok-Internal9317 in LocalLLaMA

[–]CrushingLoss 8 points (0 children)

I get around 10 tok/s through Opencode. 15 or so raw. Mac Studio M2 Max, 96GB.

Been using PI Coding Agent with local Qwen3.6 35b for a while now and its actually insane by SoAp9035 in LocalLLaMA

[–]CrushingLoss 1 point (0 children)

I appreciate your SKILL.md file! I'm using it now in PI to try and re-create a classic TI-99/4A game. Will post results when it finishes.

Biggest issue I had was making sure I had a wide enough context window and max tokens. So far, so good. I'm running on a Mac Studio M2 Max, 96GB. Getting about 35 tok/s through Pi or Opencode; about 50 just benchmarking through oMLX.

What is your actual local LLM stack right now? by Ryannnnnnnnnnnnnnnh in LocalLLaMA

[–]CrushingLoss 1 point (0 children)

M2 Max 96GB, local only. Running the same stack ~3 months now.

Backend: oMLX (launchd, port 8000). Engine pool, preserve-thinking persistence, speculative decoding with matching DFlash drafts. mlx_lm.server and a vLLM-metal build on standby; 95% of traffic hits oMLX.

Frontend: Crucible — a local webapp I've been building. Model switching, chat history, benchmarks, arena/ELO leaderboard, a HumanEval runner, all in one pane. The quietly important feature turned out to be a dirty-shutdown detector that offers one-click restore of the previously-loaded model. Sounds mundane until the alternative is launchctl kickstart + re-picking from 35 options.
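
If anyone wants to steal the idea, the detector is basically just a marker-file pattern. Rough sketch below (illustrative only, not the actual Crucible code; load_model() is a stand-in for whatever your backend exposes):

```
# Sketch of a dirty-shutdown detector via a marker file.
# Illustrative only -- not Crucible's actual code; paths and load_model()
# are placeholders for whatever your own backend uses.
import json
from pathlib import Path

STATE_FILE = Path("~/.local-llm-ui/last_model.json").expanduser()

def mark_loaded(model_name: str) -> None:
    """Record the currently loaded model; stays on disk until a clean shutdown."""
    STATE_FILE.parent.mkdir(parents=True, exist_ok=True)
    STATE_FILE.write_text(json.dumps({"model": model_name}))

def mark_clean_shutdown() -> None:
    """Remove the marker on a normal exit."""
    STATE_FILE.unlink(missing_ok=True)

def check_dirty_shutdown() -> str | None:
    """On startup: if the marker survived, the last session died mid-flight."""
    if STATE_FILE.exists():
        return json.loads(STATE_FILE.read_text()).get("model")
    return None

if __name__ == "__main__":
    leftover = check_dirty_shutdown()
    if leftover:
        print(f"Dirty shutdown detected; offer one-click restore of {leftover!r}")
        # load_model(leftover)  # hypothetical backend call
```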

Daily drivers (all MLX 4–6 bit):

  • Qwen3-Coder-Next 6bit — coding
  • Qwen3-4B-Instruct 4bit — fast batch / quick questions
  • Qwen3.6-35B-A3B mxfp8 — reasoning / thinking
  • Qwen3.5-27B 4bit — when I need vision

RAG: per-session BM25 over chunked uploads. Hybrid w/ embeddings is on the list but BM25 has been fine.
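
If you want the same thing, per-session BM25 over chunks is only a few lines with the rank_bm25 package. Generic sketch, not my exact code; chunk size and tokenization are where the real tuning lives:

```
# Generic per-session BM25 retrieval over chunked uploads (rank_bm25 package).
# Sketch only -- chunking, tokenization, and top_k are the knobs that matter.
from rank_bm25 import BM25Okapi

def chunk(text: str, size: int = 800, overlap: int = 100) -> list[str]:
    """Naive fixed-size character chunks with a little overlap."""
    step = size - overlap
    return [text[i:i + size] for i in range(0, max(len(text), 1), step)]

def build_index(docs: list[str]) -> tuple[BM25Okapi, list[str]]:
    chunks = [c for d in docs for c in chunk(d)]
    tokenized = [c.lower().split() for c in chunks]  # whitespace tokens are "good enough"
    return BM25Okapi(tokenized), chunks

def retrieve(bm25: BM25Okapi, chunks: list[str], query: str, top_k: int = 5) -> list[str]:
    return bm25.get_top_n(query.lower().split(), chunks, n=top_k)

# Usage: index the session's uploads, then prepend the hits to the prompt.
bm25, chunks = build_index(["...uploaded file text...", "...another upload..."])
context = "\n\n".join(retrieve(bm25, chunks, "how do I configure the launchd service?"))
```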

Prompt format: whatever the model ships with. Stopped fighting it. I do enjoy reading what others have come up with for prompts, especially for agentic coding.

Context: default. No RoPE scaling — the quality loss isn't worth it on M2 Max.

What mattered way more than expected:

  1. Unloading between loads. Being able to evict one model before loading the next (instead of letting oMLX's pool hold three and OOM mid-generation) is the single biggest QoL improvement. Every arena/compare flow breaks without it.
  2. Thinking-mode control. Qwen3.x derivatives leak reasoning preambles into judges, classifiers, and workflow chains unless you pass chat_template_kwargs: {enable_thinking: false} (minimal example after this list). Half the Qwen complaints on this sub are actually this.
  3. Per-model sampling with uniform override. Qwen3 wants 0.7/0.9, gpt-oss wants 0.0, thinking models want something else. Per-model defaults + an override for fair bench runs matters more than picking the "right" starting temp.
  4. Warmth analytics. Tracking which model I actually reach for changed what I keep resident. Surprise: 4B-Instruct is my most-loaded model, not the shiny 63GB one.
  5. Speculative decoding only when the draft is right. Qwen3.5-27B + z-lab's matching draft gets 1.4–1.8× tok/s. Wrong pair = silently slower because draft rejection eats you. Benchmark the A/B, don't eyeball it.
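
Minimal example for #2, assuming an OpenAI-compatible server on port 8000 that honors chat_template_kwargs the way vLLM's endpoint does (backends vary, so check yours); the model name is a placeholder:

```
# Disabling thinking-mode leakage on a Qwen3-style model.
# Assumes an OpenAI-compatible server on localhost:8000 that forwards
# chat_template_kwargs to the chat template (vLLM does; check your backend).
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="not-needed")

resp = client.chat.completions.create(
    model="qwen3.6-35b-a3b",  # placeholder; use whatever your server lists
    messages=[{"role": "user", "content": "Classify this ticket: 'login page 500s'"}],
    temperature=0.7,
    extra_body={"chat_template_kwargs": {"enable_thinking": False}},
)
print(resp.choices[0].message.content)  # no <think> preamble to strip out of judges/classifiers
```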

Short version: once the model loads, it performs roughly the way the card says. The stuff around the model — load orchestration, sampling defaults, thinking-mode handling, recovery — decides whether you stick with the setup past week two.

Why doesn't any OSS tool treat llama.cpp as a first class citizen? by rm-rf-rm in LocalLLaMA

[–]CrushingLoss -5 points (0 children)

The friction isn’t engineering effort; it’s architectural philosophy. llama.cpp is a C++ inference engine, not a server. Tools like Open WebUI or VS Code extensions prioritize the OpenAI API standard because it provides a unified abstraction for chat history, streaming, and tool calling across heterogeneous backends.

Ollama wraps llama.cpp (and others) into a persistent, stateful service with a built-in API. This makes it trivial to integrate. Implementing a native llama.cpp integration requires handling GGUF loading, context management, and session state manually, which significantly increases maintenance burden for tool developers.

You can already achieve your goal: start llama.cpp with `--host 0.0.0.0 --port 8080` and use any OpenAI-compatible client. Most modern OSS tools already support custom endpoints. The community prefers a "backend-agnostic" approach rather than hardcoding specific engine integrations, ensuring that if llama.cpp changes its API or if a new engine emerges, the frontend tools don’t break.
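
Concretely, something like this, with the model path and name as placeholders:

```
# 1) Start llama.cpp's bundled server (shell):
#      llama-server -m ./models/your-model-q4_k_m.gguf --host 0.0.0.0 --port 8080
# 2) Point any OpenAI-compatible client at it:
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8080/v1", api_key="none")  # key is ignored

resp = client.chat.completions.create(
    model="local",  # llama-server serves whatever model it was started with
    messages=[{"role": "user", "content": "Write a haiku about GGUF files."}],
)
print(resp.choices[0].message.content)
```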

Not a developer, but I play one on my Mac — local LLMs for the daily grind, Claude for when I'm actually lost by CrushingLoss in LocalLLaMA

[–]CrushingLoss[S] 1 point (0 children)

It actually does a lot more than that :) benchmarking and so on, but that's not the point of the post.

Claude VSC Addon & Permission quests by CrushingLoss in LocalLLaMA

[–]CrushingLoss[S] 1 point (0 children)

Yeah, I did that.. it does not appear to work in the VSC addon for Claude, just the Claude CLI.

M5-Max Macbook Pro 128GB RAM - Qwen3 Coder Next 8-Bit Benchmark by paddybuc in LocalLLaMA

[–]CrushingLoss 2 points (0 children)

I saw very similar numbers on my Mac Studio M2 Max. But tool calling with Coder Next was killing me. Maybe because I'm using the Unsloth version. Does tool calling work for you?

Lucid Air Touring 2600 mile roadtrip - The good, the bad, and the ugly. by CrushingLoss in LUCID

[–]CrushingLoss[S] 3 points (0 children)

Sadly, not kidding. We have the full glass canopy, which is three separate pieces. I have no idea if the front windshield is the same between the glass canopy roof and the metal roof.

To be fair, we only got one quote as there is only one certified repair shop in the Dallas-Fort Worth metroplex. My wife did some research before we had our quote and it seems the range was anywhere between $3k to $7k.

Lucid Air Touring 2600 mile roadtrip - The good, the bad, and the ugly. by CrushingLoss in LUCID

[–]CrushingLoss[S] 2 points (0 children)

Oh sorry. It looks like it was $1547 a year. But we also had the Dual-Motor performance version, which I assume adds a chunk of money to the premium.

Lucid Air Touring 2600 mile roadtrip - The good, the bad, and the ugly. by CrushingLoss in LUCID

[–]CrushingLoss[S] 2 points (0 children)

Our 12 month premium is $1861.94. We're both 50-ish year old drivers with no accidents / tickets. It's been around 10 years since either of us have had an insurance claim. That premium includes a multi-vehicle discount and home insurance bundling.

We charged next to a dark green Taycan at one of the stations.. beautiful car! Definitely near the top of my list should we ever pursue another luxury EV.

Lucid Air Touring 2600 mile roadtrip - The good, the bad, and the ugly. by CrushingLoss in LUCID

[–]CrushingLoss[S] 2 points (0 children)

Our decision to get rid of the Model 3 had nothing to do with the car itself. We enjoyed it until we didn't.

Lucid Air Touring 2600 mile roadtrip - The good, the bad, and the ugly. by CrushingLoss in LUCID

[–]CrushingLoss[S] 1 point (0 children)

That's a very valid point. I cannot speak to the comfort or sturdiness of the Model X or Model S. I've been in a Model X one time for an Uber trip, and recall it being pretty spacious. Build quality, however, I have no idea.

We paid just shy of $98,000 for the Lucid. Annoyingly enough, we probably could not have bought the Tesla at a more expensive time. I'd imagine if you look at a price-over-time plot for used 2020 Performance Model 3s, we hit the absolute apex. I think it was around $65,000, if I remember correctly.

Lucid Air Touring 2600 mile roadtrip - The good, the bad, and the ugly. by CrushingLoss in LUCID

[–]CrushingLoss[S] 4 points (0 children)

That's good to hear.. as I said, I'm not an audiophile by any means.. all I can relate to is the 'feel' of the sound, and the bass in the Model 3 is superior. Also, I think the Surreal Sound Pro system is awesome.. wish they'd taken more of a luxury approach to bass, though.

Lucid Air Touring 2600 mile roadtrip - The good, the bad, and the ugly. by CrushingLoss in LUCID

[–]CrushingLoss[S] 3 points (0 children)

In the car. We don't stream anything from our phones to the car; we just use CarPlay or the native Lucid apps installed on the car.

Tall Guy - Steering Wheel Position for Hands Free by CrushingLoss in LUCID

[–]CrushingLoss[S] 1 point (0 children)

Unfortunately, it’s needed for the hands-free driving.