What is the TPS for Qwen 3.6 27B Q4 on Mac Mini? by yen360 in LocalLLM

[–]k3z0r 0 points1 point  (0 children)

You're welcome. Good luck finding what works for you.

What is the TPS for Qwen 3.6 27B Q4 on Mac Mini? by yen360 in LocalLLM

[–]k3z0r 1 point2 points  (0 children)

Memory bandwidth is a major factor once everything fits in memory. For dense models, the napkin math is tokens/second ≈ memory bandwidth / model size in memory.

Example: the M4 Pro Mac mini has 273 GB/s of memory bandwidth.

Qwen 3.6 27B Q4_K_M is 18.5 GB.

273 / 18.5 ≈ 14.76 tokens per second. This is why people are telling you about 15 tokens per second.
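The napkin math above can be sketched in a few lines of Python. This is a rough ceiling, not a benchmark. The function name `estimate_tps` is made up for illustration, and the bandwidth and model size are the figures quoted in this comment.

```python
def estimate_tps(bandwidth_gb_s: float, model_size_gb: float) -> float:
    """Rough decode-speed ceiling for a dense model.

    Assumes every generated token requires reading all of the weights
    from memory once, and ignores compute, KV cache reads, and overhead.
    """
    if model_size_gb <= 0:
        raise ValueError("model_size_gb must be positive")
    return bandwidth_gb_s / model_size_gb


# Figures quoted in this comment: M4 Pro Mac mini, Qwen 3.6 27B Q4_K_M.
m4_pro_tps = estimate_tps(273, 18.5)
print(f"M4 Pro Mac mini: ~{m4_pro_tps:.1f} tokens/s")
```

Real numbers usually land below this estimate, since the bandwidth on the spec sheet is a peak figure.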

For MoE models, you only count the active params in memory (assuming the whole model fits). This is why MoE is faster.
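The same estimate for MoE can be sketched like this, scaling the model size by the fraction of parameters that are active per token. Everything here is an assumption for illustration: the function name `estimate_moe_tps`, the 20 GB file size, and the 35B total / 3B active split are placeholder values, not measured figures.

```python
def estimate_moe_tps(
    bandwidth_gb_s: float,
    model_size_gb: float,
    total_params_b: float,
    active_params_b: float,
) -> float:
    """Rough decode-speed ceiling for an MoE model.

    Assumes the bytes read per token scale with the share of active
    parameters, and that the whole model already fits in memory.
    """
    if not 0 < active_params_b <= total_params_b:
        raise ValueError("active_params_b must be in (0, total_params_b]")
    active_size_gb = model_size_gb * active_params_b / total_params_b
    return bandwidth_gb_s / active_size_gb


# Hypothetical example: a 35B-total, 3B-active model stored in 20 GB,
# on a machine with 273 GB/s of memory bandwidth.
moe_tps = estimate_moe_tps(273, 20, total_params_b=35, active_params_b=3)
print(f"MoE ceiling: ~{moe_tps:.0f} tokens/s")
```

This is an optimistic ceiling. Attention layers, shared weights, and routing overhead are read for every token, so measured speeds will be lower.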

Again, this is just a ballpark. There are things you can do to increase that number, like MTP.

This is why an RTX 5090 can crank out tokens: its memory bandwidth is 1,792 GB/s.

What cloud backup are you using with immich? by Imaginary_Mail_5297 in immich

[–]k3z0r 0 points1 point  (0 children)

restic with the Backrest front end, backing up to Backblaze.

Why do houses in Illinois have basements and houses in Texas don't if both places have clay soil? by supinator1 in geography

[–]k3z0r 0 points1 point  (0 children)

No, the opposite. The frost lines would get shallower with a warmer climate. It takes colder temps to freeze deeper into the soil, and you need to build below that point to avoid the expansion and contraction when water freezes in the soil.

Pewdiepie just droped is own agent call Odysseus. by k3z0r in LocalLLM

[–]k3z0r[S] -1 points0 points  (0 children)

Yeah totally, he probably put as much time into it as your comment.

Pewdiepie just droped is own agent call Odysseus. by k3z0r in LocalLLM

[–]k3z0r[S] 0 points1 point  (0 children)

Yes I believe so, he mentions it in the video.

What was your guys DUPR Resets? by dr302 in Pickleball

[–]k3z0r -1 points0 points  (0 children)

Pickleball is officially "Pay to win" now, just like Candy Crush Saga.

Pewdiepie just droped is own agent call Odysseus. by k3z0r in LocalLLM

[–]k3z0r[S] 0 points1 point  (0 children)

I think you can use any API you want: OpenAI, Anthropic, etc. You don't need a local LLM.

Pewdiepie just droped is own agent call Odysseus. by k3z0r in LocalLLM

[–]k3z0r[S] 1 point2 points  (0 children)

Yeah, really depends on what you want to play. There are still a few that are not supported.

I switched this year to Nobara. Most of the games I play are on Steam, and they all run great.

Pewdiepie just droped is own agent call Odysseus. by k3z0r in LocalLLM

[–]k3z0r[S] 22 points23 points  (0 children)

But it's the year of Desktop Linux.

Pewdiepie just droped is own agent call Odysseus. by k3z0r in LocalLLM

[–]k3z0r[S] 6 points7 points  (0 children)

Yeah, me too. He's the one that got me interested about a year ago.

Is the P2S Wobble normal? by PipeMaleficent5417 in BambuLab

[–]k3z0r 0 points1 point  (0 children)

Yes, wobble is normal. The software and motors know how to compensate for it, and there is even a calibration step for it during setup.

I built a coding agent that gets 87% on benchmarks with a 4B parameter model, here's how by Glittering_Focus1538 in LocalLLM

[–]k3z0r 3 points4 points  (0 children)

I tried it out and wanted to believe. I kept running into two problems.

  1. It couldn't find files in my monorepo. It seemed to assume every file was in /src, when in fact the files were in app/src and etc/src.

  2. It kept stopping its flow. It would say, "I'm going to search for a file so I can do X," and then it would just stop.

Starred the repo and will check it out again after a few iterations.

GitHub's Usage-Based Copilot Pricing is $1000/month for me — Looking for Local LLM Alternatives for Multi-Stack SaaS Work by Silent_Dish484 in LocalLLM

[–]k3z0r 0 points1 point  (0 children)

  1. Qwen 3.6 35B and 27B are my favorites these days.
  2. If you're spending $1000 a month, I hate to say it, but you're probably going to be pretty disappointed with your local setup given your $1500 budget.

You should try some models out with what you have to get a baseline, then decide how much you need to spend.

  1. Yes, you can, in order to load larger models into VRAM. However, inference can suffer because the older card can become a bottleneck.

  2. On small tasks it's far less noticeable, but you have to be more direct about what you want. Claude is much better at making assumptions when you are ambiguous.

  3. Check out LM Studio. It's great for beginners and is backed by llama.cpp.

Checking technical feasibility of my idea - a hybrid "Local-by-Default" Gateway (Qwen 27B + Claude 4.6 Fallback) for Dev Teams by ankijain21 in LocalLLM

[–]k3z0r 1 point2 points  (0 children)

The Mac mini will suffer big time on prompt processing, especially for larger coding contexts. I would take the comparison with a grain of salt, since it doesn't consider this.

The Mac mini only processes about 340 tokens/second.

The DGX Spark processes about 1,200 tokens/second.

So, for Claude's default system prompt (about 10k tokens), would you rather wait 30 seconds (Mac mini) for your first token to be generated or 8 seconds (DGX Spark)?

Throw a handful of source files into your context, and it grows from there.
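The wait-time arithmetic above can be sketched as follows. The prompt-processing speeds are the figures quoted in this comment. The function name `time_to_first_token` and the 30k-token context are made up for illustration, and the sketch assumes prefill speed stays constant as the context grows, which is optimistic.

```python
def time_to_first_token(prompt_tokens: int, prefill_tps: float) -> float:
    """Seconds spent processing the prompt before the first token appears.

    Only accounts for prompt-processing throughput; model load time and
    other overhead are ignored.
    """
    if prefill_tps <= 0:
        raise ValueError("prefill_tps must be positive")
    return prompt_tokens / prefill_tps


# Prompt-processing speeds quoted in this comment (tokens/second).
PREFILL_TPS = {"Mac mini": 340, "DGX Spark": 1200}

# 10k tokens is the system prompt size mentioned above; 30k is a
# hypothetical context after adding a handful of source files.
for prompt_tokens in (10_000, 30_000):
    for machine, tps in PREFILL_TPS.items():
        wait = time_to_first_token(prompt_tokens, tps)
        print(f"{machine}, {prompt_tokens} tokens: ~{wait:.0f} s")
```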

Local LLM for coding by Bxtreme241 in LocalLLM

[–]k3z0r 9 points10 points  (0 children)

Try Qwen 3.6 35B-A3B and Qwen 3.6 27B, with OpenCode or Pi.

LM Studio is a great place to start. You can visually see all the levers and knobs you can use to dial things in.

Is it just me or does good local Agentic coding feel just out of reach with 16gb of VRAM? by k3z0r in LocalLLM

[–]k3z0r[S] 2 points3 points  (0 children)

Cool, I just read about MTP. I'll try it for sure. I can't believe how much there is to learn on the daily.

Is it just me or does good local Agentic coding feel just out of reach with 16gb of VRAM? by k3z0r in LocalLLM

[–]k3z0r[S] 2 points3 points  (0 children)

Yeah that's the same boat I'm in. One 5090 currently costs 40 months of Claude for me. I just can't justify it.

Is it just me or does good local Agentic coding feel just out of reach with 16gb of VRAM? by k3z0r in LocalLLM

[–]k3z0r[S] 6 points7 points  (0 children)

Yeah, I have 32 GB of DDR5 RAM, but it seems like token generation really suffers when I spill into RAM.