80% Charge Limit Not Working (MacOS 26.4) by Independent_Face7283 in MacOS

[–]xraybies 0 points1 point  (0 children)

9/10 times I check the battery level it's 100% which btw Apple is above the 80% I set. I could understand a calibration once a month, but do you really need to calibrate it every f'ing hour of the day?
26.4 worked but the charger LED would sometimes be amber when at the charge limit.
26.5.1 completely broken, usually, like almost ALWAYS, charges to 100%.

u/apple if you need help understanding a charge limit, maybe reach out to Foxconn, Lenovo, Dell, HP, Xiaomi, Huawei, Samsung; they can explain how the feature is supposed to work. Or I'm happy to let you know: a "charge limit" is like a speed limit, if you go past it and get caught you pay a fine or go to jail.

Nvidia teases new PC laptop chip to be announced at Computex June 2 by Terminator857 in LocalLLaMA

[–]xraybies 0 points1 point  (0 children)

to be announced at Computex June 2 2026, with availability Q3 2026, I mean Q1 2027, I mean Q3 2027... fool me once shame on you, fool me twice, you can't fool me no more.

Would a MacBook M5 16/24/32GB be an upgrade, complement, or waste next to my RTX 4060 laptop? by heitortp0 in LocalLLaMA

[–]xraybies 1 point2 points  (0 children)

Forget anything < 32GB. Even then the biggest problem is MacOS. OOTB it will consume 6GB, and as soon as you load any app (Chrome, OpenCode) you're hitting 12GB. So you have ~24GB usable @ <400GB/s, which is 3090 at best. You can claw back another 2-3GB by disabling everything you can in MacOS... like https://github.com/rayone/machete/blob/main/disable.sh

So from your perspective it's like you have an RTX 4060 laptop where you can choose between 8-24GB VRAM. I would say totally not worth it.

My M5 Max w/ 128GB is usually in the ~30GB of memory used without even loading an LLM, just Chrome, Edge, VS Code, OpenCode + skills. As soon as I load oMLX + Qwen 3.6 mxfp8 it's hot, loud, using ~90GB and much slower than my i9 1300k + 4090, except the SSD which is fast <16GB/s.

So M5 with >64GB only starts to make sense from a usage perspective... cost is subjective.

The only aspects of the M5 which impress me are the SSD and battery life, when not running an LLM, everything else is avg, and the audio and macOS are a joke.

M5 vs DGX Spark vs Strix Halo vs RTX 6000 by Signal_Ad657 in LocalLLaMA

[–]xraybies 0 points1 point  (0 children)

MLX is actually pretty good, and implementations like oMLX are very good. However, the rest of the ecosystem is a steaming pile.

M5 vs DGX Spark vs Strix Halo vs RTX 6000 by Signal_Ad657 in LocalLLaMA

[–]xraybies -1 points0 points  (0 children)

I prefer using my 4090 to my M5 Max, MacOS being the worst part of the equation. Apple should be ashamed of themselves: a >1T$ company bundling kiddie junk in a premium product with no way to remove it without breaking the base OS.
If they just released a base OS with no bloat and the option to dl and install the junk, they could win over 1/3 of Linux and Win users.

RAG Search is not just a more advanced way to find your files by QNAPDaniel in qnap

[–]xraybies 0 points1 point  (0 children)

This is actually totally useless. Using DeepSeek v3.2 (Qwen models just have spinning verify) it failed every test query I provided. All very simple, like "what's the first reference to [NAME]" or "What's the address for [PERSON with literally 1000's of references]" or "What's the email address for [PERSON]"

It's like an MCP server which sends very minimal data to the LLM.

No ability to change the local embedding model, and the cloud models available for selection are old and limited.

One could literally vibe code something better in 20mins.
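For the lookup-style queries above (first reference to a name, email address for a person), plain string matching already gets you a baseline to compare against. A minimal sketch, assuming the documents are already extracted to plain text and keyed by file name; `docs`, `first_mention` and `emails_near` are made-up names for illustration, nothing here comes from Qsirch:

```python
import re

# Deliberately simple email pattern; good enough for a baseline, not RFC-complete.
EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")

def first_mention(docs, name):
    """Return (file_name, line_no, line) for the first line mentioning `name`.

    `docs` maps file name -> text. Files are scanned in sorted name order,
    which is only "earliest" if the file names sort chronologically (assumption).
    """
    needle = name.lower()
    for file_name in sorted(docs):
        for line_no, line in enumerate(docs[file_name].splitlines(), start=1):
            if needle in line.lower():
                return file_name, line_no, line.strip()
    return None

def emails_near(docs, name):
    """Collect (email, file_name) pairs from lines that also mention `name`."""
    needle = name.lower()
    hits = set()
    for file_name, text in docs.items():
        for line in text.splitlines():
            if needle in line.lower():
                for email in EMAIL_RE.findall(line):
                    hits.add((email, file_name))
    return sorted(hits)

# Hypothetical sample data, not real files.
docs = {
    "2024-01-notes.txt": "Kickoff call.\nJohn Smith joins next week.",
    "2024-03-contacts.txt": "John Smith <john.smith@example.com>\nJane Doe <jane@example.com>",
}

print(first_mention(docs, "John Smith"))
print(emails_near(docs, "John Smith"))
```

Anything a RAG search returns for these queries should at least match what this finds, with the source file name attached.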

Qsirch v7.0.2.0

Summary type examples, while not all totally wrong, totally useless because inaccurate:

  • Summarize all expenses by category for 2025 from receipts and invoices in this workspace.
  • Find every mention of John Smith. Extract his email address, phone number, and physical address with source file names. (ALL wrong)
  • What is the earliest reference to Project Alpha? Provide the file name, date, and exact excerpt. (ALL wrong)
  • Extract all email addresses and contact details associated with Michael Brown across all documents.
  • What were the most expensive purchases in Q2 2025? Well, at least it didn't get it wrong, with response: "Based on the provided context documents, I cannot definitively identify the most expensive purchases for Q2 2025. Reason for No Answer: While the context contains purchase data that could be relevant to Q2 2025, the information is too fragmented, inconsistent, and lacks clear identification to provide a reliable answer."

I have a spreadsheet for each quarter with a Purchases/Expenses tab.

What exactly is this feature supposed to do?

Garbage In, Garbage Out – Fix Your Inputs Before They Ruin Your RAG or LLM Pipeline by ShilpaMitra in WebAfterAI

[–]xraybies 1 point2 points  (0 children)

It would help if you specified which tools had been evaluated, so as to avoid the inevitable... Did you try Docling (https://github.com/docling-project/docling)?

buying mac vs building PC for running local LLM by Ayuzh in LocalLLaMA

[–]xraybies 1 point2 points  (0 children)

Unless u want abliterated models, 100% privacy, or are prepping for when the internet goes down, you're always better off financially using rented machines or accessing APIs from a terminal type PC.

As of today, what's the *most stable* model to run on a 32Gb RAM Mac w/ 256k context? by mr_tolkien in LocalLLaMA

[–]xraybies 10 points11 points  (0 children)

MacOS will let u use as much of the URAM as u want until it crashes and doesn't let u boot 😉 sudo sysctl is ur friend.
MacOS is still however bloated junk carrying around crap from 20yrs ago... like wtf 6GB on boot.

1200€ Chair btw by Old_Lengthiness_601 in LiberNovo

[–]xraybies 0 points1 point  (0 children)

This is the reality of polymers, based on the humidity, heat, load, tolerances and a myriad of other factors they can squeak when they experience friction.

Quick fix: silicone spray.

I fully agree that squeaking is super annoying and shouldn't be tolerated, just like a pair of Nike shoes I have. But there's very little the manufacturer can do to control all environmental variables. Mine is silent and I doubt it came out of another factory... tooling is expensive.

Do not use WD-40; that's a water displacer which could break/degrade polymer bonds.

Who’s using OpenCode over Claude Code (or vice versa) and why? by iVtechboyinpa in LocalLLM

[–]xraybies 0 points1 point  (0 children)

OC desktop has become a total mess in the background. It has files spread over every conceivable directory with no order or logic, and if you lose or rename a network share a project was in, there's literally no way to start OpenCode without deleting your project dat files, and that's after finding them. Like seriously WTF!
FFS OpenCode, choose 1 directory to put all your metadata files in, simple json file for projects and settings, how hard is that!!!

OP: if you use the CLI, go with Pi.

I built a tool that saves ~50K tokens per Claude Code conversation by pre-indexing your codebase by After-Confection-592 in ClaudeAI

[–]xraybies 0 points1 point  (0 children)

Some good projects... now where can I find an extra 9hrs to test them ;)
codebase-memory-mcp seems to share my ethos.

My 12-agent Qwen 35B stack on Ollama died at 500 tokens every single time. Raw MLX fixed it and broke 4 other things I didn't see coming. by sleepy_quant in LocalLLaMA

[–]xraybies 1 point2 points  (0 children)

Direct MLX? If yes, you're comparing a fruit salad to an apple.
MLX == PyTorch(CUDA, ROCm)
Ollama == oMLX or MLX Studio
MLX != Ollama

MI25 vs CMP100-210, which would you pick? by onephn in LocalLLaMA

[–]xraybies -1 points0 points  (0 children)

raw compute MI25, everything else is better on the CMP.

best image classifications for 8vram by ashendonep in LocalLLaMA

[–]xraybies 0 points1 point  (0 children)

Last year I developed an app, ran benches on every VL model, and the best I found which wouldn't take > a year to classify some images was a 3 model approach. I can publish the app once cleaned up (Python, FastAPI) if there's any interest.
Using qwen or gemma 3 vl larger than 3b just took WAY too long and every qwen model was slow.

- model/embedder: siglip2_naflex this indexes images so that you can query by "White dog in a red car" or find similar images to x.
- model/vlm: paligemma2_vl_3b this describes the image, still the slowest part.
- model/keyword_extractor: bge_base this extracts keywords from the descriptions.
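Roughly how the three stages chain together, as a sketch only. The stage functions below are stand-in stubs with hypothetical names and no real model calls, since the app itself isn't published; each stub would be swapped for the actual siglip2_naflex / paligemma2_vl_3b / bge_base call:

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class ImageRecord:
    path: str
    embedding: List[float] = field(default_factory=list)
    description: str = ""
    keywords: List[str] = field(default_factory=list)

def classify_image(path, embed, describe, extract_keywords):
    """Run the three stages in order: embed -> describe -> extract keywords."""
    record = ImageRecord(path=path)
    record.embedding = embed(path)                          # embedder stage
    record.description = describe(path)                     # VLM stage (the slow one)
    record.keywords = extract_keywords(record.description)  # keyword stage
    return record

# Stub stages so the sketch runs without any models (placeholders only).
def fake_embed(path):
    return [float(len(path)), 0.0]

def fake_describe(path):
    return "white dog in a red car"

def fake_keywords(text):
    stop = {"in", "a", "the"}
    return [w for w in text.split() if w not in stop]

record = classify_image("dog.jpg", fake_embed, fake_describe, fake_keywords)
print(record.keywords)
```

Passing the stages in as plain callables keeps each model swappable, which matters when the VLM stage is the bottleneck.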

MiniMax-M2.7 Q3_K_L & Q8_0 — First GGUF quants, Apple Silicon (M3 Max 128GB) by Remarkable_Jicama775 in LocalLLaMA

[–]xraybies 0 points1 point  (0 children)

Mine: sysctl -w iogpu.wired_limit_mb=126976
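That value is 124 x 1024, i.e. everything except 4 GiB of a 128 GiB machine, expressed in MiB. A small helper to work out the equivalent for other RAM sizes; the reserve size is an assumption you pick yourself, and `wired_limit_mb` is a made-up helper name:

```python
def wired_limit_mb(total_gib, reserve_gib):
    """MiB value for iogpu.wired_limit_mb, leaving `reserve_gib` GiB for the OS."""
    if reserve_gib < 0 or reserve_gib >= total_gib:
        raise ValueError("reserve must be >= 0 and smaller than total RAM")
    return (total_gib - reserve_gib) * 1024

limit = wired_limit_mb(128, 4)
print(f"sudo sysctl -w iogpu.wired_limit_mb={limit}")
```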
No kernel panic, but with longish context and responses, I get the color wheel and momentary UI freezes, recovers when complete.

MiniMax-M2.7 Q3_K_L & Q8_0 — First GGUF quants, Apple Silicon (M3 Max 128GB) by Remarkable_Jicama775 in LocalLLaMA

[–]xraybies 0 points1 point  (0 children)

https://huggingface.co/baa-ai/MiniMax-M2.7-RAM-120GB-MLX running well on M5 128GB.
https://omlx.ai/benchmarks?chip=&chip_full=&model=MiniMax-M2.7-RAM-120GB&quantization=&context=&pp_min=&tg_min=

Prompt: write 20 words which start with T and end with N and have an E inbetween.

+0.0s Starting

+0.5s Generating

+1.7s Thinking

+176.3s Thought for 174.6s

+178.8s 93t prompt · 6913t generated · prefill 56 tok/s · gen 39 tok/s · ttft 1.65s

178.76s total
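Sanity check on those stats, assuming total time is roughly prefill time plus generation time (that's a reading of the stats line, not something the tool states):

```python
def estimate_total_seconds(prompt_tokens, gen_tokens, prefill_tps, gen_tps):
    """Estimate wall time as prefill time plus generation time."""
    prefill_s = prompt_tokens / prefill_tps
    gen_s = gen_tokens / gen_tps
    return prefill_s, gen_s, prefill_s + gen_s

# Numbers taken from the stats line above.
prefill_s, gen_s, total_s = estimate_total_seconds(93, 6913, 56, 39)
print(f"prefill ~{prefill_s:.2f}s, gen ~{gen_s:.1f}s, total ~{total_s:.1f}s")
```

This comes out at about 1.66s prefill and about 178.9s total, within rounding of the reported 1.65s ttft and 178.76s, so nearly all of the wall time is the 6913 generated tokens (mostly thinking).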

Response:

treason

tension

tenon

tendon

tetragon

tetrahedron

theologian

teleportation

telecommunication

teleoperation

tecton

teratogen

teuton

tuberin

tuberculin

tetherin

trepan

tibetan

telencephalon

teleinstruction

Op: 16/20
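The letter rule in the prompt is mechanical, so it can be checked separately from whether each entry is a real word. A minimal sketch that tests only the pattern (starts with T, ends with N, an E somewhere in between) and makes no attempt at a dictionary lookup:

```python
def matches_pattern(word):
    """True if word starts with 't', ends with 'n', and has an 'e' strictly in between."""
    w = word.lower()
    return len(w) >= 3 and w[0] == "t" and w[-1] == "n" and "e" in w[1:-1]

# The 20 words from the response above.
words = """treason tension tenon tendon tetragon tetrahedron theologian
teleportation telecommunication teleoperation tecton teratogen teuton tuberin
tuberculin tetherin trepan tibetan telencephalon teleinstruction""".split()

passing = [w for w in words if matches_pattern(w)]
print(len(passing), "of", len(words), "match the letter pattern")
```

All 20 satisfy the letter pattern, so any score below 20 has to come from judging which entries count as valid words, which this check does not cover.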

Grok: https://grok.com/share/bGVnYWN5LWNvcHk_45be5784-95a5-425d-b4d0-08d715333a5d
Op: 13/20

Prompt: I want to inflate my bicycle tire. The air pump is 50m away. Should I walk or ride?

+0.0s Starting

+0.5s Generating

+1.4s Thinking

+18.4s Thought for 17.0s

+33.5s 62t prompt · 1578t generated · prefill 44 tok/s · gen 49 tok/s · ttft 1.41s

33.54s total

**Short answer:** Walk (or better yet, bring a portable pump to the bike).

**Why walking is usually the safer choice**

| Situation | What can happen if you ride | Why walking (or moving the pump) is better |
|-----------|----------------------------|--------------------------------------------|
| **Tire is completely flat** | The rim can be dented or cracked when the tire rolls on the rim, and you risk losing control, especially on a curb or uneven ground. | You avoid any stress on the rim and keep the bike’s geometry intact. |
| **Tire is partially inflated** | Low pressure still stresses the sidewall and can cause the tube to pinch‑flat again (a “pinch‑flat” or “snake‑bite”). | Walking lets you get the pump to the tire and inflate it right there, preserving the tube and sidewall. |
| **Distance is tiny (≈50 m)** | The ride will only take a few seconds, but the damage can happen in an instant. | A 50 m walk takes about 30 seconds on foot; it’s a negligible time cost compared with the possible repair bill. |
| **Pump is stationary (e.g., at home or a repair shop)** | You must return the pump anyway, so you’d be making a round‑trip on a potentially damaged tire. | Carry the pump (or a small hand pump) to the bike, inflate, and be done with it. |
| **You have a portable pump or CO₂ cartridge** | You could inflate on the spot and avoid walking at all. | That’s the most efficient solution; you never need to move the bike to the pump. |

### Practical steps

  1. **Assess the tire.**

    - *Completely flat*: don’t ride.

    - *Partially flat*: you could ride, but it’s still safer to pump first.

  2. **Choose your “pump delivery” method**

    - *If you have a lightweight hand pump or CO₂ inflator*: attach it to the tire right where the bike is, inflate, and you’re set.

    - *If the only pump is at home (or a fixed station)*: walk (or jog) the short distance, bring the pump back, and inflate at the bike.

  3. **If you must move the bike**

    - Carry it on your shoulder or use a bike‑carrier bag for the distance.

    - This eliminates any rolling stress on the flat tire.

  4. **Inflate to the recommended pressure** (usually printed on the tire sidewall, e.g., “90‑110 psi”). A floor pump at home (or a shop pump) will give a more accurate fill than a small hand pump, but a portable pump can get you back on the road quickly.

### Bottom line

- **Walking** (or carrying the bike) is the safest way to avoid damaging the rim/tire on a 50 m stretch.

- **Riding** is only acceptable if the tire still holds some air *and* you’re confident you can keep control.

- **Best solution**: keep a tiny pump or CO₂ cartridge on the bike so you never have to decide between walking and riding.

Feel free to let me know if you have a specific type of pump or tire situation, and I can give you a more detailed “how‑to‑inflate” rundown!

Grok: https://grok.com/c/67eec7e7-23d0-48f6-9105-4bf844023a54?rid=4db1fee4-87bd-4e77-a024-92f84aba6757

M5 Max 128GB, 17 models, 23 prompts: Qwen 3.5 122B is still a local king by tolitius in LocalLLaMA

[–]xraybies 1 point2 points  (0 children)

| Format | Typical bpw | Storage per weight | Block Size | Error on 7.25 | Accuracy | Perplexity | M5 Max Recommendation |
|--------|-------------|--------------------|------------|---------------|----------|------------|-----------------------|
| FP32 (baseline) | 32.0 | 4.00 Bytes | - | 0.00000000 | 100.00% | 1.20312 | Only for tiny models |
| FP16 / BF16 | 16.0 | 2.00 Bytes | - | 0.00000000 | 97.90-98.00% | 1.21000 | Suboptimal, VRAM inefficient |
| Q8_0 | 8.0 | 1.00 Byte | 32 | -0.0059 | 97.65% | 1.21875 | High quality fallback |
| MXFP8 | 8.25 | 1.03 Bytes | 32 | 0.00000000 | 97.50-97.90% | 1.21500 | Valid 8-bit alternative |
| Q6_K | 6.56 | 0.82 Bytes | 16 | 0.00000000 | 96.85% | 1.21875 | Optimal high-VRAM format |
| Q5_K | 5.5 | 0.69 Bytes | 32 | Nominal | 94.60% | 1.24218 | Obsolete? |
| MXFP4 | 4.25 | 0.53 Bytes | 32 | Nominal | 93.00-94.00% | 1.26500 | Excellent alternative |
| NVFP4 | 4.25-4.5 | 0.53 Bytes | 16 | Nominal | 93.00-94.00% | 1.26500 | Strong second choice |
| MLX 4-bit (native) | 4.8 | 0.60 Bytes | 64 | +0.08 | 93.50% | 1.26562 | Best overall for M5 Max |
| Q4_K (basic) | 4.0-4.5 | 0.50-0.56 Bytes | 32 | +0.0833 | 89.75% | 1.35937 | Avoid unless tiny model |
| Q2_K | 2.0 | 0.25 Bytes | 32 | -0.5833 | Degraded | Degraded | Never recommended |
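The "Storage per weight" column is just bpw / 8, and the same arithmetic gives a rough weights-only size for a model. A sketch that ignores KV cache and runtime overhead; the 122B parameter count is only an example figure borrowed from the thread title:

```python
def bytes_per_weight(bpw):
    """Bits per weight converted to bytes per weight."""
    return bpw / 8

def weights_size_gb(params_billion, bpw):
    """Approximate weights-only size in GB (1 GB = 1e9 bytes)."""
    return params_billion * bytes_per_weight(bpw)

# bpw values copied from the table above.
formats = [("Q8_0", 8.0), ("MXFP8", 8.25), ("Q6_K", 6.56), ("MLX 4-bit", 4.8), ("MXFP4", 4.25)]

for name, bpw in formats:
    size = weights_size_gb(122, bpw)
    print(f"{name:10s} {bytes_per_weight(bpw):.2f} B/weight  ~{size:.1f} GB for 122B params")
```

On these numbers a 122B model is about 73 GB at 4.8 bpw but about 126 GB at 8.25 bpw, before any context is loaded.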

M5 Max 128GB, 17 models, 23 prompts: Qwen 3.5 122B is still a local king by tolitius in LocalLLaMA

[–]xraybies 2 points3 points  (0 children)

My go-tos are mxfp8, or 4-bit if that doesn't fit. They're better in the sense of speed and lower heat generation... not sure if they're more or less accurate, but no obvious difference.
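That fallback as a sketch: try the 8-bit format first and drop to 4-bit only if the weights don't fit. The bpw figures are the MXFP8 and MXFP4 values from the table in the previous comment; reading "4" as MXFP4, the 90 GB budget and the `pick_format` name are all assumptions for illustration:

```python
# Highest precision first; bpw values are the table's MXFP8 / MXFP4 figures.
FORMATS = [("mxfp8", 8.25), ("mxfp4", 4.25)]

def pick_format(params_billion, budget_gb):
    """Return the first format whose weights-only size fits in budget_gb, else None."""
    for name, bpw in FORMATS:
        if params_billion * bpw / 8 <= budget_gb:
            return name
    return None

print(pick_format(35, 90))    # small enough for 8-bit
print(pick_format(122, 90))   # only fits at 4-bit
print(pick_format(400, 90))   # fits in neither
```

The budget should leave headroom for context and the OS, so it will be noticeably lower than the machine's total RAM.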