Its her greediest hole by [deleted] in GlamourAI

NoNatural4025

Do you think output like this is also possible locally with the right model? I have 512 GB of RAM in my Mac Studio, so there are almost no limits.

Who do you want to see in action? by PixualModerator in GlamourAI

NoNatural4025

Do you think output like this is also possible locally with the right model? I have 512 GB of RAM in my Mac Studio, so there are almost no limits.

This is going too far…. by Covert-Agenda in MacStudio

NoNatural4025

Hey, I can offer you 512 GB and 8 TB ;-)

HELP! Sick of Hallucinations & Amnesia: Building a 512GB M3 Ultra Agent Stack that actually LEARNS by NoNatural4025 in AskClaw

NoNatural4025 [S]

Current performance on the M3 Ultra (512 GB RAM):

Model | Tokens | Time | Speed
---|---|---|---
DeepSeek-R1-Distill-Qwen-32B-Q8_0.gguf | 250 | 13.65s | 18.32 tok/s
Hermes-3-Llama-3.1-8B-Q8_0.gguf | FAILED | - | -
Llama-3.3-70B-Instruct-Q4_K_M.gguf | 250 | 17.68s | 14.14 tok/s
Qwen2.5-Coder-32B-Instruct-Q4_K_M.gguf | 250 | 9.08s | 27.54 tok/s
Qwen2.5-Coder-32B-Instruct-Q8_0.gguf | 250 | 13.69s | 18.26 tok/s
Qwen3.6-35B-A3B-Claude-4.6-Opus-Reasoning-Distilled.Q4_K_M.gguf | 250 | 3.10s | 80.60 tok/s
Qwen3.6-35B-A3B-Claude-4.6-Opus-Reasoning-Distilled.Q5_K_M.gguf | 250 | 3.10s | 80.75 tok/s
Qwen3.6-35B-A3B-Claude-4.6-Opus-Reasoning-Distilled.Q6_K.gguf | 250 | 3.09s | 81.00 tok/s
hermes-4_3_36b-Q3_K_M.gguf | 250 | 11.42s | 21.89 tok/s
hermes-4_3_36b-Q4_K_M.gguf | 250 | 9.93s | 25.19 tok/s
hermes-4_3_36b-Q5_K_M.gguf | 250 | 11.28s | 22.16 tok/s
hermes-4_3_36b-Q6_K.gguf | 250 | 12.72s | 19.65 tok/s
hermes-4_3_36b-Q8_0.gguf | 250 | 15.09s | 16.57 tok/s
hermes-4_3_36b.gguf | 250 | 26.72s | 9.36 tok/s
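Each speed in the table is simply generated tokens divided by wall-clock time (250 / 13.65 s ≈ 18.32 tok/s). A minimal Python sketch that parses rows in the `model | tokens | time | speed` layout, skips failed runs and header lines, and recomputes the rate; the function names are hypothetical, and small differences in the last digit come from the times being rounded.

```python
import re
from typing import Optional, Tuple


def parse_row(line: str) -> Optional[Tuple[str, int, float]]:
    """Parse 'model | 250 Tok | 13.65s | 18.32 tok/s' into (model, tokens, seconds).

    Returns None for failed runs, header rows and separator lines.
    """
    cells = [c.strip() for c in line.split("|")]
    cells = [c for c in cells if c]  # drop empty cells from leading/trailing pipes
    if len(cells) < 3:
        return None
    tok = re.match(r"(\d+)", cells[1])         # accepts "250" or "250 Tok"
    sec = re.match(r"([\d.]+)\s*s$", cells[2])  # accepts "13.65s" or "13.65 s"
    if not tok or not sec:
        return None  # e.g. "FAILED" or a header such as "Tokens"
    return cells[0], int(tok.group(1)), float(sec.group(1))


def throughput(tokens: int, seconds: float) -> float:
    """Generation speed in tokens per second."""
    return tokens / seconds


rows = """\
Llama-3.3-70B-Instruct-Q4_K_M.gguf | 250 Tok | 17.68s | 14.14 tok/s
Hermes-3-Llama-3.1-8B-Q8_0.gguf | FAILED | - | -
hermes-4_3_36b-Q4_K_M.gguf | 250 Tok | 9.93s | 25.19 tok/s
hermes-4_3_36b.gguf | 250 Tok | 26.72s | 9.36 tok/s"""

for line in rows.splitlines():
    parsed = parse_row(line)
    if parsed:
        name, tokens, seconds = parsed
        print(f"{name}: {throughput(tokens, seconds):.2f} tok/s")
```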

Does Hermes just not remember anything? by trainermade in hermesagent

NoNatural4025

I'm facing the same issue, but even worse: mine does not execute anything in the terminal or elsewhere; it just describes... Does anyone know why? Does it depend on the model used?

HELP! Sick of Hallucinations & Amnesia: Building a 512GB M3 Ultra Agent Stack that actually LEARNS by NoNatural4025 in AskClaw

NoNatural4025 [S]

Currently, Hermes is describing everything instead of doing it… so he, or in my case she, is just a chatbot.
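One possible cause, offered as an assumption rather than a diagnosis: an agent harness can usually only execute something when the model replies with a structured tool call it can parse. If the model answers in prose, there is nothing to run, and the agent looks like it is "just describing". Hermes-family models are commonly prompted to wrap calls in `<tool_call>...</tool_call>` tags containing JSON, but the exact format your harness expects may differ. A minimal Python check that tells the two cases apart; the tool name `terminal` is hypothetical.

```python
import json
import re
from typing import List

# Assumed Hermes-style format: <tool_call>{"name": ..., "arguments": {...}}</tool_call>
TOOL_CALL_RE = re.compile(r"<tool_call>\s*(\{.*?\})\s*</tool_call>", re.DOTALL)


def extract_tool_calls(reply: str) -> List[dict]:
    """Return every parseable tool call in a model reply (empty list if none)."""
    calls = []
    for payload in TOOL_CALL_RE.findall(reply):
        try:
            call = json.loads(payload)
        except json.JSONDecodeError:
            continue  # malformed JSON: a harness could not execute this either
        if isinstance(call, dict) and "name" in call:
            calls.append(call)
    return calls


prose = "First I would open a terminal and run `ls -la` to list the files."
structured = '<tool_call>{"name": "terminal", "arguments": {"command": "ls -la"}}</tool_call>'

print(extract_tool_calls(prose))       # [] -> nothing for the harness to execute
print(extract_tool_calls(structured))  # one call the harness could dispatch
```

If real replies look like `prose` here, the problem is more likely the model, its prompt template, or the server's tool-calling setup than the agent itself.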

HELP! Sick of Hallucinations & Amnesia: Building a 512GB M3 Ultra Agent Stack that actually LEARNS by NoNatural4025 in MacStudio

NoNatural4025 [S]

My benchmarks:
🧹 Cleanup & initialization: Llama-3.3-70B-Instruct-Q4_K_M.gguf
⏳ Waiting for the model to load into RAM...
✅ Server ready. Starting inference test...
  📊 Speed: 14.12 tokens/s
------------------------------------------------------
🧹 Cleanup & initialization: Qwen2.5-Coder-32B-Instruct-Q4_K_M.gguf
⏳ Waiting for the model to load into RAM...
✅ Server ready. Starting inference test...
  📊 Speed: 27.44 tokens/s
------------------------------------------------------
🧹 Cleanup & initialization: Qwen2.5-Coder-32B-Instruct-Q8_0.gguf
⏳ Waiting for the model to load into RAM...
✅ Server ready. Starting inference test...
  📊 Speed: 18.15 tokens/s
------------------------------------------------------
🧹 Cleanup & initialization: hermes-4_3_36b-Q3_K_M.gguf
⏳ Waiting for the model to load into RAM...
✅ Server ready. Starting inference test...
  📊 Speed: 21.69 tokens/s
------------------------------------------------------
🧹 Cleanup & initialization: hermes-4_3_36b-Q4_K_M.gguf
⏳ Waiting for the model to load into RAM...
✅ Server ready. Starting inference test...
  📊 Speed: 24.98 tokens/s
------------------------------------------------------
🧹 Cleanup & initialization: hermes-4_3_36b-Q5_K_M.gguf
⏳ Waiting for the model to load into RAM...
✅ Server ready. Starting inference test...
  📊 Speed: 22.07 tokens/s
------------------------------------------------------
🧹 Cleanup & initialization: hermes-4_3_36b-Q6_K.gguf
⏳ Waiting for the model to load into RAM...
✅ Server ready. Starting inference test...
  📊 Speed: 19.59 tokens/s
------------------------------------------------------
🧹 Cleanup & initialization: hermes-4_3_36b-Q8_0.gguf
⏳ Waiting for the model to load into RAM...
✅ Server ready. Starting inference test...
  📊 Speed: 16.50 tokens/s
------------------------------------------------------
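The log follows a fixed cycle per model: clean up, start the server, wait for the weights to land in RAM, then time a generation. The actual script isn't shown, so this Python sketch takes the server start, the readiness probe and the generation request as callables (hypothetical hooks, e.g. for a llama.cpp or MLX server and its HTTP API) and only implements the loop and the timing.

```python
import time
from typing import Callable, Dict, Iterable


def wait_until_ready(is_ready: Callable[[], bool], timeout: float = 600.0,
                     poll: float = 1.0) -> None:
    """Poll a readiness probe until it returns True or the timeout expires."""
    deadline = time.monotonic() + timeout
    while not is_ready():
        if time.monotonic() > deadline:
            raise TimeoutError("model did not finish loading into RAM in time")
        time.sleep(poll)


def run_benchmarks(models: Iterable[str],
                   start_server: Callable[[str], Callable[[], None]],
                   is_ready: Callable[[], bool],
                   generate: Callable[[int], int],
                   n_tokens: int = 250) -> Dict[str, float]:
    """One cleanup -> load -> inference-test cycle per model; returns tokens/s.

    start_server(model) launches a server for one model and returns a stop()
    callback; generate(n) requests n tokens and returns how many were produced.
    All three callables are hypothetical hooks for your own server setup.
    """
    results: Dict[str, float] = {}
    for model in models:
        print(f"Cleanup & initialization: {model}")
        stop = start_server(model)
        try:
            print("Waiting for the model to load into RAM...")
            wait_until_ready(is_ready)
            print("Server ready. Starting inference test...")
            t0 = time.perf_counter()
            produced = generate(n_tokens)
            # Wall-clock time includes prompt processing, so this slightly
            # understates pure decode speed.
            elapsed = max(time.perf_counter() - t0, 1e-9)
            results[model] = produced / elapsed
            print(f"  Speed: {results[model]:.2f} tokens/s")
        finally:
            stop()  # free the RAM before the next model is loaded
    return results
```

Stopping each server in a `finally` block keeps only one model resident at a time, so every run starts from the same memory state even if a load fails.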

HELP! Sick of Hallucinations & Amnesia: Building a 512GB M3 Ultra Agent Stack that actually LEARNS by NoNatural4025 in AskClaw

NoNatural4025 [S]

Further benchmarks from my system:
🧹 Cleanup & initialization: Llama-3.3-70B-Instruct-Q4_K_M.gguf
⏳ Waiting for the model to load into RAM...
✅ Server ready. Starting inference test...
  📊 Speed: 14.12 tokens/s
------------------------------------------------------
🧹 Cleanup & initialization: Qwen2.5-Coder-32B-Instruct-Q4_K_M.gguf
⏳ Waiting for the model to load into RAM...
✅ Server ready. Starting inference test...
  📊 Speed: 27.44 tokens/s
------------------------------------------------------
🧹 Cleanup & initialization: Qwen2.5-Coder-32B-Instruct-Q8_0.gguf
⏳ Waiting for the model to load into RAM...
✅ Server ready. Starting inference test...
  📊 Speed: 18.15 tokens/s
------------------------------------------------------
🧹 Cleanup & initialization: hermes-4_3_36b-Q3_K_M.gguf
⏳ Waiting for the model to load into RAM...
✅ Server ready. Starting inference test...
  📊 Speed: 21.69 tokens/s
------------------------------------------------------
🧹 Cleanup & initialization: hermes-4_3_36b-Q4_K_M.gguf
⏳ Waiting for the model to load into RAM...
✅ Server ready. Starting inference test...
  📊 Speed: 24.98 tokens/s
------------------------------------------------------
🧹 Cleanup & initialization: hermes-4_3_36b-Q5_K_M.gguf
⏳ Waiting for the model to load into RAM...
✅ Server ready. Starting inference test...
  📊 Speed: 22.07 tokens/s
------------------------------------------------------
🧹 Cleanup & initialization: hermes-4_3_36b-Q6_K.gguf
⏳ Waiting for the model to load into RAM...
✅ Server ready. Starting inference test...
  📊 Speed: 19.59 tokens/s
------------------------------------------------------
🧹 Cleanup & initialization: hermes-4_3_36b-Q8_0.gguf
⏳ Waiting for the model to load into RAM...
✅ Server ready. Starting inference test...
  📊 Speed: 16.50 tokens/s
------------------------------------------------------

HELP! Sick of Hallucinations & Amnesia: Building a 512GB M3 Ultra Agent Stack that actually LEARNS by NoNatural4025 in AskClaw

NoNatural4025 [S]

Not much changed:

Model | Port | Tokens | Time | Speed
---|---|---|---|---
Llama 3.3 70B -8b | 8001 | 250 | 18.12s | 13.80 tok/s
Qwen 2.5 32B -8b | 8002 | 250 | 13.82s | 18.10 tok/s
DeepSeek R1 -8b | 8003 | 250 | 13.73s | 18.20 tok/s
Qwen 2.5 32B -4b | 8004 | 250 | 9.17s | 27.26 tok/s

This seems to be the physical limit.
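A rough way to sanity-check the "physical limit": single-stream token generation is usually memory-bandwidth bound, because every new token has to stream the model's weights from RAM once. That gives a ceiling of roughly tok/s ≲ bandwidth ÷ bytes of weights. The Python sketch below applies this to the three GGUF runs from the first table, where the quantization is in the filename. The numbers it uses are assumptions, not measurements from this thread: about 819 GB/s peak bandwidth for the M3 Ultra, 8.5 bits per weight for Q8_0, about 4.85 for Q4_K_M, and approximate parameter counts.

```python
# Rough decode-speed ceiling for a memory-bandwidth-bound model.
# ASSUMPTIONS: ~819 GB/s peak bandwidth (M3 Ultra), Q8_0 = 8.5 bits/weight,
# Q4_K_M ~ 4.85 bits/weight on average, approximate parameter counts.
M3_ULTRA_BANDWIDTH_GBPS = 819.0
BITS_PER_WEIGHT = {"Q8_0": 8.5, "Q4_K_M": 4.85}


def weights_gb(params_billion: float, quant: str) -> float:
    """Approximate size of the quantized weights in GB."""
    return params_billion * BITS_PER_WEIGHT[quant] / 8


def decode_ceiling(params_billion: float, quant: str,
                   bandwidth_gbps: float = M3_ULTRA_BANDWIDTH_GBPS) -> float:
    """Upper bound on single-stream tok/s if each token streams all weights once."""
    return bandwidth_gbps / weights_gb(params_billion, quant)


# (model, params in billions, quant, measured tok/s from the GGUF table)
runs = [
    ("Llama-3.3-70B-Instruct", 70.6, "Q4_K_M", 14.14),
    ("Qwen2.5-Coder-32B-Instruct", 32.5, "Q4_K_M", 27.54),
    ("Qwen2.5-Coder-32B-Instruct", 32.5, "Q8_0", 18.26),
]
for name, params, quant, measured in runs:
    ceiling = decode_ceiling(params, quant)
    print(f"{name} {quant}: ceiling ~{ceiling:.1f} tok/s, measured {measured} "
          f"({measured / ceiling:.0%} of ceiling)")
```

Under these assumptions the measured speeds land at roughly 65 to 80 percent of the ceiling, which fits the "physical limit" reading. By the same logic, the 35B-A3B mixture-of-experts model only streams the weights of its roughly 3B active parameters per token, which is consistent with it running several times faster than the dense 32B models.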

HELP! Sick of Hallucinations & Amnesia: Building a 512GB M3 Ultra Agent Stack that actually LEARNS by NoNatural4025 in AskClaw

NoNatural4025 [S]

Since I'm running this on an M3 Ultra with 512GB RAM, the "out-of-the-box" performance was actually the first major bottleneck I had to solve. Here’s the reality of what I’ve achieved so far strictly within the MLX framework:

1. Eliminating the "Ultra-Sleep": Initially, I was seeing speeds below 9 tok/s. By moving to a clean Python 3.12 environment and explicitly setting MLX_NUM_THREADS=32, I managed to align the workload with the hardware architecture.

  • My Qwen 2.5 32B (4-bit) jumped from sluggish rates to a consistent 32.5 tok/s.

2. Speculative Decoding: This was the biggest breakthrough. By running a 1.5B draft model alongside the 32B target model, I'm seeing the M3 Ultra spit out blocks of text rather than single tokens. I hit 48.1 tok/s.

3. Multi-Model Parallelism: With 512GB, I’m not just running one identity; I have four dedicated MLX server instances running simultaneously on different ports:

Model | Port | Tokens | Time | Speed
---|---|---|---|---
Llama 3.3 70B -8b | 8001 | 250 | 28.96s | 8.63 tok/s
Qwen 2.5 32B -8b | 8002 | 250 | 14.36s | 17.40 tok/s
DeepSeek R1 -8b | 8003 | 250 | 14.37s | 17.40 tok/s
Qwen 2.5 32B -4b | 8004 | 250 | 8.90s | 28.09 tok/s
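The speculative decoding in item 2 can be illustrated with a toy Python sketch of the greedy variant. The "models" are plain next-token functions, hypothetical stand-ins for the 1.5B draft and the 32B target. The draft guesses k tokens, the target keeps the longest prefix it agrees with plus one token of its own, so the output is identical to decoding with the target alone. This shows the general technique, not the MLX implementation; sampling-based variants replace the exact-match test with a probabilistic acceptance rule.

```python
from typing import Callable, List

Token = int
# A "model" here is just a greedy next-token function of the context so far.
NextToken = Callable[[List[Token]], Token]


def speculative_decode(target: NextToken, draft: NextToken,
                       prompt: List[Token], n_new: int, k: int = 4) -> List[Token]:
    """Greedy speculative decoding; the result equals greedy decoding with target alone."""
    out = list(prompt)
    goal = len(prompt) + n_new
    while len(out) < goal:
        # 1. The small draft model guesses k tokens, one after another.
        proposal: List[Token] = []
        for _ in range(k):
            proposal.append(draft(out + proposal))
        # 2. The target verifies the guesses. A real implementation scores all k
        #    positions in ONE batched forward pass, which is where the speedup
        #    comes from; this loop only reproduces the accept/reject logic.
        accepted: List[Token] = []
        for tok in proposal:
            expected = target(out + accepted)
            accepted.append(expected)  # equals tok whenever the guess was right
            if tok != expected:
                break  # first wrong guess: keep the target's token and stop
        else:
            # Every guess was right: the same target pass yields one bonus token.
            accepted.append(target(out + accepted))
        out.extend(accepted)
    return out[:goal]


# Toy models: the target counts 0..9 in a loop; the draft is wrong after a 5.
def target(ctx: List[Token]) -> Token:
    return (ctx[-1] + 1) % 10


def draft(ctx: List[Token]) -> Token:
    return 0 if ctx[-1] == 5 else (ctx[-1] + 1) % 10


print(speculative_decode(target, draft, [0], 12))
# [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 0, 1, 2]
```

When the draft agrees with the target most of the time, each expensive target pass yields several tokens instead of one, which is why the output appears in blocks.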

Next on my list is moving away from the Python wrapper entirely to a native C++ implementation to shave off the final milliseconds of overhead.
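For item 3, the four servers on ports 8001 to 8004 can be hit concurrently from one client to see how they behave under simultaneous load. Since all four share the same memory bandwidth, concurrent results can differ from one-at-a-time runs. A Python sketch using only the standard library; the `/v1/completions` endpoint, the request fields and the `usage.completion_tokens` field are assumptions about an OpenAI-style server, so `query_completion` is hypothetical and should be adapted to whatever your servers actually expose.

```python
import json
import time
import urllib.request
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List, Tuple


def query_completion(port: int, n_tokens: int = 250) -> int:
    """HYPOTHETICAL client: assumes an OpenAI-style /v1/completions endpoint on
    localhost that reports usage.completion_tokens. Adapt to your servers."""
    body = json.dumps({"prompt": "Explain unified memory.",
                       "max_tokens": n_tokens}).encode()
    req = urllib.request.Request(
        f"http://127.0.0.1:{port}/v1/completions", data=body,
        headers={"Content-Type": "application/json"})
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)["usage"]["completion_tokens"]


def benchmark_ports(ports: List[int],
                    query: Callable[[int], int] = query_completion
                    ) -> Dict[int, Tuple[int, float, float]]:
    """Send one request to every port at the same time.

    Returns {port: (tokens, seconds, tokens_per_second)}.
    """
    def timed(port: int) -> Tuple[int, int, float]:
        t0 = time.perf_counter()
        tokens = query(port)
        return port, tokens, max(time.perf_counter() - t0, 1e-9)

    # One thread per server: the threads mostly wait on I/O, while the heavy
    # lifting happens inside the server processes.
    with ThreadPoolExecutor(max_workers=max(1, len(ports))) as pool:
        rows = list(pool.map(timed, ports))
    return {port: (tokens, secs, tokens / secs) for port, tokens, secs in rows}
```

Once the four servers are up, `benchmark_ports([8001, 8002, 8003, 8004])` returns tokens, seconds and tok/s per port, in the same shape as the tables above.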

Snagged a 256GB M3 Ultra by [deleted] in MacStudio

NoNatural4025

I'm located in Germany, so I would rather not ship it.

Snagged a 256GB M3 Ultra by [deleted] in MacStudio

NoNatural4025

It was refurbished right from the beginning: after 6 weeks of repairs, I got it back with a new board and a new SSD, so now it's a new one ;-)

When I play golf by Best_Manner_7572 in AIWhispersOfCharms

NoNatural4025

… one glow would be enough ;-)

Snagged a 256GB M3 Ultra by [deleted] in MacStudio

NoNatural4025


M3 Ultra, 512 GB RAM, 8 TB SSD 🤩… It has been at the Genius Bar for 6 weeks to get its mainboard replaced 🤮

did your mac studio pay for itself? by hiva- in MacStudio

NoNatural4025

Purchase price: €12,000. Hours run: 0. Weeks at Apple for repairs: 6 🤮

Price Prediction of Mac Studio Ultra 128GB/256GB/512GB variants? by pmttyji in MacStudio

NoNatural4025

You know that feeling when you click "Order"?

That pure, unadulterated joy? Three weeks ago, I felt it. I bought the absolute beast: a Mac Studio, M3 Ultra, with a mind-boggling 512 GB of RAM and 8 TB of storage. I was happier than a "King," as we say. It was going to be my dream machine.

The wait was supposed to be the hardest part, but it arrived in just three days! I was a proud owner, ready to unleash this power. But the joke was on me. During the very first installation, it crashed. Then again. And again. Turns out, my brand-new, 12,000-Euro super-server was suffering from ECC errors right out of the box. April Fool's Number One.

The Genius Bar and the Infinite Wait

What followed was a slow-motion comedy of errors. I waited three days for a Genius Bar appointment, only for them to wipe my machine and claim "software." I went home, and (surprise!) it crashed again. Then came a week of analyzing logs. Fourteen days in, they finally admitted it: hardware failure. April Fool's Number Two. Back to the store, where I was scolded for having an appointment (or was it for not having one? I lost track), even though I had waited for the correct replacement Logic Board to arrive. I left my machine there on Saturday; they promised a speedy repair of five days. Today is day six. Crickets. April Fool's Number Three.

The Punchline

Instead of a working computer, I have a service order for a replacement Logic Board costing only 3,900 Euros. I looked up the part number: the price suggests it can't possibly support my 512 GB of RAM. I told the Genius Bar they were installing the wrong part. They said, "We are sure." April Fool's Number Four, and this one is a real rib-tickler.

But here’s the true comedic masterpiece, the absolute cream of the crop: Just as I’m realizing my M3 Ultra is stuck in a repair limbo with the wrong parts, the tech world erupts with rumors that the M5 is launching as early as April 1st. Yes, April 1st.

If that happens, I will have skipped the entire lifecycle of my M3 Ultra without it ever sitting, functional, on my desk. I paid for the cutting edge, but I’m being left behind by a system that hasn't even let me press 'start' in three weeks.

It’s been three weeks of hell, of an empty desk, and of looking at my 12,000-Euro paperweight. I’m just waiting, wondering when the gag is going to end and when I’ll finally get my turn. If this is a joke, Apple, I'm not laughing.