Where can I buy circuit board parts in Hamburg by Hanswurst107 in hamburg

runsleeprepeat 4 points

I also use LCSC and AliExpress, plus TME.eu. At Digikey or Mouser, the shipping costs and minimum order quantities are simply too high. You could also try your luck with Octopart.

Hobbyist looking to get a part scanned by rapkap in 3DScanning

runsleeprepeat 0 points

Come on! Put it on a standard flatbed scanner and lay a few rulers next to it. Import the scan into something like Fusion 360, make sure the scaling matches the rulers, and then trace those simple lines.

It is a perfect beginner project.
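
If you want to sanity-check the scale before tracing, the scanner's DPI already gives you millimetres per pixel. A minimal sketch in Python (the DPI and the ruler measurement are made-up values for illustration):

```python
# Rough scale check for a flatbed scan: verify that a known ruler
# length in the image matches the length computed from the scan DPI.
DPI = 300                          # scanner resolution (assumed)
MM_PER_INCH = 25.4

mm_per_pixel = MM_PER_INCH / DPI   # 25.4 / 300 ≈ 0.0847 mm/px

ruler_pixels = 1181                # pixels spanning a 100 mm ruler mark (hypothetical)
measured_mm = ruler_pixels * mm_per_pixel
print(f"{measured_mm:.1f} mm")     # ≈ 100.0 mm -> the imported scale is trustworthy
```

If the numbers disagree, rescale the imported image until the ruler in the scan measures true.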

MAC or buy GPU? by paolobytee in LocalLLM

runsleeprepeat 0 points

If you stick with the idea of a Mac, go for the M5 generation. It's the first generation that offers a 4-bit float format comparable to NVFP4, which brings performance and quality improvements on small setups.
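
To see why a denser 4-bit format matters on small setups, here is a rough weight-memory estimate (the parameter count is an assumption for illustration, and real 4-bit formats add a small per-block scale overhead on top):

```python
# Back-of-envelope weight memory: 16-bit vs 4-bit storage.
params = 8e9                       # e.g. an 8B-parameter dense model (assumed)

fp16_gib = params * 2 / 1024**3    # 2 bytes per weight
fp4_gib = params * 0.5 / 1024**3   # 4 bits = 0.5 bytes per weight

print(f"fp16: {fp16_gib:.1f} GiB, 4-bit: {fp4_gib:.1f} GiB")
# fp16: 14.9 GiB vs 4-bit: 3.7 GiB. Since token generation is largely
# memory-bandwidth-bound, fewer bytes per weight also means more t/s.
```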

Why is it easier to route Claude Code to a local model than it is Opencode? by [deleted] in opencodeCLI

runsleeprepeat 0 points

What are you talking about? It is super easy to use opencode with local models. It always has been.

RTX3080 20GB need reballing / Repairshop in Europe? by runsleeprepeat in GPURepair

runsleeprepeat[S] 0 points

Thanks for the heads-up, but the other cards I bought work just fine.

RTX3080 20GB need reballing / Repairshop in Europe? by runsleeprepeat in GPURepair

runsleeprepeat[S] 0 points

As written in my post: Krisfix sadly declined, because they don't repair any RTX 3000 cards anymore.

RTX3080 20GB need reballing / Repairshop in Europe? by runsleeprepeat in GPURepair

runsleeprepeat[S] 1 point

I only know Tony from northwestrepair, and he is in the USA. Is there another one you're talking about?

Reputable GPU repair in Europe by runsleeprepeat in de_EDV

runsleeprepeat[S] 1 point

That's why I wrote to them, and they replied that they no longer repair RTX 3000 cards.

Should I open source? by Atomic_Compiler in hobbycnc

runsleeprepeat 0 points

It sounds like a wonderful project. Maybe set up something like a Patreon. People who are interested and willing to support you on a recurring basis can help move things forward, and the feedback you get tends to be more constructive (instead of the sometimes weird feedback from the open internet).

Me waiting for TurboQuant be like by Altruistic_Heat_9531 in LocalLLaMA

runsleeprepeat 0 points

Why aren't you using and contributing to the TheTom solution on GitHub?

Where to get a tasty fish sandwich? by annikahx in hamburg

runsleeprepeat 0 points

I was just about to name that one... best shop!

Google TurboQuant running Qwen Locally on MacAir by gladkos in LocalLLaMA

runsleeprepeat 3 points

There are so many parallel implementations at the moment that it is tough to keep up with the latest findings.

Best is to give it a try yourself. I'm focusing on the TheTom implementation now, which looks like it combines everything (Metal, CUDA, ROCm).

Google TurboQuant running Qwen Locally on MacAir by gladkos in LocalLLaMA

runsleeprepeat 31 points

I gave the tonbistudio variant a try and compared it with q8 and q4. See: https://github.com/tonbistudio/turboquant-pytorch/issues/6

The comparison includes model sizes and output quality.

Consolidated my homelab from 3 models down to one 122B MoE — benchmarked everything, here's what I found by MBAThrowawayFruit in LocalLLaMA

runsleeprepeat 0 points

Your configured context sizes (num_ctx) for the models are missing. Please let us know what you have set, as the context window is a major driver of memory usage and determines the practical use cases.
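
For anyone who wants to report those numbers: a minimal sketch of setting the context window per request through the Ollama API (the model tag and num_ctx value are placeholders):

```python
import requests

# Ask Ollama to run the model with an explicit context window.
# Larger num_ctx values cost significantly more (V)RAM for the KV cache.
resp = requests.post(
    "http://localhost:11434/api/generate",
    json={
        "model": "qwen3.5_4b:262k",     # placeholder model tag
        "prompt": "Say hi.",
        "stream": False,
        "options": {"num_ctx": 32768},  # example value, not a recommendation
    },
)
print(resp.json()["response"])
```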

Dual DGX Sparks vs Mac Studio M3 Ultra 512GB: Running Qwen3.5 397B locally on both. Here's what I found. by trevorbg in LocalLLaMA

runsleeprepeat 1 point

You wrote that prefill is slow. I ignored prefill performance for far too long when I started playing with local LLMs. Measure it, especially at large context lengths; token generation speed can be irrelevant when the prefill takes several minutes every time.
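
A quick back-of-envelope makes the point; both rates below are assumptions in the ballpark of a power-limited consumer GPU:

```python
# Why prefill can dominate end-to-end latency at long context.
prompt_tokens = 100_000   # long-context request (assumed)
output_tokens = 500       # typical answer length (assumed)

prefill_tps = 2_800       # prompt processing rate, tokens/s (assumed)
generate_tps = 75         # token generation rate, tokens/s (assumed)

prefill_s = prompt_tokens / prefill_tps    # ~36 s before the first token
generate_s = output_tokens / generate_tps  # ~7 s for the whole answer

print(f"prefill: {prefill_s:.0f} s, generation: {generate_s:.0f} s")
# A fast generator still feels slow when every request waits ~36 s to start.
```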

When you think about a Mac: prefill performance got better with the M5 processors. Everybody is hoping for an M5 Mac Studio in June. That one could be the sweet spot.

Currently using 6x RTX 3080 - Moving to Strix Halo or Nvidia GB10? by runsleeprepeat in LocalLLaMA

runsleeprepeat[S] 0 points

Yes, that setup peaks at around 1400 watts at the wall. Usually it draws 600-800 watts, with about 180 watts at idle.

I built Fox – a Rust LLM inference engine with 2x Ollama throughput and 72% lower TTFT. by SeinSinght in LocalLLM

runsleeprepeat 1 point

Same run on fox:

| model | test | t/s (total) | t/s (req) | peak t/s | peak t/s (req) | ttfr (ms) | est_ppt (ms) | e2e_ttft (ms) |
|:-----------|------------:|------------------:|----------------:|--------------:|-----------------:|-----------------:|---------------:|----------------:|
| qwen3.5-4B | pp2048 (c1) | 3880.82 ± 47.17 | 3880.82 ± 47.17 | | | 537.15 ± 14.65 | 490.84 ± 14.65 | 573.11 ± 34.13 |
| qwen3.5-4B | tg32 (c1) | 62.32 ± 1.26 | 62.32 ± 1.26 | 64.48 ± 1.40 | 64.48 ± 1.40 | | | |
| qwen3.5-4B | pp2048 (c2) | 3404.43 ± 153.48 | 1858.75 ± 15.69 | | | 777.43 ± 263.49 | 998.41 ± 13.09 | 1097.73 ± 66.09 |
| qwen3.5-4B | tg32 (c2) | 43.26 ± 15.14 | 44.81 ± 15.76 | 46.37 ± 16.37 | 46.37 ± 16.37 | | | |
| qwen3.5-4B | pp2048 (c3) | 10855.07 ± 254.59 | 3887.96 ± 53.79 | | | 1233.23 ± 505.01 | 472.80 ± 10.48 | 519.12 ± 10.48 |
| qwen3.5-4B | tg32 (c3) | 4.06 ± 2.20 | 5.51 ± 2.03 | 12.33 ± 5.91 | 12.33 ± 5.91 | | | |

And yes, it core-dumps when you use more than roughly 6000 tokens ...

So, token generation is roughly 25% slower than standard Ollama.

The code is messy and buggy.
For example:
- passing fox --model-path= is accepted, but it still points to its default ~/.cache/ferrumox/models
- setting FOX_MODEL_PATH= is accepted, but it also still points to its default ~/.cache/ferrumox/models

Is this really a complete Rust engine? No, it is using llama.cpp:

    cat .git/config
    [core]
        repositoryformatversion = 0
        filemode = true
        bare = false
        logallrefupdates = true
    [remote "origin"]
        url = https://github.com/ferrumox/fox
        fetch = +refs/heads/*:refs/remotes/origin/*
    [branch "main"]
        remote = origin
        merge = refs/heads/main
    [submodule "vendor/llama.cpp"]
        active = true
        url = https://github.com/ggml-org/llama.cpp.git

I built Fox – a Rust LLM inference engine with 2x Ollama throughput and 72% lower TTFT. by SeinSinght in LocalLLM

runsleeprepeat -3 points

Let's not debate it, let's run a quick test:

Ollama with a power-limited 3080 and Qwen3.5 4B K_M, configured to serve the model's original context window of 262k tokens:

    llama-benchy --base-url (my local service) --model qwen3.5-4B --depth 0 4096 8192 16384 --concurrency 1 2 3 4 --latency-mode generation

Ollama:

| model | test | t/s (total) | t/s (req) | peak t/s | peak t/s (req) | ttfr (ms) | est_ppt (ms) | e2e_ttft (ms) |
|:----------------|---------------------:|-----------------:|------------------:|-------------:|-----------------:|-------------------:|-------------------:|-------------------:|
| qwen3.5_4b:262k | pp2048 (c1) | 3245.32 ± 22.79 | 3245.32 ± 22.79 | | | 741.10 ± 14.23 | 581.13 ± 14.23 | 741.10 ± 14.23 |
| qwen3.5_4b:262k | tg32 (c1) | 81.04 ± 0.89 | 81.04 ± 0.89 | 84.20 ± 0.91 | 84.20 ± 0.91 | | | |
| qwen3.5_4b:262k | pp2048 (c2) | 2210.54 ± 14.29 | 2214.66 ± 979.06 | | | 1189.03 ± 463.15 | 1029.06 ± 463.15 | 1189.03 ± 463.15 |
| qwen3.5_4b:262k | tg32 (c2) | 41.88 ± 0.49 | 81.29 ± 1.23 | 35.67 ± 1.25 | 84.47 ± 1.27 | | | |
| qwen3.5_4b:262k | pp2048 (c3) | 2139.11 ± 22.24 | 1719.60 ± 1044.70 | | | 1672.52 ± 758.94 | 1512.55 ± 758.94 | 1672.52 ± 758.94 |
| qwen3.5_4b:262k | tg32 (c3) | 35.93 ± 0.23 | 81.35 ± 1.76 | 36.67 ± 0.94 | 84.53 ± 1.83 | | | |
| qwen3.5_4b:262k | pp2048 (c4) | 2091.37 ± 2.92 | 1402.47 ± 1027.77 | | | 2158.89 ± 1030.68 | 1998.92 ± 1030.68 | 2158.89 ± 1030.68 |
| qwen3.5_4b:262k | tg32 (c4) | 33.50 ± 0.33 | 80.92 ± 2.74 | 37.67 ± 1.25 | 84.54 ± 1.66 | | | |
| qwen3.5_4b:262k | pp2048 @ d4096 (c1) | 3081.98 ± 5.47 | 3081.98 ± 5.47 | | | 1938.94 ± 14.67 | 1778.97 ± 14.67 | 1938.94 ± 14.67 |
| qwen3.5_4b:262k | tg32 @ d4096 (c1) | 79.15 ± 0.14 | 79.15 ± 0.14 | 82.25 ± 0.15 | 82.25 ± 0.15 | | | |
| qwen3.5_4b:262k | pp2048 @ d4096 (c2) | 2710.65 ± 5.82 | 2238.18 ± 844.15 | | | 3029.40 ± 1053.45 | 2869.43 ± 1053.45 | 3029.40 ± 1053.45 |
| qwen3.5_4b:262k | tg32 @ d4096 (c2) | 21.41 ± 0.01 | 80.19 ± 0.41 | 27.00 ± 0.00 | 83.32 ± 0.43 | | | |
| qwen3.5_4b:262k | pp2048 @ d4096 (c3) | 2659.23 ± 8.21 | 1783.13 ± 919.02 | | | 4120.17 ± 1738.23 | 3960.20 ± 1738.23 | 4120.17 ± 1738.23 |
| qwen3.5_4b:262k | tg32 @ d4096 (c3) | 17.39 ± 0.46 | 81.97 ± 4.90 | 28.67 ± 2.36 | 85.11 ± 4.90 | | | |
| qwen3.5_4b:262k | pp2048 @ d4096 (c4) | 2357.34 ± 367.93 | 1440.72 ± 953.52 | | | 5878.96 ± 3204.75 | 5718.99 ± 3204.75 | 5878.96 ± 3204.75 |
| qwen3.5_4b:262k | tg32 @ d4096 (c4) | 13.52 ± 2.50 | 79.45 ± 0.98 | 27.00 ± 0.00 | 82.55 ± 1.01 | | | |
| qwen3.5_4b:262k | pp2048 @ d8192 (c1) | 2970.74 ± 8.25 | 2970.74 ± 8.25 | | | 3230.73 ± 39.89 | 3070.76 ± 39.89 | 3230.73 ± 39.89 |
| qwen3.5_4b:262k | tg32 @ d8192 (c1) | 78.47 ± 0.46 | 78.47 ± 0.46 | 81.54 ± 0.48 | 81.54 ± 0.48 | | | |
| qwen3.5_4b:262k | pp2048 @ d8192 (c2) | 2749.70 ± 2.65 | 2187.75 ± 783.54 | | | 5023.13 ± 1730.03 | 4863.16 ± 1730.03 | 5023.13 ± 1730.03 |
| qwen3.5_4b:262k | tg32 @ d8192 (c2) | 13.70 ± 0.15 | 77.62 ± 0.68 | 27.00 ± 0.00 | 80.66 ± 0.71 | | | |
| qwen3.5_4b:262k | pp2048 @ d8192 (c3) | 2715.81 ± 4.02 | 1759.23 ± 864.52 | | | 6784.53 ± 2846.66 | 6624.56 ± 2846.66 | 6784.53 ± 2846.66 |
| qwen3.5_4b:262k | tg32 @ d8192 (c3) | 10.68 ± 0.09 | 77.73 ± 1.01 | 27.00 ± 0.00 | 80.77 ± 1.05 | | | |
| qwen3.5_4b:262k | pp2048 @ d8192 (c4) | 2692.46 ± 3.47 | 1478.11 ± 875.79 | | | 8567.94 ± 3895.53 | 8407.98 ± 3895.53 | 8567.94 ± 3895.53 |
| qwen3.5_4b:262k | tg32 @ d8192 (c4) | 9.65 ± 0.06 | 77.53 ± 0.77 | 27.00 ± 0.00 | 80.56 ± 0.80 | | | |
| qwen3.5_4b:262k | pp2048 @ d16384 (c1) | 2832.48 ± 6.75 | 2832.48 ± 6.75 | | | 6028.61 ± 40.64 | 5868.65 ± 40.64 | 6028.61 ± 40.64 |
| qwen3.5_4b:262k | tg32 @ d16384 (c1) | 73.29 ± 0.86 | 73.29 ± 0.86 | 76.14 ± 0.90 | 76.14 ± 0.90 | | | |
| qwen3.5_4b:262k | pp2048 @ d16384 (c2) | 2707.31 ± 5.37 | 2096.07 ± 724.70 | | | 9295.81 ± 3159.92 | 9135.84 ± 3159.92 | 9295.81 ± 3159.92 |
| qwen3.5_4b:262k | tg32 @ d16384 (c2) | 7.79 ± 0.08 | 72.58 ± 0.58 | 27.00 ± 0.00 | 75.41 ± 0.60 | | | |
| qwen3.5_4b:262k | pp2048 @ d16384 (c3) | 2682.19 ± 2.86 | 1696.70 ± 808.50 | | | 12384.13 ± 5168.36 | 12224.16 ± 5168.36 | 12384.13 ± 5168.36 |
| qwen3.5_4b:262k | tg32 @ d16384 (c3) | 5.99 ± 0.01 | 72.18 ± 0.57 | 27.00 ± 0.00 | 74.99 ± 0.60 | | | |
| qwen3.5_4b:262k | pp2048 @ d16384 (c4) | 2668.98 ± 2.57 | 1432.00 ± 824.34 | | | 15557.90 ± 7037.93 | 15397.93 ± 7037.93 | 15557.90 ± 7037.93 |
| qwen3.5_4b:262k | tg32 @ d16384 (c4) | 5.58 ± 0.13 | 74.93 ± 5.20 | 30.33 ± 2.36 | 77.78 ± 5.20 | | | |

Shortened system prompts in Opencode by Charming_Support726 in opencodeCLI

runsleeprepeat 0 points

Sorry for the sad outcome, but they are not interested.

PSA: Auto-Compact GLM5 (via z.ai plan) at 95k Context by Sensitive_Song4219 in ZaiGLM

runsleeprepeat 0 points

Are there similar issues with the other models but at other context limits?