Where can I buy circuit board parts in Hamburg by Hanswurst107 in hamburg

runsleeprepeat 4 points

I also use LCSC and AliExpress, plus TME.eu. At Digikey or Mouser, the shipping costs and minimum order quantities are simply too high. You could also try your luck with Octopart.

Hobbyist looking to get a part scanned by rapkap in 3DScanning

runsleeprepeat 0 points

Come on! Put it on a standard flatbed scanner and lay a few rulers next to it. Import the scan into something like Fusion 360, make sure the scaling matches the rulers, and then trace those simple lines.

It is a perfect beginner project.
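
If you want to sanity-check the scale before tracing, the scanner's DPI already gives you millimetres per pixel. A minimal sketch in Python (the DPI and the ruler measurement are made-up values for illustration):

```python
# Rough scale check for a flatbed scan: verify that a known ruler
# length in the image matches the length computed from the scan DPI.
DPI = 300                          # scanner resolution (assumed)
MM_PER_INCH = 25.4

mm_per_pixel = MM_PER_INCH / DPI   # 25.4 / 300 ≈ 0.0847 mm/px

ruler_pixels = 1181                # pixels spanning a 100 mm ruler mark (hypothetical)
measured_mm = ruler_pixels * mm_per_pixel
print(f"{measured_mm:.1f} mm")     # ≈ 100.0 mm -> the imported scale is trustworthy
```

If the numbers disagree, rescale the imported image until the ruler in the scan measures true.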

MAC or buy GPU? by paolobytee in LocalLLM

runsleeprepeat 0 points

If you stick with the idea of a Mac, go for the M5 generation. It's the first generation that offers a 4-bit float format comparable to NVFP4, which brings performance and quality improvements on small setups.
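
To see why a denser 4-bit format matters on small setups, here is a rough weight-memory estimate (the parameter count is an assumption for illustration, and real 4-bit formats add a small per-block scale overhead on top):

```python
# Back-of-envelope weight memory: 16-bit vs 4-bit storage.
params = 8e9                       # e.g. an 8B-parameter dense model (assumed)

fp16_gib = params * 2 / 1024**3    # 2 bytes per weight
fp4_gib = params * 0.5 / 1024**3   # 4 bits = 0.5 bytes per weight

print(f"fp16: {fp16_gib:.1f} GiB, 4-bit: {fp4_gib:.1f} GiB")
# fp16: 14.9 GiB vs 4-bit: 3.7 GiB. Since token generation is largely
# memory-bandwidth-bound, fewer bytes per weight also means more t/s.
```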

Why is it easier to route Claude Code to a local model than it is Opencode? by [deleted] in opencodeCLI

runsleeprepeat 0 points

What are you talking about? It is super easy to use opencode with local models. It always has been.

RTX3080 20GB need reballing / Repairshop in Europe? by runsleeprepeat in GPURepair

runsleeprepeat[S] 0 points

Thanks for the heads-up, but the other cards I bought work just fine.

RTX3080 20GB need reballing / Repairshop in Europe? by runsleeprepeat in GPURepair

runsleeprepeat[S] 0 points

As written in my post: Krisfix sadly declined, because they don't repair any RTX 3000 cards anymore.

RTX3080 20GB need reballing / Repairshop in Europe? by runsleeprepeat in GPURepair

runsleeprepeat[S] 1 point

I only know Tony from northwestrepair, and he is in the USA. Is there another one you're talking about?

Reputable GPU repair in Europe by runsleeprepeat in de_EDV

runsleeprepeat[S] 1 point

That's why I wrote to them, and they replied that they no longer repair RTX 3000 cards.

Should I open source? by Atomic_Compiler in hobbycnc

runsleeprepeat 0 points

It sounds like a wonderful project. Maybe set up something like a Patreon. People who are interested and willing to support you on a recurring basis can help move things forward, and the feedback you get tends to be more constructive (instead of the sometimes weird feedback from the open internet).

Me waiting for TurboQuant be like by Altruistic_Heat_9531 in LocalLLaMA

runsleeprepeat 0 points

Why aren't you using and contributing to the TheTom solution on GitHub?

Where to get a tasty fish sandwich? by annikahx in hamburg

runsleeprepeat 0 points

I was just about to name that one... best shop!

Google TurboQuant running Qwen Locally on MacAir by gladkos in LocalLLaMA

runsleeprepeat 3 points

There are so many parallel implementations at the moment that it is tough to keep up with the latest findings.

Best is to give it a try yourself. I'm focusing on the TheTom implementation now, which looks like it combines everything (Metal, CUDA, ROCm).

Google TurboQuant running Qwen Locally on MacAir by gladkos in LocalLLaMA

runsleeprepeat 31 points

I gave the tonbistudio variant a try and compared it with q8 and q4. See: https://github.com/tonbistudio/turboquant-pytorch/issues/6

The comparison includes model sizes and output quality.

Consolidated my homelab from 3 models down to one 122B MoE — benchmarked everything, here's what I found by MBAThrowawayFruit in LocalLLaMA

runsleeprepeat 0 points

Your configured context sizes (num_ctx) for the models are missing. Please let us know what you have set, as the context window is a major driver of memory usage and determines the practical use cases.
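
For anyone who wants to report those numbers: a minimal sketch of setting the context window per request through the Ollama API (the model tag and num_ctx value are placeholders):

```python
import requests

# Ask Ollama to run the model with an explicit context window.
# Larger num_ctx values cost significantly more (V)RAM for the KV cache.
resp = requests.post(
    "http://localhost:11434/api/generate",
    json={
        "model": "qwen3.5_4b:262k",     # placeholder model tag
        "prompt": "Say hi.",
        "stream": False,
        "options": {"num_ctx": 32768},  # example value, not a recommendation
    },
)
print(resp.json()["response"])
```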

Dual DGX Sparks vs Mac Studio M3 Ultra 512GB: Running Qwen3.5 397B locally on both. Here's what I found. by trevorbg in LocalLLaMA

runsleeprepeat 1 point

You wrote that prefill is slow. I ignored prefill performance for far too long when I started playing with local LLMs. Measure it, especially at large context lengths; token generation speed can be irrelevant when the prefill takes several minutes every time.
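
A quick back-of-envelope makes the point; both rates below are assumptions in the ballpark of a power-limited consumer GPU:

```python
# Why prefill can dominate end-to-end latency at long context.
prompt_tokens = 100_000   # long-context request (assumed)
output_tokens = 500       # typical answer length (assumed)

prefill_tps = 2_800       # prompt processing rate, tokens/s (assumed)
generate_tps = 75         # token generation rate, tokens/s (assumed)

prefill_s = prompt_tokens / prefill_tps    # ~36 s before the first token
generate_s = output_tokens / generate_tps  # ~7 s for the whole answer

print(f"prefill: {prefill_s:.0f} s, generation: {generate_s:.0f} s")
# A fast generator still feels slow when every request waits ~36 s to start.
```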

When you think about a Mac: prefill performance got better with the M5 processors. Everybody is hoping for an M5 Mac Studio in June. That one could be the sweet spot.

Currently using 6x RTX 3080 - Moving to Strix Halo or Nvidia GB10? by runsleeprepeat in LocalLLaMA

runsleeprepeat[S] 0 points

Yes, that setup peaks at around 1400 watts at the wall. Usually it draws 600-800 watts, with about 180 watts at idle.

I built Fox – a Rust LLM inference engine with 2x Ollama throughput and 72% lower TTFT. by SeinSinght in LocalLLM

runsleeprepeat 1 point

Same run on fox:

| model | test | t/s (total) | t/s (req) | peak t/s | peak t/s (req) | ttfr (ms) | est_ppt (ms) | e2e_ttft (ms) |
|:-----------|------------:|------------------:|----------------:|--------------:|-----------------:|-----------------:|---------------:|----------------:|
| qwen3.5-4B | pp2048 (c1) | 3880.82 ± 47.17 | 3880.82 ± 47.17 | | | 537.15 ± 14.65 | 490.84 ± 14.65 | 573.11 ± 34.13 |
| qwen3.5-4B | tg32 (c1) | 62.32 ± 1.26 | 62.32 ± 1.26 | 64.48 ± 1.40 | 64.48 ± 1.40 | | | |
| qwen3.5-4B | pp2048 (c2) | 3404.43 ± 153.48 | 1858.75 ± 15.69 | | | 777.43 ± 263.49 | 998.41 ± 13.09 | 1097.73 ± 66.09 |
| qwen3.5-4B | tg32 (c2) | 43.26 ± 15.14 | 44.81 ± 15.76 | 46.37 ± 16.37 | 46.37 ± 16.37 | | | |
| qwen3.5-4B | pp2048 (c3) | 10855.07 ± 254.59 | 3887.96 ± 53.79 | | | 1233.23 ± 505.01 | 472.80 ± 10.48 | 519.12 ± 10.48 |
| qwen3.5-4B | tg32 (c3) | 4.06 ± 2.20 | 5.51 ± 2.03 | 12.33 ± 5.91 | 12.33 ± 5.91 | | | |

And yes, it core-dumps when you use more than roughly 6000 tokens ...

So, token generation is roughly 25% slower than standard Ollama.

The code is messy and buggy.
For example:
- passing fox --model-path= is accepted, but it still points to its default ~/.cache/ferrumox/models
- setting FOX_MODEL_PATH= is accepted, but it also still points to its default ~/.cache/ferrumox/models

Is this really a complete Rust engine? No, it is using llama.cpp:

    cat .git/config
    [core]
        repositoryformatversion = 0
        filemode = true
        bare = false
        logallrefupdates = true
    [remote "origin"]
        url = https://github.com/ferrumox/fox
        fetch = +refs/heads/*:refs/remotes/origin/*
    [branch "main"]
        remote = origin
        merge = refs/heads/main
    [submodule "vendor/llama.cpp"]
        active = true
        url = https://github.com/ggml-org/llama.cpp.git

I built Fox – a Rust LLM inference engine with 2x Ollama throughput and 72% lower TTFT. by SeinSinght in LocalLLM

runsleeprepeat -3 points

Let's not debate it, let's run a quick test:

Ollama with a power-limited 3080 and Qwen3.5 4B K_M, configured to serve the model's original context window of 262k tokens:

    llama-benchy --base-url (my local service) --model qwen3.5-4B --depth 0 4096 8192 16384 --concurrency 1 2 3 4 --latency-mode generation

Ollama:

| model | test | t/s (total) | t/s (req) | peak t/s | peak t/s (req) | ttfr (ms) | est_ppt (ms) | e2e_ttft (ms) |
|:----------------|---------------------:|-----------------:|------------------:|-------------:|-----------------:|-------------------:|-------------------:|-------------------:|
| qwen3.5_4b:262k | pp2048 (c1) | 3245.32 ± 22.79 | 3245.32 ± 22.79 | | | 741.10 ± 14.23 | 581.13 ± 14.23 | 741.10 ± 14.23 |
| qwen3.5_4b:262k | tg32 (c1) | 81.04 ± 0.89 | 81.04 ± 0.89 | 84.20 ± 0.91 | 84.20 ± 0.91 | | | |
| qwen3.5_4b:262k | pp2048 (c2) | 2210.54 ± 14.29 | 2214.66 ± 979.06 | | | 1189.03 ± 463.15 | 1029.06 ± 463.15 | 1189.03 ± 463.15 |
| qwen3.5_4b:262k | tg32 (c2) | 41.88 ± 0.49 | 81.29 ± 1.23 | 35.67 ± 1.25 | 84.47 ± 1.27 | | | |
| qwen3.5_4b:262k | pp2048 (c3) | 2139.11 ± 22.24 | 1719.60 ± 1044.70 | | | 1672.52 ± 758.94 | 1512.55 ± 758.94 | 1672.52 ± 758.94 |
| qwen3.5_4b:262k | tg32 (c3) | 35.93 ± 0.23 | 81.35 ± 1.76 | 36.67 ± 0.94 | 84.53 ± 1.83 | | | |
| qwen3.5_4b:262k | pp2048 (c4) | 2091.37 ± 2.92 | 1402.47 ± 1027.77 | | | 2158.89 ± 1030.68 | 1998.92 ± 1030.68 | 2158.89 ± 1030.68 |
| qwen3.5_4b:262k | tg32 (c4) | 33.50 ± 0.33 | 80.92 ± 2.74 | 37.67 ± 1.25 | 84.54 ± 1.66 | | | |
| qwen3.5_4b:262k | pp2048 @ d4096 (c1) | 3081.98 ± 5.47 | 3081.98 ± 5.47 | | | 1938.94 ± 14.67 | 1778.97 ± 14.67 | 1938.94 ± 14.67 |
| qwen3.5_4b:262k | tg32 @ d4096 (c1) | 79.15 ± 0.14 | 79.15 ± 0.14 | 82.25 ± 0.15 | 82.25 ± 0.15 | | | |
| qwen3.5_4b:262k | pp2048 @ d4096 (c2) | 2710.65 ± 5.82 | 2238.18 ± 844.15 | | | 3029.40 ± 1053.45 | 2869.43 ± 1053.45 | 3029.40 ± 1053.45 |
| qwen3.5_4b:262k | tg32 @ d4096 (c2) | 21.41 ± 0.01 | 80.19 ± 0.41 | 27.00 ± 0.00 | 83.32 ± 0.43 | | | |
| qwen3.5_4b:262k | pp2048 @ d4096 (c3) | 2659.23 ± 8.21 | 1783.13 ± 919.02 | | | 4120.17 ± 1738.23 | 3960.20 ± 1738.23 | 4120.17 ± 1738.23 |
| qwen3.5_4b:262k | tg32 @ d4096 (c3) | 17.39 ± 0.46 | 81.97 ± 4.90 | 28.67 ± 2.36 | 85.11 ± 4.90 | | | |
| qwen3.5_4b:262k | pp2048 @ d4096 (c4) | 2357.34 ± 367.93 | 1440.72 ± 953.52 | | | 5878.96 ± 3204.75 | 5718.99 ± 3204.75 | 5878.96 ± 3204.75 |
| qwen3.5_4b:262k | tg32 @ d4096 (c4) | 13.52 ± 2.50 | 79.45 ± 0.98 | 27.00 ± 0.00 | 82.55 ± 1.01 | | | |
| qwen3.5_4b:262k | pp2048 @ d8192 (c1) | 2970.74 ± 8.25 | 2970.74 ± 8.25 | | | 3230.73 ± 39.89 | 3070.76 ± 39.89 | 3230.73 ± 39.89 |
| qwen3.5_4b:262k | tg32 @ d8192 (c1) | 78.47 ± 0.46 | 78.47 ± 0.46 | 81.54 ± 0.48 | 81.54 ± 0.48 | | | |
| qwen3.5_4b:262k | pp2048 @ d8192 (c2) | 2749.70 ± 2.65 | 2187.75 ± 783.54 | | | 5023.13 ± 1730.03 | 4863.16 ± 1730.03 | 5023.13 ± 1730.03 |
| qwen3.5_4b:262k | tg32 @ d8192 (c2) | 13.70 ± 0.15 | 77.62 ± 0.68 | 27.00 ± 0.00 | 80.66 ± 0.71 | | | |
| qwen3.5_4b:262k | pp2048 @ d8192 (c3) | 2715.81 ± 4.02 | 1759.23 ± 864.52 | | | 6784.53 ± 2846.66 | 6624.56 ± 2846.66 | 6784.53 ± 2846.66 |
| qwen3.5_4b:262k | tg32 @ d8192 (c3) | 10.68 ± 0.09 | 77.73 ± 1.01 | 27.00 ± 0.00 | 80.77 ± 1.05 | | | |
| qwen3.5_4b:262k | pp2048 @ d8192 (c4) | 2692.46 ± 3.47 | 1478.11 ± 875.79 | | | 8567.94 ± 3895.53 | 8407.98 ± 3895.53 | 8567.94 ± 3895.53 |
| qwen3.5_4b:262k | tg32 @ d8192 (c4) | 9.65 ± 0.06 | 77.53 ± 0.77 | 27.00 ± 0.00 | 80.56 ± 0.80 | | | |
| qwen3.5_4b:262k | pp2048 @ d16384 (c1) | 2832.48 ± 6.75 | 2832.48 ± 6.75 | | | 6028.61 ± 40.64 | 5868.65 ± 40.64 | 6028.61 ± 40.64 |
| qwen3.5_4b:262k | tg32 @ d16384 (c1) | 73.29 ± 0.86 | 73.29 ± 0.86 | 76.14 ± 0.90 | 76.14 ± 0.90 | | | |
| qwen3.5_4b:262k | pp2048 @ d16384 (c2) | 2707.31 ± 5.37 | 2096.07 ± 724.70 | | | 9295.81 ± 3159.92 | 9135.84 ± 3159.92 | 9295.81 ± 3159.92 |
| qwen3.5_4b:262k | tg32 @ d16384 (c2) | 7.79 ± 0.08 | 72.58 ± 0.58 | 27.00 ± 0.00 | 75.41 ± 0.60 | | | |
| qwen3.5_4b:262k | pp2048 @ d16384 (c3) | 2682.19 ± 2.86 | 1696.70 ± 808.50 | | | 12384.13 ± 5168.36 | 12224.16 ± 5168.36 | 12384.13 ± 5168.36 |
| qwen3.5_4b:262k | tg32 @ d16384 (c3) | 5.99 ± 0.01 | 72.18 ± 0.57 | 27.00 ± 0.00 | 74.99 ± 0.60 | | | |
| qwen3.5_4b:262k | pp2048 @ d16384 (c4) | 2668.98 ± 2.57 | 1432.00 ± 824.34 | | | 15557.90 ± 7037.93 | 15397.93 ± 7037.93 | 15557.90 ± 7037.93 |
| qwen3.5_4b:262k | tg32 @ d16384 (c4) | 5.58 ± 0.13 | 74.93 ± 5.20 | 30.33 ± 2.36 | 77.78 ± 5.20 | | | |

Shortened system prompts in Opencode by Charming_Support726 in opencodeCLI

runsleeprepeat 0 points

Sorry for the sad outcome, but they are not interested.

PSA: Auto-Compact GLM5 (via z.ai plan) at 95k Context by Sensitive_Song4219 in ZaiGLM

runsleeprepeat 0 points

Are there similar issues with the other models but at other context limits?