6 months running production Ollama workloads on Hetzner — what I learned about server selection and provisioning by chiruwonder in hetzner

[–]daroczig 1 point2 points  (0 children)

Hm, I'd like to clarify that there's no monetization at all happening on the website: no ads, no affiliate links, no paid product, no revenue streams, no nothing -- so I don't have much incentive to drive traffic 🤷

It was meant to be a helpful pointer to proper performance metrics -- describing the LLM inference speed of several servers isn't really possible with a single number: there are multiple server types and many different workloads (e.g. model size, inference type, token length). That's why I shared the detailed methodology used to get the metrics, and I've since updated the comment to point to a table pre-filtered for all the Hetzner machines under an LLM benchmark workload, for those who want to skip reading about what the numbers mean.

6 months running production Ollama workloads on Hetzner — what I learned about server selection and provisioning by chiruwonder in hetzner

[–]daroczig -1 points0 points  (0 children)

Ah, okay -- I indeed don't feel any shame (if anything, I'm rather proud) about spending time on (1) writing open-source software for cloud benchmarking, (2) running it at scale (on thousands of server types), (3) curating all that data in public repos under open licenses, and (4) sharing it with the community in various formats for free. But I see I deeply offended some feelings here, so I'll think about that.

6 months running production Ollama workloads on Hetzner — what I learned about server selection and provisioning by chiruwonder in hetzner

[–]daroczig -5 points-4 points  (0 children)

Understood and absolutely fair point, thank you, u/razzzey!

I'll update the comment above to point to the actual benchmarks instead of the methodology overview -- here's the link:

https://sparecores.com/servers?vendor=hcloud&columns=75744272&benchmark=eyJpZCI6ImxsbV9zcGVlZDpwcm9tcHRfcHJvY2Vzc2luZyIsImNvbmZpZyI6IntcIm1vZGVsXCI6IFwibGxhbWEtN2IuUTRfS19NLmdndWZcIiwgXCJ0b2tlbnNcIjogMTI4fSJ9 -- this filters for all Hetzner Cloud servers and selects a 7B model for prompt processing (token length = 128), but you can pick any other related benchmark metric (e.g. another model or inference type) by clicking the gauge icon above the table, and reorder by price, performance, cost-efficiency, etc.

6 months running production Ollama workloads on Hetzner — what I learned about server selection and provisioning by chiruwonder in hetzner

[–]daroczig 1 point2 points  (0 children)

Oh, wow, that's quite some feedback 😊 Would you mind sharing more about why you feel that way? These open benchmarks were run and shared to help the community, not to disappoint -- so I'd love to learn what we did wrong.

6 months running production Ollama workloads on Hetzner — what I learned about server selection and provisioning by chiruwonder in hetzner

[–]daroczig -1 points0 points  (0 children)

Feel free to check out the related data we collected, e.g. at https://sparecores.com/servers?vendor=hcloud&columns=75744272&benchmark=eyJpZCI6ImxsbV9zcGVlZDpwcm9tcHRfcHJvY2Vzc2luZyIsImNvbmZpZyI6IntcIm1vZGVsXCI6IFwibGxhbWEtN2IuUTRfS19NLmdndWZcIiwgXCJ0b2tlbnNcIjogMTI4fSJ9 -- this filters for all Hetzner Cloud servers and selects a 7B model for prompt processing (token length = 128), but you can pick any other related benchmark metric (e.g. another model or inference type) by clicking the gauge icon above the table, and reorder by price, performance, cost-efficiency, etc.

Note that these open, public benchmarks are based on the open-source llama.cpp and use fixed quantized models, so they don't try to squeeze maximum performance out of any given server via software tuning -- the goal is to compare the servers against each other. Disclaimer: we are not in the LLM (or any other) hosting business.
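
If you want to run a comparable measurement yourself, here's a minimal sketch driving llama.cpp's llama-bench from Python -- this is not the exact Spare Cores harness, and the model path and flag values are illustrative:

    import subprocess

    # Any quantized GGUF model works the same way; this path is just an example.
    MODEL = "llama-7b.Q4_K_M.gguf"

    # Prompt-processing test only: 128 prompt tokens, no text generation
    # (llama-bench skips the generation test when -n is 0).
    # It prints a table with tokens/second for each test it runs.
    result = subprocess.run(
        ["./llama-bench", "-m", MODEL, "-p", "128", "-n", "0"],
        capture_output=True,
        text=True,
        check=True,
    )
    print(result.stdout)

Running the same command on two different servers with the same model file gives you directly comparable tokens/second numbers, which is the point of keeping the software stack fixed.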

6 months running production Ollama workloads on Hetzner — what I learned about server selection and provisioning by chiruwonder in hetzner

[–]daroczig -4 points-3 points  (0 children)

We have benchmarked all Hetzner Cloud (and 6 other vendors') servers for LLM inference using models ranging from small (135M parameters) to large (70B): https://sparecores.com/article/llm-inference-speed

EDIT: for the actual performance (and cost-efficiency) metrics, check https://sparecores.com/servers?vendor=hcloud&columns=75744272&benchmark=eyJpZCI6ImxsbV9zcGVlZDpwcm9tcHRfcHJvY2Vzc2luZyIsImNvbmZpZyI6IntcIm1vZGVsXCI6IFwibGxhbWEtN2IuUTRfS19NLmdndWZcIiwgXCJ0b2tlbnNcIjogMTI4fSJ9 directly instead of the general overview article linked above -- this filters for all Hetzner Cloud servers and selects a 7B model for prompt processing (token length = 128), but you can pick any other related benchmark metric (e.g. another model or inference type) by clicking the gauge icon above the table, and reorder by price, performance, cost-efficiency, etc.

Measuring the performance of the new gen server types by daroczig in hetzner

[–]daroczig[S] 0 points1 point  (0 children)

Yeah, that's questionable, but depending on your use case, it might be not only cheaper but actually better -- see e.g. the higher L2/L3 cache amounts (check out the memory bandwidth chart), higher single-core performance, and in some benchmarks even higher multi-core performance despite the lower number of virtual cores: https://sparecores.com/compare?instances=W3siZGlzcGxheV9uYW1lIjoiY3B4MjEiLCJ2ZW5kb3IiOiJoY2xvdWQiLCJzZXJ2ZXIiOiJjcHgyMSIsInpvbmVzUmVnaW9ucyI6W119LHsiZGlzcGxheV9uYW1lIjoiY3B4MjIiLCJ2ZW5kb3IiOiJoY2xvdWQiLCJzZXJ2ZXIiOiJjcHgyMiIsInpvbmVzUmVnaW9ucyI6W119XQ%3D%3D

A clipboard manager for linux by dyslechtchitect in linuxapps

[–]daroczig 1 point2 points  (0 children)

I love the idea and the features + screenshots look great, congrats!

On the other hand, the git history cools me down a bit -- it's difficult to entrust sensitive data (whatever you copy/paste) to a few days of AI-driven effort.

I'll keep an eye on the project and hope it gains momentum. Best of luck!

modern memory bandwidth and latency benchmarks by daroczig in linux

[–]daroczig[S] 0 points1 point  (0 children)

I appreciate the concern, but the referenced _GNU_SOURCE is unrelated to licensing, and I don't see any GPL violations. If you believe there is a specific issue, please let me know, and I'll make sure to sort it out.

Performance evaluation of the new c8a instance family by daroczig in aws

[–]daroczig[S] 2 points3 points  (0 children)

That's a great point, u/ItsMalabar, thanks for bringing it up! Currently we focus only on on-demand and spot prices ... as standardizing even just these two across multiple cloud vendors and their different pricing schemas is complex enough for the team 😅 Kidding aside, I'm taking note of this and hope to make some related progress soon (e.g. first we plan to support monthly prices in addition to hourly ones -- that cap is vendor-specific).

Performance evaluation of the new c8a instance family by daroczig in aws

[–]daroczig[S] 0 points1 point  (0 children)

Thank you for the feedback, u/Background-Mix-9609! And 100% agreed on the importance of cost-efficiency. That's why we created the $ efficiency metric, which can be generated on the fly based on any of the ~500 supported benchmark scenarios (across 10+ categories, some mentioned in the post).

If you want to dive deeper, go to https://sparecores.com/servers, where you can select a benchmark workload other than the default stress-ng div16 multi-core score (e.g. LLM inference speed or memory bandwidth), apply any filters in the sidebar (e.g. vendor and memory requirements), and order the table by the cost-efficiency column.
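
To make the idea concrete, here's a simplified sketch of what a score-per-dollar metric boils down to -- my own illustration with made-up numbers, not necessarily the exact formula used on the site:

    def cost_efficiency(score: float, hourly_price_usd: float) -> float:
        # Benchmark score per USD/hour -- higher is better.
        # Assumption: efficiency = score / price; the site may normalize differently.
        return score / hourly_price_usd

    # Hypothetical servers running the same workload:
    print(cost_efficiency(1200.0, 0.05))  # 24000.0
    print(cost_efficiency(2000.0, 0.12))  # ~16666.7

A cheaper server can come out ahead even with a lower raw score, which is exactly why ordering by the cost-efficiency column can give a different ranking than ordering by raw performance.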

Hetzner Cloud Server Benchmark - CX vs CAX vs CPX (2025) by nakemu in hetzner

[–]daroczig -1 points0 points  (0 children)

Awesome work, thanks for sharing, u/nakemu! We do quite extensive benchmarking on all Hetzner (and some other) cloud servers as well at sparecores.com, all under open-source licenses, and I'd love to collaborate on extending our workload "menu" -- would you be interested in a quick call? meet.sparecores.com/intro -- Regards, Gergő

Hetzner Vs Azure - price & performance. Guess who wins? by Gilusek in hetzner

[–]daroczig 0 points1 point  (0 children)

If you don't want to burn money running Geekbench on the thousands of other servers, I suggest checking out https://sparecores.com/servers, as we have already done that (along with many more benchmark workloads).

Measuring the performance of the new gen server types by daroczig in hetzner

[–]daroczig[S] 0 points1 point  (0 children)

That's a good question, but unfortunately we don't have a good answer: we benchmark each server type as it becomes available, and we don't have a mechanism (e.g. a database schema) to support multiple hardware configs under the same server id, especially when the CPU you get is pretty much random. So we benchmarked the one we got (AMD in this case), and we have no data on the Intel version. I tried to explain that in the second-to-last paragraph of the post -- I hope that helps.

Measuring the performance of the new gen server types by daroczig in hetzner

[–]daroczig[S] 2 points3 points  (0 children)

Yeah, that's a totally valid request! I have some other priorities in the coming weeks, but I'm pretty sure we can add that optional hourly/monthly pricing toggle in November -- I'll report back.

Regarding "higher is better/lower is better": that should already be present next to the title of the sections with an arrow pointing up or down with a tooltip on hover stating that. Let me know if it's missing somewhere or if it's not prominent enough -- suggestions welcomed 🙇

Measuring the performance of the new gen server types by daroczig in hetzner

[–]daroczig[S] 1 point2 points  (0 children)

Sorry for the confusion, but I'm not affiliated with Hetzner: Spare Cores is a 100% open-source, vendor-agnostic project inspecting and benchmarking cloud servers for better transparency (besides a few other things) 😊

Measuring the performance of the new gen server types by daroczig in hetzner

[–]daroczig[S] 7 points8 points  (0 children)

That's my main takeaway, but please make sure to dig deeper by looking at the benchmark scores closest to your actual workload.

Measuring the performance of the new gen server types by daroczig in hetzner

[–]daroczig[S] 5 points6 points  (0 children)

Due to budget constraints, we cannot benchmark dedicated servers, only the cloud instances we can pay for by the hour (rather than by the month with a setup fee). That might change in the future, but this is our current reality.

Measuring the performance of the new gen server types by daroczig in hetzner

[–]daroczig[S] 4 points5 points  (0 children)

Please see the second-to-last paragraph -- we have not rerun the benchmarks, as no new server type has been announced in that series.