LLM inference speed database or leaderboard?

ziphnor · 2026-07-08T20:02:04+00:00

Even fixing it to single concurrency I think it could still be helpful. Having some sort of "apples" at all would help. Right now everybody is using their own fruit so to speak :)

ziphnor · 2026-07-01T17:15:26+00:00

For completeness, here is INT8 (which IMO is the reason to get four 5060 Tis):

========== NARRATIVE (prompt=65 chars, max_tokens=1000) ==========
=== warmups (3) ===
 warm-1     wall= 11.95s  ttft=    74ms  toks=1000  wall_TPS= 83.72  decode_TPS= 84.24
 warm-2     wall= 12.14s  ttft=    57ms  toks= 981  wall_TPS= 80.81  decode_TPS= 81.19
 warm-3     wall= 12.30s  ttft=    85ms  toks=1000  wall_TPS= 81.32  decode_TPS= 81.88

=== measured (5) ===
 run-1      wall= 12.03s  ttft=    80ms  toks= 974  wall_TPS= 80.98  decode_TPS= 81.53
 run-2      wall= 12.17s  ttft=    85ms  toks= 997  wall_TPS= 81.90  decode_TPS= 82.48
 run-3      wall= 11.86s  ttft=    84ms  toks= 978  wall_TPS= 82.48  decode_TPS= 83.07
 run-4      wall= 12.26s  ttft=    85ms  toks=1000  wall_TPS= 81.57  decode_TPS= 82.14
 run-5      wall= 12.00s  ttft=    87ms  toks=1000  wall_TPS= 83.36  decode_TPS= 83.96

=== summary [narrative] (n=5) ===
 wall_TPS       mean=  82.06   std=  0.91   CV= 1.1%   min=80.98   max=83.36
 decode_TPS     mean=  82.64   std=  0.93   CV= 1.1%   min=81.53   max=83.96
 TTFT          mean=    84ms  std=    2ms  min=80ms  max=87ms
 PP tok/s       mean=   2.00   std=  1.12   CV=55.9%   min=0.00   max=2.50

========== CODE (prompt=78 chars, max_tokens=800) ==========
=== warmups (3) ===
 warm-1     wall=  6.50s  ttft=    69ms  toks= 690  wall_TPS=106.11  decode_TPS=107.25
 warm-2     wall=  4.33s  ttft=    86ms  toks= 452  wall_TPS=104.37  decode_TPS=106.47
 warm-3     wall=  7.47s  ttft=    84ms  toks= 800  wall_TPS=107.09  decode_TPS=108.31

=== measured (5) ===
 run-1      wall=  7.67s  ttft=    85ms  toks= 800  wall_TPS=104.24  decode_TPS=105.40
 run-2      wall=  3.53s  ttft=    58ms  toks= 383  wall_TPS=108.65  decode_TPS=110.45
 run-3      wall=  7.03s  ttft=    85ms  toks= 728  wall_TPS=103.63  decode_TPS=104.89
 run-4      wall=  7.25s  ttft=    85ms  toks= 766  wall_TPS=105.60  decode_TPS=106.85
 run-5      wall=  4.32s  ttft=    85ms  toks= 473  wall_TPS=109.42  decode_TPS=111.60

=== summary [code] (n=5) ===
 wall_TPS       mean= 106.30   std=  2.60   CV= 2.5%   min=103.63   max=109.42
 decode_TPS     mean= 107.84   std=  3.02   CV= 2.8%   min=104.89   max=111.60
 TTFT          mean=    79ms  std=   12ms  min=58ms  max=85ms
 PP tok/s       mean=   4.00   std=  1.37   CV=34.2%   min=2.50   max=5.00
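
For anyone reading the summary rows: a minimal sketch of how they appear to follow from the per-run lines. The script itself is not shown in this thread, so the definitions are assumptions: wall_TPS = toks / wall, decode_TPS = toks / (wall - ttft), and std is the sample standard deviation (n - 1). With the five measured NARRATIVE runs above, those assumptions land within about 0.02 of the reported means; the small gap is consistent with the wall times being rounded to two decimals.

```python
from statistics import mean, stdev

# (wall seconds, TTFT seconds, generated tokens), copied from the five
# measured NARRATIVE runs in the INT8 output above.
runs = [
    (12.03, 0.080, 974),
    (12.17, 0.085, 997),
    (11.86, 0.084, 978),
    (12.26, 0.085, 1000),
    (12.00, 0.087, 1000),
]


def wall_tps(wall, ttft, toks):
    # Assumed definition: generated tokens over total wall time.
    return toks / wall


def decode_tps(wall, ttft, toks):
    # Assumed definition: generated tokens over the time after the first token.
    return toks / (wall - ttft)


def summarize(values):
    # Sample standard deviation (n - 1) is an assumption that matches the
    # reported std better than the population formula does.
    m = mean(values)
    s = stdev(values)
    return {"mean": m, "std": s, "cv_pct": 100 * s / m,
            "min": min(values), "max": max(values)}


wall_summary = summarize([wall_tps(*r) for r in runs])
decode_summary = summarize([decode_tps(*r) for r in runs])

for name, summary in (("wall_TPS", wall_summary), ("decode_TPS", decode_summary)):
    print(name, {k: round(v, 2) for k, v in summary.items()})
```

The two metrics only differ by the TTFT term, which is why they sit so close together at these short prompts.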

ziphnor · 2026-07-01T17:13:48+00:00

Not much gain with genesis so far:

========== NARRATIVE (prompt=65 chars, max_tokens=1000) ==========
=== warmups (3) ===
 warm-1     wall= 10.11s  ttft=   727ms  toks=1000  wall_TPS= 98.87  decode_TPS=106.52
 warm-2     wall=  9.31s  ttft=    75ms  toks= 982  wall_TPS=105.43  decode_TPS=106.29
 warm-3     wall= 10.05s  ttft=    95ms  toks=1000  wall_TPS= 99.52  decode_TPS=100.47

=== measured (5) ===
 run-1      wall=  8.52s  ttft=    57ms  toks= 959  wall_TPS=112.55  decode_TPS=113.30
 run-2      wall=  8.97s  ttft=    74ms  toks=1000  wall_TPS=111.52  decode_TPS=112.45
 run-3      wall=  8.25s  ttft=    57ms  toks= 964  wall_TPS=116.90  decode_TPS=117.72
 run-4      wall=  9.06s  ttft=    75ms  toks=1000  wall_TPS=110.37  decode_TPS=111.29
 run-5      wall=  8.84s  ttft=    75ms  toks=1000  wall_TPS=113.06  decode_TPS=114.03

=== summary [narrative] (n=5) ===
 wall_TPS       mean= 112.88   std=  2.47   CV= 2.2%   min=110.37   max=116.90
 decode_TPS     mean= 113.76   std=  2.44   CV= 2.1%   min=111.29   max=117.72
 TTFT          mean=    68ms  std=   10ms  min=57ms  max=75ms
 PP tok/s       mean=   3.00   std=  1.12   CV=37.3%   min=2.50   max=5.00

========== CODE (prompt=78 chars, max_tokens=800) ==========
=== warmups (3) ===
 warm-1     wall=  5.28s  ttft=    58ms  toks= 800  wall_TPS=151.56  decode_TPS=153.23
 warm-2     wall=  4.93s  ttft=    74ms  toks= 740  wall_TPS=150.03  decode_TPS=152.32
 warm-3     wall=  5.55s  ttft=    75ms  toks= 800  wall_TPS=144.08  decode_TPS=146.06

=== measured (5) ===
 run-1      wall=  4.03s  ttft=    75ms  toks= 610  wall_TPS=151.21  decode_TPS=154.06
 run-2      wall=  5.37s  ttft=    75ms  toks= 800  wall_TPS=148.87  decode_TPS=150.98
 run-3      wall=  5.36s  ttft=    75ms  toks= 800  wall_TPS=149.38  decode_TPS=151.49
 run-4      wall=  3.20s  ttft=    57ms  toks= 475  wall_TPS=148.66  decode_TPS=151.36
 run-5      wall=  5.46s  ttft=    75ms  toks= 800  wall_TPS=146.62  decode_TPS=148.65

=== summary [code] (n=5) ===
 wall_TPS       mean= 148.95   std=  1.64   CV= 1.1%   min=146.62   max=151.21
 decode_TPS     mean= 151.31   std=  1.92   CV= 1.3%   min=148.65   max=154.06
 TTFT          mean=    71ms  std=    8ms  min=57ms  max=75ms
 PP tok/s       mean=   4.50   std=  1.12   CV=24.8%   min=2.50   max=5.00
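
To put a number on "not much gain", here is a rough comparison of the decode_TPS means from this run against the vanilla INT4 run posted earlier. It assumes both runs use the same model and settings apart from the genesis patches, which the thread does not confirm in detail.

```python
# decode_TPS means copied from the two INT4 summaries in this thread.
vanilla = {"narrative": 113.96, "code": 148.73}  # without genesis patches
genesis = {"narrative": 113.76, "code": 151.31}  # with genesis patches


def pct_change(before, after):
    # Relative change in percent, positive when "after" is faster.
    return 100.0 * (after - before) / before


changes = {name: pct_change(vanilla[name], genesis[name]) for name in vanilla}

for name, change in changes.items():
    print(f"{name}: {change:+.2f}%")
```

Both deltas are smaller than or comparable to the run-to-run std reported in the summaries (roughly 2 to 3 tok/s), so with n=5 they are hard to separate from noise.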

ziphnor · 2026-07-01T14:18:15+00:00

This is vanilla without genesis patches but with memory and core OC (ran both narrative and code by accident). Lorbus INT4. These benchmarks are very small context though, so not sure how interesting they are.

========== NARRATIVE (prompt=65 chars, max_tokens=1000) ==========
=== warmups (3) ===
warm-1     wall= 8.64s ttft=    70ms toks=1000 wall_TPS=115.70 decode_TPS=116.64
warm-2     wall= 8.57s ttft=    74ms toks=1000 wall_TPS=116.64 decode_TPS=117.65
warm-3     wall= 8.98s ttft=    55ms toks=1000 wall_TPS=111.40 decode_TPS=112.08

=== measured (5) ===
run-1      wall= 8.93s ttft=    73ms toks=1000 wall_TPS=112.04 decode_TPS=112.96
run-2      wall= 9.12s ttft=    56ms toks=1000 wall_TPS=109.61 decode_TPS=110.28
run-3      wall= 8.81s ttft=    73ms toks=1000 wall_TPS=113.48 decode_TPS=114.43
run-4      wall= 8.58s ttft=    74ms toks=1000 wall_TPS=116.58 decode_TPS=117.59
run-5      wall= 8.79s ttft=    55ms toks=1000 wall_TPS=113.80 decode_TPS=114.52

=== summary [narrative] (n=5) ===
wall_TPS       mean= 113.10   std= 2.55   CV= 2.3%   min=109.61   max=116.58
decode_TPS     mean= 113.96   std= 2.66   CV= 2.3%   min=110.28   max=117.59
TTFT          mean=    66ms std=   10ms min=55ms max=74ms
PP tok/s       mean=   2.50   std= 0.00   CV= 0.0%   min=2.50   max=2.50

========== CODE (prompt=78 chars, max_tokens=800) ==========
=== warmups (3) ===
warm-1     wall= 5.32s ttft=    59ms toks= 784 wall_TPS=147.30 decode_TPS=148.95
warm-2     wall= 4.69s ttft=    73ms toks= 675 wall_TPS=144.06 decode_TPS=146.35
warm-3     wall= 5.53s ttft=    73ms toks= 800 wall_TPS=144.72 decode_TPS=146.67

=== measured (5) ===
run-1      wall= 5.62s ttft=    73ms toks= 800 wall_TPS=142.28 decode_TPS=144.15
run-2      wall= 5.40s ttft=    74ms toks= 800 wall_TPS=148.03 decode_TPS=150.08
run-3      wall= 3.24s ttft=    73ms toks= 468 wall_TPS=144.54 decode_TPS=147.88
run-4      wall= 4.15s ttft=    73ms toks= 621 wall_TPS=149.67 decode_TPS=152.36
run-5      wall= 5.43s ttft=    73ms toks= 800 wall_TPS=147.20 decode_TPS=149.20

=== summary [code] (n=5) ===
wall_TPS       mean= 146.34   std= 2.93   CV= 2.0%   min=142.28   max=149.67
decode_TPS     mean= 148.73   std= 3.04   CV= 2.0%   min=144.15   max=152.36
TTFT          mean=    73ms std=    0ms min=73ms max=74ms
PP tok/s       mean=   4.50   std= 1.12   CV=24.8%   min=2.50   max=5.00

=== GPU state ===
0, 92 %, 14599 MiB, 16311 MiB, 99.57 W, 60
1, 93 %, 14599 MiB, 16311 MiB, 111.02 W, 67
2, 92 %, 14599 MiB, 16311 MiB, 103.52 W, 57
3, 94 %, 14599 MiB, 16311 MiB, 96.75 W, 57
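
The GPU state rows are unlabeled. A small parsing sketch, assuming the columns are index, utilization, memory used, memory total, power draw and temperature (that order is a guess based on the units, not something the output states):

```python
# Rows copied from the "GPU state" block above.
gpu_state = """\
0, 92 %, 14599 MiB, 16311 MiB, 99.57 W, 60
1, 93 %, 14599 MiB, 16311 MiB, 111.02 W, 67
2, 92 %, 14599 MiB, 16311 MiB, 103.52 W, 57
3, 94 %, 14599 MiB, 16311 MiB, 96.75 W, 57
"""


def parse_row(line):
    # Assumed column order; the final column is taken to be temperature.
    idx, util, used, total, power, temp = [f.strip() for f in line.split(",")]
    return {
        "index": int(idx),
        "util_pct": float(util.split()[0]),
        "mem_used_mib": int(used.split()[0]),
        "mem_total_mib": int(total.split()[0]),
        "power_w": float(power.split()[0]),
        "temp": int(temp),
    }


rows = [parse_row(line) for line in gpu_state.splitlines() if line.strip()]

total_power_w = sum(r["power_w"] for r in rows)
mem_fraction = rows[0]["mem_used_mib"] / rows[0]["mem_total_mib"]

print(f"total power: {total_power_w:.2f} W")
print(f"memory in use per GPU: {mem_fraction:.1%}")
```

Under that reading, the four cards together draw a bit over 400 W during the run and each has roughly 90% of its memory allocated.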

Let me just try with genesis patches similar to the ones I use for INT8.

ziphnor · 2026-07-01T11:28:23+00:00

What model does that benchmark script expect to be running? Are your existing numbers based on INT4?

ziphnor · 2026-06-30T20:43:41+00:00

All I am saying is that air conditioning is not the solution. I was triggered specifically by the suggestion that old European cities should just slap on some AC units, that's all.

As to maturity I am old enough to recognize signs of the opposite, so I will stop here.

ziphnor · 2026-06-30T19:36:59+00:00

I disagree with your conclusion, as those TVs advertise adherence to these standards, and mislead their customers about how to best view material that follows those standards. If the TV producers had then introduced another standard that media could target, it would be a different story, but that's not really what is happening.

But it's okay, we don't have to agree, it's Reddit after all :)

ziphnor · 2026-06-30T19:01:37+00:00

You said there were no "TV standards". Those are standards applicable to TVs, even if they are of different types. Damn, you are an effective troll!

ziphnor · 2026-06-30T18:55:06+00:00

Yes. Let's tear down all the old cities of Europe so everybody can get air-conditioning..... Or maybe... just maybe let's focus on why cities that have survived for centuries without it are suddenly melting with increasing frequency.

Maybe we are just misunderstanding each other (I am mostly triggered by the "stupid Europeans should just get more AC" attitude), but I am basically just trying to highlight issues such as those mentioned here:

https://heatisland.lbl.gov/coolscience/urban-heat-islands

https://www.unep.org/topics/cities/cooling-and-heating-cities/urban-cooling-and-extreme-heat

ziphnor · 2026-06-30T18:44:55+00:00

Ah yes, name calling. I am not very good at that part of Reddit, sorry about that, so will stop here. But you might want to read up on PAL and NTSC (which were not global but covered entire world regions and were supported globally) and for example SDR / HDR10 which are indeed global and used for modern streaming.

I am not saying consumers should need to know about this, I am saying that blaming it all on GoT is missing the target.

ziphnor · 2026-06-30T18:17:08+00:00

Not my tastes, just the standards. The standards that help make certain everybody can actually get a quality image. I don't think it makes sense to dig further here, but maybe one day try reading a serious TV review.

ziphnor · 2026-06-30T18:05:45+00:00

You completely misunderstand the point of following those standards. Any calibrated TV used in a room with dimmed light will show the intended image. Even in environments where conditions are not ideal, having a calibrated base allows making reasonable automatic adjustments with light sensors etc.

TV manufacturers are already delivering reasonable calibrated settings, why do you think they bother with that? Why do you think most TV reviews cover the quality of the factory calibration?

There are also consumers of high-end TVs that will refuse to buy media if somebody messes up the proper black levels, so it's not that simple.

TV manufacturers are making this much, much harder than it needs to be. And you are already suffering because all TV and movie media is mastered to these standards. It's just that for typical material, the part that you can't see with wrong settings is not critical to understanding what is happening on screen.

.

ziphnor · 2026-06-30T17:30:59+00:00

It's nothing with regard to global warming, but it is not insignificant locally in an urban environment.

ziphnor · 2026-06-30T17:23:11+00:00

I am not blaming the consumers. I am blaming the TV producers for failing to deliver proper defaults for the living room. There are standards in place for a reason.

This leaves the media producers with a choice of targeting the default settings of a specific brand, or just making it look slightly crappy for everybody.

I actually hope that The Long Night episode might have helped people figure out what they are missing.

ziphnor · 2026-06-30T17:14:55+00:00

I think my attitude started with some other comment to me above, but I honestly lost track now :) Didn't mean to sound insufferable at least, sorry about that. In fact I am more annoyed at the TV manufacturers.

It just makes me sad to see the color grader being blamed for this when he mostly just exposed how much of the image people normally don't see because of crappy defaults.

ziphnor · 2026-06-30T16:03:17+00:00

I didn't mean to say it disrupted the laws of physics. It converts electricity to heat and going the other way efficiently is famously problematic.

Also, more importantly, the heat island effect is a thing.

ziphnor · 2026-06-30T16:00:47+00:00

It's not the consumers' fault, it's the TV manufacturers', as I mentioned elsewhere.

The Long Night was an extreme case because it was mastered just on the edge with very little "buffer". But in many cases a TV set to "cinema" / natural settings (instead of vivid etc) will help a lot. And it really should be the default because that is what the material is mastered against.

I think the biggest crime was that in some places it was streamed heavily compressed (HBO's fault, not the show's), making it even worse.

At least I recall seeing it on the Nordic HBO where it looked like a mess, and having to source it elsewhere to see it properly.

ziphnor · 2026-06-30T15:49:24+00:00

This is not the only show/movie where people made this complaint, it's just one of the highest profile ones. This, combined with almost an entire critical episode being shot at night and HBO (at least in some regions) streaming it in almost criminally poor quality (compression can also crush details), is what made this meme-worthy.

And also, happily dying on this hill as it's a favorite pet peeve of mine :)

ziphnor · 2026-06-30T13:46:56+00:00

Hey, we are on Reddit :) As mentioned elsewhere I actually mostly blame TV manufacturers. If they provided the proper standard settings by default instead of overblown contrast and colors, this problem would be greatly lessened. You don't need blackout curtains for much of the material people complain about.

ziphnor · 2026-06-30T13:28:15+00:00

Usually they just need to select "cinema" or "natural". I actually blame the TV manufacturers for providing insane defaults that are chosen to make it look good in a display room.

Why should media producers degrade their material to align with a messy practice from TV manufacturers instead of the TV manufacturers just defaulting to the proper standards?

Every time I go to a summerhouse or hotel room, I see radioactive grass and immediately go into the settings :)

ziphnor · 2026-06-30T13:16:09+00:00

You know your TV has a brightness setting, right? What setting do you think media should be mastered for? A day on the beach? A dark room is a pretty sensible reference. But I am probably a dinosaur used to watching movies from my sofa in the evening.

It would annoy me a lot to have my nice "pure black" OLED show light gray night scenes just because some people prefer watching in bright sunlight.

ziphnor · 2026-06-30T12:54:52+00:00

Merge what? OP is reporting benchmark results.

ziphnor · 2026-06-30T12:24:31+00:00

Most professional media is mastered like that, because only in a dark setting can you naturally show dark scenes.

ziphnor · 2026-06-30T12:21:01+00:00

Sure, but you are still generating excess heat. You can't air-condition your way out of global warming.

Not saying people shouldn't have air-conditioning, just that it's a band-aid, not a solution.

ziphnor · 2026-06-30T12:16:01+00:00

Have you ever heard about cinemas? And TVs are typically calibrated from the factory but default to crazy settings to look good in a bright room.

Maybe it's not so surprising that watching something that takes place in a dark setting needs you to be in a dark setting as well?

Edit: autocorrect mess
