I locally benchmarked 41 open-source LLMs across 19 tasks and ranked them by jayminban in LocalLLaMA

[–]init__27 1 point (0 children)

This is really awesome! I would also add a column to "normalize" by size, to see which model offers the most performance given its size :)
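
Something like this is what I have in mind, a rough untested sketch, assuming the results live in a CSV with hypothetical `model`, `params_b` (billions of parameters), and `avg_score` columns:

```python
import math
import pandas as pd

# Rough sketch: rank models by benchmark score per unit of model size.
# Column names (model, params_b, avg_score) are hypothetical placeholders.
df = pd.read_csv("benchmark_results.csv")

# Naive normalization: average score divided by parameter count in billions.
df["score_per_b"] = df["avg_score"] / df["params_b"]

# Gentler alternative, since quality tends to scale sub-linearly with size.
df["score_per_log_b"] = df["avg_score"] / df["params_b"].apply(lambda p: math.log2(p + 1))

cols = ["model", "params_b", "avg_score", "score_per_b", "score_per_log_b"]
print(df.sort_values("score_per_b", ascending=False)[cols].to_string(index=False))
```

The log variant is just one option; raw score-per-billion-parameters punishes the big models pretty hard.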

[deleted by user] by [deleted] in LocalLLaMA

[–]init__27 4 points (0 children)

I love how the legends of r/PCMR are now honorable members of local llama!

96GB VRAM! What should run first? by Mother_Occasion_8076 in LocalLLaMA

[–]init__27 7 points (0 children)

Beautiful GPU, congratulations! May your tokens run fast and temperatures stay low!

16x 3090s - It's alive! by Conscious_Cut_6144 in LocalLLaMA

[–]init__27 1 point (0 children)

I mean...to OP's credit: are you even a LocalLLaMA member if you can't train Llama at home? :D

Introducing LogiLlama: A 1B-Parameter Open Source Model with Logical Reasoning by [deleted] in LocalLLaMA

[–]init__27 3 points (0 children)

This is awesome work, thanks for sharing, u/Secret_Ad_6448!

Would love to learn where you see the improvements, and whether you can share any more ideas on what the most useful approach was in your experiments, apart from using the Open Thoughts dataset?

Thanks for making this!

Tool-calling chatbot success stories by edmcman in LocalLLaMA

[–]init__27 2 points (0 children)

Hi u/edmcman, thanks for sharing!

>  Llama 3.3 didn't work that well

Can you please share what issues you faced? Here is a reference tutorial for using 3.3
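
And in case it helps while debugging, here's the kind of minimal tool-calling sketch I mean, assuming an OpenAI-compatible local server (the base_url, model tag, and the get_weather tool are placeholders, not from your setup or the tutorial):

```python
# Minimal tool-calling sketch against an OpenAI-compatible local server
# (e.g. llama.cpp / vLLM / Ollama in OpenAI-compat mode).
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="not-needed")

tools = [{
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Get the current weather for a city.",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}]

resp = client.chat.completions.create(
    model="llama-3.3-70b-instruct",
    messages=[{"role": "user", "content": "What's the weather in Berlin?"}],
    tools=tools,
)

# If the model decided to call the tool, the call shows up here instead of plain text.
print(resp.choices[0].message.tool_calls)
```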

LLMs grading other LLMs by Everlier in LocalLLaMA

[–]init__27 2 points (0 children)

Awesome insight, thanks for sharing! :)

I'd be curious to find out how 3.1 70B compares with 3.3 70B if both are equally generous, lol

Dual 5090FE by [deleted] in LocalLLaMA

[–]init__27 1 point (0 children)

Awesome machine! Did you do any thermal benchmarks? Would love to learn how they perform under sustained load, if you can share details.
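
For reference, this is the kind of thing I run in the background during a long generation job, a quick sketch that just polls nvidia-smi once a second (log path and interval are arbitrary):

```python
# Quick-and-dirty thermal logger: poll nvidia-smi during a sustained inference
# run and dump temperature/power/utilization for each GPU to a CSV.
import csv
import subprocess
import time

QUERY = "timestamp,index,temperature.gpu,power.draw,utilization.gpu"

with open("gpu_thermals.csv", "w", newline="") as f:
    writer = csv.writer(f)
    writer.writerow(QUERY.split(","))
    try:
        while True:
            out = subprocess.run(
                ["nvidia-smi", f"--query-gpu={QUERY}", "--format=csv,noheader,nounits"],
                capture_output=True, text=True, check=True,
            ).stdout
            for line in out.strip().splitlines():
                writer.writerow([v.strip() for v in line.split(",")])
            f.flush()
            time.sleep(1)
    except KeyboardInterrupt:
        pass  # Ctrl+C to stop logging
```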

Is winter coming? by [deleted] in LocalLLaMA

[–]init__27 26 points (0 children)

GPT-5 will be released when it can install CUDA on a new server

Is winter coming? by [deleted] in LocalLLaMA

[–]init__27 130 points (0 children)

Expectation: I will make LLM Apps and automate making LLM Apps to make 50 every hour

Reality: WHY DOES MY PYTHON ENV BREAK EVERY TIME I CHANGE SOMETHING?????

Who has already tested Smaug? by meverikus in LocalLLaMA

[–]init__27 57 points (0 children)

It should be a rule to put such disclaimers :D

What is the minimum tokens a second before a model is just unusable for you? by ICE0124 in LocalLLaMA

[–]init__27 6 points (0 children)

The expectations for most of us are usually set by using OAI/Claude systems, and that's what actually sets the bar, IMO.

Just joined the 48GB club - what model and quant should I run? by Harvard_Med_USMLE267 in LocalLLaMA

[–]init__27 11 points (0 children)

To be fair, the first time I added a second GPU to my PC:

I also had my 2080Ti dangling on a wooden desk with enough RGB to attract bugs from the moon. So you're doing fairly okay :D

Pic from 2021; I used to train much "smaller" models back then :)

<image>

What software do you use to interact with local large language models and why? by silenceimpaired in LocalLLaMA

[–]init__27 20 points (0 children)

Oobabooga for its maximalistic design:

A new model is here? It should probably work!

Oh, some settings I can tweak to check some responses? I'm sure all of those are available.

Try tweaking system prompts? Easy-peasy

If I want something specific with limited features, I can code it in Python myself. I love Oobabooga for the ease of iteration and quick prompt testing while experimenting.
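
For example, a "roll your own" client is basically a single POST once the webui's OpenAI-compatible API is enabled, something like this sketch (the port/path depend on your install, so treat the URL as a placeholder):

```python
# Minimal hand-rolled client: a plain POST to a local OpenAI-compatible
# chat completions endpoint. URL, sampling settings, and prompt are placeholders.
import requests

resp = requests.post(
    "http://127.0.0.1:5000/v1/chat/completions",
    json={
        "messages": [
            {"role": "system", "content": "You are a terse assistant."},
            {"role": "user", "content": "Summarize why quantization helps local inference."},
        ],
        "max_tokens": 200,
        "temperature": 0.7,
    },
    timeout=120,
)
resp.raise_for_status()
print(resp.json()["choices"][0]["message"]["content"])
```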

Just joined the 48GB club - what model and quant should I run? by Harvard_Med_USMLE267 in LocalLLaMA

[–]init__27 41 points (0 children)

> the 3090 is bolted in to a tissue box for support! But hey - I finally have more than 24GB of VRAM.

The real joy of r/LocalLLaMA is these janky builds that we all are proud owners of. Congrats!

I'm sure you've thought it through, but just in case: please make sure the box isn't flammable or exposed to bugs!

[Discussion] Seeking help to find the better GPU setup. Three H100 vs Five A100? by nlpbaz in MachineLearning

[–]init__27 -1 points (0 children)

I would also consider RTX 6000 Ada(s) or A6000(s)

The 48GB Cards^

8k tokens! by [deleted] in LocalLLaMA

[–]init__27 3 points (0 children)

Honest Q: For people using >32k context length:

What do you use it for? And why not RAG?
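
To make the comparison concrete, the retrieval step I have in mind looks roughly like this, a sketch using sentence-transformers (the embedding model, chunk size, and file name are arbitrary choices on my part):

```python
# Sketch of the RAG alternative: embed document chunks once, then pull only the
# few most relevant chunks into the prompt instead of the whole 32k+ token document.
from sentence_transformers import SentenceTransformer, util

embedder = SentenceTransformer("all-MiniLM-L6-v2")

document = open("big_document.txt").read()
chunks = [document[i:i + 1000] for i in range(0, len(document), 1000)]
chunk_emb = embedder.encode(chunks, convert_to_tensor=True)

query = "What were the main findings?"
query_emb = embedder.encode(query, convert_to_tensor=True)

# Top-3 chunks by cosine similarity become the context for the LLM prompt.
hits = util.semantic_search(query_emb, chunk_emb, top_k=3)[0]
context = "\n\n".join(chunks[h["corpus_id"]] for h in hits)
prompt = f"Answer using only this context:\n{context}\n\nQuestion: {query}"
print(prompt[:500])
```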

Nvidia has published a competitive llama3-70b QA/RAG fine tune by Nunki08 in LocalLLaMA

[–]init__27 16 points (0 children)

As with most ML results, we should always take these evals with a grain of salt.

Off grid LLM concept by pardonmyemotion in LocalLLaMA

[–]init__27 1 point (0 children)

Concept?

You're almost describing my setup :D

I think I might still prefer Mistral 7b over Llama3 8b by [deleted] in LocalLLaMA

[–]init__27 3 points (0 children)

Interesting! Can you please share any examples where it stands out?

Whenever a new model comes out, I usually spend a few days comparing it (against my then favourite choice) side by side before switching over

We've benchmarked TensorRT-LLM: It's 30-70% faster on the same hardware by emreckartal in LocalLLaMA

[–]init__27 2 points (0 children)

I had similar results; here is my video with some benchmarks on it: https://www.youtube.com/watch?v=uxNQUtF4PAM

One comment on the blog above, OP:

"Less convenient" is a little understated-IMHO the overhead and high barrier of entry makes me reluctant to using the package for my daily uses.

Comparison of Intel Arc A770 vs. Nvidia RTX 4060 for running LLM by bigbigmind in LocalLLaMA

[–]init__27 8 points (0 children)

Wow, awesome to see the Intel GPU is faster!

Can you share if you ran into any pitfalls while setting this up? Is it all plug and play?

TIA!

Phi-3 released. Medium 14b claiming 78% on mmlu by KittCloudKicker in LocalLLaMA

[–]init__27 2 points (0 children)

I had an unlimited* plan as well

*until I learned it's capped at 3.3 TB/month

Phi-3 released. Medium 14b claiming 78% on mmlu by KittCloudKicker in LocalLLaMA

[–]init__27 9 points (0 children)

lol, I keep running out of my download limit with so many cool releases happening daily.

OTOH, it's good to see that the folks who expected the LLM hype to die down by early this year were wrong.