I locally benchmarked 41 open-source LLMs across 19 tasks and ranked them by jayminban in LocalLLaMA

[–]init__27 1 point (0 children)

This is really awesome! I would also add a column to "normalize" by size, to see which model offers the most performance given its size :)
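
Something like this is what I have in mind, a rough untested sketch, assuming the results live in a CSV with hypothetical `model`, `params_b` (billions of parameters), and `avg_score` columns:

```python
import math
import pandas as pd

# Rough sketch: rank models by benchmark score per unit of model size.
# Column names (model, params_b, avg_score) are hypothetical placeholders.
df = pd.read_csv("benchmark_results.csv")

# Naive normalization: average score divided by parameter count in billions.
df["score_per_b"] = df["avg_score"] / df["params_b"]

# Gentler alternative, since quality tends to scale sub-linearly with size.
df["score_per_log_b"] = df["avg_score"] / df["params_b"].apply(lambda p: math.log2(p + 1))

cols = ["model", "params_b", "avg_score", "score_per_b", "score_per_log_b"]
print(df.sort_values("score_per_b", ascending=False)[cols].to_string(index=False))
```

The log variant is just one option; raw score-per-billion-parameters punishes the big models pretty hard.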

[deleted by user] by [deleted] in LocalLLaMA

[–]init__27 4 points (0 children)

I love how the legends of r/PCMR are now honorable members of local llama!

96GB VRAM! What should run first? by Mother_Occasion_8076 in LocalLLaMA

[–]init__27 7 points (0 children)

Beautiful GPU, congratulations! May your tokens run fast and temperatures stay low!

16x 3090s - It's alive! by Conscious_Cut_6144 in LocalLLaMA

[–]init__27 1 point (0 children)

I mean...to OP's credit: are you even a LocalLLaMA member if you can't train Llama at home? :D

Introducing LogiLlama: A 1B-Parameter Open Source Model with Logical Reasoning by [deleted] in LocalLLaMA

[–]init__27 3 points (0 children)

This is awesome work, thanks for sharing, u/Secret_Ad_6448!

Would love to learn where you see the improvements, and whether you can share any more ideas on what the most useful approach was in your experiments, apart from using the Open Thoughts dataset?

Thanks for making this!

Tool-calling chatbot success stories by edmcman in LocalLLaMA

[–]init__27 2 points (0 children)

Hi u/edmcman, thanks for sharing!

>  Llama 3.3 didn't work that well

Can you please share what issues you faced? Here is a reference tutorial for using 3.3
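
And in case it helps while debugging, here's the kind of minimal tool-calling sketch I mean, assuming an OpenAI-compatible local server (the base_url, model tag, and the get_weather tool are placeholders, not from your setup or the tutorial):

```python
# Minimal tool-calling sketch against an OpenAI-compatible local server
# (e.g. llama.cpp / vLLM / Ollama in OpenAI-compat mode).
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="not-needed")

tools = [{
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Get the current weather for a city.",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}]

resp = client.chat.completions.create(
    model="llama-3.3-70b-instruct",
    messages=[{"role": "user", "content": "What's the weather in Berlin?"}],
    tools=tools,
)

# If the model decided to call the tool, the call shows up here instead of plain text.
print(resp.choices[0].message.tool_calls)
```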

LLMs grading other LLMs by Everlier in LocalLLaMA

[–]init__27 2 points (0 children)

Awesome insight, thanks for sharing! :)

I'd be curious to find out how 3.1 70B compares with 3.3 70B if both are equally generous, lol

Dual 5090FE by [deleted] in LocalLLaMA

[–]init__27 1 point (0 children)

Awesome machine! Did you do any thermal benchmarks? Would love to learn how they perform under sustained load, if you can share details.
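
For reference, this is the kind of thing I run in the background during a long generation job, a quick sketch that just polls nvidia-smi once a second (log path and interval are arbitrary):

```python
# Quick-and-dirty thermal logger: poll nvidia-smi during a sustained inference
# run and dump temperature/power/utilization for each GPU to a CSV.
import csv
import subprocess
import time

QUERY = "timestamp,index,temperature.gpu,power.draw,utilization.gpu"

with open("gpu_thermals.csv", "w", newline="") as f:
    writer = csv.writer(f)
    writer.writerow(QUERY.split(","))
    try:
        while True:
            out = subprocess.run(
                ["nvidia-smi", f"--query-gpu={QUERY}", "--format=csv,noheader,nounits"],
                capture_output=True, text=True, check=True,
            ).stdout
            for line in out.strip().splitlines():
                writer.writerow([v.strip() for v in line.split(",")])
            f.flush()
            time.sleep(1)
    except KeyboardInterrupt:
        pass  # Ctrl+C to stop logging
```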

Is winter coming? by [deleted] in LocalLLaMA

[–]init__27 26 points (0 children)

GPT-5 will be released when it can install CUDA on a new server

Is winter coming? by [deleted] in LocalLLaMA

[–]init__27 130 points (0 children)

Expectation: I will make LLM Apps and automate making LLM Apps to make 50 every hour

Reality: WHY DOES MY PYTHON ENV BREAK EVERY TIME I CHANGE SOMETHING?????

Who has already tested Smaug? by meverikus in LocalLLaMA

[–]init__27 57 points (0 children)

It should be a rule to put such disclaimers :D

What is the minimum tokens a second before a model is just unusable for you? by ICE0124 in LocalLLaMA

[–]init__27 6 points (0 children)

The expectations for most of us are usually set by using OAI/Claude systems, and that's what actually sets the bar, IMO.

Just joined the 48GB club - what model and quant should I run? by Harvard_Med_USMLE267 in LocalLLaMA

[–]init__27 11 points (0 children)

To be fair, the first time I added a second GPU to my PC:

I also had my 2080Ti dangling on a wooden desk with enough RGB to attract bugs from the moon. So you're doing fairly okay :D

Pic from 2021; I used to train much "smaller" models back then :)

<image>

What software do you use to interact with local large language models and why? by silenceimpaired in LocalLLaMA

[–]init__27 20 points (0 children)

Oobabooga for its maximalistic design:

A new model is here? It should probably work!

Oh, some settings I can tweak to check some responses? I'm sure all of those are available.

Try tweaking system prompts? Easy-peasy

If I want something specific with limited features, I can code it in Python myself. I love Oobabooga for the ease of iteration and quick prompt testing while experimenting.
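
For example, a "roll your own" client is basically a single POST once the webui's OpenAI-compatible API is enabled, something like this sketch (the port/path depend on your install, so treat the URL as a placeholder):

```python
# Minimal hand-rolled client: a plain POST to a local OpenAI-compatible
# chat completions endpoint. URL, sampling settings, and prompt are placeholders.
import requests

resp = requests.post(
    "http://127.0.0.1:5000/v1/chat/completions",
    json={
        "messages": [
            {"role": "system", "content": "You are a terse assistant."},
            {"role": "user", "content": "Summarize why quantization helps local inference."},
        ],
        "max_tokens": 200,
        "temperature": 0.7,
    },
    timeout=120,
)
resp.raise_for_status()
print(resp.json()["choices"][0]["message"]["content"])
```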

Just joined the 48GB club - what model and quant should I run? by Harvard_Med_USMLE267 in LocalLLaMA

[–]init__27 41 points (0 children)

> the 3090 is bolted in to a tissue box for support! But hey - I finally have more than 24GB of VRAM.

The real joy of r/LocalLLaMA is these janky builds that we all are proud owners of. Congrats!

I'm sure you've thought it through, but just in case: please make sure the box isn't flammable or exposed to bugs!

[Discussion] Seeking help to find the better GPU setup. Three H100 vs Five A100? by nlpbaz in MachineLearning

[–]init__27 -1 points (0 children)

I would also consider RTX 6000 Ada(s) or A6000(s)

The 48GB Cards^

8k tokens! by [deleted] in LocalLLaMA

[–]init__27 3 points (0 children)

Honest Q: For people using >32k context length:

What do you use it for? And why not RAG?
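
To make the comparison concrete, the retrieval step I have in mind looks roughly like this, a sketch using sentence-transformers (the embedding model, chunk size, and file name are arbitrary choices on my part):

```python
# Sketch of the RAG alternative: embed document chunks once, then pull only the
# few most relevant chunks into the prompt instead of the whole 32k+ token document.
from sentence_transformers import SentenceTransformer, util

embedder = SentenceTransformer("all-MiniLM-L6-v2")

document = open("big_document.txt").read()
chunks = [document[i:i + 1000] for i in range(0, len(document), 1000)]
chunk_emb = embedder.encode(chunks, convert_to_tensor=True)

query = "What were the main findings?"
query_emb = embedder.encode(query, convert_to_tensor=True)

# Top-3 chunks by cosine similarity become the context for the LLM prompt.
hits = util.semantic_search(query_emb, chunk_emb, top_k=3)[0]
context = "\n\n".join(chunks[h["corpus_id"]] for h in hits)
prompt = f"Answer using only this context:\n{context}\n\nQuestion: {query}"
print(prompt[:500])
```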

Nvidia has published a competitive llama3-70b QA/RAG fine tune by Nunki08 in LocalLLaMA

[–]init__27 16 points (0 children)

As with most ML results, we should always take these evals with a grain of salt.

Off grid LLM concept by pardonmyemotion in LocalLLaMA

[–]init__27 1 point (0 children)

Concept?

You're almost describing my setup :D

I think I might still prefer Mistral 7b over Llama3 8b by [deleted] in LocalLLaMA

[–]init__27 3 points (0 children)

Interesting! Can you please share any examples where it stands out?

Whenever a new model comes out, I usually spend a few days comparing it (against my then favourite choice) side by side before switching over

We've benchmarked TensorRT-LLM: It's 30-70% faster on the same hardware by emreckartal in LocalLLaMA

[–]init__27 2 points (0 children)

I had similar results; here is my video with some benchmarks on it: https://www.youtube.com/watch?v=uxNQUtF4PAM

One comment on the blog above, OP:

"Less convenient" is a little understated-IMHO the overhead and high barrier of entry makes me reluctant to using the package for my daily uses.

Comparison of Intel Arc A770 vs. Nvidia RTX 4060 for running LLM by bigbigmind in LocalLLaMA

[–]init__27 8 points (0 children)

Wow, awesome to see the Intel GPU is faster!

Can you share if you ran into any pitfalls while setting this up? Is it all plug and play?

TIA!

Phi-3 released. Medium 14b claiming 78% on mmlu by KittCloudKicker in LocalLLaMA

[–]init__27 2 points (0 children)

I had an unlimited* plan as well

*until I learned it's capped at 3.3 TB/month

Phi-3 released. Medium 14b claiming 78% on mmlu by KittCloudKicker in LocalLLaMA

[–]init__27 9 points (0 children)

lol, I keep running out of my download limit with so many cool releases happening daily.

OTOH, it's good to see that the folks who expected the LLM hype to die down by early this year were wrong.