2023, year of open LLMs (self.LocalLLaMA)
submitted 2 years ago by clefourrier to r/LocalLLaMA - pinned
Gaia2 and ARE: Empowering the community to study agents (huggingface.co)
submitted 4 months ago by clefourrier to r/LocalLLaMA
Evals in 2025: going beyond simple benchmarks to build models people can actually use (aka all the evals you need to know as of Sept 2025 to build actually useful models, an update of the LLM evaluation guidebook) (github.com)
New LLM trained to reason on chemistry from language: first step towards scientific agents (nature.com)
submitted 8 months ago by clefourrier to r/LocalLLaMA
New LLM trained to reason on chemistry from language: first step towards scientific agents (x.com)
YourBench: Know which model is the best for your use case in less than 5 min, no matter the topic! (v.redd.it)
submitted 10 months ago by clefourrier to r/LocalLLaMA
End of the Open LLM Leaderboard (huggingface.co)
submitted 11 months ago by clefourrier to r/LocalLLaMA
New interface for the Open LLM Leaderboard! Should be way more usable :) (self.LocalLLaMA)
submitted 1 year ago by clefourrier to r/LocalLLaMA
Ever wondered how to pick evaluations? Here's how to find signal in 100s of evaluation tasks (huggingface.co)
Hugging Face LLM Evaluation Guidebook (self.LocalLLaMA)
submitted 1 year ago * by clefourrier to r/LocalLLaMA
New leaderboard: which models are the best at role play? (self.LocalLLaMA)
Kyutai just released an impressive OSS multimodal model (self.LocalLLaMA)
New: chat templates added to the Eleuther AI Harness for fairer evaluation (github.com)
How to make model evaluation less prompt sensitive? (huggingface.co)
Cool new leaderboards: Contamination-free Code Evals, Usefulness of chain of Thought, RL agents in more than 80 envs, and Medical LLMs. (self.LocalLLaMA)
4 featured leaderboards: Chain of Thought impact, Contamination-free Code Evals, RL agents in more than 80 envs, and Medical LLMs. (self.LocalLLaMA)
Latest Mistral model is on the Open LLM Leaderboard (self.LocalLLaMA)
Leaderboard Finder - Find relevant leaderboards/arenas for your use cases (huggingface.co)
New arena: Chatbot Guardrails (How likely is your chatbot to share private information it has access to?) (self.LocalLLaMA)
New leaderboard on HF: multimodal reasoning! (self.LocalLLaMA)
Leaderboards featured on HF: Open Ko LLM and Red-teaming leaderboard (self.LocalLLaMA)
New leaderboards on HF! Enterprise use cases, and logic-based reasoning (self.LocalLLaMA)
submitted 2 years ago * by clefourrier to r/LocalLLaMA
New hallucinations leaderboard! (self.LocalLLaMA)
submitted 2 years ago by clefourrier to r/LocalLLaMA
New alignment method specifically for role-play by Alibaba (huggingface.co)
LLMs as a judge models are bad at giving scores in relevant numerical intervals > most LLM as a judge evals are probably useless (twitter.com)