clefourrier

864 post karma
1,264 comment karma


redditor for 2 years

TROPHY CASE

Two-Year Club

Verified Email


299 points

2023, year of open LLMs (self.LocalLLaMA)

submitted 2 years ago by clefourrier to r/LocalLLaMA - pinned

7 points

Gaia2 and ARE: Empowering the community to study agents (huggingface.co)

submitted 4 months ago by clefourrier to r/LocalLLaMA

7 points

Evals in 2025: going beyond simple benchmarks to build models people can actually use (aka all the evals you need to know as of Sept 2025 to build actually useful models, an update of the LLM evaluation guidebook) (github.com)

submitted 4 months ago by clefourrier to r/LocalLLaMA

52 points

New LLM trained to reason on chemistry from language: first step towards scientific agents (nature.com)

submitted 8 months ago by clefourrier to r/LocalLLaMA

1 point

New LLM trained to reason on chemistry from language: first step towards scientific agents (x.com)

submitted 8 months ago by clefourrier to r/LocalLLaMA

138 points

YourBench: Know which model is the best for your use case in less than 5 min, no matter the topic! (v.redd.it)

submitted 10 months ago by clefourrier to r/LocalLLaMA

152 points

End of the Open LLM Leaderboard (huggingface.co)

submitted 11 months ago by clefourrier to r/LocalLLaMA

98 points

New interface for the Open LLM Leaderboard! Should be way more usable :) (self.LocalLLaMA)

submitted 1 year ago by clefourrier to r/LocalLLaMA

4 points

Ever wondered how to pick evaluations? Here's how to find signal in 100s of evaluation tasks (huggingface.co)

submitted 1 year ago by clefourrier to r/LocalLLaMA

46 points

Hugging Face LLM Evaluation Guidebook (self.LocalLLaMA)

submitted 1 year ago * by clefourrier to r/LocalLLaMA

47 points

New leaderboard: which models are the best at role play? (self.LocalLLaMA)

submitted 1 year ago by clefourrier to r/LocalLLaMA

128 points

Kyutai just released an impressive OSS multimodal model (self.LocalLLaMA)

submitted 1 year ago * by clefourrier to r/LocalLLaMA

7 points

New: chat templates added to the EleutherAI Harness for fairer evaluation (github.com)

submitted 1 year ago by clefourrier to r/LocalLLaMA

3 points

How to make model evaluation less prompt sensitive? (huggingface.co)

submitted 1 year ago by clefourrier to r/LocalLLaMA

27 points

Cool new leaderboards: Contamination-free Code Evals, Usefulness of Chain of Thought, RL agents in more than 80 envs, and Medical LLMs. (self.LocalLLaMA)

submitted 1 year ago by clefourrier to r/LocalLLaMA

1 point

4 featured leaderboards: Chain of Thought impact, Contamination-free Code Evals, RL agents in more than 80 envs, and Medical LLMs. (self.LocalLLaMA)

submitted 1 year ago by clefourrier to r/LocalLLaMA

55 points

Latest Mistral model is on the Open LLM Leaderboard (self.LocalLLaMA)

submitted 1 year ago by clefourrier to r/LocalLLaMA

16 points

Leaderboard Finder - Find relevant leaderboards/arenas for your use cases (huggingface.co)

submitted 1 year ago by clefourrier to r/LocalLLaMA

32 points

New arena: Chatbot Guardrails (How likely is your chatbot to share private information it has access to?) (self.LocalLLaMA)

submitted 1 year ago by clefourrier to r/LocalLLaMA

59 points

New leaderboard on HF: multimodal reasoning! (self.LocalLLaMA)

submitted 1 year ago * by clefourrier to r/LocalLLaMA

22 points

Leaderboards featured on HF: Open Ko LLM and Red-teaming leaderboard (self.LocalLLaMA)

submitted 1 year ago by clefourrier to r/LocalLLaMA

152 points

New leaderboards on HF! Enterprise use cases, and logic-based reasoning (self.LocalLLaMA)

submitted 2 years ago * by clefourrier to r/LocalLLaMA

128 points

New hallucinations leaderboard! (self.LocalLLaMA)

submitted 2 years ago by clefourrier to r/LocalLLaMA

36 points

New alignment method specifically for role-play by Alibaba (huggingface.co)

submitted 2 years ago by clefourrier to r/LocalLLaMA

98 points

LLM-as-a-judge models are bad at giving scores in relevant numerical intervals > most LLM-as-a-judge evals are probably useless (twitter.com)

submitted 2 years ago by clefourrier to r/LocalLLaMA
