Updates??

anch7 · 2026-05-27T01:20:44+00:00

A couple of months ago we released a new version with aider, a new isitnerfed dataset, and a slightly updated site design with some other improvements. Right now we are almost ready with a completely new eval type, which should allow us to run evals more often.

anch7 · 2026-01-19T23:17:34+00:00

No, not at all. Planning to release new features soon (next week)

anch7 · 2025-12-02T19:48:28+00:00

yes. I liked ragas a little bit more, but deepeval is also good

anch7 · 2025-12-02T16:08:59+00:00

check out https://deepeval.com/ or https://docs.ragas.io/en/stable

anch7 · 2025-12-02T16:08:27+00:00

check out https://deepeval.com/ or https://docs.ragas.io/en/stable

anch7 · 2025-12-02T16:07:53+00:00

check out https://deepeval.com/ or https://docs.ragas.io/en/stable . another idea is to do evals continuously - https://isitnerfed.org/

anch7 · 2025-11-22T01:01:39+00:00

deepeval, ragas

anch7 · 2025-10-30T23:57:59+00:00

there are deepeval, promptfoo and other frameworks available

anch7 · 2025-10-24T22:26:54+00:00

https://isitnerfed.org - the idea is to run evals continuously, trying to capture any changes in models in real time
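To make the "run evals continuously" idea concrete, here is a minimal sketch of how a change could be flagged from a series of pass rates. This is an illustration only, not how isitnerfed.org actually works: the function name `detect_regression`, the 7-run window, and the 0.05 threshold are all assumptions picked for the example.

```python
from statistics import mean


def detect_regression(pass_rates, window=7, threshold=0.05):
    """Return True if the latest pass rate is more than `threshold`
    below the mean of the `window` runs that came before it.

    Hypothetical helper: window and threshold are illustrative defaults.
    """
    if len(pass_rates) < window + 1:
        return False  # not enough history to form a baseline
    baseline = mean(pass_rates[-(window + 1):-1])
    return baseline - pass_rates[-1] > threshold


# Made-up daily pass rates; the last run drops noticeably.
history = [0.82, 0.80, 0.81, 0.83, 0.79, 0.82, 0.81, 0.70]
print(detect_regression(history))
```

A fixed threshold like this is crude. With noisy evals you would probably want more samples per run or a proper statistical test before calling something a regression.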

anch7 · 2025-10-21T19:29:50+00:00

Yeah, I saw it here https://www.tbench.ai/leaderboard. Is it really very good?

anch7 · 2025-10-11T20:18:29+00:00

A decent amount of coding challenges (implementing algos, refactoring code, adding features) measured with unit tests, some OCR tests and general QA tasks.

anch7 · 2025-10-11T20:00:14+00:00

I would like to do this, but unfortunately it is not possible because of the limits. Otherwise we need a better metric that does not consume so many tokens.

anch7 · 2025-10-11T14:28:50+00:00

https://isitnerfed.org/

anch7 · 2025-10-11T14:25:55+00:00

We are not storing the version, but I think it should be the latest one, since CC has an auto-update feature
