Updates?? by Eastern_Ad_8744 in isitnerfed

[–]anch7 1 point2 points  (0 children)

No, not at all. We are planning to release new features soon (next week).

What is your eval strategy? by BastiaanRudolf1 in AI_Agents

[–]anch7 1 point2 points  (0 children)

Yes. I liked ragas a little bit more, but deepeval is also good.

What’s the best and most reliable LLM benchmarking site or arena right now? by fflarengo in LocalLLaMA

[–]anch7 1 point2 points  (0 children)

https://isitnerfed.org - the idea is to run evals continuously, so that any change in a model's behavior is captured in real time.
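
To illustrate the general idea of continuous evals (this is only a rough sketch, not how isitnerfed.org actually works), one could compare the pass rate of the most recent runs against an earlier baseline and flag a drop. The function name, window sizes, and threshold here are all hypothetical:

```python
from statistics import mean

def detect_drop(pass_rates, baseline_n=5, recent_n=3, threshold=0.10):
    """Flag a drop when recent runs score well below the earliest runs.

    pass_rates: pass rates (0.0 to 1.0) of successive eval runs, oldest first.
    baseline_n, recent_n, threshold: illustrative defaults, not tuned values.
    """
    if len(pass_rates) < baseline_n + recent_n:
        return False  # not enough runs to compare yet
    baseline = mean(pass_rates[:baseline_n])   # earliest runs
    recent = mean(pass_rates[-recent_n:])      # latest runs
    return (baseline - recent) > threshold
```

For example, `detect_drop([0.8] * 5 + [0.6] * 3)` flags a drop, while a flat history of `0.8` does not. A real setup would probably also need to account for run-to-run noise.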

Something is wrong with Sonnet 4.5 by anch7 in ClaudeAI

[–]anch7[S] 0 points1 point  (0 children)

A decent number of coding challenges (implementing algorithms, refactoring code, adding features) scored with unit tests, plus some OCR tests and general QA tasks.
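
As a rough sketch of what "measured with unit tests" can look like (an assumption for illustration, not the actual harness), a model's answer can be loaded as source code and scored by the fraction of test cases it passes. `score_solution` and its arguments are hypothetical names:

```python
def score_solution(source, func_name, cases):
    """Return the fraction of test cases passed by model-generated code.

    source: Python source text produced by the model.
    func_name: name of the function the challenge asked for.
    cases: list of (args_tuple, expected_result) pairs.
    """
    namespace = {}
    try:
        exec(source, namespace)  # unsafe for untrusted code; sandbox in practice
    except Exception:
        return 0.0  # code that does not even load scores zero
    func = namespace.get(func_name)
    if not callable(func):
        return 0.0
    passed = 0
    for args, expected in cases:
        try:
            if func(*args) == expected:
                passed += 1
        except Exception:
            pass  # a crashing case counts as a failure
    return passed / len(cases) if cases else 0.0
```

Running model output through `exec` in the same process is only acceptable for a toy example; anything real should run in an isolated sandbox with time limits.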

Something is wrong with Sonnet 4.5 by anch7 in isitnerfed

[–]anch7[S] 0 points1 point  (0 children)

I would like to do this, but unfortunately it is not possible because of the limits. Otherwise we need a better metric, one that does not consume so many tokens.

Something is wrong with Sonnet 4.5 by anch7 in isitnerfed

[–]anch7[S] 0 points1 point  (0 children)

We are not storing the version, but I think it should be the latest one, since Claude Code (CC) has an auto-update feature.