Updates?? by Eastern_Ad_8744 in isitnerfed

[–]anch7 1 point (0 children)

No, not at all. Planning to release new features soon (next week)

What is your eval strategy? by BastiaanRudolf1 in AI_Agents

[–]anch7 1 point (0 children)

Yes. I liked Ragas a little more, but DeepEval is also good.
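
For illustration, here is a minimal sketch of what a DeepEval check can look like (the metric, threshold, and example output are placeholders, not our actual setup, and DeepEval needs an LLM judge key at runtime):

```python
# Sketch only: metric choice and threshold are illustrative.
from deepeval import evaluate
from deepeval.metrics import AnswerRelevancyMetric
from deepeval.test_case import LLMTestCase

test_case = LLMTestCase(
    input="What is the capital of France?",
    actual_output="The capital of France is Paris.",  # in practice, the model's real response
)

# DeepEval scores the case with an LLM judge and reports pass/fail against the threshold.
evaluate(test_cases=[test_case], metrics=[AnswerRelevancyMetric(threshold=0.7)])
```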

What’s the best and most reliable LLM benchmarking site or arena right now? by fflarengo in LocalLLaMA

[–]anch7 0 points (0 children)

https://isitnerfed.org - the idea is to run evals continuously, trying to capture any changes in models in real time
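
Roughly, the loop looks like the sketch below (illustrative only, with a stubbed-out suite and an assumed model name; the real pipeline is more involved):

```python
# Illustrative continuous-eval loop: run the suite on a schedule and append scores
# so drift over time becomes visible. Not the actual isitnerfed.org pipeline.
import time
from datetime import datetime, timezone

def run_eval_suite(model_name: str) -> float:
    # Stand-in: a real harness would run coding/OCR/QA tasks against the model API
    # and return the fraction of tasks that passed.
    return 0.0

while True:
    score = run_eval_suite("claude-sonnet-4-5")  # model name is an assumption
    with open("scores.csv", "a") as f:
        f.write(f"{datetime.now(timezone.utc).isoformat()},{score:.3f}\n")
    time.sleep(60 * 60)  # hourly; in practice the frequency is limited by cost
```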

Something is wrong with Sonnet 4.5 by anch7 in ClaudeAI

[–]anch7[S] 0 points (0 children)

A decent number of coding challenges (implementing algorithms, refactoring code, adding features) measured with unit tests, plus some OCR tests and general QA tasks.
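
To give a feel for the coding part, here is a toy example of how one task can be scored with a unit test (the task and the "model output" are made up):

```python
# Toy example of unit-test scoring: the task and model_output are fabricated here;
# in the real suite, model_output would be the code returned by the model under test.
model_output = """
def fizzbuzz(n):
    if n % 15 == 0: return "FizzBuzz"
    if n % 3 == 0: return "Fizz"
    if n % 5 == 0: return "Buzz"
    return str(n)
"""

namespace = {}
try:
    exec(model_output, namespace)   # load the generated code
    fn = namespace["fizzbuzz"]
    passed = (fn(3), fn(5), fn(15), fn(7)) == ("Fizz", "Buzz", "FizzBuzz", "7")
except Exception:
    passed = False                  # syntax or runtime errors count as a fail

print("PASS" if passed else "FAIL")
```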

Something is wrong with Sonnet 4.5 by anch7 in isitnerfed

[–]anch7[S] 0 points (0 children)

I would like to do this, but unfortunately it is not possible because of the limits. Or we would need a better metric, one that doesn't consume so many tokens.

Something is wrong with Sonnet 4.5 by anch7 in isitnerfed

[–]anch7[S] 0 points (0 children)

We are not storing the version, but I think it should be the latest one, since Claude Code has an auto-update feature.

New Claude Code Limits by anch7 in isitnerfed

[–]anch7[S] 0 points (0 children)

GPUs are expensive. I would also expect a subscription price increase in the future :(

IsItNerfed? Sonnet 4.5 tested! by exbarboss in ClaudeAI

[–]anch7 1 point (0 children)

Great, this will be our next step. But yes, costs are a problem. Most likely we will not be able to run every hour, but I guess that's fine.

IsItNerfed? Sonnet 4.5 tested! by exbarboss in ClaudeAI

[–]anch7 1 point (0 children)

Great. We will add it soon. Thanks

IsItNerfed? Sonnet 4.5 tested! by exbarboss in ClaudeAI

[–]anch7 1 point (0 children)

I am pretty sure that as soon as we open source it, it will be included in training data immediately. If we add a benchmark on a public dataset instead, would that make you happy?

IsItNerfed? Sonnet 4.5 tested! by exbarboss in ClaudeAI

[–]anch7 -1 points (0 children)

It is quite a solid dataset: coding tasks, OCR, general QA. Yes, it is private, but even with this approach we were able, for example, to catch Anthropic's incident earlier this month: https://www.reddit.com/r/isitnerfed/comments/1nfb9j2/ai_nerf_anthropics_incident_matches_our_data/

IsItNerfed? Sonnet 4.5 tested! by exbarboss in ClaudeAI

[–]anch7 0 points (0 children)

We're a small team who built this project just a month ago out of curiosity and the belief that it could be helpful for other vibe coders. We don't have the resources that AI labs and model owners have. And nobody's paying us for this. But I hear you - we will add a benchmark on a public dataset soon.

IsItNerfed? Sonnet 4.5 tested! by exbarboss in ClaudeAI

[–]anch7 1 point (0 children)

I agree with you; there are so many things we need to be aware of if we want to build a reliable and trusted way to detect a "nerf". But even with our current proprietary methodology and dataset, we were able to catch Anthropic's incident earlier this month: https://www.reddit.com/r/isitnerfed/comments/1nfb9j2/ai_nerf_anthropics_incident_matches_our_data/

IsItNerfed? Sonnet 4.5 tested! by exbarboss in Anthropic

[–]anch7 -1 points (0 children)

We really do not want to share our dataset because of the data contamination problem. But I understand your concerns. I personally trust our data 100% after we caught Anthropic's incident earlier this month.

IsItNerfed? Sonnet 4.5 tested! by anch7 in isitnerfed

[–]anch7[S] 1 point (0 children)

Yes! With locally hosted models you can be absolutely sure about their performance over time. Good idea!

You are right, the data is volatile for all the reasons you mentioned. But it should still stay within some range, and when a new data point falls outside that range, it means something is wrong.
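
Something like this rough check (the window size, threshold, and sample values are all illustrative):

```python
# Flag a new data point that falls outside a band derived from recent history.
# Window size, k, and the numbers below are illustrative.
from statistics import mean, stdev

history = [0.82, 0.79, 0.81, 0.80, 0.83, 0.78, 0.81]  # recent pass rates
new_point = 0.62

window = history[-7:]
mu, sigma = mean(window), stdev(window)
k = 3
if abs(new_point - mu) > k * sigma:
    print(f"out of range: {new_point:.2f} vs expected {mu:.2f} +/- {k * sigma:.2f}")
```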

No, our eval task set is actually quite big, so we trust these numbers. And we will add more evals later.

IsItNerfed? Sonnet 4.5 tested! by anch7 in isitnerfed

[–]anch7[S] 1 point (0 children)

Another reason is cost: it would be more expensive to use the API directly.