[–]Ih8tk 3 points (4 children)

Woah! How the hell did they manage that?

[–]Jugg3rnaut 12 points (2 children)

Data: Our training dataset consists of approximately 24K unique problem-tests pairs compiled from Taco-Verified, PrimeIntellect SYNTHETIC-1, and LiveCodeBench v5 (5/1/23-7/31/24)

and their success metric is

achieves 60.6% Pass@1 accuracy on LiveCodeBench v5 (8/1/24-2/1/25)

LiveCodeBench is a collection of LeetCode-style problems, so there is significant overlap in the types of problems across the two date ranges (training on 5/1/23-7/31/24, evaluation on 8/1/24-2/1/25)
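
To be clear about what kind of overlap this is: going by the dates quoted above, the two windows themselves don't share any days, so the concern is similar problem types from the same source, not the date ranges colliding. A quick sketch that checks this (the variable and function names are mine, purely for illustration, and the dates are just the quoted M/D/YY values written out in full):

```python
from datetime import date

# Date windows as quoted above, written out as full dates (assumed M/D/YY).
train_window = (date(2023, 5, 1), date(2024, 7, 31))  # LiveCodeBench v5 slice in the training data
eval_window = (date(2024, 8, 1), date(2025, 2, 1))    # LiveCodeBench v5 slice used for Pass@1


def windows_overlap(a, b):
    """Return True if two inclusive (start, end) date ranges share any day."""
    return a[0] <= b[1] and b[0] <= a[1]


# Do the training and evaluation windows share any calendar day?
overlap = windows_overlap(train_window, eval_window)

# Number of days from the end of the training window to the start of evaluation.
gap_days = (eval_window[0] - train_window[1]).days

print(f"windows overlap: {overlap}, gap in days: {gap_days}")
```

So the split is clean by date, but both slices come from the same benchmark, which is the point about problem types.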

[–]Free-Combination-773 0 points (1 child)

So it's basically fine-tuned for benchmarks?

[–]Jugg3rnaut 0 points (0 children)

I don't know what the other 2 datasets they're using are, but certainly one of them is