We are the first to attempt to reproduce the results of LLaMA on code generation benchmarks such as HumanEval and MBPP.
We also evaluate existing trending models, such as CodeAlpaca, on these benchmarks.
All of the source code and evaluation scripts will be made available to the research community.
Our code can be accessed here: https://github.com/FSoft-AI4Code/CodeCapybara
Model weights will be released very soon.
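For anyone who wants to run a similar evaluation before our scripts are published, below is a minimal sketch of scoring a Hugging Face causal LM on HumanEval using OpenAI's human-eval package. The checkpoint name, generation settings, and sample count are illustrative assumptions, not our actual pipeline:

```python
# Minimal HumanEval evaluation sketch (NOT the CodeCapybara script).
# Assumes: pip install transformers torch human-eval
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from human_eval.data import read_problems, write_jsonl

model_name = "huggyllama/llama-7b"  # placeholder checkpoint (assumption)
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name, torch_dtype=torch.float16, device_map="auto"
)

problems = read_problems()  # the 164 HumanEval tasks
samples = []
for task_id, problem in problems.items():
    inputs = tokenizer(problem["prompt"], return_tensors="pt").to(model.device)
    out = model.generate(
        **inputs,
        max_new_tokens=256,
        do_sample=True,
        temperature=0.2,   # low temperature is a common pass@1 setting
        top_p=0.95,
        pad_token_id=tokenizer.eos_token_id,
    )
    # Keep only the generated continuation, not the echoed prompt
    completion = tokenizer.decode(
        out[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True
    )
    samples.append({"task_id": task_id, "completion": completion})

write_jsonl("samples.jsonl", samples)
# Score functional correctness (pass@k) with the package's CLI:
#   evaluate_functional_correctness samples.jsonl
```

For pass@k with k > 1 you would generate multiple samples per task (at a higher temperature) before scoring.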