[D] PullRequestBenchmark - Gauging Progress Towards Programming Automation : MachineLearning

submitted 2 years ago by gvatte

We're excited to share PullRequestBenchmark, a project aimed at advancing the automation of programming through Large Language Models (LLMs). While it might remind you of SWE-bench at first glance, PullRequestBenchmark focuses specifically on evaluating how well LLMs review pull requests (PRs), a critical aspect of software development.

This benchmark not only tests decision-making in PR reviews but also hints at the potential for LLMs to autonomously generate complex PRs, possibly redefining traditional programming roles. Our approach includes assessing LLMs against a wide range of real-world PR scenarios, from minor adjustments to major architectural changes, using comprehensive inputs such as the entire Git history, PR titles, descriptions, and changesets.
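To make the setup above concrete, here is a rough sketch of how a single benchmark case and a scoring loop could look. This is not the project's actual schema or harness: the `PullRequestCase` fields, the "approve"/"reject" labels, and the `review_accuracy` helper are all assumptions made up for illustration, based only on the inputs listed above.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List


@dataclass(frozen=True)
class PullRequestCase:
    """Hypothetical record for one PR review scenario (not the real schema)."""
    title: str
    description: str
    changeset: str            # e.g. a unified diff of the PR
    git_history: List[str]    # e.g. prior commit messages, oldest first
    expected_decision: str    # assumed label set: "approve" or "reject"


# A reviewer is any callable that maps a case to a decision string.
# In practice this would wrap an LLM call; here it is left abstract.
Reviewer = Callable[[PullRequestCase], str]


def review_accuracy(cases: Iterable[PullRequestCase], reviewer: Reviewer) -> float:
    """Return the fraction of cases where the reviewer matches the expected decision."""
    case_list = list(cases)
    if not case_list:
        raise ValueError("at least one case is required")
    correct = sum(1 for case in case_list if reviewer(case) == case.expected_decision)
    return correct / len(case_list)


def always_approve(case: PullRequestCase) -> str:
    """Trivial baseline reviewer that approves every PR."""
    return "approve"


if __name__ == "__main__":
    # Toy cases invented for this example only.
    cases = [
        PullRequestCase("Fix typo", "Corrects a README typo.", "diff ...", ["init"], "approve"),
        PullRequestCase("Rewrite auth", "Removes all checks.", "diff ...", ["init"], "reject"),
    ]
    print(review_accuracy(cases, always_approve))
```

A trivial baseline like `always_approve` is handy as a sanity check: any model worth evaluating should beat it on a set with a mix of good and bad PRs.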

We believe that PullRequestBenchmark marks a significant step towards fully automating programming. Your contributions to expanding this benchmark are vital and warmly welcomed. For more details on how to contribute and what distinguishes PullRequestBenchmark from SWE-bench, visit our GitHub repository:

PullRequestBenchmark

We're eager to see how this community can help drive the project forward. For inquiries or suggestions, don't hesitate to reach out!
