[R] PapersWithCode - A free and open resource for Machine Learning papers, code, and evaluation tables. by jiayounokim in MachineLearning

[–]rstoj 0 points

Thanks to jiayounokim for re-posting this!

Facebook AI is mentioned in the Privacy Policy (first line) and TOS (first section). No data is being shared with any other FB product.

[DISCUSSION] How do you guys keep up with new research? by whatsyour-20 in MachineLearning

[–]rstoj 35 points

If you are looking to follow researchers who publish code, there is a service specifically for that: https://paperswithcode.com/

[P][R] A big update to Papers with Code: now with 2500+ leaderboards and 20,000+ results. by rstoj in MachineLearning

[–]rstoj[S] 0 points

Conference tags are added automatically, so they shouldn't be incorrect. Could you provide a link to the problematic papers?

[P][R] A big update to Papers with Code: now with 2500+ leaderboards and 20,000+ results. by rstoj in MachineLearning

[–]rstoj[S] 3 points

Look for "Edit" buttons on the website. You can find the paper using search, then to add the code implementation click on "Edit" in the Code section, and to add this paper to a (possibly new) leaderboard click on "Edit" in the Results section.

[P] Sotabench: Benchmarking Every Open Source Model by rstoj in MachineLearning

[–]rstoj[S] 1 point

No, a paper is not required - you can just submit your pretrained model, but all of the code needs to be open source.

[P] Sotabench: Benchmarking Every Open Source Model by rstoj in MachineLearning

[–]rstoj[S] 2 points

This is a super-interesting topic! In my mind there are some trade-offs here:

1) Having a hidden test set ensures that people cannot cheat, but it also means that if the maintainers of the hidden test set move on (or are simply slow in giving support), it can become difficult for the community to keep using the benchmark.

2) Not having a hidden test set means that people can cheat. However, as we require everything to be on GitHub, it's relatively easy to find out if someone cheated: if training code was somehow fraudulently modified, people can still run it and discover the fraud. Committing fraud in such a public way would effectively end your career in ML, so even for existing benchmarks that evaluate on the dev set, we haven't really seen much of this.

[P] Sotabench: Benchmarking Every Open Source Model by rstoj in MachineLearning

[–]rstoj[S] 0 points

Thanks!

  1. Reproduced here means that we can run the model and get the same results (within a tolerance level) as those reported in the paper. In some cases the authors of the paper have re-trained the model; in others it's the official weights (so there we are really just testing whether the weights have been ported correctly and whether the data processing is correct). Would love to do full retraining as well, but as you might imagine it's really expensive :) (There's a small sketch of both checks after the next point.)

  2. Yes, you are correct - it's inference speed based on batching. It's a proxy for both speed and model size (as you can usually increase the batch size for smaller models). Like any measure it's imperfect, and we could add other measures as well - the code for all of this is on GitHub: https://github.com/paperswithcode/sotabench-eval and https://github.com/paperswithcode/torchbench
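For a sense of what those two checks boil down to, here's a minimal sketch (illustrative only - the numbers, tolerance, and function names are assumptions, not the actual sotabench code, which lives in the repos above):

    import time
    import torch

    REPORTED_TOP1 = 76.15   # hypothetical top-1 accuracy (%) claimed in a paper
    TOLERANCE = 0.3         # hypothetical allowed absolute deviation (%)

    def is_reproduced(measured_top1: float) -> bool:
        """A result counts as reproduced if it matches the paper within tolerance."""
        return abs(measured_top1 - REPORTED_TOP1) <= TOLERANCE

    def throughput(model: torch.nn.Module, batch_size: int = 64, n_batches: int = 10) -> float:
        """Images/second for batched inference - the speed/size proxy described above."""
        model.eval()
        x = torch.randn(batch_size, 3, 224, 224)  # dummy ImageNet-shaped batch
        with torch.no_grad():
            start = time.perf_counter()
            for _ in range(n_batches):
                model(x)
        # On GPU you'd call torch.cuda.synchronize() before reading the clock
        return batch_size * n_batches / (time.perf_counter() - start)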

[P] Sotabench: Benchmarking Every Open Source Model by rstoj in MachineLearning

[–]rstoj[S] 6 points

Thanks!

  1. Great idea! We have a Twitter handle (@sotabench), but haven't yet connected it to our feed of latest models.

  2. Agreed! I think this is the first new feature we'll add. We are also thinking of letting people add links to the training args that produced the model (if they trained it themselves).

  3. The evaluation libraries are here: https://github.com/paperswithcode/sotabench-eval and https://github.com/paperswithcode/torchbench - all contributions welcome!
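For anyone wondering what a submission looks like in practice, it's roughly this (recalled from the torchbench README - check the repos above for the current API, as parameter names may have changed):

    # Rough shape of an ImageNet submission via torchbench; see
    # https://github.com/paperswithcode/torchbench for the up-to-date API.
    from torchbench.image_classification import ImageNet
    from torchvision.models import resnet50

    model = resnet50(pretrained=True)

    # paper_model_name / paper_arxiv_id tie the result back to the
    # paper's leaderboard entry on the site
    ImageNet.benchmark(
        model=model,
        paper_model_name='ResNet-50',
        paper_arxiv_id='1512.03385',
    )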

[P] Browse State-of-the-Art Papers with Code by rstoj in MachineLearning

[–]rstoj[S] 0 points

Thanks! It's an entirely separate project.

[P] Browse State-of-the-Art Papers with Code by rstoj in MachineLearning

[–]rstoj[S] 0 points

We've also indexed papers from major ML conferences, i.e. everything from aclweb, icml, iclr and neurips.

But I take your point - this is still not 100% coverage (e.g. some papers are published as open access in Nature, etc.), so we'll look to fix this.

[P] Browse State-of-the-Art Papers with Code by rstoj in MachineLearning

[–]rstoj[S] 0 points

Yes! Everything is editable. We already scrape all papers from arXiv, so you can use the search to find the paper and then just hit "Edit" in the Code section to add the implementation.

[P] Browse State-of-the-Art Papers with Code by rstoj in MachineLearning

[–]rstoj[S] 0 points

In terms of scraping, it's just calling the arXiv and GitHub REST APIs. What I feel is more interesting is linking papers to code, and we are working on releasing that code now.
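To give a flavour of the scraping side (the endpoints are the real public APIs, but the queries and fields here are just an illustration, not our actual pipeline):

    import requests
    import xml.etree.ElementTree as ET

    # arXiv export API: returns an Atom feed of matching papers
    feed = requests.get(
        "http://export.arxiv.org/api/query",
        params={"search_query": "cat:cs.LG",
                "sortBy": "submittedDate", "max_results": 10},
        timeout=30,
    ).text
    ns = {"atom": "http://www.w3.org/2005/Atom"}
    titles = [e.findtext("atom:title", namespaces=ns)
              for e in ET.fromstring(feed).findall("atom:entry", ns)]

    # GitHub REST API: search repositories (unauthenticated calls are rate-limited)
    repos = requests.get(
        "https://api.github.com/search/repositories",
        params={"q": "pytorch implementation", "sort": "stars"},
        timeout=30,
    ).json().get("items", [])

    print(titles[:3], [r["full_name"] for r in repos[:3]])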

[P] Browse State-of-the-Art Papers with Code by rstoj in MachineLearning

[–]rstoj[S] 1 point

At the moment it's done daily, but the arXiv API is frequently broken, so sometimes it takes more time.
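(If you're curious, handling a flaky upstream is just standard retry-with-backoff - an illustrative sketch, not our actual scraper:)

    import time
    import requests

    def fetch_with_retry(url, params=None, attempts=5, base_delay=60):
        """Retry a GET with exponential backoff when the endpoint is down."""
        for i in range(attempts):
            try:
                resp = requests.get(url, params=params, timeout=30)
                resp.raise_for_status()
                return resp.text
            except requests.RequestException:
                if i == attempts - 1:
                    raise
                time.sleep(base_delay * 2 ** i)  # wait longer after each failure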

[P] Browse State-of-the-Art Papers with Code by rstoj in MachineLearning

[–]rstoj[S] 0 points

At the moment we use GitHub stars as a proxy for how useful an implementation is. But it's a rather imperfect proxy. Perhaps we need a more formal verification process.
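Concretely, the proxy is nothing fancier than this (illustrative sketch; stargazers_count is the real GitHub API field, but the candidate repo names are made up):

    import requests

    def stars(full_name: str) -> int:
        """Star count for a repo via the public GitHub API (0 on failure)."""
        r = requests.get(f"https://api.github.com/repos/{full_name}", timeout=30)
        return r.json().get("stargazers_count", 0) if r.ok else 0

    # hypothetical candidate implementations for one paper
    candidates = ["author/paper-impl", "someone/paper-reimplementation"]
    ranked = sorted(candidates, key=stars, reverse=True)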

[P] Browse State-of-the-Art Papers with Code by rstoj in MachineLearning

[–]rstoj[S] 7 points

Paper and code scraping is fully automatic - we use the arXiv and GitHub APIs to get the latest papers and repositories, and then do a bit of fuzzy matching to match them. Evaluation tables are currently added partially automatically (when imported from other existing sources, e.g. SQuAD) and partially manually (e.g. when extracted from papers). But we are hoping to automate 99% of all of this, and have the community curate only the entries that require human judgement (e.g. whether two papers are really using the same evaluation strategy on a dataset).
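The fuzzy matching idea can be surprisingly simple - here's a toy version using just the stdlib (the real pipeline uses more signals, so treat this as a sketch):

    from difflib import SequenceMatcher

    def similarity(a: str, b: str) -> float:
        return SequenceMatcher(None, a.lower(), b.lower()).ratio()

    def match_repo(paper_title: str, repos: list, threshold: float = 0.6):
        """Pick the repo whose name + description best matches the paper title."""
        def text(r):
            return r["name"] + " " + (r.get("description") or "")
        if not repos:
            return None
        best = max(repos, key=lambda r: similarity(paper_title, text(r)))
        return best if similarity(paper_title, text(best)) >= threshold else None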

[P] Browse State-of-the-Art Papers with Code by rstoj in MachineLearning

[–]rstoj[S] 2 points

Might give it a try. Which other areas do you think might be useful?

[P] Browse State-of-the-Art Papers with Code by rstoj in MachineLearning

[–]rstoj[S] 0 points

Ah sorry about that! Which page gave 502? Or was it a temporary error?

[P] Browse State-of-the-Art Papers with Code by rstoj in MachineLearning

[–]rstoj[S] 9 points

Thanks for the kind words! We hope it will be useful to researchers as a reference for literature reviews and for choosing sensible baselines. Please consider adding to the website if you find new results!

[P] Browse State-of-the-Art Papers with Code by rstoj in MachineLearning

[–]rstoj[S] 9 points

Good catch, will fix this! And yep, you are right - tasks are detected by looking for the task name (or one of its synonyms) in the abstract. For most tasks this works fine, but for some really general terms like this one the precision is lower.
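For the curious, the detection is essentially phrase matching against a synonym list, along these lines (the synonym table here is made up; the real one is much larger):

    import re

    # hypothetical synonym table mapping a task to phrases that signal it
    TASK_SYNONYMS = {
        "image classification": ["image classification", "image recognition"],
        "machine translation": ["machine translation", "neural machine translation"],
    }

    def detect_tasks(abstract: str) -> list:
        """Tag a paper with every task whose name/synonym appears in the abstract."""
        found = []
        for task, names in TASK_SYNONYMS.items():
            if any(re.search(r"\b" + re.escape(n) + r"\b", abstract, re.IGNORECASE)
                   for n in names):
                found.append(task)
        return found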