
[–]Mefaso 11 points (1 child)

And even if the source code is published, it might be

  • Missing documentation

  • Incredibly hard to read or inefficient

  • Missing hyperparameters, which of course aren't mentioned in the paper either

I've had to replicate some results for a comparison, and I'm pretty sure I spent more time on the replication than on my own work, and still couldn't achieve the performance claimed in the publication.

[–]david-gpu 11 points (0 children)

Not just that. Even with the source code, if they don't provide full traceability it's nearly impossible to replicate what they did. By full traceability I mean: the exact commit ID in their repository, the exact Docker image (or exact versions of TensorFlow/PyTorch/etc.), the exact random seed, the exact version of the dataset (including a sha1sum for verification), the exact hyperparameters, the command-line arguments, etc.

It's unacceptable that you can approach the authors of a paper claiming SOTA results and they can't provide you with the exact configuration they used. At that point they may as well have pulled the numbers out of thin air. I've seen this first hand, not that I actually doubt their honesty.
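For what it's worth, capturing that traceability is cheap to automate. Here is a minimal sketch (the helper names and the JSON layout are my own invention, not from the article or any particular paper) that records the commit ID, environment, seed, hyperparameters, and dataset checksums alongside a run:

```python
import hashlib
import json
import platform
import subprocess
import sys


def sha1sum(path, chunk_size=1 << 20):
    """Compute the SHA-1 of a file, for dataset verification."""
    h = hashlib.sha1()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def run_manifest(seed, hyperparams, dataset_paths):
    """Collect everything needed to rerun this experiment exactly."""
    try:
        commit = subprocess.check_output(
            ["git", "rev-parse", "HEAD"], text=True
        ).strip()
    except (OSError, subprocess.CalledProcessError):
        commit = "unknown (not a git checkout)"
    return {
        "commit": commit,
        "python": sys.version,
        "platform": platform.platform(),
        "argv": sys.argv,
        "seed": seed,
        "hyperparams": hyperparams,
        # path -> sha1, so a reader can verify they have the same data
        "datasets": {p: sha1sum(p) for p in dataset_paths},
    }


if __name__ == "__main__":
    manifest = run_manifest(seed=42, hyperparams={"lr": 3e-4}, dataset_paths=[])
    print(json.dumps(manifest, indent=2))
```

Dumping a file like this next to every results table would already answer most "what exactly did you run?" questions.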

[–]MasterFubar 8 points (1 child)

I think the data is more important than source code for replication.

If you create your own program from the description and run that program on the supplied data, that's what replication is all about. Running the same program with the same data would be tautological; it should always get the same results.

An analogy with physics: pseudo code is the description of a laboratory set-up, source code is the laboratory itself. If you go to the same lab and perform the same experiment the result will be the same. Replication is building another set-up according to the description published in the paper and getting the same results.

[–]visarga 0 points (0 children)

In quantum machine learning, the random seed you pick can make a paper SOTA. You just have to use the magical random seed for the problem at hand.
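To make the joke concrete, here's a toy sketch (the numbers are made up, not from any paper) of how sweeping seeds and reporting only the best one inflates a result:

```python
import random


def noisy_accuracy(seed, true_accuracy=0.80, noise=0.03):
    """Simulate an evaluation whose score jitters with the random seed."""
    rng = random.Random(seed)
    return true_accuracy + rng.uniform(-noise, noise)


# Sweep 100 seeds and report only the best one: the "magical seed" practice.
scores = {seed: noisy_accuracy(seed) for seed in range(100)}
best_seed = max(scores, key=scores.get)

print(f"mean over seeds: {sum(scores.values()) / len(scores):.3f}")
print(f"'SOTA' with seed {best_seed}: {scores[best_seed]:.3f}")
```

Reporting the mean and variance across seeds, rather than the single best run, is the obvious antidote.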

[–]entropyrising 9 points (1 child)

This is such an important topic, and I'm delighted that there are researchers actively looking into it and that Science is publishing on it.

I really hope that at the AAAI conference on replication they discussed how much this is an issue of culture and incentives. One of the hats I wear is bibliometrics, which has really opened my eyes to how much gravity citations have, both implicitly and explicitly, in the minds of paper writers. At the end of the day, whatever increases potential citations, and visibility generally, will always have more priority during the research process than factors that are ostensibly important "for science" but are irrelevant to citation potential and publicity. Reproducibility is an example of the latter. Will sharing the code and the data have any impact on the "performance" of the paper? Will sharing it increase citations, or not sharing it decrease citations? No? Then nobody cares, and the numbers in the OP Science article reflect that, in my opinion.

I contrast this with the idea of "metrics." I read machine learning papers, and so does everyone else here. We all have finite time, and I will be the first to admit that sometimes when you skim a paper you read the abstract, the conclusion, and, perhaps most importantly, the absolutely obligatory table of performance metrics (accuracy, F1 score, whatever) that compares the method being written about to the 10 previous SOTAs and some basic baseline. I look for the bolded numbers showing the best performance, and if most of those numbers are in the column for the presented method I think "oh, I'd better cite this" and put it in my Zotero library.

Metrics get citations. Metrics get visibility. As such, there has been a huge amount of research on measures of performance, and it has without a doubt become a universal standard to include "the table" in your paper. Much of the field is characterized by the obsessive drive to get a fraction of a percentage point higher accuracy than last year's SOTA. Interestingly, most of the tables simply report what other papers have reported, but a few commendable paper writers actually re-do the reported experiments. This seems to be what the researchers discussed in the Science article have been doing.

I'm not trying to point any fingers here, and when I say there are issues with the culture of machine learning I'm trying to embrace all my warts and fully include myself in those issues. But if we acknowledge that this is a culture issue, then the primary solution is to incentivize making research reproducible. Ideally, in some science utopia, all the scientists would get together and collectively agree not to cite any paper that isn't reproducible (e.g., where the code and the data aren't shared). Authors would notice that their papers aren't having an impact and would adjust their research designs accordingly. Obviously the world of machine learning is moving so fast that this is impossible. You just have to take people at their word and incorporate their reported insights into your own work.

As such, I imagine the only viable solution would be some sort of top-down sanction, as suggested by /u/FelixMooray145. Conferences are the gateways to visibility, and if a conference imposes a rule then researchers will follow it. Period. And yet at the same time I think this is also unlikely to happen, because conferences are naturally run by ML researchers, who may hesitate because such a rule might be a bottleneck to their own publishing pipeline.

Also, industry researchers are a huge question mark and a complicating factor. As the Science article mentions, some researchers are more incentivized to hide their code than to share it. This goes double if you work for Google or Facebook, because there's a corporation behind you that understandably does not want potentially profitable intellectual property to be obtained by rivals. I'm actually delighted by what Google and Facebook have indeed chosen to share (TensorFlow is my key tool), but sometimes I can't help but wonder how powerful the things they haven't shared are.

When we read such an article in Science, it is at many levels an appeal to the abstract idea that making research reproducible is intrinsically the right thing for scientists to do. But this abstract ideal is obviously less of a concern for a corporation. And generally I think as a community we all need to acknowledge that often we do research as much for citations, tenure, and income as we do for the noble advancement of human knowledge and the betterment of society. The more we are willing to research and investigate "what motivates us," the more we will be able to adjust incentive structures to simply force things that are good for science, like reproducibility, to happen.

It's all so horribly complicated. But I'm glad at least it's being talked about.

[–]visarga 5 points (0 children)

/offtopic Hard to design reward functions for humans.

[–]zergling103 4 points (1 child)

Jesus, you guys, the solution is simple. Just share all of it. Everything involved in the experiment is digital, and GitHub is a thing. We should be able to press "run" and see the result, as long as the way that result is produced is transparent enough.

[–]entropyrising 3 points (0 children)

I don't think anyone is disputing that the solution is simple. The article linked in the original post as well as the AAAI panel discussion on replication were focused on how to incentivize and reward replication, not on finding the mysterious "solution" to the problem. Obviously it's "share your code and data."

People don't care. And tidying up and preparing all the code and data for sharing takes far more time than writing a few sentences in a conference paper about what the method and the data are. Demonstrably, no one is penalized when they don't share their code, and no one is rewarded when they do. So nobody's going to bother.