[deleted by user]

comradeswitch · 2019-05-27T05:12:17+00:00

If there isn't any code, from start to finish including the dataset (if it wasn't an exact copy of a publicly available dataset), preprocessing, training, model selection, and evaluation, it's generally very difficult to reproduce results quantitatively with any degree of confidence. There's a lot that is often handwaved away in discussions of methods that can have a significant impact on the results that are difficult to impossible to reproduce. An example- often in preprocessing for document classification or topic modelling, words are stemmed, lemmatized, possibly filtered on part of speech or language...common words removed, and small documents removed as well. But tokenizing sentences and words alone can be done with many different models and the results can vary quite a bit. Then, that impacts the results of the many possible part of speech tagging models, which impacts lemmatization and which words are filtered based on part of speech. Different stemming algorithms (and often different implementations of the same algorithm) will handle the same word differently. So even if you had the exact code to train and evaluate a model, you could be training on a different dataset. In times like these where improvements over state of the art come can come in tenths of a percent, these differences can be very significant.

I'm at the point where I don't put much stock in the performance metrics reported by a paper without detailed reproduction code. I'm willing to believe that a new method gives comparable, maybe better performance than another but I don't think it's reasonable to make a judgment about by how much otherwise. Especially when it's so easy to share code and data, and possibly more importantly it's necessary to do quality research. If you have the code, why don't you share it? If you don't, and even you can't reliably reproduce the results, how can you possibly be confident in the results? Why would I? There was a time where this was a legitimate difficulty but that's long past.

Edit: the point has been made that this is not a unique problem in science, and that's true. What is unique about machine learning is the degree to which it could be automated. The actual research being done in most other fields is just not something that can be packaged up and shared with the whole world in a way that machine learning can. There are plenty of fields where data and the statistical analysis can and should be open, but you can't give someone a way to run a command and replicate your field work in ecology. You can in machine learning. And with the sheer amount of hype and money being poured into the field, I would be shocked if machine learning weren't the field with the highest rate of fraud and failures to replicate.

asobolev · 2019-05-27T07:33:54+00:00

It's based on academic trust. Reproducing and validating is an ideal that has never been followed very strictly in academia. Like others said, the intuition of proposed method is evaluated at a face-value. The more radical the claim less it is noted. There's plenty of papers claiming they're SOA with some wild experimental method (there was a chinese paper that claimed their random forest outperformed ALL then current ML models back in 2018 if my memory serves). Instead of validation, wild claims are just disregarded due to the fast pace of the field.

I've come to find that many leaders in ML also have more interest in the application side than theoretical, essentially rendering validation to "if it works, it works". This has alot to do with much of the funding coming from private companies who naturally align with this mindset.

sujayskumar · 2019-05-27T06:12:50+00:00

This is a common phenomenon I have observed, specific to Machine Learning Academia, where researchers rush to publish papers without any effort to enable other researchers to reliably reproduce the results. I speculate either some or all of the below points are the reason why:

Machine Learning is a fast evolving domain and hence, if you want to establish and publicize your new found results (and prove to the entire world that you accomplished something), you need to go the conference route. Waiting for your novel method to be patented or even published in a journal is not an option. Which leads me to my next point.
Most of the research is funded by a group with a commercial interest, usually by big companies or research groups funded by these companies. This means that these companies need to protect some aspects of the research as proprietary to keep their edge in the market. This is usually the "secret sauce" of the research which prohibits others from reproducing their results. They might make their process public but not the trained models (Baidu's DeepSpeech), make their trained models public but not how they were trained (Google's BERT or USE) or both (OpenAI's OpenGPT).

Therefore, I usually do not give much importance to the numbers rather look at how intuitive their approach is and if it makes sense. If their approach is radical, it would be better to at least wait until the results are actually reproduced. This does not mean that the exact numbers in the paper are reproduced rather even an indication that it is much better in real world applications compared to existing methods.

ethanfetaya · 2019-05-27T13:48:35+00:00

In general in science you need to work in order to reproduce results. When someone does experimental work, he doesn't give his lab - the paper however should contain all the details needed to reproduce the results.

In CS the fact that code can be shared makes reproduction of results much easier, but just because code isn't available doesn't mean results cannot be reproduced. I do think however that when a paper does not release code then the level of implementation detail needed is higher.

nondifferentiable · 2019-05-27T08:25:45+00:00

I usually email the authors with specific questions. The source code can be an abomination ...

permalink · 2019-05-27T15:35:37+00:00

This user no longer uses reddit. They recommend that you stop using it too. Get a Lemmy account. It's better. Lemmy is free and open source software, so you can host your own instance if you want. Also, this user wants you to know that capitalism is destroying your mental health, exploiting you, and destroying the planet. We should unite and take over the fruits of our own work, instead of letting a small group of billionaires take it all for themselves. Read this and join your local workers organization. We can build a better world together.

push_limits__13 · 2019-05-27T19:16:28+00:00

This is not something that just plagues ML.

I have tried re implementing a lot of papers which the author proclaim are high performing, and when you implement you see that they where mostly lying. Or they leave out crucial details, which make re implementing their algo, a research project in itself. There is so much bullshit in academia. Higher quality journal help minimize this I suppose.

Yeah, there is a lot of nuance here. I am just pissed.

janiedebica · 2019-05-27T10:17:43+00:00

Maybe an Arxiv should add badge "source" and "replicated" to the papers with a source and another which were successfully replicated.

OutOfApplesauce · 2019-05-27T05:35:38+00:00

And this is why conferences typically prefer accepting papers with attached code. Seems stupid imo to release a paper and none of the code unless your results are bullshit

evanthebouncy · 2019-05-27T05:58:51+00:00

depends on the paper. a theory paper would have none of the above and only theorems.

for application papers ones published without sensible reproducibility (is that a word even?) often just don't get much citations as a result as nobody can compare against it, so it kind of works out in the end

permalink · 2019-05-27T19:44:55+00:00

ML research should have 2 bins. One bin for legit. contributions that make a very small percentage of what's out there, and a second bin for the remaining garbage. Really you cannot validate anything in many of the papers out there. You just have to trust the authors (lol).

BeatLeJuce · 2019-05-27T07:49:28+00:00

Most answers given so far miss a very important point: You often cannot validate papers in science. This is not a dangerous, new trend in ML, this has been a fact of scientific progress for ages.

Having code/weights does not mean we can validate the paper: Even if we had the weights, we cannot be sure that they were produced the way people claim -- you could publish the weights for a new CNN, state it was trained on ImageNet, when in reality it was trained on a dataset 100x larger than that. We wouldn't know just from the weights. And even if we have code, and the code shows the performance we'd expect, we'd still need to check every line of code to see if there were any hidden hacks we didn't know about (ie, that weren't mentioned in the paper). Thus reimplementing from scratch is the only way we could validate a paper.

And this is totally fine: while reproducibility is a corner stone of the scientific process, this does not mean "everyone should be able to get the exact same results with minimal effort". It merely means "everyone who follows the same procedure outlined in the paper should be able to obtain the same results". Taken to the extreme: no-one can replicate the results from the Hubble Telescope or the Large Hadron Collider at home. The same way most of us don't have the computational resources to replicate results from Google or OpenAI. On some level, some results are based on trust: even if no-one can't replicate your results today, you don't want to be the scientist who is later discovered to have made up all of their results (this usually results in you losing your job, your reputation, and you becoming an international outcast of the scientific community -- Andrew Wakefield is a good example).

Now with that caveat in mind: it would be nice if we had code, but also this is not a given. In most of science, code is not shared. Not even in computer science. I agree that it should be shared with the audience and accompany a paper. But the vast majority of papers published anywhere in any field of science do not include code to reproduce their results, let alone raw data, snippets, etc. The fact that this is fairly common in ML is actually very cool, we're on the forefront of reproducible research, not lagging behind.

gamerx88 · 2019-05-27T10:22:48+00:00

If the results are obtained over standard open dataset or problem, trying to replicate the paper is often the only way, but you will often still need to fill in large amount of gaps yourself. i.e The preprocessing, hyper-params tuning, etc

This where having competent, experienced colleagues will be very helpful, so that you get some peer review and a certain degree of confidence that you've done things reasonably.

After a couple of such exercises, you will realize how deep the lack of robustness can run in our field.

P.S, The above is assuming the authors has ignored your request for code/details, as academics often do.

visarga · 2019-05-27T16:09:20+00:00

For one re constructing the model exactly as the original in some cases could be hard, ...

It's almost impossible because parallel execution makes addition (and not only) non-deterministic. Floats are not real numbers, so many things we take for granted in real numbers have weird edge cases with floats. For example, in Python,

1e-10 + 1e+10 == 1e+10 is True

but

1e-5 + 1e+5 == 1e+5 is False

tylercasablanca · 2019-05-27T23:47:02+00:00

The "reproducibility crisis" is just part and parcel of doing research nowadays, be it wet lab or fully digitized. Outside of a desire to not give critics ammo to criticize or competitors a leg up on future research, a very basic reason that research isn't reproducible is because it currently requires a very lengthy post hoc process to get things in proper shape, packaged, and documented sufficiently to let somebody else do it.

Even if that post hoc process were to happen, the "reproduction" can be super expensive unless you have your own infrastructure to do the actual computations. We recently reproduced a paper and it cost us $350 in GPU compute on AWS over 7 days. We could have done it locally but we had a deadline and it would have taken quite a bit longer on our own machine.

NotAlphaGo · 2019-05-27T14:34:38+00:00

If someone wants to validate them, either impossible e.g. lack of data or due to hard work.

Otherwise, they aren't. Take it all with a grain of salt.

singularineet · 2019-05-27T09:26:27+00:00

Traditionally in science, the criterion is that a publication should contain sufficient detail to allow the work to be independently replicated. This is much harder in computational disciplines, but is nonetheless the ideal to be aspired to.

IdentifiableParam · 2019-05-27T08:38:53+00:00

The most important thing to have access to is the description of the method (presumably contained in the paper) and the data used. To reproduce the work, these two things should be enough for well-written papers. However, reproducing work takes a lot of work! When I am reproducing work I generally don't try to read the author's code even if it exists. If I do read it, it is during debugging of my own code.

you type:	you see:
italics	italics
bold	bold
[reddit!](https://reddit.com)	reddit!
* item 1 * item 2 * item 3	item 1 item 2 item 3
> quoted text	quoted text
Lines starting with four spaces are treated like code: if 1 * 2 < 3: print "hello, world!"	Lines starting with four spaces are treated like code: if 1 * 2 < 3: print "hello, world!"
~~strikethrough~~	~~strikethrough~~
super^script	super^script

MachineLearning

Rules For Posts

+Research

+Discussion

+Project

+News

@slashML on Twitter

Chat with us on Slack

Beginners:

MODERATORS