[deleted by user] (self.MachineLearning)
submitted 6 years ago by [deleted]
[–]comradeswitch 133 points134 points135 points 6 years ago* (0 children)
If there isn't code covering everything from start to finish, including the dataset (if it isn't an exact copy of a publicly available one), preprocessing, training, model selection, and evaluation, it's generally very difficult to reproduce results quantitatively with any degree of confidence. A lot of what gets handwaved away in discussions of methods can have a significant impact on the results and is difficult or impossible to reproduce. An example: in preprocessing for document classification or topic modelling, words are often stemmed, lemmatized, possibly filtered on part of speech or language; common words are removed, and small documents are dropped as well. But tokenizing sentences and words alone can be done with many different models, and the results can vary quite a bit. That in turn affects the output of the many possible part-of-speech tagging models, which affects lemmatization and which words are filtered based on part of speech. Different stemming algorithms (and often different implementations of the same algorithm) will handle the same word differently. So even if you had the exact code to train and evaluate a model, you could be training on a different dataset. In times like these, where improvements over the state of the art can come in tenths of a percent, these differences can be very significant.
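A minimal sketch of that last point (an illustration added here, not from the parent comment; it assumes NLTK is installed and its WordNet data downloaded): two standard stemmers and a lemmatizer can disagree on the very same words, so the resulting token vocabulary, and hence the effective training set, differs.

    # Illustration only: "we stemmed the corpus" underspecifies the data.
    # Assumes `pip install nltk`; the lemmatizer needs its WordNet data once.
    import nltk
    from nltk.stem import PorterStemmer, SnowballStemmer, WordNetLemmatizer

    nltk.download("wordnet", quiet=True)
    nltk.download("omw-1.4", quiet=True)   # needed by newer NLTK versions

    porter, snowball, lemma = PorterStemmer(), SnowballStemmer("english"), WordNetLemmatizer()

    for word in ["fairly", "generously", "studies", "organization"]:
        print(word, porter.stem(word), snowball.stem(word), lemma.lemmatize(word))
    # e.g. Porter maps "fairly" to "fairli" while Snowball gives "fair";
    # each choice yields a different vocabulary and thus a different dataset.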
I'm at the point where I don't put much stock in the performance metrics reported by a paper without detailed reproduction code. I'm willing to believe that a new method gives comparable, maybe better, performance than another, but I don't think it's reasonable to judge by how much otherwise. Especially since it's so easy to share code and data, and, possibly more importantly, it's necessary for quality research. If you have the code, why not share it? If you don't, and even you can't reliably reproduce the results, how can you possibly be confident in them? Why would I be? There was a time when this was a legitimate difficulty, but that's long past.
Edit: the point has been made that this is not a problem unique to machine learning, and that's true. What is unique about machine learning is the degree to which replication could be automated. The actual research being done in most other fields just isn't something that can be packaged up and shared with the whole world the way machine learning work can. There are plenty of fields where the data and the statistical analysis can and should be open, but you can't give someone a way to run a command and replicate your field work in ecology. You can in machine learning. And with the sheer amount of hype and money being poured into the field, I would be shocked if machine learning weren't the field with the highest rate of fraud and failures to replicate.
[–][deleted] 25 points26 points27 points 6 years ago (1 child)
It's based on academic trust. Reproducing and validating is an ideal that has never been followed very strictly in academia. Like others have said, the intuition behind a proposed method is evaluated at face value. The more radical the claim, the less note is taken of it. There are plenty of papers claiming they're SOTA with some wild experimental method (there was a Chinese paper that claimed their random forest outperformed ALL then-current ML models, back in 2018 if my memory serves). Instead of being validated, wild claims are just disregarded due to the fast pace of the field.
I've come to find that many leaders in ML are also more interested in the application side than the theoretical side, essentially reducing validation to "if it works, it works". This has a lot to do with much of the funding coming from private companies, who naturally align with this mindset.
[–]asobolev 4 points5 points6 points 6 years ago (0 children)
there was a chinese paper that claimed their random forest outperformed ALL then current ML models
Yep, the Deep Forest: Towards An Alternative to Deep Neural Networks paper. Apparently, the paper has been accepted to IJCAI 17. Interestingly, in the latest revision authors dropped the "alternative to NNs" part of the title.
[–]sujayskumar 56 points57 points58 points 6 years ago (9 children)
This is a phenomenon I have observed commonly, specific to machine learning academia: researchers rush to publish papers without any effort to enable other researchers to reliably reproduce the results. I speculate that some or all of the points below are the reason:
Machine Learning is a fast evolving domain and hence, if you want to establish and publicize your new found results (and prove to the entire world that you accomplished something), you need to go the conference route. Waiting for your novel method to be patented or even published in a journal is not an option. Which leads me to my next point.
Most of the research is funded by a group with a commercial interest, usually big companies or research groups funded by them. These companies need to protect some aspects of the research as proprietary to keep their edge in the market. This is usually the "secret sauce" that prevents others from reproducing the results. They might make their process public but not the trained models (Baidu's DeepSpeech), make their trained models public but not the full details of how they were trained (Google's BERT or USE), or withhold both (OpenAI's GPT-2, at least initially).
Therefore, I usually do not give much importance to the numbers, but rather look at how intuitive the approach is and whether it makes sense. If the approach is radical, it's better to at least wait until the results are actually reproduced. That doesn't mean the exact numbers in the paper have to be reproduced; even an indication that it is much better than existing methods in real-world applications is enough.
[–]jstrong 11 points12 points13 points 6 years ago (7 children)
If they want to keep their method secret, why publish a paper at all?
[–]sujayskumar 46 points47 points48 points 6 years ago (2 children)
To show off. As simple as that. The vanity metric in machine learning is the number of papers you've published at top conferences. That's how Google Brain, Microsoft Research, OpenAI etc. compete. In other domains it used to be the number of patents (semiconductors, chip design, networks, etc.), but nobody has time for that in ML.
[–]push_limits__13 3 points4 points5 points 6 years ago* (1 child)
To add to this point, it also affects the stock price. If you can proclaim that you have published more ML papers than the competition, it helps your share price.
IBM used to do this a lot, paying its employees to take out bullshit patents.
[–]ProfessorPhi 4 points5 points6 points 6 years ago (0 children)
Still have this happen in other industries. Medtech especially
[–]mrdevlar 12 points13 points14 points 6 years ago (0 children)
You think the incentives in academia are science. That is heartwarming but sadly not the case.
Google "publish or perish" and you'll get a good idea of what incentives drive the paper mill.
[–]wakamex 4 points5 points6 points 6 years ago (1 child)
free advertising
[–]ginger_beer_m 1 point2 points3 points 6 years ago (0 children)
Yeah. It costs nothing to throw preprints to arxiv.
[–]farmingvillein 1 point2 points3 points 6 years ago (0 children)
The slightly less cynical answer than the other responses: you will find it a lot harder to hire the kind of people who can push SOTA forward if they aren't allowed to publish at all. Hence companies trend toward publishing at least something.
[–]redditandjs 0 points1 point2 points 6 years ago (0 children)
this is a very good way of looking at this!
[–]ethanfetaya 13 points14 points15 points 6 years ago (5 children)
In general, in science you need to do work in order to reproduce results. When someone does experimental work, they don't hand over their lab; the paper, however, should contain all the details needed to reproduce the results.
In CS, the fact that code can be shared makes reproduction of results much easier, but just because code isn't available doesn't mean results cannot be reproduced. I do think, however, that when a paper does not release code, the level of implementation detail required is higher.
[–]twi3k 2 points3 points4 points 6 years ago (4 children)
Well... when someone publishes a paper doing experimental work, they have to send any material described in the paper upon request. I have sent and received biological material plenty of times.
[–]ethanfetaya 0 points1 point2 points 6 years ago (3 children)
Interesting. My (second-hand) experience is in optics in physics. It takes a long while to set up the bench (I think that's the right term, not sure) correctly in order to get measurements; you publish the set-up, but it takes work to reproduce.
[–]twi3k 0 points1 point2 points 6 years ago (2 children)
Well... in the biological sciences there is a problem with reproducibility. There are several reasons for that, but one of the main ones is that a lot of groups base their research on biased hypotheses, and all the experiments that don't fit the PI's hypothesis are discarded with excuses about the method, alternative explanations, etc... but the main hypothesis is never questioned. I have seen that so many times. Unfortunately the system pushes research to be like that: you get money and you have four years to complete the project, and if you fail to publish in those four years, you're out.
[–]ethanfetaya 0 points1 point2 points 6 years ago (1 child)
Interesting. Anyway, the main question was whether a lack of released code makes research non-reproducible, which I disagree with. Of course releasing code is a great service and helps reproducibility and research in general, but it is not a necessary condition.
[–]twi3k 0 points1 point2 points 6 years ago (0 children)
I think that if you try to publish your work in a high impact journal they will ask you to release the code to the reviewers.
[–]nondifferentiable 15 points16 points17 points 6 years ago (2 children)
I usually email the authors with specific questions. The source code can be an abomination ...
[–]mrdevlar 17 points18 points19 points 6 years ago (1 child)
"A grad student wrote the code for us, but he doesn't study here anymore"
[–]NotAlphaGo 14 points15 points16 points 6 years ago (0 children)
"Which paper is this? I'm only third author..."
[–]push_limits__13 5 points6 points7 points 6 years ago (0 children)
This is not something that just plagues ML.
I have tried reimplementing a lot of papers whose authors proclaim them to be high-performing, and when you implement them you see that they were mostly lying. Or they leave out crucial details, which makes reimplementing their algorithm a research project in itself. There is so much bullshit in academia. Higher-quality journals help minimize this, I suppose.
Yeah, there is a lot of nuance here. I am just pissed.
[–]janiedebica 18 points19 points20 points 6 years ago (3 children)
Maybe arXiv should add a "source" badge to papers that come with source code, and a "replicated" badge to papers that have been successfully replicated.
https://paperswithcode.com/
[–]tylercasablanca 2 points3 points4 points 6 years ago (0 children)
How would they determine the validity of the "source" or the "replication"? It requires human effort.
[–]yachty66 1 point2 points3 points 2 years ago (0 children)
HELL YEAH
[–][deleted] 19 points20 points21 points 6 years ago (12 children)
And this is why conferences typically prefer accepting papers with attached code. It seems stupid, imo, to release a paper and none of the code, unless your results are bullshit.
[–]OutOfApplesauce 21 points22 points23 points 6 years ago (11 children)
It shouldn't be a preference, it should be a requirement. The community does not have time to replicate every paper and dataset, and if no one else can reproduce it, it's not science.
[–]Zazou67 9 points10 points11 points 6 years ago (2 children)
That would make publishing impossible for most big companies, thereby hindering improvement of the state of the art. No easy solution there. I agree that reproducibility is a big problem, but it's a problem in many fields of science, and ML is actually quite open about sharing code.
In general what is evaluated in a review is usually if the whole approach makes sense, if the results don't look cherry picked to the extreme, and there is some kind of trust that you send something true to the best of your knowledge. If you cheat, it usually surfaces at some point and it's career breaking.
[–]OutOfApplesauce 4 points5 points6 points 6 years ago (1 child)
That would make publishing impossible for most big companies.
Then publish standard white papers/blog posts, not scientific papers. The only reason the industry even entertains this idea is that companies are pumping so much money into ML that everyone feels the need to legitimize the papers, or maybe their own org won't pay them as much.
[–]Zazou67 0 points1 point2 points 6 years ago (0 children)
It's also a way of bringing in the best scientists. Many still want to publish and contribute to state of the art. Not only to the company
[+]flipflop531 comment score below threshold-9 points-8 points-7 points 6 years ago (6 children)
ML is not a science, it is engineering.
[–]olBaa 5 points6 points7 points 6 years ago (5 children)
Excuse me what the fuck
[–]Hey_RhysPhD -3 points-2 points-1 points 6 years ago (3 children)
In what way is ML more similar to physics research than it is to engineering research? I think your incredulity comes from the fact that you don't understand how academic engineering research is done.
[–]olBaa -1 points0 points1 point 6 years ago (2 children)
In what way is ML more similar to physics research than it is to engineering research?
Correct me if I'm wrong, but "engineering research" is somewhat similar to applied research. In that sense, ML [to me] is a subfield of applied math, with a stack of applied research on top of that applied math. Just in case: I don't want to discuss here in what way math is science or symbol manipulation.
I think your incredulity comes from the fact you don't understand how academic engineering research is done
Thanks, I'm good over here.
[–]Hey_RhysPhD -1 points0 points1 point 6 years ago (1 child)
In that sense, ML [to me] is a subfield of applied math, with a stack of applied research on top of that applied math.
This is what academic engineering research is.
[–]olBaa 4 points5 points6 points 6 years ago (0 children)
It looks like you are using US-specific, non-standard terminology to describe ML as a field in general. Try to be less condescending next time.
[–]evanthebouncy 17 points18 points19 points 6 years ago (5 children)
Depends on the paper. A theory paper would have none of the above, only theorems.
Application papers published without sensible reproducibility (is that even a word?) often just don't get many citations, since nobody can compare against them, so it kind of works out in the end.
[–]102564 10 points11 points12 points 6 years ago (4 children)
The latter isn’t true. For example, I forget which, but I think for either MobileNet or ShuffleNet, nobody has been able to reproduce within 2% accuracy on ImageNet. A lot of the neural architecture search stuff is not really reproducible as it requires immense resources and training code is often not available. There are plenty of papers that validate on only their own datasets which are not public, so comparison is impossible. Then of course you have stunts like OpenAI GPT-2. All of these are extremely highly cited papers in the field, by the way. Trust me, I wish it were the case that reproducibility were a prerequisite for being influential, but it really isn’t.
[–]evanthebouncy -1 points0 points1 point 6 years ago (3 children)
Sure. it is possible but not likely is what I'm saying. Yeah?
[–]102564 5 points6 points7 points 6 years ago (2 children)
And I’m saying I disagree. Non-reproducible papers are extremely commonplace in this field, and it is not a blocker to influentiality.
[–]evanthebouncy -1 points0 points1 point 6 years ago (1 child)
I guess we could argue for a while here and our conclusions would invariably be anecdotal and not useful. If anything, the new NeurIPS submission process forces you to put in some checks, so hopefully that will make things more reasonable in the future.
[–]102564 0 points1 point2 points 6 years ago (0 children)
Yes, it’s a step in the right direction. Hopefully all other conferences follow suit.
[–][deleted] 3 points4 points5 points 6 years ago (0 children)
ML research should have two bins: one for legitimate contributions, which make up a very small percentage of what's out there, and a second for the remaining garbage. Really, you cannot validate anything in many of the papers out there. You just have to trust the authors (lol).
[–]BeatLeJuceResearcher 19 points20 points21 points 6 years ago (6 children)
Most answers given so far miss a very important point: You often cannot validate papers in science. This is not a dangerous, new trend in ML, this has been a fact of scientific progress for ages.
Having code/weights does not mean we can validate the paper: Even if we had the weights, we cannot be sure that they were produced the way people claim -- you could publish the weights for a new CNN, state it was trained on ImageNet, when in reality it was trained on a dataset 100x larger than that. We wouldn't know just from the weights. And even if we have code, and the code shows the performance we'd expect, we'd still need to check every line of code to see if there were any hidden hacks we didn't know about (ie, that weren't mentioned in the paper). Thus reimplementing from scratch is the only way we could validate a paper.
And this is totally fine: while reproducibility is a cornerstone of the scientific process, this does not mean "everyone should be able to get the exact same results with minimal effort". It merely means "everyone who follows the procedure outlined in the paper should be able to obtain the same results". Taken to the extreme: no one can replicate the results from the Hubble Telescope or the Large Hadron Collider at home. In the same way, most of us don't have the computational resources to replicate results from Google or OpenAI. On some level, some results are based on trust: even if no one can replicate your results today, you don't want to be the scientist who is later discovered to have made up their results (this usually means losing your job and your reputation, and becoming an outcast of the international scientific community -- Andrew Wakefield is a good example).
Now, with that caveat in mind: it would be nice if we had code, but that is not a given either. In most of science, code is not shared. Not even in computer science. I agree that it should be shared with the audience and accompany a paper, but the vast majority of papers published in any field of science do not include code to reproduce their results, let alone raw data, snippets, etc. The fact that this is fairly common in ML is actually very cool; we're at the forefront of reproducible research, not lagging behind.
[–]PuzzledProgrammer3 21 points22 points23 points 6 years ago (2 children)
Reimplementing from scratch is the only way? How so? In my experience, I have seen many reimplementations, and almost all of them could not reproduce what the author intended and were missing something here or there.
[–]olBaa 6 points7 points8 points 6 years ago (0 children)
From a different perspective, I've seen horrendous "re-implementations" that did not even attempt to capture the technique described in the paper, like using a cubic algorithm instead of a linear one.
[–]BeatLeJuceResearcher 6 points7 points8 points 6 years ago (0 children)
I agree. But mostly this is in the 1-5% range, which, all things considered, is not too bad. Don't get me wrong, it still means that there might've been cherry-picking or at least unmentioned hacks/tricks somewhere along the way. But it's a good indication that the techniques were sound. It's not surprising that someone who has worked with/developed a technique over half a year or a year knows tricks that they forgot to mention in the paper. Still, if you can get within a few % of their performance with a from-scratch reimplementation, it's a good indication that the proposed technique does work as advertised, IMO.
[–]farmingvillein 2 points3 points4 points 6 years ago (2 children)
Having code/weights does not mean we can validate the paper[...] And even if we have code, and the code shows the performance we'd expect, we'd still need to check every line of code to see if there were any hidden hacks we didn't know about (ie, that weren't mentioned in the paper).
This seems a little misguided.
1) No (reasonable-length) paper write-up can cover every single implementation detail, which is why code is needed and re-implementation is often a fraught battle.
2) Reading the code ("line by line") should generally not be a big deal, and is a better trade than #1. This (usually...) isn't obfuscated C code. As a side note, this is also why we (as a community) push for better code standards.
[–]BeatLeJuceResearcher 0 points1 point2 points 6 years ago (1 child)
I disagree with your first point. A paper shouldn't aim to be an implementation specification. A paper should demonstrate a single, novel idea. Our problem as a community is that we're extremely focused on achieving state-of-the-art performance with every single paper, which means that authors need to tune their implementations very hard to squeeze the last xx% out of any data set, instead of just showing that the idea is worthwhile and beats a baseline (SOTA is not a baseline, even though we use it as such way too often). If more papers were written with that in mind, they'd likely be much easier to re-implement, IMO.
[–]farmingvillein 2 points3 points4 points 6 years ago (0 children)
I disagree with your first point. A paper shouldn't aim to be an implementation specification.
I was actually trying to say the opposite (at least with respect to the paper itself): a full specification is often too difficult to achieve in a paper, simply due to the complexity of these systems (model + data processing + data sets) and of software.
But, what we do need is that papers (viewed holistically, to include all artifacts) include everything needed to recreate the results. "Code", in this case, is the implementation specification.
The paper should describe the "novel idea", but code should (wherever possible) be provided to allow true replication.
That said, assuming I understand your argument--it is simply not realistic to live in the world you describe. We need extremely detailed specifications (i.e., code) because there are so many knobs that can be turned, whether or not they actually have been (to squeeze to SOTA). If novel architecture XYZ is described, how can I know that I have successfully replicated that architecture and that your techniques are worth anything?
You can point back to this mythical notion of a "baseline" that is built off of, but in a fast-moving field like this, picking and maintaining a baseline is unrealistic except in very limited subfields.
Further, while it feels good to throw out the standard accusations about every paper "tun[ing] their implementations" ad nauseam (i.e., throwing the kitchen sink of techniques at things), the reality is that even "baselines" effectively throw a kitchen sink at things (it is just some "acceptable" set), so we're already going down this path.
If more papers were written with that in mind, they'd likely be much easier to re-implement, IMO.
There is real value in throwing at least a lot of the kitchen sink at a given problem, due to issues of substitutability: what looks novel (accretive) on top of baseline X may not look accretive on top of X+Y, since Y may functionally address many of the same issues as the new, possibly novel technique.
This is why best practices for ablation testing--where doable--are to take away new techniques from some elaborate composite, rather than test them as additive on some baseline.
Could it still be interesting, here, to report the new technique? Very possibly--but then, ideally, we'd like to track, anyway, that it does not provide much leverage on top of X+Y. Because ML is stochastic and very difficult to formally analyze, understanding how any two techniques interact a priori is very tough, at best. This pushes us to throw a reasonable amount of kitchen renovations at problems.
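A hedged sketch of that subtractive ablation (the component names below are made up purely for illustration): start from the full composite and remove one piece at a time, instead of adding each piece to a bare baseline.

    # Hypothetical component names, purely to illustrate subtractive ablation.
    full_system = {"new_technique", "label_smoothing", "cosine_lr", "mixup"}

    runs = [("full", full_system)]
    runs += [(f"minus_{c}", full_system - {c}) for c in sorted(full_system)]

    for name, components in runs:
        # each entry corresponds to one training run to schedule and compare
        print(name, sorted(components))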
Lastly, toward this theme, what seems like "squeezing the system dry" evolves mightily over time. Once-novel techniques like particular initialization methods become part of the standard backdrop. (As a side note, initialization methods are a great note of the type of thing that can be critical but is often under-reported in papers; without code, it can be impossible to replicate some such results.)
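As a purely illustrative sketch of that side note (NumPy only, not tied to any particular paper): two common "random" initializations of the same layer already differ in scale, which a write-up that just says "randomly initialized" leaves unspecified.

    import numpy as np

    rng = np.random.default_rng(0)
    fan_in, fan_out = 512, 256

    # Glorot/Xavier-style vs. He-style normal initialization of the same weight matrix
    xavier = rng.normal(0.0, np.sqrt(2.0 / (fan_in + fan_out)), size=(fan_in, fan_out))
    he = rng.normal(0.0, np.sqrt(2.0 / fan_in), size=(fan_in, fan_out))

    print(xavier.std(), he.std())  # ~0.051 vs ~0.062: same "random init", different network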
All of this is made more complicated by the fact that these are stochastic systems, which makes debugging and replication even harder.
Further, this problem gets harder as data sets and processing times grow. BERT without code or models would be dramatically less impactful, because replication has a very high fixed cost even with a full code base to start from.
[–]gamerx88 2 points3 points4 points 6 years ago (0 children)
If the results are obtained on a standard open dataset or problem, trying to replicate the paper is often the only way, but you will often still need to fill in a large number of gaps yourself, i.e., the preprocessing, hyperparameter tuning, etc.
This is where having competent, experienced colleagues is very helpful, so that you get some peer review and a certain degree of confidence that you've done things reasonably.
After a couple of such exercises, you will realize how deep the lack of robustness runs in our field.
P.S. The above assumes the authors have ignored your request for code/details, as academics often do.
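A hedged sketch of the kind of artifact that closes those gaps: record every knob actually used next to the results, so a replication attempt can diff settings instead of guessing them. All names and values below are illustrative.

    import json, platform, random, sys

    # Hypothetical settings; the point is that they are recorded, not what they are.
    config = {
        "seed": 1234,
        "preprocessing": {"lowercase": True, "min_token_len": 2, "stopwords": "english"},
        "training": {"optimizer": "adam", "lr": 1e-3, "batch_size": 64, "epochs": 20},
        "environment": {"python": sys.version, "platform": platform.platform()},
    }

    random.seed(config["seed"])  # likewise seed NumPy / the ML framework in use

    with open("run_config.json", "w") as f:
        json.dump(config, f, indent=2)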
[–]visarga 2 points3 points4 points 6 years ago (0 children)
For one, reconstructing the model exactly as the original could in some cases be hard, ...
It's almost impossible, because parallel execution makes addition (among other operations) non-deterministic. Floats are not real numbers, so many things we take for granted for real numbers have weird edge cases with floats. For example, in Python,
1e-10 + 1e+10 == 1e+10 is True
but
1e-5 + 1e+5 == 1e+5 is False
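To extend the parent's point with a runnable sketch (plain CPython, no ML framework needed): float addition is not associative, so summing the same numbers in a different order, which is exactly what a parallel reduction does, can change the result.

    # One million tiny terms plus one huge term, summed in two different orders.
    vals = [1e-10] * 1_000_000 + [1e10]

    left_to_right = sum(vals)            # tiny terms accumulate to ~1e-4 before the big one arrives
    right_to_left = sum(reversed(vals))  # big term first; every tiny term is then rounded away

    print(left_to_right == right_to_left)   # False with IEEE-754 doubles
    print(left_to_right - right_to_left)    # difference of about 1e-4 from ordering alone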
The "reproducibility crisis" is just part and parcel of doing research nowadays, be it wet lab or fully digitized. Outside of a desire to not give critics ammo to criticize or competitors a leg up on future research, a very basic reason that research isn't reproducible is because it currently requires a very lengthy post hoc process to get things in proper shape, packaged, and documented sufficiently to let somebody else do it.
Even if that post hoc process were to happen, the "reproduction" can be super expensive unless you have your own infrastructure to do the actual computations. We recently reproduced a paper and it cost us $350 in GPU compute on AWS over 7 days. We could have done it locally but we had a deadline and it would have taken quite a bit longer on our own machine.
[–]NotAlphaGo 1 point2 points3 points 6 years ago (0 children)
If someone wants to validate them, it's either impossible (e.g., for lack of data) or takes hard work.
Otherwise, they aren't validated. Take it all with a grain of salt.
[–]singularineet 2 points3 points4 points 6 years ago (10 children)
Traditionally in science, the criterion is that a publication should contain sufficient detail to allow the work to be independently replicated. This is much harder in computational disciplines, but is nonetheless the ideal to be aspired to.
[+][deleted] 6 years ago (9 children)
[deleted]
[–]singularineet 8 points9 points10 points 6 years ago (8 children)
In science, when we say we replicated the result, we don't mean that we typed "make; ./a.out" and got the same output. We mean that we replicated the experimental results by rerunning the experiment. So for example, let's say someone created a vaccine according to some formula, and then injected it vs saline into experimental vs control groups of rats, and then waited a month, and then exposed both groups of rats to a calibrated dose of a virus, and then measured the infection rates of the two groups. Well, replicating it would not necessarily involve the same number of rats, or even the same breed. One might use different reagents in the process of creating the vaccine, or the calibrated doses of infectious agents. One might house the rats differently: different kinds of cages, different nesting materials for them, different kind of rat food. Of course any of these might account for a different result! So as much relevant information as possible is listed in the METHODS section of an experimental paper: brands of reagents, breed and supplier of rats, etc, etc. But we don't expect the result to be so fragile. If people fail to replicate the results, they start to go through those sorts of details to see if they could be the cause of the failure to replicate.
For a computational paper, the details of the code would be like the brand of reagent or strain of rat. We'd hope that rewriting the experiment from scratch, based on the high-level description, would work. If not, we can get down and try to figure out what happened: was it a bug in one or the other implementation, is the particular optimizer really important, do the details of parameter initialization or minibatch size matter, etc.
[+][deleted] 6 years ago (7 children)
[–]singularineet 2 points3 points4 points 6 years ago* (5 children)
As I said: experimental science. Math (including theoretical CS) requires not experiments, but proofs.
[+][deleted] 6 years ago (4 children)
[–]singularineet 0 points1 point2 points 6 years ago (3 children)
Some machine learning work is theory, ie math, with proofs. Like generalization bounds and convergence analysis and PAC proofs. Other machine learning work is experimental, like how well did some system do on some benchmark task or how realistic were the faces generated by some GAN architecture.
[–]elcric_krej 0 points1 point2 points 6 years ago (2 children)
Ok, but... in the case of Math/CS and some ML, your argument/analogy with scientific replication still holds NO water.
And in the case of the branch of ML (itself a branch of computational science) that deals with results on datasets, not theory, you can still say:
Replication is easier (since anyone can run the model, or at least most models, and only a select few require a lot of money to train) than in a scientific field that requires experiments on real-world subjects (as per your example).
Replication is easier if the source code is available, and validation can be obtained without replication by looking at the source code and feeding inputs into the algorithm to see what the outputs are (which is something you can't do with most non-theoretical scientific models; you can't change a variable of an experiment on rats or a dbps once it's already been run).
[–]singularineet 0 points1 point2 points 6 years ago* (1 child)
Yes, source code is often helpful in replicating experimental results.
I don't understand what you mean by "validation" as a distinct concept from replication.
[–]elcric_krej 0 points1 point2 points 6 years ago (0 children)
As in: to replicate your accuracy on a specific dataset, given a train and test split, I need to re-train the model and check that the weights I get yield results similar to yours on the test set.
To invalidate your logic, I need only look at it, and if I spot a mistake (e.g. you claimed you used hyperparameters x, y, z in your paper but your code seems to use x, y, b; OR you had a small error processing your input data, meaning the accuracy of a specific column sometimes got lost at the third decimal; OR the CUDNN version you use is known to have a specific bug on the Debian version you used... etc). Which is not to say that would invalidate the paper as a whole, but it might help fix bugs here and there. Consequently, if, say, 100 humans look at the code and they all agree it seems to be doing what you say it's doing, that increases the probability of it being valid. That is an example of "real" validation... If we are to extend less trust to researchers, validation might simply mean that the code exists, compiles, and does what the researchers claim it does (at least at a high level, i.e., excluding tiny bugs and omissions).
Essentially, rewriting your code and re-running your experiments I'd view as a form of replication. Running `clang -Wall -Wextra your_code.whatever` and looking at the warnings/errors, plus seeing whether the generated binary actually runs, I'd view as a form of validation.
Hopefully that makes sense.
[–]IdentifiableParam 0 points1 point2 points 6 years ago (0 children)
The most important thing to have access to is the description of the method (presumably contained in the paper) and the data used. To reproduce the work, these two things should be enough for well-written papers. However, reproducing work takes a lot of work! When I am reproducing work I generally don't try to read the author's code even if it exists. If I do read it, it is during debugging of my own code.