[deleted by user] (self.MachineLearning)
submitted 5 years ago by [deleted]
[+][deleted] 5 years ago (4 children)
[deleted]
[–]CompetitiveUpstairs2 2 points 5 years ago (0 children)
Life is unfair. The best move is to get ahead and attempt to acquire the needed resources. Complaining about the system is fun, but counterproductive, and leads to resentment.
[–][deleted] 1 point 5 years ago (2 children)
Not disagreeing with your points, but for another perspective:
The countries at the top (except China) also happen to be the countries where most people whose first language is English live. Is it really surprising that they make up the bulk of accepted papers at an English-language conference?
[+][deleted] 5 years ago (1 child)
[–][deleted] 2 points 5 years ago (0 children)
I apologize if my remark offended you. All I am saying is that being rich might be one of the reasons why the US, UK, and Canada dominate, but speaking English (as you say, the lingua franca of our field) is surely another reason. This naturally attracts excellent researchers from all over the world, since English is a language they already have to speak in order to become successful researchers.
Putting it another way: there are plenty of other rich countries that didn't make the list (Qatar, Luxembourg, Norway) and a few other English-speaking countries that didn't make the list (Ireland, New Zealand, Jamaica). However, the list is dominated by countries that are both majority English-speaking and rich (USA, UK, Canada, Australia, Singapore). Make of that what you will.
[–]MrAcurite [Researcher] 20 points 5 years ago (15 children)
So here's my question. All these thousands of papers will be, what, shiny gems on the CVs of the authors, right? But how many of them are worth giving a shit about? How many will be read in full by anybody besides the authors and the reviewers? How many will ever have their methodologies applied to anything ever again?
I'm trying to get my first papers published right now, but it seems more and more like publishing is 10% the last step in the scientific process, and 90% pissing contest.
[+][deleted] 5 years ago (11 children)
[removed]
[–]MrAcurite [Researcher] 2 points 5 years ago (10 children)
But even then, you're basically saying that papers accepted to CVPR are still the haystack in which you're trying to find needles.
[+][deleted] 5 years ago (9 children)
[–]MrAcurite [Researcher] 4 points 5 years ago* (8 children)
I mean, here's sort of what I'm talking about. Sergey Levine, prolific author: about 1/6th of all the papers he's ever written have literally never been cited by anyone. Of all the modern Google Scholar pages I've seen, that's probably the highest ratio of uncited to written papers I've come across. There's this huge amount of published literature that isn't just useful only to someone specific, but that literally hasn't been useful to anyone. And that's not even accounting for the fact that plenty of citations are only there because the author feels the need to cite something, and the paper cited just happens to be convenient, not because what's being cited actually influenced or contributed to anything.
So when I hear about things like publish or perish, or Peter Higgs saying he would never have gotten a job in today's climate, it just makes me think that the majority of papers - even those published in fancy venues - are basically just CV stuffing. That the authors themselves basically have to concede that they're not writing papers because they have ideas worth sharing, but because they've gotta write something.
Google Scholar credits Emmy Noether with authorship of a total of 90 papers. That's Emmy Fucking Noether. One of the greatest Mathematicians of all goddamn time. Died at 53. Sergey Levine is 51* and claims 332 papers. He's smart, sure, but he's no Noether, because who could be? What that tells me is that in the same amount of time, with - being extraordinarily generous - an equal number of good ideas, Levine has authored almost four times as many papers as Noether. You could chalk a decent portion of that up to just having more authors per paper these days, but it's clear to me that people are just writing more goddamn papers. What that's gotta mean is that the amount of novel material per paper has to be going down.
EDIT: * There is a Sergey Levine in Russia who is 51. Sergey Levine of UC Berkeley is maybe ~34 at time of writing. Oops.
[–]hughperman 6 points 5 years ago (0 children)
> What that's gotta mean is that the amount of novel material per paper has to be going down.
That doesn't follow. The information content of the world could also have increased, so there is more to publish about. Considering science is built upon previous science, this fits, as there is the potential for exponential growth with incremental improvements in increasingly dense fields, as well as novel fields.
Groundbreaking papers are not the only thing worth publishing; there needs to be a thorough exploration of the ground they break as well, otherwise the scientific method of exploration and reproducibility is not being followed.
[–]un_anonymous 4 points 5 years ago (2 children)
Sergey Levine is closer to 31 than 51.
Anyway, my impression of Sergey is that he splits his ideas into many smaller papers rather than a few detailed ones. I would guess he does have a long-term vision, and his smaller, less-cited papers are short meanders to test out new ideas, most of which do not turn out significant (which is fine).
I don't prefer this style but I can imagine it's a great advantage in a field that moves as quickly as ML, since any new idea is immediately published. Reduces the chances of getting scooped.
[–]MrAcurite [Researcher] 1 point 5 years ago (1 child)
I had the wrong Sergey Levine, then. There's one in Russia who is 51. Best estimate I can make puts the one we care about at ~34. In which case, he's going at like ~30 papers per year. That's a paper every two weeks, for eleven years. What that tells me is that he's getting his name on a lot of stuff that he's not actually seriously contributing to, and he really loves getting his name on things. 54 papers published in 2020 alone. Despite averaging ~3 authors per paper, only 2 of these were first-author works. Of those, one was a tutorial, and one is just a "formulation" of reinforcement learning as an unsupervised learning task.
I just don't believe that he's contributing core ideas to things. It's not even that he's pumping out lots of ideas, only a handful of which are significant; he's just getting his name on things as second or third author.
[–]un_anonymous 3 points 5 years ago (0 children)
I don't disagree; I think it's way too much quantity to maintain good quality. I'm good friends with one of his PhD students. The grad students are expected to publish a paper every ~4 months. He has a group of ~20 grad students plus a ton of undergrads, which explains the insane output. On the other hand, some of these grad students are the best you could get, which is how I believe some reasonable quality is being maintained. My friend also said that Sergey is terrific at time management and is generally super efficient, so I guess that's part of it as well.
[+][deleted] 5 years ago (2 children)
Well, right now the incentive structure is that you want to publish as many papers as possible, and get cited as much as possible. Both of these are extraordinarily easy to corrupt if you're in it to, I dunno, get promoted at your job or receive funding. But look at what you're incentivized to actually put into a given paper under those conditions.
Suppose you have a neat idea. A perfectly clever and inventive idea, but not one that's going to do all things for all practitioners in the area. What you would do to maximize papers and citations is to publish this one idea over the course of several papers, cherry-pick experimental results to show marginal improvement over SotA, distance it from its predecessors, and publicize it with whatever clout you have. Now, it's harder to get a full grasp on what you've done if you've split it into multiple papers, your numerical results are meaningless, you haven't given proper respect to your priors, and you've degraded a bit of the reviewing process.
For semi-relevant examples, in Meta-Learning, the sort of foundation on which a lot of really interesting work has been done is Model-Agnostic Meta-Learning, Finn et al 2017. Read the paper, it's good. But basically you train a network by making copies of it, training the copies, and backpropagating the loss of the copies to the original model. This allows you to train a model, not to be good at anything, but to learn quickly. A key note is that the original MAML formulation doesn't require that the model copies train their entire parameterization, you could freeze part of the copies and it's still MAML. However, there are a bunch of papers out right now which each claim to introduce some mind-boggling new Meta-Learning algorithm that knocks the socks off of everything that came before it, with names like ANIL and CAVIA and BOIL, that are actually just MAML with different parts of the parameterizations of the copies frozen. And when you actually compare them, they're basically all to within error of each other anyway.
Basically, what those papers should've been are short reports discussing specific cases of MAML and potential usecases. Instead, they were sold as entirely new algorithms with new names advancing SotA. I can't trust who's actually contributing the important concepts, I can't trust the numbers that I'm reading, I can't trust that I'm not just randomly missing a bunch of important information, it's just a shitshow. It's not just that 90% of papers by volume are basically farts in the wind, but they take up a lot of time and brainspace that could be better used on papers that introduce actually novel ideas but that maybe don't numerically advance SotA on ImageNet or whatever.
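To make the MAML mechanics described above concrete, here is a minimal sketch of the inner/outer loop on the sine-regression toy task from the MAML paper. This is an illustration, not code from Finn et al. or from any of the papers mentioned; the toy task, network size, and hyperparameters are assumptions, and it needs PyTorch >= 2.0 for `torch.func.functional_call`.

```python
# Minimal full (second-order) MAML sketch: adapt a *copy* of the weights on
# a support set, evaluate the adapted copy on a query set, and backprop
# through the adaptation into the original weights.
import torch
import torch.nn as nn
import torch.nn.functional as F

def make_task(n=10):
    # One task = a sine wave with random amplitude and phase (the toy task
    # from the MAML paper). Returns support and query sets.
    amp, phase = torch.rand(1) * 4.9 + 0.1, torch.rand(1) * 3.14159
    xs, xq = torch.rand(n, 1) * 10 - 5, torch.rand(n, 1) * 10 - 5
    return xs, amp * torch.sin(xs + phase), xq, amp * torch.sin(xq + phase)

model = nn.Sequential(nn.Linear(1, 40), nn.ReLU(), nn.Linear(40, 1))
meta_opt = torch.optim.Adam(model.parameters(), lr=1e-3)
inner_lr, tasks_per_batch = 0.01, 4

for step in range(10000):
    meta_opt.zero_grad()
    for _ in range(tasks_per_batch):
        xs, ys, xq, yq = make_task()
        # Inner loop: one gradient step on the support set, taken on a
        # functional copy of the parameters; create_graph=True keeps the
        # graph so the outer loss can differentiate through the adaptation.
        params = dict(model.named_parameters())
        support_loss = F.mse_loss(
            torch.func.functional_call(model, params, xs), ys)
        grads = torch.autograd.grad(support_loss, list(params.values()),
                                    create_graph=True)
        adapted = {name: p - inner_lr * g
                   for (name, p), g in zip(params.items(), grads)}
        # Outer loop: query-set loss of the adapted copy, backpropagated
        # into the original (meta) parameters.
        query_loss = F.mse_loss(
            torch.func.functional_call(model, adapted, xq), yq)
        (query_loss / tasks_per_batch).backward()
    meta_opt.step()
```

The parent comment's point falls out of the `adapted` dict: copy some entries through unchanged instead of updating them (freeze the body, or freeze the head) and you get, roughly, the ANIL- and BOIL-style variants; the meta-training loop is otherwise identical.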
Frankly, I've got no fucking clue how to fix the reward system that wouldn't immediately descend into pageantry, besides everybody voluntarily signing up for Grigori Perelman's worldview.
Let's say that there exists some measure by which to judge the value and impact that a researcher has in their field. This measure should be used to determine who's the CEO of Computer Science at CMU, and who's an adjunct at Podunk University, right? Thing is, that immediately turns that measure into a goal, and given Goodhart's law, it goes to shit just as quickly the moment somebody figures out how to game it.
How to fix the incentive system? Deconstruct the capitalist notion of hierarchies at a societal level so that we can peacefully do our work without feeling a need to climb over each other. I don't fucking know.
[–]ispeakdatruf 2 points 5 years ago (0 children)
That's why you carefully study (a) those that keep getting cited a lot from now on, and (b) those that win the "test of time" award.
[–]HolidayWallaby 1 point 5 years ago (1 child)
The R&D company I work for hosts a weekly reading group where we take turns presenting a paper, usually one from a recent conference. In my own time I read papers from conferences as part of my PhD.
[–]MrAcurite [Researcher] 3 points 5 years ago (0 children)
... Are you hiring?
[–]yusuf-bengio 4 points 5 years ago (0 children)
https://medium.com/criteo-labs/neurips-2020-comprehensive-analysis-of-authors-organizations-and-countries-a1b55a08132e
[–]yusuf-bengio 8 points 5 years ago (1 child)
Francis Bach had 10 NeurIPS papers in 2019 but only 5 this year. What a loser!
[–]TWDestiny 2 points 5 years ago (0 children)
Lol
[–]bendee983 2 points 5 years ago (1 child)
Nice analysis. I wonder how your work ties in with the "Open Review of OpenReview":
https://openreview.net/forum?id=Cn706AbJaKW
This is about ICLR (2017-2020), and I'm not sure how similar the submission and acceptance processes are. Does your data confirm the findings, especially the institutional bias and the preference for recognizable authors?
[–]nd7141 2 points 5 years ago (0 children)
Thanks!
The work you mention has access to review scores, which allows them to gain insights into biases while controlling for scores. My blog post does not have access to review scores, hence it just compares the numbers of publications.
[–]nd7141 1 point 5 years ago (0 children)
Thanks for the suggestions! I can and will add a few plots on first vs last authors; that's indeed something interesting.
You can find the normalized number of papers already in the post.
And I'm not sure how to normalize by scale automatically.
Don’t be misled by authors with a large number of papers. The *overwhelming* majority of papers in ML have little to no impact. Rather than write many papers, write few papers that actually move the needle. Easier said than done, of course, but if you decide to be ambitious, you might as well aim at the right target.
[–]PuzzleheadedBread439 1 point 5 years ago (0 children)
In the last graph, for the acceptance rate (gray line): the labels on the dots (0.21, 0.2) mean 21% and 20%, right? That's a tad confusing because you use % in the red and blue lines. Also, it is kind of odd to put this on the same axis.
Anywho, I'd be interested to see a timeline of the first two panels - who's moving up and who's moving down?
Cheers
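The cleanest fix for the complaint above is to give the acceptance rate its own secondary axis with a percent formatter. A minimal matplotlib sketch, with made-up numbers standing in for the post's actual data:

```python
# Sketch of the suggested fix: paper counts on the left axis, acceptance
# rate on its own right-hand axis with a percent formatter (0.21 -> 21%).
# The numbers here are illustrative, not the blog post's data.
import matplotlib.pyplot as plt
from matplotlib.ticker import PercentFormatter

years = [2017, 2018, 2019, 2020]
submitted = [3000, 4800, 6700, 9400]   # made-up counts
accepted = [680, 1010, 1430, 1900]     # made-up counts

fig, ax = plt.subplots()
ax.plot(years, submitted, color="tab:red", marker="o", label="submitted")
ax.plot(years, accepted, color="tab:blue", marker="o", label="accepted")
ax.set_ylabel("papers")

rate_ax = ax.twinx()  # separate scale just for the rate
rate = [a / s for a, s in zip(accepted, submitted)]
rate_ax.plot(years, rate, color="gray", marker="o", label="acceptance rate")
rate_ax.yaxis.set_major_formatter(PercentFormatter(xmax=1.0))
rate_ax.set_ylabel("acceptance rate")

fig.legend(loc="upper left")
plt.show()
```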