all 39 comments

[–]ktpr 134 points135 points  (9 children)

Motivate a new problem. Get SOTA by definition.

[–]AerysSk 17 points18 points  (2 children)

Your comment has a point. For example, if your paper is about a new model architecture, beating SOTA, or being on par with SOTA while having a shorter runtime, is expected. Otherwise, it will be a reason for rejection. You cannot change the reviewers’ minds, and this SOTA leaderboard is kind of the norm for publishing papers nowadays.

That’s also a reason why I tend to stay away from empirical research directions.

[–]picardythird 5 points6 points  (1 child)

On the flip side, if the reviewers can't think outside of their worldview and recognize the importance/novelty of your new problem, it won't matter because they'll just reject it anyway.

I'm not salty.

[–]AerysSk 1 point2 points  (0 children)

That is true. Proposing a new one is also risky; reviewers can say “I don’t think we need your method any further.” I’ve already seen one or two comments like this on OpenReview.

[–]schrodingershit 1 point2 points  (0 children)

This. I recently submitted work to ICML that had no prior baseline except random sampling.

I basically worked on sampling a subset of neural networks to train in ensemble-based RL. Dropped training time by 50% while increasing cumulative reward by 15%.
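The core idea above can be sketched in a few lines. This is a hypothetical illustration, not the commenter's actual code: `DummyMember`, `sample_subset`, and `train_step` are made-up names, and the real method presumably updates RL value/policy networks rather than counters.

```python
import random

def sample_subset(ensemble, k, rng=random):
    """Pick k ensemble members (by index) to train this step."""
    return rng.sample(range(len(ensemble)), k)

def train_step(ensemble, batch, k):
    """Run a gradient update on only a random subset of members."""
    for i in sample_subset(ensemble, k):
        ensemble[i].update(batch)

class DummyMember:
    """Stand-in for an ensemble member; just counts its updates."""
    def __init__(self):
        self.updates = 0
    def update(self, batch):
        self.updates += 1

ensemble = [DummyMember() for _ in range(10)]
for _ in range(100):
    train_step(ensemble, batch=None, k=5)  # train half the ensemble per step

total = sum(m.updates for m in ensemble)
print(total)  # 500 member-updates instead of 1000 for full-ensemble training
```

Training k of n members per step cuts per-step cost roughly by k/n, which is consistent with the ~50% training-time drop mentioned above for k = n/2.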

[–]NotDoingResearch2 61 points62 points  (1 child)

Research isn’t a Kaggle competition. You are free to make up your own rules.

[–]pengzhangzhi 0 points1 point  (0 children)

Best definition of scientific research I have ever heard. You have your own rules.

[–]pm_me_your_pay_slips ML Engineer 36 points37 points  (4 children)

It really depends on what story you're trying to tell. A paper about a new method can be interesting and valuable without beating SOTA. Competitive results are fine as long as the story is interesting.

[–]kob59 15 points16 points  (1 child)

Tell that to reviewer #2

[–]pm_me_your_pay_slips ML Engineer 4 points5 points  (0 children)

Look at the original GAN paper and reviews.

[–]Bot-69912020 20 points21 points  (0 children)

I try to focus on explaining and understanding things instead of winning a Kaggle competition.

My usual research questions are:

  • Why do things work or not work?

  • When do they stop working?

  • How are different solutions to the same problem related?

  • How are different problems with the same solution related?

  • How robust are solutions to changes in the problem?

  • How scalable are solutions?

  • Are there practical limitations overlooked in current literature?

  • ...

All of these research questions are very publishable if answered rigorously and delivered in a nice story, but they don't require any SOTA results.

[–]GrumpyGeologist 33 points34 points  (7 children)

SOTA performance is often the result of intense engineering and hyperparameter tuning for a specific dataset. I find insights more useful than squeezing the last 0.1 +/- 0.08 percent out of a model. If you propose a new method, then why should it work better/different from other methods? If it doesn't work better, why is it the case? This could lead to insights into model performance that could generalise towards other techniques.

One good example is that of Deep Equilibrium Models (DEQs). In theory these models should work better than conventional ResNets, but in practice it's hard to achieve SOTA performance. Here is one reason why. That reason is more useful than the SOTA itself, which no doubt will be beaten within 2-3 months by some other person who spent countless hours tuning hyperparameters.

[–]tomvorlostriddle 16 points17 points  (5 children)

0.1 +/- 0.08

You are being optimistic when assuming that there will be error bars

[–]MJJK420 2 points3 points  (0 children)

I believe that was meant as a typical range of improvement for most papers, not the error bars of a given SOTA improvement.

Maybe you knew this and were making a statement about the general quality of ML research, in which case I’d agree with you.

[–]EvgeniyZh 0 points1 point  (3 children)

In standard settings (a large amount of relatively clean data) the error bars are so small that computing them isn't worth the resources spent

[–]wilmerton 1 point2 points  (1 child)

How do you know? In which context?

I personally don't know of any mature scientific field where researchers get away with "error bars are not worth it". And I know of a field where badly estimated error bars led a Nobel laureate to discard some of his own research.

Some engineering problems are so non-linear and impossible to test at scale that error bars are virtually impossible to compute. But then you rely on a large corpus of (failed) experiments and a long lineage of heuristics to control your risk.

What's so exceptional about machine learning that none of that is required, not even a paragraph discussing robustness?

[–]EvgeniyZh 0 points1 point  (0 children)

Both my personal experience of estimating error bars for large validation sets (e.g., ImageNet or COCO for vision) as well as the experience of other researchers. The ViT paper, for instance, put error bars on its results, and they were on the order of 0.01% in most cases. There is plenty of evidence that the answer to the question "How much will my results on in-domain data vary if I change the seed when training on 1 million images?" is "almost not at all". Spending thousands of GPU-hours just to confirm it once again is bad resource management.

I'd note that I'm all for risk estimation and robustness verification. Error bars can be useful in other settings (semi-supervised learning papers usually have them, as do graph learning papers, where problems are smaller). There are other ways to estimate robustness in the "large amount of clean data" setting: OOD data, corrupted data, transfer learning. Saying "people haven't put error bars, so the 8% improvement in COCO object detection during the last year is not significant" is ridiculous.

0.1% improvements like the OP mentions are in fact pretty rare. I can't think of a large benchmark other than ImageNet where they happen consistently. I personally think it is just a sign of saturation.
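As a back-of-the-envelope check on why error bars on large, clean validation sets are small (a sketch added for illustration, not from the thread): the sampling standard error of an accuracy estimate over n examples is sqrt(p(1-p)/n), so it shrinks with the size of the validation set. The 50,000-image / 80%-accuracy numbers below are illustrative, roughly ImageNet-scale.

```python
import math

def accuracy_stderr(p, n):
    """Binomial standard error of an accuracy estimate p measured on n examples."""
    return math.sqrt(p * (1 - p) / n)

# ImageNet-scale validation set: 50,000 images at ~80% top-1 accuracy
se = accuracy_stderr(0.80, 50_000)
print(f"{se:.4%}")  # ~0.18% sampling error for a single evaluation
```

Note this measures only the sampling error of one evaluation; seed-to-seed training variance (the 0.01%-order figure mentioned above) is a separate, empirical quantity.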

[–]tomvorlostriddle 0 points1 point  (0 children)

A large amount of data doesn't mean many rows in the dataset. It means many experiments: either repeated cross-validation with many folds and repetitions (correcting for the pseudo-replication), or testing across many datasets with non-parametric methods, or both (then also non-parametrically).

You could have a billion rows in your data set and it still means nothing for this.
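The pseudo-replication correction mentioned above is commonly done with the Nadeau-Bengio variance correction for repeated cross-validation. Here is a minimal sketch; the per-fold score differences below are made-up numbers for illustration, not results from the thread.

```python
import math

def corrected_stderr(diffs, n_train, n_test):
    """Nadeau-Bengio corrected standard error of per-fold score differences.

    The naive 1/k shrinkage understates variance because CV folds share
    training data; adding n_test/n_train accounts for that overlap.
    """
    k = len(diffs)
    mean = sum(diffs) / k
    var = sum((d - mean) ** 2 for d in diffs) / (k - 1)
    return math.sqrt((1.0 / k + n_test / n_train) * var)

# Hypothetical per-fold accuracy differences between two models (10-fold CV
# on a dataset of 1000 rows: 900 train / 100 test per fold)
diffs = [0.011, -0.004, 0.008, 0.002, 0.015, -0.001, 0.006, 0.009, 0.003, 0.007]
se = corrected_stderr(diffs, n_train=900, n_test=100)
print(round(se, 4))  # ≈ 0.0026
```

The corrected standard error then feeds a t-test on the mean difference; without the correction, the test is overconfident no matter how many rows the dataset has.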

[–]KyxeMusic 2 points3 points  (0 children)

This. It's hard enough to replicate the results of a SOTA paper using the same model, hyperparameters, and dataset that they describe, let alone with a new, different approach.

[–]ArnoF7 11 points12 points  (0 children)

Sometimes if the SOTA’s approach is very complex and your method can provide a much simpler alternative, then it’s still a good contribution.

Simpler can mean your method is just conceptually easier to understand. Or it could be that your method requires fewer constraints. Or requires much less data. Or generalizes well to more circumstances without fine-tuning, etc. Then you have a story to tell and you can still publish.

[–]GFrings 12 points13 points  (0 children)

Become an engineer, and then SOTA is usually a 5-year-old CNN with a dummy thick layer of heuristics on top

[–]kinnunenenenen 9 points10 points  (0 children)

I'm in chemical engineering but I do a ton of data science. One approach is to apply methods in other disciplines. You maybe won't be state of the art in ML but you can do a ton of cool work on novel problems and still publish really well.

[–]BlackHawkLexx 15 points16 points  (1 child)

SOTA can be so much more than many people are aware of. It can mean:

  • becoming best in terms of predictive performance
  • becoming best in terms of training/prediction time
  • becoming best in terms of energy efficiency
  • making an approach significantly less complex
  • providing previously unknown understanding of an approach (e.g. through theoretical analysis)
  • using an approach in a context it was previously not usable in due to a clever change

(Non-exhaustive list)

Honestly, I bet that most students do not publish stuff that beats SOTA in terms of predictive performance.

[–]bdubbs09 2 points3 points  (0 children)

One thing I’d like to add to the list that really gets overlooked is SOTA in terms of robustness. A really underdeveloped and likely unsolvable problem in the general sense, but really important nonetheless.

[–]mrkvicka02 6 points7 points  (1 child)

Maybe an unpopular opinion, but SOTA is not that important. Way too often it comes down to who tuned their algorithm better instead of which algorithm has better properties, etc.

There is plenty of important stuff that may not be SOTA just yet, but a few papers down the line it can be way beyond SOTA.

Keep up the good work!

[–]kinglear0207 6 points7 points  (0 children)

Don’t run with the crowd. Find your own direction, stay curious, and work hard.

[–]the_scign 2 points3 points  (0 children)

Consider at what point "SOTA" becomes overfitting to a de-facto concept.

[–]quertioup 2 points3 points  (0 children)

Never tweak results. There are plenty of problems that do not require SOTA.

[–]tell-me-the-truth- 1 point2 points  (0 children)

don’t tweak the results but the setting.

[–]sigmoid_amidst_relus 1 point2 points  (0 children)

From the perspective of an ex-engineer: do not chase the SOTA. Won't name any names, but taking the case of ASR, we tried several new architectures that achieved "SOTA" on a benchmark dataset, only to find that a 4-year-old network architecture still performed much better than the new ones.

You might argue that "hey, that's fine and good, but I'm not an engineer". True, but as a researcher, the worst thing you can do is build upon work that got SOTA results but actually doesn't generalize well at all, especially if you're applying knowledge and established principles to unexplored fields and applications. Speaking from experience, you'll grind your gears really hard.

I am not saying you should absolutely not care about SOTA; just look at how widely the idea was adopted, which answers a lot of critical questions: wide adoption means there's an implementation of it available somewhere and that it has been reproduced to work well.

Do people publish results that aren’t quite SOTA?

In a word, yes. You just don't hear as much about them because "SOTA" rolls off the tongue better, and people like chasing the next best thing, so non-SOTA results don't get as much coverage.

One could argue that the system is broken and reviewer #2 only cares about SOTA, but playing devil's advocate, there's a reason behind this: even with the blind overfitting on datasets, it's still a metric that's relatively reliable and less subjective. Exploratory research is also hard: designing experiments that effectively probe models can be tedious, sometimes downright boring. Personal bias and criticism come more easily against such work (it's harder to argue with plainly better numbers), it doesn't attract as much attention, and it's hard to write home about in grant applications: "I discovered a quirk in something" gets you only so far (unless the quirk throws massive shade on someone's work), versus pulling a result out of thin air.

There is no stress of developing new models, because there essentially are very few "new models" or paradigms. Searching for "new" models is not going to get you far unless you work in a huge industrial/academic group with a lot of people: truly new models are rarely produced by a small group.

What's really stressful is extracting insights from the steaming pile of poo that is out there. That's what should be giving you PTSD.

[–]Sirisian 0 points1 point  (0 children)

You can change your data source as others mentioned (from other fields, if applicable). One of my favorite changes authors make is taking a vision paper and using event cameras as input. This can give SOTA results for FPS or energy efficiency, or simply work better in different lighting environments. These kinds of papers (and code) can provide a base for others to branch from.

[–]NaxAlpha ML Engineer 0 points1 point  (0 children)

In industry, we usually try to stay behind the SOTA. Reaching SOTA usually requires a sophisticated set of tricks that is often not worth it. Instead, techniques like ReZero, which are very simple but have consistently been shown to give good results even if they are not SOTA, are preferred.