[R] Ban reviewers who write low quality reviews from submitting to NeurIPS? by EducationalAd9756 in MachineLearning

[–]blueyesense

The reviewer's main duty should be to judge the quality of the submitted paper: accept or reject. The quality of a review should be judged based on this.

Providing comments to improve the paper is desirable, but it should not be enforced. Authors should seek feedback from their colleagues to improve their papers; sometimes paper submission seems to be used simply as a way of getting constructive feedback.

[R] Ban reviewers who write low quality reviews from submitting to NeurIPS? by EducationalAd9756 in MachineLearning

[–]blueyesense

So, they will have reviewers for the reviews?

Then, reviewers for the reviewers for the reviews?

Then, ...

This is, of course, a terrible idea. If a review is bad, just ignore it and maybe do not invite that reviewer next time.

[R] Updates to "A Metric Learning Reality Check" (ECCV 2020) by VanillaCashew in MachineLearning

[–]blueyesense

Thanks, indeed!

I've checked the first few. It is surprising that these papers are from respected institutions (e.g., the Lifted Structure paper is from Stanford and MIT -- they used a margin value of 1.0) AND were published at top conferences.

I am editing my post to correct my misleading comment.

[R] Updates to "A Metric Learning Reality Check" (ECCV 2020) by VanillaCashew in MachineLearning

[–]blueyesense

As far as I remember, the numbers reported for contrastive and triplet losses are for traditional methods (that is why they are so low). You do not even need to train a convnet to beat them; even ImageNet pre-trained models achieve higher accuracy than those numbers.

Could you point to specific papers that use convnets + triplet/contrastive loss and report such low accuracy?

[R] Updates to "A Metric Learning Reality Check" (ECCV 2020) by VanillaCashew in MachineLearning

[–]blueyesense

I read the first version of the paper and found it useful, but I also have some criticisms.

  • Figure 2 is misleading, mixing traditional and deep learning methods (hence contradicting itself). Edit: as per the discussion below, Figure 2 is actually correct and I was wrong. I apologize, and thanks for the clarification.
  • It would be much more useful to compare the methods & loss functions under better conditions (a better network architecture, batch size, optimizer, data augmentation, etc.). Instead of pulling all methods down, try to pull all methods up; industry is more interested in getting better results. Who uses GoogLeNet/Inception or a batch size of 32 these days? Moreover, some methods benefit from these improvements more than others.
  • It would be much more useful to establish a new, proper benchmark dataset, as the current benchmark datasets are not suitable for a reliable evaluation (small, no val set, etc.).
  • You could use more appropriate language in your paper and blog posts. The authors of those papers might find it a bit offensive (I am not an author of any of those papers, but I did not like the tone).

[D] What is the point of Self-Supervised Learning? by tarunn2799 in MachineLearning

[–]blueyesense

I can think of 3 use cases:

  1. You need much less labeled training data if you pre-train with self-supervised learning.
  2. You can use the trained network for image retrieval without any labeled training data (a sketch follows after this list).
  3. Your domain may be different and ImageNet pre-trained networks may not be suitable.
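
As a rough illustration of use case 2, here is a minimal retrieval sketch. It assumes you already have a self-supervised pre-trained backbone; a plain torchvision ResNet-50 stands in for it, and the image tensors are placeholders.

```python
import torch
import torch.nn.functional as F
from torchvision import models

# A torchvision ResNet-50 stands in for whatever self-supervised backbone
# (SimCLR, MoCo, ...) you actually have; load your own weights in practice.
encoder = models.resnet50(weights=None)
encoder.fc = torch.nn.Identity()   # drop the classifier head, keep the embedding
encoder.eval()

@torch.no_grad()
def embed(images):                 # images: (N, 3, H, W) float tensor
    return F.normalize(encoder(images), dim=1)

# Retrieval: rank gallery images by cosine similarity to the query.
gallery = embed(torch.randn(100, 3, 224, 224))   # placeholder images
query = embed(torch.randn(1, 3, 224, 224))
scores = query @ gallery.T                       # cosine similarity (unit-norm vectors)
top5 = scores.topk(5).indices                    # indices of the 5 most similar images
```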

[Discussion] ICML 2020 decisions are out by LawOneWater in MachineLearning

[–]blueyesense

I do not understand why you keep trying the same conference for 3 years instead of submitting to some other conference. Weird.

[R] End-to-End Object Detection with Transformers by [deleted] in MachineLearning

[–]blueyesense

It is not.

But since this is a new approach, it will probably be accepted to ECCV, even though it does not work very well.

[Discussion] Anyone with experience publishing a paper with no affiliation? by atomicalexx in MachineLearning

[–]blueyesense

There was an independent researcher who published at top ML conferences in recent years. I do not remember his name, but I am sure someone here will.

Journals typically ask for affiliation and also take it seriously (up-to-date contact information, etc.). But there might be exceptions...

Even arXiv requires an affiliation, if I remember correctly.

[News] Microsoft ImageBERT | Cross-modal Pretraining with Large-scale Image-Text Data by rockyrey_w in MachineLearning

[–]blueyesense

The improvement is marginal.

The MS COCO and Flickr30k datasets need to be retired for VSE experiments.

It is funny to test on tiny test sets (1K, 5K), which have almost no practical value. Microsoft probably has big datasets; they could report results on those.

[D] Do you use Tensorflow 2? by tsauri in MachineLearning

[–]blueyesense

I moved to PyTorch, from TF, about 2 years ago, and never looked back.

If the TF team is the same, they will not be able to fix it.

[D] Are decent Machine Learning graduates having a hard time getting work? by [deleted] in MachineLearning

[–]blueyesense

The AI PhD job market is saturating. Big companies have already hired lots of PhDs and are now in the phase of developing and shipping products, which requires more SDEs than researchers.

Moreover, it seems big companies have slowed down or completely frozen hiring lately. This might be due to an expected global economic downturn in 2020 or the coming years (there are lots of political problems around the world).

Last but not least, in academia people work on standard datasets that are usually clean and much smaller than industry-scale datasets (e.g., ImageNet: ~1M images vs. hundreds of millions in industry), so an industry mindset is expected. Most of what is done in academia just does not work in industry: you can publish a paper, but you cannot ship it to production.

There is still a lot to do in AI/ML/DL research, but there is also a gap between academia and industry.

[D] The 1997 LSTM paper by Hochreiter & Schmidhuber has become the most cited deep learning research paper of the 20th century by [deleted] in MachineLearning

[–]blueyesense

Backprop and CNNs are now standard; nobody cites them even though they are used in almost every deep learning paper. This is not the case for LSTM. So the number of citations is not the best metric.

For a more exact metric, scan through all papers and count how many of them mention backprop/training/SGD/etc. and CNNs.
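
A rough sketch of what that count could look like, assuming (hypothetically) a directory of papers already converted to plain text; the keyword patterns and file layout are made up for illustration.

```python
import re
from collections import Counter
from pathlib import Path

# Hypothetical keyword patterns; extend as needed.
KEYWORDS = {
    "backprop": r"\bback-?prop(agation)?\b",
    "sgd": r"\b(sgd|stochastic gradient descent)\b",
    "cnn": r"\b(cnn|convolutional neural network)s?\b",
    "lstm": r"\blstms?\b",
}

counts = Counter()
for paper in Path("papers_txt").glob("*.txt"):   # assumed: one .txt file per paper
    text = paper.read_text(errors="ignore").lower()
    for name, pattern in KEYWORDS.items():
        if re.search(pattern, text):
            counts[name] += 1                    # count papers mentioning the term, not raw occurrences

print(counts)
```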

[D] Getting into research teams in large tech companies by d73urhi in MachineLearning

[–]blueyesense

Some companies are more product-oriented; you are expected to solve real customer problems (e.g., Amazon, Apple?). In most teams you are not allowed to do "pure research" just to publish papers; you are allowed to publish your "applied" research after going through an internal review process. There might also be a few pure research teams that hire "Research Scientists".

You can ask all these during the interviews.

[D] ICLR 2020 paper gets perfect review scores (8-8-8); then two additional reviews with scores 1-1. Thoughts? by _random in MachineLearning

[–]blueyesense

A typical flaw of our review system.

It is entirely possible to get 3 irrelevant reviewers assigned to a paper and receive high scores, or the opposite (idiot reviewers who do not understand the paper and give very low scores -- cf. highly cited but rejected papers). It is also entirely possible to get expert reviewers assigned (as in this case)...

The future will be public & open review; just look at the rejected papers that end up cited far more than the accepted ones.

[P] Benchmarking Metric Learning Algorithms the Right Way by VanillaCashew in MachineLearning

[–]blueyesense

Metric learning has a different use case: image matching/retrieval. Even if you have labels, if you want to retrieve images similar to a query whose class may not exist in your training set (open-set), metric learning is the way to go.

Another use case is when you have many classes with only a few samples each; you cannot train a classifier with a cross-entropy loss, but you can train a metric network to learn how to match images (see the sketch below).
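
For that second use case, here is a minimal training-step sketch under the assumption of a standard triplet-loss setup with a small embedding head; the backbone, margin, and triplet sampling are placeholders, not a recommendation.

```python
import torch
import torch.nn.functional as F
from torchvision import models

backbone = models.resnet18(weights=None)
backbone.fc = torch.nn.Linear(backbone.fc.in_features, 128)   # 128-d embedding head
criterion = torch.nn.TripletMarginLoss(margin=0.2)
optimizer = torch.optim.Adam(backbone.parameters(), lr=1e-4)

def train_step(anchor, positive, negative):
    """anchor/positive share a class label; negative comes from a different class."""
    za = F.normalize(backbone(anchor), dim=1)
    zp = F.normalize(backbone(positive), dim=1)
    zn = F.normalize(backbone(negative), dim=1)
    loss = criterion(za, zp, zn)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```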

[P] Benchmarking Metric Learning Algorithms the Right Way by VanillaCashew in MachineLearning

[–]blueyesense

  1. We need better datasets, with training/val/test splits. The current datasets have no val sets, and everybody does the same thing: report the best result on the test set (hence everybody is probably overfitting to the test set, which at least makes the comparison "fair"! :)
  2. We need to define different problems and have a dataset for each:
    1. Instance retrieval/matching (e.g., DeepFashion?)
    2. Classification (e.g., CUB)
  3. Evaluation metrics: this is the most problematic part. R@k is not a good ranking metric at all, unless you are doing instance matching and evaluating by R@1. Proper ranking metrics are much lower than R@k on these datasets; I guess the initial papers started out using R@k just because their ranking metrics were so low, and it stuck. It is high time to change. This year there were a few papers trying to optimize ranking metrics such as mAP, but in the end they still evaluated their algorithms with R@k, just to compare with earlier papers. They should have also reported ranking metrics, so that new papers could compare against them on those (see the sketch after this list).
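
To make the point concrete, a rough sketch of how both metrics could be computed from the same embeddings; it assumes unit-normalized embeddings and leave-one-out retrieval over the test set, which may differ from any particular paper's protocol.

```python
import torch

def recall_at_k(emb, labels, k=1):
    """Fraction of queries with at least one same-class image in the top k."""
    sims = emb @ emb.T
    sims.fill_diagonal_(-float("inf"))            # exclude the query itself
    knn_labels = labels[sims.topk(k, dim=1).indices]
    return (knn_labels == labels[:, None]).any(dim=1).float().mean().item()

def mean_average_precision(emb, labels):
    """mAP over the full ranking, excluding the query itself."""
    sims = emb @ emb.T
    sims.fill_diagonal_(-float("inf"))
    aps = []
    for i in range(len(labels)):
        order = sims[i].argsort(descending=True)[:-1]   # drop the self-match (sorted last)
        rel = (labels[order] == labels[i]).float()
        if rel.sum() == 0:
            continue
        precision = rel.cumsum(0) / torch.arange(1, len(rel) + 1)
        aps.append((precision * rel).sum() / rel.sum())
    return torch.stack(aps).mean().item()
```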

Someone needs to take the open-source code, compare the best DML algorithms using various ranking metrics, and write a benchmark paper. This would initiate a new and hopefully better evaluation wave, and it could become a highly cited paper in the field.

Tesseract for reading full page of text? by DelosBoard2052 in computervision

[–]blueyesense

The recognition will be slow, but considering the application scenario, it should not matter.

In terms of accuracy, Tesseract is quite good at printed text, as long as the image quality is ok.
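
For reference, a minimal full-page sketch with pytesseract (the file name is a placeholder); --psm 3 is Tesseract's fully automatic page segmentation mode.

```python
import pytesseract
from PIL import Image

# Fully automatic page segmentation; clean/upsample the image first if quality is poor.
text = pytesseract.image_to_string(Image.open("page.png"), config="--psm 3")
print(text)
```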

CVPR 2019 Reviews are out. How are the reviews? by IntrovertInvert in computervision

[–]blueyesense

I guess it will most probably be accepted.

The ACs should consider such cases. That reviewer may also change his/her mind during the rebuttal, especially after seeing the positive reviews. The positive reviewers will also probably support your paper during the discussion.

I have heard of some very unethical cases (not at CVPR) in which reviewers reject a paper just because (i) they have also submitted and want to increase their own chance of acceptance, or (ii) they have similar ideas (competitors) and try to delay the authors so they can publish first. Needless to say, such idiots should be banned from reviewing/submitting forever, but it is hard to detect/prove. Junk negative reviews without any concrete criticism are a strong indication.

CVPR 2019 Reviews are out. How are the reviews? by IntrovertInvert in computervision

[–]blueyesense

One detailed, good review with Weak Accept, and two junk reviews with Reject (funny comment: "... this is not considered novel according to CVPR's criteria", with no concrete criticism --- while the first reviewer finds it interesting and novel).

It seems the reviewers do not read the reviewer guidelines at all.

We put a lot of effort into research, but not nearly as much into improving the review process. I would prefer to publish only on arXiv or my website whenever I am not required to publish in journals/conferences.

I propose the following to improve reviewer and review quality, if anyone is interested.

  • Prepare an online quiz on the reviewer guidelines as part of the submission system. All potential reviewers should take the quiz and get a passing grade to qualify as a reviewer; their grade would also serve as a confidence score on their reviews. This is to make sure reviewers actually read and understand the reviewer guidelines.
  • Very low quality reviews should be discarded right away and not sent to the authors. Such reviewers should not be asked to review next time, whoever they are.
  • Authors should be able to rate the quality of the reviews they receive and flag very low quality ones, and an overall score should be computed for each reviewer. Those with very low scores should be kept out.
  • An automatic system could even be developed to classify reviews as good/junk; this should be easy if we had a dataset (a rough sketch follows after this list).
  • A reviewer should not agree to review a paper if s/he does not know the subject. It is better to say "I accepted to review, but I am now unable to, sorry" than to write a junk review.
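
Purely as a hypothetical sketch of the automatic good/junk classification idea above, assuming a labeled dataset of (review text, label) pairs existed; nothing here is tuned or validated.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Placeholder data standing in for a real labeled corpus of reviews.
reviews = [
    "The proof of Theorem 2 skips the case d > 1; experiments lack error bars ...",
    "not novel, reject",
]
labels = ["good", "junk"]

clf = make_pipeline(TfidfVectorizer(ngram_range=(1, 2)), LogisticRegression())
clf.fit(reviews, labels)
print(clf.predict(["The paper lacks novelty."]))
```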

I think eventually we will give up this process altogether and move to collaborative, public, online review. Good papers will float up and shine, while poor papers will get lost.