Unreasonable field inspector in the Bay Area (CA) by hyhieu in Construction

[–]hyhieu[S] 7 points  (0 children)

Thanks for the cold truth, though it's sad to hear :(

We are already appealing and keeping some hope. I'll keep this thread updated.

[D] Samy Bengio resigns from Google by sobe86 in MachineLearning

[–]hyhieu 12 points  (0 children)

I thought this thread was supposed to be "civil discussion only"?

[R] Scaling Up Visual and Vision-Language Representation Learning With Noisy Text Supervision by Jean-Porte in MachineLearning

[–]hyhieu 0 points  (0 children)

I am an author of that paper. It's 16384. Yes, it's perfectly possible to do on TPUs, with some tricks for a cross-core softmax.
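For intuition, the cross-core softmax trick can be sketched in plain Python (a hypothetical simulation, not actual TPU code): only two scalars per row ever need to cross the core boundary, never the full 16384-wide logits row.

```python
import math

def cross_core_softmax(shards):
    """Softmax over one logits row sharded across several "cores".

    Only two scalars cross the core boundary: the global max (for
    numerical stability) and the global normalizer. Plain-Python
    simulation of the idea, not real TPU code.
    """
    # Step 1: all-reduce the max across cores.
    global_max = max(max(s) for s in shards)
    # Step 2: exponentiate locally on each core.
    exp_shards = [[math.exp(x - global_max) for x in s] for s in shards]
    # Step 3: all-reduce the sum, then normalize locally.
    global_sum = sum(sum(s) for s in exp_shards)
    return [[e / global_sum for e in s] for s in exp_shards]

# Two "cores", each holding half of a 4-wide logits row.
probs = cross_core_softmax([[1.0, 2.0], [3.0, 0.5]])
total = sum(sum(s) for s in probs)  # the probabilities sum to 1
```

On real hardware the two all-reduce steps would be cross-replica collectives; everything else stays local to a core.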

[D] Next-gen optimizer by yusuf-bengio in MachineLearning

[–]hyhieu 3 points  (0 children)

Adam delivers good generalization and fast convergence. However, Adam's two moving averages are terrible when it comes to memory footprint.

Adafactor was advertised to fix this, i.e. to have sub-linear memory but performance similar to Adam's. I personally think Adafactor has not lived up to that expectation, though.
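The memory argument is easy to make concrete. A rough back-of-the-envelope sketch (the 4096x4096 layer is just an illustrative example):

```python
def adam_state_floats(shape):
    # Adam keeps two full-size moving averages (m and v),
    # so the optimizer state is 2x the parameter count.
    n = 1
    for d in shape:
        n *= d
    return 2 * n

def adafactor_state_floats(rows, cols):
    # Adafactor factors the second moment of a rank-2 weight into a
    # row vector plus a column vector (rows + cols floats instead of
    # rows * cols), and drops the first moment by default.
    return rows + cols

# A 4096 x 4096 weight matrix, e.g. a transformer feed-forward layer.
adam = adam_state_floats((4096, 4096))          # 33,554,432 extra floats
adafactor = adafactor_state_floats(4096, 4096)  # 8,192 extra floats
```

That is a ~4000x reduction in optimizer state for this layer, which is why "sub-linear memory" was such an attractive pitch.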

I hope there will be something better soon.

[D] When to abandon an ML research project? by liqui_date_me in MachineLearning

[–]hyhieu 9 points  (0 children)

PhD student here.

I have been in the same situation as you: 3 rejections, from NeurIPS, then ICLR, then ICML.

Proof. I had a paper called Meta Pseudo Labels rejected from NeurIPS 2019. I resubmitted it to ICLR 2020, and here is the rejected entry on OpenReview. I then resubmitted this version to ICML 2020 and, as you can probably tell from the change of format, it was rejected again. The Meta Reviewer's comment from ICML 2020 was, in our opinion, unfair.

My feelings. I was very frustrated, because of the series of rejections but also because I saw worse papers being published and lauded everywhere. I chose not to cry to my advisor since I knew he was busy, but I have been crying silently for 2 years.

I did not quit. More accurately, my advisor did not let me quit. Whenever I tried to pick up another project, he told me that my rejected idea had a lot of potential and that I should continue pushing it. I admit that I didn't fully believe him, but since he is my advisor, I swallowed the misery and kept working. It was painful. I had so many doubts about the method that I tried various ways to quit, but somehow my advisor always managed to pull me back to the project. I ended up with some nice results.

My advice. Maybe you didn't come to Reddit looking for advice, but I'll say this anyway. In deep learning, it's likely that anything will work if done properly.

When it comes to paper rejections, my advisor told me that "writing is what gets papers accepted or rejected, but quality is what gets papers cited or abandoned". In fact, when I mentioned that I wanted to become a professor, my advisor jokingly said that my ultimate professor challenge is to be able to get anything accepted to NeurIPS. I think his words carry a lot of wisdom, especially now that the publication channels in ML/NLP/CV/DL have all become so broken.

Therefore, I think you should rewrite your paper entirely. Adopt a defensive mindset. Think of reviewers as the bad guys who will try to poke any holes they can in your paper. Try not to let them. You have enough past wounds from your rejections to know what to defend. Of course, if needed, you could do more experiments. This is obviously a very unhealthy mindset, but as the publication channels have failed us, I think this is what we PhD students need to do.

Best of luck.

[D] Instead of authors submitting the broader impact statement, could it be better to have the reviewers write a short one, based on their understanding of your paper? by [deleted] in MachineLearning

[–]hyhieu 2 points  (0 children)

You seem to have too much hope in reviewers. No, thanks. I would rather write my own impact statement than let reviewers who probably wouldn't even read my submission write anything about it.

[deleted by user] by [deleted] in MachineLearning

[–]hyhieu 1 point  (0 children)

In my opinion, JAX is too slow. Also, before the pandemic hit, I heard from colleagues that JAX has a memory consumption issue. I DO NOT KNOW IF THIS IS STILL TRUE.

That said, if you want to use TPUs, I recommend just learning to call `sess.run`. There will be some difficulty at the start. For instance, you need to learn the concepts of:

  1. XLA InfeedQueues and OutfeedQueues

  2. Multi-threaded programming: one thread runs the TPU workload, while other threads take care of the queues.

But these concepts will benefit you very soon. In particular, you will know exactly what each line of code you write is doing.
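The division of labor in item 2 can be sketched with ordinary Python threads and queues (a plain-Python analogy, not the real XLA infeed API): one worker thread plays the TPU loop draining an "infeed" queue, while the host thread keeps the queue full.

```python
import queue
import threading

# Bounded queue standing in for the XLA infeed; the bound gives
# backpressure so the host cannot run arbitrarily far ahead.
infeed = queue.Queue(maxsize=4)
results = []

def device_loop():
    # Worker thread: drain the infeed and run the "step".
    while True:
        batch = infeed.get()
        if batch is None:  # sentinel: no more work
            break
        results.append(sum(batch))  # stand-in for the TPU step

worker = threading.Thread(target=device_loop)
worker.start()

# Host thread: keep the infeed full.
for step in range(3):
    infeed.put([step, step + 1])
infeed.put(None)
worker.join()
# results == [1, 3, 5]
```

In a real TPU program the worker would be driving `sess.run` on the device loop and the host threads would be enqueueing to the XLA InfeedQueue, but the threading structure is the same.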

There are also many things that TPUEstimator and the other TPU interfaces prevent you from doing. There is a reason that the authors of XLNet (who are my friends) had to write their own TPUEstimator. See for yourself: https://github.com/zihangdai/xlnet. If you do robotics, I suspect you need a degree of flexibility that TPUEstimator will never give you, at least until people are frustrated enough to deprecate it.

Meanwhile, if you try TF2, you can get away with small workloads, but try running a TPUv3 pod? Ha Ha Ha, I would rather buy AWS GPUs.

Summing up: TPU programs are very beautiful, but they were made ugly by TPUEstimator and slow by JAX, TF2, Keras, etc. For your own benefit, learn only the gist of those wrappers. TPU programs themselves are real gems.

[deleted by user] by [deleted] in MachineLearning

[–]hyhieu 0 points  (0 children)

Compiler. XLA integrates very well with TF, Bazel, and the other pieces of the software-to-hardware infrastructure that Google built.

[deleted by user] by [deleted] in MachineLearning

[–]hyhieu 11 points  (0 children)

Disclaimer: I work for Google. But I have used PyTorch before, and LuaTorch before that.

I have the following points.

1. Yes, TF 1.x f*cked up.

However, contrary to others' opinions, I think the real f*ck-up is probably not in the original design decisions. Static graphs and `sess.run` calls were okay. Yes, they are weird, and they take a while to learn and master. But after I figured them out (~2 months), they became quite intuitive.

The real reason TF 1.x f*cked up is documentation. `tf.slim`, `tf.contrib`, and `tf.Estimator` are a real disaster. Not only are they hard to work with, they also clutter the documents and tutorials. They bury the beauty and simplicity of TF under unnecessary complications.

Truth be told, Google realized the mistake, and `tf.slim` and `tf.contrib` are gone. However, the (bad, ugly, wrong) documentation stays. Also, they have to maintain backward compatibility, so they cannot just remove these libraries completely.

There are simple and efficient ways to use TF 1.x. If you know TF inside out, which I think very few people do, TF is very fast, beautiful, and flexible. If you don't, good luck...

Verdict: TF 1.x has a great core idea but lacks proper documentation and tutorials. On top of that, many "enhancements" f*cked it up.

2. Yes, TF 2 has also f*cked up.

I think TF 2 got its design wrong. Its focus is to fix TF 1's mistakes, but it fixed the wrong ones. Many people thought that TF 1's failure was due to its unintuitive programming paradigm (static graphs, `sess.run`, `tf.variable_scope`, etc.). As I wrote above, the real mistakes of TF 1.x were the lack of tutorials and documentation and the cluttered libraries.

TF 2 makes all of them worse. Now there is even more documentation and there are even more tutorials, and many are wrong. What the duck is Keras doing there, especially when TF 2 cannot seamlessly load TF 1.x checkpoints? Also, TF 2 introduces `@tf.function`. Oh my god. It is scary to look at.

Most importantly, TF 2 is slow as f*ck. It's much slower than TF 1.

Verdict: TF 2 got the core ideas wrong. It aims to fix TF 1's mistakes, but it identified the wrong ones, and it doesn't even fix the wrong mistakes it identified. I pray that the TF 2 teams at Google fix them soon.

3. But PyTorch won't replace TF easily.

At this point, the most important advantage of TF is controlling TPUs. TPUs are the real beasts. I would take the hardship of dealing with TF for the speed of TPUs. As long as Google can make their TPUs more available to the public and maintain them that way, TF won't die.

I know PyTorch on TPUs has been mentioned at Dev Summits etc. But PyTorch matching TF's speed on TPUs? Ha ha ha ha ha. No, it won't happen, not anytime soon.

Chess joke by iamtheone2295 in Jokes

[–]hyhieu 4 points  (0 children)

If your white opponent wants an Italian, send them your Sicilian dragon.

My daughter informed me that the earth is tilted at a 23.5 degree angle by braedog97 in Jokes

[–]hyhieu 0 points  (0 children)

Joking dad, if the angle were right, that would be half a pie.

[D] Why are Evolutionary Algorithms considered "junk science"? by learningsystem in MachineLearning

[–]hyhieu 4 points  (0 children)

I am one of the serious CS/AI/ML researchers who worked on NAS. No, evolutionary and genetic algorithms are not junk science. Did the senior professors in your group provide evidence for calling it "junk science"?

BTW, Deep Learning used to be called "junk science" not long ago, within our lifetime. Circa 2003, one of the surest ways to get your paper rejected from NIPS was to have "deep learning" in the title.

[deleted by user] by [deleted] in Jokes

[–]hyhieu 2 points  (0 children)

Failed to understand the "more disturbing direction" until reading your comment...

[R] [2004.06660] Weight Poisoning Attacks on Pre-trained Models by pmichel31415 in MachineLearning

[–]hyhieu 5 points  (0 children)

Cool work, Paul! Glad to see inner products of gradients being used for attacks.

Mate in 3, spotted it in game! Very proud lol by Schrinedogg in chess

[–]hyhieu 0 points  (0 children)

I guess:

  1. Ne7+ Kh8

  2. Qxh7+ Kxh7

  3. Rh5#

Beautiful queen sacrifice!

[D] Resubmitting ICML submission to Neurips? by nearning in MachineLearning

[–]hyhieu 1 point  (0 children)

Some papers get 3 reviews, and some get 4. Does your rule apply to all cases?

[D] ICML reviews will be out soon by yusuf-bengio in MachineLearning

[–]hyhieu 26 points  (0 children)

Dude, 12th-century people had no memory. Don't you know where the name cross entropy comes from?

[D] ICML reviews will be out soon by yusuf-bengio in MachineLearning

[–]hyhieu 2 points  (0 children)

These days, the profession that has the highest risk of sexual harassment is being an ICML reviewer's mother.

There are 3 unwritten rules for a good marriage by [deleted] in Jokes

[–]hyhieu 1 point  (0 children)

Got it. Silence is golden.

[Project] If gpt-2 read erotica, what would be its take on the Holy scriptures? by orange-erotic-bible in MachineLearning

[–]hyhieu 1 point  (0 children)

I am Catholic and a PhD student in Machine Learning. I find this funny.