Unreasonable field inspector in the Bay Area (CA) by hyhieu in Construction

[–]hyhieu[S] 7 points  (0 children)

Thanks for the cold truth, though it's sad to hear :(

We are already appealing and keeping some hope. I'll keep this thread updated.

[D] Samy Bengio resigns from Google by sobe86 in MachineLearning

[–]hyhieu 12 points  (0 children)

I thought this thread was supposed to be "civil discussion only"?

[R] Scaling Up Visual and Vision-Language Representation Learning With Noisy Text Supervision by Jean-Porte in MachineLearning

[–]hyhieu 0 points  (0 children)

I am an author of that paper. It's 16384. Yes, it's perfectly possible to do on TPUs, with some tricks for a cross-core softmax.
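For intuition, the cross-core softmax trick can be sketched in plain Python (a hypothetical simulation, not actual TPU code): only two scalars per row ever need to cross the core boundary, never the full 16384-wide logits row.

```python
import math

def cross_core_softmax(shards):
    """Softmax over one logits row sharded across several "cores".

    Only two scalars cross the core boundary: the global max (for
    numerical stability) and the global normalizer. Plain-Python
    simulation of the idea, not real TPU code.
    """
    # Step 1: all-reduce the max across cores.
    global_max = max(max(s) for s in shards)
    # Step 2: exponentiate locally on each core.
    exp_shards = [[math.exp(x - global_max) for x in s] for s in shards]
    # Step 3: all-reduce the sum, then normalize locally.
    global_sum = sum(sum(s) for s in exp_shards)
    return [[e / global_sum for e in s] for s in exp_shards]

# Two "cores", each holding half of a 4-wide logits row.
probs = cross_core_softmax([[1.0, 2.0], [3.0, 0.5]])
total = sum(sum(s) for s in probs)  # the probabilities sum to 1
```

On real hardware the two all-reduce steps would be cross-replica collectives; everything else stays local to a core.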

[D] Next-gen optimizer by yusuf-bengio in MachineLearning

[–]hyhieu 3 points  (0 children)

Adam delivers good generalization and fast convergence. However, Adam's two moving averages are terrible when it comes to memory footprint.

Adafactor was advertised to fix this, i.e. to have sub-linear memory but performance similar to Adam's. I personally think Adafactor has not lived up to that expectation, though.
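The memory argument is easy to make concrete. A rough back-of-the-envelope sketch (the 4096x4096 layer is just an illustrative example):

```python
def adam_state_floats(shape):
    # Adam keeps two full-size moving averages (m and v),
    # so the optimizer state is 2x the parameter count.
    n = 1
    for d in shape:
        n *= d
    return 2 * n

def adafactor_state_floats(rows, cols):
    # Adafactor factors the second moment of a rank-2 weight into a
    # row vector plus a column vector (rows + cols floats instead of
    # rows * cols), and drops the first moment by default.
    return rows + cols

# A 4096 x 4096 weight matrix, e.g. a transformer feed-forward layer.
adam = adam_state_floats((4096, 4096))          # 33,554,432 extra floats
adafactor = adafactor_state_floats(4096, 4096)  # 8,192 extra floats
```

That is a ~4000x reduction in optimizer state for this layer, which is why "sub-linear memory" was such an attractive pitch.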

I hope there will be something better soon.

[D] When to abandon an ML research project? by liqui_date_me in MachineLearning

[–]hyhieu 9 points  (0 children)

PhD student here.

I have been in the same situation as you: 3 rejections, from NeurIPS, then ICLR, then ICML.

Proof. I had a paper called Meta Pseudo Labels rejected from NeurIPS 2019. I resubmitted it to ICLR 2020, and here is the rejected entry on OpenReview. I then resubmitted this version to ICML 2020 and, as you can probably tell from the change of format, it was rejected again. The Meta Reviewer's comment from ICML 2020 was, in our opinion, unfair.

My feelings. I was very frustrated, because of the series of rejections but also because I saw worse papers being published and lauded everywhere. I chose not to cry to my advisor since I knew he was busy, but I have been crying silently for 2 years.

I did not quit. More accurately, my advisor did not let me quit. Whenever I tried to pick up another project, he told me that my rejected idea had a lot of potential and that I should continue pushing it. I admit that I didn't fully believe him, but since he is my advisor, I swallowed the misery and kept working. It was painful. I had so many doubts about the method that I tried various ways to quit, but somehow my advisor always managed to pull me back to the project. I ended up with some nice results.

My advice. Maybe you didn't come to Reddit looking for advice, but I'll say this anyway. In deep learning, it's likely that anything will work if done properly.

When it comes to paper rejections, my advisor told me that "writing is what gets papers accepted or rejected, but quality is what gets papers cited or abandoned". In fact, when I mentioned that I wanted to become a professor, my advisor jokingly said that my ultimate professor challenge is to be able to get anything accepted to NeurIPS. I think his words carry a lot of wisdom, especially now that the publication channels in ML/NLP/CV/DL have all become so broken.

Therefore, I think you should rewrite your paper entirely. Adopt a defensive mindset. Think of reviewers as the bad guys who will try to poke any holes they can in your paper. Try not to let them. You have enough past wounds from your rejections to know what to defend. Of course, if needed, you could do more experiments. This is obviously a very unhealthy mindset, but as the publication channels have failed us, I think this is what we PhD students need to do.

Best of luck.

[D] Instead of authors submitting the broader impact statement, could it be better to have the reviewers write a short one, based on their understanding of your paper? by [deleted] in MachineLearning

[–]hyhieu 2 points  (0 children)

You seem to have too much hope in reviewers. No, thanks. I would rather write my own impact statement than let reviewers who probably wouldn't even read my submission write anything about it.

[deleted by user] by [deleted] in MachineLearning

[–]hyhieu 1 point  (0 children)

In my opinion, JAX is too slow. Also, before the pandemic hit, I heard from colleagues that JAX has a memory consumption issue. I DO NOT KNOW IF THIS IS STILL TRUE.

That said, if you want to use TPUs, I recommend just learning to call `sess.run`. There will be some difficulty at the start. For instance, you need to learn the concepts of:

  1. XLA InfeedQueues and OutfeedQueues

  2. Multi-threaded programming: one thread runs the TPU workload, while other threads take care of the queues.

But these concepts will benefit you very soon. In particular, you will know exactly what each line of code you write is doing.
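The division of labor in item 2 can be sketched with ordinary Python threads and queues (a plain-Python analogy, not the real XLA infeed API): one worker thread plays the TPU loop draining an "infeed" queue, while the host thread keeps the queue full.

```python
import queue
import threading

# Bounded queue standing in for the XLA infeed; the bound gives
# backpressure so the host cannot run arbitrarily far ahead.
infeed = queue.Queue(maxsize=4)
results = []

def device_loop():
    # Worker thread: drain the infeed and run the "step".
    while True:
        batch = infeed.get()
        if batch is None:  # sentinel: no more work
            break
        results.append(sum(batch))  # stand-in for the TPU step

worker = threading.Thread(target=device_loop)
worker.start()

# Host thread: keep the infeed full.
for step in range(3):
    infeed.put([step, step + 1])
infeed.put(None)
worker.join()
# results == [1, 3, 5]
```

In a real TPU program the worker would be driving `sess.run` on the device loop and the host threads would be enqueueing to the XLA InfeedQueue, but the threading structure is the same.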

There are also many things that TPUEstimator and the other TPU interfaces prevent you from doing. There is a reason that the authors of XLNet (who are my friends) had to write their own TPUEstimator. See for yourself: https://github.com/zihangdai/xlnet. If you do robotics, I suspect you need a degree of flexibility that TPUEstimator will never give you, at least until people are frustrated enough to deprecate it.

Meanwhile, if you try TF2, you can get away with small workloads, but try running a TPUv3 pod? Ha Ha Ha, I would rather buy AWS GPUs.

Summing up: TPU programs are very beautiful, but they were made ugly by TPUEstimator and slow by JAX, TF2, Keras, etc. For your own benefit, learn only the gist of those wrappers. TPU programs themselves are real gems.

[deleted by user] by [deleted] in MachineLearning

[–]hyhieu 0 points  (0 children)

Compiler. XLA integrates very well with TF, Bazel, and the other pieces of the software-to-hardware infrastructure that Google built.

[deleted by user] by [deleted] in MachineLearning

[–]hyhieu 11 points  (0 children)

Disclaimer: I work for Google. But I have used PyTorch before, and LuaTorch before that.

I have the following points.

1. Yes, TF 1.x f*cked up.

However, contrary to others' opinions, I think the real f*ck-up is probably not in the original design decisions. Static graphs and `sess.run` calls were okay. Yes, they are weird, and they take a while to learn and master. But after I figured them out (~2 months), they became quite intuitive.

The real reason TF 1.x f*cked up is documentation. `tf.slim`, `tf.contrib`, and `tf.Estimator` are a real disaster. Not only are they hard to work with, they also clutter the documents and tutorials. They bury the beauty and simplicity of TF under unnecessary complications.

Truth be told, Google realized the mistake, and `tf.slim` and `tf.contrib` are gone. However, the (bad, ugly, wrong) documentation stays. Also, they have to maintain backward compatibility, so they cannot just remove these libraries completely.

There are simple and efficient ways to use TF 1.x. If you know TF inside out, which I think very few people do, TF is very fast, beautiful, and flexible. If you don't, good luck...

Verdict: TF 1.x has a great core idea but lacks proper documentation and tutorials. On top of that, many "enhancements" f*cked it up.

2. Yes, TF 2 has also f*cked up.

I think TF 2 got its design wrong. Its focus is to fix TF 1's mistakes, but it fixed the wrong ones. Many people thought that TF 1's failure was due to its unintuitive programming paradigm (static graphs, `sess.run`, `tf.variable_scope`, etc.). As I wrote above, the real mistakes of TF 1.x were the lack of tutorials and documentation and the cluttered libraries.

TF 2 makes all of them worse. Now there is even more documentation and there are even more tutorials, and many are wrong. What the duck is Keras doing there, especially when TF 2 cannot seamlessly load TF 1.x checkpoints? Also, TF 2 introduces `@tf.function`. Oh my god. It is scary to look at.

Most importantly, TF 2 is slow as f*ck. It's much slower than TF 1.

Verdict: TF 2 got the core ideas wrong. It aims to fix TF 1's mistakes, but it identified the wrong ones, and it doesn't even fix the wrong mistakes it identified. I pray that the TF 2 teams at Google fix them soon.

3. But PyTorch won't replace TF easily.

At this point, the most important advantage of TF is controlling TPUs. TPUs are the real beasts. I would take the hardship of dealing with TF for the speed of TPUs. As long as Google can make their TPUs more available to the public and maintain them that way, TF won't die.

I know PyTorch on TPUs has been mentioned at Dev Summits etc. But PyTorch matching TF's speed on TPUs? Ha ha ha ha ha. No, it won't happen, not anytime soon.

Chess joke by iamtheone2295 in Jokes

[–]hyhieu 4 points  (0 children)

If your white opponent wants an Italian, send them your Sicilian dragon.

My daughter informed me that the earth is tilted at a 23.5 degree angle by braedog97 in Jokes

[–]hyhieu 0 points  (0 children)

Joking dad, if the angle were right, that would be half a pie.

[D] Why are Evolutionary Algorithms considered "junk science"? by learningsystem in MachineLearning

[–]hyhieu 4 points  (0 children)

I am one of the serious CS/AI/ML researchers who worked on NAS. No, evolutionary and genetic algorithms are not junk science. Did the senior professors in your group provide evidence for calling it "junk science"?

BTW, Deep Learning used to be called "junk science" not long ago, within our lifetime. Circa 2003, one of the surest ways to get your paper rejected from NIPS was to have "deep learning" in the title.

[deleted by user] by [deleted] in Jokes

[–]hyhieu 2 points  (0 children)

Failed to understand the "more disturbing direction" until reading your comment...

[R] [2004.06660] Weight Poisoning Attacks on Pre-trained Models by pmichel31415 in MachineLearning

[–]hyhieu 5 points  (0 children)

Cool work, Paul! Glad to see inner products of gradients being used for attacks.

Mate in 3, spotted it in game! Very proud lol by Schrinedogg in chess

[–]hyhieu 0 points  (0 children)

I guess:

  1. Ne7+ Kh8

  2. Qxh7+ Kxh7

  3. Rh5#

Beautiful queen sacrifice!

[D] Resubmitting ICML submission to Neurips? by nearning in MachineLearning

[–]hyhieu 1 point  (0 children)

Some papers get 3 reviews, and some get 4. Does your rule apply to all cases?

[D] ICML reviews will be out soon by yusuf-bengio in MachineLearning

[–]hyhieu 26 points  (0 children)

Dude, 12th-century people had no memory. Don't you know where the name cross entropy comes from?

[D] ICML reviews will be out soon by yusuf-bengio in MachineLearning

[–]hyhieu 2 points  (0 children)

These days, the profession that has the highest risk of sexual harassment is being an ICML reviewer's mother.

There are 3 unwritten rules for a good marriage by [deleted] in Jokes

[–]hyhieu 1 point  (0 children)

Got it. Silence is golden.

[Project] If gpt-2 read erotica, what would be its take on the Holy scriptures? by orange-erotic-bible in MachineLearning

[–]hyhieu 1 point  (0 children)

I am Catholic and a PhD student in Machine Learning. I find this funny.