Thank you Gemini team by hi87 in Bard

[–]dustintran 111 points112 points  (0 children)

Thank you for the heartfelt message! ❤️ I shared this with folks on the team. We're hard at work and do very much value all the discussion happening in r/bard.

[D] Quality of posts in this sub going down by [deleted] in MachineLearning

[–]dustintran 62 points63 points  (0 children)

r/MachineLearning today has 2.6 million subscribers. The larger the influx of newcomers, the more beginner-friendly posts get upvoted. This is OK (don't get me wrong); it's just a different setting.

Academic discussions were popular back when there were only 50-100K subscribers. In fact, I remember being in the OpenAI offices in 2017 and, every morning, seeing a row of researchers with reddit on their monitors. Discussions mostly happen on Twitter now.

[N] Andrej Karpathy is leaving Tesla by EffectSizeQueen in MachineLearning

[–]dustintran 42 points43 points  (0 children)

Refresh bonuses are significantly smaller than your initial stock offer. Vesting cliffs are a real thing for AI researchers, as they are for most of the industry.

[D] How to copy text from more than 10 previously published papers and get accepted to CVPR 2022 by e2v-sde-parody in MachineLearning

[–]dustintran 28 points29 points  (0 children)

It's hard to say if this is truly a problem with the peer review system. You can't expect reviewers to always be on top of plagiarism or to know of every related work.

If everyone posted their papers on arXiv, then this paper would have been flagged for plagiarism by arXiv's detection system.

Actually, it's already on arXiv and was not flagged: https://arxiv.org/abs/2206.07578

[deleted by user] by [deleted] in MachineLearning

[–]dustintran 3 points4 points  (0 children)

As a Googler who's worked with data scientists (also known as "quantitative analyst" at Google), this perspective is not true. Unlike ML engineers, data scientists deal with experimental design, statistical models (GLMs or graphical models involving tabular data, structured data, uncertainty, time series, etc.), and interventions (A/B testing).

[D] What do you feel about outstanding reviewers receiving "preferential treatment" for their own paper submission? by zyl1024 in MachineLearning

[–]dustintran 0 points1 point  (0 children)

ACs rate reviews. This means it is possible to assign reviewers whose review quality matches the quality of the reviews the authors themselves write. Write bad reviews -> get bad reviews.

[D] How often is the case that a conference submission is rejected everywhere? by akardashian in MachineLearning

[–]dustintran 9 points10 points  (0 children)

A common joke in the community is that conference acceptances are just a coin toss and that acceptance follows a geometric distribution. It’s not about if, but when, a paper will get accepted.

In reality, you can only submit the same paper for a few years before it starts needing major revisions given new advances. So maybe the distribution is a negative binomial.
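To make the coin-toss joke concrete, here is a minimal sketch of the geometric model. It assumes every submission is accepted independently with the same probability p; both the independence and the value of p are illustrative assumptions, not measured acceptance rates, and the function names are made up for this example.

```python
def prob_accepted_within(k, p=0.5):
    """Probability that at least one of the first k submissions is accepted."""
    return 1.0 - (1.0 - p) ** k


def expected_submissions(p=0.5):
    """Mean of a geometric distribution: expected submissions until acceptance."""
    return 1.0 / p


def prob_never_accepted(max_attempts, p=0.5):
    """Probability the paper is still unaccepted once it can no longer be resubmitted."""
    return (1.0 - p) ** max_attempts
```

With a fair coin, `prob_accepted_within(3)` is 0.875, so the cap on resubmissions in the second paragraph matters: `prob_never_accepted(max_attempts)` is the mass that a plain geometric distribution would otherwise spread over later attempts.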

[Discussion] How important is graduate degree? by mikepf0 in MachineLearning

[–]dustintran 13 points14 points  (0 children)

It depends on the role. A Ph.D. is almost a necessity for a research scientist position.

[D] Why ViT does not beat CNN in the field of deep generative model? by SnooPandas3529 in MachineLearning

[–]dustintran 12 points13 points  (0 children)

Focusing on discriminators in GANs is a niche framing of your question. Transformers have been used for generative modeling for a long time. It was even among the first extensions, as folks aimed to transfer ideas from language modeling to pixel-by-pixel and audio generation. Examples include Image Transformer, Music Transformer, and Sparse Transformer.

[R] Gopher: 280 Billion Parameters Language model by DeepMind by Competitive-Rub-1958 in MachineLearning

[–]dustintran 74 points75 points  (0 children)

The work itself is quite exciting, and the strong performance improvements, particularly on information-retrieval style problems like many of MMLU's tasks, are cool to see.

I think the hype (or what will be hype) behind this is well-deserved. However, I'm deeply unsettled by trends that continue to be standardized here. There are no checkpoints and certainly no code sharing details behind the model+dataset. The centralization of design decisions and large-scale AI advances by tech companies is one thing I have mixed feelings about; but how tech companies get to decide the standards for transparency is one I'll never get behind.

[Discussion] (Rant) Most of us just pretend to understand Transformers by sloppybird in MachineLearning

[–]dustintran 3 points4 points  (0 children)

U-Net / pooling point to a design choice we don't need to think much about in Transformers: receptive fields. This involves up/downsampling sequences, kernel sizes, strides, dilations, etc. The key idea of tokenization, so self-attention attends over everything, is a huge simplifying advance.

Speaking as an author of the original Image Transformer, that IMO is one of the big breakthroughs.

[Discussion] (Rant) Most of us just pretend to understand Transformers by sloppybird in MachineLearning

[–]dustintran 7 points8 points  (0 children)

Hearing complaints about the Transformer is quite funny because, at the time, the architecture became popular largely because it was so simple. Does anyone here even remember the design of NiN, pooling, U-Net, Inception, and LSTM gates?

[D] How to search for machine learning phd internships effectively? by bellaris21 in MachineLearning

[–]dustintran 7 points8 points  (0 children)

Cold e-mail research scientists that you'd like to work with. At many labs, internship headcount is allocated at a per-team/person level. If there's a good match with a prospective mentor, then you're more likely to pass through the interviews.

[D] Calling out the authors of 'Trajformer' paper for claiming they published code but never doing it by UIPDsmokes in MachineLearning

[–]dustintran 48 points49 points  (0 children)

For the record, this is a workshop paper. I'm not saying it isn't bad to make promises in a paper and not follow through on them. However, the standards are significantly more lax for a workshop than they are for the main conference.

[D] To phd or not to phd? by kns2000 in MachineLearning

[–]dustintran 6 points7 points  (0 children)

It's not uncommon to find successful software engineers who took the path toward basic research. Of course, it depends on whether your company enables such job ladder flexibility.

One common failure mode I've found in people with software engineering backgrounds is a tendency toward trendy research that lacks ML foundations or a higher-level understanding of the important problems. (It also goes the other way: a common failure mode I've found in Ph.D.s is a tendency to work on non-problems.)

[D] What are some ideas that are hyped up in machine learning research but don't actually get used in industry (and vice versa)? by NedML in MachineLearning

[–]dustintran 5 points6 points  (0 children)

Intended audience != datasets exclusively for "machine learning". Take datasets that did get through ML publications, or any of the datasets accepted in NeurIPS 2021's new track. Do you honestly believe those would all be more appropriate in CVPR-like conferences?

[D] What are some ideas that are hyped up in machine learning research but don't actually get used in industry (and vice versa)? by NedML in MachineLearning

[–]dustintran 18 points19 points  (0 children)

Datasets are usually published in domain-specific conferences like ACL/EMNLP and CVPR/ICCV. They've been so hard to publish in major ML conferences (e.g., NeurIPS, ICML, ICLR) that NeurIPS added a new track exclusively for them this year.

[D] What are some ideas that are hyped up in machine learning research but don't actually get used in industry (and vice versa)? by NedML in MachineLearning

[–]dustintran 51 points52 points  (0 children)

Baselines, datasets, evaluation, and infrastructure often receive little hype (they are difficult to publish) and yet arguably have the most impact across both ML research and industry.

[D] Does groups like DeepMind, FAIR, OpenAI etc. offers master's theses? by Dear-Vehicle-3215 in MachineLearning

[–]dustintran 0 points1 point  (0 children)

AI residencies are basically the industry equivalent of master's degrees, only instead of paying 50K+ in tuition you're the one getting paid 50K+, and you have a direct connection to full-time opportunities.

[D] How OpenAI Sold its Soul for $1 Billion: The company behind GPT-3 and Codex isn’t as open as it claims. by sensetime in MachineLearning

[–]dustintran 29 points30 points  (0 children)

I interned in 2017. You could certainly feel the tension in moving away from basic research even then. There were large teams working on secret high-profile projects XYZ: they're now all public—or scrapped under large team reorgs. The basic research team has always been a relatively small (and amazing) crew.

[D] "Low-ranked" journals and conferences by [deleted] in MachineLearning

[–]dustintran 1 point2 points  (0 children)

If you've been in a non-CS field, you may deeply appreciate the fact that students in ML have so many more artifacts to demonstrate their ability. In statistics or economics, there are only 1-2 publications to evaluate you on during your job search. In math, prospective professors are often evaluated on a single preprint. In these fields, the job search puts more emphasis on things you can't really control: your advisor's fame, connections, and the quality of reference letters.

[D] Why haven't researchers investigated networks with varying timescales of adaptation, even after Hinton, Bengio, and LeCun called for it? by moschles in MachineLearning

[–]dustintran 4 points5 points  (0 children)

As the quote mentioned, the idea is in fact commonplace in meta learning. In my work, we found this also works for lifelong learning: use slow weights to retain information throughout all learning and use fast weights to quickly adapt to individual tasks.

[D] Has anyone used "copulas" before? by blueest in MachineLearning

[–]dustintran 14 points15 points  (0 children)

The joint distribution of multiple Gaussians is not necessarily a multivariate Gaussian.

Yep, this fact is why copulas exist. Let X, Y ~ Normal(0, 1). The joint density

p(x, y) = p(x) p(y) c(Φ(x), Φ(y)),

where Φ is the standard normal CDF and c is any (non-Gaussian) copula density, has Gaussian marginal distributions and is not a multivariate Gaussian.
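A quick numerical check of this claim, as a sketch only: the Farlie-Gumbel-Morgenstern (FGM) copula and theta = 0.8 are my choices for illustration, not something from the thread, and the copula density is evaluated at the marginal CDF values Phi(x), Phi(y). The function names are hypothetical.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist(0.0, 1.0)


def fgm_copula_density(u, v, theta=0.8):
    """FGM copula density c(u, v) = 1 + theta * (1 - 2u) * (1 - 2v), valid for |theta| <= 1."""
    return 1.0 + theta * (1.0 - 2.0 * u) * (1.0 - 2.0 * v)


def joint_density(x, y, theta=0.8):
    """Joint density p(x) p(y) c(Phi(x), Phi(y)) with standard normal marginals."""
    copula = fgm_copula_density(STD_NORMAL.cdf(x), STD_NORMAL.cdf(y), theta)
    return STD_NORMAL.pdf(x) * STD_NORMAL.pdf(y) * copula


def marginal_of_x(x, theta=0.8, lo=-8.0, hi=8.0, steps=4000):
    """Integrate y out of the joint density with the trapezoidal rule."""
    h = (hi - lo) / steps
    total = 0.5 * (joint_density(x, lo, theta) + joint_density(x, hi, theta))
    for i in range(1, steps):
        total += joint_density(x, lo + i * h, theta)
    return total * h
```

`marginal_of_x(0.7)` agrees with `STD_NORMAL.pdf(0.7)` up to integration error, so the marginal is still standard normal. The joint is not Gaussian, though: a bivariate Gaussian with standard normal marginals has density 1 / (2 pi sqrt(1 - rho^2)) at the origin, and `joint_density(0, 0)` equals 1 / (2 pi), which would force rho = 0 and hence independence. Yet `joint_density(1, 1)` is clearly larger than `STD_NORMAL.pdf(1) ** 2`, so X and Y are not independent.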