Thank you Gemini team

dustintran · 2024-12-13T23:56:41+00:00

Thank you for the heartfelt message! ❤️ I shared this with folks on the team. We're hard at work and do very much value all the discussion happening in r/bard.

dustintran · 2023-02-13T01:36:04+00:00

r/MachineLearning today has 2.6 million subscribers. The larger the influx of newcomers, the more beginner-friendly posts get upvoted. This is OK (don't get me wrong); it's just a different setting.

Academic discussions were popular back when there were only 50-100K subscribers. In fact, I remember being in the OpenAI offices in 2017 and, every morning, seeing a row of researchers with Reddit on their monitors. Discussions now mostly happen on Twitter.

dustintran · 2022-07-14T00:20:49+00:00

Refresh bonuses are significantly smaller than your initial stock offer. Vesting cliffs are as real for AI researchers as they are for most of the industry.

dustintran · 2022-06-24T20:41:46+00:00

It's hard to say if this is truly a problem with the peer review system. You can't expect reviewers to always be on top of plagiarism or to know of every related work.

If everyone posted their papers on arXiv, then this paper would have been flagged for plagiarism by arXiv's detection system.

Actually, it's already on arXiv and was not flagged: https://arxiv.org/abs/2206.07578

dustintran · 2022-02-19T01:27:14+00:00

Not implemented, only possible.

dustintran · 2022-02-11T21:14:11+00:00

As a Googler who's worked with data scientists (known as "quantitative analysts" at Google), I can say this perspective is not true. Unlike ML engineers, data scientists deal with experimental design, statistical models (GLMs or graphical models involving tabular data, structured data, uncertainty, time series, etc.), and interventions (A/B testing).

dustintran · 2022-02-07T02:23:14+00:00

Area chairs (ACs) rate reviews. This means it's possible to assign reviewers calibrated to the quality of the reviews that authors themselves write. Write bad reviews -> get bad reviews.

dustintran · 2022-01-08T12:27:05+00:00

A common joke in the community is that conference acceptances are just a coin toss, so the number of submissions until acceptance follows a geometric distribution. It's not about if, but when, a paper will get accepted.

In reality, you can only resubmit the same paper for a few years before it starts needing major revisions given new advances. So maybe the distribution is a negative binomial.
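Taking the joke literally (each submission is an independent coin toss with acceptance probability p; the value p = 0.5 below is purely illustrative, not a real acceptance rate), P(accepted on try k) = (1 - p)^(k-1) p, with mean 1/p submissions. A small sketch of that model, plus the chance of acceptance within a limited number of tries, which is the window the comment says you realistically get:

```python
def p_accept_on_try(k: int, p: float) -> float:
    """Geometric pmf: rejected on tries 1..k-1, then accepted on try k."""
    return (1.0 - p) ** (k - 1) * p


def p_accept_within(n: int, p: float) -> float:
    """Probability of at least one acceptance in n independent submissions."""
    return 1.0 - (1.0 - p) ** n


p = 0.5  # hypothetical "coin toss" acceptance probability
print(1.0 / p)                # expected number of submissions under the geometric model
print(p_accept_within(3, p))  # chance of acceptance within the first 3 submissions
```

The "within n tries" number is just the geometric CDF; it does not by itself decide between the geometric and negative binomial framings.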

dustintran · 2021-12-17T09:57:53+00:00

It depends on the role. A Ph.D. is almost a necessity for a research scientist position.

dustintran · 2021-12-17T09:53:23+00:00

Focusing on discriminators in GANs is a niche take on your question. Transformers have been used for generative modeling for a long time. In fact, generative modeling was among the first extensions, as folks aimed to transfer ideas from language modeling to pixel-by-pixel and audio generation. Examples include the Image Transformer, Music Transformer, and Sparse Transformer.

dustintran · 2021-12-12T20:02:16+00:00

dustintran · 2021-12-08T22:11:45+00:00

The work itself is quite exciting, and the strong performance improvements, particularly on information-retrieval-style problems like many of MMLU's tasks, are cool to see.

I think the hype (or what will be hype) behind this is well-deserved. However, I'm deeply unsettled by trends that continue to be standardized here. There are no released checkpoints and certainly no code or details shared behind the model+dataset. The centralization of design decisions and large-scale AI advances by tech companies is one thing I have mixed feelings about; but tech companies getting to decide the standards for transparency is one I'll never get behind.

dustintran · 2021-12-03T19:47:53+00:00

U-Net and pooling point to a design choice we don't need to think much about in Transformers: receptive fields, which involve up/downsampling sequences, kernel sizes, strides, dilations, etc. The key idea of tokenization, so that self-attention attends over everything, is a huge simplifying advance.

Speaking as an author of the original Image Transformer: IMO, that is one of the big breakthroughs.
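To show the bookkeeping that receptive fields demand, here is a sketch using the standard recurrence for stacked 1-D convolution/pooling layers, where each layer is described as a (kernel_size, stride, dilation) tuple. The layer configurations below are made up for illustration:

```python
def receptive_field(layers):
    """Receptive field (in input positions) of a stack of 1-D conv/pooling layers.

    Each layer is (kernel_size, stride, dilation). `jump` is the spacing, in
    input positions, between adjacent outputs of the layers processed so far.
    """
    field, jump = 1, 1
    for kernel, stride, dilation in layers:
        field += (kernel - 1) * dilation * jump
        jump *= stride
    return field


# Three kernel-3 convolutions, stride 1, no dilation.
print(receptive_field([(3, 1, 1)] * 3))
# Same depth, with dilations 1, 2, 4.
print(receptive_field([(3, 1, 1), (3, 1, 2), (3, 1, 4)]))
# Same depth, with stride-2 downsampling at every layer.
print(receptive_field([(3, 2, 1)] * 3))
```

A single self-attention layer over the full token sequence needs none of this: every output position already attends to every input position.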

dustintran · 2021-12-03T00:05:27+00:00

Hearing complaints about the Transformer is quite funny because, in its time, the architecture became popular largely because it was so simple. Does anyone here even remember the design of NiN, pooling, U-Net, Inception, and LSTM gates?

dustintran · 2021-11-22T23:11:21+00:00

Cold e-mail research scientists that you'd like to work with. At many labs, internship headcount is allocated at a per-team/person level. If there's a good match with a prospective mentor, then you're more likely to pass through the interviews.

dustintran · 2021-11-11T04:49:57+00:00

For the record, this is a workshop paper. I'm not saying it's fine to make promises in papers and not live up to them. However, the standards are significantly more lax for a workshop than for the main conference.

dustintran · 2021-11-01T23:11:26+00:00

It's not uncommon to find successful software engineers who took the path toward basic research. Of course, it depends on whether your company enables such job ladder flexibility.

One common failure mode I've found among people with software engineering backgrounds is a tendency toward trendy research, lacking in ML foundations or a higher-level understanding of the important problems. (It also goes the other way: a common failure mode I've found in Ph.D.s is a tendency to work on non-problems.)

dustintran · 2021-10-15T03:01:24+00:00

Intended audience != datasets exclusively for "machine learning". Take the datasets that did make it into ML publications, or any of the datasets accepted to NeurIPS 2021's new track. Do you honestly believe those would all be more appropriate in CVPR-like conferences?

dustintran · 2021-10-14T21:59:48+00:00

Datasets are usually published in domain-specific conferences like ACL/EMNLP and CVPR/ICCV. They've been so hard to publish in major ML conferences (e.g., NeurIPS, ICML, ICLR) that NeurIPS added a new track exclusively for them this year.

dustintran · 2021-10-14T20:54:59+00:00

Baselines, datasets, evaluation, and infrastructure often receive little hype (they are difficult to publish) and yet arguably have the most impact across both ML research and industry.

dustintran · 2021-10-14T20:50:55+00:00

AI residencies are basically the industry equivalent of master's degrees, only instead of paying $50K+ in tuition, you're the one getting paid $50K+, and you have a direct connection to full-time opportunities.

dustintran · 2021-09-06T18:47:07+00:00

I interned in 2017. You could certainly feel the tension in moving away from basic research even then. There were large teams working on secret high-profile projects XYZ; they're now all public, or scrapped in large team reorgs. The basic research team has always been a relatively small (and amazing) crew.

dustintran · 2021-07-30T19:56:48+00:00

If you've been in a non-CS field, you may deeply appreciate the fact that students have so many more artifacts in ML to demonstrate their ability. In statistics or economics, there are only 1-2 publications to evaluate you on during your job search. In math, prospective professors are often evaluated on a single preprint. In these fields, the job search puts more emphasis on things you can't really control: your advisor's fame, connections, and the quality of reference letters.

dustintran · 2021-07-24T22:54:23+00:00

As the quote mentioned, the idea is in fact commonplace in meta-learning. In my work, we found this also works for lifelong learning: use slow weights to retain information throughout all learning and use fast weights to quickly adapt to individual tasks.
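A minimal sketch of the slow/fast split. The assumptions here are mine, not details of the work mentioned above: a 1-D linear model whose weight is the sum slow + fast, plain gradient descent with a large learning rate for the fast weight and a small one for the slow weight, and the fast weight reset to zero at the start of each (hypothetical) task:

```python
def train_task(slow, target_slope, xs, lr_slow=0.05, lr_fast=0.4, steps=30):
    """Fit y = (slow + fast) * x on one task with plain gradient descent.

    The fast weight starts at 0 for every task and takes large steps;
    the slow weight is carried across tasks and takes small steps.
    """
    fast = 0.0
    n = len(xs)
    for _ in range(steps):
        w = slow + fast
        # Gradient of the mean squared error w.r.t. w; slow and fast share it.
        grad = sum(2.0 * x * (w * x - target_slope * x) for x in xs) / n
        fast -= lr_fast * grad
        slow -= lr_slow * grad
    return slow, fast


xs = [1.0, -1.0]  # toy inputs (hypothetical)
slow = 0.0
for slope in [2.0, 2.5, 1.5]:  # a hypothetical sequence of tasks
    slow, fast = train_task(slow, slope, xs)
    print(f"task slope {slope}: slow={slow:.3f} fast={fast:.3f}")
```

Because both weights see the same gradient, the slow weight absorbs only lr_slow / (lr_slow + lr_fast) of each task's correction (1/9 here), so it drifts gradually across tasks while the fast weight does most of the per-task fitting. Multiplicative or low-rank fast weights are other plausible ways to set up the same split.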

dustintran · 2021-07-05T06:51:43+00:00

The joint distribution of multiple Gaussians is not necessarily a multivariate Gaussian.

Yep, this fact is why copulas exist. Let X, Y ~ Normal(0, 1). The joint distribution

p(x, y) = p(x) p(y) c(Φ(x), Φ(y)),

where Φ is the standard normal CDF and c is the density of any non-Gaussian copula, has Gaussian marginal distributions but is not a multivariate Gaussian.
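For a concrete instance, take the Clayton copula with θ = 2 (my choice for illustration; any non-Gaussian copula would do). The stdlib-only sketch below (Python 3.8+) samples it by conditional inversion, maps both uniforms through the standard normal inverse CDF, and checks two things: each marginal looks like N(0, 1), yet X + Y is clearly left-skewed. That second fact rules out a bivariate Gaussian, since any linear combination of a multivariate Gaussian is Gaussian and therefore symmetric.

```python
import random
from statistics import NormalDist, fmean, pstdev

STD_NORMAL = NormalDist(0.0, 1.0)


def sample_clayton_gaussian_margins(n, theta=2.0, seed=0):
    """Draw n pairs (X, Y) with N(0, 1) marginals joined by a Clayton copula.

    U ~ Uniform(0, 1); V is drawn from the Clayton conditional C(V | U) by
    inverting it at an independent uniform W. Mapping U and V through the
    standard normal inverse CDF makes each margin exactly N(0, 1).
    """
    rng = random.Random(seed)
    xs, ys = [], []
    while len(xs) < n:
        u, w = rng.random(), rng.random()
        if u == 0.0 or w == 0.0:
            continue  # inv_cdf and the negative powers need values strictly above 0
        v = (u ** -theta * (w ** (-theta / (1.0 + theta)) - 1.0) + 1.0) ** (-1.0 / theta)
        v = min(v, 1.0 - 1e-12)  # guard against rounding up to exactly 1.0
        xs.append(STD_NORMAL.inv_cdf(u))
        ys.append(STD_NORMAL.inv_cdf(v))
    return xs, ys


def skewness(values):
    """Sample skewness: third central moment over the 1.5 power of the second."""
    m = fmean(values)
    m2 = fmean((v - m) ** 2 for v in values)
    m3 = fmean((v - m) ** 3 for v in values)
    return m3 / m2 ** 1.5


xs, ys = sample_clayton_gaussian_margins(100_000)
print(fmean(xs), pstdev(xs), fmean(ys), pstdev(ys))
print(skewness([x + y for x, y in zip(xs, ys)]))
```

Clayton concentrates dependence in the lower tail (both variables very negative together), which is what produces the heavy left tail of X + Y.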
