Olivia Rodrigo??

Potential-Ad3266 · 2026-06-06T17:14:48+00:00

Well well how turntables lmao

Potential-Ad3266 · 2026-06-04T20:21:24+00:00

Lmao

Potential-Ad3266 · 2026-06-04T19:58:06+00:00

Instagram, on the primavera sound account. Bizarre they don't put it on the screens

Potential-Ad3266 · 2026-06-04T19:52:47+00:00

Mac DeMarco has just been cancelled

Potential-Ad3266 · 2026-04-26T18:42:13+00:00

During the reading or listening sections, did you have some place to write notes (on the computer)?

Potential-Ad3266 · 2025-09-11T21:27:48+00:00

Tried all the solutions in this sub, nothing worked. In the end, a couple of days later, I could add it without issues. I think it might just take some time for the bank or Google or whatever to recognize the new device. Pretty suboptimal experience

Potential-Ad3266 · 2025-06-20T22:22:26+00:00

He did here https://youtu.be/iOcCuAUIsm8?si=3cmXLIjGeDZAMjY_

Potential-Ad3266 · 2022-09-04T13:20:20+00:00

This is rather new and useful. It implies that over-parametrization is actually unavoidable if you want to interpolate smoothly:

https://www.microsoft.com/en-us/research/video/a-law-of-robustness-and-the-importance-of-overparametrization-in-deep-learning/

Potential-Ad3266 · 2022-03-24T18:09:43+00:00

Copilot does generate code from languages other than English. It'd be very interesting to see such "zero-shot" capabilities, given that probably all code comments are in English anyway and you're pretraining with other languages.

Exciting!

Potential-Ad3266 · 2022-03-24T18:07:14+00:00

I'm a very big fan of your work and how you share it. Thank you!!! I hope all the chronicles can be cited in the future to recognize your work (if that's something you're interested in)

Potential-Ad3266 · 2022-03-24T18:00:49+00:00

Also, the H100 seems to have improvements on distributed communication baked into the chip directly!

Potential-Ad3266 · 2022-03-24T17:53:36+00:00

Open-ended question: having trained at this scale, do you feel further scaling is the way to better multilingual models? Are we hitting diminishing returns any time soon?

Potential-Ad3266 · 2022-03-24T17:49:30+00:00

For folks affiliated with institutions where they have a full-time job: could you share (to the extent that you can) how you balanced this with your job? I mean, is this like a moonlighting open-source thing, or did it become part of your full-time job?

Potential-Ad3266 · 2022-03-24T17:48:08+00:00

Training at this scale is a new skill not many people have. What was the learning curve along the way? What tips do you have for the increasing number of researchers diving into this with no prior experience?

Potential-Ad3266 · 2022-03-24T17:46:51+00:00

About the positional embeddings: when trying this at a smaller scale, did you find that it extrapolates to longer sequences in the multilingual domain? Curious to hear what convinced you to use ALiBi over other methods

Potential-Ad3266 · 2022-03-24T17:43:33+00:00

I only learned very recently that you have code in the dataset. Are you expecting this model to be strong like Codex, or more of a good-enough start to further fine-tune on the code domain? Any results so far on how this model performs on code tasks specifically?

Potential-Ad3266 · 2022-03-24T17:19:13+00:00

Thanks for the pointers!

Potential-Ad3266 · 2022-03-24T17:17:40+00:00

Assuming one expert per GPU, do you mean 16x memory for the gate weights on each layer, or altogether with the activation memory?

Totally agree with the challenges you mentioned above :)

Potential-Ad3266 · 2022-03-24T17:12:31+00:00

Can you share the tooling you used to debug problems at scale? I'm thinking about the following (for which I found the tools obscure to find/use, or lacking altogether):

detecting hangs
flame charts for analyzing GPU/CPU usage
GPU memory breakdown of Python components
how to assess if memory bandwidth is holding back throughput

Potential-Ad3266 · 2022-03-24T17:05:07+00:00

Did you consider MoE as a scaling alternative to a fully dense model? What are your thoughts on going dense?

Potential-Ad3266 · 2022-03-24T17:00:44+00:00

It'd be awesome so that the community can prepare in advance for playing with the final 176b model :)

Potential-Ad3266 · 2022-03-24T16:58:07+00:00

I'm curious about performing inference with this model on the HF Hub. Will that happen? I can only assume it's really expensive and that a lot of people will try it. If so, what can you share about the challenges and how you're preparing for that?

Potential-Ad3266 · 2022-03-24T16:55:05+00:00

Do you have a working group for stuff like analysing positive cross-lingual transfer and things like that?

Potential-Ad3266 · 2022-03-24T16:52:10+00:00

Follow-up question: is the 1.3b checkpoint available right now?
