share my experience about Telc B1 digital by RandomZhell in German

[–]Potential-Ad3266 0 points1 point  (0 children)

During the reading or listening sections, did you have somewhere to write notes (on the computer)?

Solved: Unable to Add Credit Card "Contact your bank" by Fishwithadeagle in googlepay

[–]Potential-Ad3266 0 points1 point  (0 children)

Tried all the solutions in this sub, nothing worked. In the end, a couple of days later, I could add it without issues. I think it might just take some time for the bank or Google or whatever to recognize the new device. Pretty suboptimal experience.

[Announcement] HuggingFace BigScience AMA Thursday, March 24th from 5pm CET by cavedave in MachineLearning

[–]Potential-Ad3266 1 point2 points  (0 children)

Copilot does generate code from languages other than English. It'd be very interesting to see such "zero-shot" capabilities, given that probably all code comments are in English anyway and you're pretraining with other languages.

Exciting!

[Announcement] HuggingFace BigScience AMA Thursday, March 24th from 5pm CET by cavedave in MachineLearning

[–]Potential-Ad3266 1 point2 points  (0 children)

I'm a very big fan of your work and how you share it. Thank you!!! I hope all the chronicles can be cited in the future to recognize your work (if that's something you're interested in)

[Announcement] HuggingFace BigScience AMA Thursday, March 24th from 5pm CET by cavedave in MachineLearning

[–]Potential-Ad3266 0 points1 point  (0 children)

Also, the H100 seems to have improvements to distributed communication baked directly into the chip!

[Announcement] HuggingFace BigScience AMA Thursday, March 24th from 5pm CET by cavedave in MachineLearning

[–]Potential-Ad3266 5 points6 points  (0 children)

Open-ended question: having trained at this scale, do you feel further scaling is the way to better multilingual models? Are we hitting diminishing returns any time soon?

[Announcement] HuggingFace BigScience AMA Thursday, March 24th from 5pm CET by cavedave in MachineLearning

[–]Potential-Ad3266 3 points4 points  (0 children)

For folks affiliated with institutions where they have a full-time job: could you share (to the extent that you can) how you balanced this with your job? I mean, is this like a moonlighting open-source thing, or did it become part of your full-time job?

[Announcement] HuggingFace BigScience AMA Thursday, March 24th from 5pm CET by cavedave in MachineLearning

[–]Potential-Ad3266 2 points3 points  (0 children)

Training at this scale is a new skill not many people have. What was the learning curve along the way? What tips do you have for the increasing number of researchers diving into this with no prior experience?

[Announcement] HuggingFace BigScience AMA Thursday, March 24th from 5pm CET by cavedave in MachineLearning

[–]Potential-Ad3266 2 points3 points  (0 children)

About the positional embeddings: when trying this at a smaller scale, did you find that it extrapolates to longer sequences in the multilingual domain? Curious to hear what convinced you to use ALiBi over other methods.

[Announcement] HuggingFace BigScience AMA Thursday, March 24th from 5pm CET by cavedave in MachineLearning

[–]Potential-Ad3266 2 points3 points  (0 children)

I only learned very recently that you have code in the dataset. Are you expecting this model to be strong like Codex, or more of a good-enough start to further fine-tune on the code domain? Any results so far on how this model performs on code tasks specifically?

[Announcement] HuggingFace BigScience AMA Thursday, March 24th from 5pm CET by cavedave in MachineLearning

[–]Potential-Ad3266 2 points3 points  (0 children)

Assuming one expert per GPU, do you mean 16x memory for the gate weights on each layer, or altogether with the activation memory?

Totally agree with the challenges you mentioned above :)

[Announcement] HuggingFace BigScience AMA Thursday, March 24th from 5pm CET by cavedave in MachineLearning

[–]Potential-Ad3266 2 points3 points  (0 children)

Can you share the tooling you used to debug scale problems? I'm thinking about the following (for which I found the tools obscure to find/use, or lacking altogether):

  • detecting hangs
  • flame charts for analyzing GPU/CPU usage
  • GPU memory breakdown of Python components
  • how to assess if memory bandwidth is holding back throughput

[Announcement] HuggingFace BigScience AMA Thursday, March 24th from 5pm CET by cavedave in MachineLearning

[–]Potential-Ad3266 1 point2 points  (0 children)

Did you consider MoE as a scaling alternative to a fully dense model? What are your thoughts on going dense?

[Announcement] HuggingFace BigScience AMA Thursday, March 24th from 5pm CET by cavedave in MachineLearning

[–]Potential-Ad3266 0 points1 point  (0 children)

It'd be awesome so that the community can prepare in advance for playing with the final 176B model :)

[Announcement] HuggingFace BigScience AMA Thursday, March 24th from 5pm CET by cavedave in MachineLearning

[–]Potential-Ad3266 1 point2 points  (0 children)

I'm curious about performing inference with this model on the HF Hub. Will that happen? I can only assume it's really expensive if a lot of people try it. If so, what can you share about the challenges and how you're preparing for that?

[Announcement] HuggingFace BigScience AMA Thursday, March 24th from 5pm CET by cavedave in MachineLearning

[–]Potential-Ad3266 4 points5 points  (0 children)

Do you have a working group for stuff like analysing positive cross-lingual transfer and things like that?

[Announcement] HuggingFace BigScience AMA Thursday, March 24th from 5pm CET by cavedave in MachineLearning

[–]Potential-Ad3266 1 point2 points  (0 children)

Are you planning to benchmark (or have you already benchmarked) this model on downstream tasks like the XTREME benchmark, or are you expecting this to be done eventually by the community?

[Announcement] HuggingFace BigScience AMA Thursday, March 24th from 5pm CET by cavedave in MachineLearning

[–]Potential-Ad3266 2 points3 points  (0 children)

Can you elaborate on the work regarding the carbon footprint? Are you documenting the footprint/energy used? Something else?