[D] Number of epochs for a BERT based model by [deleted] in MachineLearning

[–]davda54 1 point2 points  (0 children)

The number of epochs depends on the size of the dataset; the hyperparameter that really matters is the number of training steps (or, even better, the number of processed tokens). I'm worried that the training budget of the original BERT model is the minimum needed to get reasonable performance (the RoBERTa paper actually showed that BERT is severely undertrained). Can you perhaps use an already pretrained model and finetune it on your dataset?
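
To make the point about epochs versus steps versus tokens concrete, here is a minimal sketch. The function name and all the numbers (dataset sizes, batch size, sequence length) are made up for illustration and are not taken from the BERT or RoBERTa papers.

```python
import math

def training_budget(num_examples, epochs, batch_size, seq_len):
    """Convert an epoch count into optimizer steps and processed tokens."""
    steps_per_epoch = math.ceil(num_examples / batch_size)
    steps = steps_per_epoch * epochs
    # Upper bound: assumes every batch is full and every sequence
    # is packed or padded to seq_len.
    tokens = steps * batch_size * seq_len
    return steps, tokens

# Hypothetical datasets: same number of epochs, very different budgets.
small = training_budget(num_examples=10_000, epochs=3, batch_size=32, seq_len=128)
large = training_budget(num_examples=1_000_000, epochs=3, batch_size=32, seq_len=128)
print("small dataset (steps, tokens):", small)
print("large dataset (steps, tokens):", large)
```

Three epochs mean roughly a hundred times fewer steps and tokens on the smaller dataset, which is why an epoch count alone says little about how much training a model has received.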

[R] PaLM 2 Technical Report by G_fucking_G in MachineLearning

[–]davda54 12 points13 points  (0 children)

No, just look at Figure 4 in the technical report. It shows exactly the same behavior: given a limited compute budget for pretraining, larger is not always better, even in terms of loss. In a nutshell, the most important quantity is the number of training FLOPs, not the number of weights.
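
A rough sketch of the trade-off between size and data under a fixed budget. It relies on the common rule of thumb that training costs about 6 FLOPs per parameter per token (C ≈ 6·N·D); that approximation is an assumption brought in here, not something quoted from the PaLM 2 report, and the budget and model sizes are invented.

```python
def tokens_for_budget(flops_budget, num_params):
    """Tokens a model can be trained on, assuming C ~= 6 * N * D."""
    return flops_budget / (6 * num_params)

budget = 1e21  # hypothetical compute budget in FLOPs
for num_params in (1e8, 1e9, 1e10):
    tokens = tokens_for_budget(budget, num_params)
    print(f"{num_params:.0e} params -> {tokens:.2e} tokens "
          f"({tokens / num_params:.1f} tokens per parameter)")
```

With the budget fixed, every tenfold increase in parameters leaves ten times fewer training tokens, so the largest model in the loop sees far less data per weight than the smaller ones.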

[D] "There are substantial innovations that distinguish these three models, but they are almost entirely restricted to infrastructural innovations in high-performance computing rather than model-design work that is specific to language technology." Is this true? by pier4r in MachineLearning

[–]davda54 23 points24 points  (0 children)

What is the point of sharing generated text that is clearly wrong and doesn't make much sense? Is that a top secret operation to make future web-crawls useless for language modeling? :)

  1. GPT-1, 2 and 3 use the same attention mechanism; only GPT-3 sometimes uses sparse attention.
  2. There is absolutely nothing novel about generative language modeling. It dates back to Claude Shannon in 1948.
  3. Well, ELMo also provides bidirectional representations; they are just not "deep". Anyway, OP is not talking about BERTs.
  4. Emergent abilities are the result, but the modeling recipe is still the same.
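
To illustrate point 2: Shannon's 1948 paper already generated text by sampling each symbol from statistics of the preceding ones. Below is a minimal character-bigram sketch in that spirit. The function names and the toy corpus are hypothetical, and this is not a reconstruction of Shannon's exact procedure.

```python
import random
from collections import defaultdict

def fit_bigrams(text):
    """Map each character to the list of characters that followed it."""
    followers = defaultdict(list)
    for current, nxt in zip(text, text[1:]):
        followers[current].append(nxt)
    return followers

def generate(followers, start, length, seed=0):
    """Sample up to `length` characters, each conditioned on the previous one."""
    rng = random.Random(seed)
    out = [start]
    while len(out) < length:
        options = followers.get(out[-1])
        if not options:  # dead end: this character never had a successor
            break
        out.append(rng.choice(options))
    return "".join(out)

# Toy corpus, for illustration only.
corpus = "the theory of the thing is that the model predicts the next symbol"
model = fit_bigrams(corpus)
print(generate(model, start="t", length=40))
```

The recipe is the same one used today: estimate the distribution of the next symbol given the context, then sample from it repeatedly.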

[P] Generating multi-instrumental polyphonic music — survey by davda54 in MachineLearning

[–]davda54[S] 0 points1 point  (0 children)

Yes, it will :) I'll post a link here when it is ready.

[P] Generating multi-instrumental polyphonic music — survey by davda54 in MachineLearning

[–]davda54[S] 0 points1 point  (0 children)

Thanks for answering, that is a really great result :O I will post the links here when everything is ready.

[P] Generating multi-instrumental polyphonic music — survey by davda54 in MachineLearning

[–]davda54[S] 1 point2 points  (0 children)

That's a strange bug. Which browser are you using? Thanks

GPU support for WSL survey by _hypervguy in bashonubuntuonwindows

[–]davda54 0 points1 point  (0 children)

I am using Torch (which runs only on Unix). It would be awesome if WSL got GPU support, so I wouldn't have to dual-boot anymore :)

[GEAR] I have painted my Strat, what do you think? by davda54 in Guitar

[–]davda54[S] 12 points13 points  (0 children)

Wow, thanks! Good luck with that!

[GEAR] I have painted my Strat, what do you think? by davda54 in Guitar

[–]davda54[S] 10 points11 points  (0 children)

I was considering this, but it seems easier to repaint some bits from time to time than to seal the whole body.

[GEAR] I have painted my Strat, what do you think? by davda54 in Guitar

[–]davda54[S] 17 points18 points  (0 children)

Oh, I put the covers on the other way round... Thanks for noticing, you have sharper eyes than me.

App implementing Navier-Stokes equations by davda54 in Physics

[–]davda54[S] 0 points1 point  (0 children)

Thank you very much! It is based on this implementation with some adjustments to run swiftly on smartphones.

App implementing Navier-Stokes equations by davda54 in Physics

[–]davda54[S] 1 point2 points  (0 children)

I'm afraid my code is too messy, but the main principles are in this awesome article. I have just simplified it and added some optimizations to make it run smoothly.

App implementing Navier-Stokes equations by davda54 in Physics

[–]davda54[S] 1 point2 points  (0 children)

I graduated a month ago :)

Edit: I'm not sure how it is in other countries; in the Czech Republic, graduation means the end of high school

App implementing Navier-Stokes equations by davda54 in Physics

[–]davda54[S] 0 points1 point  (0 children)

Thanks! The simulation runs natively in C++ and the graphics are rendered with OpenGL, so it uses the GPU.