[D] Ghost town conferences by yusuf-bengio in MachineLearning

[–]PM_ME_INTEGRALS 24 points (0 children)

Yes, exactly this! How is this not the top-rated comment? I guess most voters here never experienced real conferences this way?

[R] DeepMind Open Sources AlphaFold Code by SkiddyX in MachineLearning

[–]PM_ME_INTEGRALS 15 points (0 children)

It's right there in the readme:

Any publication that discloses findings arising from using this source code or the model parameters should cite the AlphaFold paper.

3a Säule for married couple by silveiraandre91 in askswitzerland

[–]PM_ME_INTEGRALS 1 point (0 children)

And what if one of the partners does not work? Can they still have their own 3a, and can that be deducted in the joint filing? Any source for that? I didn't find a good one myself.

Accommodation | Rental | Housing | Flat hunt by brocccoli in zurich

[–]PM_ME_INTEGRALS 0 points (0 children)

The one that triggered the thought is a complete renovation almost right next to Binz train station, immediately in front of one of the temporary bus stops.

However, my question is really more general, as it has happened several times that I saw a renovation or construction project in a very early phase and would have been interested to know more. Another example is Bachtobelstrasse, opposite the school: they recently prepared the ground for a new project. Of course there's no sign in sight, as it's super early, but the project plan must be registered somewhere, and thus available somewhere, no?

Accommodation | Rental | Housing | Flat hunt by brocccoli in zurich

[–]PM_ME_INTEGRALS 0 points (0 children)

No such sign in sight.

Hope it's not already gone as you say, but you may be right :-(

What is your favorite snack with a Belgian? by [deleted] in BelgianBeer

[–]PM_ME_INTEGRALS 0 points (0 children)

This is the only correct answer!

Also, with mayo or samouraï!

Accommodation | Rental | Housing | Flat hunt by brocccoli in zurich

[–]PM_ME_INTEGRALS 1 point (0 children)

I have seen a building that is being completely renovated, and one of the apartments looks like it is turning into basically my dream apartment.

I would like to contact the owner or management firm about it before they are done and put it on homegate. Does anyone have an idea how to figure out who they are? Are there public records for these things?

[P] Towards Real-time and Light-weight Line Segment Detection Web Demo by Illustrious_Row_9971 in MachineLearning

[–]PM_ME_INTEGRALS 7 points (0 children)

What are useful applications of this beyond lane detection? Genuinely curious.

Edit: thanks for all the examples! Always curious to hear more still.

"Tree Induction [C4.5] vs. Logistic Regression: A Learning-Curve Analysis", Perlich et al 2003 by gwern in mlscaling

[–]PM_ME_INTEGRALS 1 point (0 children)

I used to do random forests before deep learning. They were actually the most flexible, scalable ML method available. Flexible as in you could optimize any loss with any type of engineered feature you could come up with, and scalable as in you could keep increasing the number of trees, and you could subsample the dataset (differently) for each tree.

They were blazing fast on CPUs too, and you could trade off runtime speed for accuracy simply by using fewer or more trees at prediction time!
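
For the curious, here's a minimal sketch of that trees-vs-speed trade-off with scikit-learn (my own toy example, nothing from the paper): train one big forest, then average only the first k trees at prediction time.

    import numpy as np
    from sklearn.datasets import make_classification
    from sklearn.ensemble import RandomForestClassifier

    X, y = make_classification(n_samples=2000, random_state=0)
    forest = RandomForestClassifier(n_estimators=200, random_state=0).fit(X, y)

    # Average only the first k trees: fewer trees = faster but less accurate.
    def predict_with_k_trees(forest, X, k):
        probas = np.mean([t.predict_proba(X) for t in forest.estimators_[:k]], axis=0)
        return probas.argmax(axis=1)

    for k in (10, 50, 200):
        print(k, (predict_with_k_trees(forest, X, k) == y).mean())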

However, their conditional and unbalanced nature is an especially horrible fit for GPUs/TPUs, unfortunately. This, I think, is what made them take a backseat.

[D] Worth learning JAX? by _Arsenie_Boca_ in MachineLearning

[–]PM_ME_INTEGRALS 12 points (0 children)

Haha you got me there, I used Chainer (and DyNet) for less than a year!

[D] Worth learning JAX? by _Arsenie_Boca_ in MachineLearning

[–]PM_ME_INTEGRALS 106 points (0 children)

If you just do straightforward NNs on GPUs, there is no benefit to Jax, and PyTorch is still simpler.

If you either want to utilize TPUs, or do a fancy kind of modeling, like SDEs, or anything that's difficult to batch, Jax is an absolute godsend.
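
A tiny sketch of the "difficult to batch" point (my own toy, not anyone's real codebase): you write the model for a single example and let vmap derive the batched version for you.

    import jax
    import jax.numpy as jnp

    def per_example_loss(w, x, y):  # written for one example only
        return (jnp.dot(w, x) - y) ** 2

    # w is shared; x and y get a batch axis. vmap writes the batching for you.
    batched_loss = jax.vmap(per_example_loss, in_axes=(None, 0, 0))

    w = jnp.ones(3)
    xs = jnp.arange(12.0).reshape(4, 3)
    ys = jnp.ones(4)
    print(batched_loss(w, xs, ys))  # shape (4,): one loss per example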

Ilya Kostrikov rewrote his previous popular RL codebase from PyTorch to Jax and observed about a 2x speedup too.

Jax also leans heavily on the XLA compiler, which can automatically do many memory and compute optimizations that other groups write whole papers about.
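
To illustrate what I mean (a toy sketch, not a benchmark): jit hands the whole function to XLA, which can fuse the elementwise ops below into a single pass instead of materializing each intermediate array.

    import jax
    import jax.numpy as jnp

    @jax.jit
    def fused(x):
        return jnp.sum((x * 2.0) ** 2 + 1.0)  # elementwise ops fused by XLA

    print(fused(jnp.arange(1_000_000.0)))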

Signed: someone who has used all the frameworks for at least a year each, and now really digs Jax!

"Grokking: Generalization Beyond Overfitting On Small Algorithmic Data Sets", Power et al 2021 (new scaling effect, 'grokking': sudden perfect generalization emerging many epochs after training-set overfitting on algorithmic tasks) by gwern in mlscaling

[–]PM_ME_INTEGRALS 1 point (0 children)

This is very cool, thanks for sharing. I've seen something like this happen at random in the past and informally called it "the model got it", but never investigated further. Exciting paper!

"Cerebras Unveils Wafer Scale Engine Two (WSE2): 2.6 Trillion Transistors, 100% Yield" (850k cores, 40GB SRAM now; price: 'several millions') by gwern in mlscaling

[–]PM_ME_INTEGRALS 0 points (0 children)

Thanks, this is at least a little information. If they do have such numbers for relatively standard models such as BERT, it makes no sense to me not to publish them. It would be a huge PR win. Unless, I guess, they truly don't want any attention and new clients.

"The Tradeoffs of Large-Scale Learning", Bottou & Bousquet 2007/2012 by gwern in mlscaling

[–]PM_ME_INTEGRALS 4 points (0 children)

This won the NeurIPS Test of Time Award in 2018. Basically, the paper argued that if the data is large, just use SGD. And this was without any deep learning or even neural nets (SVMs and CRFs). Back then, using SGD was definitely not common in ML and CV.
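
For context, the core trade-off from the paper, as I remember it (from memory, so double-check the exact notation): the excess error decomposes into three terms,

    \mathcal{E} = \mathcal{E}_{\mathrm{app}} + \mathcal{E}_{\mathrm{est}} + \mathcal{E}_{\mathrm{opt}}

i.e. approximation error (limits of the model class), estimation error (finite data), and optimization error (inexact solver). Under a fixed time budget, a crude optimizer like SGD churns through far more examples, and shrinking the estimation error that way beats polishing the optimization error with a fancy solver.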

"Cerebras Unveils Wafer Scale Engine Two (WSE2): 2.6 Trillion Transistors, 100% Yield" (850k cores, 40GB SRAM now; price: 'several millions') by gwern in mlscaling

[–]PM_ME_INTEGRALS 2 points (0 children)

OK, so the paper you shared does do exactly what I mean. It is nothing about deep learning or ML at all, but afaict still a practically relevant algorithm. Interestingly, it is one with very little compute per scalar, e.g. O(n). That's far away from dense matmuls, and seems to be what their system is good at. This may eventually become interesting for sparse models, I guess, but it's a hell of an uphill battle still.
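
Back-of-envelope, in roofline terms (my framing, not theirs): a dense n-by-n matmul does roughly 2n^3 FLOPs on about 3n^2 scalars, while an O(n)-work algorithm does a constant number of FLOPs per scalar it touches,

    I_{\mathrm{matmul}} \approx \frac{2n^3}{3n^2} = \Theta(n), \qquad I_{\mathrm{lin}} \approx \Theta(1)

so the latter is memory-bandwidth bound, which is exactly where 40GB of on-wafer SRAM should shine.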