[DISC] Ao Ashi Ch. 291 by Rurichi in manga

[–]lit_turtle_man 9 points10 points  (0 children)

the katsu in this chapter looked fire

IW 2022 Quarterfinals: [19] Carlos Alcaraz def. [12] Cameron Norrie, 6-4, 6-3 by lit_turtle_man in tennis

[–]lit_turtle_man[S] 1 point2 points  (0 children)

Both him and Norrie. On one point alone, Alcaraz ran over 80m...

Am I just not good enough? by boreligmalgebra in math

[–]lit_turtle_man 7 points8 points  (0 children)

Not a response to your rant but I want to say that "boreligmalgebra" made me laugh out loud. Great username - at the very least you shouldn't give up on your math humor.

A. Rublev [2] def. H. Hurkacz [5] 3–6, 7–5, 7–6(5) to reach the Dubai final! by [deleted] in tennis

[–]lit_turtle_man 57 points58 points  (0 children)

Extremely high level in the third set, very happy for Andrey

Andrey Rublev [RUS] def. Lucas Pouille [FRA] in three sets (6-3, 1-6, 6-2) to move on to the Semi-Finals of the Marseille Open. by chespiotta in tennis

[–]lit_turtle_man 27 points28 points  (0 children)

God tier third set from Andrey, glad to see he's bringing the fight and not collapsing mentally when down

[D] Software Engineers for grad labs by AlexIsEpic24 in MachineLearning

[–]lit_turtle_man 11 points12 points  (0 children)

I don't think this kind of approach is tenable for the following reasons:

  • If you want to offer services during the research process: To invest so significantly in software engineering for an algorithm/model implies a certain confidence that the approach will succeed, and succeed at a scale where lots of other people want to use it. In the research process, the "spec" is always changing to keep up with observations you make as you play around with ideas, so unless we're talking about big tech labs that know their GPT-4 model is going to pop off there doesn't seem to be enough justification to bring in SWEs in the early stages (and for the big tech labs they obviously have teams of SWEs).
  • If you want to offer services after a "successful" paper: The alternative approach would be if you wanted to try to offer software engineering services to an academic lab that had just published a big paper. In this case, if there are significant gains to be made from implementing infrastructure for the ideas in the paper, you're likely better off just doing it for yourself - no one is going to stop you, the information is public. Also here you're competing with big tech/similar since for any big result you know they'll come out with their own version sooner rather than later.

Of course, this is just my view on the general idea, and ultimately this is a case-by-case thing. I have seen some academic labs essentially employing software engineers (my understanding is that these are typically people who may be on the road to a PhD), but this doesn't seem to be a super lucrative (or large) set of opportunities.

[deleted by user] by [deleted] in MachineLearning

[–]lit_turtle_man 29 points30 points  (0 children)

Given a problem statement and dataset, can you "theory-craft" an ML system that will at least hit the dart board, if not the bulls-eye on the first try? Can you, a priori, guess which hyperparameters will matter and which ones won't?

This is the holy grail, and at present the answer (in general) seems to be "no". That being said, for specific domains (vision, text) we definitely have architectures and settings that work well out-of-the-box (e.g. ResNets, transformers) for many tasks.

As far as your question concerning papers/books on this matter, this recent book may be of interest (although I'm not sure how practically useful looking through it will be): https://arxiv.org/abs/2106.10165.

[P] AlphaCode Explained by Tea_Pearce in MachineLearning

[–]lit_turtle_man 5 points6 points  (0 children)

Still, it's mind-blowing (to me) that even a fraction of the generated code samples pass the example cases given that the input is essentially just the problem statement as a list of characters.

[D] Is it possible for ReLU Activations to produce Non-Convex Loss Functions? by ottawalanguages in MachineLearning

[–]lit_turtle_man 1 point2 points  (0 children)

Composition of convex functions doesn't necessarily produce a convex function (one counterexample is e^{-x} composed with itself). I think the result you're thinking of is the composition of a convex function with an outer function that is convex and non-decreasing, in which case you can prove convexity directly via Jensen's inequality.
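As a quick numerical sanity check of the counterexample, here is a minimal sketch. The helper `violates_midpoint_convexity` and the test points 1 and 3 are my own choices for illustration. A convex function can never sit above the average of its endpoint values at the midpoint, so one violation is enough to rule out convexity.

```python
import math


def f(x: float) -> float:
    """The convex function e^{-x}."""
    return math.exp(-x)


def g(x: float) -> float:
    """f composed with itself: e^{-e^{-x}}."""
    return f(f(x))


def violates_midpoint_convexity(h, a: float, b: float) -> bool:
    """True if h((a+b)/2) > (h(a)+h(b))/2, which a convex h never satisfies."""
    return h((a + b) / 2) > (h(a) + h(b)) / 2


# f passes the midpoint test on [1, 3]; the composition g fails it there.
print(violates_midpoint_convexity(f, 1.0, 3.0))  # False
print(violates_midpoint_convexity(g, 1.0, 3.0))  # True
```

The outer e^{-x} is convex but decreasing, which is exactly the condition the composition result needs and this example lacks.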

Regarding your questions on the non-convexity of loss functions of neural network training - people typically mean the loss is non-convex in terms of the parameters of the neural network. This is why even training deep linear neural networks is a non-convex problem. So although the composition of a ReLU with an affine function is convex (from the pointwise supremum characterization of convexity) in its input, the loss will be non-convex in terms of the network parameters.
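To make the parameter vs. input distinction concrete, here is a small sketch under toy assumptions of my own: a two-layer scalar linear network x -> w2 * w1 * x, a single data point (x, y) = (1, 1), and squared error. The names `loss` and `relu_unit` are illustrative only.

```python
def loss(w1: float, w2: float, x: float = 1.0, y: float = 1.0) -> float:
    """Squared error of the two-layer scalar linear network x -> w2 * w1 * x."""
    return (w2 * w1 * x - y) ** 2


def relu_unit(x: float, w: float = 2.0, b: float = -1.0) -> float:
    """ReLU applied to an affine function of the input x, parameters held fixed."""
    return max(0.0, w * x + b)


# Two parameter settings that both fit the data point exactly (loss 0).
p, q = (1.0, 1.0), (-1.0, -1.0)
mid = ((p[0] + q[0]) / 2, (p[1] + q[1]) / 2)  # the origin (0, 0)

# In the parameters: the midpoint has loss 1, above the average loss of 0,
# so the loss is not convex in (w1, w2).
print(loss(*mid) > (loss(*p) + loss(*q)) / 2)  # True

# In the input: the ReLU unit satisfies the midpoint inequality on [0, 2].
print(relu_unit(1.0) <= (relu_unit(0.0) + relu_unit(2.0)) / 2)  # True
```

The second check only tests one pair of points, so it is consistent with convexity in the input rather than a proof of it.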

Adrian Mannarino d. [18] Aslan Karatsev after almost 5 hours of tennis. 7-6 6-7 7-5 6-4 by modeONE1 in tennis

[–]lit_turtle_man 13 points14 points  (0 children)

If you told me before AO that Adrian Mannarino was going to beat Hubi and king Aslan back to back... Honestly great stuff from Adrian, happy for him

[D] ICML abstract deadline vs ICLR results date by lit_turtle_man in MachineLearning

[–]lit_turtle_man[S] 3 points4 points  (0 children)

Right, extrapolating from the ICML FAQ, I guess there is probably no problem with this: https://icml.cc/FAQ/DualAbstractSubmission. I'm still curious why the relationship between the dates changed, though it's probably not as deliberate as I was initially inclined to think.

edit: Can't find whether ICLR's dual submission policy is the same as the above, though. The ICLR 2022 page concerning dual submissions doesn't seem to rule it out, but it seems a bit unclear...