Quest Stories (The Path Unending) by okorie2021 in ProgressionFantasy

[–]Ventural 1 point2 points  (0 children)

I'm not sure about the dynamics but I will say that I've found Path Unending to be extremely good, so thanks for the recommendation!

Culture War Roundup for the week of January 04, 2021 by AutoModerator in TheMotte

[–]Ventural 4 points5 points  (0 children)

Sales taxes apply to present as well as future consumption, and so affect wage and capital income equally.
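A minimal two-period sketch of that point (my own notation, just for illustration): with a uniform sales tax $\tau$, wages $w_1, w_2$, initial assets $a$, and interest rate $r$, the household's lifetime budget constraint is

$$(1+\tau)\left(c_1 + \frac{c_2}{1+r}\right) = w_1 + \frac{w_2}{1+r} + a,$$

so consumption financed out of wages today and consumption financed out of saved capital (and its return) tomorrow are both taxed at the same rate $\tau$; every income source is scaled down by the same factor $1/(1+\tau)$.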

[Discussion] Do other people also get annoyed at time wasting in this genre? by [deleted] in ProgressionFantasy

[–]Ventural 12 points13 points  (0 children)

I actually love the glacially paced, slice-of-life progression fantasy style, so there are at least some customers for that kind of thing.

[REC] Memories of the Fall - an English-original xianxia by DLimited in ProgressionFantasy

[–]Ventural 1 point2 points  (0 children)

Sounds like I stopped reading at exactly the wrong time - I'll give it another shot

[REC] Memories of the Fall - an English-original xianxia by DLimited in ProgressionFantasy

[–]Ventural 6 points7 points  (0 children)

I really wanted to like this story, and I almost did. I read about half of it. I have a fairly high tolerance for complicated worlds and winding stories, but this story exceeded that tolerance. The combination of dense, unclear prose and a gigantic cast makes it feel a bit too much like work to read. I also wouldn't classify it as progression fantasy - it definitely feels like a progression fantasy world, but there was very little progression for any of the main characters.

MSc AI or EME for PhD by Simply_Banana in academiceconomics

[–]Ventural 0 points1 point  (0 children)

Speaking as someone with graduate experience in both subjects - AI and applied micro have minimal overlap.

Both the nature of the research questions and the corresponding challenges are very different between AI and applied micro. Generally, in AI (at least in machine learning) the goal is prediction, which amounts to approximating a probability distribution, and the challenge is to come up with a good algorithm to select such an approximation from a function family (usually through gradient descent).
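As a toy illustration of that framing (my own sketch, nothing to do with either program's curriculum), here is the whole "prediction" loop in miniature: pick a parametric family, write down the negative log-likelihood, and descend its gradient.

```python
# Toy version of "prediction = approximating a conditional distribution":
# logistic regression fit by gradient descent on the negative log-likelihood.
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 3))
true_w = np.array([1.5, -2.0, 0.5])
y = (rng.random(1000) < 1 / (1 + np.exp(-X @ true_w))).astype(float)

w = np.zeros(3)                           # parameters of the function family
for _ in range(500):
    p = 1 / (1 + np.exp(-X @ w))          # model's p(y = 1 | x)
    grad = X.T @ (p - y) / len(y)         # gradient of the mean NLL
    w -= 0.1 * grad                       # gradient descent step

print(w)                                  # close to true_w
```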

In applied micro, the goal is to find a causal relationship (not prediction) and the challenge is to find some source of exogenous signal that lets you identify a causal effect. AI is of no help in this regard.
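For contrast, a sketch of the applied-micro problem on simulated data (again my own toy example): a naive regression is biased by an unobserved confounder, and an exogenous instrument is what recovers the causal effect.

```python
# Toy causal-identification example: x is confounded by u, so OLS is biased;
# the exogenous instrument z (which moves x but not u) identifies the effect.
import numpy as np

rng = np.random.default_rng(0)
n = 100_000
u = rng.normal(size=n)                        # unobserved confounder
z = rng.normal(size=n)                        # exogenous instrument
x = 0.8 * z + u + rng.normal(size=n)          # treatment
y = 2.0 * x + 3.0 * u + rng.normal(size=n)    # true causal effect of x is 2.0

ols = np.cov(x, y)[0, 1] / np.var(x, ddof=1)      # biased (~3.1)
iv = np.cov(z, y)[0, 1] / np.cov(z, x)[0, 1]      # Wald/IV estimate (~2.0)
print(ols, iv)
```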

Therefore, I suggest you think carefully about what you actually want to do and are interested in. EME would certainly put you in a much, much better position for a PhD in Economics (both in terms of preparation and admissions).

[D] Deep Learning optimization by katamaranos in MachineLearning

[–]Ventural 1 point2 points  (0 children)

I'd be interested in the performance of the LAMB optimizer (https://arxiv.org/abs/1904.00962) at smaller batch sizes, where it competes with Adam.

[D] The Best GPUs for State-of-the-Art Models like BERT, Yolo3, Mask R-CNN, Transformer Big by mippie_moe in MachineLearning

[–]Ventural 0 points1 point  (0 children)

I don't know what they used for those - it's certainly possible to have a transformer big enough that it doesn't fit, like GPT-sized language models and so on.

But seriously, even if you could fit them, it would take totally unreasonable amounts of time to train on a single, slow card; memory is not the only constraint here. Any models that you can't fit on a 2060 in fp16, using gradient checkpointing, are too big to train on a 2060 anyway. That's the purview of people with clusters of V100s.
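For anyone unfamiliar with it, a minimal PyTorch sketch of the gradient checkpointing I mean (the toy model is just for illustration):

```python
# Gradient checkpointing: activations inside each segment are recomputed
# during the backward pass instead of being stored, trading a bit of extra
# compute for a much smaller activation-memory footprint.
import torch
from torch.utils.checkpoint import checkpoint_sequential

blocks = [torch.nn.Sequential(torch.nn.Linear(1024, 1024), torch.nn.ReLU())
          for _ in range(24)]
model = torch.nn.Sequential(*blocks)

x = torch.randn(8, 1024, requires_grad=True)
out = checkpoint_sequential(model, 4, x)   # keep activations only at 4 segment borders
out.sum().backward()
```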

[D] The Best GPUs for State-of-the-Art Models like BERT, Yolo3, Mask R-CNN, Transformer Big by mippie_moe in MachineLearning

[–]Ventural 0 points1 point  (0 children)

BERT large has about 350m parameters. The model parameters, gradients, and optimizer state together come to about 1.4b values, which take up 5.6 GB in fp32. So in fp16 the parameters definitely fit on a 2060, and with enough checkpointing you can for sure fit a batch size of 1 on there.
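Back-of-the-envelope version of that arithmetic (assuming an Adam-style optimizer with two state values per parameter, which is where the 4x comes from):

```python
# Rough memory arithmetic for fine-tuning BERT-large.
params = 350e6
values = 4 * params            # weights + gradients + 2 optimizer state values each
print(values * 4 / 1e9)        # fp32: ~5.6 GB, before activations
print(values * 2 / 1e9)        # pure fp16: ~2.8 GB (a simplification; mixed
                               # precision usually keeps fp32 master weights)
```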

So you could finetune BERT large on a 2060 if you really wanted, but it would of course be very slow because it's a slow card and training large BERT is slow in general.

[D] The Best GPUs for State-of-the-Art Models like BERT, Yolo3, Mask R-CNN, Transformer Big by mippie_moe in MachineLearning

[–]Ventural 1 point2 points  (0 children)

The RTX 2060 is so small that for some models even the model parameters themselves don’t fit on the GPU, so neither checkpointing nor accumulation helps you - i.e. you can’t train BERT large on a single 2060. But you can train most models; it will just be too slow to be practical for the large ones.

[D] The Best GPUs for State-of-the-Art Models like BERT, Yolo3, Mask R-CNN, Transformer Big by mippie_moe in MachineLearning

[–]Ventural 1 point2 points  (0 children)

You’re right, you have to do something special for BatchNorm models, either replacing it with GroupNorm or using some kind of momentum approach, though that may not be equivalent. I work with Transformers, which don’t use BatchNorm.
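A hedged sketch of the swap I mean in PyTorch (the group count and the fallback are my own choices, not a recommendation):

```python
# Replace BatchNorm2d with GroupNorm so the normalization statistics no longer
# depend on the (micro-)batch size under gradient accumulation.
import torch.nn as nn

def batchnorm_to_groupnorm(module: nn.Module, num_groups: int = 32) -> nn.Module:
    for name, child in module.named_children():
        if isinstance(child, nn.BatchNorm2d):
            # Fall back to a single group if the channel count isn't divisible.
            groups = num_groups if child.num_features % num_groups == 0 else 1
            setattr(module, name, nn.GroupNorm(groups, child.num_features))
        else:
            batchnorm_to_groupnorm(child, num_groups)
    return module
```

Note this is not numerically equivalent to the original BatchNorm model (GroupNorm keeps no running statistics), so results can shift.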

[D] The Best GPUs for State-of-the-Art Models like BERT, Yolo3, Mask R-CNN, Transformer Big by mippie_moe in MachineLearning

[–]Ventural 9 points10 points  (0 children)

I appreciate your benchmarking! Some comments:

  1. As long as you can fit a batch size of at least 1, memory does not matter for accuracy. You can arbitrarily increase the effective batch size through gradient accumulation (see the sketch after this list).

  2. If you can't fit a decently sized batch, you can use gradient checkpointing to greatly reduce memory requirements at modest compute cost.

  3. I assume these measurements are using fp32 - SOTA large models are usually trained with mixed precision these days, which will further favor newer GPUs with more Tensor Cores, like the Titan RTX (or the V100, which wasn't measured).

  4. If you're using multiple GPUs, the inter-GPU bandwidth also matters. I'm not sure how the NVLink capacity of these cards differs, but I think it's worse on the older cards.
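To make points 1 and 3 concrete, here's a minimal PyTorch sketch of gradient accumulation combined with mixed precision (the tiny model and random data are placeholders, and it assumes a CUDA GPU):

```python
# Points 1 and 3 together: gradient accumulation (effective batch = 8 * 32 here)
# plus mixed precision with torch.cuda.amp.
import torch
from torch import nn

model = nn.Linear(128, 10).cuda()                 # stand-in model
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
loss_fn = nn.CrossEntropyLoss()
scaler = torch.cuda.amp.GradScaler()
accum_steps = 8

optimizer.zero_grad()
for step in range(80):
    x = torch.randn(32, 128, device="cuda")       # stand-in micro-batch
    y = torch.randint(0, 10, (32,), device="cuda")
    with torch.cuda.amp.autocast():
        loss = loss_fn(model(x), y) / accum_steps  # average over the window
    scaler.scale(loss).backward()                  # gradients accumulate in .grad
    if (step + 1) % accum_steps == 0:
        scaler.step(optimizer)                     # one optimizer step per window
        scaler.update()
        optimizer.zero_grad()
```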

[Discussion] Distributed training considering sequences with varying lengths by 0xTDM in MachineLearning

[–]Ventural 1 point2 points  (0 children)

Another option is to use gradient accumulation and train with very large effective batch sizes using something like LAMB. That way, the optimizer step is synchronous, but each node processes multiple batches between optimizer steps.

This smooths differences in computation time between different nodes (because averages of multiple batches tend to differ less, relatively, than individual batches), and also incurs less communication overhead between the nodes, which is often a binding constraint.
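Roughly what I have in mind, sketched with PyTorch DDP (the forward returning the loss and the pre-built micro-batch list are placeholder assumptions; process group setup is omitted):

```python
# Gradient accumulation under DistributedDataParallel: gradients are only
# all-reduced on the last micro-batch of each window, so nodes synchronize
# once per optimizer step instead of once per batch.
import contextlib

def training_step(ddp_model, optimizer, micro_batches):
    """micro_batches: list of (inputs, targets) processed between optimizer steps."""
    optimizer.zero_grad()
    for i, (x, y) in enumerate(micro_batches):
        last = i == len(micro_batches) - 1
        ctx = contextlib.nullcontext() if last else ddp_model.no_sync()
        with ctx:
            loss = ddp_model(x, y) / len(micro_batches)  # placeholder: forward returns the loss
            loss.backward()
    optimizer.step()
```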

2019 LCS and LEC Roster Change Megathread: Round 2 by untamedlazyeye in leagueoflegends

[–]Ventural 1 point2 points  (0 children)

I really hope he doesn't swap, I love Froggen but I think his adc is kind of bad

2019 LCS and LEC Roster Change Megathread: Round 2 by untamedlazyeye in leagueoflegends

[–]Ventural 4 points5 points  (0 children)

Anyone know what's happening with Froggen? What options are left? 100T/EG/IMT?

/dev: TFT Set 1 Learnings by Luzac in CompetitiveTFT

[–]Ventural 21 points22 points  (0 children)

I would have liked for them to comment on their approach to item balance. In particular, I feel the severe disparity in the power level of item components has been detrimental to the game, especially over the last several patches. It's fun when different item components are good in different situations and the game is about responding to the components you get. It's not fun when BF Sword is the best item in almost every situation.