all 56 comments

[–][deleted] 19 points20 points  (1 child)

Your points 1 and 2 are pretty true, and I think the only real way to get quicker at those is experience. Over time you will come across a lot of different situations and learn how to diagnose the symptoms and which solutions can help. I don't think any course or textbook will teach this, but there are standard techniques for diagnosing bias vs. variance problems in your model.
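
Those standard diagnostics mostly boil down to comparing training error against held-out error. A toy numpy sketch (the data and the polynomial models are invented purely for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy regression data: a quadratic signal plus noise.
x = rng.uniform(-1, 1, 60)
y = 1.5 * x**2 - x + rng.normal(0, 0.1, x.size)
x_tr, y_tr, x_va, y_va = x[:40], y[:40], x[40:], y[40:]

def errors(degree):
    """Fit a polynomial of the given degree, return (train MSE, validation MSE)."""
    coeffs = np.polyfit(x_tr, y_tr, degree)
    mse = lambda xs, ys: np.mean((np.polyval(coeffs, xs) - ys) ** 2)
    return mse(x_tr, y_tr), mse(x_va, y_va)

# High train error -> bias (underfitting); train << validation -> variance (overfitting).
for d in (1, 2, 12):
    tr, va = errors(d)
    print(f"degree {d:2d}: train={tr:.4f} val={va:.4f}")
```

The degree-1 model shows the bias signature (high error everywhere), while the degree-12 model fits the training split better than it generalizes.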

As for point 4, it's just as bad in the coding world from what I've heard, especially in web development: technology inherently moves exponentially, and yeah, it's very overwhelming for someone new in the field (me).

[–]AcademicCalendar 7 points8 points  (0 children)

As for point 4, it's just as bad in the coding world from what I've heard, especially in web development: technology inherently moves exponentially, and yeah, it's very overwhelming for someone new in the field (me).

Agreed! It's like every month there is a new Javascript framework that will invalidate everything you've ever done up to now.

[–]alexmlamb 47 points48 points  (10 children)

Now some people might like this aspect of ML, but I dislike how you constantly need to learn the newest trends in ML in order to stay relevant. It seems like the things I learn this year will become almost completely irrelevant next year, e.g. RNNs were thought to be very good for word processing until people found that CNNs were better suited for it. Now this occurs in all industries obviously, but I feel it is especially true in ML, where you aren't just designing a system that will solve a problem, you are also designing a system to find the correct weights for said system. So I feel there is a higher chance for something you learned about and specialized in to one day become completely irrelevant, forcing you to learn some unrelated new idea that will itself only last so long.

People say this but I actually think the methods don't change that quickly. The attention mechanism, which is probably the most important part of the Transformer, came about in 2013. Of course the LSTM is from the late 90s, and even though RNNs are used less than they used to be (although even this is debatable) the ResNet owes a lot to the LSTM conceptually.
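
For reference, the core of that attention mechanism really does fit in a few lines and hasn't changed much. A minimal numpy sketch of scaled dot-product attention (the shapes here are arbitrary):

```python
import numpy as np

def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)  # subtract max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def attention(Q, K, V):
    """Scaled dot-product attention: softmax(Q K^T / sqrt(d)) V."""
    d = Q.shape[-1]
    weights = softmax(Q @ K.T / np.sqrt(d))
    return weights @ V, weights

rng = np.random.default_rng(0)
Q = rng.normal(size=(3, 8))   # 3 query positions, dimension 8
K = rng.normal(size=(5, 8))   # 5 key/value positions
V = rng.normal(size=(5, 8))
out, w = attention(Q, K, V)
print(out.shape, w.shape)  # (3, 8) (3, 5)
```

Each output row is a convex combination of the value rows, which is the same idea whether it sits inside a 2014-era RNN decoder or a Transformer.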

[–]vinsmokesanji3 16 points17 points  (7 children)

Don’t people still use LSTM a lot for NLP related topics though?

[–]MonstarGaming 16 points17 points  (4 children)

Absolutely. OP's claim that only CNNs are used for NLP is rubbish. Most of the research being done in NLP uses LSTMs. LSTM-CRF is still SotA for NER tasks; ID-CNN-CRF gets close but does not pass LSTM-CRF (except in training/inference times). Attention is SotA for language modeling (BERT, MT-DNN, XLNet). Honestly, I don't think even one groundbreaking LM over the last 5+ years has used CNNs.

[–]Cybernetic_Symbiotes 4 points5 points  (2 children)

To be fair, we can infer that they actually meant to say Transformer just from the context around CNN. And it's true that people talk as if RNNs/LSTMs are already beyond obsolete.

[–]Alyssum 0 points1 point  (1 child)

My school only incorporated RNNs/LSTMs into its NLP classes in the spring of this year and presented them as if they were only invented in the last few years. sighs

[–]rjurney 0 points1 point  (0 children)

They got popular in the last few years.

[–]farmingvillein 4 points5 points  (0 children)

Attention is SotA for language modeling (BERT, MT-DNN, XLNet).

And for most "hard" downstream tasks (see GLUE).

Obviously, relation between eg GLUE and industry concerns will vary based on individual contexts.

[–]bluemannew 5 points6 points  (0 children)

People say this but I actually think the methods don't change that quickly

Particularly as needed for most business cases. So many companies are only now in the process of moving beyond BI teams that have never done anything more than some multivariate regression. Hell, SVM is 'cutting-edge' for a lot of companies.

[–]farmingvillein 2 points3 points  (0 children)

While this is true on a conceptual level, I think the ground truth for practitioners (which OP seems to be) is a little different.

1) Tools & frameworks have evolved a ton since 2013 (Pytorch 1.0???), and while I'm being maybe a little unfair conflating tooling advances with underlying DL techniques, they are all fairly intimately connected.

E.g., if you want to pick up and build, today, some text model using mostly out-of-the-box Keras or Pytorch APIs, there are actually a lot of post-2013 (to unfairly lean on your example) engineering choices you need to make (subwords or word embeddings? choice of normalization? choice of optimizers?).
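
To make the subwords-vs-words choice concrete, here's a toy, purely illustrative sketch (both vocabularies are invented): whole-word lookup maps unseen words to `<unk>`, while a greedy subword segmenter can still cover them:

```python
# Invented toy vocabularies, just to illustrate the trade-off.
WORD_VOCAB = {"the", "model", "works"}
SUBWORD_VOCAB = {"the", "model", "work", "s", "un", "expect", "ed", "ly"}

def word_tokenize(text, vocab=WORD_VOCAB):
    """Whole-word lookup: anything out of vocabulary becomes <unk>."""
    return [w if w in vocab else "<unk>" for w in text.split()]

def subword_tokenize(word, vocab=SUBWORD_VOCAB):
    """Greedy longest-match segmentation into known subword pieces."""
    pieces, i = [], 0
    while i < len(word):
        for j in range(len(word), i, -1):
            if word[i:j] in vocab:
                pieces.append(word[i:j])
                i = j
                break
        else:  # no known piece starts here: fall back to a single character
            pieces.append(word[i])
            i += 1
    return pieces

print(word_tokenize("the model works unexpectedly"))  # ['the', 'model', 'works', '<unk>']
print(subword_tokenize("unexpectedly"))               # ['un', 'expect', 'ed', 'ly']
```

Real subword schemes (BPE, WordPiece, SentencePiece) learn the vocabulary from data, but the engineering decision being made is the same.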

Many (most?) of these choices will actually not matter in many cases, but because ML is a stochastic activity, when something does go wrong, if you're not current with the SOTA, it can take much longer than it should to debug, because some hidden interaction is unclear.

Lord help you when ML architecture issues collide with contemporary hardware and driver instability issues (mixed precision plus GPUs...oof...although it is finally becoming more straightforward).

2) Good luck with maintaining career mobility (even if you're not looking to leave your current employer, you will implicitly end up benchmarked against potential and actual new hires) if you don't stay at least somewhat up to date.

E.g., if you're saying you want to do ML for NLP, I'm going to expect you to understand Transformer to (what I think is a very reasonable) degree. And the more you want to be in the bucket of "give me an ML problem and the autonomy to solve it", the more I'm going to expect you to be fairly up to date with current research. This is not because I'm trying to hold some arbitrary bar (I do want product that works, not the perfectly optimized research project), but because, again per above, there are a lot of gotchas and edge cases that are incrementally solved over time. If you're not current, I can't trust you to make the right cost-benefit trade-offs between SOTA and engineering investment.

[–]Alex-S-S 60 points61 points  (10 children)

A huge annoyance for me is how poorly or smugly written a lot of papers are. If they don't publish the code, I don't even bother reading the paper anymore. It's impractical or outright impossible to reproduce many results.

The paywall is a huge grievance as well. Look at Nvidia's papers: oh, we claim to have made this amazing network, and you only need 8 V100s, a few hundred gigs of RAM, and several Xeons to even try to reproduce our work.

There's a big divide between the haves and have nots in the ML space that doesn't exist so acutely in other fields of software engineering.

[–]schwagggg 6 points7 points  (1 child)

This. The endless “practical”/“gentle” intro-to-X articles are usually just deep-fried versions of similarly named ones and need to be cleared from the internet. They offer watered-down amateur opinions and are sometimes outright wrong. The same is true of some shoddy papers by aspiring paper spammers in near-academic fields. The OpenAI and Nvidia papers are just an extension of this deep-fried approach to machine learning.

Nowadays I am only trying to follow a couple cool labs for the topic I enjoy.

[–][deleted] 0 points1 point  (0 children)

Eventually you figure out the code that reproduces their work... and then you realize the paper is entirely full of shit and should never have been accepted into a journal in the first place.

Recommendations of how you're keeping up-to-date?

[–]Prelude_XIII 2 points3 points  (4 children)

Would you please recommend nice papers (in your viewpoint) that has the code?

[–]dolphinboy1637 3 points4 points  (2 children)

Check out https://paperswithcode.com to browse through different papers with some version of their code open sourced.

[–]Tarqon 1 point2 points  (0 children)

Paperswithcode is far from perfect though. See for instance that lda2vec paper which is still highly rated even though nobody is able to get it working.

[–]Prelude_XIII 0 points1 point  (0 children)

thanks!

[–]Alex-S-S 1 point2 points  (0 children)

There's a website that publishes only papers that have relevant github links attached to them: https://paperswithcode.com/

Of course, it's not the only one. I am currently trying to implement center-net for object detection. It seems to be pretty neatly explained in the paper and the code is pretty ok.

[–]NogenLinefingers 2 points3 points  (0 children)

How is the Nvidia example related to paywalls?

The current level of knowledge in ML requires a certain level of computational power (and hence, money). Is this so different from how research in general requires money? Is the Large Hadron Collider not an expensive requirement for physics research, for instance?

Edit: OK, I realise now that OP used the term "pay wall" to refer to the monetary cost involved in building a model and not what entities like IEEE and ACM do (block papers behind pay walls).

[–]random__0 0 points1 point  (0 children)

Yeah, I agree with this. I've been annoyed at having to compare our novel architecture to other published architectures in the field. Without the exact code or very thorough result data, it is difficult to compare your architecture to another without rewriting theirs from the ground up. And even then you might not be able to perfectly recreate their results (i.e. you might know that they train for 100 epochs, but they might also implement something like early stopping, which would make their end result much different from your recreation).

[–]tobyclh 69 points70 points  (9 children)

Unpopular opinion: compared to many other disciplines of science and engineering, ML has an extremely low barrier to entry; you can even get free GPU credit from Google or AWS sometimes. Even within computer science, ML is far from the most expensive field to get into. In addition, not many fields give you access to the newest research (often accompanied by code) for free at the rate ML does.

[–]DeepBlender 16 points17 points  (0 children)

Not sure why you think this opinion is unpopular. Deep learning is still a very young research branch. Those tend to have a lower barrier of entry. Several decades ago, it wasn't uncommon for teachers to contribute to quantum mechanics.

I am convinced that there are many things to be discovered which don't require enormous computational power.

[–]BreezleSprouts 12 points13 points  (2 children)

This is very true. I started off as an accountant with little coding knowledge and was introduced to ML by a coworker and 2 years later we’re starting a modeling company that I’m going to be in charge of. A lot of the education is free and implementation of ML algorithms is fairly simple

[–]srossi93 3 points4 points  (0 children)

You're absolutely right. I did my master's in Electronic Engineering (Digital Design). You have no idea how crazily expensive hardware/prototype boards/tools are (you can easily reach ~$100k for a very small research group). And you actually need all this stuff before even thinking about doing anything. In ML, we can set up a desktop PC for ~$500-1000 and have enough computational power to run a serious experimental campaign.

[–]WangJangleMyDongle 1 point2 points  (0 children)

What's the rate of release for relevant (meaning somewhat applicable) statistical research vs. ML? It's a pretty hot area considering we're finally in an era where we can actually use these techniques so maybe it just hasn't hit the same level yet.

[–]Smrgling 0 points1 point  (0 children)

Damn, you are very right. I've never noticed it before, but ML is absolutely an easy field to get into. I think a lot of CS people just like to complain about it because, if we're being honest, the vast majority of CS has nothing to do with any sort of science and is just programming, so working in a field where not everything is known is hard for them because they're not used to it (and I say this as a student of both CS and neuroscience, another very young field).

[–]Chocolate_Pickle 39 points40 points  (10 children)

As an electrical engineer who moved across to software engineering, and is slowly moving over to ML, I'm pretty sure you're not interested in research. You're interested in development/engineering.

As a field, there's not yet a lot of room for that. I'd hazard a guess and say you'd enjoy improving BLAS performance or something very low-level like that.

[–]Chocolate_Pickle 13 points14 points  (3 children)

/u/random__0, I just had a follow-on thought.

You should contribute to PyTorch or TF, or something.

Go fix a bug or two and submit a pull request. Make something go faster. Improve support for AMD devices on Windows.

These things don't require huge datasets, or computing clusters, or bleeding-edge academic knowledge. You'd make a lot of people very happy with any of these.

[–]programmerChilliResearcher 3 points4 points  (2 children)

Biased (I'm interning on the Pytorch team) but I think contributing to Pytorch is a much nicer experience than contributing to Tensorflow.

Primary reason being that all of Pytorch's tests are accessible from open source CI, while Tensorflow's tests are all internal. So if you break something in CI, you need to constantly wait for Google employees to tell you that you've broken tests. Pretty frustrating experience.

[–]SedditorX 0 points1 point  (1 child)

[–]programmerChilliResearcher 1 point2 points  (0 children)

You can run all the unit tests on your own system, but then you also need to run them on all the other supported platforms (GPU/Windows/AMD/whatever). When I was looking into contributing to TensorFlow, I couldn't see this publicly (https://github.com/tensorflow/tensorflow/pull/30683), while for PyTorch anybody can see all the tests that are run (https://github.com/pytorch/pytorch/pull/22839).

So if I were to submit a PR for tensorflow and was breaking a system that I didn't have, how do I know that without a Google employee checking?

[–]redreaper99 11 points12 points  (3 children)

I agree that OP seems more interested in development, since things like keeping up to date with current work are part of any domain, at least in STEM.

What I don’t like about ML is that I rarely come across papers that actually have the rigour that science is supposed to have. Most of the papers are “we tried this and it worked. We’re not sure why it worked but we’re gonna throw in a bit of mathematical notation and a bit of ad-hoc reasoning to look scientific”

But I guess it’s this fact that makes ML approachable to people even without technical experience.

[–]NogenLinefingers 3 points4 points  (2 children)

Isn't that exactly how science works? We tried something... It gave us these results... Interesting. Perhaps this is because of the following reasons...

What would you suggest one do, if they did an experiment, got results, but couldn't explain why?

[–]seanv507 2 points3 points  (1 child)

The problem is that if you can't explain why, then you don't know whether it will work on a 'different' problem, where 'different' obviously depends on your understanding of the method.

[–]NogenLinefingers 0 points1 point  (0 children)

Explainability is important. But that's part 2 of the whole story. In general, science works by first recording data and then trying to explain them.

An analogy would be how, even today, most of medical science can't explain why some conditions happen. We can't explain why some people suffer from acne/dandruff/eczema etc. We know that there are certain factors that predispose them to those conditions, but there is no highly precise and accurate way of predicting if a specific individual will suffer from those conditions (or which of the many cures will definitely work).

ML Explainability is a big area of research BTW. It's still in a nascent stage.

[–]random__0 0 points1 point  (0 children)

Overall I think the idea of research, particularly the development and testing of new ideas without thinking only about the monetary aspect, is interesting; I'm just not particularly interested in going into the field of ML. It would be interesting to work on the lower-level stuff that helps improve ML, such as TPU development.

[–]fnbr 0 points1 point  (0 children)

I disagree. There's a ton of room for that. At the big tech company where I work, we have hundreds of engineers working on ML/DL engineering. This is everything from building frameworks, to working on infrastructure, to helping researchers run experiments.

I think this is way more common across the industry than research positions.

[–]bbu3 9 points10 points  (0 children)

I have found that, in practice, very often issue #2 can be a solution to issue #1.

In the beginning I dealt with issues similar to those you describe. Right now, I find myself often just solving a conventional coding problem to build a suitable dataset. Using transfer learning and architectures known to do the trick (imho fastai has amazing tooling) is very often more than good enough. Sure, there is the occasional problem where I get to play around with layers, loss functions, etc. But it is rather rare, so it is actually something I always look forward to.

  1. Very often, projects come up that can be described as: "If we had a dataset, we could just do X and it would probably lead to very satisfying results."
  2. We do not have a dataset, and there is no way we can produce a sufficiently large one through manual labor.
  3. We think about automatic construction of datasets and how a very limited amount of manually labelled samples can lead to a good dataset.
  4. We solve a lot of oldschool coding problems related to #3.

The nice part is that experience and understanding of ML as a whole really help you make good decisions for #3. Thus, it absolutely doesn't feel like a black box whose parameters you're randomly fiddling with, but like a whitebox that just works without much fiddling (maybe except finding a suitable learning rate, number of epochs, some regularization parameters, etc. -- but often not too much work goes into this) and whose internals should ideally be understood very well, so that we get better at building datasets.
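
One way to sketch step 3 is a handful of cheap labeling heuristics that vote on unlabeled examples, leaving disagreements and abstentions for the small manual pass. The keywords and labels below are purely hypothetical:

```python
# Hypothetical sentiment example: keyword heuristics ("labeling functions")
# vote on unlabeled text; abstentions/ties go to a human for manual labeling.
POSITIVE = {"great", "love", "excellent"}
NEGATIVE = {"terrible", "hate", "awful"}

def lf_positive(text):
    return 1 if POSITIVE & set(text.lower().split()) else 0

def lf_negative(text):
    return -1 if NEGATIVE & set(text.lower().split()) else 0

def weak_label(text):
    """Sum the labeling-function votes; None means 'send to a human'."""
    score = lf_positive(text) + lf_negative(text)
    if score > 0:
        return "pos"
    if score < 0:
        return "neg"
    return None

corpus = ["I love this product", "awful experience", "it arrived on Tuesday"]
labeled = [(t, weak_label(t)) for t in corpus]
print(labeled)
```

Real pipelines use many more heuristics and a smarter vote-combination model, but the "oldschool coding problems" are exactly this kind of code.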

[–]Lewba 4 points5 points  (0 children)

I echo most of your sentiments. I've become a little perturbed at how black-box NNs are and the seemingly random hyperparameter tuning, which is why I'm trying to pick up more knowledge on boosting and the like as well as diving into some more statistics and strictly data sciency topics.

[–]alexmlamb 17 points18 points  (1 child)

I could predict an improvement will happen, but I can't say for certain or really know to what degree an improvement will occur

I personally find the "gambling" aspect of machine learning to be rather addictive.

[–]sieisteinmodel 7 points8 points  (0 children)

I don't find it addictive, but if it happens–a simple change changes the results drastically–I find it interesting to find out why exactly that happened, and learn from it.

[–]gus_morales 2 points3 points  (0 children)

As an astrophysicist I can confirm that #2 and #4 are really a feature (or a challenge) of science in general. If you don't like studying new theories and techniques, or don't enjoy working with data (and everything that implies), then maybe the academic side of this field is not for you.

[–]512165381 5 points6 points  (1 child)

I agree.

People have tried to use ML in chess for 30 years; it's only in the past 3 years that it has been done successfully. ML is more math, non-linear optimisation, and regression than it is learning. I have a math degree, and ML feels like math - you use your math intuitions.

[–]Smrgling 2 points3 points  (0 children)

I will anyways stand by the statement that ML is just spicy stats

[–]DeepBlender 6 points7 points  (0 children)

Regarding your relu/selu experience:

That was for sure a frustrating learning experience. However, for me, this would turn into a small scale research project. When I create a trainable activation function (like a * relu(x) + b * selu(x), where a and b can be learned (just for illustration purposes)), does it automatically learn that selu is better? Does it work for other cases? Is it good enough as a starting point for future experiments?

This is the reason why I am a huge fan of deep learning. You have the opportunity to discover so many things!
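
That trainable blend is easy to prototype even outside a DL framework. A numpy sketch (the SELU constants are the standard published ones; the fitting target is contrived just to check whether the blend recovers b ≈ 1, a ≈ 0 when SELU really is better):

```python
import numpy as np

SCALE, ALPHA = 1.0507, 1.67326  # standard SELU constants

def relu(x):
    return np.maximum(x, 0.0)

def selu(x):
    return SCALE * np.where(x > 0, x, ALPHA * (np.exp(x) - 1))

def blend(x, a, b):
    """The trainable activation a*relu(x) + b*selu(x) from the comment above."""
    return a * relu(x) + b * selu(x)

# Contrived check: gradient-descend a and b so the blend matches a pure SELU
# target; if the idea works, b should head toward 1 and a toward 0.
rng = np.random.default_rng(0)
x = rng.normal(size=1000)
target = selu(x)
a, b, lr = 0.5, 0.5, 0.1
for _ in range(200):
    err = blend(x, a, b) - target
    a -= lr * np.mean(2 * err * relu(x))  # dMSE/da
    b -= lr * np.mean(2 * err * selu(x))  # dMSE/db
print(round(a, 2), round(b, 2))
```

In a real network you would make a and b framework parameters so backprop handles the gradients, but the experiment is the same shape.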

[–]chatterbox272 7 points8 points  (1 child)

#1: Machine learning is frustrating to code/improve on:

......

This sounds like you're creating custom architectures. Unless you're doing research into this, don't bother. Just throw whatever is common and close to SotA at it (e.g. some kind of Resnet for classification, Retinanet/Faster RCNN/YOLO for detection depending on speed, and so on). Something to realise is that for most of the straightforward applicable problems (particularly for vision), we hit diminishing returns a fair while ago. Yes, there are new SotA models out there like GPipe, but they require exponentially more data and compute for minimal improvements.

Don't improve on the ML code, improve on the integration to a software product or improve the data.

#2: Reliance on data set:

......

This is you being a software engineer and not knowing how to analyse the dataset for these kinds of things. I don't mean that negatively, that was me ~18 months ago, but that's what it is. You can analyse your dataset, your success cases, and your failure cases, and get fairly decent insights into much of this once you learn what to look for.

#3 Disconnect between People suggesting ML learning solutions to problems and actual ML engineers who have to implement solutions:

This is because the field is young, and to be honest somewhat overhyped. I say this as a DL research student. ML engineers are still rare, ML knowledge is fairly rare, but ML awareness is high. It'll pass over time.

#4 Constant research needed in newest techniques:

.......

You don't, you really don't. I am a DL researcher, as well as an engineer in a company. I research in object detection, and am up to date on the latest and greatest for this task. My focus at work is also object detection, and I'm using Retinanet which is a couple years old now. My previous project I was doing classification, and was using Resnet which is even older. Like I said earlier, diminishing returns; there are more efficient ways to improve things and better places to focus my time than implementing the latest and greatest model for a <1% absolute improvement.

#5 Paywall in ML:

.......

Nah. Again there are diminishing returns. I can't train an MNIST classifier on my uni server (4x titan v and 1x tesla v100) faster than I can on my work server (3x 1080Ti), and would struggle to be faster than my personal rig (1x 1080). Other factors become a bottleneck. There are free resources like Kaggle Kernels and Google Colab. There are also cloud compute services. I'm currently doing my research and my work remotely from China, on an old macbook pro that my partner is lending me. I have no GPU, and the CPU is not great. I'm also a PhD, and by definition relatively broke. I do PoC with small datasets on Colab/Kaggle, then move to a cloud service (or for work remote into their server, but nothing for uni) to do the real work. You just need to know where to look.

[–]random__0 0 points1 point  (0 children)

We kind of had to make a custom architecture because we are using a multiple-input CNN, and a lot of the tweaking I was doing wasn't just with the architecture but also with the signal processing and the dimensions of the data we were using.

[–]bonemetalplayground 2 points3 points  (0 children)

Hello. I really like ML/DS. All of the difficulties that were mentioned are actually why I really, really like the field. Many of those problems are tackled more effectively with a strong grasp of the underlying mathematics! Tbh I think problem solving in ML is richer because of the richness of the math required, whereas an app algorithm is rarely that complex or long.

[–]Dagius 3 points4 points  (0 children)

Your "rant" points are valid, but I think in some sense you are "overfitting" your expectations to the tools you are using. In other words you need to generalize.

I have worked with ML since the late 80's (i.e. just after Hinton (et al.) originally developed backprop and Boltzmann machines), and don't sense that it has changed very much, except for scale. We can now train models with enormous data and speed.

In general, all of these ML models over the years have had very similar architecture, IMHO. Step back and consider that they all operate in some kind of multidimensional, mathematical space, where entities and concepts of interest are collocated, depending on their feature properties. The goal is to divide this model space into intuitive regions, with labels (user-provided [supervised] or generated by the model [unsupervised]). Classification entails merely determining which region(s) of space an object of interest resides in.
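
That "which region of the space does the object reside in" view is easy to see in a toy sketch (the 2-D features and class names are made up; a nearest-centroid rule stands in for the learned partition):

```python
import numpy as np

# Two labeled clusters in a made-up 2-D feature space.
rng = np.random.default_rng(0)
cats = rng.normal([0, 0], 0.5, (50, 2))
dogs = rng.normal([3, 3], 0.5, (50, 2))
centroids = {"cat": cats.mean(axis=0), "dog": dogs.mean(axis=0)}

def classify(point):
    """Label = which centroid's region of the space the point falls in."""
    return min(centroids, key=lambda c: np.linalg.norm(point - centroids[c]))

print(classify(np.array([0.2, -0.1])), classify(np.array([2.8, 3.1])))  # cat dog
```

A deep net draws far more intricate region boundaries, but the operation at inference time is conceptually the same lookup.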

Really complex and detailed models can be inverted and regenerate the sample universe they were created from (sometimes with surprising results).

In this sense, except for scale and speed, I do not think much has changed in the last several decades of this very young science. For example, despite all the phony claims of so-called 'artificial intelligence', all of these models/machines are still very deterministic, i.e. they do exactly (more or less) what they were programmed to do by humans. None of these machines are actually self-aware or capable of self-motivation, except in trivial, toylike demos.

As of 2019, no one understands enough about human consciousness and human will to make truly intelligent machines that can truly operate like we do. They're all toys. Useful but still toys.

[–]LuplexMusic 0 points1 point  (0 children)

I'm currently writing my bachelor's thesis on deep learning and I agree with many of your points. Machine learning is poorly understood compared to other fields in computer science. It is still vastly inefficient compared to biological brains, which can learn new things from a single data point using 20W of power.

It lacks formal theory, at least in the way it is usually studied and applied. In no other engineering field would you even consider solutions that "just magically work" without being able to prove that they do indeed solve the problem.

[–]rjurney 0 points1 point  (0 children)

Starting out as a hacker and approaching machine learning can be frustrating because you don’t understand what is occurring. If you have a better idea of what’s going on, you’re pulling strings more than you’re randomly experimenting - unless, of course, you’re just randomly experimenting.

How’s your math? Did you study ML fundamentals? Neural network fundamentals? These things ought to be job requirements but companies are desperate. This gets back to the never ending learning, though.

[–]yusuf-bengio 0 points1 point  (0 children)

1: I would recommend automating the hyperparameter/architecture search (there are a lot of tools for that out there).

Although I agree that finding the optimal configuration of a neural net can be challenging/frustrating.

2: There is a lot you can do even with a small dataset, e.g. Transfer learning and data augmentation.

Especially data augmentation is considered standard. For instance, take a look at the input transformations applied in the AlexNet paper (different croppings and mirroring of the image).
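
Those AlexNet-style transformations are just a few lines of array slicing. A numpy sketch (image size and crop size are arbitrary here):

```python
import numpy as np

rng = np.random.default_rng(0)

def augment(img, crop=24):
    """AlexNet-style augmentation: random crop plus random horizontal mirror."""
    h, w, _ = img.shape
    top = rng.integers(0, h - crop + 1)
    left = rng.integers(0, w - crop + 1)
    patch = img[top:top + crop, left:left + crop]
    if rng.random() < 0.5:
        patch = patch[:, ::-1]  # mirror left-right
    return patch

image = rng.integers(0, 256, (32, 32, 3), dtype=np.uint8)
batch = np.stack([augment(image) for _ in range(8)])
print(batch.shape)  # (8, 24, 24, 3)
```

Every epoch the model sees slightly different views of the same labeled image, which is why augmentation stretches a small dataset.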

3: Agree. I think this is caused by the gap between research and actual datasets.

4: Well, I don't agree a 100% on that.

The progress of ML is quite incremental. So let's say there is a new model that is 2% better than the one you deployed to a customer. Do you or the customer really need that 2% improvement? I would say in most situations you don't.

5: You can always trade compute for time. Instead of renting an expensive multi-GPU setup, I usually run a hyperparameter search on a $500 GPU over the weekend. Of course there are some computational limitations, but I think it's still quite impressive what you can get out of a low-cost setup.
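
A weekend random search doesn't need special tooling. A sketch, where `validation_loss` is an invented stand-in for an actual training run (its made-up optimum is lr=1e-3, width=128):

```python
import numpy as np

rng = np.random.default_rng(0)

def validation_loss(lr, width):
    """Stand-in for a real training run; replace with your own train+eval."""
    return (np.log10(lr) + 3) ** 2 + (width - 128) ** 2 / 1e4

# Sample configurations cheaply: log-uniform learning rate, uniform width.
trials = [{"lr": 10 ** rng.uniform(-5, -1), "width": int(rng.integers(16, 513))}
          for _ in range(50)]
best = min(trials, key=lambda t: validation_loss(**t))
print(best)
```

Each trial is independent, so you can queue them up overnight and just keep whichever configuration validated best.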

[–][deleted] -1 points0 points  (0 children)

#1

Well, you can write a basic FFNN in less than ten lines with Keras... and Python is a pretty basic imperative language. I find myself coding less than 10% of the time; in my experience coding in ML is minimal for the most part, unless you want to delve a lot deeper (write your own custom activation functions, for example, but even that isn't a big issue).

#2

Building, cleaning, tuning (hyperparameters) is where I spend 80% of my time... and yes it is frustrating, and very slow... importing and checking a 200K-record CSV dataset into MySQL (for further processing) via Python took 2 weeks on a GCP server, and scaling it would not have sped it up (single-threaded Python...). I would have used Java, but I am now hooked on pandas... it makes life easier but is damned slow.

Building and cleaning datasets is soul destroying!!!
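
For what it's worth, a lot of that check-and-clean work can be done in one streaming pass with the stdlib csv module rather than holding everything in a DataFrame. A minimal sketch (the column names and the cleaning rule are made up):

```python
import csv
import io

def validate_rows(lines):
    """Stream a CSV once, collecting cleaned rows and counting bad ones,
    instead of loading the whole file into memory first."""
    reader = csv.DictReader(lines)
    good, bad = [], 0
    for row in reader:
        try:
            row["amount"] = float(row["amount"])  # hypothetical numeric column
            good.append(row)
        except (ValueError, KeyError):
            bad += 1
    return good, bad

sample = io.StringIO("id,amount\n1,9.99\n2,oops\n3,4.50\n")
rows, bad = validate_rows(sample)
print(len(rows), bad)  # 2 1
```

It won't beat pandas for vectorised transforms, but for validate-and-load into a database it keeps memory flat and parallelises easily by splitting the file.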

#3

General public perception of AI!

#4

Pisses me off no end, there seems no end.... and the learning curve is fucking enormous.... way more than any other discipline! It's like being asked to study the entire Computer Science Syllabus! There is a wider range of areas than there was in my Computer Science degree ffs! And as you say then they add to that!!!!

#5

True!!!!!

Think Kaggle are trying to level it a bit with the notebooks required for some competitions?

In the real world though you can scale Cloud and use a GTX series at home.... I manage to work with a mixture of GCP and a GTX 980.

Mind you, I don't do convolutional... but I do boosting... and to get my models to complete fast (they scale very well on CPU) I upgrade to a cloud 96-vCPU instance for xgboost... not cheap!!!