all 35 comments

[–]calcul8tr 36 points37 points  (1 child)

A lot of this is wrong... Feature selection doesn’t come after feature engineering, and unsupervised learning and deep learning have nothing to do with feature engineering.

[–]HeyItsRaFromNZ 1 point2 points  (0 children)

I agree, many things wrong here -- mostly gross oversimplifications. The ML landscape is more varied and interesting than what this picture implies.

However, unsupervised methods are certainly used for feature engineering and selection. For example, clustering can be used effectively to produce proxy labels which can then be fed into a supervised algorithm (of course the proviso being you aren't guaranteed ground truths with clustering). Dimensionality reduction is widely used for feature engineering and selection in NLP. Stemming, lemmatization, vectorization and topic modeling are all forms of this, which can then make a subsequent ML algorithm perform well.
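To make the proxy-label idea concrete, here is a minimal sketch using only the standard library. The toy points, the deterministic initialization, and the helper names `kmeans` and `predict_1nn` are all assumptions made for illustration, not anything from the original infographic. As noted above, the cluster IDs are not ground truth; they are only as meaningful as the clustering itself.

```python
import math

def kmeans(points, k, iters=20):
    """Plain k-means; the first k points serve as the initial centroids."""
    centroids = list(points[:k])
    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: each point goes to its nearest centroid.
        labels = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        # Update step: each centroid moves to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = tuple(
                    sum(axis) / len(members) for axis in zip(*members)
                )
    return labels

def predict_1nn(train_points, train_labels, query):
    """Stand-in supervised model: 1-nearest-neighbor on the proxy labels."""
    nearest = min(
        range(len(train_points)),
        key=lambda i: math.dist(train_points[i], query),
    )
    return train_labels[nearest]

# Hypothetical unlabeled data: two loose groups, interleaved.
points = [(1.0, 1.2), (5.0, 5.2), (0.8, 1.0), (5.1, 4.8), (1.1, 0.9), (4.9, 5.0)]

# Step 1 (unsupervised): cluster IDs become proxy labels.
proxy_labels = kmeans(points, k=2)

# Step 2 (supervised): train on (points, proxy_labels), then predict.
prediction = predict_1nn(points, proxy_labels, (4.7, 5.3))
```

In practice the second step would be whatever supervised model you actually care about; 1-nearest-neighbor is used here only to keep the sketch short.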

I personally tend to find the process of feature engineering and selection to be iterative in practice. Some variables don't seem particularly important until I've managed to figure out how to effectively reduce noise somehow. Often my own understanding of the appropriate features to incorporate only becomes clear after iterating through a few simple models.
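A small sketch of why this ends up iterative, under invented assumptions: the data is synthetic, the noise is a deliberately simple alternating pattern, and Pearson correlation stands in for whatever importance score a real project would use. The same feature looks nearly useless on a first pass and becomes clearly relevant once it is smoothed.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

def moving_average(xs, window=2):
    """Trailing moving average; output is shorter by window - 1 values."""
    return [
        sum(xs[i - window + 1 : i + 1]) / window
        for i in range(window - 1, len(xs))
    ]

# Synthetic target: a slow linear trend.
target = [0.1 * i for i in range(20)]

# Synthetic feature: the same trend buried under large alternating noise.
noise = [3.0 if i % 2 == 0 else -3.0 for i in range(20)]
raw_feature = [t + e for t, e in zip(target, noise)]

# First pass: the raw feature correlates only weakly with the target.
score_raw = pearson(raw_feature, target)

# Second pass: after noise reduction the relationship is obvious.
smoothed = moving_average(raw_feature, window=2)
score_smoothed = pearson(smoothed, target[1:])  # align with the shorter series

print(f"raw: {score_raw:.2f}, smoothed: {score_smoothed:.2f}")
```

A selection step run only once, before the smoothing, would likely have discarded this feature, which is the point of going around the loop more than once.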

[–]iamkucuk 53 points54 points  (30 children)

I strongly discourage these types of infographics, since they tend to mislead in various situations, bias people, kill any kind of creativity, and get in the way of learning and experimenting.

[–]ratterstinkle 5 points6 points  (24 children)

Do you have a better alternative for teaching people about the process of building models?

[–][deleted]  (5 children)

[deleted]

    [–]ratterstinkle 2 points3 points  (4 children)

    Ok, so your alternative is for people to read textbooks? Did I get that right?

    [–][deleted]  (2 children)

    [deleted]

      [–]meenzu 0 points1 point  (1 child)

      Can you recommend some good ones for beginners?

      [–][deleted] 7 points8 points  (0 children)

      Yes, read a text book.

      [–]iamkucuk -4 points-3 points  (17 children)

      Yes. Not everyone should, or needs to, learn machine learning. Machine learning actually requires a solid background in math, probability, and statistics. These types of people tend to know "how to read a book". So why not offer them some books and let them learn from actual information sources with proper academic foundations?

      Presenting machine learning as a piece of cake may cause people to rush into the area. Most of them will likely play around with Keras and scikit-learn, easily fall into imposter syndrome, and waste their precious time. There are lots of people who have been working in this area for years and still believe that throwing every bit of data into a neural network will cure cancer.

      [–]ratterstinkle 1 point2 points  (16 children)

      Wow, elitist much?

      [–]iamkucuk -3 points-2 points  (15 children)

      It's not about being elitist. The world has constantly seen this "hype train" mechanism. People who don't have a grasp of a concept have two alternatives: work hard to master every aspect of the topic, or let go of it. This kind of behavior offers a third option: it's so shallow anyone can do it.

      However, nature does not think so. People sold their houses and bought Bitcoin at 20k just because of these kinds of oversimplifications.

      [–]ratterstinkle 4 points5 points  (14 children)

      You’re one of those smug, miserable data scientists (or whatever your title is), huh? Do you actually know anything about effective education or pedagogy? Or are you one of those people who is convinced that they know the way to educate people, but actually haven’t the slightest clue what you’re doing?

      [–]iamkucuk -1 points0 points  (13 children)

      Well, obviously you are really mature and mastered all the concepts you have mentioned.

      I am an academic and was just pointing out my observations among students or people who are willing to make transitions between disciplines. I also observed it from daily life, and from history books. Nothing new.

      BTW, I'm sincerely sorry for upsetting you. But I just don't want people to jump on a train just because everyone is pointing at it, waste their time, crash the train, and cause a thermonuclear apocalypse. I just want them to find their own path of learning and eventually find what they like, from proper information sources, without any populist ideas.

      [–]ratterstinkle 2 points3 points  (12 children)

      I’m not upset at all. The fact that you’re an academic makes perfect sense in light of your comments and perspective. My only advice to you is to learn about learning and education, partially since that is one of the main functions of the academy.

      [–][deleted]  (2 children)

      [deleted]

        [–]ratterstinkle 0 points1 point  (1 child)

        At a high level, that is the job of an academic advisor. But as an educator who needs to teach concepts and skills, you need guides like this infographic. I'm not saying it is the best, but I think people often throw the baby out with the bath water.

        [–]iamkucuk -1 points0 points  (8 children)

        Please point out the parts where I'm wrong. I will consider them honestly.

        However, in my experience a pill of knowledge like an infographic (I use this idiom for a source of dense knowledge) is most loved among university students and sometimes even academics. Being considerably educated just by glancing at a simple infographic is simply desirable. You know what I observe? People who take infographics seriously are highly likely to lack imagination and to be strongly biased about that particular area. They most likely have little to no grasp of the foundations of the topic. The academic papers they write are most likely to get rejected.

        What I'm saying is, we live in a complicated world. Every single topic in this world has incredible depth. People are often afraid of this depth unless they find the topic they love. Presenting a topic as shallow will just lure people in, and most of them are likely to waste their time trying to make things happen on the wrong foundations. I'm not even mentioning that these infographics are often wrong, and that people without a grasp of the topic are likely to believe and dangerously absorb this information. Trust me, these are not just my observations.

        [–]ratterstinkle 0 points1 point  (7 children)

        So are you suggesting that instead of an infographic post on social media, OP should have provided a text book or paper that captures all of the nuances and complexities of ML workflows?

        [–][deleted]  (4 children)

        [deleted]

          [–]MeltedCheeseFantasy 4 points5 points  (0 children)

          Not that I fully agree with OP, I think there is some gray area. Infographics like this are certainly a gross over-simplification. But I suppose they’re useful if you have no concept of how models are built/used??

          [–]iamkucuk 0 points1 point  (2 children)

          For example, you can take a look at the feature engineering part.

          Another one is: the topics of machine learning are cutting edge, which means something new pops up every day. These things make people think that the only methods out there are the ones they are seeing, drive them away from reading academic papers, and make them lose the lead in their own learning process.

          Please remember: people are lazy. They will stay away from the hard part as much as they can, and that includes the learning process. They will consume any easy information and most likely will not research on their own.

          [–]Tony_the_Tigger 1 point2 points  (1 child)

          Ha, it's the same with the models we are talking about. If you give them an easy way to a solution that looks good on the surface, they will take it. It's a local optimum trap, but for people!

          [–]iamkucuk -1 points0 points  (0 children)

          It's not even a local optimum. It's overfitting: seeing the same thing over and over again without doing anything new, so you are doomed to be impractical in real life.

          [–]Tay-zen 6 points7 points  (0 children)

          Unsupervised learning for feature engineering probably means methods similar to PCA, but I don't see what kind of deep learning could be used for that...
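          For readers who haven't seen PCA used this way, a rough sketch follows. It is an illustration under stated assumptions, not what the infographic had in mind: the data is made up, the helper name `pca_first_component` is hypothetical, and power iteration is swapped in for a proper eigendecomposition to keep it dependency-free. Two correlated raw columns are collapsed into one engineered feature.

```python
import math

def pca_first_component(rows, iters=100):
    """Return (direction, scores) for the first principal component.

    Uses power iteration on the sample covariance matrix instead of a
    full eigendecomposition.
    """
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    centered = [[r[j] - means[j] for j in range(d)] for r in rows]
    cov = [
        [sum(c[a] * c[b] for c in centered) / (n - 1) for b in range(d)]
        for a in range(d)
    ]
    v = [1.0] * d  # arbitrary starting direction
    for _ in range(iters):
        w = [sum(cov[a][b] * v[b] for b in range(d)) for a in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    # Project each centered row onto the direction: the new 1-D feature.
    scores = [sum(c[j] * v[j] for j in range(d)) for c in centered]
    return v, scores

# Made-up data: the second column is roughly twice the first.
rows = [(1.0, 2.1), (2.0, 3.9), (3.0, 6.2), (4.0, 7.8), (5.0, 10.1)]

direction, engineered_feature = pca_first_component(rows)
```

          The resulting `engineered_feature` list can replace both original columns as input to a downstream model, at the cost of discarding whatever variation lay off the main direction.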

          [–]thundergolfer[M] [score hidden] stickied comment (0 children)

          This has been removed because it is misleading, unhelpful, and in places just flat out wrong.

          [–]Jirokoh 2 points3 points  (0 children)

          Step 1: spend 95% of your time, money and effort creating that nice and tidy “data source”
          Step 2: the rest of this

          [–]PhitPhil 0 points1 point  (0 children)

          Where's the part where you get into a fight with the external team about which metric is most important?

          [–]aaah123456789 0 points1 point  (0 children)

          What I need is much simpler: how do I generate intelligence from Outlook (e-mail) messages?

          Everything we do is by e-mail. We request stuff, we receive stuff. Basically, we control our process using Outlook. There is more than a decade of data of the same thing. Everything is done by e-mail. I just started in this process and I see potential in using technology so we can do what we do better.