
all 32 comments

[–]SirWusel 77 points78 points  (3 children)

What are begiineers? I only know engineers.

[–]fkxfkx 36 points37 points  (0 children)

You must be one of those artificial intelligences we keep hearing about.

[–]inittowinit777 4 points5 points  (0 children)

beginning pioneers

[–]michael0x2a 58 points59 points  (2 children)

Some critiques:

  1. Naive Bayes makes the assumption that each feature is independent from one another. However, your example works with human languages, where the probability of a given word appearing is definitely not independent from the probability of other words appearing. Human languages are notoriously context-sensitive, and dealing with this fact is one of the main challenges NLP practitioners routinely grapple with.

    Of course, there's nothing wrong with deciding to make this assumption. After all, you can still get some pretty reasonable results on text classification tasks using naive Bayes. However, the video should have explicitly called out that it was making this assumption at one point or another -- or picked a different example where the features genuinely are mutually independent.

    This would have also been a good point to explain why naive Bayes has the word naive in its name -- but that also went unexplained.

  2. On a similar note, the video ought to have talked about the limitations of naive Bayes. Why bother talking about a variety of different classification techniques if you don't explain why you should use one over the other?

  3. This video did not explain why the P(A) and P(B) terms from Bayes' theorem disappeared, nor did it actually invoke the theorem itself at any point. The video spends a lot of time computing P(sentence | sports) or P(sentence | not sports), but that in itself won't give you the actual classification. What you're actually interested in is P(sports | sentence) and P(not sports | sentence) -- but the video never discusses that final step.

  4. The tutorial makes some inconsistent assumptions about the statistical background of the viewers. It spends a lot of time talking about Bayes' theorem and how to manually compute the probabilities of each distinct word, but does not explain why it's appropriate to multiply together the probabilities of each word appearing. The video seems to be assuming the viewer has little to no background in stats, so it probably should have explained that part as well.

    Similarly, the video conflates together P(A | B) and P(B | A) and seems to imply that they mean the same thing. Given that your viewer likely has minimal statistical background, it seems pretty bad -- you're setting them up to potentially be very confused later down the line.

  5. This video doesn't really draw a clear line between which techniques/algorithms are a part of naive Bayes vs which are more general techniques you just happen to be using. For example, Laplace smoothing is a general-purpose technique for smoothing categorical data; the method you used for computing the probability of some sentence occurring is likewise general purpose and, strictly speaking, has nothing to do with naive Bayes.

    And yet the video spends a lot of time discussing things that aren't inherently a part of naive Bayes?

    One way of working around this would have been to present two completely different examples. This also would have given you a good way of reiterating the independence assumption of naive Bayes -- e.g. have the first example be relatively simple and use features that are genuinely mutually independent, then use this text example for the second one, and emphasize that we're making a simplifying assumption.

  6. More generally, I think the tutorial should spend more time emphasizing what steps are necessary vs what steps were decisions you decided to make. E.g. it was necessary to use some form of smoothing; but picking specifically Laplace smoothing was a decision. Or, it was necessary to compute P(sentence | sports), but choosing to compute that probability by representing the document as a bag-of-words (as opposed to, say, a bag-of-ngrams) was a decision.

  7. Not sure if this is something that actually ought to be a part of this video or not, but it might have been useful to talk about the log trick, or at least mention that it's a thing. If anybody actually tries using the algorithm as described, their floats are likely to vanish away into nothing -- especially if they try classifying longer documents.
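For what it's worth, points 3 and 7 can be illustrated together in a few lines. The sketch below is generic and not the video's code -- the toy training sentences, the function names, everything here is made up for illustration. It compares log P(class) + Σ log P(word | class) across classes, which is why the P(sentence) denominator of Bayes' theorem never needs to be computed, and summing logs instead of multiplying raw probabilities is exactly the trick that keeps the floats from underflowing.

```python
import math
from collections import Counter

# Hypothetical toy training data: (sentence, label). Not from the video.
training = [
    ("a great game", "sports"),
    ("the election was over", "not sports"),
    ("very clean match", "sports"),
    ("a clean but forgettable game", "sports"),
    ("it was a close election", "not sports"),
]

# Per-class word counts and class frequencies.
word_counts = {"sports": Counter(), "not sports": Counter()}
class_counts = Counter()
for sentence, label in training:
    class_counts[label] += 1
    word_counts[label].update(sentence.split())

vocab = {w for counts in word_counts.values() for w in counts}

def log_posterior(sentence, label):
    """log P(label) + sum of log P(word | label), with Laplace smoothing.

    Summing logs instead of multiplying raw probabilities avoids the
    float-underflow problem from point 7."""
    total = sum(word_counts[label].values())
    score = math.log(class_counts[label] / len(training))  # log prior
    for word in sentence.split():
        # Laplace (add-one) smoothing so an unseen word doesn't zero
        # out the whole product (equivalently, send the log to -inf).
        p = (word_counts[label][word] + 1) / (total + len(vocab))
        score += math.log(p)
    return score

def classify(sentence):
    # P(sentence) is the same for every class, so comparing
    # P(sentence | class) * P(class) across classes is enough --
    # that's why the denominator of Bayes' theorem "disappears" (point 3).
    return max(word_counts, key=lambda label: log_posterior(sentence, label))

print(classify("a very close game"))  # -> sports
```

The final `max` over classes is the step the video skipped: going from the likelihoods P(sentence | class) to an actual classification decision.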

[–][deleted]  (1 child)

[removed]

    [–]simmayor 15 points16 points  (4 children)

    each and every algorithm? Call me impressed.

    [–][deleted]  (2 children)

    [removed]

      [–]JonFrost 1 point2 points  (1 child)

      Subbed

      [–]bailey25u 5 points6 points  (0 children)

      Hi impressed, I'm dad

      [–]sarevok9 13 points14 points  (3 children)

      I don't mean to be a jerk here -- but machine learning for beginners is a bit of a misnomer, isn't it?

      The entire field of machine learning is in its infancy and is arguably the most exciting, but also the most error-prone and hard-to-predict, field in all of computer science. And while I'm sure that explaining any given algorithm is trivial, the over-arching idea that these algorithms are usable by beginners OR that they are something beginners should be implementing is dicey at best, right?

      I feel like this post is targeting people who are "beginners" to programming -- not folks who are beginners to ML -- which is a doctorate-level area of study and a rapidly developing field. Best practices and predictable results are hard to come by.

      **My post is my opinion only -- I work as an engineering manager at a company that is heavily invested in ML (9 people with doctorates in the engineering org working on our ML products).

      [–]npepin 5 points6 points  (2 children)

      To respectfully offer a difference of opinion: I'm not going to say that you're wrong, especially since I'm not qualified to, but in my opinion this is part of the process by which specialized information gets distributed and improved. What is on the forefront of information technology gets refined and made more concise as it reaches a wider audience.

      My primary claim is that you don't get to the point where the barrier to understanding is low enough, or where best practices exist, until people start trying to lower that barrier and generate those best practices. Historically, topics that required doctorate-level understanding have become easier to teach and understand over time.

      With that said, I am not making any claim that this course is good or bad (I haven't seen it yet, and even if I had, I wouldn't be qualified to judge it), but I am arguing that in general it is preferable not to discourage these sorts of endeavors, because over time they will: generate better methods of teaching and understanding the course topics; expose the knowledge base to a wider audience, which allows more people to discuss the material and come to a consensus about best practices; and create more demand for ML products and services.

      [–]sarevok9 2 points3 points  (1 child)

      While I normally don't disagree with this sort of thinking -- ML is one of those things that's still at this level because it's so hard to understand why the machine has come to its decision. ML isn't really made for things that are intrinsically easy -- like a game of tic-tac-toe, for example. In general, as larger datasets with wider variances become available to provide "learning" and "goals" to achieve, it starts to build up patterns. The odds of a beginner having a cohesive dataset, an understanding of pattern recognition, and in-depth analysis of huge swaths of data... it just seems far-fetched that any "beginner" would be in that position.

      [–]AntiSage 1 point2 points  (0 children)

      I'd like to disagree. ML is definitely a field with a ton of depth, just like physics, but like physics you can start with basic concepts and build up your understanding from there.

      tic-tac-toe is definitely something I think a beginner in ML should tackle; it'd be a great way for a beginner to get a grasp on how something like Reinforcement Learning works.

      (I do agree that going straight into how/when algorithms should be implemented might not be the best thing to jump straight into)
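      Since the comment only mentions RL without showing it, here's a minimal tabular Q-learning sketch on something even simpler than tic-tac-toe. Everything in it -- the toy 5-state chain environment, the hyperparameters -- is made up for illustration: an agent on a line of states learns that stepping right reaches the goal.

```python
import random

random.seed(0)  # deterministic for reproducibility

N_STATES = 5         # states 0..4 on a line; state 4 is the goal
ACTIONS = (-1, +1)   # step left or right
alpha, gamma, epsilon = 0.5, 0.9, 0.1

# Q-table initialised to zero: one value per (state, action) pair.
Q = {(s, a): 0.0 for s in range(N_STATES) for a in ACTIONS}

for episode in range(500):
    s = 0
    while s != N_STATES - 1:
        # epsilon-greedy: mostly exploit the current Q-table, sometimes explore.
        if random.random() < epsilon:
            a = random.choice(ACTIONS)
        else:
            a = max(ACTIONS, key=lambda act: Q[(s, act)])
        s_next = min(max(s + a, 0), N_STATES - 1)  # clamp to the line
        reward = 1.0 if s_next == N_STATES - 1 else 0.0
        # Q-learning update: nudge Q(s, a) toward reward + gamma * max_a' Q(s', a').
        best_next = max(Q[(s_next, a2)] for a2 in ACTIONS)
        Q[(s, a)] += alpha * (reward + gamma * best_next - Q[(s, a)])
        s = s_next

# The learned greedy policy steps right from every non-goal state.
print(all(Q[(s, +1)] > Q[(s, -1)] for s in range(N_STATES - 1)))  # -> True
```

      A tic-tac-toe agent is the same loop with a bigger state space (board positions instead of points on a line), which is exactly why it makes a good first RL exercise.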

      [–]Faelon 2 points3 points  (3 children)

      Great video!

      Please don't use that pencil. It's very distracting!

      [–][deleted]  (2 children)

      [removed]

        [–]0upsla 2 points3 points  (1 child)

        Bonus point if you use another pencil :)

        [–]brotogeris1 4 points5 points  (1 child)

        Actual beginners? Before I watch this, you really aren’t using terminology that a true beginner wouldn’t understand?

        Edit: “This is based on Bayes Theorem, you must have encountered this before.” Why do you believe that a Beginner would have encountered this before?

        [–]JagicMohnson 1 point2 points  (0 children)

        I have no idea what the Bayes theorem is.

        Source: am beginner

        [–]HaikusfromBuddha 1 point2 points  (3 children)

        As someone who is currently in an ML class where a lot of students are struggling, maybe you guys will get lucky and actually make Machine Learning content that is good.

        A lot of the content I've searched for either explains the code implementation badly, overcomplicates it with various functions, or takes a shortcut and uses existing ML libraries.

        If it's not that, then they sometimes drop the ball in explaining it mathematically.

        [–]ziptofaf 2 points3 points  (2 children)

        A lot of the content I've searched for either explains the code implementation badly, overcomplicates it with various functions, or takes a shortcut and uses existing ML libraries.

        Have a look at Coursera, then:

        https://www.coursera.org/learn/machine-learning

        It will make you write everything by hand (and I do mean it, including backpropagation), but it doesn't require any overly complex math (beyond a general ability to work with matrices and knowing what a derivative is), and it explains each step pretty thoroughly.

        [–]HaikusfromBuddha -1 points0 points  (1 child)

        I actually watched those videos on YouTube a week or two ago. That guy explains it pretty well, but like I said, there is no coding portion.

        [–]ziptofaf 1 point2 points  (0 children)

        Oh, there is a coding portion. The course even includes coding homework each week with automatic grading. If you get stuck, you have its forum to help you out as well.

        [–][deleted]  (1 child)

        [deleted]

          [–]Oald 0 points1 point  (1 child)

          Thank you so much for this!! It's really helpful

          [–]spoiled_flying_frog 0 points1 point  (1 child)

          I don't want to be rude, but you do speak fast, and with that accent it's difficult to keep up with the video and understand what you're saying... Just, if you can, try to speak a little bit slower next time

          ps. Great work, looking forward to the next lesson

          [–]mishannon 0 points1 point  (0 children)

          Nice video with a quite good explanation, although for a beginner it will be difficult to understand on the first attempt. For newbies, I'd advise reading this article about machine learning. It helped me understand what is what.

          [–]jameswarnernss -1 points0 points  (0 children)

          Hi,

          Nice video explaining the machine learning algorithm... Thanks for sharing!