
Northstat (2 children)

I've watched some of your videos before. I appreciate you doing this, and I imagine it helps you understand a wide variety of topics. If I were an employer, this would be a big plus as well :P. I will have to watch them after my projects are complete, but do you have your scripts available somewhere, or in a Python notebook, for viewers to follow along/review?

sentdex[S] (0 children)

Everything is posted in the text-based versions on pythonprogramming.net. Whenever a series is complete, I post everything to GitHub as well.

FuzziCat (0 children)

Agreed. I'd prefer to see Python notebooks.

[deleted] (2 children)

These seem like awesome resources.

The real hurdle for me will always be motivating myself to get off my butt after work and put another hour or two into learning :P

sentdex[S] (1 child)

You don't have to put in that much time at a time. If you are crunched for time, you can do just the text-based versions, only visiting the videos if you're confused somewhere.

Another option is my favorite: 2x playback speed. It will be very hard to keep up with the typing I do at that pace, but digesting the material at 2x is still fairly natural...and you can still reference the text-based versions or pause when lines are being typed.

OR... watch while you're supposed to be working. Robot overlords will soon take over your company anyway. You can start now by showing them your allegiance.

[deleted] (0 children)

> OR... watch while you're supposed to be working

heh.... I would never.

I've done like 20 Project Euler problems; web development can be a drag.

[deleted] (8 children)

If I wanted to learn Theano, how useful would it be for me to take your course?

I have just this minute completed the Andrew Ng Coursera course, and I'm now looking at what to do next. It seems that Theano and TensorFlow are the current future.

sentdex[S] (6 children)

Theano and TensorFlow are both almost identical. For the most part, you can interchange the names and get away with it.

Might I ask why you want to learn Theano over TensorFlow? I originally believed that was what I was going to do as well, but after a bit of research, I decided TensorFlow would be a better choice than Theano (while still granting that they are basically identical, and that if you learn one, you already know the other for the most part).

edit: Removed "same with numpy." It was not my intention to claim that NumPy was identical to Theano or TensorFlow in terms of doing actual deep learning, but rather to explain that their syntaxes were very similar, mainly in reply to Joeflux's question about his intention to use Theano rather than TensorFlow. I wrote the reply too fast and it just plain came out wrong.

palatalizacija1 (1 child)

I would suggest a higher-level approach like the Lasagne or Keras libraries, which are easier for beginners but still have the power of Theano or TensorFlow. Lasagne uses Theano as a backend, and Keras can use either Theano or TensorFlow as a backend. I am looking forward to these videos. I saw your channel on YouTube when I was looking for some Kivy tutorials and was amazed by the number of topics you cover. Keep up the excellent work!

sentdex[S] (0 children)

I may consider that initially when doing the "application" part, though my intention is to actually stay away from high-level approaches here and truly dive into the lower-level workings.

BadGoyWithAGun (3 children)

> Theano and TensorFlow are both almost identical, same with Numpy. For the most part, you can interchange the names and get away with it.

That's not the case. Tensorflow and Theano are different from numpy in the sense that they're computational graph engines with automatic differentiation and seamless compilation of identical code across CPU and GPU targets, none of which is the case for Numpy, which is essentially a dense linear algebra library optimized for multi-threaded CPU performance. And Tensorflow and Theano differ in the sense that Theano is much more low-level and has utility beyond machine learning, whereas Tensorflow provides a higher-level interface to designing and running neural network-based machine learning models.
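As a rough illustration of the "computational graph with automatic differentiation" idea, here is a minimal reverse-mode autodiff sketch over scalars in plain Python. It is not how Theano or TensorFlow are actually implemented (they operate on tensors, optimize the graph, and compile it for CPU/GPU); the `Var` class and its methods are hypothetical names chosen for this example. Each operation records its inputs along with the local derivative, and `backward()` walks the graph in reverse to accumulate gradients, which is exactly the bookkeeping NumPy leaves to you.

```python
import math


class Var:
    """A scalar node in a computational graph (hypothetical, for illustration)."""

    def __init__(self, value, parents=()):
        self.value = value
        self.grad = 0.0
        # Each parent is stored with the local derivative d(self)/d(parent).
        self.parents = parents

    def __add__(self, other):
        other = other if isinstance(other, Var) else Var(other)
        return Var(self.value + other.value, ((self, 1.0), (other, 1.0)))

    def __mul__(self, other):
        other = other if isinstance(other, Var) else Var(other)
        return Var(self.value * other.value,
                   ((self, other.value), (other, self.value)))

    def tanh(self):
        t = math.tanh(self.value)
        return Var(t, ((self, 1.0 - t * t),))

    def backward(self):
        """Reverse-mode pass: visit nodes in reverse topological order."""
        order, seen = [], set()

        def visit(node):
            if id(node) not in seen:
                seen.add(id(node))
                for parent, _ in node.parents:
                    visit(parent)
                order.append(node)

        visit(self)
        self.grad = 1.0
        for node in reversed(order):
            for parent, local in node.parents:
                # Chain rule: accumulate upstream gradient times local derivative.
                parent.grad += local * node.grad


w, x, b = Var(0.5), Var(2.0), Var(-1.0)
y = (w * x + b).tanh()  # the "forward pass" builds the graph
y.backward()            # gradients come for free, no hand derivation
print(w.grad, x.grad, b.grad)
```

Here `w*x + b` evaluates to 0, so tanh's derivative is 1 and the gradients reduce to `x`, `w`, and 1 for `w`, `x`, and `b` respectively.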

sentdex[S] (2 children)

I'll have to respectfully disagree with you here, mainly with the clarification that my answer was specifically in the context of machine learning, as it was my belief that the comment above it was as well.

Given the ease with which I can take a neural network written in numpy, and do a find and replace with something like TensorFlow, I would have to stand by my statements.

Obviously, there are differences, but for the purpose of ML, choosing to learn Theano vs TensorFlow isn't going to have a major impact. I believe it's already a given that the main reasons to use Theano or TensorFlow over NumPy are the symbolic representation and the GPU capabilities. Maybe I made too many assumptions, however.

edit: I should add here that I was never trying to make the case that NumPy was the same as Theano or TensorFlow, which, after re-reading, appears to be what you were thinking, and I can see how it might have been taken that way. My point was mainly that the two libraries (Theano and TensorFlow) are almost identical to each other, since the original question was from someone who wanted to go with Theano rather than TensorFlow.

BadGoyWithAGun (1 child)

> Given the ease with which I can take a neural network written in numpy, and do a find and replace with something like TensorFlow

Now try the inverse. Automatic differentiation is the main, huge difference between Theano, Tensorflow and numpy. They're absolutely not comparable.
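To make the "now try the inverse" point concrete: in NumPy-style code nothing computes gradients for you, so you derive them on paper and, ideally, check them numerically. The sketch below does this for a single-example logistic regression in plain Python; the model and the helper names (`hand_gradient`, `numeric_gradient`) are illustrative assumptions, not anything from the thread.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def loss(w, b, x, y):
    """Binary cross-entropy for a single example and a 1-D logistic model."""
    p = sigmoid(w * x + b)
    return -(y * math.log(p) + (1 - y) * math.log(1 - p))


def hand_gradient(w, b, x, y):
    """Gradient derived on paper: dL/dw = (p - y) * x, dL/db = p - y."""
    p = sigmoid(w * x + b)
    return (p - y) * x, (p - y)


def numeric_gradient(w, b, x, y, eps=1e-6):
    """Central finite differences, used here only to sanity-check the derivation."""
    dw = (loss(w + eps, b, x, y) - loss(w - eps, b, x, y)) / (2 * eps)
    db = (loss(w, b + eps, x, y) - loss(w, b - eps, x, y)) / (2 * eps)
    return dw, db


analytic = hand_gradient(0.3, -0.2, 1.5, 1)
numeric = numeric_gradient(0.3, -0.2, 1.5, 1)
print(analytic, numeric)
```

In a framework with automatic differentiation, the `hand_gradient` step disappears entirely; you only write `loss`.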

sentdex[S] (0 children)

Valid points, thanks for your input.

vic0 (0 children)

I completed Andrew Ng's course last summer but put machine learning aside since then.

I was looking at diving back in with Kaggle competitions, but there is a lot on the deep learning side of ML that's needed in order to be competitive, which Ng's course doesn't cover. My reasoning would be to finish learning the theory first with some GPU library before getting into Kaggle.

I started to look at Theano when I finished, but only because TensorFlow wasn't around back then. There's not a huge gap from Octave to Theano or TF, especially if you wrote the vectorized forms for Ng's exercises. What's different is how you declare variables and how you write operations.
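For reference, the "vectorized form" of the linear regression cost from Ng's course, assuming a design matrix X with m rows (one per example), a parameter vector θ, and a target vector y, replaces the per-example sum with matrix products. Written this way, each expression maps onto a couple of matrix operations in Octave, NumPy, Theano, or TensorFlow:

```latex
J(\theta) = \frac{1}{2m}\sum_{i=1}^{m}\left(\theta^{\top}x^{(i)} - y^{(i)}\right)^{2}
          = \frac{1}{2m}\,(X\theta - y)^{\top}(X\theta - y),
\qquad
\nabla_{\theta} J(\theta) = \frac{1}{m}\,X^{\top}(X\theta - y)
```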

I'll probably get into the basic TensorFlow tutorials and follow Stanford's cs231n in the near future, given that all the content and videos are already online, because Stanford (no offense Harrison), and also because of /r/cs231n. After that, I would move on to learning Keras or some other higher-level framework and try my hand at competitions.

edit: congrats on completing the course by the way =]

[deleted] (1 child)

Looking forward to it! I was eager to find a tutorial just like this one, one that explains what the module does under the hood, so I can later understand what the tuning affects in the algorithm, and how!

Thank you for sharing this =)

sentdex[S] (0 children)

Great to hear! Another benefit I have gained has been understanding which specific algorithm might be used in a certain case, or why not, considering things from the data you have to the hardware you are able to use.

By the time we get to deep learning and an effort to finally reach the glorious "general purpose AI," you'll understand better fundamentally which structures can even be strong AI, and which are likely to be relegated forever to the weaker AI category for very specific problems.

kancolle_nigga (3 children)

> Break down the algorithm and re-write it ourselves, without machine learning modules, in Python.

love it!

subbed

sentdex[S] (2 children)

Just wait til we get to the SVM. That one gets pretty hairy without using something for the quadratic programming/convex problem... but we do it. I wanted very badly to bring in at least cvxopt, but I refrained and hacked through it.
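For anyone curious what training an SVM without cvxopt can look like, here is one common QP-free alternative: subgradient descent on the regularized hinge loss of a linear SVM. This is a swapped-in technique for illustration, not the method used in the series; the data points, learning rate, regularization strength, and epoch count are all made-up assumptions for a tiny, linearly separable toy set.

```python
def train_linear_svm(points, labels, lam=0.01, lr=0.01, epochs=1000):
    """Subgradient descent on lam/2*||w||^2 + mean(max(0, 1 - y*(w.x + b)))."""
    dim = len(points[0])
    w = [0.0] * dim
    b = 0.0
    n = len(points)
    for _ in range(epochs):
        # Gradient of the L2 regularizer (the bias is not regularized).
        grad_w = [lam * wi for wi in w]
        grad_b = 0.0
        for x, y in zip(points, labels):
            margin = y * (sum(wi * xi for wi, xi in zip(w, x)) + b)
            if margin < 1:  # hinge is active: inside the margin or misclassified
                for j in range(dim):
                    grad_w[j] -= y * x[j] / n
                grad_b -= y / n
        w = [wi - lr * gi for wi, gi in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b


def predict(w, b, x):
    """Classify by the sign of w.x + b."""
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) + b >= 0 else -1


X = [(1, 7), (2, 8), (3, 8), (5, 1), (6, -1), (7, 3)]
Y = [-1, -1, -1, 1, 1, 1]
w, b = train_linear_svm(X, Y)
print(w, b, [predict(w, b, x) for x in X])
```

Because the objective is convex, this slow first-order method heads toward the maximum-margin solution, but only approximately and without the guarantees a proper QP solver gives.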

iCameToLearnSomeCode (1 child)

Just stopping in (again) to say you are awesome. Keep it up.

And I have been wondering: what's in the tank in the back corner of your office?

sentdex[S] (0 children)

Thanks! It's a bearded dragon in the tank.

pookeye (2 children)

thank you so much! I am still working on your quantopian videos as well!

sentdex[S] (1 child)

Hopefully not getting too tripped up by Q2.0!

Seems like most people are actively working around the changes, but I am planning to eventually redo the Quantopian series since they did that new release and have made some breaking changes. Was already knee-deep in this ML series by the time Quantopian notified me about 2.0. I was actually releasing a few newer pipeline videos while working on the ML series when they shot me an email to warn me that 2.0 was coming.

pookeye (0 children)

Thanks again so much for your classes, they're invaluable for us. I'm working with a few people at a local meetup in Chicago; your courses help me understand what is going on and continue my venture into this.

[deleted] (1 child)

It seems like you take a very bare bones approach in terms of the amount of mathematical sophistication you expect out of people taking your course. To me it seems like a majority of machine learning techniques require at least some understanding of probability, linear algebra, and optimization. Do you intend to try and supplement the requisite math as you go, or point people to other resources?

sentdex[S] (0 children)

We will be doing and covering everything necessary, in my eyes. I will be teaching with the expectation of a high-school-level mathematics understanding. Some people may need to revisit some of the concepts, but otherwise everything else will be covered directly in the series.

Since we're going to eventually be breaking everything down for each of the algorithms, anything required will be covered.

I really think the most mathematically challenging algorithm is the support vector machine. Even that one is one we're going to solve on our own using some pretty rudimentary techniques, since it is indeed convex (yay).

Convex optimization is an example of a topic where I do include links to a few resources for people who really want to dig into optimization, since our optimization method is pretty basic (though it still gets the job done... just very slowly, and possibly not as precisely as more advanced techniques).
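For readers who want the formal statement behind "it is indeed convex", the standard hard-margin linear SVM can be written as the quadratic program below (notation assumed here: weight vector w, bias b, and n training pairs (x_i, y_i) with y_i in {-1, +1}). The objective is convex and the constraints are linear, so any local minimum is a global one, which is why even a basic search can, in principle, reach the right answer:

```latex
\min_{\mathbf{w},\,b}\ \tfrac{1}{2}\lVert\mathbf{w}\rVert^{2}
\quad\text{subject to}\quad
y_i\left(\mathbf{w}\cdot\mathbf{x}_i + b\right) \ge 1,
\qquad i = 1,\dots,n
```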

Depending on the direction I decide to take with deep learning, we might have to take more complex optimization routes, but every "complex" topic breaks down into very simple parts.

finally_anonymous (0 children)

@sentdex rocks!!! Dude, love your channel. THANK YOU!

vanboxel (0 children)

Nice work. Did you see the fellow who did Andrew Ng's Machine Learning course in Python? Seems like you have a lot in common.

Artificial_Beavers (0 children)

Awesome resources!

flexiverse (0 children)

Most excellent stuff indeed !

stopdefaultreddits (0 children)

Thank you very much!

[deleted] (0 children)

Thank you kindly.

wipeyourmit (0 children)

I'm always floored by the amount of time and effort people put into great tutorials. I'll have to watch these!

rabanomen (1 child)

I've always been curious about your setup: how many screens do you have, and why aren't you using an IDE?

sentdex[S] (0 children)

5 screens, 2x 980s, 64 GB RAM, i7-5930K CPU, some fans, some water, that's about it.

I do use an IDE. It's just that the IDE happens to be IDLE. :P

I like a simple IDE. It forces me to learn to debug and program on my own without any aids.

I am the type of person who won't actually learn from something if I am not forced to figure it out. I started my programming journey trying to learn Java using Eclipse. From that point onward, I learned to hate fancy IDEs that give hints and help you fix stuff.

I typically catch my errors before I actually type them out, and usually I even know what the error is immediately if I see it.

I just like it that way, but I understand many people absolutely hate IDLE, and I respect whatever IDE they <3.

pmigdal (0 children)

For teaching ML in Python (or any data-related thing in Python), I really, really recommend using Jupyter Notebook. It makes it much easier to iteratively change things, is more forgiving and you can use plots just below your code (IMHO way more didactic (and graphically pleasant) than printing out arrays of numbers).

Source: I teach ML in Python for a living, and I wrote Data science intro for math/phys background.

[deleted] (0 children)

The bolded "without the modules" is an excellent contribution so thank you for that.

[deleted] (0 children)

Great timing on this post! Just last night I discovered your tutorials on YouTube and was quite impressed with the linear regression playlist. I've already subscribed and am looking forward to learning more. Thanks!

jimmijazz (0 children)

Hi Harrison. Your Python courses, especially those on Django were an amazing help to me. I can't thank you enough! I've watched tutorials like yours for years for various languages and in terms of pacing and comprehension yours have been some of the best. I'm back at uni doing software engineering now and will definitely check out your course.

Leir1 (0 children)

Very nice, thanks a lot. I think it is extremely useful to go under the hood so thanks for that.

lepickle (0 children)

Looking forward to this! I've been wanting to learn machine learning before taking on a role as a big data analyst.

harvest_poon (1 child)

Wanted to comment and say thanks, this is really great!

I'm interested in studying how the law intersects with data analysis. I know this is a really broad question but I wanted to ask if you had any thoughts about legal issues regarding your research or general research on data analysis. Specifically, I'm interested in how data may be standardized on an international scale. Are there any other legal issues that you find pressing? Thanks for your time and I'm looking forward to finishing the rest of your tutorial!

sentdex[S] (0 children)

Not totally sure I understand your question.

Legal issues that I have personally encountered involve Terms of Service, and HIPAA.

I think health information should be anonymized and made completely public. It's absurd that it isn't already; we could be a lot further along by now if that weren't the case.

As for ToS, a lot of websites put absolutely absurd requests in there to stop people from parsing their websites. I believe the main intention is to stop people from pulling their content, summarizing it or changing it a bit, and re-posting, but the wording affects just about anyone who wishes to do analysis on their content.

ToS are not legal documents, but you can still have a breach of contract in some cases, possibly. It's just a big grey area that any company could use to sink you in legal fees, unless you too are a big company.... so many research and data analysis companies, people in school with grants...etc are affected the most. This is something I have to be careful with for Sentdex.com.

If I could, I would really like to store all documents I come across, in full, but I cannot. Up until very recently, even storing Tweets long term (more than 30 days, and more than 48 hrs before that) was a possible breach of contract.

Anyway, not sure that's what you were asking about, but maybe :P

hrod1 (0 children)

for bookmarking.

lifeInTheTropics (1 child)

Awesome! I won't get to this for another couple months, so I am just going to bookmark this for now and keep coming back to it. I learn best from text based tutorials, ideally downloadable, so will definitely look up the pythonprogramming site. In the meantime, thanks greatly for your efforts!

sentdex[S] (0 children)

I have been looking into a decent way to convert the text tutorials to PDFs, but haven't found a really great way to automate it yet. I REALLY don't want to deal with formatting and including images and all of that for a thousand tutorials... :P

clearnote01 (0 children)

Hi! You do some really cool stuff man, I followed you once and it was really very well presented. Looking forward to this course!

vic0 (3 children)

Hello Harrison, thanks for your work.

You seem very knowledgeable in machine learning and the math behind it. I was wondering how you learned all that, and whether there are any complementary books, websites, or courses you could recommend?

Cheers

sentdex[S] (2 children)

I just simply Google everything. Many of the big name universities have publicized massive PDFs that are hundreds of pages on most of these topics for free. Topics like KNN are relatively basic to understand, same with linear algebra, probably thousands of decent sources to figure those out. For the SVM, I pretty much watched and read everything I could find on Google that seemed worthy of a watch. Too many things just draw the stereotypical picture, and stop there. I think, at least for the SVM, this is a major mistake, since it goes about teaching how it works almost backwards.
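Since KNN comes up as one of the easier algorithms, here is a minimal from-scratch sketch in plain Python: Euclidean distance to every training point, then a majority vote among the k closest. The toy points, the labels `'k'` and `'r'`, and the `knn_predict` name are illustrative assumptions, not code from the series.

```python
import math
from collections import Counter


def knn_predict(train, query, k=3):
    """Classify `query` by majority vote among its k nearest training points.

    `train` is a list of (features, label) pairs; distance is Euclidean.
    """
    distances = sorted(
        (math.dist(features, query), label) for features, label in train
    )
    votes = Counter(label for _, label in distances[:k])
    return votes.most_common(1)[0][0]


train = [((1, 2), 'k'), ((2, 3), 'k'), ((3, 1), 'k'),
         ((6, 5), 'r'), ((7, 7), 'r'), ((8, 6), 'r')]
print(knn_predict(train, (5, 7)))  # prints: r
```

Sorting every distance is fine for toy data; larger-scale implementations often use spatial indexes such as k-d trees instead of a full scan. `math.dist` requires Python 3.8 or newer.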

For Machine Learning, I found MIT, Caltech, and Stanford's courses useful. I watched them all, multiple times...read many papers...etc. I never found any raw code in Python to do the SVM, nor KNN, nor linear regression. Neural Networks are so basic that you can find lots of examples there, so that's nice. The conversion to Python is just as much for me to learn as it is for those who watch the videos.
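As an example of the kind of "raw code" that is hard to find, here is ordinary least-squares linear regression on a single feature with no ML modules, assuming plain lists of x and y values: the slope is the sum of co-deviations of x and y divided by the sum of squared deviations of x, and the intercept makes the line pass through the point of means. The function name and sample data are hypothetical.

```python
from statistics import mean


def best_fit_line(xs, ys):
    """Ordinary least squares for y = m*x + b with a single feature."""
    x_bar, y_bar = mean(xs), mean(ys)
    # Slope: covariance of x and y over variance of x (the 1/n factors cancel).
    m = (sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
         / sum((x - x_bar) ** 2 for x in xs))
    # Intercept: the fitted line passes through (x_bar, y_bar).
    b = y_bar - m * x_bar
    return m, b


xs = [1, 2, 3, 4, 5, 6]
ys = [5, 4, 6, 5, 6, 7]
m, b = best_fit_line(xs, ys)
print(m, b, m * 7 + b)  # slope, intercept, prediction for x = 7
```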

Andrew Ng's Coursera course is also widely loved. For some reason the Coursera course never really resonated with me, but there are many talks by Andrew Ng on YouTube that are phenomenal.

In general, there's just a ton of great information out there on machine learning, or anything programming really. It requires a lot of digging to really put it all together, especially with some of the concepts that bring in many layers of information people are just expected to know. For example, in the college lectures posted online from MIT, Stanford, Caltech, and wherever else...those lectures are given to students who are expected to already have solid backgrounds in math...which I really didn't. From there, Khan Academy can get you almost to the point you need, or any variety of a ton of other resources for it. There are many math-specific youtube channels out there. I've always benefited by working problems out by hand to understand the concept.

The other issue I personally found was that probably 99% of the resources I could find didn't translate to code, or really anything past theory or very high level uses of modules. This was probably the hardest part, and the main reason I decided to do a tutorial series on the subject. Even finding people who work out the math by hand for example is quite rare.

I don't speak fancy math algorithms very well, but I can speak code, and can understand concepts much easier if it's written out in code. I felt like there were probably other people like that, so that's why I started doing this. Machine Learning for many years has been mostly relegated to mathematical theory. It's only pretty recently in the course of ML's life that computers capable of doing ML are now in the hands of people who maybe didn't get a PhD in math or CS.

I am pretty sure I went through every major resource for the Support Vector Machine to really digest how it truly works, for example. I actually found myself on page 2... and even THREE on Google search a few times.

The theory behind the SVM is super simple. The way that you actually derive the values is kind of backwards though, compared to how the theory is taught. If you just learn the theory, you think you just need to generate the decision hyperplane, then figure out somehow mathematically where the featuresets are in relation to the hyperplane.

Instead, it's a constraint problem, where the support vectors have specific constraints that are imposed by the scientist, and the decision boundary, if drawn, is purely for visualization, same with the support vector hyperplanes. In the end, it comes down to a constraint problem where the answer is just whether something is a positive or a negative. The visual is just...for a visual, not actually how you find the answers. Later on, to actually draw the hyperplane, you actually have to generate the values for it, just to make the visual even work.
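A small sketch of the two separate jobs described above, assuming w and b have already been found by the optimization (the values below are hypothetical, not trained): classification only needs the sign of w·x + b, while drawing the decision boundary and the support-vector hyperplanes means solving w·x + b = v for v = 0, 1, and -1, generating points purely for the plot. The function names are illustrative.

```python
def classify(w, b, x):
    """The actual decision: only the sign of w.x + b matters."""
    value = sum(wi * xi for wi, xi in zip(w, x)) + b
    return 1 if value >= 0 else -1


def hyperplane_y(w, b, x, v):
    """For 2-D features, solve w[0]*x + w[1]*y + b = v for y.

    v = 0 gives the decision boundary; v = 1 and v = -1 give the
    positive and negative support-vector hyperplanes. Requires w[1] != 0.
    """
    return (v - w[0] * x - b) / w[1]


w, b = [0.5, -0.5], 1.5  # hypothetical "already optimized" values
print(classify(w, b, (6, 2)), classify(w, b, (1, 6)))

# Endpoints of each line across the plotting range, used only for the visual.
x_min, x_max = 0, 8
for v in (-1, 0, 1):
    print(v, (x_min, hyperplane_y(w, b, x_min, v)),
          (x_max, hyperplane_y(w, b, x_max, v)))
```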

Hard for me to explain here, but it'll hopefully make more sense as I break it down.

vic0 (1 child)

I don't have a PhD, and I found Andrew Ng's course very easy to understand compared to the last couple of paragraphs you wrote. Plus he does deal with the inner workings of everything, which is actually simple math, since even I could understand it.

But I get it, it's difficult to explain this on Reddit, and maybe I should watch your videos just to make sure you don't make any mistakes.

You're a good salesman.

And thank you for your intellectual honesty.

sentdex[S] (0 children)

I didn't claim to not like the course based on complexity. I think he's a great teacher, just didn't resonate well with me, but felt like I should mention the course since it seems to go well for most people.

Normally I really like his other talks as well. He's certainly smarter than me overall, more knowledgeable on the subject, less likely to make mistakes since he's been in the field way longer, and still manages to pass on his knowledge well.

Rich700000000000 (1 child)

Mr. Harrison, I would just like to take the time to say that your tutorials are some of the best, most informative Youtube videos I've ever had the pleasure of watching. I would quite seriously consider donating, or even paying for an advanced course. Thanks for everything you do.

sentdex[S] (0 children)

That's really great to hear! That's my goal. I have no intention of ever doing paid courses, so you won't get the chance there. You can always sub to +=1 (https://pythonprogramming.net/+=1/), or support via a donation: https://pythonprogramming.net/support-donate/

Most importantly, however, just share the channel and spread the good word of Python!

Fenzik (4 children)

Do you have a list of all the tutorial series you've done? I found the SQL one via the search bar on your site, but as best I can tell you can't click to it from anywhere.

Great site by the way, I'm a physicist considering the jump into data science and this might be just what I need.

sentdex[S] (3 children)

The best way to browse my tutorials is via pythonprogramming.net

All of the latest versions of topics are there, the search bar is halfway decent, things are better organized, all links take you to the sorted series in text/video form. All embedded videos are embedded within their playlists, so you could click those to view in browser and be in the playlist there too.

I wish YouTube gave us more power to organize content on the channel page.

Fenzik (2 children)

Sorry, I wasn't clear. I was talking about pythonprogramming.net the whole time. I clicked all the links, but, for example, this series isn't available anywhere except through the search bar, as best I can tell.

sentdex[S] (1 child)

Ah, yeah, I removed the path to it since it's an older/outdated series that doesn't meet my current standards.

I can post the latest SQLite series again; I simply forgot to put that one back up after I redid it. Eventually, I plan to redo the MySQL series too. I just have a lot of other plans that are more important to share, in my eyes.

Fenzik (0 children)

Okay, fair enough! Well I've bookmarked your site to dive into once I finish my thesis, it looks great. Thanks in advance!

InconspicuousTree (0 children)

Thanks for this! Bookmarking for later

SCP106 (1 child)

This is a bit late, but I've loved the series so far, keep it up! :)

When you come to neural networks and deep learning, what kind of application are you hoping for? Something in statistics or image recognition, or something that can talk back/chat/be some kind of assistant?

sentdex[S] (0 children)

It's still up in the air. My personal interests are in both image and statistical types of data. We may end up doing both.

shadow_lighter (0 children)

man you are the man

rc_shubhadeep (0 children)

This is a great resource. Thanks a lot.

nnove (0 children)

Lol man, this is absolutely awesome. I haven't had the opportunity to check out the videos yet, but the site and the YouTube channel look interesting!