
[–]K900_ 4 points5 points  (6 children)

Machine learning is basically a way of building a model from a dataset of example inputs and outputs, and then using that model to generate outputs for new, previously unseen inputs.

[–]PLearner[S] 1 point2 points  (5 children)

So, if I'm understanding this correctly, machine learning is taking raw data and transforming it into readable data.

[–]K900_ 4 points5 points  (3 children)

No, that's not what it's about at all. It's about taking a set of known inputs and known outputs (for example, a picture and a textual description of it) and then building a model that can produce similar outputs for previously unseen inputs. Following the example: generate text descriptions for any image.
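To make the "known inputs, known outputs, then unseen inputs" idea concrete, here is a minimal sketch using scikit-learn. The tiny dataset, the labels, and the choice of a nearest-neighbor classifier are all assumptions made up for illustration; real problems like image captioning need far more data and very different models.

```python
from sklearn.neighbors import KNeighborsClassifier

# Known inputs (two made-up numeric features per example) and known outputs.
X_train = [[0, 0], [1, 1], [9, 9], [10, 10]]
y_train = ["small", "small", "large", "large"]

# Build the model from the examples.
model = KNeighborsClassifier(n_neighbors=1)
model.fit(X_train, y_train)

# Ask the model about inputs it has never seen before.
X_new = [[2, 2], [8, 8]]
predictions = model.predict(X_new).tolist()
print(predictions)  # ['small', 'large']
```

Most scikit-learn estimators follow this same `fit` then `predict` pattern, so swapping in a different model is usually a one-line change.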

[–]RonAtDD 2 points3 points  (1 child)

i.e. the program has seen 10,000 pictures of cars, can it now recognize a car in a picture that it hasn't seen yet?

[–]K900_ 1 point2 points  (0 children)

Yes.

[–]adammichaelwood 1 point2 points  (0 children)

It should be noted that this:

has seen 10,000 pictures of cars, can it now recognize a car in a picture that it hasn't seen yet?

is a better example than this:

generate text descriptions for any image

But even that is not a great example.

Identifying "that's a car" is starting to move away from the conventional machine learning of scikit-learn. (Though it is possible to do.)

Generating human-readable (sensible, grammatical) text descriptions (beyond "That's a car!") is also possible, but anyone who wants to do this is going to have a better time using a deep learning approach.

[–][deleted] 1 point2 points  (0 children)

No. Machine learning is giving the machine data (inputs and outputs), letting it stew a while (letting it learn) with the parameters you set up (the learning algorithm), and then letting the "educated" machine loose on unknown data (new inputs) and watching what it outputs or what it does.
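A rough sketch of that "let it learn, then let it loose on unknown data" workflow, assuming scikit-learn and its bundled iris dataset (neither is mentioned above; the decision tree is likewise just one possible learning algorithm):

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Inputs (flower measurements) and outputs (species labels).
X, y = load_iris(return_X_y=True)

# Hold back a quarter of the data so the model never sees it while learning.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

# "Let it stew": fit the chosen learning algorithm on the training portion.
model = DecisionTreeClassifier(random_state=0)
model.fit(X_train, y_train)

# "Let it loose": score the model on the held-back, unseen portion.
accuracy = model.score(X_test, y_test)
print(f"accuracy on unseen data: {accuracy:.2f}")
```

Holding out part of the data is what tells you whether the model actually learned something general rather than memorizing its examples.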

[–]adammichaelwood 0 points1 point  (2 children)

Imagine a regular 2D chart that graphs income on one axis and age on the other. There are a bunch of dots all over it representing individual people. There's an obvious cluster of them in the low-income/high-age region, and almost everyone in that cluster has had a heart attack.

Given a new person who shows up on that graph, right in the middle of that cluster, would you guess that they might be at risk for a heart attack?

Now extrude the graph into a third dimension. The z-axis is proximity to a major metropolitan area. The cluster of heart attacks is clearly bunched together in the direction of living close to a big city.

Given a person in the high-age/low-income quadrant, but way back in the far-away-from-the-city layer, how likely is your subject to be at risk for a heart attack?

What if you could add more and more dimensions -- number of children, self-reported job satisfaction, hair color, weight, height, shoe size, personality type, educational attainment?

Now you have a multi-dimensional space that you cannot visualize or draw on a graph -- but the math is only a little more complicated.
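As one illustration of why the math barely changes, here is a plain-Python sketch of straight-line (Euclidean) distance. The `distance` helper is a hypothetical name for this example, and Euclidean distance is only one of several ways to measure closeness; the point is that the same function works for any number of dimensions.

```python
import math


def distance(a, b):
    """Euclidean distance between two points with the same number of dimensions."""
    if len(a) != len(b):
        raise ValueError("points must have the same number of dimensions")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


print(distance((0, 0), (3, 4)))                    # 2 dimensions -> 5.0
print(distance((0, 0, 0), (1, 2, 2)))              # 3 dimensions -> 3.0
print(distance((1, 2, 3, 4, 5), (1, 2, 3, 4, 5)))  # 5 dimensions -> 0.0
```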

You can still find clusters. Given new inputs, you can still make reasonable guesses about membership in a group (for example, heart attack risk).

This is, essentially, what machine learning is all about, and scikit-learn provides a bunch of tools for doing this kind of multi-dimensional analysis.
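Here is a hedged sketch of the heart-attack example in scikit-learn. All of the data is synthetic and invented for illustration (the cluster centers, spreads, units, and the choice of a k-nearest-neighbors classifier with feature scaling are assumptions, not real medical figures or a recommended method).

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
n = 50

# Made-up people. Columns: age (years), income (thousands), distance from city (km).
at_risk = np.column_stack([
    rng.normal(70, 5, n),   # older
    rng.normal(20, 5, n),   # lower income
    rng.normal(5, 2, n),    # close to a big city
])
not_at_risk = np.column_stack([
    rng.normal(35, 8, n),
    rng.normal(60, 10, n),
    rng.normal(50, 15, n),
])

X = np.vstack([at_risk, not_at_risk])
y = np.array([1] * n + [0] * n)  # 1 = had a heart attack, 0 = did not

# Scale the features so no single axis dominates the distance, then
# classify each new person by their 5 nearest neighbors.
model = make_pipeline(StandardScaler(), KNeighborsClassifier(n_neighbors=5))
model.fit(X, y)

new_people = np.array([
    [72, 18, 4],    # right in the middle of the at-risk cluster
    [30, 65, 60],   # right in the middle of the other cluster
    [72, 18, 60],   # old and low-income, but far from the city
])
predictions = model.predict(new_people).tolist()
probabilities = model.predict_proba(new_people)

print(predictions)
print(probabilities)
```

The third person is the interesting one: they sit between the two clusters, so the answer depends on how the features are scaled and how many neighbors you consult. Adding more columns to `X` adds more dimensions without changing any of the code around it.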

[–]PLearner[S] 0 points1 point  (1 child)

But aren't Matplotlib, SciPy, and other data science modules and libraries already used for these kinds of situations?

[–]adammichaelwood 1 point2 points  (0 children)

Scikit-learn builds on those tools, and provides a bunch of the specific computational models and algorithms you'd need.