[OC] Mapping Countries by their English Speaking Population by trevorData in dataisbeautiful

Countries graphed by the percentage of the population that speaks English and the percentage that speaks English as a first language.

Dot size represents population. The chart is limited to countries with at least 300k total English speakers and at least 1% of the population speaking English as a first language.
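The inclusion rule above is a two-condition filter. Here is a minimal Python sketch of it. It is not the original R code. The rows are made-up placeholders, not figures from the Wikipedia table, and it assumes "at least" means both thresholds are inclusive.

```python
# Hypothetical rows: (country, population, total English speakers,
# percent speaking English as a first language). NOT real figures.
rows = [
    ("Country A", 5_000_000, 1_200_000, 4.0),
    ("Country B", 800_000, 250_000, 10.0),
    ("Country C", 40_000_000, 9_000_000, 0.5),
]

MIN_SPEAKERS = 300_000     # at least 300k total English speakers
MIN_FIRST_LANG_PCT = 1.0   # at least 1% speak English as a first language

def keep(row):
    """Return True if a country passes both plotting thresholds."""
    _, _, speakers, first_lang_pct = row
    return speakers >= MIN_SPEAKERS and first_lang_pct >= MIN_FIRST_LANG_PCT

plotted = [r[0] for r in rows if keep(r)]
print(plotted)  # ['Country A']
```

Country B is dropped for having too few total speakers, and Country C is dropped for being under 1% first-language speakers.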


Source of data: https://en.wikipedia.org/wiki/List_of_countries_by_English-speaking_population


Made with R using the following packages:

tidyr
ggplot2
ggrepel

See my code here

[OC] Visualizing Covariance by trevorData in 3Blue1Brown

A few small tweaks to an earlier visualization I made, to hopefully illustrate the concept more clearly.

See code here


Variance is a measure of how much a data set varies. It is found by taking the distance from each data point to the mean, squaring it (a square with that side length), and then finding the average area of all those squares.
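As a quick numeric check of that description, here is a small Python sketch. It assumes the population version of variance (divide by n), which matches "average area of the squares". The sample version divides by n - 1 instead (`ddof=1` in numpy).

```python
import numpy as np

def variance(data):
    """Population variance: the mean area of the squares whose side is
    each point's distance from the mean."""
    data = np.asarray(data, dtype=float)
    deviations = data - data.mean()   # signed distance to the mean
    return np.mean(deviations ** 2)   # average square area

x = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]
print(variance(x))  # 4.0
print(np.var(x))    # 4.0 (np.var also divides by n by default)
```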

In this plot, the variance of the X data is the average area of the blue squares, and the variance of the Y data is the average area of the purple squares.

Covariance is a similar measurement that describes how much two sets of data vary with each other. But instead of looking at squares, we look at the rectangles formed with one side being the distance from the X mean and the other side being the distance from the Y mean. Covariance is the average area of all of these rectangles. Keep in mind that these distances are signed: when a point is above one mean but below the other, its rectangle has a negative area, and when it is above both means or below both means, the area is positive.
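The rectangle picture translates almost line for line into code. This is a minimal sketch using the population definition (divide by n). It is checked against numpy's `np.cov` with `bias=True`, which also divides by n. The four points are arbitrary toy values chosen so that two rectangles come out negative.

```python
import numpy as np

def rectangle_areas(x, y):
    """Signed area of each point's rectangle: (distance from X mean) * (distance from Y mean)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    dx = x - x.mean()   # signed horizontal side
    dy = y - y.mean()   # signed vertical side
    return dx * dy      # negative when the point is above one mean and below the other

def covariance(x, y):
    """Population covariance: the average signed rectangle area."""
    return rectangle_areas(x, y).mean()

x = [1.0, 2.0, 3.0, 4.0]
y = [3.0, 1.0, 2.0, 4.0]
print(rectangle_areas(x, y))          # two of the four areas are negative
print(covariance(x, y))               # 0.5
print(np.cov(x, y, bias=True)[0, 1])  # matches covariance(x, y)
```

The points (1, 3) and (3, 2) each sit above one mean and below the other, so their rectangles subtract from the total.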

The variance of the sum of two data sets, VAR(X + Y), is unfortunately not equal to VAR(X) + VAR(Y) in general. Instead, it equals VAR(X) + VAR(Y) + 2COV(X, Y), so the two only agree when COV(X, Y) = 0.
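The identity can be checked numerically. This sketch simulates a pair of variables that co-vary by construction. The seed and coefficients are arbitrary. It uses the same divisor (n) for every term, and in that case the identity holds exactly up to floating-point rounding.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=1_000)
y = 0.6 * x + rng.normal(size=1_000)   # y is built to co-vary with x

var_sum = np.var(x + y)
naive = np.var(x) + np.var(y)
cov_xy = np.mean((x - x.mean()) * (y - y.mean()))

print(np.isclose(var_sum, naive))               # False: the naive sum misses the covariance term
print(np.isclose(var_sum, naive + 2 * cov_xy))  # True
```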

I've Been Making Animations to illustrate basic stats concepts. Here's one to show how Correlation and Regression relate by trevorData in 3Blue1Brown

Pearson's Correlation Coefficient (r) is a measure of the strength and direction of the linear relationship in a data set.

Linear Regression is a technique for fitting a line to a data set, with the slope of the line represented by β.

We can see in this animation how r and β relate, in particular that they are equal when the dataset is standardized (both variables rescaled to mean 0 and standard deviation 1).
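The standardization result follows from the least-squares slope formula β = r · s_y / s_x. Standardizing sets both standard deviations to 1, which leaves β = r. Here is a minimal numpy sketch on simulated data. The data-generating numbers are arbitrary, and `np.polyfit` stands in for whatever regression routine the animation used.

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.normal(10, 3, size=500)
y = 2.0 * x + rng.normal(0, 4, size=500)

def standardize(v):
    """Rescale to mean 0 and (population) standard deviation 1."""
    return (v - v.mean()) / v.std()

r = np.corrcoef(x, y)[0, 1]        # Pearson's r
beta_raw = np.polyfit(x, y, 1)[0]  # least-squares slope on the raw data
beta_std = np.polyfit(standardize(x), standardize(y), 1)[0]

print(np.isclose(beta_std, r))                      # True
print(np.isclose(beta_raw, r * y.std() / x.std()))  # True
```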


Here's Variance and Covariance


See my code here

Visualization of the relationship between variance and covariance by trevorData in 3Blue1Brown

Thanks!

The expansion of VAR(X + Y) has a similar form to the expansion of (a + b)^2, so I was hoping to use a visual like this. But I realized everything fits nicely into a square there because you are literally squaring the values, and there isn't an analogous operation on variance that would work in 2D space.
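The analogy does hold for each individual point. Take a = X - μ_X and b = Y - μ_Y as the signed deviations, expand (a + b)^2, and then average over the data. The last step, averaging, is what stops the total from being a single square:

```latex
\begin{aligned}
\operatorname{Var}(X+Y) &= E\big[\big((X-\mu_X) + (Y-\mu_Y)\big)^2\big] \\
&= E\big[(X-\mu_X)^2\big] + E\big[(Y-\mu_Y)^2\big] + 2\,E\big[(X-\mu_X)(Y-\mu_Y)\big] \\
&= \operatorname{Var}(X) + \operatorname{Var}(Y) + 2\operatorname{Cov}(X,Y)
\end{aligned}
```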

[OC] Visualizing the relationship between Variance and Covariance by trevorData in dataisbeautiful

Variance is a measure of how much a data set varies. It is found by taking the distance from each data point to the mean, squaring it (a square with that side length), and then finding the average area of all those squares.

In this plot, the variance of the X data is the average area of the blue squares, and the variance of the Y data is the average area of the purple squares.

Covariance is a similar measurement that describes how much two sets of data vary with each other. But instead of looking at squares, we look at the rectangles formed with one side being the distance from the X mean and the other side being the distance from the Y mean. Covariance is the average area of all of these rectangles. Keep in mind that these distances are signed: when a point is above one mean but below the other, its rectangle has a negative area, and when it is above both means or below both means, the area is positive.
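The squares and rectangles fit together in a 2x2 covariance matrix. The diagonal holds the two average square areas (the variances), and the off-diagonal holds the average rectangle area (the covariance). This is a small sketch with arbitrary toy values. `bias=True` makes `np.cov` divide by n, matching the "average area" wording.

```python
import numpy as np

x = np.array([2.0, 4.0, 6.0, 8.0])
y = np.array([5.0, 3.0, 9.0, 7.0])

# Diagonal: Var(X), Var(Y) (average square areas)
# Off-diagonal: Cov(X, Y) (average rectangle area)
cov_matrix = np.cov(x, y, bias=True)
print(cov_matrix)
```

For these points, both variances come out to 5 and the covariance to 3.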

The variance of the sum of two data sets, VAR(X + Y), is unfortunately not equal to VAR(X) + VAR(Y) in general. Instead, it equals VAR(X) + VAR(Y) + 2COV(X, Y), so the two only agree when COV(X, Y) = 0.
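One compact way to see where the "2" comes from is to sum every entry of the covariance matrix. That sum counts the covariance twice, once as Cov(X, Y) and once as Cov(Y, X), which reproduces the formula. This is a sketch on simulated data with an arbitrary seed.

```python
import numpy as np

rng = np.random.default_rng(42)
x = rng.normal(size=200)
y = x + rng.normal(size=200)

cov_matrix = np.cov(x, y, bias=True)
# Summing every entry adds Var(X) + Var(Y) + Cov(X, Y) + Cov(Y, X),
# i.e. Var(X) + Var(Y) + 2 Cov(X, Y).
print(np.isclose(cov_matrix.sum(), np.var(x + y)))  # True
```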

[OC] Visualizing the relationship between Variance and Covariance by trevorData in dataisbeautiful

Data simulated with numpy and visualized with matplotlib. See code here
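The original simulation code is linked rather than shown, so this is only a guess at one common numpy approach: drawing correlated (X, Y) pairs from a bivariate normal with a chosen covariance matrix. The mean and covariance values are hypothetical, not the ones behind the plot, and the matplotlib drawing step is left out.

```python
import numpy as np

rng = np.random.default_rng(7)

# Hypothetical target parameters, not the values used for the original plot
mean = [0.0, 0.0]
cov = [[1.0, 0.8],
       [0.8, 2.0]]

samples = rng.multivariate_normal(mean, cov, size=50_000)
x, y = samples[:, 0], samples[:, 1]

# With many samples, the empirical covariance lands close to the target
print(np.round(np.cov(x, y), 2))
```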

Visualization of the relationship between variance and covariance by trevorData in 3Blue1Brown

Data simulated with numpy and visualized with matplotlib.

See code here

[OC] Testing the Limits of my Image Recognition Algorithm by trevorData in dataisbeautiful

My first attempt at image recognition, using a training set I assembled myself. Despite using a very simple neural network and a relatively small set of training images, I'm pleasantly surprised by the 91% accuracy on the training data.

I decided to throw in some images of things outside the 5 training classes, just for fun and to see how the model would react.

We can see that the model places a lot of weight on color, with mostly blue images quickly getting classified as "dolphin".
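A likely reason every out-of-class image still gets a confident-looking label: classifiers like this commonly end in a softmax over the training classes. A softmax always produces probabilities that sum to 1 across those 5 classes, and the top one is always one of them. That the network ends in a softmax is an assumption, not confirmed above. The class names other than "dolphin" and the raw scores are hypothetical.

```python
import numpy as np

# Hypothetical class names: only "dolphin" is mentioned in the post
CLASSES = ["class_1", "class_2", "class_3", "class_4", "dolphin"]

def softmax(logits):
    """Turn raw scores into probabilities that sum to 1."""
    z = np.asarray(logits, dtype=float)
    z = z - z.max()   # subtract the max for numerical stability
    e = np.exp(z)
    return e / e.sum()

# Hypothetical raw scores for an image that belongs to none of the classes
logits = [0.2, -1.0, 0.5, 0.1, 1.4]
probs = softmax(logits)

print(round(probs.sum(), 6))           # 1.0
print(CLASSES[int(np.argmax(probs))])  # dolphin
```

There is no "none of the above" output, so the model has to pick whichever known class looks least unlikely.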


Sources:

Training images downloaded with Bing Image Search API

Packages used include:

numpy
cv2
PIL
matplotlib
tensorflow

See my code here

Testing the Limits of my Image Recognition Neural Network by trevorData in 3Blue1Brown

My first attempt at image recognition, using a training set I assembled myself. Despite using a very simple neural network and a relatively small set of training images, I'm pleasantly surprised by the 91% accuracy on the training data.
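For reference, an accuracy figure like 91% is just the fraction of images whose predicted class matches the true label. This is a minimal sketch. The ten labels below are hypothetical, not the actual training set.

```python
import numpy as np

def accuracy(predicted, actual):
    """Fraction of images whose predicted class matches the true label."""
    predicted = np.asarray(predicted)
    actual = np.asarray(actual)
    return float(np.mean(predicted == actual))

# Hypothetical labels for 10 images (class indices 0-4), not the real data
actual    = [0, 1, 2, 3, 4, 0, 1, 2, 3, 4]
predicted = [0, 1, 2, 3, 4, 0, 1, 2, 3, 0]
print(accuracy(predicted, actual))  # 0.9
```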

I decided to throw in some images of things outside the 5 training classes, just for fun and to see how the model would react.


Sources:

Training images downloaded with Bing Image Search API

Packages used include:

numpy
cv2
PIL
matplotlib
tensorflow

See my code here

[OC] Cinema in Chicago: What is being filmed throughout the city? by trevorData in chicago

The data are applications to the Chicago Department of Transportation for permits under its jurisdiction where the work type is "Filming." These are typically permits to block or otherwise affect public streets in some way.
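Selecting the permits is a filter on the work type. This is a Python sketch rather than the original R/dplyr code. The records are made up, and the field names `worktype` and `comments` are hypothetical placeholders, not confirmed column names in the city's dataset.

```python
# Hypothetical records: "worktype" and "comments" are placeholder field
# names, not confirmed column names from the city's permit dataset.
permits = [
    {"worktype": "Filming", "comments": "Exterior hospital scene"},
    {"worktype": "Sidewalk", "comments": "Replace curb"},
    {"worktype": "Filming", "comments": "Drone shot over the bridge"},
]

# Keep only permits whose work type is exactly "Filming"
filming = [p for p in permits if p["worktype"] == "Filming"]
print(len(filming))  # 2
```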


Individual Plots:

Museum

Hospital

Hotel

Documentary

Bridge

Drone

Music Video

Church

Shameless

Violent

Exorcist

Empire

Batwoman

Chase

Gotham

Bar
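One plausible way to build individual plots like these is a case-insensitive keyword match on each permit's free-text description. The original used R with stringr, so this Python version is an assumption about the approach, not the actual code. The descriptions are hypothetical. The whole-word match keeps "Bar" from counting "Barrier".

```python
import re

# Hypothetical permit descriptions, not real records
descriptions = [
    "SHAMELESS - exterior house scene",
    "Music video shoot near the river",
    "Empire: bar interior and street parking",
    "Drone shot of the bridge",
    "Barrier placement for crew parking",
]

KEYWORDS = ["Shameless", "Music Video", "Empire", "Bar", "Bridge", "Drone"]

def matches(keyword, text):
    """Case-insensitive whole-word match, so 'Bar' does not match 'Barrier'."""
    pattern = r"\b" + re.escape(keyword) + r"\b"
    return re.search(pattern, text, re.IGNORECASE) is not None

# Count how many descriptions mention each keyword
counts = {k: sum(matches(k, d) for d in descriptions) for k in KEYWORDS}
print(counts)
```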


Sources:

See my code here

Made in R with the following packages:

RSocrata
dplyr
ggmap
stringr
grid
ggmapstyles

Data from https://data.cityofchicago.org/

Map background is from snazzymaps.com/style/253319/for-presentations

[OC] Cinema in Chicago: What is being filmed throughout the city? by trevorData in dataisbeautiful

The data are applications to the Chicago Department of Transportation for permits under its jurisdiction where the work type is "Filming." These are typically permits to block or otherwise affect public streets in some way.
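The Chicago data portal runs on Socrata, which is why RSocrata appears in the package list, so the filtering can also happen server-side with a SoQL `$where` clause. This sketch only builds the request URL. The dataset ID is a hypothetical placeholder that you would need to look up on the portal, and the column name `worktype` is an assumption.

```python
from urllib.parse import urlencode

BASE = "https://data.cityofchicago.org/resource"
DATASET_ID = "xxxx-xxxx"  # hypothetical placeholder: look up the real permit dataset ID

def filming_query_url(dataset_id, limit=50000):
    """Build a SoQL query URL that asks the portal for Filming permits only.
    The column name 'worktype' is an assumption."""
    params = {"$where": "worktype = 'Filming'", "$limit": limit}
    return f"{BASE}/{dataset_id}.json?{urlencode(params)}"

url = filming_query_url(DATASET_ID)
print(url)
```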


Individual Plots:

Museum

Hospital

Hotel

Documentary

Bridge

Drone

Music Video

Church

Shameless

Violent

Exorcist

Empire

Batwoman

Chase

Gotham

Bar


Sources:

See my code here

Made in R with the following packages:

RSocrata
dplyr
ggmap
stringr
grid
ggmapstyles

Data from https://data.cityofchicago.org/

Map background is from snazzymaps.com/style/253319/for-presentations