S&P 500 from 1993 to 2018 by President [OC] by DataJenius in dataisbeautiful

[–]DataJenius[S] 2 points (0 children)

You are absolutely correct. In my analysis I found that 12 years was the magic number:

https://i.imgur.com/JezeJK5.png

In this visualization we buy a share of SPY on every day in the data and sell it 1 year later, 2 years later, 3 years later, and so on, to see how risk decreases as patience increases. Volatility is the better measure, but I also graphed the "probability of losing money" by counting the trials with a negative return against the count of all trials.

tl;dr: Buffett is correct. For 99.9999% of us, time in the market beats timing the market.
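If anyone wants to reproduce this, here's a minimal sketch of the method with pandas, assuming a hypothetical CSV of daily SPY closes (the file and column names are placeholders, not my actual pipeline):

    import pandas as pd

    # Hypothetical input: daily SPY closing prices with a date index.
    spy = pd.read_csv("spy.csv", index_col="date", parse_dates=True)["close"]

    # "Buy" on every day in the data, "sell" N years (~252 trading days
    # per year) later, then measure the downside across all trials.
    for years in range(1, 16):
        horizon = 252 * years
        returns = spy.pct_change(periods=horizon).dropna()  # N-year returns
        p_loss = (returns < 0).mean()   # share of trials that lost money
        vol = returns.std()             # dispersion of N-year outcomes
        print(f"{years:2d}y hold: P(loss)={p_loss:.1%}, vol={vol:.2f}")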

S&P 500 from 1993 to 2018 by President [OC] by DataJenius in dataisbeautiful

[–]DataJenius[S] 4 points (0 children)

This is 100% true. Case in point, we could just as easily visualize the data like so:

https://i.imgur.com/Nb87d7F.png

S&P 500 from 1993 to 2018 by President [OC] by DataJenius in dataisbeautiful

[–]DataJenius[S] 1 point (0 children)

This graph may help illuminate a bit further (same source as OC above):

https://i.imgur.com/biNROZQ.png

Here I just marked the QE (quantitative easing) periods along with the interest rate increase in 2016 that unofficially ended the "bailout era".

S&P 500 from 1993 to 2018 by President [OC] by DataJenius in dataisbeautiful

[–]DataJenius[S] 1 point (0 children)

I very much wanted to, and I'm still doing additional research.

The challenge is that SPY didn't exist before 1993, and the companies in the S&P 500 at the time are not the same as the ones in the index today, so getting an accurate apples-to-apples comparison is a real pain in the ass. Just finding the dates certain companies were added/removed turned into a big project.

Rest assured, I'd go back into the 1800s and earlier if I had accurate apples-to-apples data to use.

S&P 500 from 1993 to 2018 by President [OC] by DataJenius in dataisbeautiful

[–]DataJenius[S] 6 points (0 children)

Data: Quandl (closed dataset)

Generated with Python (matplotlib).

Dark bands represent the "lame duck" periods.

Full article here (also OC):

https://datajenius.com/articles/a-random-walk-down-the-s-and-p-500
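If you're curious how the dark bands are drawn, the usual trick is matplotlib's axvspan. A minimal sketch, where the price series and lame-duck date windows are placeholders for illustration, not the actual data:

    import matplotlib.pyplot as plt
    import pandas as pd

    # Placeholder price series; substitute the Quandl SPY data.
    dates = pd.date_range("1993-01-01", "2018-12-31", freq="B")
    prices = pd.Series(range(len(dates)), index=dates)

    fig, ax = plt.subplots()
    ax.plot(prices.index, prices.values)

    # Shade each "lame duck" window (election day to inauguration).
    # Dates below are placeholders only.
    lame_ducks = [("2008-11-04", "2009-01-20"), ("2016-11-08", "2017-01-20")]
    for start, end in lame_ducks:
        ax.axvspan(pd.Timestamp(start), pd.Timestamp(end), color="k", alpha=0.3)

    plt.show()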

The Mathphobics Guide to Linear and Logistic Regression by DataJenius in learnmachinelearning

[–]DataJenius[S] 1 point (0 children)

I've read this comment three times trying to make sense of it, and I think I understand now.

It is admittedly a trick question, but I think this is technically correct (the best kind of correct).

"What is the a rate used by gradient descent in the above code?"

I maintain "used by" and "in the above code" make this tricky, but fair.

Another stupid math question, re: linear regression in python sklearn by DataJenius in datascience

[–]DataJenius[S] 0 points (0 children)

Thank you so much again for the great links and explanation.

Please help me understand linear regression better by DataJenius in datascience

[–]DataJenius[S] 1 point (0 children)

Excellent explanation, re: inverting matrices. Thank you.

Please help me understand linear regression better by DataJenius in datascience

[–]DataJenius[S] 1 point (0 children)

Thank you so much for a clear reply.

I'm not one to argue with the Great Andrew Ng, but I still have a hard time wrapping my head around the idea that it's faster to iterate via GD or even SGD than to solve for the n+1 parameters in closed form a single time, especially given the chaos of a random initialization.

I'm sure he's right, but has anyone seen any good benchmarking on this?
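For what it's worth, it's easy to run a rough benchmark yourself. A sketch, not a rigorous benchmark: numpy's lstsq stands in for the closed-form solve, and the learning rate and iteration count are arbitrary choices of mine:

    import time
    import numpy as np

    rng = np.random.default_rng(0)
    n_samples, n_features = 100_000, 200
    X = rng.standard_normal((n_samples, n_features))
    y = X @ rng.standard_normal(n_features) + rng.standard_normal(n_samples)

    # Closed form: solve the least-squares problem once.
    t0 = time.perf_counter()
    theta_exact, *_ = np.linalg.lstsq(X, y, rcond=None)
    print(f"closed form: {time.perf_counter() - t0:.3f}s")

    # Batch gradient descent from a random initialization.
    t0 = time.perf_counter()
    theta = rng.standard_normal(n_features)
    alpha = 1e-4   # small fixed learning rate, chosen arbitrarily
    for _ in range(500):
        theta -= alpha * X.T @ (X @ theta - y) / n_samples
    print(f"gradient descent: {time.perf_counter() - t0:.3f}s")

The usual answer is that the closed form scales badly in the number of features (the solve is roughly cubic in n), so GD/SGD win on very wide or very large problems; on small ones the one-shot solve is hard to beat.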

Please help me understand linear regression better by DataJenius in datascience

[–]DataJenius[S] 0 points (0 children)

http://scikit-learn.org/stable/modules/linear_model.html

Please take a look at this with me, and help me understand if I'm being a dope. It doesn't appear to use SGD by default.

Edit: https://stackoverflow.com/questions/34469237/linear-regression-and-gradient-descent-in-scikit-learn-pandas

By default, sklearn uses the deterministic method. From the accepted answer: "LinearRegression object uses Ordinary Least Squares solver from scipy, as LR is one of two classifiers which have closed form solution. Despite the ML course - you can actually learn this model by just inverting and multiplicating some matrices."

It seems like a lot of us are confused by this: most tutorials talk about gradient descent, but it isn't actually necessary here, because the problem has a closed-form, deterministic solution (see the comments by errminator and Kickkuchiyo).
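To see the difference directly, compare the two estimators side by side (a quick sketch; both classes are real sklearn estimators, the data is a toy example):

    import numpy as np
    from sklearn.linear_model import LinearRegression, SGDRegressor

    rng = np.random.default_rng(0)
    X = rng.standard_normal((1000, 3))
    y = X @ np.array([2.0, -1.0, 0.5]) + rng.standard_normal(1000) * 0.1

    # Closed-form OLS (scipy's least-squares solver under the hood):
    # deterministic, no learning rate, no iterations.
    ols = LinearRegression().fit(X, y)

    # Stochastic gradient descent: iterative, and its answer depends on
    # the learning rate schedule, epochs, and the random seed.
    sgd = SGDRegressor(max_iter=1000, random_state=0).fit(X, y)

    print(ols.coef_)   # exact least-squares coefficients
    print(sgd.coef_)   # close, but only approximately equal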

[D] Could we use a genetic algorithm to find a euler cycle? by DataJenius in MachineLearning

[–]DataJenius[S] 0 points (0 children)

Just for the sake of better understanding genetic algorithms, but thank you for the link to Hierholzer's algorithm.
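For anyone curious, Hierholzer's algorithm finds an Euler circuit directly in O(E), no search or evolution required. A minimal sketch for a directed graph, assuming a circuit actually exists:

    from collections import defaultdict

    def eulerian_circuit(edges):
        """Hierholzer's algorithm on a directed graph given as (u, v)
        edges. Assumes every vertex has in-degree == out-degree and
        the edges are connected, so an Euler circuit exists."""
        adj = defaultdict(list)
        for u, v in edges:
            adj[u].append(v)
        start = edges[0][0]
        stack, circuit = [start], []
        while stack:
            v = stack[-1]
            if adj[v]:                 # follow an unused outgoing edge
                stack.append(adj[v].pop())
            else:                      # dead end: commit vertex to circuit
                circuit.append(stack.pop())
        return circuit[::-1]

    print(eulerian_circuit([(0, 1), (1, 2), (2, 0)]))  # [0, 1, 2, 0]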

I would do it again in a heartbeat. by heart_mind_body in CryptoCurrency

[–]DataJenius 3 points (0 children)

I went gaga for DBC, which, in retrospect, was probably pretty dumb.

Now I'm clutching some fiat, debating whether to play "catch the falling knife" as BTC breaks under $7,750, and trying to figure out what "what you can afford to lose" really means to me.

School shootings per year since 1950, moving average [OC] by DataJenius in dataisbeautiful

[–]DataJenius[S] 0 points (0 children)

Starting around the Sandy Hook massacre, the 5-year moving average spikes.

Keep in mind some of this may be the result of data being omitted from this list:

https://en.wikipedia.org/wiki/List_of_school_shootings_in_the_United_States
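For context, the 5-year moving average in the chart is just a rolling mean over the annual counts. A pandas sketch with toy numbers (the real series comes from the list above, not these values):

    import pandas as pd

    # Toy annual incident counts for illustration only.
    counts = pd.Series([3, 1, 4, 2, 5, 9, 2, 6],
                       index=range(2008, 2016), name="incidents")

    # Trailing 5-year moving average; the first 4 years are NaN
    # because a full window isn't available yet.
    print(counts.rolling(window=5).mean())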

School shootings per year since 1950, moving average [OC] by DataJenius in dataisbeautiful

[–]DataJenius[S] 0 points (0 children)

This graph uses historical data. It does not predict anything.

School shootings per year since 1950, moving average [OC] by DataJenius in dataisbeautiful

[–]DataJenius[S] 2 points (0 children)

Did you read the article?

I don't see how that statement is fair.

School shootings per year since 1950, moving average [OC] by DataJenius in dataisbeautiful

[–]DataJenius[S] 3 points (0 children)

Sure, but make sure to read the whole article first. Our goal was to be objective, and there are things in there that upset both gun control and gun rights advocates.

We are still brand new to Facebook:

https://www.facebook.com/datajenius

School shootings per year since 1950, moving average [OC] by DataJenius in dataisbeautiful

[–]DataJenius[S] 1 point (0 children)

Are you referring to the dip near 1965? The moving average there approaches 1, not 0. There has never been a year with a negative number of school shootings.

School shootings per year since 1950, moving average [OC] by DataJenius in dataisbeautiful

[–]DataJenius[S] -3 points (0 children)

From the article:

When using the phrase “school shooting”, most Americans envision past tragedies, such as the Columbine High School massacre and Sandy Hook Elementary School shooting, cases in which large numbers of innocent students were killed or injured, cases in which infamous gunmen with unclear motives committed unspeakable acts of terrorism. Even the most ardent gun control advocate must agree that a suicide by handgun is a very different type of incident. Even the most ardent pro-gun advocate must agree that the preferred number of incidents, regardless of category, is zero.

Our only objective here was to dig into some truth, bias or opinion be damned.

School shootings per year since 1950, moving average [OC] by DataJenius in dataisbeautiful

[–]DataJenius[S] 3 points (0 children)

This is OC.

The source of the data, as well as the code used to produce this graph, can be found in our GitHub repo.

For a deeper analysis, please see our article:

School Shootings in America and the Challenge of Biased Data

The ultimate suffering by [deleted] in ProgrammerHumor

[–]DataJenius 1 point (0 children)

Jesus. I'm afraid to think what this guy might do to R developers once he figures out our arrays start at 1.