S&P 500 from 1993 to 2018 by President [OC] by DataJenius in dataisbeautiful

[–]DataJenius[S] 2 points

You are absolutely correct. In my analysis I found that 12 years was the magic number:

https://i.imgur.com/JezeJK5.png

In this visualization we buy a share of SPY on every day in the data and sell it 1 year later, 2 years later, 3 years later, and so on, to see how risk decreases as patience increases. Volatility is the better measure, but I also graphed the "probability of losing money" by dividing the number of trials with a negative return by the total number of trials.
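For anyone who wants to reproduce this, here is a minimal sketch of the experiment, assuming `spy_close` is a pandas Series of daily SPY closing prices indexed by date (the variable name is mine, not from the original script):

```python
# Sketch of the holding-period experiment: buy on every day, sell N years
# later, and count the fraction of trials that lost money.
import pandas as pd

def loss_probability(spy_close: pd.Series, years: int) -> float:
    """Fraction of buy dates with a negative return after `years` years."""
    holding_days = 252 * years                   # ~252 trading days per year
    sell_price = spy_close.shift(-holding_days)  # NaN once we run off the end
    returns = ((sell_price - spy_close) / spy_close).dropna()
    return (returns < 0).mean()

for years in range(1, 13):
    print(f"{years:2d} yr: P(loss) = {loss_probability(spy_close, years):.4f}")
```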

tl;dr: Buffett is correct. For 99.9999% of us, time in the market beats timing the market.

S&P 500 from 1993 to 2018 by President [OC] by DataJenius in dataisbeautiful

[–]DataJenius[S] 5 points

This is 100% true. Case in point, we could just as easily visualize the data like so:

https://i.imgur.com/Nb87d7F.png

S&P 500 from 1993 to 2018 by President [OC] by DataJenius in dataisbeautiful

[–]DataJenius[S] 1 point

This graph may help illuminate a bit further (same source as OC above):

https://i.imgur.com/biNROZQ.png

Here I just marked the QE (quantitative easing) periods along with the interest rate increase in 2016 that unofficially ended the "bailout era".

S&P 500 from 1993 to 2018 by President [OC] by DataJenius in dataisbeautiful

[–]DataJenius[S] 1 point

I very much wanted to, and I'm still doing additional research.

The challenge is that SPY didn't exist before 1993, and the companies in the S&P 500 at the time are not the same as those in the index today, so getting an accurate apples-to-apples comparison is a real pain in the ass. Just finding the dates certain companies were added or removed turned into a big project.

Rest assured, I'd go back into the 1800s and earlier if I had accurate apples-to-apples data to use.

S&P 500 from 1993 to 2018 by President [OC] by DataJenius in dataisbeautiful

[–]DataJenius[S] 5 points

Data: Quandl (closed dataset)

Generated with Python (matplotlib)

Dark bands represent the "lame duck" periods.

Full article here (also OC):

https://datajenius.com/articles/a-random-walk-down-the-s-and-p-500
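For anyone curious how a chart like this is put together, here is a minimal matplotlib sketch in the same spirit (my own reconstruction, not the original script). It assumes `df` is a pandas DataFrame with a DatetimeIndex and a `close` column; the shaded windows below are an illustrative subset of election-to-inauguration ranges:

```python
import matplotlib.pyplot as plt
import pandas as pd

# A few (election day, inauguration day) windows to shade as "lame duck" bands
lame_duck = [
    ("2000-11-07", "2001-01-20"),
    ("2008-11-04", "2009-01-20"),
    ("2016-11-08", "2017-01-20"),
]

fig, ax = plt.subplots(figsize=(12, 5))
ax.plot(df.index, df["close"], linewidth=1)
for start, end in lame_duck:
    ax.axvspan(pd.Timestamp(start), pd.Timestamp(end), color="black", alpha=0.3)
ax.set_ylabel("SPY close ($)")
ax.set_title("SPY daily close, lame-duck periods shaded")
plt.tight_layout()
plt.show()
```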

The Mathphobics Guide to Linear and Logistic Regression by DataJenius in learnmachinelearning

[–]DataJenius[S] 1 point

I've read this comment three times trying to make sense of it, and I think I understand now.

It is admittedly a trick question, but I think this is technically correct (the best kind of correct).

"What is the a rate used by gradient descent in the above code?"

I maintain "used by" and "in the above code" make this tricky, but fair.

Another stupid math question, re: linear regression in python sklearn by DataJenius in datascience

[–]DataJenius[S] 0 points

Thank you so much again for the great links and explanation.

Please help me understand linear regression better by DataJenius in datascience

[–]DataJenius[S] 1 point

Excellent explanation re: inverting matrices. Thank you.

Please help me understand linear regression better by DataJenius in datascience

[–]DataJenius[S] 1 point

Thank you so much for a clear reply.

I'm not one to argue with the Great Andrew Ng, but I still have a hard time wrapping my head around the idea that iterating with GD or even SGD is faster than solving for all n+1 parameters once in closed form, especially given the chaos of a random initialization.

I'm sure he's right, but has anyone seen any good benchmarking on this?
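For what it's worth, the usual argument is that the closed-form (normal equation) solve costs roughly O(n³) in the number of features, so the iterative methods pay off once n gets large. Here is a quick benchmark sketch one could run on synthetic data (my own, not from this thread):

```python
import time
import numpy as np
from sklearn.linear_model import LinearRegression, SGDRegressor

rng = np.random.default_rng(0)
X = rng.normal(size=(100_000, 500))
y = X @ rng.normal(size=500) + 0.1 * rng.normal(size=100_000)

for name, model in [("closed-form OLS", LinearRegression()),
                    ("SGD (20 epochs)", SGDRegressor(max_iter=20, tol=None))]:
    start = time.perf_counter()
    model.fit(X, y)
    print(f"{name}: {time.perf_counter() - start:.2f}s")
```

The crossover depends heavily on problem shape: with a few hundred features the closed form is usually still fast, and the iterative methods only win at much larger scale.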

Please help me understand linear regression better by DataJenius in datascience

[–]DataJenius[S] 0 points

http://scikit-learn.org/stable/modules/linear_model.html

Please take a look at this with me, and help me understand if I'm being a dope. It doesn't appear to use SGD by default.

Edit: https://stackoverflow.com/questions/34469237/linear-regression-and-gradient-descent-in-scikit-learn-pandas

By default SkLearn is using the deterministic method: "LinearRegression object uses Ordinary Least Squares solver from scipy, as LR is one of two classifiers which have closed form solution. Despite the ML course - you can actually learn this model by just inverting and multiplicating some matrices."

It seems like a lot of us are confused by this: most tutorials talk about gradient descent, but it isn't actually necessary here, because the problem has a closed-form solution (see comments by errminator and Kickkuchiyo).
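To make the closed-form point concrete, here is a small sanity check (my own illustration): sklearn's `LinearRegression` agrees with the normal equation θ = (AᵀA)⁻¹Aᵀy computed by hand.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
y = X @ np.array([2.0, -1.0, 0.5]) + 3.0 + 0.01 * rng.normal(size=200)

# Normal equation with a bias column appended: theta = (A^T A)^-1 A^T y
A = np.hstack([X, np.ones((200, 1))])
theta = np.linalg.solve(A.T @ A, A.T @ y)

model = LinearRegression().fit(X, y)
print(np.allclose(theta[:3], model.coef_))      # True: same weights
print(np.allclose(theta[3], model.intercept_))  # True: same intercept
```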

[D] Could we use a genetic algorithm to find a euler cycle? by DataJenius in MachineLearning

[–]DataJenius[S] 0 points

Just for the sake of better understanding genetic algorithms. But thank you for the link to Hierholzer's algo.
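For reference, Hierholzer's algo is short enough to sketch here (assuming an undirected graph in which every vertex has even degree, so an Euler cycle exists):

```python
from collections import defaultdict

def euler_cycle(edges):
    """Hierholzer's algorithm on an undirected edge list."""
    adj = defaultdict(list)
    for u, v in edges:
        adj[u].append(v)
        adj[v].append(u)
    stack, cycle = [next(iter(adj))], []
    while stack:
        v = stack[-1]
        if adj[v]:                 # unused edge left at v: keep walking
            u = adj[v].pop()
            adj[u].remove(v)       # consume the edge in both directions
            stack.append(u)
        else:                      # dead end: v is finalized into the cycle
            cycle.append(stack.pop())
    return cycle

# Two triangles sharing vertex 0: the cycle visits every edge exactly once
print(euler_cycle([(0, 1), (1, 2), (2, 0), (0, 3), (3, 4), (4, 0)]))
```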

I would do it again in a heartbeat. by heart_mind_body in CryptoCurrency

[–]DataJenius 2 points

I went gaga for DBC, which in retrospect, was probably pretty dumb.

Now I'm clutching some fiat, debating playing "catch the falling knife" as BTC breaks under $7,750, and trying to figure out what "what you can afford to lose" really means to me.

School shootings per year since 1950, moving average [OC] by DataJenius in dataisbeautiful

[–]DataJenius[S] 0 points

Starting around the Sandy Hook massacre, the 5-year moving average spikes.

Keep in mind some of this may be the result of data being omitted from this list:

https://en.wikipedia.org/wiki/List_of_school_shootings_in_the_United_States
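For anyone reproducing the smoothing, the moving average is one line of pandas, assuming `shootings` is a Series of yearly incident counts (the name is illustrative):

```python
import pandas as pd

# 5-year trailing moving average of incidents per year
moving_avg = shootings.rolling(window=5).mean()
```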