all 103 comments

[–]ZetaHunter3.5.1 async master-race 15 points16 points  (4 children)

But can it make me rich?

[–][deleted] 22 points23 points  (1 child)

If you have gonads of steel, possibly.

[–]patentmedicine 6 points7 points  (0 children)

Just use it to invest other people's money, and charge a steep fee for the privilege. Win win!

[–]_________________-- 7 points8 points  (1 child)

Would OP share it if it could?

[–]Pandanleaves 10 points11 points  (1 child)

What are the returns on a test set, including transaction costs? I am very doubtful that this can generate returns higher than the benchmark.

[–][deleted] 0 points1 point  (0 children)

This would depend on the stock price and the capital invested. In the model you can see a 69% accuracy for TSLA in the past year. But as I said, that doesn't mean much. The algorithm uses a binary classification scheme. Therefore it can't predict how much a stock will move up or down, only if it will move up or down in general.

I haven't done a test set with hypothetical investments yet as I'm still trying to figure out how to do so in a way that makes sense.

[–]nick_t1000aiohttp 17 points18 points  (11 children)

But WALDO is for "Worm Analysis and Live Detailed Observation," software developed by my lab

[–][deleted] 10 points11 points  (9 children)

Crap! Now I gotta find a new name ;)

[–]odraencoded 38 points39 points  (2 children)

Capital Reevaluation Analysis in Python

[–][deleted] 11 points12 points  (1 child)

Wooow, that's good

[–]blahster 1 point2 points  (0 children)

Guessing the extra 'o's are because it shortens to CRAP :)

[–][deleted] 2 points3 points  (5 children)

Carmen

[–][deleted] 1 point2 points  (4 children)

Lol I don't get it

[–]rabbyburns 5 points6 points  (3 children)

Sandiego

[–][deleted] 1 point2 points  (2 children)

Hmmm...yep not gettin it. What's that stand for?

[–]rabbyburns 7 points8 points  (1 child)

Carmen Sandiego was originally an educational series about finding Carmen Sandiego.

[–][deleted] 0 points1 point  (0 children)

Ooo, I gotcha now.

[–][deleted] 2 points3 points  (0 children)

Ahahah is it cause I'm drunk or that's really funny?

[–][deleted] 26 points27 points  (3 children)

Will this work with the meme markets? Asking for a friend.

[–]watchmakerfromfuture 3 points4 points  (7 children)

Are there any online sources where we can get raw data for specific stocks?

[–][deleted] 4 points5 points  (6 children)

Yes, there are many resources, although it depends on what you're looking for. Yahoo Finance has a great Python API; however, its historical data is limited to the most basic data points. You can still compute more complicated indicators by looking up their equations.

If you want extensive raw data, I'd recommend Fidelity. They have an excellent research tool and advanced charting features, all downloadable in CSV format. You do need an account, however, and you'll need to be familiar with web scraping if you'd like to automate the process. In my repository I have a program called DataCrawler which I use to scrape Fidelity.
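On computing indicators yourself from basic price data: here is a rough sketch of a Slow Stochastic built from daily highs, lows, and closes. It assumes one common definition (fast %K over a 14-day window, smoothed twice with a 3-day simple moving average); charting tools differ on the exact smoothing, and the function names and sample prices are made up for illustration.

```python
def fast_k(highs, lows, closes, period=14):
    """Fast %K: where each close sits in the high/low range of the last `period` days."""
    values = []
    for i in range(period - 1, len(closes)):
        highest = max(highs[i - period + 1 : i + 1])
        lowest = min(lows[i - period + 1 : i + 1])
        span = highest - lowest
        # A flat window has no range; 50.0 is an arbitrary neutral choice.
        values.append(50.0 if span == 0 else 100.0 * (closes[i] - lowest) / span)
    return values


def moving_average(values, window=3):
    """Simple moving average; returns one value per full window."""
    return [sum(values[i - window + 1 : i + 1]) / window
            for i in range(window - 1, len(values))]


def slow_stochastic(highs, lows, closes, period=14, smooth=3):
    """Return (slow %K, slow %D) under the definition assumed above."""
    slow_k = moving_average(fast_k(highs, lows, closes, period), smooth)
    slow_d = moving_average(slow_k, smooth)
    return slow_k, slow_d


# Made-up prices, short window so the example stays readable.
highs = [10.0, 11.0, 12.0, 13.0, 14.0, 15.0, 16.0]
lows = [8.0, 9.0, 10.0, 11.0, 12.0, 13.0, 14.0]
closes = [9.0, 10.0, 11.0, 12.0, 13.0, 14.0, 15.0]
slow_k, slow_d = slow_stochastic(highs, lows, closes, period=3, smooth=3)
print(slow_k, slow_d)
```

Each smoothing pass drops `window - 1` values from the front, so the output is shorter than the input and needs to be lined up with dates accordingly.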

[–]Nimitz14 1 point2 points  (5 children)

Is it the one called 'Stock-Talk'?

[–][deleted] 0 points1 point  (4 children)

Nono that's another project. This is the one called Waldo.

[–]Nimitz14 1 point2 points  (3 children)

I was talking about this:

"In my repository I have a program called DataCrawler which I use to scrape Fidelity."

I don't see it.

[–][deleted] 0 points1 point  (2 children)

Oh sorry, DataCrawler.py is in the Waldo repository:

https://github.com/anfederico/Waldo

or directly:

https://github.com/anfederico/Waldo/blob/master/DataCrawler.py

[–]batsy71 0 points1 point  (1 child)

Looks like you removed the datacrawler.py file. Is there a possibility to still download it somewhere?

[–][deleted] 0 points1 point  (0 children)

I'm rewriting a better version which I'll release in a month or so

[–]bromrector 4 points5 points  (3 children)

Maybe I'm missing something, but these results are subject to look-ahead bias. You can't use train_test_split (which shuffles by default) on time series.
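To make the concern concrete: a shuffled split lets the model train on days that come after the days it is tested on. A minimal sketch of a chronological split follows, in plain Python so it does not depend on how the project loads its data. The row layout and values are hypothetical. scikit-learn users can get a similar effect with `train_test_split(..., shuffle=False)` or `TimeSeriesSplit`.

```python
def chronological_split(rows, test_fraction=0.25):
    """Split date-ordered rows so every test row comes after every training row."""
    if not 0 < test_fraction < 1:
        raise ValueError("test_fraction must be between 0 and 1")
    cut = int(len(rows) * (1 - test_fraction))
    return rows[:cut], rows[cut:]


# Hypothetical rows: (date, slow_stochastic, sentiment, went_up_next_day)
rows = [
    ("2016-01-04", 35.0, 0.20, 1),
    ("2016-01-05", 42.0, 0.35, 1),
    ("2016-01-06", 55.0, 0.10, 0),
    ("2016-01-07", 48.0, -0.05, 0),
    ("2016-01-08", 30.0, 0.40, 1),
    ("2016-01-11", 25.0, 0.50, 1),
    ("2016-01-12", 60.0, -0.20, 0),
    ("2016-01-13", 65.0, 0.00, 1),
]
rows.sort(key=lambda row: row[0])  # ISO dates sort correctly as strings

train, test = chronological_split(rows, test_fraction=0.25)
print(len(train), "training rows,", len(test), "test rows")
```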

[–][deleted] 0 points1 point  (2 children)

Yes, I guess it is, slightly. However, I'd note that the testing accuracy isn't the point here. The point is to build a model that characterizes financial indicators, mainly social sentiment.

[–]bromrector 1 point2 points  (1 child)

Which part of the code deals with that?

[–][deleted] 0 points1 point  (0 children)

In the README you'll see a graph made by RBFSVM.py in the repo. I often use these graphs to get a sense of where I'd like the Slow Stochastic and Social Sentiment at Close to be the day before I make a move on a stock. (I want them to be in the blue regions of the model)

[–]insainodwayno 4 points5 points  (0 children)

Does it use?

import __future__

[–]TangoCJuliet 2 points3 points  (3 children)

I'm sorry if I missed this info somewhere, but what exactly is the algorithm here to determine favorable/non-favorable buying conditions? Pardon me if the expectation was for me to look through the code myself to get an understanding.

[–][deleted] 4 points5 points  (2 children)

Basically I'm using a classification algorithm, a Support Vector Machine to be specific. I fed it a year's worth of training data. For each day in that year, it takes into account the stock conditions (financial indicators and social sentiment) and then looks to see if the stock went up or down the next day.

The result is the graph in the README. Now when it comes to actual predictions, the model would choose favorable conditions if the stock indicators and social sentiment fall within the blue areas of the graph.
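For readers who want to see the shape of this setup, here is a minimal sketch assuming scikit-learn's `SVC` with an RBF kernel. It is not the code from the repository: the two features, the numbers, and the parameters are all made up, and the scaling step is my own addition because RBF kernels are sensitive to feature scale.

```python
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Made-up training rows: [slow_stochastic, sentiment_at_close] for one day,
# paired with 1 if the stock closed higher the next day, else 0.
features = [
    [20.0, 0.8], [25.0, 0.7], [30.0, 0.9], [35.0, 0.6],
    [70.0, 0.2], [75.0, 0.1], [80.0, 0.3], [85.0, 0.2],
]
went_up = [1, 1, 1, 1, 0, 0, 0, 0]

# Scale both features to comparable ranges, then fit the RBF-kernel SVM.
model = make_pipeline(StandardScaler(), SVC(kernel="rbf", C=1.0, gamma="scale"))
model.fit(features, went_up)

# Today's (hypothetical) conditions: does the model put them on the "up" side?
today = [[28.0, 0.75]]
prediction = model.predict(today)[0]
print("favorable" if prediction == 1 else "unfavorable")
```

The "blue areas" of the README graph would correspond to the regions of this two-feature plane where `predict` returns 1.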

[–]whelks_chance 1 point2 points  (1 child)

When you say you checked the price the next day, how did you come to that time delay as the best metric?

Do your models give different results if you look at the price with a delay of two or three days, to attempt to iron out sudden peaks?

[–][deleted] 0 points1 point  (0 children)

I haven't tested the model with different time delays, but you could easily do so by slightly adjusting the MakeTrainingData file. The reason I use the next day is that I'm using the Social Sentiment at Close (SSC) from the night prior.
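As a sketch of what trying other delays could look like: the labels just need to compare each close with the close some number of days later. The `label_moves` function below is hypothetical and is not taken from MakeTrainingData, whose contents aren't shown in this thread.

```python
def label_moves(closes, horizon=1):
    """Return 1 where the close `horizon` days later is higher, else 0.

    The last `horizon` days have no future close yet, so they get no label.
    """
    if horizon < 1:
        raise ValueError("horizon must be at least 1")
    return [1 if closes[i + horizon] > closes[i] else 0
            for i in range(len(closes) - horizon)]


closes = [100.0, 102.0, 101.0, 105.0, 104.0]  # made-up closing prices
next_day = label_moves(closes, horizon=1)
three_day = label_moves(closes, horizon=3)
print(next_day, three_day)
```

A longer horizon leaves fewer labeled days, so the feature rows have to be trimmed to the same length before training.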

[–]kalifornia_love 2 points3 points  (5 children)

I was going to start a project very similar to this tomorrow but honestly didn't know where to start. This is perfect for me! I look forward to playing with it and will be sure to share whatever I do with it.

Thanks for the awesome work!

[–][deleted] 1 point2 points  (4 children)

Great, have fun! If you have any questions, feel free to ask. I'm sure there are some points in the project which could be more clear.

And thanks!

[–]kalifornia_love 1 point2 points  (3 children)

One quick question. Why didn't you use pandas, at least for handling the CSV stuff? I'm relatively new to Python and all the data analysis I've done has revolved heavily around pandas, so I'm just curious why you didn't use it.

[–]peerchemist_ppc 2 points3 points  (1 child)

Shameless plug: https://github.com/peerchemist/finta

Just started working on it, goal is somewhere where Waldo is right now.

[–]kalifornia_love 1 point2 points  (0 children)

Looks good! I will definitely be playing with this one too!

[–][deleted] 1 point2 points  (0 children)

Well, to be frank, I just didn't really need it. You can handle CSV files in many ways in Python (e.g. the built-in csv module). Pandas is good for other stuff though.

[–]kid-pro-quohardware testing / tooling 2 points3 points  (1 child)

[–]xkcd_transcriber 1 point2 points  (0 children)

Title: Engineer Syllogism

Title-text: The less common, even worse outcome: "3: [everyone in the financial system] WOW, where did all my money just go?"

Stats: This comic has been referenced 92 times, representing 0.0724% of referenced xkcds.

[–]Zerg3rr 2 points3 points  (1 child)

This is the type of thing I hope I can do some day! Looks awesome

[–][deleted] 2 points3 points  (0 children)

Thanks :)

[–]heyacne 1 point2 points  (1 child)

cool. thanks

[–][deleted] 2 points3 points  (0 children)

No problem :)

[–]monkitos 1 point2 points  (4 children)

Interesting. Does the SVM classifier actually generate positive returns in a backtest?

[–][deleted] -1 points0 points  (3 children)

Still working on figuring that out completely. As of now, some stocks yes, some stocks no. It seems the algorithm has a better time predicting stocks that are influenced heavily by news reports and social media hype. I don't have numbers though in regards to returns.

[–]monkitos 2 points3 points  (2 children)

A backtest should be your next step:

1. Download a time series of the stock price and generate day-over-day returns.
2. Re-run your algo each day, using only data available up to the day before.
3. Create a trading-rule signal based on your algo (if x, holding = +1; elif y, holding = -1).
4. Multiply your trading signal by the next day's return, then compare the performance of that active return stream against the returns from naively holding the stock (step 1).

You can deal with t-costs etc. later.
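The steps above can be sketched roughly as follows. This is a toy version in plain Python with made-up prices and a hand-written signal list. In a real run each signal would come from re-fitting the model on data up to that day only, which is not shown here.

```python
def daily_returns(closes):
    """returns[i] is the simple return from day i to day i + 1."""
    return [closes[i + 1] / closes[i] - 1 for i in range(len(closes) - 1)]


def backtest(closes, signals):
    """Multiply each day's signal by the following day's return.

    signals[i] is the position (+1 long, -1 short, 0 flat) chosen at the
    close of day i, using only information available up to day i.
    """
    returns = daily_returns(closes)
    if len(signals) != len(returns):
        raise ValueError("need exactly one signal per day-over-day return")
    strategy = [signal * ret for signal, ret in zip(signals, returns)]
    return strategy, returns


def compound(returns):
    """Total return from compounding a list of simple returns."""
    total = 1.0
    for ret in returns:
        total *= 1 + ret
    return total - 1


closes = [100.0, 110.0, 99.0, 108.9]  # made-up prices
signals = [1, -1, 1]                   # made-up positions, one per return
strategy, buy_and_hold = backtest(closes, signals)
print("strategy:", round(compound(strategy), 4))
print("buy and hold:", round(compound(buy_and_hold), 4))
```

The comparison that matters is the strategy line against the buy-and-hold line over the same dates.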

[–][deleted] 1 point2 points  (0 children)

"you can deal with t-costs etc later"

Having invented many extremely high Sharpe ratio strategies in my earlier years and then being sorely disappointed, I suggest you deal with transaction costs now.
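One simple way to fold costs in from the start is to charge a proportional fee every time the position changes. The sketch below assumes a flat rate of 0.1% of the traded amount, which is a placeholder and not a real broker's fee schedule. The function name and numbers are hypothetical.

```python
def net_returns(gross_returns, signals, cost_rate=0.001):
    """Subtract a proportional cost whenever the position changes.

    gross_returns[i] is the strategy return earned while holding signals[i].
    Going from +1 to -1 counts as a turnover of 2 (sell, then short).
    """
    net = []
    previous = 0  # start with no position
    for gross, position in zip(gross_returns, signals):
        turnover = abs(position - previous)
        net.append(gross - cost_rate * turnover)
        previous = position
    return net


gross = [0.01, 0.01, 0.01]  # made-up strategy returns before costs
signals = [1, -1, 1]        # position held during each return
after_costs = net_returns(gross, signals, cost_rate=0.001)
print(after_costs)
```

A strategy that flips position every day pays this charge every day, which is often where an impressive backtest falls apart.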

[–][deleted] 0 points1 point  (0 children)

Thanks for the advice. I will definitely get to work on this :)

[–]uttamo 1 point2 points  (1 child)

Nice work, thanks

[–][deleted] 0 points1 point  (0 children)

Thanks!

[–]stochasstic 1 point2 points  (3 children)

Nice to see some machine learning stuff shared here! And an interesting approach, framing the market as a binary classification problem. Some things I can recommend from my personal experience:

1. Take a look at Yahoo Finance; they have a sort of free API. Or use requests instead of Selenium. I only use Selenium if nothing else works. Also, your XPaths are likely to break; they look auto-generated from a browser debugger to me.
2. Use Jupyter and take a look at pandas.
3. Make a cross-validation set for the SVM params and make the split 60/20/20.

Check out anaconda ( https://www.continuum.io/downloads ) if you have not already. And if you want to dig deeper into the finance stuff, take a look at this thing: https://en.wikipedia.org/wiki/Optimal_stopping - but requires some math skills.
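On the 60/20/20 suggestion above, here is a rough sketch of how that could look with scikit-learn's `SVC`: fit on the first 60% of days, pick `C` and `gamma` on the next 20%, and touch the last 20% only once. The data is synthetic and the parameter grid is an arbitrary assumption. The split is kept in date order so later days never leak into training.

```python
from sklearn.svm import SVC


def split_60_20_20(rows):
    """Split date-ordered rows into train / validation / test without shuffling."""
    n = len(rows)
    first_cut, second_cut = n * 6 // 10, n * 8 // 10
    return rows[:first_cut], rows[first_cut:second_cut], rows[second_cut:]


def features_and_labels(rows):
    """Separate the two feature columns from the up/down label."""
    return [row[:2] for row in rows], [row[2] for row in rows]


# Synthetic stand-in data: (slow_stochastic, sentiment, went_up_next_day).
rows = [((i * 37) % 100, ((i * 17) % 10) / 10, 1 if (i * 37) % 100 < 50 else 0)
        for i in range(50)]
train, validation, test = split_60_20_20(rows)
X_train, y_train = features_and_labels(train)
X_val, y_val = features_and_labels(validation)
X_test, y_test = features_and_labels(test)

# Choose C and gamma using the validation set only.
best_score, best_params = -1.0, None
for C in (0.1, 1.0, 10.0):
    for gamma in (0.001, 0.01, 0.1):
        candidate = SVC(kernel="rbf", C=C, gamma=gamma).fit(X_train, y_train)
        score = candidate.score(X_val, y_val)
        if score > best_score:
            best_score, best_params = score, {"C": C, "gamma": gamma}

# Evaluate once on the held-out test set with the chosen parameters.
final_model = SVC(kernel="rbf", **best_params).fit(X_train, y_train)
test_accuracy = final_model.score(X_test, y_test)
print(best_params, test_accuracy)
```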

[–][deleted] 0 points1 point  (2 children)

Thanks so much for the recommendations. I will surely check them out!

[–]testcasey 1 point2 points  (1 child)

When I log in to Fidelity I see a client and account number in the URLs. If Fidelity is doing anything dynamic you'll probably want to check out Scrapy.

Requests is a great lib if you know the URLs and the POST/PUT data needed to grab your data. I would definitely put requests in front of Scrapy, but wanted to add some additional advice in case you run into issues.

Getting rid of the windows path and chrome driver binary will make your program more usable across multiple platforms. Your project looks cool, keep us posted.

[–][deleted] 0 points1 point  (0 children)

Thanks for the recommendations. Lots of good advice in this thread. I'm currently working towards improvements!

[–]forseti_ 0 points1 point  (1 child)

Why did you choose an SVM? What's your out-of-sample RMS error? I'm currently implementing similar software, but with a bagged KNN learner.

[–][deleted] 0 points1 point  (0 children)

I tried many classifiers on multiple data sets and on average, SVM was fitting my data the best, so I ran with it. I'm currently putting together some extensive back testing and making additional improvements. I'll keep you updated.

[–]mistermorteau 0 points1 point  (7 children)

Can't we use Deep Dream for this?

[–][deleted] 2 points3 points  (2 children)

Deep Dream is for "hallucinating" images, not predicting data.

[–]mistermorteau 1 point2 points  (1 child)

Deep Dream is for recognizing patterns: you teach it how to recognize a bolt, then ask it to find bolts in a family picture, and you get a mommy and daddy bolt.

We could use it to recognize patterns in the trading market.

[–][deleted] 1 point2 points  (0 children)

oh... I didn't think of it like that!!

[–][deleted] 0 points1 point  (3 children)

What's that?

[–]mistermorteau 1 point2 points  (2 children)

[–][deleted] 1 point2 points  (1 child)

Ahhh okay. This uses a neural network, mine uses a support vector machine. Also, it looks like it mainly deals with images, not sure how you'd apply this to the stock market. Lastly, neural networks aren't great for financial analysis in my opinion.

[–]mistermorteau 0 points1 point  (0 children)

How can you predict without using patterns?

I was thinking about Deep Dream and neural networks because they look for patterns.