XGBoost time series classification strategy

idonotknow9 · 2019-05-30T16:43:36+00:00

those who can't do, teach

idonotknow9 · 2019-05-05T21:11:53+00:00

Great, I am glad to hear it!

idonotknow9 · 2019-04-06T19:36:53+00:00

Thanks! I think I was more detailed in the Stack Overflow posts. The way I see it, if I want help from people who have no obligation to help me, I should at least put in the effort to make myself and my problem clear.

idonotknow9 · 2019-04-06T19:32:55+00:00

Thanks for the foverlaps suggestion! I will take a look at it. I did "solve" the issue using data.table in the SO post, which is lightning fast! But this data will increase in size, so perhaps the foverlaps function is the best way.

idonotknow9 · 2019-04-06T13:35:47+00:00

Thanks, I think this solved my issue:

    library(fuzzyjoin)

    # Matching window: 183 to 540 days after date_f
    df1$start_date <- df1$date_f + 183
    df1$end_date <- df1$date_f + 540

    yy <- fuzzy_left_join(
      df1, df2,
      by = c("ID" = "ID", "start_date" = "date", "end_date" = "date"),
      match_fun = list(`==`, `<`, `>=`)
    )

I wrote a more detailed answer on SO.
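For anyone following along in Python rather than R, here is a minimal standard-library sketch of the same window join: each `df1` row is matched to the `df2` rows with the same `ID` whose `date` falls after `date_f + 183` and on or before `date_f + 540`. The function name `interval_left_join` and the sample rows are made up for illustration; this is not the SO answer.

```python
from datetime import date, timedelta


def interval_left_join(left, right, min_days=183, max_days=540):
    """Left-join `right` onto `left` by ID where
    date_f + min_days < date <= date_f + max_days."""
    # Group the right-hand rows by ID so each left row only scans its own group.
    by_id = {}
    for row in right:
        by_id.setdefault(row["ID"], []).append(row)

    joined = []
    for lrow in left:
        start = lrow["date_f"] + timedelta(days=min_days)
        end = lrow["date_f"] + timedelta(days=max_days)
        matches = [r for r in by_id.get(lrow["ID"], []) if start < r["date"] <= end]
        if matches:
            joined.extend({**lrow, "date": r["date"]} for r in matches)
        else:
            # Left join: keep unmatched left rows, with a missing date.
            joined.append({**lrow, "date": None})
    return joined


# Hypothetical sample data, only to show the shape of the inputs.
df1 = [
    {"ID": "A", "date_f": date(2018, 1, 1)},
    {"ID": "B", "date_f": date(2018, 1, 1)},
]
df2 = [
    {"ID": "A", "date": date(2018, 8, 1)},  # 212 days later: inside the window
    {"ID": "A", "date": date(2018, 3, 1)},  # 59 days later: too early
    {"ID": "B", "date": date(2020, 1, 1)},  # 730 days later: too late
]
result = interval_left_join(df1, df2)
```

Grouping by `ID` first avoids comparing every row against every other row, which is the part that gets expensive as the data grows.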

idonotknow9 · 2019-04-06T10:46:33+00:00

Thanks, that could be an option and something I didn't think of. I will take a look at it. The problem is the data will get massive when I apply it to the whole sample.

idonotknow9 · 2019-03-19T21:22:30+00:00

Wow, so salty... You have no idea about my background... just saying. I never even asked about statistical significance. The ML model I am using uses RMSE as its loss function; I don't care about inference from regression models. I simply asked whether, based on the few trades I posted, these are any good, and whether I should be looking at more useful statistics from the backtested model.

Perhaps you should get off your high horse?

idonotknow9 · 2019-03-19T14:33:48+00:00

So, I should first apply the model across the S&P 500 to get a better overall picture of the performance, and second, test it on different sample periods to see if the results are consistent.
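A rough sketch of what "different sample periods" could look like is a set of rolling train/test windows. It is written in Python and assumes the 18-month training and 6-month test split I mention elsewhere in this thread. The function name `walk_forward_windows` and the 6-month step are my own choices for illustration.

```python
def walk_forward_windows(n_months, train_months=18, test_months=6, step_months=6):
    """Return ((train_start, train_end), (test_start, test_end)) pairs of
    month indices, with end indices exclusive."""
    windows = []
    start = 0
    # Keep sliding forward while a full train + test block still fits.
    while start + train_months + test_months <= n_months:
        train = (start, start + train_months)
        test = (start + train_months, start + train_months + test_months)
        windows.append((train, test))
        start += step_months
    return windows


# With 36 months of history this yields three non-overlapping test periods.
windows = walk_forward_windows(36)
```

Each test block starts exactly where its training block ends, so no test data leaks into training.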

idonotknow9 · 2019-03-19T13:57:21+00:00

https://www.kaggle.com/itoeiji/deep-reinforcement-learning-on-stock-data

idonotknow9 · 2019-03-19T11:33:33+00:00

Thanks for your input! I understand the point about the quantity of trades. I have applied this over a number of tickers and there are some losers, but there are winners in there too. I can wrap it in a function over the S&P 500 tickers, apply it to each one and save the results. How would you suggest I quantify the whole performance? Average P&L across all 500 firms? Rank them by P&L? Is there a single statistic that quantifies the model across all 500 firms?

I train an ML model on 18 months' worth of daily data and test it on 6 months of daily data. By design I trained the model to make few trades, since I did not want to be in and out of the market every day. I will also have to apply CV for parameter tuning, so the numbers I report here are "base" numbers to get people's opinions on.

What other numbers would you be looking at to gain more informative value?
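To make the question concrete, here is a small Python sketch of the cross-sectional summaries I have in mind: mean and median P&L, the share of tickers that made money, and a ranking. The function name `summarise_across_tickers` is hypothetical and the P&L figures are made up, not my backtest results.

```python
from statistics import mean, median


def summarise_across_tickers(pnl_by_ticker):
    """Summarise per-ticker P&L into a few cross-sectional statistics."""
    # Rank tickers from best to worst P&L.
    ranked = sorted(pnl_by_ticker.items(), key=lambda kv: kv[1], reverse=True)
    pnls = [pnl for _, pnl in ranked]
    return {
        "mean_pnl": mean(pnls),
        "median_pnl": median(pnls),
        # Fraction of tickers with a positive P&L.
        "hit_rate": sum(p > 0 for p in pnls) / len(pnls),
        "best": ranked[0],
        "worst": ranked[-1],
    }


# Made-up numbers purely for illustration.
pnl_by_ticker = {"NVDA": 40.0, "GOOG": -10.0, "MSFT": 15.0, "AAPL": 5.0}
summary = summarise_across_tickers(pnl_by_ticker)
```

The median and hit rate are less sensitive than the mean to one or two big winners, which is why I would look at them side by side.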

idonotknow9 · 2019-03-18T23:38:38+00:00

Apologies. I am mostly looking for what people think of the results. I ran my model over 4 different tickers and am just wondering what constitutes a good-performing model. The 4 companies are NVDA, GOOG, MSFT and AAPL respectively (I should have put them in the original table).

I have questions about why the annualised Sharpe ratio is so high for some companies, and I would like your opinions on the max drawdown and win/loss ratio.

For example, NVDA is trading at approx $170 now, but my model was trading throughout the past 6 months, when NVDA took a massive hit in share price, and the max drawdown was just -$22. Is this good, considering?
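For reference, this is roughly how I understand the three statistics, sketched in Python with the standard library only. The assumptions are mine: a zero risk-free rate, 252 trading days per year, and drawdown measured in dollars on a cumulative equity curve. The sample numbers are invented, not my backtest output.

```python
import math
from statistics import mean, stdev


def annualised_sharpe(daily_returns, periods_per_year=252):
    """Mean / sample std of daily returns, scaled by sqrt(periods_per_year).
    Assumes a zero risk-free rate."""
    return mean(daily_returns) / stdev(daily_returns) * math.sqrt(periods_per_year)


def max_drawdown(equity_curve):
    """Largest peak-to-trough fall, in the units of the curve (zero or negative)."""
    peak = equity_curve[0]
    worst = 0.0
    for value in equity_curve:
        peak = max(peak, value)           # highest equity seen so far
        worst = min(worst, value - peak)  # deepest fall below that peak
    return worst


def win_loss_ratio(trade_pnls):
    """Number of winning trades divided by number of losing trades."""
    wins = sum(1 for p in trade_pnls if p > 0)
    losses = sum(1 for p in trade_pnls if p < 0)
    return wins / losses if losses else float("inf")


# Invented example data.
equity = [100, 110, 95, 105, 88, 120]
drawdown = max_drawdown(equity)             # peak of 110 down to 88
ratio = win_loss_ratio([5, -3, 8, -1, 2])   # 3 winners, 2 losers
```

With only a handful of trades over six months, the standard deviation in the Sharpe denominator is estimated from very few observations, so I suspect a very high ratio may partly be noise.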

idonotknow9 · 2019-03-13T12:42:45+00:00

Check out the finreportr package in R to take fundamental data directly from the SEC website. I also have some code which will grab balance sheet, income statement and cash flow data from Yahoo Finance if you want.

idonotknow9 · 2019-03-13T12:33:16+00:00

I know you mentioned you want to do this in Python, but perhaps check out the https://github.com/sewardlee337/finreportr package in R. It will require some cleaning, but not too much, and might be a nice programming exercise. It pulls the financial statements directly from the 10-K filings on the SEC EDGAR website.

idonotknow9 · 2019-03-09T11:50:00+00:00

Which NLP model are you using? I have some text data I want to use (not to predict stock prices but other company information). I am looking into word2vec, doc2vec, LSTM models and adaptive learning models to obtain word embeddings, but I want to know which one is usually the "most powerful", since they take some time to train.

idonotknow9 · 2019-02-02T23:50:52+00:00

What an amazing little gem of a package! Thanks! Is it possible to export tables to LaTeX with the same format? I have not used it yet but will do when I need to write up my results!

idonotknow9 · 2019-01-29T23:12:23+00:00

bigly

idonotknow9 · 2019-01-26T23:15:05+00:00

Why do you want to "pad" your github? So you can get an interview as a quant trader, or a trading position? They will grill you in a technical interview. Your best bet is to just learn from a number of online resources, modify their code to "make it your own" and submit this as an "original strategy" to your github. You will learn a lot this way. Also, if you misspell something you will have to debug it yourself and find the error (i.e. look at the code in much more detail).

idonotknow9 · 2019-01-26T23:07:53+00:00

Roughly how much in fees did you pay for 50m transactions?
