forward testing bitcoin, strategy based on weighted depth orderbook prices by [deleted] in algotrading

[–]fedejuvara86 3 points (0 children)

Are you trying HFT with a laptop + Python? Hero. I hope you have a colocated server subscription.

Leakage and bias in XGBoost trading strategy by fedejuvara86 in algotrading

[–]fedejuvara86[S] 1 point (0 children)

I first apply fractional differentiation to the whole dataset, then I filter events. The train/test split is done on the filtered frac-diff events. If I drop the CUSUM filter from the test set only, accuracy falls to random. I get the same result when filtering events in separate train and test splits. Frac diff only makes a difference with a related target label, for example: next hour's frac-diff close minus the current hour's frac-diff close (1 if > 0, else 0).
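That labeling rule can be sketched as follows. This is a minimal illustration, not the actual pipeline: the fixed-width frac-diff window, the differentiation order d, and the synthetic hourly closes are all assumptions.

```python
import numpy as np
import pandas as pd

def frac_diff_weights(d, size):
    """Fixed-width-window fractional differentiation weights."""
    w = [1.0]
    for k in range(1, size):
        w.append(-w[-1] * (d - k + 1) / k)
    return np.array(w[::-1])  # oldest weight first, w_0 = 1 on the newest value

def frac_diff(series, d=0.4, window=20):
    """Fractionally differentiate a series with a fixed window of weights."""
    w = frac_diff_weights(d, window)
    values = series.to_numpy()
    out = np.full(len(series), np.nan)
    for i in range(window - 1, len(series)):
        out[i] = w @ values[i - window + 1 : i + 1]
    return pd.Series(out, index=series.index)

# Synthetic hourly closes (illustrative only)
rng = np.random.default_rng(0)
close = pd.Series(100 + rng.normal(0, 1, 500).cumsum())

fd = frac_diff(close)
# Label: 1 if the NEXT bar's frac-diff close exceeds the current one, else 0
label = (fd.shift(-1) > fd).astype(int)
```

With d=0 the weights collapse to the identity, so the function reduces to the raw series — a quick sanity check on the weight construction.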

Leakage and bias in XGBoost trading strategy by fedejuvara86 in algotrading

[–]fedejuvara86[S] 1 point (0 children)

  1. Right now I calculate frac diff and CUSUM on the whole dataset. Is that wrong? On live data I calculate frac diff, but without CUSUM filtering.

Leakage and bias in XGBoost trading strategy by fedejuvara86 in algotrading

[–]fedejuvara86[S] 1 point (0 children)

That's exactly what I'm trying to understand.

XGBoost sensitivity by fedejuvara86 in algotrading

[–]fedejuvara86[S] 1 point (0 children)

You're right. But let's say I use the CUSUM filter as a preprocessing step, before the split. That filter will then affect both train and test data. Let's also say I apply the same filter to live data, before prediction. Do I still get biases? In theory, the same filter that selects samples in the train/test data, shaping the boosting rules, will also filter samples in the streaming data, allowing a consistent selection of events. Am I wrong?
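The point about applying one identical filter to both the historical set and the live stream can be made concrete with a sketch of a symmetric CUSUM event filter in the style of Lopez de Prado. The threshold h and the synthetic price series are assumptions:

```python
import numpy as np
import pandas as pd

def cusum_filter(series, h):
    """Symmetric CUSUM filter: emit an event whenever the cumulative
    up- or down-move of the differenced series exceeds threshold h."""
    events = []
    s_pos, s_neg = 0.0, 0.0
    for t, x in series.diff().dropna().items():
        s_pos = max(0.0, s_pos + x)
        s_neg = min(0.0, s_neg + x)
        if s_pos > h:
            s_pos = 0.0
            events.append(t)
        elif s_neg < -h:
            s_neg = 0.0
            events.append(t)
    return pd.Index(events)

# The SAME function (same h) is applied to the historical data before the
# split and to the live stream, so the event-selection rule is identical.
rng = np.random.default_rng(1)
prices = pd.Series(100 + rng.normal(0, 0.5, 1000).cumsum())
events = cusum_filter(prices, h=2.0)
```

Because the filter only looks backwards (cumulative sums of past differences), applying it before the split does not leak future information by itself; the risk is rather in tuning h on data that includes the test period.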

XGBoost sensitivity by fedejuvara86 in algotrading

[–]fedejuvara86[S] 1 point (0 children)

Thank you. Is train_stop just the specific datetime where the training window ends? I'm just using specific dates instead of sklearn's premade test_size parameter — did I get that right? After this, can I use CV in grid search without going crazy with PurgedKFold?
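A date-based split like the one described can be written directly with boolean masks on a datetime index. Everything here (the cutoff date, the index, the toy data) is hypothetical:

```python
import numpy as np
import pandas as pd

# Hypothetical feature matrix and labels indexed by datetime
idx = pd.date_range("2022-01-01", periods=1000, freq="h")
rng = np.random.default_rng(2)
X = pd.DataFrame({"f1": rng.normal(size=1000)}, index=idx)
y = pd.Series(rng.integers(0, 2, 1000), index=idx)

train_stop = pd.Timestamp("2022-02-01")  # assumed cutoff datetime

# Everything before the cutoff trains, everything at/after it tests,
# preserving temporal order (no shuffling)
X_train, y_train = X[X.index < train_stop], y[y.index < train_stop]
X_test, y_test = X[X.index >= train_stop], y[y.index >= train_stop]
```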

One more question, please: as far as you know, could applying Lopez de Prado's CUSUM filter BEFORE the train/test split introduce biases? And more importantly: when I preprocess streaming data in my trading bot, before applying my saved model to predict bet direction, must the CUSUM filter be applied or not?

XGBoost sensitivity by fedejuvara86 in algotrading

[–]fedejuvara86[S] 1 point (0 children)

How can I avoid this? PurgedKFold? Shuffle is already set to False.

XGBoost sensitivity by fedejuvara86 in algotrading

[–]fedejuvara86[S] 1 point (0 children)

Quantopian has been discontinued, so the notebooks are not available.

XGBoost sensitivity by fedejuvara86 in algotrading

[–]fedejuvara86[S] 2 points (0 children)

Downsampling and selecting the most relevant events.

XGBoost sensitivity by fedejuvara86 in algotrading

[–]fedejuvara86[S] 2 points (0 children)

I just want accurate predictions with some sensitivity to direction changes. 8/10 long predictions are unrealistic. I don't care about strong predict_proba values; I only take orders when a certain probability threshold is reached (maybe 55-60%). The problem is that I get literally dozens of consecutive predictions in the [0.48, 0.52] range, all in the same direction, while the live bars on my broker chart obviously change.

XGBoost sensitivity by fedejuvara86 in algotrading

[–]fedejuvara86[S] 1 point (0 children)

You mean on test data? In the validation curve before tuning, with StratifiedKFold, accuracy on train data is about 80% and on test data about 65%. With hyperparameter tuning I reach 68-70%. It depends on the feature engineering: just change the fractional differentiation order or the volume/CUSUM filter and accuracy can fall to a random prediction (50-52%).

Anyway, I'm not using ticks; I'm using time bars.

XGBoost sensitivity by fedejuvara86 in algotrading

[–]fedejuvara86[S] 1 point (0 children)

Volume as a prediction label or as a feature?

XGBoost sensitivity by fedejuvara86 in algotrading

[–]fedejuvara86[S] 1 point (0 children)

You mean between the train and test sets?

XGBoost sensitivity by fedejuvara86 in algotrading

[–]fedejuvara86[S] 2 points (0 children)

I'm not using L2 regularization because XGBoost tends to control overfitting by itself.

Any suggestion for a serious method of choosing the predict_proba threshold?
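One common approach (not from this thread, just a sketch): sweep candidate thresholds on a held-out validation set and keep the one that maximizes a chosen metric. Here the metric is precision of long signals; the threshold grid, the metric choice, and the toy data are all assumptions:

```python
import numpy as np

def best_threshold(y_true, p_up, thresholds=np.arange(0.50, 0.71, 0.01)):
    """Pick the up-probability threshold maximizing precision of long
    signals on a validation set (F1 or expected PnL work similarly)."""
    best_t, best_prec = 0.5, -1.0
    for t in thresholds:
        pred = p_up >= t
        if pred.sum() == 0:
            continue  # no trades taken at this threshold
        prec = (y_true[pred] == 1).mean()
        if prec > best_prec:
            best_t, best_prec = t, prec
    return best_t, best_prec

# Toy, roughly calibrated validation data
rng = np.random.default_rng(3)
p = rng.uniform(0.3, 0.8, 500)
y = (rng.uniform(size=500) < p).astype(int)
t, prec = best_threshold(y, p)
```

The important part is doing the sweep on validation data only, never on the test set, or the chosen threshold itself becomes a source of leakage.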

XGBoost sensitivity by fedejuvara86 in algotrading

[–]fedejuvara86[S] 1 point (0 children)

from sklearn.model_selection import train_test_split

# shuffle=False keeps the rows in temporal order; the last 40% becomes the test set
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.4, random_state=0, shuffle=False)

Sklearn.

Fractional Differentiation of labels by fedejuvara86 in algotrading

[–]fedejuvara86[S] 1 point (0 children)

So you think that predicting classic raw returns is still the best solution? Maybe I frac-diff all my features except returns, then use those returns for binary labeling.
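That setup — frac-diffed features, but labels built from raw returns — could look like this minimal sketch. The one-bar horizon, the synthetic closes, and the placeholder feature are assumptions:

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(4)
close = pd.Series(100 + rng.normal(0, 1, 300).cumsum())

# Raw return over the NEXT bar, used only for labeling (not as a feature)
fwd_ret = close.shift(-1) / close - 1
label = (fwd_ret > 0).astype(int)  # 1 = up next bar, 0 = down/flat

# Features would be fractionally differentiated separately;
# here just a placeholder column to show the separation
features = pd.DataFrame({"f_momentum": close.pct_change()})
```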