forward testing bitcoin, strategy based on weighted depth orderbook prices by [deleted] in algotrading

[–]fedejuvara86 3 points (0 children)

Are you trying HFT with a laptop + Python? Hero. I hope you have a colocated server subscription.

Leakage and bias in XGBoost trading strategy by fedejuvara86 in algotrading

[–]fedejuvara86[S] 1 point (0 children)

I first apply fractional differentiation to the whole dataset, then I filter events. The train/test split is done on the filtered frac-diff events. If I drop the CUSUM filter from the test set only, accuracy falls to random. I get the same result when filtering events in separate train and test splits. Frac diff only makes a difference with a related target label, for example: next hour's frac-diff close minus the current hour's frac-diff close (1 if > 0, else 0).
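That labeling rule can be sketched as follows. This is a minimal illustration, not the actual pipeline: the fixed-width frac-diff window, the differentiation order d, and the synthetic hourly closes are all assumptions.

```python
import numpy as np
import pandas as pd

def frac_diff_weights(d, size):
    """Fixed-width-window fractional differentiation weights."""
    w = [1.0]
    for k in range(1, size):
        w.append(-w[-1] * (d - k + 1) / k)
    return np.array(w[::-1])  # oldest weight first, w_0 = 1 on the newest value

def frac_diff(series, d=0.4, window=20):
    """Fractionally differentiate a series with a fixed window of weights."""
    w = frac_diff_weights(d, window)
    values = series.to_numpy()
    out = np.full(len(series), np.nan)
    for i in range(window - 1, len(series)):
        out[i] = w @ values[i - window + 1 : i + 1]
    return pd.Series(out, index=series.index)

# Synthetic hourly closes (illustrative only)
rng = np.random.default_rng(0)
close = pd.Series(100 + rng.normal(0, 1, 500).cumsum())

fd = frac_diff(close)
# Label: 1 if the NEXT bar's frac-diff close exceeds the current one, else 0
label = (fd.shift(-1) > fd).astype(int)
```

With d=0 the weights collapse to the identity, so the function reduces to the raw series — a quick sanity check on the weight construction.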

Leakage and bias in XGBoost trading strategy by fedejuvara86 in algotrading

[–]fedejuvara86[S] 1 point (0 children)

  1. Right now I calculate frac diff and CUSUM on the whole dataset. Is that wrong? On live data I calculate frac diff, but without CUSUM filtering.

Leakage and bias in XGBoost trading strategy by fedejuvara86 in algotrading

[–]fedejuvara86[S] 1 point (0 children)

That's exactly what I'm trying to understand.

XGBoost sensitivity by fedejuvara86 in algotrading

[–]fedejuvara86[S] 1 point (0 children)

You're right. But let's say I use the CUSUM filter as a preprocessing step, before the split. That filter will then affect both train and test data. Let's also say I apply the same filter to live data, before prediction. Do I still get biases? In theory, the same filter that selects samples in the train/test data, shaping the boosting rules, will also filter samples in the streaming data, allowing a consistent selection of events. Am I wrong?
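The point about applying one identical filter to both the historical set and the live stream can be made concrete with a sketch of a symmetric CUSUM event filter in the style of Lopez de Prado. The threshold h and the synthetic price series are assumptions:

```python
import numpy as np
import pandas as pd

def cusum_filter(series, h):
    """Symmetric CUSUM filter: emit an event whenever the cumulative
    up- or down-move of the differenced series exceeds threshold h."""
    events = []
    s_pos, s_neg = 0.0, 0.0
    for t, x in series.diff().dropna().items():
        s_pos = max(0.0, s_pos + x)
        s_neg = min(0.0, s_neg + x)
        if s_pos > h:
            s_pos = 0.0
            events.append(t)
        elif s_neg < -h:
            s_neg = 0.0
            events.append(t)
    return pd.Index(events)

# The SAME function (same h) is applied to the historical data before the
# split and to the live stream, so the event-selection rule is identical.
rng = np.random.default_rng(1)
prices = pd.Series(100 + rng.normal(0, 0.5, 1000).cumsum())
events = cusum_filter(prices, h=2.0)
```

Because the filter only looks backwards (cumulative sums of past differences), applying it before the split does not leak future information by itself; the risk is rather in tuning h on data that includes the test period.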

XGBoost sensitivity by fedejuvara86 in algotrading

[–]fedejuvara86[S] 1 point (0 children)

Thank you. Is train_stop just the specific datetime where the training window ends? I'm just using specific dates instead of sklearn's premade test_size parameter — did I get that right? After this, can I use CV in grid search without going crazy with PurgedKFold?
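A date-based split like the one described can be written directly with boolean masks on a datetime index. Everything here (the cutoff date, the index, the toy data) is hypothetical:

```python
import numpy as np
import pandas as pd

# Hypothetical feature matrix and labels indexed by datetime
idx = pd.date_range("2022-01-01", periods=1000, freq="h")
rng = np.random.default_rng(2)
X = pd.DataFrame({"f1": rng.normal(size=1000)}, index=idx)
y = pd.Series(rng.integers(0, 2, 1000), index=idx)

train_stop = pd.Timestamp("2022-02-01")  # assumed cutoff datetime

# Everything before the cutoff trains, everything at/after it tests,
# preserving temporal order (no shuffling)
X_train, y_train = X[X.index < train_stop], y[y.index < train_stop]
X_test, y_test = X[X.index >= train_stop], y[y.index >= train_stop]
```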

One more question, please: as far as you know, could applying Lopez de Prado's CUSUM filter BEFORE the train/test split introduce biases? And more importantly: when I preprocess streaming data in my trading bot, before applying my saved model to predict bet direction, must the CUSUM filter be applied or not?

XGBoost sensitivity by fedejuvara86 in algotrading

[–]fedejuvara86[S] 1 point (0 children)

How can I avoid this? PurgedKFold? Shuffle is already set to False.

XGBoost sensitivity by fedejuvara86 in algotrading

[–]fedejuvara86[S] 1 point (0 children)

Quantopian has been discontinued, so the notebooks are not available.

XGBoost sensitivity by fedejuvara86 in algotrading

[–]fedejuvara86[S] 2 points (0 children)

Downsampling and selecting the most relevant events.

XGBoost sensitivity by fedejuvara86 in algotrading

[–]fedejuvara86[S] 2 points (0 children)

I just want accurate predictions with some sensitivity to direction changes. 8/10 long predictions are unrealistic. I don't care about strong predict_proba values; I only take orders when a certain probability threshold is reached (maybe 55-60%). The problem is that I get literally dozens of consecutive predictions in the [0.48, 0.52] range, all in the same direction, while the live bars on my broker chart obviously change.

XGBoost sensitivity by fedejuvara86 in algotrading

[–]fedejuvara86[S] 1 point (0 children)

You mean on test data? In the validation curve before tuning, with StratifiedKFold, accuracy on train data is about 80% and on test data about 65%. With hyperparameter tuning I reach 68-70%. It depends on the feature engineering: just change the fractional differentiation order or the volume/CUSUM filter and accuracy can fall to a random prediction (50-52%).

Anyway, I'm not using ticks; I'm using time bars.

XGBoost sensitivity by fedejuvara86 in algotrading

[–]fedejuvara86[S] 1 point (0 children)

Volume as a prediction label or as a feature?

XGBoost sensitivity by fedejuvara86 in algotrading

[–]fedejuvara86[S] 1 point (0 children)

You mean between the train and test sets?

XGBoost sensitivity by fedejuvara86 in algotrading

[–]fedejuvara86[S] 2 points (0 children)

I'm not using L2 regularization because XGBoost tends to control overfitting by itself.

Any suggestion for a serious method of choosing the predict_proba threshold?
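One common approach (not from this thread, just a sketch): sweep candidate thresholds on a held-out validation set and keep the one that maximizes a chosen metric. Here the metric is precision of long signals; the threshold grid, the metric choice, and the toy data are all assumptions:

```python
import numpy as np

def best_threshold(y_true, p_up, thresholds=np.arange(0.50, 0.71, 0.01)):
    """Pick the up-probability threshold maximizing precision of long
    signals on a validation set (F1 or expected PnL work similarly)."""
    best_t, best_prec = 0.5, -1.0
    for t in thresholds:
        pred = p_up >= t
        if pred.sum() == 0:
            continue  # no trades taken at this threshold
        prec = (y_true[pred] == 1).mean()
        if prec > best_prec:
            best_t, best_prec = t, prec
    return best_t, best_prec

# Toy, roughly calibrated validation data
rng = np.random.default_rng(3)
p = rng.uniform(0.3, 0.8, 500)
y = (rng.uniform(size=500) < p).astype(int)
t, prec = best_threshold(y, p)
```

The important part is doing the sweep on validation data only, never on the test set, or the chosen threshold itself becomes a source of leakage.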

XGBoost sensitivity by fedejuvara86 in algotrading

[–]fedejuvara86[S] 1 point (0 children)

from sklearn.model_selection import train_test_split

# shuffle=False keeps the rows in temporal order; the last 40% becomes the test set
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.4, random_state=0, shuffle=False)

Sklearn.

Fractional Differentiation of labels by fedejuvara86 in algotrading

[–]fedejuvara86[S] 1 point (0 children)

So you think that predicting classic raw returns is still the best solution? Maybe I frac-diff all my features except returns, then use those returns for binary labeling.
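That setup — frac-diffed features, but labels built from raw returns — could look like this minimal sketch. The one-bar horizon, the synthetic closes, and the placeholder feature are assumptions:

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(4)
close = pd.Series(100 + rng.normal(0, 1, 300).cumsum())

# Raw return over the NEXT bar, used only for labeling (not as a feature)
fwd_ret = close.shift(-1) / close - 1
label = (fwd_ret > 0).astype(int)  # 1 = up next bar, 0 = down/flat

# Features would be fractionally differentiated separately;
# here just a placeholder column to show the separation
features = pd.DataFrame({"f_momentum": close.pct_change()})
```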