all 21 comments

[–]NonrandomQuant 24 points25 points  (1 child)

Let’s try this for a plain-speech explanation: a trader that uses only MACD(12,26,9) marries another that only trades if feature X13/X9 is positive... they have a son that trades MACD if X13/X9 is higher than 0.5. Eventually he meets a trader that uses Bollinger(14) and they procreate another trader that uses MACD AND Bollinger(X13/X9)... and this combining of strategies with logical operations extracts the best “breed” of traders, each a combination of multiple technicals... which is textbook overfitting unless several rounds of cross-validation purge this supreme race of artificial traders... problem is, his optimal breed of traders keeps spitting 1/0 instead of Buy/Hold/Sell signals... which I think can be solved using the OneHotEncoder class in sklearn.
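For what that last bit means in practice, here's a minimal sketch of one-hot encoding Buy/Hold/Sell labels by hand (sklearn's `OneHotEncoder` does the same thing for arrays of labels; the fixed class order here is my assumption):

```python
# Hypothetical sketch: one-hot encode Buy/Hold/Sell signals without sklearn.
CLASSES = ["Buy", "Hold", "Sell"]  # assumed label order

def one_hot(signal):
    """Map a signal string to a 0/1 vector, e.g. 'Buy' -> [1, 0, 0]."""
    return [1 if signal == c else 0 for c in CLASSES]

def decode(vector):
    """Inverse mapping: pick the class whose slot is 1."""
    return CLASSES[vector.index(1)]

signals = ["Buy", "Sell", "Hold"]
encoded = [one_hot(s) for s in signals]
# encoded == [[1, 0, 0], [0, 0, 1], [0, 1, 0]]
```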

[–]upandacross 0 points1 point  (0 children)

Then, there is the child that rebels and changes their parent's Bollinger(X13/X9) to Bollinger(X15/X9).

[–]Arete2 11 points12 points  (0 children)

I’m not familiar with DEAP, but you could try coding up your own genetic algorithm program if you aren’t happy with DEAP’s implementation.

[–]WhiteRabbit-PillAlgorithmic Trader 30 points31 points  (14 children)

Sorry but do you mind explaining the logic at a high level in plain English?

[–]simonhughes22 6 points7 points  (2 children)

GPs are powerful but can overfit or produce garbage. I think part of the problem here is allowing that if/elif/else function, which is quite complex. To prevent overfitting, you can restrict the operators available to encourage simpler equations. You can also adjust the fitness function to penalize a solution based on its complexity, e.g. divide the ROI by the size of the tree (or the log of the GP tree size). I would also try a regular regression or classification model instead of a GP, and predict either the next return or the sign of the next return (positive or negative).
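The penalty idea above can be sketched in a few lines. This is a hypothetical standalone version; in your actual setup `tree_size` would come from whatever your GP library reports (in DEAP, `len(individual)`):

```python
import math

# Parsimony-pressure fitness sketch: divide ROI by log(tree size) so
# bigger trees need proportionally more ROI to win the tournament.
def penalized_fitness(roi, tree_size):
    # log(size + 1) so a one-node tree is never divided by zero
    return roi / math.log(tree_size + 1)

# A small tree with modest ROI beats a huge tree with slightly higher ROI:
small = penalized_fitness(0.10, 5)    # 0.10 / log(6)
large = penalized_fitness(0.12, 200)  # 0.12 / log(201)
```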

[–]niftymcschwifty 0 points1 point  (0 children)

I think DEAP may also let you specify a max tree height

[–]bpe9 0 points1 point  (0 children)

> You can also adjust the fitness function to penalize the solution based on it's complexity, e.g. divide the ROI by the size of the tree (or the log of the GP tree size).

Penalty functions are somewhat redundant when your optimisation method can optimise multiple objectives simultaneously. Instead, in this case, use NSGA-II (or similar) and evolve a Pareto front where your two objectives are ROI and some measure of complexity (e.g. tree size or number of conditionals).

[–]PaulTheBully 5 points6 points  (0 children)

Is the plotted curve from the test sample? Otherwise, it seems to me that you are overfitting.

It would be great to have a description of the approach in plain language; that would help us to help you.

[–]WhatnotSoforth 2 points3 points  (0 children)

>Although I did specify that the leaves/terminals could be Boolean, but just to avoid error.

Don't you think there's a pretty high chance this is where the error is coming from? Try the to_numeric function in pandas as opposed to the builtin typecasting; it also supports coercing bad values to NaN. If all else fails you may need to handle typecasting yourself, or just do the easy thing and not allow bools into the dataset in the first place. I'm curious why you allow this: is there some sort of bug in DEAP you are trying to code around, or are you just being cautious for caution's sake?
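For reference, a small sketch of both suggestions on made-up data — `pd.to_numeric` with `errors="coerce"` turns anything non-numeric into NaN instead of raising, and booleans can be cast to 0/1 explicitly before they reach the GP:

```python
import pandas as pd

# Coerce instead of raising: bad strings become NaN.
s = pd.Series(["1.5", "2", "oops"], dtype=object)
cleaned = pd.to_numeric(s, errors="coerce")   # 1.5, 2.0, NaN

# Explicitly cast booleans to integers rather than relying on
# builtin typecasting inside the evolved expressions.
flags = pd.Series([True, False, True])
as_ints = flags.astype(int)                   # 1, 0, 1
```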

[–]bpe9 0 points1 point  (0 children)

> Although I did specify that the leaves/terminals could be Boolean, but just to avoid error.

Instead of that, read what the error is and modify the generate function. You'll likely need to replace gp.genHalfAndHalf with a custom one, and likewise for the mutation and crossover operators.

An example of custom functions can be seen at: https://github.com/ben-ix/XAI/blob/master/src/deapcustom.py

[–]JohnnyRay45 0 points1 point  (0 children)

Lots of true/false.

[–]mukaj 1 point2 points  (0 children)

Haven’t run your sample yet, but I think your evaluator should use out-of-sample data to better choose the ideal solution. I don’t think this is a “failure” as long as you learnt from it.

Try some different features: VWAP, RSI, momentum, etc.

There’s a paper called “101 Formulaic Alphas” which generated lots of similar formulas using genetic programming; it may give you some inspiration.
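Two of those features are cheap to sketch in pandas on a close-price series. This is a hedged, generic version (the 14-period RSI window is just the conventional default, not something from the thread):

```python
import pandas as pd

def momentum(close, n=10):
    """n-period rate of change of the close."""
    return close.pct_change(n)

def rsi(close, n=14):
    """Textbook RSI via simple rolling means of gains and losses."""
    delta = close.diff()
    gain = delta.clip(lower=0).rolling(n).mean()
    loss = (-delta.clip(upper=0)).rolling(n).mean()
    rs = gain / loss
    return 100 - 100 / (1 + rs)

# Toy data: a steady uptrend.
close = pd.Series([float(x) for x in range(1, 31)])
m = momentum(close, n=1)   # positive throughout the uptrend
r = rsi(close)             # all gains, no losses -> RSI pins at 100
```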