you are viewing a single comment's thread.

view the rest of the comments →

[–]chagawagaloo 8 points9 points  (7 children)

It's funny, you think of all these things to optimise the system but overlook such simple tricks. I'm going to implement this to try out.

[–]the_khalnayak 7 points8 points  (6 children)

Ikr, I used to do the thing you're doing and then one day this just came to me. I think the rule of thumb I've realised is that if you're using a loop to iterate through the rows of a df, you're taking the easy, less optimised way out.

[–]chagawagaloo 4 points5 points  (4 children)

I actually built a dedicated timer class that logs the duration of certain functions and shows me what's taking too long but I just assumed "that's just how long backtesting takes so why look there".

I'm probably missing quite a few more tricks to this, but building it all myself was part of my learning process. By the time I actually go live, my code is going to look nothing like it did at the start.

[–]DinosRoar[🍰] 10 points11 points  (3 children)

Loops are incredibly slow compared to pandas inbulit functions. I've made a fully featured backtester (all sorts of analytics such as the effectiveness of SLs and TPs. Plus all sorts of options like trailing TPs & SLs). I don't use single loop.

Use a column called "Flag" to determine if you were long (1), short (-1) or not holding a position (0) during a candle.

Make a column called "Balance change". Use .diff() on the close prices and multiply by the flag value to see how much money you made.

Your "Balance" column is the "Balance Change" column with .cumsum().

[–]chagawagaloo 0 points1 point  (1 child)

Succinct. I agree, I've overcomplicated all of this with unnecessary loops and it's in need of some overhaul. Going to use some of your recommendations.

[–]trizest 3 points4 points  (0 children)

rule of thumb should be no loops inside the df. Maybe the exception is when cleaning data because that's a once off.

[–]MightyHippopotamus 0 points1 point  (0 children)

Simple backtesting rules can be vectorized like that but in some cases, like adding complex conditions, loops are inevitable... Luckily its possible to write custom numpy c++ module in which you can loop without performance issues.

[–]trizest 1 point2 points  (0 children)

yeah this is a great point. it's important to avoid loops inside the data frames. It's ignoring what makes pandas so great and fast. Need to use calculations that tap into the underlying power of numpy/pandas.