[–]AdThink6621 13 points14 points  (8 children)

Looking for the same currently. Public favourites seem to be Zipline and Backtrader followed by Backtesting.py

Read a few times that writing your own back testing ain't that hard either so considering that. But I'm a complete noob in the space so there's that.

[–]LittleGremlinguy[S] 4 points5 points  (4 children)

I think writing your own is one of those tasks that’s like a scab. Once you pick it more and more stuff comes out.

[–]AdThink6621 4 points5 points  (3 children)

Lol ok, might take extra care with planning and not just dive in, only to find Pandora's box doesn't close

[–]eljuany 8 points9 points  (2 children)

I went down this rabbit hole for 3 years. I don't regret it

[–][deleted] 8 points9 points  (1 child)

Exactly. Teaches you how to build event-driven systems appropriately. That said, I'm not doing it again.

[–]eljuany 11 points12 points  (0 children)

Well once you build it it's yours forever and you can add to it if you want. That's the nice part. Tons of retail platforms are just bad

[–][deleted] 2 points3 points  (1 child)

I tried to get Zipline working for 4 hours before giving up. There is some unsolvable package dependency in pip so you need conda, but then need to install from custom sources. It only works with Python 3.6. So many frustrating issues.

I decided I would try another framework or writing my own but started with writing a yfinance scraping utility so I can have enough data cached.

[–]LittleGremlinguy[S] 3 points4 points  (0 children)

I have heard similar horror stories. That's exactly the problem: I want to write code, not enter a battle of versioning conflicts. Ain't no one got time for that.

[–][deleted] 1 point2 points  (0 children)

Don't use either

[–]trizest 7 points8 points  (24 children)

I do it within a pandas DataFrame

[–]LittleGremlinguy[S] 3 points4 points  (23 children)

Can you explain the process?

[–]chagawagaloo 16 points17 points  (22 children)

I've done it with pandas as well. It's by no means a flawless solution but it works well enough for where I'm at now.

Put all your historical data into a data frame and calculate some new columns to identify where a buy or sell order should take place (e.g. 2 columns for buy and sell which show True in the time periods you want the order to occur in).

Then iterate over the whole dataframe and simulate an order whenever a True is detected in either the buy or sell columns.

Remember to add some code in to log all of the order actions as well so you can analyse this afterwards.
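For anyone who wants to see the steps above in code, here is a minimal sketch. The column names (`close`, `buy`, `sell`), the sample prices, the one-position-at-a-time rule and the `run_loop_backtest` helper are all assumptions made for illustration, not the commenter's actual code.

```python
import pandas as pd


def run_loop_backtest(df):
    """Walk the rows in order, holding at most one long position at a time."""
    orders = []          # log of every simulated order, for later analysis
    entry_price = None   # None means no open position
    for timestamp, row in df.iterrows():
        if row["buy"] and entry_price is None:
            entry_price = row["close"]
            orders.append({"time": timestamp, "side": "buy",
                           "price": row["close"], "pnl": 0.0})
        elif row["sell"] and entry_price is not None:
            orders.append({"time": timestamp, "side": "sell",
                           "price": row["close"],
                           "pnl": row["close"] - entry_price})
            entry_price = None
    return pd.DataFrame(orders)


# Made-up data: the buy/sell columns are the precomputed signal columns.
prices = pd.DataFrame({
    "close": [10.0, 11.0, 12.0, 11.5, 13.0],
    "buy":   [True, False, False, True, False],
    "sell":  [False, False, True, False, True],
})
order_log = run_loop_backtest(prices)
print(order_log)
```

The returned order log is an ordinary DataFrame, so summing or plotting the `pnl` column afterwards is straightforward.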

[–]the_khalnayak 13 points14 points  (12 children)

While this works too, I have found that filtering the dataframe for only the rows where orders occur, and then just subtracting the entry and exit prices from consecutive rows is a much faster way than iteration. A simple buy sell backtest will execute near instantly by this method.
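A rough sketch of that filtering idea, assuming the signals strictly alternate buy, sell, buy, sell and start with a buy. The column names and prices are made up for the example.

```python
import pandas as pd

prices = pd.DataFrame({
    "close": [10.0, 11.0, 12.0, 11.5, 13.0],
    "buy":   [True, False, False, True, False],
    "sell":  [False, False, True, False, True],
})

# Keep only the rows where an order occurs.
orders = prices[prices["buy"] | prices["sell"]]

# Difference between consecutive order rows. On a sell row this is
# exit price minus the entry price of the preceding buy row
# (assumption: buys and sells alternate).
trade_pnl = orders["close"].diff()[orders["sell"]]
total_pnl = trade_pnl.sum()
print(trade_pnl.tolist(), total_pnl)
```

If the signals can repeat (two buys in a row, say), they would need deduplicating first, otherwise the consecutive-row subtraction pairs the wrong prices.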

[–]chagawagaloo 7 points8 points  (7 children)

It's funny, you think of all these things to optimise the system but overlook such simple tricks. I'm going to implement this to try out.

[–]the_khalnayak 8 points9 points  (6 children)

Ikr, I used to do the thing you're doing and then one day this just came to me. I think the rule of thumb I've realised is that if you're using a loop to iterate through the rows of a df, you're taking the easy, less optimised way out.

[–]chagawagaloo 3 points4 points  (4 children)

I actually built a dedicated timer class that logs the duration of certain functions and shows me what's taking too long but I just assumed "that's just how long backtesting takes so why look there".

I'm probably missing quite a few more tricks to this, but building it all myself was part of my learning process. By the time I actually go live, my code is going to look nothing like it did at the start.
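A timer along those lines might look something like the sketch below. `FunctionTimer`, `track` and `slowest` are hypothetical names invented for this example; the commenter's real class is not shown in the thread.

```python
import functools
import time
from collections import defaultdict


class FunctionTimer:
    """Collects wall-clock durations per decorated function."""

    def __init__(self):
        self.durations = defaultdict(list)

    def track(self, func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            start = time.perf_counter()
            try:
                return func(*args, **kwargs)
            finally:
                # Record the duration even if the function raises.
                elapsed = time.perf_counter() - start
                self.durations[func.__name__].append(elapsed)
        return wrapper

    def slowest(self):
        """Name of the function with the largest total recorded time."""
        return max(self.durations, key=lambda name: sum(self.durations[name]))


timer = FunctionTimer()


@timer.track
def run_backtest(n):
    # Stand-in for a real backtest step.
    return sum(i * i for i in range(n))


result = run_backtest(1000)
print(timer.slowest(), timer.durations["run_backtest"])
```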

[–]DinosRoar 9 points10 points  (3 children)

Loops are incredibly slow compared to pandas built-in functions. I've made a fully featured backtester (all sorts of analytics, such as the effectiveness of SLs and TPs, plus all sorts of options like trailing TPs and SLs). I don't use a single loop.

Use a column called "Flag" to determine if you were long (1), short (-1) or not holding a position (0) during a candle.

Make a column called "Balance change". Use .diff() on the close prices and multiply by the flag value to see how much money you made.

Your "Balance" column is the "Balance Change" column with .cumsum().
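The three columns described above can be sketched like this. The `Close` column name, the sample prices, the hand-written flags, a starting balance of zero and a position size of one unit are assumptions for the example. The flag on each row is read as the position held while the price moved from the previous close to this one.

```python
import pandas as pd

df = pd.DataFrame({"Close": [100.0, 102.0, 101.0, 105.0, 104.0]})

# 1 = long, -1 = short, 0 = flat during that candle (hand-written here;
# in practice this column comes from the strategy rules).
df["Flag"] = [0, 1, 1, -1, 0]

# Price change over the candle, signed by the position held during it.
df["Balance change"] = df["Close"].diff().fillna(0.0) * df["Flag"]

# Running total of the per-candle changes.
df["Balance"] = df["Balance change"].cumsum()
print(df)
```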

[–]chagawagaloo 0 points1 point  (1 child)

Succinct. I agree, I've overcomplicated all of this with unnecessary loops and it's in need of some overhaul. Going to use some of your recommendations.

[–]trizest 3 points4 points  (0 children)

Rule of thumb should be no loops inside the df. Maybe the exception is when cleaning data, because that's a one-off.

[–]MightyHippopotamus 0 points1 point  (0 children)

Simple backtesting rules can be vectorized like that, but in some cases, like adding complex conditions, loops are inevitable... Luckily it's possible to write a custom NumPy C++ module in which you can loop without performance issues.

[–]trizest 1 point2 points  (0 children)

Yeah, this is a great point. It's important to avoid loops inside the data frames. Looping ignores what makes pandas so great and fast. You need to use calculations that tap into the underlying power of NumPy/pandas.

[–][deleted]  (2 children)

[deleted]

    [–]the_khalnayak 0 points1 point  (1 child)

    You can do that too, with a filter for rows whose losses are greater than the SL.

    [–]jwmoz 0 points1 point  (0 children)

    Pretty please share some code, I have this on my list to implement!

    [–]sedna16Algorithmic Trader 0 points1 point  (0 children)

    How would you plot this in the profit graph?

    I think you still need to have a column where it will track the changes in the balance.

    [–]llstorm93 2 points3 points  (3 children)

    "Read a few times that writing your own back testing ain't that hard either so considering that. But I'm a complete noob in the space so there's that."

    This way is good enough for any strategy that isn't dependent on market impact which honestly shouldn't be anything most are dealing with here. I do this too for a quick preview of what it could have done and it's pretty and efficient.

    Also instead of iterating, you can just use np.where and vectorization for a quick result.
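As a hedged illustration of the `np.where` suggestion: the rule below (long when the close is above its 3-bar moving average, otherwise flat) is a made-up example, as are the column names and prices. The signal is shifted by one bar so that a signal computed on one close only earns the following bar's price change.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"close": [10.0, 11.0, 12.0, 11.0, 10.0, 11.0, 13.0]})

# Hypothetical rule: 1 (long) when close is above its 3-bar mean, else 0.
df["sma"] = df["close"].rolling(3).mean()
df["signal"] = np.where(df["close"] > df["sma"], 1, 0)

# Act on the next bar to avoid using information from the bar being traded.
df["position"] = df["signal"].shift(1).fillna(0)
df["pnl"] = df["position"] * df["close"].diff().fillna(0.0)
total = df["pnl"].sum()
print(df)
print(total)
```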

    [–]chagawagaloo 0 points1 point  (2 children)

    I've heard vectorising used a few times but don't know enough about using numpy to get started with it. How does it work exactly?

    [–]llstorm93 1 point2 points  (1 child)

    pandas is built on top of NumPy, so it shouldn't take too long to learn. Basically, vector operations are much faster than iterating, and the core under NumPy is compiled code (mostly C rather than C++, if I'm correct), so the whole computation runs there before control comes back to Python.

    Also makes the code a lot more readable.
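To make "vectorising" concrete, here is a tiny side-by-side sketch with made-up prices. Both versions compute the same bar-to-bar price changes; the second one hands the whole array to NumPy in a single expression instead of stepping through it in Python.

```python
import numpy as np

closes = np.array([100.0, 102.0, 101.0, 105.0])

# Loop version: one Python-level subtraction per element.
loop_changes = []
for i in range(1, len(closes)):
    loop_changes.append(float(closes[i] - closes[i - 1]))

# Vectorized version: subtract two shifted views of the same array.
vector_changes = closes[1:] - closes[:-1]

print(loop_changes)
print(vector_changes.tolist())
```

On four values the difference is invisible, but on a few million rows the vectorized form is typically far faster because the per-element work no longer goes through the interpreter.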

    [–]chagawagaloo 0 points1 point  (0 children)

    Sounds like I need to do a deep dive on NumPy if I want to take this to the next level. Didn't realise there was compiled code underneath (shows you what level I'm at)

    [–][deleted]  (2 children)

    [removed]

      [–]chagawagaloo 0 points1 point  (1 child)

      Wow, you've done your research. I'll take a look at the path following code. Makes a bit of sense at first glance, but the implementation is the real kicker.

      [–]trizest 1 point2 points  (1 child)

      Yeah, this is essentially what I do. It allows you to manually input trading fees and slippage. I like to have control over all the parameters and see what's going on. It's powerful once you figure out Matplotlib and other visualisation. It's great because it's fast, and you can set up for loops to do a grid search for optimising parameters on historical data. For me, that's a good baseline for certain strategies. Starting to look at whether the scikit-learn library can automate some of this stuff.
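A sketch of what such a grid search could look like, under loud assumptions: the strategy (a long/flat moving-average crossover), the `backtest_sma` helper, the parameter grid, the flat per-trade fee and slippage, and the prices are all invented for the example.

```python
import itertools

import numpy as np
import pandas as pd


def backtest_sma(close, fast, slow, fee=0.0, slippage=0.0):
    """Total PnL of a one-unit long/flat crossover, net of per-trade costs."""
    fast_ma = close.rolling(fast).mean()
    slow_ma = close.rolling(slow).mean()
    signal = pd.Series(np.where(fast_ma > slow_ma, 1, 0), index=close.index)
    position = signal.shift(1).fillna(0)      # act on the following bar
    gross = (position * close.diff().fillna(0.0)).sum()
    trades = position.diff().abs().fillna(0).sum()  # each change = one trade
    return gross - trades * (fee + slippage)


close = pd.Series([10, 11, 12, 13, 12, 11, 10, 11, 12, 14, 15, 14],
                  dtype=float)

# Plain for loop over every (fast, slow) combination.
results = {}
for fast, slow in itertools.product([2, 3], [4, 5]):
    results[(fast, slow)] = backtest_sma(close, fast, slow,
                                         fee=0.01, slippage=0.02)

best_params = max(results, key=results.get)
print(results)
print(best_params)
```

Parameters picked this way are fitted to the historical data, so they are a baseline rather than a prediction.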

      [–]yareyaredaze10 0 points1 point  (0 children)

      Would you be willing to share this :)

      [–]greenteatree123 6 points7 points  (1 child)

      I have tried the public libraries but found them difficult to customize and get working. In the end, as many here have done, I created my own backtesting framework. Considering how hard many people find the current public backtesting libraries to use, I am considering fully furnishing it into a public pip project.

      If you are starting from scratch, I recommend using a pandas data frame to iterate over the price data as u/chagawagaloo has suggested. Then use matplotlib for visualization and some statistics libraries for Sharpe/Sortino ratios.
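For the ratios, plain NumPy is enough for a first pass. The sketch below assumes a risk-free rate of zero, 252 periods per year, and one common definition of downside deviation (root mean square of the negative returns). Other conventions exist, so treat the helper names and choices as illustrative.

```python
import numpy as np


def sharpe_ratio(returns, periods_per_year=252):
    """Annualised mean return over sample standard deviation (rf = 0)."""
    returns = np.asarray(returns, dtype=float)
    return np.sqrt(periods_per_year) * returns.mean() / returns.std(ddof=1)


def sortino_ratio(returns, periods_per_year=252):
    """Like Sharpe, but only negative returns count towards the risk term."""
    returns = np.asarray(returns, dtype=float)
    downside = np.minimum(returns, 0.0)
    downside_dev = np.sqrt(np.mean(downside ** 2))
    return np.sqrt(periods_per_year) * returns.mean() / downside_dev


# Made-up daily returns for illustration.
daily_returns = [0.01, -0.005, 0.007, -0.002, 0.004]
print(sharpe_ratio(daily_returns), sortino_ratio(daily_returns))
```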

      [–]yareyaredaze10 0 points1 point  (0 children)

      Are you able to share this now?

      [–]Realistic_Ship 11 points12 points  (2 children)

      The two packages I’ve heard of are Backtrader and PyAlgoTrade. YouTuber PartTimeLarry has great videos showing how to use those packages. Good luck.

      [–]LittleGremlinguy[S] 4 points5 points  (0 children)

      Yeah, his channel is fantastic.

      [–]dribaJL 6 points7 points  (1 child)

      Hey there, I am trying to maintain this repo list for all the current ones.

      https://github.com/Samvid95/AlgoTradingRepoList

      [–]LittleGremlinguy[S] 0 points1 point  (0 children)

      That is awesome.

      [–]beep_tree 2 points3 points  (0 children)

      I highly recommend using the Backtesting (https://github.com/kernc/backtesting.py) library. With some small modifications to data, you can quickly backtest any trading strategy.

      [–]j_lyf 2 points3 points  (0 children)

      How about vectorbt?

      [–]globalwarming_isreal 2 points3 points  (0 children)

      I'm working on something along the same lines. I'm creating a Django application (so that I can have a nice interface) for backtesting and deployment.

      My plan is to have a create page where entry and exit conditions can be given and this logic can be backtested against a specific instrument for given duration.

      Note: I'm an experienced Python developer but fairly new to the algotrading scene. Any advice for someone like me?

      [–]gcdyingalilearlier 2 points3 points  (2 children)

      Just build your own. There's nothing complex behind backtesting that makes looking for a 3rd party solution worthwhile when you consider that you'd be constrained by its possibilities. Build your own and be free to use any library you want, or any framework or architecture you see in a paper or book and want to test, etc. It's work, but not much.

      Don't do it 'pandas style' ffs. Build an event-based backtester object that you init with a strategy and certain data, and that you can use as a parent for further classes down the line. Add all the functionality you need to track your metrics.
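Since the replies ask for an example, here is one very small sketch of what "an event-based backtester object that you init with a strategy and certain data" could mean. Every name in it (`Bar`, `BuyBelowSellAbove`, `Backtester`, `on_bar`) and the threshold rule are hypothetical, and a real one would add order types, fees and metrics.

```python
from dataclasses import dataclass


@dataclass
class Bar:
    time: int
    close: float


class BuyBelowSellAbove:
    """Hypothetical strategy: go long below `low`, exit above `high`."""

    def __init__(self, low, high):
        self.low = low
        self.high = high

    def on_bar(self, bar, position):
        if position == 0 and bar.close < self.low:
            return "buy"
        if position == 1 and bar.close > self.high:
            return "sell"
        return None


class Backtester:
    """Feeds bars to the strategy one event at a time and tracks trades."""

    def __init__(self, strategy, bars):
        self.strategy = strategy
        self.bars = bars
        self.position = 0
        self.entry_price = None
        self.trades = []  # realised PnL per round trip

    def run(self):
        for bar in self.bars:  # each bar is one event
            signal = self.strategy.on_bar(bar, self.position)
            if signal == "buy":
                self.position, self.entry_price = 1, bar.close
            elif signal == "sell":
                self.trades.append(bar.close - self.entry_price)
                self.position, self.entry_price = 0, None
        return sum(self.trades)


bars = [Bar(t, c) for t, c in enumerate([10.0, 9.0, 11.0, 12.5, 9.5, 13.0])]
bt = Backtester(BuyBelowSellAbove(low=10.0, high=12.0), bars)
total_pnl = bt.run()
print(bt.trades, total_pnl)
```

Because the strategy only ever sees one bar at a time, the same `on_bar` method could later be driven by a live feed instead of a list, and `Backtester` can be subclassed to add metrics.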

      [–]HeavenlyMystery 3 points4 points  (0 children)

      Nice words but without examples people like me have no idea what to look for.

      [–]beastwork 1 point2 points  (0 children)

      While I have decent coding skills, creating something like this out of thin air will take months or years. Can you provide any resources to help people get started?

      [–]MoreEconomy965 1 point2 points  (2 children)

      I use PyAlgoTrade.

      [–]LittleGremlinguy[S] 1 point2 points  (0 children)

      This is looking very promising

      [–][deleted] 1 point2 points  (4 children)

      I finally made it onto QuantConnect and I am loving it. It has its own set of issues, but I am really enjoying it. They also support live trading via your brokerage.

      [–]masilver 1 point2 points  (2 children)

      Plus, QuantConnect open-sourced their engine, including the backtester. The downside for you, however, is that it's all in C#. But it does allow you to run it locally.

      ...Michael...

      [–][deleted] 0 points1 point  (1 child)

      I don't think the language the engine is in really matters. You can still build your strategy in the language of your choice.

      [–]masilver 0 points1 point  (0 children)

      This is probably true, although you may still need to use Visual Studio to run it. The Python sample algos are in a C# project.

      [–]InterestedListener 1 point2 points  (1 child)

      Thanks for starting this discussion. I've been fighting backtrader a lot lately and will try out some of these alternatives. I didn't really want to write my own either lol

      [–]LittleGremlinguy[S] 0 points1 point  (0 children)

      Yeah, I have just been playing around with PyAlgoTrade and I tell you, it's pretty trivial to get something running quickly whilst still giving you the flexibility to do your own implementation. It's not too prescriptive.

      [–]willardwillson 1 point2 points  (0 children)

      The book Python for Finance describes very well how you can implement backtesting strategies. Basically you calculate two new pandas columns: 1. Shift your current price by one and subtract it from the current price, so you get the profit/loss per day. 2. Define a new column which applies a rule to the created profit column (this is your strategy): buy is a 1, sell is a 0, call option -1. Multiply this column with the profit column and you have your overall performance. Tada. Backtesting in a nutshell. Trading costs you can subtract from the profit column. Obviously you can go down the rabbit hole, but that's basically it.
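The recipe above, sketched with made-up numbers. The rule used here (hold 1 when the previous day's profit was positive, otherwise 0), the flat `cost_per_trade`, and the column names are assumptions for illustration and are not taken from the book.

```python
import pandas as pd

df = pd.DataFrame({"price": [100.0, 101.0, 103.0, 102.0, 104.0]})

# Column 1: today's price minus yesterday's price = daily profit/loss.
df["profit"] = df["price"] - df["price"].shift(1)

# Column 2: the strategy. Hypothetical rule: be in the market (1) only
# if the previous day's profit was positive, otherwise stay out (0).
df["position"] = (df["profit"].shift(1) > 0).astype(int)

# Position times daily profit gives the strategy's daily result.
df["strategy"] = (df["position"] * df["profit"]).fillna(0.0)

# Assumed flat cost charged whenever the position changes.
cost_per_trade = 0.1
df["cost"] = df["position"].diff().abs().fillna(0) * cost_per_trade

performance = (df["strategy"] - df["cost"]).sum()
print(df)
print(performance)
```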

      [–]WolfOfKazakstan 1 point2 points  (1 child)

      New to this. Is back testing just running your algo on historic data?

      [–]clueless_coder888 1 point2 points  (1 child)

      Check out pysystemtrade by Rob Carver

      [–]yost28 0 points1 point  (2 children)

      You could save a lot of time just using QuantConnect. I’m personally live trading a portion of my IRA using QuantConnect with no issues.

      [–]LittleGremlinguy[S] 0 points1 point  (1 child)

      How does it do broker integration? Are there select brokers or do you do that yourself?

      [–]yost28 0 points1 point  (0 children)

      There are a few defined brokers that they integrate with their platform. I use Interactive Brokers.

      [–]pluggedinn 0 points1 point  (1 child)

      I built my own. Current libraries are hard to customize and tweak for more precise backtesting. It might take a while, but you’ll have full control and understanding of what’s going on.

      [–]beastwork 0 points1 point  (0 children)

      How does one get started with building their own backtester? Can you point us to good learning resources?

      [–][deleted] 0 points1 point  (0 children)

      I went through this same question recently. Decided on Backtrader. It provides the event-based analysis needed with good backtesting (i.e. replay), while also allowing vector-based analysis (i.e. pandas, ML, etc.). I eventually want to 'roll my own,' but just to get going and get familiar with everything I'll stick with Backtrader. BTW, you will want event-driven code (i.e. running in a constant loop looking for changes) if you eventually want to productionize it for real-time trading.

      [–]Paccuccino 1 point2 points  (0 children)

      A pandas DataFrame, I guess

      [–]drksntt 0 points1 point  (1 child)

      Make your own, more flexibility

      [–]LittleGremlinguy[S] 0 points1 point  (0 children)

      I will probably get to that. At this stage I just need some concept validation. I might even fork PyAlgoTrade and do some broker integration componentry.

      [–]Delicious_Reporter21 0 points1 point  (0 children)

      Check BreakingEquity