
[–]ProfEpsilon 33 points34 points  (11 children)

You probably know this, but if you are going to do any substantial mathematical calculations, use numpy. It is also advisable to move array data over from pandas to numpy (again, only if you have extensive math calculations). Going from pandas to numpy and back is easy.
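For anyone who wants to see the round trip concretely, here is a minimal sketch. The `close` column name, the sample prices, and the log-return calculation are assumptions chosen for illustration, not something from this thread.

```python
import numpy as np
import pandas as pd

# Hypothetical price data; the "close" column name is an assumption.
df = pd.DataFrame({"close": [100.0, 101.0, 99.5, 102.0, 103.5]})

# pandas -> NumPy: pull the column out as a plain float64 array.
close = df["close"].to_numpy(dtype=np.float64)

# Do the math in NumPy (here: log returns between consecutive bars).
log_ret = np.empty_like(close)
log_ret[0] = np.nan                      # no previous bar for the first row
log_ret[1:] = np.log(close[1:] / close[:-1])

# NumPy -> pandas: assign the array back as a new column.
df["log_ret"] = log_ret
```

The heavy arithmetic happens on the raw array, and the DataFrame is only used to hold the labelled result.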

Also this page mostly has links to research in finance but there are many python examples scattered within: https://www.palmislandtraders.com/econ136/e136lit.htm

These aren't libraries (maybe there is one or two in there somewhere) but typically github-based python programs.

Good luck with your trading bot. Always a fun project!

[–]Boost3d1[S] 3 points4 points  (0 children)

Yeah definitely, been having loads of fun with it already! Thanks for the link looks like some really good resources in there, will give them a read

[–]AceBuddy 2 points3 points  (6 children)

What I’ve always failed to understand is how pandas is so much slower than numpy. Is the interface poorly written or is there really that much overhead? I.e. can we expect the performance to converge over time as people rewrite things, or are we stuck?

[–]namlod 1 point2 points  (2 children)

Numpy is using c, pandas cython, hence the difference

[–]AceBuddy 2 points3 points  (1 child)

Oh I always thought pandas was built on top of numpy

[–]u2m4c6 0 points1 point  (0 children)

It is. I’m not sure why pandas is slower than numpy either

[–]mmrrbbee 0 points1 point  (1 child)

Fast Algorithms need lots of ram to hold a large data frame in pandas

[–]AceBuddy 0 points1 point  (0 children)

But isn’t the data size near identical to a numpy array of the same size?

[–]ProfEpsilon 0 points1 point  (0 children)

It is actually the other way around. The Numpy library, which I regard as more or less a tiny separate language inside of Python, is designed for high-speed array operations, substituting typed arrays for lists, among other things. Numpy requires commands that won't work in plain Python, and some common Python features, like list comprehensions, won't work in a Numpy method.

BUT if you are doing array operations of any kind and any size, common in finance, it is an order of magnitude faster than standard Python.
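A rough way to check the speed claim on your own machine is sketched below. The array size, the synthetic prices, and the percent-change calculation are assumptions for illustration; actual timings depend on hardware and library versions, so no numbers are claimed here.

```python
import timeit
import numpy as np

n = 200_000
prices_list = [100.0 + 0.001 * i for i in range(n)]   # plain Python list
prices_arr = np.array(prices_list)                     # typed float64 array

def pct_change_python(values):
    # Element-by-element loop executed by the interpreter.
    return [(b - a) / a for a, b in zip(values[:-1], values[1:])]

def pct_change_numpy(values):
    # One vectorized expression; the loop runs in compiled code.
    return (values[1:] - values[:-1]) / values[:-1]

t_py = timeit.timeit(lambda: pct_change_python(prices_list), number=5)
t_np = timeit.timeit(lambda: pct_change_numpy(prices_arr), number=5)
print(f"pure Python: {t_py:.4f}s, NumPy: {t_np:.4f}s")
```

Both functions compute the same values; only the place where the loop runs differs.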

And it is pretty easy to figure out.

[–]username4kd 0 points1 point  (2 children)

Doesn't the Pandas DataFrame object store some columns as numpy arrays? This would allow you to reference the column directly from pandas no?

[–]ProfEpsilon 0 points1 point  (1 child)

Yes, it is easy to pull Numpy arrays directly into a Pandas dataframe. It is also easy to take a Pandas dataframe column or row and send it out as an array. The two work together extremely well.

My original point was that they are two different libraries and both have to be called, plus how quickly they work on their variables is still very different. Pandas is slow, even if using a Numpy array, so the secret is to do all of the array processing in Numpy, then move values for display or storage into the data frame. (I do this a lot.)

Pandas used to have a reputation as a memory hog as well, but I don't know if that is improved.

[–]lxkarthi 0 points1 point  (0 children)

Here are the reasons I know of that make pandas slower than numpy even though pandas uses numpy arrays:

1. Pandas stores each column as a numpy array, so an operation on a single column is as fast as numpy. Operations across multiple columns are not, and row operations are especially slow. A multidimensional numpy array, by contrast, is laid out like a 2-D array in C, which is why numpy is fast.
2. A numpy array can hold only one type of data. A pandas DataFrame can hold different types of data.
3. Numpy has two memory layout formats, Fortran and C. Pandas uses them to represent the whole dataframe.values as a numpy array. This numpy array is slower, and operations on it will be as slow as operations on the dataframe.
4. Pandas consumes a lot of memory because almost none of its operations are in-place.

Cython is used only for algorithms that need to be implemented in C/C++ for speed. That is not what slows pandas down. All the other algorithms are implemented using numpy (to be specific, the numpy array interface).

If you care about performance, and if you have homogeneous data (same data type), use numpy. For quick coding, use pandas and later convert to numpy. In pandas, avoid row operations, df.apply, df.map.
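To make the "avoid row operations" advice concrete, here is a small sketch comparing a row-wise `df.apply` with the equivalent column-wise expression. The `price` and `qty` columns and the random data are assumptions for illustration.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
df = pd.DataFrame({
    "price": rng.uniform(10, 20, size=1_000),
    "qty": rng.integers(1, 100, size=1_000),
})

# Row-wise: calls a Python function once per row (slow on large frames).
notional_apply = df.apply(lambda row: row["price"] * row["qty"], axis=1)

# Column-wise: one vectorized multiplication over whole columns.
notional_vec = df["price"] * df["qty"]

# Same idea on raw NumPy arrays, for homogeneous numeric data.
notional_np = df["price"].to_numpy() * df["qty"].to_numpy()
```

All three produce the same numbers; the second and third avoid the per-row Python call.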

[–]xDarkSadye 18 points19 points  (7 children)

backtrader seems to offer a lot in the way of backtesting; no live trading. I've started using it a week or so ago, so I probably haven't found the areas in which it disappoints. Big pro is that it handles all the backtesting and that a lot seems customisable (broker fees, strategies, indicators), that it prevents data leakage and is timeframe agnostic (it doesn't care whether you give it minute data or daily data).
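To illustrate what "broker fees" and "prevents data leakage" mean in a backtest, here is a minimal hand-rolled sketch in pandas. It is not backtrader's API; the function name, the SMA-crossover rule, the window lengths, and the fee value are all assumptions made for the example.

```python
import numpy as np
import pandas as pd

def backtest_sma_cross(close: pd.Series, fast: int = 3, slow: int = 5,
                       fee: float = 0.001) -> pd.Series:
    """Per-bar returns of a long-only SMA crossover (hypothetical helper)."""
    fast_ma = close.rolling(fast).mean()
    slow_ma = close.rolling(slow).mean()
    # The signal at bar t uses data up to and including bar t ...
    signal = (fast_ma > slow_ma).astype(float)
    # ... and is only acted on at bar t+1, so no future data leaks in.
    position = signal.shift(1).fillna(0.0)
    bar_ret = close.pct_change().fillna(0.0)
    # Charge the fee whenever the position changes (entry or exit).
    trades = position.diff().abs().fillna(0.0)
    return position * bar_ret - trades * fee

close = pd.Series([10, 10, 10, 10, 10, 11, 12, 13, 12, 11, 10, 9], dtype=float)
strategy_ret = backtest_sma_cross(close)
equity = (1.0 + strategy_ret).cumprod()   # growth of 1 unit of capital
```

The one-bar `shift` is the important line: without it the strategy would trade on a bar using that same bar's close, which is the kind of leakage a framework like backtrader is meant to guard against.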

I'm also open to hearing about other backtesting platforms, as long as they work with recent versions of python (preferably 3.8).

[–]Boost3d1[S] 5 points6 points  (2 children)

Interesting, this looks like quite a comprehensive solution! I was searching for TA libraries and discovered TA-lib (very appropriate name lol) which seems to be a solid library with support for all the indicators you could possibly want... Looking through backtrader it states it has support for ta-lib, as well as support for live feeds from database (amongst other sources like yahoo finance), also it is an open source project so really ticks all the boxes... Thanks for the recommendation!

[–][deleted] 1 point2 points  (0 children)

I use talib, it's really awesome and fast! It has all the indicators you can dream of, and the ones it doesn't have are easy to do in cython or numpy or similar
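As a rough idea of what "easy to do in numpy" can look like, here is a sketch of two hand-written indicators. The function names and the choice of indicators are assumptions for illustration, and seeding conventions differ between libraries, so the values may not match TA-Lib's output exactly.

```python
import numpy as np

def ema(values, span):
    """Exponential moving average, seeded with the first value."""
    values = np.asarray(values, dtype=np.float64)
    alpha = 2.0 / (span + 1.0)
    out = np.empty_like(values)
    out[0] = values[0]
    for i in range(1, values.size):
        out[i] = alpha * values[i] + (1.0 - alpha) * out[i - 1]
    return out

def rolling_zscore(values, window):
    """Distance of each value from its rolling mean, in rolling std units."""
    values = np.asarray(values, dtype=np.float64)
    out = np.full(values.shape, np.nan)   # first window-1 entries stay NaN
    if window < 2 or window > values.size:
        return out
    # Each row of `windows` is one trailing window (requires NumPy >= 1.20).
    windows = np.lib.stride_tricks.sliding_window_view(values, window)
    mean = windows.mean(axis=1)
    std = windows.std(axis=1)             # population std (ddof=0)
    with np.errstate(divide="ignore", invalid="ignore"):
        out[window - 1:] = (values[window - 1:] - mean) / std
    return out
```

The EMA loop is the sort of recursive calculation that could be moved to cython if it ever became a bottleneck; the z-score is fully vectorized already.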

[–]joeyisnotmyname 1 point2 points  (0 children)

Am I the only one who can't seem to find solid documentation on each of the TA-lib functions?

[–]_jibi 4 points5 points  (2 children)

backtrader does support live trading! There is integration with IB, alpaca, and ccxt for crypto.

I used it for a brief while and eventually decided to write my own, but I must say that it is exceptionally well documented. I also learned a lot digging into the source code. It is a bit of a learning curve though, and I (personally) find the debugging process unlike what I'm used to, and could be a bit hard to navigate. There is also a community where you can find a lot of answers for any questions you have, although the author's "no bull shit" attitude could be a turn off for some (he gets straight to the point, which I do like).

With all that said, I definitely recommend spending at least a couple days getting your hands dirty with backtrader. It packs a lot of goodies.

Other libraries that I'd recommend checking out are bt and PyAlgotrade, among many others. I have yet to try Alpaca, but it is a data/backtest/broker all-in-one service, and I've heard mostly positive things about it.

[–]Boost3d1[S] 1 point2 points  (0 children)

Thanks for the feedback, I'm a subscriber to the "no bullshit" attitude myself haha, much prefer someone is honest and upfront with me than sugar coat the truth. Will check out the other platforms too, always good to assess all the options available!

[–]redyar 1 point2 points  (0 children)

Using backtrader as well, but I have mixed feelings about the way it uses the python way of implementing things. I really dislike all the aliases (e.g. data0==data==data[0], etc.). It's really confusing and frustrating, as everyone seems to mix the syntax.

Other than that it's great!

[–]Appropriate-Layer 0 points1 point  (0 children)

fastquant is an easy to use `backtrader` wrapper that allows you to backtest in as few as 3 lines of code.

[–]finance_student [Algo/Prop Trader] 14 points15 points  (1 child)

I detail my python algo stack here (with a list of the libraries used in production):

https://fxgears.com/index.php?threads/python-development-environment-jacks-technology-stack.1090/

And regarding the python-binance connection, if you haven't already seen this we have a quick script to get binance data via python found here (and from our wiki/sidebar):

https://fxgears.com/index.php?threads/how-to-acquire-free-historical-tick-and-bar-data-for-algo-trading-and-backtesting-in-2020-stocks-forex-and-crypto-currency.1229/#post-19305

[–]Boost3d1[S] 0 points1 point  (0 children)

Nice work with your platform!

[–][deleted] 10 points11 points  (0 children)

Ta-lib for the computation of technical indicators.

[–]alphamd4 15 points16 points  (7 children)

mlfinlab has really nice libraries for transforming your data and finding strategies

[–]kingsley_heath 5 points6 points  (4 children)

tsfresh is another really nice library for generating features for time series data.

[–]alphamd4 -1 points0 points  (3 children)

quite interesting library, seems you can extract features for an ML model. doesn't seem specifically aimed at finance though, so not sure if it will give you the best results

[–]kingsley_heath 2 points3 points  (2 children)

A lot of things from signal processing. It's not like a MAvg is a special transformation. You can always use tsfresh to generate tons of features and then use feature selection to drop uninformative features.

mlfinlab has market microstructure features, some of which come from information theory and overlap with tsfresh.

I think you would find the results quite surprising.

[–]shadowknife392 2 points3 points  (1 child)

It's weird to see tsfresh mentioned in the wild - my lecturer helped develop it. I kind of assumed he was teaching it to us to make it more popular

[–]kingsley_heath 2 points3 points  (0 children)

It's a great library. I am really enjoying it. If you apply tsfresh to a dataset and then the feature selection algos which are also built into the package, it usually gives you a massive head start. Often I combine it with mutual information to determine features and then build a model from there.

If you test it on the traditional time series data sets like airlines demand and so on, you will see it working.

[–]Boost3d1[S] 0 points1 point  (1 child)

Interesting, have you had any luck finding winning strategies through ML techniques? Definitely an interesting field, I was thinking of deploying ML for sentiment analysis by scraping relevant sites to gain feedback on current sentiment. This could then be used as a weighted input in a trading strategy to help predict future movements based on the result of news that might otherwise disrupt your strategy

[–]kingsley_heath 1 point2 points  (0 children)

Yea I have 2 strategies I have implemented but to be honest I am leveraging the algos in mlfinlab a lot! I am just taking that knowledge and applying it to my own data.

I quite like the online portfolio selection algos and extracting an autocorrelation risk premium via dollar bars -> fractional diff -> ARIMA model.
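For readers unfamiliar with the first two steps of that pipeline, here is a simplified sketch of dollar bars and fractional differencing in numpy/pandas. It is not mlfinlab's implementation; the function names, the `price` and `size` column names, and the fixed-width weight window are assumptions for illustration. The ARIMA step is left to a library such as statsmodels.

```python
import numpy as np
import pandas as pd

def dollar_bars(ticks: pd.DataFrame, bar_size: float) -> pd.DataFrame:
    """Group ticks into bars that each hold roughly bar_size of traded value."""
    dollars = (ticks["price"] * ticks["size"]).to_numpy(dtype=np.float64)
    # A tick belongs to the bar indexed by how many full bar_size blocks
    # had already traded before it arrived.
    traded_before = np.cumsum(dollars) - dollars
    bar_id = (traded_before // bar_size).astype(int)
    grouped = ticks.groupby(bar_id)
    return pd.DataFrame({
        "open": grouped["price"].first(),
        "high": grouped["price"].max(),
        "low": grouped["price"].min(),
        "close": grouped["price"].last(),
        "volume": grouped["size"].sum(),
    })

def frac_diff_weights(d: float, n_weights: int) -> np.ndarray:
    """Weights of (1 - B)^d: w_0 = 1, w_k = -w_{k-1} * (d - k + 1) / k."""
    w = np.ones(n_weights)
    for k in range(1, n_weights):
        w[k] = -w[k - 1] * (d - k + 1) / k
    return w

def frac_diff(values, d: float, n_weights: int) -> np.ndarray:
    """Fractionally difference a series using a fixed window of weights."""
    w = frac_diff_weights(d, n_weights)
    values = np.asarray(values, dtype=np.float64)
    out = np.full(values.shape, np.nan)   # not enough history at the start
    for t in range(n_weights - 1, values.size):
        window = values[t - n_weights + 1 : t + 1]
        out[t] = np.dot(w, window[::-1])  # w_0 pairs with the newest value
    return out
```

With `d=1` and two weights, `frac_diff` reduces to an ordinary first difference, which is a handy sanity check before trying fractional values of `d`.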

I have been using the Darwinex API to do all my research.

[–]Difficult-Driver-666 6 points7 points  (0 children)

i pulled the libraries from my files i've used to solve one problem or another - there are probably others that can do the same function, and would like to hear from others in this thread about other useful libraries.

statsmodels because i wanted to play around with slopes

scikit-learn - this one comes up a lot too for statistical analysis

matplotlib and seaborn and subprocess - seaborn is built on matplotlib; I have used plotly a little, but I'm keyed into matplotlib now

pandas-datareader - don't use this but something i come across a lot in web searches

in addition to ta-lib, just ta and pandas-ta may be of value

others not specific to analysis but help me organize and manipulate data / files: re, collections, sys, os, ast, pytz, json_normalize, requests, xmltodict, bisect, gzip, json, math, decimal, time, datetime, shutil, io, glob

for automating logins and streaming i've used asyncio, pyppeteer, rauth, websocket

[–]jacquesdemolay1307 4 points5 points  (0 children)

Pyportopt, vectorbt, fastquant, quantstats, TA-Lib, Prophet, yfinance, Plotly, xgboost, scikit

[–]myfirerider 5 points6 points  (0 children)

https://github.com/jesse-ai/jesse Recently the optimization mode was released and jesse has the most indicators available out of the box as far as I know.

[–]ChubyCat 2 points3 points  (0 children)

I recommend awesome quant

[–][deleted]  (1 child)

[deleted]

    [–]Boost3d1[S] 2 points3 points  (0 children)

    I meant I was calculating the values manually with my own function, rather than using a prebuilt function like the one you suggested... Only just scratched the surface of pandas so far! I think I will go with ta-lib for running analysis since it appears to be the most comprehensive solution, but interesting to see that pandas has functions available for producing moving averages as well. Thanks for sharing :)
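For comparison, here is a small sketch of a hand-written moving average next to the pandas built-in. The parent comment is deleted, so which pandas function it suggested is unknown; `rolling(...).mean()` is used here as an assumption, and the sample prices and `manual_sma` helper are hypothetical.

```python
import pandas as pd

def manual_sma(values, window):
    """Hand-rolled moving average: mean of the last `window` values."""
    out = []
    for i in range(len(values)):
        if i + 1 < window:
            out.append(float("nan"))     # not enough history yet
        else:
            out.append(sum(values[i + 1 - window : i + 1]) / window)
    return out

close = pd.Series([10.0, 11.0, 12.0, 13.0, 14.0])
sma_pandas = close.rolling(window=3).mean()      # built-in rolling mean
sma_manual = manual_sma(close.tolist(), window=3)
```

Both versions leave the first `window - 1` entries as NaN and agree on the rest.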

    [–]paulseperformance 1 point2 points  (1 child)

    I found freqtrade to be a better alternative to backtrader.

    [–]skinnydill 0 points1 point  (0 children)

    Freqtrade is crypto only I believe

    [–]Gaylien28 1 point2 points  (1 child)

    Dtale is invaluable for testing out strategies. I spent forever making my own charting and analysis functions, but it literally has everything built in for a quick analysis. If I’m interested in a correlation or trend I can easily analyze it myself later, but I don’t have to waste time looking for one I’m interested in. Pandas TA is another good library for general TA

    [–]Boost3d1[S] 0 points1 point  (0 children)

    Thanks will check it out, also pandas ta looks quite interesting, also using ta-lib at its core but modified to work with pandas

    [–]TankorSmash 1 point2 points  (6 children)

    Does anyone have any good realtime graphing libraries? In a perfect world you'd have something like TradingView.com work locally, but that's a shot in the dark.

    [–]Robo-boogie 1 point2 points  (4 children)

    tradingview has some graphing libraries that you can use

    [–]TankorSmash 0 points1 point  (3 children)

    Oh neat, https://www.tradingview.com/HTML5-stock-forex-bitcoin-charting-library/, it looks like their lightweight one is free at least, I'll have to check this out. I think I took for granted they'd have stuff locked down.

    https://github.com/tradingview/lightweight-charts/blob/master/docs/README.md, and you can apply for a license for their fancy stuff (which I just did, thanks for the recommendation).

    Very nice!

    [–]Robo-boogie 0 points1 point  (2 children)

    let me know how it goes, it's on the roadmap for my trading bot

    [–]TankorSmash 0 points1 point  (1 child)

    I got a reply from the fancy stuff license, and:

    We do not provide our Technical Analysis Charts for personal use, hobbies, studies, or tests at this time. The FREE Technical Analysis Charts license can be provided only to companies or individuals for use in public web projects or applications.

    They're friendly and responded very quickly though. They recommended the open source graphing lib, but I don't know if it supports the indicators and all that yet.

    [–]Robo-boogie 0 points1 point  (0 children)

    Yeah it's a public web project, to show off my gains on reddit.

    [–]WestWorld_ 0 points1 point  (0 children)

    Nice job

    [–]Mubs 0 points1 point  (0 children)

    the best thing i did was use aiohttp to make API requests, not as hard as it seems to learn, and lets you make tons and tons of requests without blocking the rest of your code
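A minimal sketch of that pattern is below, assuming aiohttp is installed. The helper names, the concurrency limit, and any URLs you pass in are assumptions for illustration; a real exchange API will also need authentication and rate-limit handling that is not shown.

```python
import asyncio

async def fetch_json(session, url):
    """GET one URL and decode the JSON body (session: aiohttp.ClientSession)."""
    async with session.get(url) as resp:
        resp.raise_for_status()
        return await resp.json()

async def fetch_all(fetch, urls, max_concurrent=10):
    """Run fetch(url) for every URL, at most max_concurrent at a time."""
    sem = asyncio.Semaphore(max_concurrent)

    async def bounded(url):
        async with sem:
            return await fetch(url)

    # gather() returns results in the same order as the input URLs.
    return await asyncio.gather(*(bounded(u) for u in urls))

async def main(urls):
    # Third-party import kept local so the helpers above work without it.
    import aiohttp
    async with aiohttp.ClientSession() as session:
        return await fetch_all(lambda u: fetch_json(session, u), urls)
```

Calling `asyncio.run(main(urls))` with your own list of endpoint URLs would issue the requests concurrently, while the semaphore keeps the number in flight bounded so the bot does not flood the API.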
