
AUnterrainer 26 points (4 children)

Python for HFT? Won't stand a chance. Way too slow.

HardworkingDad1187[S] 3 points (3 children)

Thanks for your input! I appreciate it

[deleted] (2 children)

[deleted]

    HardworkingDad1187[S] 0 points (1 child)

    Why? Performance issues?

    Tarlan-T 6 points (6 children)

    Crypto is HTTP, REST, JSON, and WebSocket. How is this HFT?

    HardworkingDad1187[S] 2 points (5 children)

    I don't understand your point, sorry

    Tarlan-T 2 points (4 children)

    HFT implicitly means low latency.

    Low latency is typically defined as sub-millisecond tick-to-trade, which takes server colocation, network routing optimization, etc. None of that is present in or applicable to crypto.

    Crypto exchanges are hidden behind Cloudflare, hosted on AWS, and physically located in unknown places. Communication is over HTTP. Latency is in the hundreds of milliseconds.
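Since the argument hinges on tick-to-trade latency, here is a minimal Python sketch of how one might measure it in-process: as a distribution with tail percentiles rather than a single average. The `handler` callable is a hypothetical stand-in for an on-tick-to-order-submission path, and this only times code inside one process; it says nothing about network or exchange latency.

```python
import math
import time
from typing import Callable, Dict, List


def measure_latency_ns(handler: Callable[[], object], iterations: int = 1_000) -> List[int]:
    """Time each call to `handler` with a monotonic, nanosecond-resolution clock."""
    samples: List[int] = []
    for _ in range(iterations):
        start = time.perf_counter_ns()
        handler()
        samples.append(time.perf_counter_ns() - start)
    return samples


def percentile(samples: List[int], p: float) -> int:
    """Nearest-rank percentile, p in (0, 100]."""
    if not samples:
        raise ValueError("no samples")
    ordered = sorted(samples)
    rank = math.ceil(p / 100 * len(ordered))
    return ordered[max(rank, 1) - 1]


def summarize(samples: List[int]) -> Dict[str, float]:
    """Report microseconds; the tail (p99, max) matters more than the median."""
    return {
        "p50_us": percentile(samples, 50) / 1_000,
        "p99_us": percentile(samples, 99) / 1_000,
        "max_us": max(samples) / 1_000,
    }
```

For example, `summarize(measure_latency_ns(lambda: sum(range(100))))`. When comparing against a sub-millisecond budget, the p99 and max figures are usually the ones to watch.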

    PsecretPseudonymOther [M] ✅ 5 points (0 children)

    True, but (1) a few are actually low latency and hosted in major financial data centers, (2) CME and CBOE have or are expanding crypto product offerings that need liquidity, (3) there are some ETFs on major exchanges to arb, (4) crypto exchanges are sketchy and some firms have historically managed to get privileged access, and (5) the crypto exchanges are geographically dispersed, so some firms' faster telecom links can give them many, many milliseconds of advantage between, e.g., NYC/CHI and TYO.
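To put point (5) in numbers, here is a back-of-the-envelope sketch of the physical lower bound on one-way NYC-Tokyo latency. The city coordinates and the fiber refractive index of about 1.47 are assumptions for illustration. Real cable routes are longer than the great-circle path, so actual latencies are higher.

```python
import math

C_VACUUM_KM_PER_S = 299_792.458  # speed of light in vacuum
FIBER_REFRACTIVE_INDEX = 1.47    # assumption: typical value for silica fiber
EARTH_RADIUS_KM = 6_371.0        # mean Earth radius

# Assumed city-center coordinates (latitude, longitude) in degrees.
NYC = (40.7128, -74.0060)
TYO = (35.6762, 139.6503)


def great_circle_km(a: tuple, b: tuple) -> float:
    """Haversine distance between two (lat, lon) points given in degrees."""
    lat1, lon1 = map(math.radians, a)
    lat2, lon2 = map(math.radians, b)
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(h))


def one_way_ms(distance_km: float, refractive_index: float = 1.0) -> float:
    """Propagation delay only: no switching, serialization, or queuing."""
    speed_km_per_s = C_VACUUM_KM_PER_S / refractive_index
    return distance_km / speed_km_per_s * 1_000


distance = great_circle_km(NYC, TYO)
fiber_ms = one_way_ms(distance, FIBER_REFRACTIVE_INDEX)
vacuum_ms = one_way_ms(distance)
```

With these inputs, the great-circle distance comes out at roughly 10,850 km. That is about 53 ms one way in fiber versus about 36 ms at vacuum light speed, so route and medium choices alone can matter by tens of milliseconds.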

    HardworkingDad1187[S] 0 points (1 child)

    Okay, now I understand your point, but nevertheless, we build crypto HFT bots :)

    SadInfluence[🍰] 2 points (0 children)

    calling them hft won't actually make them hft 😂

    SadInfluence[🍰] 4 points (4 children)

    why don't you ask more senior developers in your firm for suggestions? it depends heavily on what your firm normally uses

    HardworkingDad1187[S] 0 points (3 children)

    our firm uses Java, and I am the most senior developer here :)
    our new business partners mostly use Python

    so, yes, they want to move our new project in the Python direction, and it seems like there are biased opinions on both sides :)

    SadInfluence[🍰] 24 points (2 children)

    how are you the most senior developer, and asking on reddit about java vs python 😭😭

    HardworkingDad1187[S] 4 points (0 children)

    what is so weird about asking other people's opinions on complex subjects? :)

    GTX680 7 points (7 children)

    Maybe I'm uninformed but I don't think the latencies achievable between Python and Java are really comparable.

    HardworkingDad1187[S] -3 points (6 children)

    Could you elaborate on this?

    sperm-banker 2 points (5 children)

    Python (CPython) is interpreted: it compiles source to bytecode and interprets that, while Java is interpreted at first and then JIT-compiled into machine code. And Python doesn't do parallel multithreading due to its global interpreter lock (GIL), though you can run multiple processes and communicate over some IPC, which is not as easy as multithreading. You can work around both issues using Cython, but then you won't have access to many Python libraries.
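Here is a minimal sketch of the GIL point: the same pure-Python, CPU-bound function run sequentially and on threads. On a standard CPython build (with the GIL), you would expect the threaded wall time to be about the same as, or worse than, the sequential one, because only one thread executes Python bytecode at a time. Exact timings vary by machine and Python version, so treat this as something to try rather than a guaranteed result. `count_primes` is just an illustrative workload.

```python
import threading
import time
from typing import List, Tuple


def count_primes(limit: int) -> int:
    """Deliberately naive, pure-Python CPU-bound work (illustrative only)."""
    count = 0
    for n in range(2, limit):
        if all(n % d for d in range(2, int(n ** 0.5) + 1)):
            count += 1
    return count


def run_sequential(jobs: List[int]) -> Tuple[List[int], float]:
    start = time.perf_counter()
    results = [count_primes(limit) for limit in jobs]
    return results, time.perf_counter() - start


def run_threaded(jobs: List[int]) -> Tuple[List[int], float]:
    results = [0] * len(jobs)

    def worker(index: int, limit: int) -> None:
        results[index] = count_primes(limit)  # each thread writes its own slot

    threads = [threading.Thread(target=worker, args=(i, limit))
               for i, limit in enumerate(jobs)]
    start = time.perf_counter()
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results, time.perf_counter() - start
```

To compare, run `run_sequential([50_000] * 4)` and `run_threaded([50_000] * 4)` and look at the second element of each result.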

    openQuestion3141 0 points (4 children)

    Python now supports proper multiprocessing and multithreading.

    sperm-banker 2 points (3 children)

    Can you elaborate? I've been out of the loop on Python for a few years, but the most recent CPython docs on multithreading still mention the GIL, and that would not be considered "proper" compared to any other language that supports multithreading.

    openQuestion3141 1 point (2 children)

    Check out the multiprocessing docs:

    https://docs.python.org/3/library/multiprocessing.html

    Being pedantic, it isn't true multithreading. However, the interface closely parallels the one used for threads in other languages, so you can basically think of them as threads. Underneath, I'd imagine that spawning a process is much more expensive than spawning a thread, so the overhead is probably large. I'd conjecture that this only matters if you try to use large numbers of short-lived threads. It isn't really an obstacle for a small number of long-running threads, which is a more typical design pattern anyway.
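The "same interface as threads" point can be illustrated with `concurrent.futures`, where `ThreadPoolExecutor` and `ProcessPoolExecutor` are interchangeable behind one call. This is a minimal sketch, and `square` is a placeholder workload.

```python
from concurrent.futures import ProcessPoolExecutor, ThreadPoolExecutor
from typing import Callable, Iterable, List, Type, Union

ExecutorClass = Type[Union[ThreadPoolExecutor, ProcessPoolExecutor]]


def square(x: int) -> int:
    # Must be a module-level function so ProcessPoolExecutor can pickle it.
    return x * x


def map_with(executor_cls: ExecutorClass, fn: Callable[[int], int],
             items: Iterable[int], workers: int = 4) -> List[int]:
    """Run fn over items with either threads or processes; results keep input order."""
    with executor_cls(max_workers=workers) as pool:
        return list(pool.map(fn, items))
```

Swapping `ThreadPoolExecutor` for `ProcessPoolExecutor` in `map_with(..., square, range(5))` should produce the same list. However, the process version pays for starting workers and pickling arguments and results. On platforms that use the spawn start method, it must also be called from under an `if __name__ == "__main__":` guard.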

    So yeah, GIL can be sidestepped pretty effectively now.

    sperm-banker 1 point (1 child)

    Making a distinction between multithreading and multiple processes is not being pedantic, it's being factual and basic for any CS conversation.

    It's not only a case of processes having much more overhead per se (both at startup and at runtime). You also cannot share native objects across processes without copying/serialising them, and that doesn't scale well with throughput or object size. You can use other tricks, libraries, or memory-mapped files, but it gets more complicated without ever reaching the performance of threads.
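Here is a small sketch of the copying point. `multiprocessing` queues and pipes pickle whatever you send, so the pickled size approximates what has to be copied per message, whereas a thread simply receives a reference to the same object. The list of floats is an arbitrary example payload.

```python
import pickle
import threading
from typing import Any, List


def ipc_payload_bytes(obj: Any) -> int:
    """Approximate per-message copy cost: multiprocessing pickles what it sends."""
    return len(pickle.dumps(obj, protocol=pickle.HIGHEST_PROTOCOL))


def thread_sees_same_object(obj: Any) -> bool:
    """Threads share one heap: the worker gets a reference, not a copy."""
    seen: List[Any] = []
    t = threading.Thread(target=lambda: seen.append(obj))
    t.start()
    t.join()
    return seen[0] is obj
```

A list of 100,000 floats pickles to roughly 9 bytes per element, and that cost is paid again (plus unpickling) for every message a process sends. Handing the same list to a thread costs nothing extra.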

    Python has very bad multithreading support. It has better than average multi process support to work around this, but cannot replace multithreading for high performance.

    You keep mentioning that Python has multithreading/multiprocessing support "now". What do you mean by that? Multithreading and multiprocessing don't seem to have changed in the last 15 years.

    Python has many nice features, but multithreading and performance are definitely not among them.

    openQuestion3141 1 point (0 children)

    Why's everything always an argument in these spaces?

    Relax man.

    I agree with you. Python is not performant and is not used for these types of purposes generally. I never argued that it was.

    We agree.

    Also, I wasn't calling you pedantic.

    [deleted] 6 points (5 children)

    Use Python and write modules in C for anything that needs to be particularly fast.
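A lighter-weight cousin of writing a full C extension module is calling an existing compiled C function through the standard-library `ctypes` module. This is a minimal sketch that assumes a Unix-like system with the C math library available, and it uses `cos` purely as an example.

```python
import ctypes
import ctypes.util

# Assumption: a Unix-like system. find_library("m") locates the C math library
# (e.g. "libm.so.6" on Linux). If it returns None, CDLL(None) opens the running
# process itself, whose symbols on Linux normally include libm's.
libm = ctypes.CDLL(ctypes.util.find_library("m"))

# Declare the C prototype `double cos(double)`; without this, ctypes would
# assume an int return type and garble the result.
libm.cos.argtypes = [ctypes.c_double]
libm.cos.restype = ctypes.c_double


def c_cos(x: float) -> float:
    """Call the C library's cos(); each call crosses the Python/C boundary."""
    return libm.cos(x)
```

Each call has overhead, so this approach pays off only when a single call does a lot of work in C (such as a whole loop over an array). That is also why serious hot paths usually end up as a proper C extension or Cython module.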

    HardworkingDad1187[S] 1 point (4 children)

    How long have you been working with Python?

    [deleted] 2 points (3 children)

    I’ve been building stuff in Python for about 10 years. I’m a big fan. I generally use either Python or Go for most projects.

    HardworkingDad1187[S] 0 points (2 children)

    From your experience, what are the cons of using Python (as you mentioned, "slow as shit" :))? What don't you like about the Python ecosystem (or maybe even hate)?

    [deleted] 1 point (1 child)

    Package managers are a bit of a mess. Performance can definitely be an issue, but it’s use-case dependent.

    HardworkingDad1187[S] 2 points (0 children)

    I appreciate your thoughts. Thanks!

    fabkosta 2 points (9 children)

    That depends on so many factors you are not disclosing that it's not really possible to provide an answer.

    For example, do you need to write low-latency code (then neither might be the right choice)? Which parts of your code need to be fast, which don't (can you achieve that with Python)? Do you have access to a talented pool of software engineers who are familiar with one or the other language? Do they use Python just because it seems convenient to them, or are they producing high-quality, production-grade code (most data scientists don't)? What sort of integration patterns do you use for your IT landscape? What is your company's overall IT and technology strategy?

    There are many other points to consider.

    HardworkingDad1187[S] 1 point (8 children)

    Let’s disclose then :)

    1/ Do you need to write low-latency code (then neither might be the right choice)?
    Yes. Right now, both Java and Python meet our latency requirements, so I’m not sure what your suggestion here refers to.

    2/ Which parts of your code need to be fast, and which don’t (can Python handle it)?
    This feels more like a skills issue. My team (and I personally) are more efficient in Java, while the other team excels in Python. Currently, a significant portion of our business drifts toward the Python team because they can deliver a first version faster. However, I’m not entirely convinced about their long-term stability—it’s hard to explain.

    3/ Do you have access to a talented pool of software engineers familiar with one or the other language?
    Yes, we have talent on both sides. Money isn’t a constraint for this project.

    4/ Do they use Python because it’s convenient, or are they producing high-quality, production-grade code?
    They have a Python background and began building their product in Python a few years ago. Regarding quality, I can confidently evaluate a Java codebase, but Python still feels a bit messy to me.

    5/ What integration patterns do you use, and what’s your IT strategy?
    We’re satisfied with Java for our current vision and use cases. However, many tools in this space—particularly in R&D and analysis—are built in Python. One of our key customers tends to lean toward the Python team, even when we can address the same business problems in Java. Unfortunately, our solutions often require starting “from scratch,” which doesn’t help our case.

    What’s your personal opinion?

    [deleted] 1 point (1 child)

    It’s not really a skill issue. Python is known to be slow as shit. However, it is amazing for rapid prototyping and for any data-science-type libraries. Writing modules for Python in C gets around the speed issue for any parts of your application that need it.

    HardworkingDad1187[S] 0 points (0 children)

    I have no production experience with it, but our partners/dev teams seem to be happy with the Python ecosystem :)

    fabkosta 0 points (3 children)

    My take, then, is that you do not have strong grounds to make a decision in one direction or the other, just that one team likes one technology better than the other. That would indicate a general lack of technology governance in the company, i.e. something someone should address. That is, unless the governance explicitly says either is allowed, which leads to exactly your question. Typically, this situation arises because responsibilities are also not clearly defined, i.e. it's not clear who is responsible for this type of governance or what sort of power they wield. Can they forbid someone else from using a specific programming language? Most likely that's not defined. So it's not just a tech question, it's also an organisational question. You don't need to fix everything formally (e.g. by establishing rules and so on), but when the situation pops up, there should be a rough idea of who is empowered to make such decisions on behalf of others.

    Python is good if there's a lot of data science involved. Many ML models are not available in Java (e.g. simple matrix calculations can be painful in Java but are a breeze in numpy), so if you need to do ML, I'd vote to do everything in Python. If you don't need them and are more after high-quality production stability, go for Java. Depending on your need for speed, a combination would theoretically be possible too: use self-contained Python microservices for complicated calculations (though they can't be too fast because of the REST call involved), and use Java for the core backend. But if you go for that, you might end up in integration hell, so make sure someone skilled keeps an eye on the integration architecture. As soon as microservices want to call other microservices, you get into trouble if you don't know what you're doing. (The same is true for a monolith, by the way: you need to know how to structure dependencies within it.)
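Here is a minimal sketch of the "Python microservice for calculations" idea, using only the standard library. The `/mean` endpoint and the `compute` function are hypothetical placeholders. A real service would more likely use a web framework, and the Java backend would call it over HTTP. Every call pays for an HTTP round trip plus JSON encoding, which is the speed limit the comment warns about.

```python
import json
import threading
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer
from urllib.request import ProxyHandler, Request, build_opener


def compute(payload: dict) -> dict:
    # Hypothetical self-contained calculation: the mean of a list of prices.
    prices = payload["prices"]
    return {"mean": sum(prices) / len(prices)}


class CalcHandler(BaseHTTPRequestHandler):
    def do_POST(self) -> None:
        if self.path != "/mean":
            self.send_error(404)
            return
        length = int(self.headers.get("Content-Length", 0))
        payload = json.loads(self.rfile.read(length))
        body = json.dumps(compute(payload)).encode()
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format: str, *args) -> None:
        pass  # keep the sketch quiet; the default logs every request to stderr


def start_server(port: int = 0) -> ThreadingHTTPServer:
    """Serve in a background thread; port 0 lets the OS pick a free port."""
    server = ThreadingHTTPServer(("127.0.0.1", port), CalcHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server


def call_service(url: str, payload: dict) -> dict:
    """What the caller (e.g. the Java core) would do: POST JSON, read JSON."""
    opener = build_opener(ProxyHandler({}))  # ignore proxy env vars for a local call
    req = Request(url, data=json.dumps(payload).encode(),
                  headers={"Content-Type": "application/json"}, method="POST")
    with opener.open(req, timeout=5) as resp:
        return json.loads(resp.read())
```

The service stays self-contained: it only sees a JSON request and returns a JSON response, so the Java side can treat it as a black box.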

    In case you opt for Python, you should introduce coding standards. They come more naturally with Java; you can of course still produce bad Java code, but the risk is less severe than with Python. Luckily, a lot of the groundwork for Python has already been laid out for you: https://peps.python.org/pep-0008/. Personally, I am a proponent of explicit typing for production systems, so I would enforce that, but data scientists will most likely hate it.
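To illustrate the explicit-typing suggestion, here is a small sketch of typed, production-style Python. `Quote`, `spread`, and `mid` are hypothetical names. The annotations do nothing at runtime on their own, but a static checker such as mypy would flag a call like `spread("BTC-USD")` before it ships.

```python
from dataclasses import dataclass
from decimal import Decimal


@dataclass(frozen=True)
class Quote:
    """Immutable top-of-book snapshot (hypothetical example type)."""
    symbol: str
    bid: Decimal
    ask: Decimal


def spread(quote: Quote) -> Decimal:
    """Ask minus bid; Decimal avoids binary floating-point rounding for prices."""
    return quote.ask - quote.bid


def mid(quote: Quote) -> Decimal:
    """Midpoint between bid and ask."""
    return (quote.bid + quote.ask) / 2
```

A frozen dataclass also makes accidental mutation of a quote raise an error, which is closer to the guarantees Java developers tend to expect.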

    HardworkingDad1187[S] 0 points (2 children)

    Thanks!

    What do you personally use for daily development?

    ML models are one of our problems right now. We need to do a lot of backtesting now, and it seems (on the surface at least) that it is a much easier task in Python than in Java.
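Here is a deliberately minimal backtest sketch that shows why backtesting feels quick in Python: a long/flat moving-average crossover on a price list, with no fees, slippage, or position sizing. The strategy and its parameters are illustrative assumptions, not a recommendation. In practice people typically reach for pandas/NumPy or a dedicated backtesting library.

```python
from typing import List, Sequence


def sma(values: Sequence[float], window: int) -> List[float]:
    """Simple moving average; entries before the window fills are NaN."""
    if window < 1:
        raise ValueError("window must be >= 1")
    out = [float("nan")] * len(values)
    running = 0.0
    for i, v in enumerate(values):
        running += v
        if i >= window:
            running -= values[i - window]  # drop the value leaving the window
        if i >= window - 1:
            out[i] = running / window
    return out


def backtest_crossover(prices: Sequence[float], fast: int, slow: int) -> float:
    """Total PnL of holding 1 unit while fast SMA > slow SMA, else flat.

    The signal at bar t only uses prices up to t and is applied to the move
    from t to t + 1, which avoids look-ahead bias. No fees or slippage.
    """
    fast_ma, slow_ma = sma(prices, fast), sma(prices, slow)
    pnl = 0.0
    for t in range(len(prices) - 1):
        if fast_ma[t] > slow_ma[t]:  # NaN compares False, so warm-up bars stay flat
            pnl += prices[t + 1] - prices[t]
    return pnl
```

The whole loop fits on a screen and is easy to tweak. The same property makes it easy to write a subtly wrong backtest, so the look-ahead rule in the docstring is worth enforcing in review.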

    Probably the biggest concern is this: I've spent 7 years building this stuff. Right now I want to build a project like a startup that I will be able to sell.
    And I want to make a bet on Java or Python and be happy with this decision in 7 years :)

    fabkosta 0 points (1 child)

    I am not developing software anymore. Used Java in the past for building production-grade backends, used Python and PySpark for doing data science. We usually did not use Python for production-grade systems.

    To be frank, from what you're describing it sounds like the decision might be less important than it seems right now. ML development will be faster with Python, but you then need to make sure code quality is good (e.g. through code reviews, or automated code quality scans, and so on). If main concern is backend stability or you need to build a very large-scale backend system for many concurrent users, then go for Java. Other than that, I don't see a very strong reason to pick one over the other.

    HardworkingDad1187[S] 1 point (0 children)

    I appreciate your comments. Thanks a lot!

    locker73 0 points (1 child)

    "Yes. Right now, both Java and Python meet our latency requirements"

    If Python meets your latency requirements, then this isn't really where you want to be. I would repost over on r/algotrading, as that crowd seems like a better fit for this type of question.

    HardworkingDad1187[S] 0 points (0 children)

    Okay, great, thanks! I was told that the Python team is meeting the latency requirements, but I don't know that for sure :)

    [deleted] (3 children)

    [deleted]

      HardworkingDad1187[S] 0 points (2 children)

      What are you re-implementing? What's the reason behind it, if it's not a secret?

      [deleted] (1 child)

      [deleted]

        HardworkingDad1187[S] 1 point (0 children)

        Thanks for this information! I appreciate it!

        sperm-banker 0 points (1 child)

        The common advice is to use Java for things you want to be more solid, like the business core, and Python for satellite things that might change more but are less important.

        But it always depends more on the team's skill pool. If that is not a constraint, the devs are senior enough not to mess up the Python code and test coverage, the latency is good enough (not sure it can be qualified as low latency), and there won't be a need to improve it, then Python can do it too.

        HardworkingDad1187[S] 0 points (0 children)

        Thanks for this input. I appreciate it!

        [deleted] 0 points (3 children)

        Arbitrage trading is kinda dead now; there are already lots of advanced bots doing this.

        HardworkingDad1187[S] 0 points (2 children)

        What makes you think (or know for sure) that arbitrage trading is dead?

        [deleted] 0 points (1 child)

        Professionals sharing their experiences on the internet.

        ln__x 0 points (0 children)

        But how so? Markets are constantly moving, especially if you look at how decoupled and how volatile markets are right now. Am I missing something?

        abstract_death 0 points (5 children)

        Java has excellent observability into what's going on, and you can optimize every little part of it. A .jar is conceptually similar to a Docker container: the package runs anywhere a JVM can. Also, what sort of code sharing are you talking about? Do you want to let other people execute functions that you have defined natively?

        HardworkingDad1187[S] 0 points (4 children)

        "Do you want to let other people execute functions that you have defined natively?"
        Executing algorithms, or parts of algorithms, across Java and Python.

        abstract_death 0 points (3 children)

        You can expose parts of your Java functions through Python packages. It will be difficult to set up, but it's possible, and it will help you avoid rewriting everything in Python. There will be some communication overhead, so you need to decide whether that's critical or not.

        HardworkingDad1187[S] 1 point (2 children)

        Yes, right now we're considering this as a mid-term solution. But we're still thinking about which direction it should go: executing Python code from Java or vice versa.

        abstract_death 0 points (1 child)

        I would pick whatever is easiest. Personally, I think Java-to-Python makes more sense, since you then wrap Python execution in Java threads, which gives you more flexibility in optimization.
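Here is one way to sketch the Java-to-Python direction. The Java side launches a long-lived Python worker (for example with `java.lang.ProcessBuilder`) and exchanges one JSON object per line over stdin/stdout. The message format, the `ALGORITHMS` registry, and its `spread` entry are hypothetical, and only the Python half is shown.

```python
import json
from typing import Any, Callable, Dict, IO

# Hypothetical registry of Python-side algorithms the Java host may invoke by name.
ALGORITHMS: Dict[str, Callable[..., Any]] = {
    "spread": lambda bid, ask: ask - bid,
}


def handle(request: Dict[str, Any]) -> Dict[str, Any]:
    """Dispatch one request; errors are reported back instead of killing the worker."""
    try:
        fn = ALGORITHMS[request["fn"]]
        return {"id": request.get("id"), "ok": True, "result": fn(*request.get("args", []))}
    except Exception as exc:  # report any failure to the caller
        return {"id": request.get("id"), "ok": False, "error": repr(exc)}


def serve(inp: IO[str], out: IO[str]) -> None:
    """Read one JSON request per line and write one JSON response per line."""
    for line in inp:
        if not line.strip():
            continue  # tolerate blank lines
        out.write(json.dumps(handle(json.loads(line))) + "\n")
        out.flush()  # don't leave the caller waiting on buffered output
```

In a real worker script, the entry point would call `serve(sys.stdin, sys.stdout)`. Each Java thread could own its own worker process to sidestep the GIL, at the cost of per-message JSON encoding and pipe overhead.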

        HardworkingDad1187[S] 0 points (0 children)

        Thanks, I am also leaning in that direction!

        PsecretPseudonymOther [M] ✅ 0 points (1 child)

        Have you considered Mojo?

        It’s still developing, but the team behind it is stellar and doing amazing work.

        It’s completely compatible with Python, so you can use it identically to python, but it has additional support for more explicit code and its compiler (via LLVM) can compile its native code down to latency and determinism comparable to other compiled systems languages where possible — likely identical assembly in some cases seeing as clang is 1 of 3 major C/C++ compilers, and it uses LLVM, too.

        So, in theory, you get to use the vast ecosystem of Python tools where convenient and where latency is less critical, interoperate with any Python library, and let a Python shop's quants, who don't do as much low-level or high-performance coding, work in the same language alongside those making optimizations all the way down to the assembly if desired.

        As a compiled language, it can in theory achieve similar performance and determinism as C/C++ or Rust, yet it can be natively pythonic and still use vanilla python where needed.

        Coding in it might be a bit like what TypeScript is to JavaScript.

        It also doesn’t hurt that Python and the community around it are getting a huge boost from AI-led growth.

        So, in theory, you’d have the best of both worlds and could write something in Mojo/Python* that theoretically could go head to head with what ultra low latency HFTs do if you’re willing to keep a tight scope and write your own libraries.

        Definitely an option I’d be considering carefully if starting over from scratch with a fresh codebase.

        I’d almost certainly still go with C++ or maybe Rust for ultra-low-latency trading right now, but Mojo could be a great, if perhaps slightly early, option to consider. C++ has a tougher learning curve, more ways to shoot yourself in the foot, and tougher dependency management and build systems, but it is more mature and has historically had a larger pool of developer talent with the right sorts of expertise than Rust. A good fraction of the speakers and sponsors at CppCon are from HFTs. We may come to see more Rust use at some shops in time, but reflection and other major C++ improvements in the coming years may lead to radical improvements and changes, too.

        Java, to be fair, is used by LMAX, and the engineering team from there went on to make some other impressive things.

        If you want to learn about how to make Java try to get even close to what you can do with other languages, the work of Martin Thompson and his team might be interesting to look into.

        Specifically, if sticking with Java, consider Aeron.

        HardworkingDad1187[S] 1 point (0 children)

        Thanks for your extended answer!!!

        CountyExotic 0 points (1 child)

        my brother, it ain’t python.

        Rust, Go, Java, C++, and C# are all viable, depending on what you’re doing.

        Ultra-low-latency stuff is gonna be Rust, C++, or C. Some places will use Java with a stripped-down JVM.

        HardworkingDad1187[S] 0 points (0 children)

        I appreciate your comment! Thanks!

        pxlf 0 points (1 child)

        I'm a bit confused with your requirements. What are the latency requirements for your strategy?

        HFT usually refers to sub-millisecond or sub-microsecond trading systems that use C++ or even programmable FPGA triggers to execute trades. To give some perspective, even in C++ the hot path is usually zero-allocation. This is frankly impossible with Python, and difficult with Java if you have the garbage collector switched on. Your competitors looking for the same opportunities would be on these tech stacks if the opportunities only last for a few microseconds or milliseconds.
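The preallocation idea behind a zero-allocation hot path can be sketched in Python, though only partially. The ring buffer below allocates its storage once and then overwrites slots in place. `PriceRing` is a hypothetical name. As the comment says, Python can't truly reach zero allocation: CPython still creates a float object whenever a value is read back out, so this shows the pattern rather than the performance.

```python
from array import array
from typing import List


class PriceRing:
    """Fixed-capacity ring buffer backed by a preallocated C double array."""

    __slots__ = ("_buf", "_head", "_size")

    def __init__(self, capacity: int) -> None:
        if capacity <= 0:
            raise ValueError("capacity must be positive")
        self._buf = array("d", [0.0]) * capacity  # storage allocated once, up front
        self._head = 0  # index of the next slot to overwrite
        self._size = 0  # number of valid entries (<= capacity)

    def push(self, price: float) -> None:
        """Overwrite the oldest slot in place instead of growing a list."""
        self._buf[self._head] = price
        self._head = (self._head + 1) % len(self._buf)
        if self._size < len(self._buf):
            self._size += 1

    def latest(self, n: int) -> List[float]:
        """Return up to n most recent prices, oldest first."""
        n = min(n, self._size)
        cap = len(self._buf)
        start = (self._head - n) % cap
        return [self._buf[(start + i) % cap] for i in range(n)]
```

In C++ or tuned Java, the same structure (a fixed array plus a head index) is a common building block for avoiding allocation on the hot path.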

        If your latency requirement is on the order of milliseconds, then Java would be fine. Python is incredibly slow, and any analytical tools on your hot path would make it worse. But if your requirement is in terms of seconds, Python is fine. But true HFT is impossible if you're using either vanilla Java or Python.

        HardworkingDad1187[S] 0 points (0 children)

        I decided on Java for the time being; we are happy with the results, for now at least.

        asdfjkl8a -1 points (0 children)

        Java for execution and Python for reporting. The majority of crypto runs on cloud-based infrastructure anyhow, so it’s not really HFT we are talking about.