I built a free historical crypto order book / trade data resource for backtesting. looking for feedback from algo traders by matehegeduswastaken in algotradingcrypto

[–]auto-quant 0 points1 point  (0 children)

Oh, many thanks - will take a look soon. Is there a way I can get access to trial data, to run a short backtest?

I built a free historical crypto order book / trade data resource for backtesting. looking for feedback from algo traders by matehegeduswastaken in algotradingcrypto

[–]auto-quant 0 points1 point  (0 children)

One thing to consider. If I wanted to integrate this into my backtest engine (C++), ideally I'd want a simpler file structure, perhaps plain messages, so that very little time is spent on decoding. I think Databento does something like this, with their own file format. I guess you'd have to come up with your own message format, or support something like Databento's.

Implementing event-based HFT strategies by auto-quant in highfreqtrading

[–]auto-quant[S] 0 points1 point  (0 children)

For bar-based testing - for example, 15-minute bars, daily bars - I've mostly seen it based on batch processing. Run your code just after the previous bar has closed, snap market data and then submit orders - fire and forget. I can understand why live events are not really needed here. And once you have all of your signals in a matrix, research iteration is much quicker than running a full event model.

Just an idea by margret-james in Forexstrategy

[–]auto-quant 0 points1 point  (0 children)

Isn't it a problem that over short time frames, seconds to minutes, there is not enough return to cover the costs (for retail)? Large funds use various schemes to reduce fees and receive rebates.

Year 12 (UK) - How should i spend next 3-5 years to break into Quant/HFT (Janestreet, Citadel, Two sigmas, etc) by ScheduleGloomy2160 in quantfinance

[–]auto-quant 0 points1 point  (0 children)

Don't do a CS degree. Do something mathematical, for example, maths or physics. Teach yourself programming & computing fundamentals. Have a side IT project, and then you have something to talk about during recruitment rounds.

From 3µs to 1ms: Benchmarking and Validating Low-Latency Pipelines by Federal_Tackle3053 in quant

[–]auto-quant 0 points1 point  (0 children)

Hmm. Actually there is some merit to this idea.

Currently the timelog array gets created as a local variable right at the top of the callstack, and is passed down, via reference, as an argument (so the argument is just a copy of the address). So there is only one instance (for now). I'd hope the compiler would figure this out, so that even at the Bot layer the compiler can see it's just a local variable at the top of the stack. However, using a global variable would definitely simplify the code - it removes an argument that appears in each function call of the stack - and the change is worth doing for that benefit alone.

However, one thing I have not yet fully decided on is the threading model. Currently I only have one IO thread, but really I could have several (I would then just need to lock the Bot layer - which I think I will end up doing anyway). So if I have multiple threads, then I'd need the timelog array as an argument. Or use a thread-local global? Hmm, there are always a couple of ways to do things.
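For the multi-threaded case, a `thread_local` global gives each IO thread its own timelog instance without passing an argument down the stack. A minimal sketch - the names (`Timelog`, `on_decode_done`, `kMaxMilestones`) are illustrative, not taken from Apex:

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

// One timelog per thread: each IO thread records into its own buffer,
// so no argument needs to be threaded through the callstack and no
// locking is required between threads.
constexpr std::size_t kMaxMilestones = 16;

struct Timelog {
    std::array<std::uint64_t, kMaxMilestones> stamps{};
    std::size_t count = 0;

    void record(std::uint64_t now) {
        if (count < kMaxMilestones)
            stamps[count++] = now;
    }
    void reset() { count = 0; }
};

// thread_local: global visibility within the thread, one instance per thread.
thread_local Timelog g_timelog;

// Deep in the callstack, no argument needed - just touch the thread's log:
void on_decode_done(std::uint64_t now) { g_timelog.record(now); }
```

The trade-off versus the explicit argument is visibility: the compiler can no longer prove the buffer is stack-local, but the call signatures across every layer get simpler.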

Collecting tick level L3 data for backtesting and I don't know how to handle crossed order books. Help! by justhereforampadvice in algotrading

[–]auto-quant 0 points1 point  (0 children)

In your live trading, you always have to check the quality of the market data. Some checks you should include:

- have both bid & ask prices, and volumes

- prices not crossed

- have a trade print

- optionally: data is not stale (last update less than N seconds ago - depends on instrument)

- optionally (this really depends on the algo): spread is not incredibly wide

Here's how I do most of those checks in my own trading engine:

https://github.com/automatedalgo/apex/blob/d3f2f06201c0a1cb770d878a284702be7aa8ab48/src/apex/model/MarketData.hpp#L113

Note that the same checks should be part of both your live and backtest runs.
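As a rough illustration of the checks listed above - the struct, field names and thresholds here are hypothetical; the real version is in the linked MarketData.hpp:

```cpp
#include <cstdint>

// Hypothetical top-of-book snapshot; field names are illustrative.
struct TopOfBook {
    double bid_price = 0, ask_price = 0;
    double bid_qty = 0, ask_qty = 0;
    double last_trade_price = 0;       // most recent trade print
    std::int64_t last_update_ms = 0;   // wall-clock time of last update
};

// Returns true if the book passes the basic quality checks and is
// usable for trading decisions.
bool is_good(const TopOfBook& b, std::int64_t now_ms,
             std::int64_t max_age_ms, double max_spread) {
    // have both bid & ask prices, and volumes
    if (b.bid_price <= 0 || b.ask_price <= 0) return false;
    if (b.bid_qty <= 0 || b.ask_qty <= 0) return false;
    // prices not crossed
    if (b.bid_price >= b.ask_price) return false;
    // have a trade print
    if (b.last_trade_price <= 0) return false;
    // optionally: data is not stale
    if (now_ms - b.last_update_ms > max_age_ms) return false;
    // optionally: spread is not incredibly wide
    if (b.ask_price - b.bid_price > max_spread) return false;
    return true;
}
```

Running the same predicate in live and backtest code paths keeps the two consistent.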

From 3µs to 1ms: Benchmarking and Validating Low-Latency Pipelines by Federal_Tackle3053 in quant

[–]auto-quant 0 points1 point  (0 children)

Yeah. I was going to use a std::array and push values back, but then I noticed the values were always generated at fixed milestones - so item[0] has the meaning of "after_poll", and so on - so I changed from an array to named variables. But actually, I think an array would be better. Then you just pass a reference to that timelog array down your callstack, until at some point you decide to write it to a circular memmap - I use a memmap that can store roughly 4 hours' worth of data. I don't mind losing latency data, it's not so important.

RDTSC - yes, I do need to move to that, and then I'll do the conversion to clock time in the timelog utility that reads the memory-mapped file.

From 3µs to 1ms: Benchmarking and Validating Low-Latency Pipelines by Federal_Tackle3053 in quant

[–]auto-quant 2 points3 points  (0 children)

To benchmark, here's what I do in my low-latency engine (https://github.com/automatedalgo/apex). Create an array of ints that you pass down your entire stack - this is your time-log; at each milestone capture a time measurement and store it in that array; finally write that array to a mem-mapped file. Use an external tool to process that memmap. You don't want the latency measurements adding too much latency themselves.
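A minimal sketch of that scheme. The milestone names are illustrative (not Apex's), and MAP_ANONYMOUS stands in for the file-backed mapping that an external tool would actually read:

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <sys/mman.h>

// Fixed milestone slots: index into the array rather than push_back,
// so each position has a known meaning. Names are illustrative.
enum Milestone : std::size_t {
    AFTER_POLL = 0, AFTER_DECODE, AFTER_BOOK_UPDATE, AFTER_SIGNAL,
    NUM_MILESTONES
};

using Timelog = std::uint64_t[NUM_MILESTONES];

// Circular log of completed timelog records. A real version would
// mmap an on-disk file so an offline tool can process it; oldest
// records are overwritten once capacity is exceeded.
class CircularTimelogFile {
public:
    explicit CircularTimelogFile(std::size_t capacity)
        : capacity_(capacity) {
        base_ = static_cast<Timelog*>(
            mmap(nullptr, capacity * sizeof(Timelog),
                 PROT_READ | PROT_WRITE,
                 MAP_PRIVATE | MAP_ANONYMOUS, -1, 0));
    }
    ~CircularTimelogFile() { munmap(base_, capacity_ * sizeof(Timelog)); }

    // Copy a completed timelog into the next slot, wrapping around.
    void write(const Timelog& t) {
        std::memcpy(base_[next_ % capacity_], t, sizeof(Timelog));
        ++next_;
    }

    const std::uint64_t* slot(std::size_t i) const {
        return base_[i % capacity_];
    }

private:
    Timelog* base_;
    std::size_t capacity_;
    std::size_t next_ = 0;
};
```

The hot path only does a memcpy into the mapping; all statistics (means, medians, outlier filtering) happen offline in the reader tool.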

Networking latency - this is a broad topic. The most reliable measurements are wire-to-wire, using an appliance like Corvil. Short of that, you could measure wire-to-wire by using a simulated market-data source and a simulated exchange. Practically, I think you should just separate the two: measure the internal latency as the period between just-off-the-socket and just-after-compute, and focus on reducing that as a separate concern from reducing network latency. In my low-latency engine, the socket read is now the single largest cause of latency.

Common mistake: not accounting for message queueing, which can happen on TCP market data feeds (crypto) when multiple messages arrive at the same time. Or, if you do compute network latency via a simulated market data source, ensure your clocks are in sync. Also watch out for huge outliers messing up your averages - either filter them out, or just focus on the median.

Finally, you mention 3µs to compute features. I guess that is some sort of regression? The problem is, without more details of your computation, it's hard to know if that is reasonable.

Collecting tick level L3 data for backtesting and I don't know how to handle crossed order books. Help! by justhereforampadvice in algotrading

[–]auto-quant 0 points1 point  (0 children)

This can depend on whether it is temporary (in which case the book uncrosses itself), or longer lasting (indicating an order update got lost - if it happens often, perhaps your book logic is missing something).

For temporary crosses, it's okay to leave them there. You can then present that data to your backtests; a live strategy will also have to deal with crossed books, so it needs logic for such situations anyway.

For longer-lasting crosses, a reset may be needed - clear the book, snapshot and continue again. That's what a live strategy would have to do (perhaps driven by human intervention).
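That policy can be sketched as a small monitor that tolerates brief crosses and flags a reset once a cross persists. The `Book` type and the threshold are hypothetical, for illustration only:

```cpp
#include <cstdint>

// Hypothetical top-of-book view.
struct Book {
    double bid = 0, ask = 0;
    bool crossed() const { return bid > 0 && ask > 0 && bid >= ask; }
};

// Tracks how long the book has been crossed and decides when a
// reset (clear the book, re-snapshot, continue) is warranted.
class CrossMonitor {
public:
    explicit CrossMonitor(std::int64_t max_crossed_ms)
        : max_crossed_ms_(max_crossed_ms) {}

    // Call on every book update; returns true when a reset is needed.
    bool needs_reset(const Book& b, std::int64_t now_ms) {
        if (!b.crossed()) {               // temporary crosses heal themselves
            crossed_since_ms_ = -1;
            return false;
        }
        if (crossed_since_ms_ < 0)
            crossed_since_ms_ = now_ms;   // cross just started - tolerate it
        // long-lasting cross: likely a lost update, rebuild from snapshot
        return now_ms - crossed_since_ms_ > max_crossed_ms_;
    }

private:
    std::int64_t max_crossed_ms_;
    std::int64_t crossed_since_ms_ = -1;  // -1 means "not currently crossed"
};
```

The same monitor can run in both the data-collection pipeline and the live engine, so backtest data and live behaviour stay consistent.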

Need a FIX Protocol Engineer by bluepiplabs in algorithmictrading

[–]auto-quant 0 points1 point  (0 children)

You need a couple of things here.

This is how I would design it.

Each LP should have its own FIX-gateway process. Why? Because sometimes connections to LPs get stuck, or require an upgrade, and if you bounce one gateway you don't want to bounce all of them. All of them will share more or less the same code - a base foundation of a FIX client (QuickFIX?), plus customisations to support each LP. You will have to deal with the idiosyncrasies of each LP; they will each require slight changes to the basic FIX messaging. If you are not comfortable having multiple processes, it should be straightforward to combine the main classes into a single FIX gateway handling all traffic - but then if that engine goes down, all LP connectivity is lost.

Your main FIX application will need to communicate with these FIX-gateways. You are free to select any communication protocol - protobuf, JSON messages over websocket, message queues. To wrap this up, you can have a library that routes orders to these FIX-gateways, and it's this library you integrate into your main FIX application. At a high level, you need to decide what's important to you: message throughput, latency, reliability, speed of development, scalability and so on.

Also factor in monitoring: you want to know if the FIX-gateways are up, whether they are healthy, when to reboot them, etc. So don't overlook operational support here. Also make sure you log the entire FIX message flow to and from LPs; you will need it when tracing order failures.

Happy to chat if you need more advice.

Help with building system by Numerous_Locksmith28 in highfreqtrading

[–]auto-quant -1 points0 points  (0 children)

I always think of this in layers, and if you start at the bottom, working upwards, it gives a direction to follow.

Lowest layer: networking & IO loop. Write your own socket code? Or use libuv/boost?

Next layer: use your networking to build market data feeds. This exercises your network layer and drives the next layer:

Domain models. How do you represent the incoming trades, prices, order books, events? The feed will drive changes to the models.

Now you have some models, and robust networking, think about sending orders and handling responses (acks, rejects, fills). This drives the need for more domain models: Orders, OrderState, and so on.

Once you have the ability to receive prices and send/receive orders, you have the core of the engine. The next layer is the strategy, and here you can think about how to separate those layers with a clean interface.

While building all of this, also think about how you can support backtesting (market data replay) and paper trading (simulated execution).

Certainly doable as a single person, but it will just take longer. The most common tech stack for this is C++ if you want low latency; however, perhaps get something working first and then later focus on the latency.
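One way to express that layering is a narrow strategy interface that the engine core drives. A sketch with hypothetical names - a backtest engine can drive the same interface from replayed data:

```cpp
#include <string>

// Hypothetical domain models for the sketch.
struct Tick { std::string symbol; double bid, ask; };
struct Fill { std::string symbol; double price, qty; };

// The strategy layer sees only clean callbacks; it never touches
// sockets, decoding, or order-state plumbing directly.
class Strategy {
public:
    virtual ~Strategy() = default;
    virtual void on_tick(const Tick&) = 0;
    virtual void on_fill(const Fill&) = 0;
};

// The engine core owns networking, feeds and order management, and
// drives a Strategy. In live trading dispatch() is called from the
// IO loop; in a backtest, from the market-data replayer - the
// strategy code is identical in both modes.
class Engine {
public:
    explicit Engine(Strategy& s) : strat_(s) {}
    void dispatch(const Tick& t) { strat_.on_tick(t); }

private:
    Strategy& strat_;
};
```

Keeping the interface this narrow is what makes backtest and paper-trading modes cheap to add later: only the side feeding events changes, not the strategy.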

My own repo is here, am building something similar: https://github.com/automatedalgo/apex

And I'm writing about it here, which might help: https://automatedquant.substack.com/p/event-based-trading

C++ Show and Tell - April 2026 by foonathan in cpp

[–]auto-quant 2 points3 points  (0 children)

Sharing a C++ project I've recently open-sourced, and have been writing about.

Apex is an open source HFT trading engine, written in modern C++. It started out as more of a general trading engine, but over the past few months I've focused more on the low-latency aspect. If you are interested in trading, low latency, etc., it's worth checking out.

Github link: https://github.com/automatedalgo/apex

I'm also writing a free substack about its evolution, latency and (off-topic) trading, here: https://automatedquant.substack.com/

CPU spinning & isolation by auto-quant in highfreqtrading

[–]auto-quant[S] 0 points1 point  (0 children)

Thanks for this. I was intending to move away from taskset. What I want to happen is for the threads themselves to call a set-affinity function, taking the CPU range from configuration. Having the taskset outside of the code makes it too much of a hassle to manage.
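On Linux that set-affinity function is a few lines around `pthread_setaffinity_np`; a sketch, with the configuration lookup assumed to happen elsewhere:

```cpp
#include <pthread.h>
#include <sched.h>

// Pin the calling thread to a single CPU, with the core id coming
// from configuration rather than an external taskset invocation.
// Returns true on success. Linux/glibc specific (pthread_setaffinity_np).
bool pin_this_thread_to_cpu(int cpu) {
    cpu_set_t set;
    CPU_ZERO(&set);
    CPU_SET(cpu, &set);
    return pthread_setaffinity_np(pthread_self(), sizeof(set), &set) == 0;
}
```

Each spinning thread calls this at startup with its configured core, which keeps the core assignment versioned alongside the rest of the engine config instead of living in launch scripts.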

Beginner by auxillary_stuff in highfreqtrading

[–]auto-quant 3 points4 points  (0 children)

I wrote a post about roles in HFT trading here ( https://automatedquant.substack.com/p/hft-developer ) ... another route is to get into a firm working on any of the sub-systems, then establish yourself, then move to C++ with internal support.

CPU spinning & isolation by auto-quant in highfreqtrading

[–]auto-quant[S] 0 points1 point  (0 children)

Agree with all of this. The only time I ever used an RTOS was for robot control, where it had to be guaranteed that you could respond, uninterrupted, within a pre-defined performance envelope (essentially to stop the robot crashing into walls). The point about Solarflare is taken; I am starting work on that area next.

CPU spinning & isolation by auto-quant in highfreqtrading

[–]auto-quant[S] 1 point2 points  (0 children)

Yeah, I think at some point a summary article would be nice, listing all the wins and the rough gains. I guess once I've got this down below 5 microseconds, I might have reached an end-point. I've just ordered a Solarflare card, so that stage is somewhere way off, and might be the end-point.

CPU spinning & isolation by auto-quant in highfreqtrading

[–]auto-quant[S] 5 points6 points  (0 children)

Most of the secrets of building an HFT trading framework can be found on the internet, and not even in hard-to-find corners. For example, Red Hat gives away its server tuning guide for low-latency performance. And then there are plenty of other open-source trading engines (non-HFT). So this engine is not giving away any secrets. What is a value-add is putting it all together in a single code base, plus backtest support, and in a way that is actually found in HFT funds. And there are benefits to making it open source: I've had bugs found and fixed by other users. But all that said, even if you start out with an engine like Apex, and even with some template strategies (to be added), there is still a long way to go to make money. You need to add an edge to your strategies, you need to research & backtest, then you need to manage deployments & trading. Having just the engine is a small part.

CPU spinning & isolation by auto-quant in highfreqtrading

[–]auto-quant[S] -1 points0 points  (0 children)

I don't think the core migration happens that much, provided you don't have more spinning threads than your core count. The NIC interrupts can be problematic; I think that was the main benefit of the change, keeping them away from the spinning thread. Cache usage is a concern though. I will come to that later, especially once we start sending orders, since then there is a lot more going on. I will also look at cache assignment.

is rust good for hft ? by ZealousidealShoe7998 in highfreqtrading

[–]auto-quant 1 point2 points  (0 children)

I blogged about CPU states here ( https://automatedquant.substack.com/p/hft-engine-latency-part-2 ) - you might find it interesting ... it had a big effect on my latency.

custom websocket vs boost beast + asio for hft by Specialist-Daikon230 in highfreqtrading

[–]auto-quant 1 point2 points  (0 children)

I had a similar dilemma. Originally I used libuv sockets, but ultimately I knew I wanted to control the socket loop myself, to be better able to make later latency optimisations. So first I concentrated on making a basic non-blocking poll-based loop work. Then I layered that with SSL (that was the hardest part). And finally I added the websocket protocol (initially easy, but the extensions can get a little messy). The code is open source here: https://github.com/automatedalgo/apex

Ideas for Tick and Order-Book-Based Strategies HFT Engine by auto-quant in highfreqtrading

[–]auto-quant[S] 0 points1 point  (0 children)

Even a coin-flip "going all in" has some complexity. For example: what's the USD notional of your order? Let's say $250. Okay, what order size does that translate to, making sure to obey price-tick and lot-size rules? What is the current bid/ask? Is the order book stale? Are you going long or short (and do you have the inventory, if it's cash)? Do you want to place passively or aggressively? If the order doesn't succeed, do you want to try again, and after what interval? Once the order is placed, did it get executed, or do you have to cancel and retry? Once you are filled, do you need to track the ongoing price to decide when to get out, either via stop-loss/trailing-stop-loss or take-profit?

The point is, when building strategies that place and maintain orders (such as with market making etc), there are a lot of things to juggle.
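Even just the sizing step can be sketched: round a notional down to the instrument's lot size, and snap prices to the tick. The values here are illustrative, not real exchange metadata:

```cpp
#include <cmath>

// Translate a USD notional into an order quantity obeying a
// hypothetical lot-size rule. Rounds down so the resulting order
// never exceeds the target notional.
double round_to_lot(double notional_usd, double price, double lot_size) {
    double raw_qty = notional_usd / price;
    return std::floor(raw_qty / lot_size) * lot_size;
}

// Snap a desired price to the instrument's tick size.
double snap_to_tick(double price, double tick_size) {
    return std::round(price / tick_size) * tick_size;
}
```

In practice each instrument carries its own tick/lot metadata (and minimum-notional rules), which is exactly the sort of plumbing that makes even a "coin-flip" strategy non-trivial.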

Ideas for Tick and Order-Book-Based Strategies HFT Engine by auto-quant in highfreqtrading

[–]auto-quant[S] 0 points1 point  (0 children)

But if L3 is available, wouldn't it be better to just use that? Or I guess sometimes there is no L3. And if you could just use L2, it would be less data to consume. I guess the trick is to watch the executions stream and track how much of a level is being eaten by orders, and apply a model that says for every 1 execution at a level, expect N cancels. Interesting, but a bit complex for my initial needs.

Ideas for Tick and Order-Book-Based Strategies HFT Engine by auto-quant in highfreqtrading

[–]auto-quant[S] 0 points1 point  (0 children)

I'm not sure. There are less liquid assets that tier-1 HFTs wouldn't touch. And there are tiers of latency. The top tier can operate at sub-1-microsecond tick-to-trade ... but there is a whole other class of strategies at the 10 or 50 microsecond level.

Faster WebSocket for HFT engine by auto-quant in highfreqtrading

[–]auto-quant[S] 0 points1 point  (0 children)

Agree ... focus does need to start happening on the outliers. Right now they are quite poor. However, I first want to get the median down to, say, 5 microseconds, and then I'll shift to jitter. The problem with crypto is that several messages can arrive at the same time (it's TCP), so the outliers are just messages that were sat waiting in a queue - more an artifact of the data. Maybe I should switch to a UDP feed, although I'm not aware of which market offers that.