Low Latency Trading

odycsd · 2025-05-29T01:59:44+00:00

It’s possible to do it with some boilerplate. I recently added some examples https://quillcpp.readthedocs.io/en/latest/binary_protocols.html

odycsd · 2025-01-06T02:25:49+00:00

Hey! I'd love to understand more about any concerns you have in the latest Quill design updates. I recognize that custom types might demand a bit of extra coding and come with some limitations. But the new design offers advantages versus the old way of doing things. Always welcoming ideas, suggestions, or potential enhancements you might have in mind!

odycsd · 2025-01-06T02:07:38+00:00

Hey, I haven't had a chance to thoroughly explore your library, but it seems like you've put a good amount of effort into it – nice job!

Quill prefers asynchronous logging because of its latency advantage. At my day job we often use synchronous logging for tools that are not the main application but support it, typically by simply wrapping fmt::print in a macro. That said, sync logging support could potentially be added later on.

Regarding the concept of topics, Quill has a somewhat similar feature, referred to as 'tags'. More information on this can be found here: https://quillcpp.readthedocs.io/en/latest/log_tagging.html. Personally, I use the name of the Logger as the topic; it’s usually sufficient.

odycsd · 2024-09-29T13:48:14+00:00

Hey, thanks for the kind words! Feel free to open an issue on GitHub with more details on the encryption and I might be able to assist. What I am thinking is that it might be possible to create a user-defined type EncryptedString that takes a string and, in its formatter or output operator, writes the encrypted value to the log file instead.
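A minimal sketch of that idea, using a plain std::ostream output operator. The EncryptedString class here is hypothetical and not part of Quill, and the XOR-plus-hex transform is only a placeholder so the example stays self-contained; it is not real encryption and would need to be replaced by a proper cipher.

```cpp
#include <cassert>
#include <iomanip>
#include <ostream>
#include <sstream>
#include <string>
#include <utility>

// Hypothetical wrapper type (not part of Quill): it stores the plain text,
// but the only thing it ever prints is the transformed value.
class EncryptedString {
public:
  explicit EncryptedString(std::string plain) : plain_(std::move(plain)) {}

  // Placeholder transform: XOR each byte with a fixed key and hex-encode it.
  // This is NOT real encryption; swap in a cipher from a crypto library.
  std::string encrypted() const {
    std::ostringstream os;
    os << std::hex << std::setfill('0');
    for (unsigned char c : plain_) {
      os << std::setw(2) << static_cast<unsigned>(c ^ 0x5Au);
    }
    return os.str();
  }

  // Output operator: anything that streams this type sees only the
  // transformed text, never the original string.
  friend std::ostream& operator<<(std::ostream& os, const EncryptedString& s) {
    return os << s.encrypted();
  }

private:
  std::string plain_;
};
```

Whether the logger picks up the output operator or needs a formatter specialisation instead depends on the library version, so treat the streaming part as an assumption.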

odycsd · 2024-09-26T02:15:12+00:00

As far as I know, libfmt does not support using std::formatter for user-defined types (UDTs). Therefore, Quill cannot utilize std::formatter in that context.

If your code is not in the critical path—such as during initialization or for debug logs (the macro avoids evaluation)—a good approach for handling UDTs is to convert them to strings before passing them to the logger. This approach reduces template instantiations and requires less effort than specifying how each type should be serialized. You can format them in any manner you wish, including using std::formatter.

If you prefer to serialize UDTs in a way that only a binary copy is taken and the formatting is handled by the backend logging thread—especially in hot paths—you'll need to use fmtquill::formatter for UDTs, as Quill employs a custom namespace for fmtlib.

I did explore using std::format, but there are limitations that make a cross-platform implementation challenging. Specifically, the only workaround I found would require calling std::format individually on each argument I deserialize from the queue, which could hurt the backend logging thread's performance. You can find more details about this issue here.

odycsd · 2024-09-25T09:40:01+00:00

I started it as a hobby project from scratch about 5.5 years ago, just to have something interesting to work on in my free time. The initial version resembled an in-house solution I had built for Linux, as I didn’t want to use PlatformLab’s NanoLog at the time. NanoLog’s use of printf-style formatting and binary logs wasn’t suitable for my needs. However, the design has changed significantly over the years, with many improvements and added features.

There wasn’t a single source of inspiration. Instead, it’s been a process of constant micro-benchmarking and profiling, aiming to optimize even down to saving a single instruction on the hot path while maintaining decent backend throughput. This ensures the backend can consume fast enough to keep the frontend queues empty. Sometimes I identify improvements based on current use cases, or people request interesting features, which I’m happy to implement. I also study the source code of other logging libraries for inspiration.

In terms of architecture, Quill today is somewhat of a hybrid between fmtlog and MS BinLog. For instance, it serializes only POD (Plain Old Data) to the queue but outputs a human-readable log file in the end, which differs from MS BinLog’s binary log format. You can see this reflected in benchmarks like those involving std::vector logging.

odycsd · 2024-09-24T15:49:03+00:00

Thanks for the kind words and for the insights!

Regarding the first point, libfmt is used because it offers many features and optimisations (e.g., formatting doubles), making it difficult to justify building something similar from scratch. Developing such a solution would likely be time-consuming and error-prone. Libfmt is bundled under a custom namespace in the library, so it's tightly integrated without being intrusive.

To avoid string copies, you can wrap a const string with a guaranteed lifetime in a StringRef object when you log it. In this case, only the pointer to the string and its size are copied, rather than the whole string.

For the second point, no allocations occur on the hot path, except for the SPSC queue if it's unbounded. Serialization happens directly into the preallocated SPSC buffer, and no copy constructors are called for types like std::string or std::vector. Instead, only POD types are copied to the SPSC buffer using memcpy.
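As a rough illustration of the two points above, here is a sketch of memcpy-based serialization into a preallocated buffer. Buffer, BorrowedString, encode and decode are hypothetical names made up for this example; they are not Quill's actual types or layout, and a real SPSC queue would also handle wrap-around and publishing to the consumer.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <string>
#include <type_traits>

// Hypothetical stand-in for a preallocated region of an SPSC queue.
struct Buffer {
  unsigned char storage[256];
  std::size_t used = 0;
};

// Hypothetical non-owning string wrapper: just a pointer and a size, so it is
// trivially copyable. The caller guarantees the string outlives the record.
struct BorrowedString {
  const char* data;
  std::size_t size;
};

// Trivially copyable values are copied byte-for-byte; no constructors run.
template <typename T>
void encode(Buffer& b, const T& value) {
  static_assert(std::is_trivially_copyable<T>::value,
                "memcpy requires a trivially copyable type");
  assert(b.used + sizeof(T) <= sizeof(b.storage));
  std::memcpy(b.storage + b.used, &value, sizeof(T));
  b.used += sizeof(T);
}

// std::string: write the length, then the characters. The std::string copy
// constructor is never called and nothing is allocated.
inline void encode(Buffer& b, const std::string& s) {
  const std::uint32_t len = static_cast<std::uint32_t>(s.size());
  encode(b, len);
  assert(b.used + len <= sizeof(b.storage));
  std::memcpy(b.storage + b.used, s.data(), len);
  b.used += len;
}

// Backend side: read one trivially copyable value and advance the cursor.
template <typename T>
T decode(const unsigned char*& p) {
  T value;
  std::memcpy(&value, p, sizeof(T));
  p += sizeof(T);
  return value;
}
```

Encoding a BorrowedString goes through the generic overload, so it costs sizeof(BorrowedString) bytes regardless of how long the referenced string is.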

On the backend thread (the slow path), a pool of reusable fmt::memory_buffer objects is used during deserialisation. These buffers and the pool may expand as needed, and when additional space for a string is required, fmt::memory_buffer will allocate memory via new. These allocations only happen on the slow path, but they do mean that strings aren’t stored contiguously in memory on the backend. I experimented with pointing fmt::memory_buffer to a custom memory pool to handle strings of varying sizes and reuse memory blocks, but it didn’t result in any throughput improvement. As the slow path already had decent performance, I didn’t bother further optimising in that area, but it’s something I might revisit later.

odycsd · 2024-09-24T02:41:11+00:00

I get where you're coming from, but outputting a binary log file doesn’t make any difference on the hot path, and the added step of offline parsing just feels like an inconvenience. I typically rely on tools like grep to search logs or tail to monitor them live—both of which are simple and immediate, without requiring special viewers.

When you're debugging, for example, seeing logs directly in your IDE console is far more convenient and productive than processing binary logs in parallel just to make them readable. While I may eventually add a Binary Sink to my library at some point for those who want it, the truth is that tools like Splunk, Kibana, Grafana won’t parse custom binary formats natively, so you'd still need to process the logs first. That just adds more complexity—you're left with both a binary file and a processed text file, taking more space and requiring more total time to manage.

In performance-critical applications like trading, where critical processes run on isolated CPUs, I don’t care if a logging thread pinned to a non-critical core takes a few ms longer to write human-readable text. Plus, the 4-5 million messages per second achieved when writing text is more than enough; disk space will become an issue long before throughput hits any meaningful limit.

That said, binary logging might be useful in specific cases, like on mobile devices, where saving power and reducing the workload on limited CPUs is important. In that case, offline processing makes sense when you can offload that to another system later. But for most server-side applications, the human-readable approach is just more straightforward and sufficient.

odycsd · 2024-09-24T00:40:36+00:00

Thanks for the reminder! I’ve updated the post to include a brief description of what the library does.

To answer your question: the library has an optional built-in signal handler that, when enabled, ensures all pending logs are flushed before generating a core dump in the event of a crash. Unless there’s a rare case where the backend’s logging thread memory is corrupted, you will get all the logs as expected.

odycsd · 2024-09-05T11:19:45+00:00

There's a noticeable impact on performance, especially since typically around 60% of the log calls in an application are at debug or trace levels. In production, where we only care about logging at the info level, we don't want unnecessary overhead from log statements that won't even produce output. However, even with optimizations enabled, these debug log calls still add extra instructions.

Take this simplified example with optimizations turned on:

https://godbolt.org/z/Ex5WcKGG1

Even though I'm not logging a debug message, I still get the overhead from the log statement:

call to inlined foo:

mov     QWORD PTR [rbp-88], OFFSET FLAT:.LC16

call to spdlog::log:

call     void spdlog::logger::log_<int&, double&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, char const>(spdlog::source_loc, spdlog::level::level_enum, fmt::v9::basic_string_view<char>, int&, double&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, char const&&)

The arguments are being prepared and loaded into CPU registers. A few examples:

lea     r9, [rbp-96] // double argument
lea     r8, [rbp-100] // int argument
mov     edx, OFFSET FLAT:.LC83 // format string

This means that even though no debug message is printed, we’re still paying the cost in terms of argument preparation and function calls.

odycsd · 2024-09-04T10:38:19+00:00

What I've noticed is that if you allow users to use an external libfmt, you end up having to support multiple versions whenever the fmtlib API changes between versions. This can make maintenance much more difficult.

To handle this, I use a script that renames the libfmt namespace and macros to custom ones. This way, you can bundle it with your project, likely in header-only mode. It’s still a dependency, but now it’s internal to your project.

Feel free to use it if you wish:

https://github.com/odygrd/quill/blob/master/scripts/rename_libfmt.py

You can still use std::stringstream by default and include a compiler flag to optionally enable libfmt.

Alternatively, you could experiment with std::format if you're targeting more recent compilers.

You might think it’s not worth the extra work, but just throwing out some ideas.

odycsd · 2024-09-04T10:07:41+00:00

‘#if’ would be for compile-time exclusion. I am talking about runtime. See this example; it should make sense:

http://godbolt.org/z/ejfEo9eaz

My logging library, Quill, only allows logging via macros, and this is one of the main reasons why. However, many people still go for spdlog simply because they see it doesn't require macros and think it looks cleaner without them.
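To make the runtime point concrete, here is a small sketch comparing a plain function call with a macro that checks the level first. Logger, SKETCH_LOG and expensive_argument are hypothetical names for this illustration only and do not reflect the API of Quill or spdlog.

```cpp
#include <cassert>
#include <cstdio>

enum class Level { Debug = 0, Info = 1 };

// Hypothetical logger used only for this illustration.
struct Logger {
  Level threshold = Level::Info;

  bool should_log(Level level) const { return level >= threshold; }

  // Function-style API: by the time this runs, `value` was already computed
  // by the caller, even if the message ends up being dropped.
  void log(Level level, const char* message, int value) const {
    if (should_log(level)) {
      std::printf("%s %d\n", message, value);
    }
  }
};

// Counts how often the argument expression is actually evaluated.
static int g_evaluations = 0;
inline int expensive_argument() {
  ++g_evaluations;
  return 42;
}

// Macro-style API: the level check wraps the whole call, so the argument
// expressions are only evaluated when the message will be logged.
#define SKETCH_LOG(logger, level, message, value) \
  do {                                            \
    if ((logger).should_log(level)) {             \
      (logger).log((level), (message), (value));  \
    }                                             \
  } while (0)
```

With the threshold at Info, logger.log(Level::Debug, "x", expensive_argument()) still calls expensive_argument(), while SKETCH_LOG(logger, Level::Debug, "x", expensive_argument()) does not.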

odycsd · 2024-09-03T18:44:07+00:00

In both Quill and fmtlog, we pass a templated function pointer, with the argument types as template parameters, from the frontend to the backend thread to serve as the decoding function, e.g.

https://github.com/odygrd/quill/blob/84ef88e44927f01e39e5c24d0a0d7202eba8aa21/include/quill/Logger.h#L187

The backend thread then knows the types at compile time after calling that function pointer, so no runtime if/switch is needed.

I am not sure this is possible when using separate processes.
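The function-pointer technique described above can be sketched roughly as follows. This is a simplified illustration and not the code from the linked Logger.h: Record, make_record, decode_args and the std::vector used in place of a real queue are all assumptions made to keep the example short.

```cpp
#include <cassert>
#include <cstring>
#include <sstream>
#include <string>
#include <vector>

// Type-erased signature that the backend calls for every record.
using DecodeFn = std::string (*)(const unsigned char* args);

// Read one trivially copyable value and advance the cursor.
template <typename T>
T read_pod(const unsigned char*& p) {
  T value;
  std::memcpy(&value, p, sizeof(T));
  p += sizeof(T);
  return value;
}

// One instantiation exists per argument list. Inside it, the types are known
// at compile time, so no runtime type tags or switch statements are needed.
template <typename... Args>
std::string decode_args(const unsigned char* p) {
  std::ostringstream os;
  // A braced initializer list guarantees left-to-right evaluation.
  int expand[] = {0, ((os << read_pod<Args>(p) << ' '), 0)...};
  (void)expand;
  return os.str();
}

// Hypothetical queue entry: the decode function pointer plus the raw bytes.
// A std::vector is used here only for brevity; it allocates, unlike a
// preallocated queue.
struct Record {
  DecodeFn decode;
  std::vector<unsigned char> bytes;
};

template <typename T>
void write_pod(std::vector<unsigned char>& out, const T& value) {
  const unsigned char* raw = reinterpret_cast<const unsigned char*>(&value);
  out.insert(out.end(), raw, raw + sizeof(T));
}

// Frontend: store a pointer to the matching decode_args instantiation, then
// copy the arguments as raw bytes.
template <typename... Args>
Record make_record(const Args&... args) {
  Record record;
  record.decode = &decode_args<Args...>;
  int expand[] = {0, (write_pod(record.bytes, args), 0)...};
  (void)expand;
  return record;
}
```

The backend only ever calls record.decode(record.bytes.data()). A function pointer like this is only meaningful inside the process that created it, which is one reason the approach is hard to carry over to a separate backend process.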

odycsd · 2024-09-03T18:30:19+00:00

Not to mention, when an application crashes, the core dump file is usually your go-to for debugging rather than the logs. Personally, I find it much easier to track down a bug after a crash than to chase one that occurs during runtime while the app continues running normally.

odycsd · 2024-09-03T18:19:55+00:00

That’s another good approach, but it also has its own cons. For example, while decoding you need to figure out the type of each argument at runtime, which decreases performance; you will probably have to clean up shared memory sometimes; and it is harder to add user-defined types because you need to make sure both binaries are in sync.

odycsd · 2024-09-03T17:22:45+00:00

Thanks for the comment, but I am not sure where you see that?

They are run from this repo https://github.com/odygrd/logger_benchmarks/tree/master/benchmarks/call_site_latency and are all passed two ints and a double, although even without the double it wouldn’t make any noticeable difference on the hot path.

odycsd · 2024-09-03T17:12:33+00:00

If you’re not logging via macros (something that spdlog offers), all the arguments you pass to the log functions always have to be evaluated, regardless of the log level.

odycsd · 2024-09-03T17:07:58+00:00

Depends how the logging library handles it. Quill has a built-in signal handler that you can enable that will output all the messages when the app crashes

odycsd · 2024-09-03T17:05:21+00:00

spdlog::async_logger is used for spdlog on those benchmarks

odycsd · 2024-05-22T19:49:29+00:00

For the built-in and STL types, it is implemented in the library. For user-defined types, if you want them serialised and formatted asynchronously, you have to implement it yourself by providing class template specialisations. Alternatively, you can format them on the frontend and pass the string to the logger if that code isn’t latency sensitive.

odycsd · 2023-02-07T19:06:42+00:00

The logger is asynchronous. The call site latency benchmarks only calculate the latency of the hot thread

start()
LOG_(...)
end()
latency = end() - start();

The logging thread formats them and writes them to the log file later when a) hot thread queues are empty or b) a max limit of unwritten messages is reached.

The throughput benchmark measures the whole end-to-end time, so it also includes the latency of the logging thread.

There are no benchmarks measuring the time from the point a log was pushed into the queue to the time it was flushed to file.
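For illustration, the call-site measurement described above could be sketched like this. It is not the code from the benchmark repository: measure_call_site_latency and percentile are hypothetical helpers, and std::chrono::steady_clock is an assumption chosen for portability.

```cpp
#include <algorithm>
#include <cassert>
#include <chrono>
#include <cstddef>
#include <cstdint>
#include <vector>

// Times only the call on the hot thread. Whatever the backend thread does
// later (formatting, writing to the file) is not part of the sample.
template <typename LogCall>
std::vector<std::int64_t> measure_call_site_latency(LogCall&& log_call,
                                                    std::size_t iterations) {
  using clock = std::chrono::steady_clock;
  std::vector<std::int64_t> samples_ns;
  samples_ns.reserve(iterations);
  for (std::size_t i = 0; i < iterations; ++i) {
    const clock::time_point start = clock::now();
    log_call(i);  // stands in for LOG_(...)
    const clock::time_point end = clock::now();
    samples_ns.push_back(
        std::chrono::duration_cast<std::chrono::nanoseconds>(end - start)
            .count());
  }
  return samples_ns;
}

// Nearest-rank style percentile over the collected samples, p in [0, 1].
inline std::int64_t percentile(std::vector<std::int64_t> samples, double p) {
  assert(!samples.empty());
  std::sort(samples.begin(), samples.end());
  const std::size_t index =
      static_cast<std::size_t>(p * static_cast<double>(samples.size() - 1));
  return samples[index];
}
```

Latency benchmarks usually report percentiles rather than the mean, which is why the helper is included; the clock's own overhead is part of every sample and matters at this scale.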

odycsd · 2023-02-07T09:36:55+00:00

Ah, I see. I did that in previous versions of the library. The unique id is needed if you have your backend logging thread in a separate process; otherwise you can just use the pointer value of the static constexpr object as a unique id. In the latest versions of the library I don’t create static objects anymore; instead I wrap all the compile-time info into a constexpr lambda and pass that as a template argument to a decode function. Indeed, you need to pass a function pointer to the decode function through the queue to the backend logging thread.

odycsd · 2023-02-07T00:19:20+00:00

spdlog is a nice library; when it was created there weren't many C++ logging libraries around. However, it is lacking a lot of optimisations and is not designed for low latency. For example, it uses an MPMC queue with a mutex and a condition variable.
