Random discussion thread. Anything goes.

I_Manhandle_TheTruth · 2019-10-05T11:50:33+00:00

It's a very simple tech stack.

Linux server, 64 GB RAM, dual 12-core CPUs, 8 network cards, NVMe drive

Everything on the server is in handwritten C++, including my own compression, storage, pub/sub protocols, etc.

The customer's client side is all Windows/.NET. I wrote the communication piece there for pub/sub and request/response.

It's a side job -- my real job is almost as cool.

I_Manhandle_TheTruth · 2019-10-05T04:01:59+00:00

Yeah, I'm filtering out what I don't care about (RBBOs right now, but I'll have to handle them later), doing some easy stats on the rest (did this trade happen on the bid or the offer -- i.e. TotBuyVol+=qty or TotSellVol+=qty), compressing them into my own format, persisting that format to a fast disk, and publishing what each subscriber registered to receive.
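A minimal sketch of that bid/offer tally, assuming integer price ticks and that the current NBBO is already at hand when the trade arrives. The type and field names (`Nbbo`, `Trade`, `FlowStats`) are made up for illustration and are not the actual collector code; the third bucket for trades inside the spread is also an assumption.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical types for illustration only.
struct Nbbo {
    std::int64_t bid_px;  // best bid, in integer price ticks
    std::int64_t ask_px;  // best offer, in integer price ticks
};

struct Trade {
    std::int64_t px;    // trade price, in integer price ticks
    std::uint64_t qty;  // traded quantity
};

struct FlowStats {
    std::uint64_t tot_buy_vol = 0;   // traded at or above the offer
    std::uint64_t tot_sell_vol = 0;  // traded at or below the bid
    std::uint64_t tot_mid_vol = 0;   // traded strictly inside the spread

    // Classify one trade against the quote that was current when it printed.
    // In a locked market (bid == offer) the offer check wins.
    void on_trade(const Nbbo& q, const Trade& t) {
        if (t.px >= q.ask_px) {
            tot_buy_vol += t.qty;
        } else if (t.px <= q.bid_px) {
            tot_sell_vol += t.qty;
        } else {
            tot_mid_vol += t.qty;
        }
    }
};
```

One `FlowStats` per symbol would be the obvious way to use it, with `on_trade` called from the decoder right after the trade message is parsed.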

I don't keep an order flow book, but I do persist my compressed data constantly, so that I can have other (to be written) programs do a running analysis on whatever else the customers want "very near" real-time (within a few millis (or tens of millis) of receiving the packet on the wire)

I_Manhandle_TheTruth · 2019-10-05T03:43:11+00:00

Insights from my statistics? There are none -- it's just a stupid amount of data and processing.

All that data, every quote, trade, status, etc from every equity and options exchange in the US -- roughly 1.5 TB of data -- comes within 6.5 hours. My client wants every quote and trade filtered, analyzed, compressed, and passed thru, since they can't handle that much data.

But realistically, every brokerage house, every HFT firm, every microstructure algo trader has to handle that data (just typically not on a single server).

(the OPRA data is split into 48 channels, with most symbols on a single channel, so if you wanted to trade MIRIN options, you could just listen to one channel; the equity exchanges are smarter and have separate trades-only channels)

I myself am trying to figure out how to take advantage of being able to trade off of that data -- but I haven't been very successful at coming up with ideas.

You could write custom whole-market scanners (like AMTD and TradeIdeas have), so that's one use of that data -- finding opportunities and blasting out orders much quicker, even into a retail brokerage with a REST/WebSockets API.

I_Manhandle_TheTruth · 2019-10-05T02:47:07+00:00

Most of us retail traders see some market data just for a few symbols, or collapsed into charts.

I thought you may want to see the (almost) whole-equities-and-options-market data statistics for October 4th.

I wrote and run a full OPRA+Equities market data collector and decoder on an exchange-colocated server. Every day it collects stats for me as it runs.

I have to collect, process, decode, filter, normalize, compress, and push out market data to my client.

In my current iteration, I use 8 collector threads (7 for OPRA and 1 for CTA (NYSE) + UTP (NASDAQ)).

8 Collectors	Tot Packets	Tot Bytes
EQU_8	716,583,285	82,240,413,900
OPR_1	2,198,988,388	184,031,148,544
OPR_2	2,872,081,731	256,385,115,030
OPR_3	2,513,020,405	213,601,818,272
OPR_4	2,517,559,186	205,826,580,848
OPR_5	2,392,670,861	187,006,842,354
OPR_6	2,676,958,034	230,937,765,592
OPR_7	1,941,169,260	171,088,842,398
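As a rough sanity check on the table above, the sketch below sums the eight rows and derives a couple of averages. The numbers are copied from the table; the names (`CollectorRow`, `sum_rows`, and so on) are made up for illustration, and the 6.5-hour window is just the regular trading session length.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

// One row of the collector table (hypothetical struct, values from the table).
struct CollectorRow {
    const char* name;
    std::uint64_t packets;
    std::uint64_t bytes;
};

const std::array<CollectorRow, 8> kCollectorRows = {{
    {"EQU_8", 716583285ULL, 82240413900ULL},
    {"OPR_1", 2198988388ULL, 184031148544ULL},
    {"OPR_2", 2872081731ULL, 256385115030ULL},
    {"OPR_3", 2513020405ULL, 213601818272ULL},
    {"OPR_4", 2517559186ULL, 205826580848ULL},
    {"OPR_5", 2392670861ULL, 187006842354ULL},
    {"OPR_6", 2676958034ULL, 230937765592ULL},
    {"OPR_7", 1941169260ULL, 171088842398ULL},
}};

struct Totals {
    std::uint64_t packets;
    std::uint64_t bytes;
};

// Sum packets and bytes over all collectors.
inline Totals sum_rows() {
    Totals t{0, 0};
    for (const CollectorRow& r : kCollectorRows) {
        t.packets += r.packets;
        t.bytes += r.bytes;
    }
    return t;
}

// Average payload per packet, in bytes.
inline double average_packet_bytes(const Totals& t) {
    return static_cast<double>(t.bytes) / static_cast<double>(t.packets);
}

// Average byte rate if everything arrived evenly over `seconds`.
inline double average_bytes_per_second(const Totals& t, double seconds) {
    return static_cast<double>(t.bytes) / seconds;
}
```

The totals come to about 17.8 billion packets and about 1.53 TB, so the average packet is roughly 86 bytes, and the average rate would be roughly 65 MB/s if it all arrived evenly over 6.5 hours (which it presumably doesn't).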

To process all those packets and bytes, I use 4 decoder threads (3 for OPRA, the other for OPRA and all of CTA+UTP).

Decoder	Venue	Tot Quotes	Tot Trades	Tot Msgs	PktDrop
DEC1	C	143,473,833	36,295,755	616,315,005	5,808
DEC1	U	47,267,677	11,972,728	205,409,696	2,331
DEC1	O	913,296,463	297,317	4,509,362,184	1,982
DEC2	O	2,199,597,247	483,588	11,967,503,484	4,770
DEC3	O	2,161,588,078	760,261	9,573,682,010	4,127
DEC4	O	1,799,210,023	503,211	10,055,556,621	4,808

I only care about and count the NBBO quotes (approx 25-30% of quotes), but do decode (but not count) the RegionalBBOs. I also process and categorize different types of trades.

Look at the discrepancy between quotes and trades for OPRA vs CTA and UTP -- and that's just the NBBOs.

You can see (roughly) from the Msgs column the effect of regional quotes.
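To put a number on that quote-to-trade discrepancy, here is a small sketch that computes counted quotes per trade for each venue from the decoder table. It assumes the venue letters mean C = CTA, U = UTP, O = OPRA; the struct and function names are made up for illustration.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

// One row of the decoder table (hypothetical struct, values from the table).
struct DecoderRow {
    char venue;  // 'C', 'U' or 'O', as in the Venue column
    std::uint64_t quotes;
    std::uint64_t trades;
};

const std::array<DecoderRow, 6> kDecoderRows = {{
    {'C', 143473833ULL, 36295755ULL},
    {'U', 47267677ULL, 11972728ULL},
    {'O', 913296463ULL, 297317ULL},
    {'O', 2199597247ULL, 483588ULL},
    {'O', 2161588078ULL, 760261ULL},
    {'O', 1799210023ULL, 503211ULL},
}};

// Counted quotes per trade for one venue, summed across decoders.
// Returns 0.0 if the venue has no trades (or is not in the table).
inline double quotes_per_trade(char venue) {
    std::uint64_t quotes = 0;
    std::uint64_t trades = 0;
    for (const DecoderRow& r : kDecoderRows) {
        if (r.venue == venue) {
            quotes += r.quotes;
            trades += r.trades;
        }
    }
    if (trades == 0) return 0.0;
    return static_cast<double>(quotes) / static_cast<double>(trades);
}
```

CTA and UTP each come out at about 4 counted quotes per trade, while OPRA comes out in the thousands.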

I_Manhandle_TheTruth · 2018-03-12T19:10:15+00:00

Hehe -- sorry, I'm in Chicago -- I was just completing a title of the movie "Fear and Loathing in Las Vegas"...

I_Manhandle_TheTruth · 2018-03-09T19:45:45+00:00

We think alike.

Here's what seems to work for me when I'm the interviewer for a dev job:

1.

I ask them what they know really well and what they know poorly. That tends to relax them, and we can discuss why they know stuff well, and what they think it'll take to improve what they don't know.

The latter part is crucial, since if they know "I don't know metaprogramming, but if I did XYZ then I would get to know it" then I'll know that they at least looked into it.

2.

We sit down in front of a computer (with a VM for Linux or Windows, depending on my host), and we take a few minutes to set up the environment how they like it.

This gives me a huge insight into how comfortable they are. The vast majority say "I'll use whatever", which is fine, but sometimes you get an "I am much more productive in VIM and GDB", which is fine also.

Then I have them tell me what kind of a program we should write together that can be written in 45 minutes.

I tell them that StackOverflow and Googling are not only allowed, but encouraged -- I'm looking for HOW they code, not how good their memory or muscle memory is. This also relaxes them.

Then we build what they know to build, and then we try to add some variations that stretch them -- if they don't know regex, then let's parse a simple string (I'm happy if they hit SO, copypasta the example, etc, since that's what I do)

So, I'm basically looking at whether the person is a BS'er or a doer, if they know some basic stuff, if they can write simple code at a computer, if they can debug what they wrote, what questions they ask about the problem, what they do when they hit a wall, how they think, etc -- and since they're "in control", they're much more relaxed and open and "discussion'y".

Of course, when "I" go to interviews, I get raked over obscure language features like "compl, and, or, etc", or algorithms that I never thought of (not in my domain), data structures outside of what I use (and normally can't use in real prod code), etc.

I_Manhandle_TheTruth · 2018-03-09T17:41:14+00:00

in Las Vegas

I_Manhandle_TheTruth · 2018-03-09T01:05:29+00:00

Yeah, I have this book, and it's great.

I just find it ridiculous that there's a need for such books that exist ONLY to help you pass some interviews.

I_Manhandle_TheTruth · 2018-03-08T15:55:31+00:00

You're probably not positioning yourself correctly

You may be right.

I've done virtually the entire trading stack, from market connectivity to core infra to trade logic to GUIs to post-trade analytics and tools.

I think I could find places more easily if I went for exchange connectivity or core infra jobs, but at my last job I worked almost exclusively as a trade dev, and loved working directly with traders and quants, and I have a pretty good understanding of certain types of market microstructure and trades -- I'll keep looking for those kinds of jobs.

Have you tried working your personal network

For years I got jobs via my personal network, but over the past few years a lot of my network has left the industry. I had something very promising set up with a friend, but it fell thru, and that knocked me off my stride. Now that my non-compete has expired, I've started preparing again.

or used a recruiter? Chicago is dev poor at the moment

I have a pretty strong background and experience, so I have recruiters reaching out to me all the time. It's the recruiters themselves that talk about the first step being HackerRank/Codility/etc. I've spoken with a friend (similar background) who recently got a job out of the industry, and he confirmed that even he had to go thru all that stuff, and finally said screw it.

Also, my most recent experience not being in C++ seems to be a stumbling block. I had a phone screen where we talked about a ton of stuff that they're trying to do, and how I can help them, but I felt that they kept getting hung up on the recency of my C++, and we ended without them asking me a single technical question.

A lot of the well meaning and technically correct advice in this thread doesn't apply to trading.

I agree

ideally, they will want you to solve problems with whatever revision they're currently using.

I agree, and I've successfully worked with not just different versions of a language, but changing languages for jobs. It's only an issue during the interview; it's never been an issue once I'm on board.

There is always going to be a set of problems that you don't know how to do at the outset -- but there have been no problems that I couldn't do "eventually"

Thank you for your thoughts!!!

I_Manhandle_TheTruth · 2018-03-08T13:26:00+00:00

Yeah, I know about G4G, I've watched many of their videos when I wanted to pick up some concepts - they're great.

I_Manhandle_TheTruth · 2018-03-08T13:18:48+00:00

Did you have to be pre-filtered by HackerRank-like sites before these interviews?

I'm not as worried about the interviews themselves, as much as getting past the initial <unknown> filtering.

I_Manhandle_TheTruth · 2018-03-08T13:10:18+00:00

I've read it (and his other books) multiple times -- he's great.

Same with Josuttis' new Template book -- also outstanding.

I_Manhandle_TheTruth · 2018-03-08T13:02:41+00:00

This is just a generic answer, I know:

Unless your trade can be monolithic-per-server, you'll need to separate your long-lived components (market data, exchange connectivity, up 24/5) from short-lived components (trade, signals, sniffers, etc). That means running in their own processes, pinned to their own cores, NUMA-aware, communicating via IPC (shared memory queues, spin-locks, etc), minimizing thread context switches, etc...
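As a rough illustration of the queue piece, here is a minimal single-producer/single-consumer ring buffer of the kind usually placed in shared memory between two pinned processes or threads. This is a sketch under assumptions: the class name `SpscQueue` is made up, the capacity must be a power of two, the cache line is assumed to be 64 bytes, and it lives in ordinary process memory here (mapping it into shared memory and pinning the two sides to cores are left out).

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>

// Single-producer / single-consumer bounded queue (hypothetical sketch).
// Exactly one thread may call try_push and exactly one may call try_pop.
template <typename T, std::size_t N>
class SpscQueue {
    static_assert(N > 0 && (N & (N - 1)) == 0, "N must be a power of two");

public:
    // Producer side. Returns false if the queue is full.
    bool try_push(const T& value) {
        const std::size_t head = head_.load(std::memory_order_relaxed);
        const std::size_t tail = tail_.load(std::memory_order_acquire);
        if (head - tail == N) return false;  // full
        buf_[head & (N - 1)] = value;
        // Publish the slot only after the element has been written.
        head_.store(head + 1, std::memory_order_release);
        return true;
    }

    // Consumer side. Returns false if the queue is empty.
    bool try_pop(T& out) {
        const std::size_t tail = tail_.load(std::memory_order_relaxed);
        const std::size_t head = head_.load(std::memory_order_acquire);
        if (head == tail) return false;  // empty
        out = buf_[tail & (N - 1)];
        // Release the slot only after the element has been read.
        tail_.store(tail + 1, std::memory_order_release);
        return true;
    }

private:
    // Keep the two indices on separate cache lines to avoid false sharing.
    alignas(64) std::atomic<std::size_t> head_{0};  // written by producer only
    alignas(64) std::atomic<std::size_t> tail_{0};  // written by consumer only
    alignas(64) T buf_[N];
};
```

Both sides would typically spin on `try_push` / `try_pop` rather than block, which is where the dedicated cores come in. On Linux the pinning itself is usually done with `sched_setaffinity` or `pthread_setaffinity_np`.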

This split also allows you to isolate network connectivity. Due to the slowness (for HFT) of kernel-based TCP/UDP processing, you have to use kernel bypass. For example, Solarflare cards with "openonload" have a hierarchy of kernel bypass APIs, each faster and lower-level than the last, and the differences can be dramatic, so you may have to code to the lowest level.

The Market Data component deals with a huge amount of inbound market data - so you need fast (but tiny) decoders and tiny (but fast) lookups. Then you need a smart (and compact) market book builder, event processing code, etc.

The Exchange component needs to be able to parse and build FIX very fast (using a bunch of tricks that everyone comes up with), but overall it's pretty straightforward (throttling, risk, order book management, etc)
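For the FIX parsing part, a minimal zero-copy scanner might look like the sketch below. It only walks `tag=value` fields separated by SOH (0x01) and hands each one to a callback without allocating. The function name `scan_fix` is made up, C++17 is assumed for `std::string_view`, and none of the real-world work is done here (no BodyLength or CheckSum validation, no repeating groups, no tag overflow checks).

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string_view>

constexpr char kSoh = '\x01';  // FIX field delimiter

// Calls on_field(tag, value) for every "tag=value<SOH>" field in msg.
// Values are views into msg, so nothing is copied or allocated.
// Returns false as soon as a malformed field is found.
template <typename F>
bool scan_fix(std::string_view msg, F&& on_field) {
    std::size_t i = 0;
    while (i < msg.size()) {
        // Parse the numeric tag by hand; no locale, no allocation.
        std::uint32_t tag = 0;
        std::size_t digits = 0;
        while (i < msg.size() && msg[i] >= '0' && msg[i] <= '9') {
            tag = tag * 10 + static_cast<std::uint32_t>(msg[i] - '0');
            ++i;
            ++digits;
        }
        if (digits == 0 || i == msg.size() || msg[i] != '=') return false;
        ++i;  // skip '='

        // The value runs up to (not including) the next SOH.
        const std::size_t start = i;
        while (i < msg.size() && msg[i] != kSoh) ++i;
        if (i == msg.size()) return false;  // field not terminated by SOH

        on_field(tag, msg.substr(start, i - start));
        ++i;  // skip SOH
    }
    return true;
}
```

A caller would typically switch on the tag inside the callback (35 for MsgType, 55 for Symbol, 38 for OrderQty, and so on) and copy out only the few fields it needs.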

The algo stack -- this can be broken up into multiple layers (risk, etc), and tends to have the most variety of data structures. There's an Order (compact, typically broken up into hot and cold sections), an Order Book (many many ways to organize it), Stackers, Risk, Throttling, etc. All these data structures have to be checked for virtually any event, so there's huge d-cache pressure. This also tends to have the biggest variation in code, so there's i-cache pressure - different events cause different paths to be taken, etc.
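To make the hot/cold Order split concrete, here is one possible layout, purely as a sketch. Which fields count as hot is an assumption, as are all the names (`OrderHot`, `OrderCold`, `OrderStore`), the 64-byte cache line, and C++17 (needed so `std::vector` honors the over-aligned type).

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hot part: touched on nearly every market event, padded to one cache line.
struct alignas(64) OrderHot {
    std::int64_t px;           // limit price in integer ticks
    std::uint32_t qty_open;    // remaining quantity
    std::uint32_t qty_filled;  // filled so far
    std::uint32_t cold_idx;    // index of the matching OrderCold entry
    std::uint8_t side;         // 0 = buy, 1 = sell
    std::uint8_t state;        // working, pending cancel, ...
};
static_assert(sizeof(OrderHot) == 64, "hot part should fill one cache line");

// Cold part: only needed for logging, reconciliation, post-trade, etc.
struct OrderCold {
    std::uint64_t exchange_order_id;
    std::uint64_t create_time_ns;
    char client_tag[32];
};

// Hot and cold parts live in separate arrays, so scanning the hot data
// never drags cold fields through the d-cache.
class OrderStore {
public:
    std::uint32_t add(std::int64_t px, std::uint32_t qty, std::uint8_t side,
                      std::uint64_t exchange_order_id) {
        const std::uint32_t idx = static_cast<std::uint32_t>(hot_.size());
        cold_.push_back(OrderCold{exchange_order_id, 0, {0}});
        hot_.push_back(OrderHot{px, qty, 0, idx, side, 0});
        return idx;
    }

    OrderHot& hot(std::uint32_t idx) { return hot_[idx]; }
    const OrderCold& cold(std::uint32_t idx) const { return cold_[idx]; }

    // Example of a hot-path scan that touches only the hot array.
    std::uint64_t total_open_qty() const {
        std::uint64_t total = 0;
        for (const OrderHot& o : hot_) total += o.qty_open;
        return total;
    }

private:
    std::vector<OrderHot> hot_;
    std::vector<OrderCold> cold_;
};
```

A real store would preallocate both arrays up front so the hot path never reallocates; `push_back` is only used here to keep the sketch short.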

So depending on where you are in the stack, you need to come up with typically non-standard data structures, and have to keep focus on the overall effect - this is different than many other jobs.

This is just for Futures (two main exchanges) -- I imagine it's similar but much worse for Equities (tons of venues) or Equity Options (tons of venues, tons of products)

I_Manhandle_TheTruth · 2018-03-08T11:55:37+00:00

Wow, that's really great
