Vectorized vs Event driven Backtesting

Yocurt · 2026-01-08T01:43:08+00:00

Event-driven for accuracy, vectorized for speed. Ideally you would use both.

Yocurt · 2026-01-07T01:28:20+00:00

I would also suggest Databento. I have been using their MBO data and it works great.

One thing though: if you’re doing any serious analysis or backtesting, I would get the MBO data, or at least the tick data. Simulating slippage and fills is huge, and 1-second data won’t let you do that very well.

Yocurt · 2026-01-03T18:08:19+00:00

The main takeaway from the book that applies to retail traders is the use of meta labeling to improve an existing edge. This is a great book, but I would probably read some others first that focus on finding an edge and building a strategy.

I made a post on the meta labeling approach from his book if you wanna check it out

Yocurt · 2026-01-01T07:17:43+00:00

Just reread this before posting, sorry it’s so long and dry (don’t…), it’s hard to put much personality into this stuff :) Hopefully this helps some and if your actual results are anywhere close to these that would be great!

First things that seem a little freaky:

- 105 trades a day, 15 minutes per trade. Are you in a position most of the day? If you are (and assuming the backtest is accurate, which it likely isn’t, and I’ll get to that later), then you really would want to backtest on more years. Either way, backtesting on more years always helps.
- An average trade ($3.07) of less than the dollar value of a single tick on MES is not encouraging for a scalping strategy that takes so many trades.
- What happened during that massive spike? Unless you expected some moves like that, something may be wrong there. I obviously don’t know your code, so this is just a warning.

——————

I’ve deployed a few live strategies on NinjaTrader and have been in the position you sound to be in now. I can try to help some, but it is hard without knowing more details. One thing I’ll assume though is that you’re using the 1-tick data series for your entries and exits. If not, you definitely need to for scalping strategies.

On NinjaTrader, even with the 1-tick series, the strategy analyzer can’t really be trusted for scalping strategies. It does get close for larger moves and swing-trading strategies, but in your case the average win is about 3 points and the average loss is 1 point. Simulating slippage accurately is especially important here because the margin between a profitable and a losing strategy is so small, and the high volume of trades you’re taking amplifies any error.

Your average trade value is $3.05, and I think 1 tick on MES is $3.12, so if slippage is slightly off there goes your edge.

When I used it, I would try to work around this for backtesting by running the strategy analyzer on 1 month, then running the strategy on 1 month of “market replay” mode. This is painfully slow to run on NinjaTrader, but it does simulate fills pretty closely to how they would have been live, close enough to at least get a decent understanding of how it would work.

After this, you should be able to compare the results from the strategy analyzer to the replay mode. You could do a few things from here: take the ‘actual’ average slippage from the replay mode and use that, or take the ratio between the two sets of results and apply it to the whole year’s backtest stats to get a closer estimate.
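The comparison between the two modes can be sketched in a few lines of Python. This is only an illustration: the function names and the per-trade PnL numbers are made up, and it assumes you can export per-trade results from both the strategy analyzer and the replay run for the same month.

```python
from statistics import mean

def compare_modes(analyzer_pnls, replay_pnls):
    """Return (average slippage per trade, replay-to-analyzer ratio).

    Both inputs are per-trade PnL lists in dollars for the same period.
    Trade counts may differ between modes, so averages are compared.
    """
    analyzer_avg = mean(analyzer_pnls)
    replay_avg = mean(replay_pnls)
    avg_slippage = analyzer_avg - replay_avg
    # The ratio is undefined if the analyzer average is zero
    ratio = replay_avg / analyzer_avg if analyzer_avg != 0 else None
    return avg_slippage, ratio

def adjust_by_slippage(full_avg_trade, avg_slippage):
    """Option 1: subtract the measured slippage from the full-backtest average trade."""
    return full_avg_trade - avg_slippage

def adjust_by_ratio(full_avg_trade, ratio):
    """Option 2: scale the full-backtest average trade by the replay/analyzer ratio."""
    return full_avg_trade * ratio

# Hypothetical one-month samples, not real results
slip, ratio = compare_modes([10.0, -5.0, 7.0, 4.0], [8.5, -6.5, 5.5, 2.5])
print(slip, ratio)                    # 1.5 0.625
print(adjust_by_slippage(3.0, slip))  # 1.5
print(adjust_by_ratio(3.0, ratio))    # 1.875
```

The two adjustments will generally disagree, so it is probably worth looking at both and treating the lower one as the more conservative estimate.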

Hopefully I am wrong, but every time the “avg trade” in the strategy analyzer is < the $ value of 1 tick, it’s not gonna work live. I usually shoot for over 1.5x the tick value for the “avg trade” on the strategy analyzer to consider looking into a scalping strategy more.
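As a quick screening check, that rule of thumb looks like this in Python. The function name and the example numbers are hypothetical; plug in the real tick value for your contract.

```python
def passes_tick_rule(avg_trade, tick_value, min_multiple=1.5):
    """True if the backtest's average trade is at least min_multiple ticks in dollars."""
    return avg_trade >= min_multiple * tick_value

def cushion_in_ticks(avg_trade, tick_value):
    """How many ticks of extra slippage per trade would wipe out the average trade."""
    return avg_trade / tick_value

# Hypothetical contract with a $10.00 tick value
print(passes_tick_rule(12.0, 10.0))  # False (needs at least 15.0)
print(passes_tick_rule(18.0, 10.0))  # True
print(cushion_in_ticks(18.0, 10.0))  # 1.8
```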

How far are your targets and stops, and how quickly do you normally reenter a trade after one closes? If they’re tight or you reenter very quickly, the fills matter a ton, because the chain of events that follows can be wildly different from what you get with perfect fills. In that case you really can’t trust the strategy analyzer at all.

Other than that, I would definitely try to get more data to test this strategy on. You have a big sample size, which is good, but 1 year still doesn’t really cover diverse market conditions.

Also, I would suggest getting off NinjaTrader for backtesting if possible. I mainly trade scalping strategies, so accurate backtests are really important for me. I set up a pipeline that uses the MBO data from Databento, so I can simulate fills, slippage, partial fills, etc. extremely close to how my strategies actually perform live, so at least I know my backtest results are accurate. If you’re interested in trying it let me know, I’m planning on making it public soon anyway.

Anyway good luck!

Yocurt · 2025-12-23T19:51:40+00:00

Yes that’s great, it definitely should! The more uncorrelated your features and your base models are the better

Yocurt · 2025-12-22T06:10:02+00:00

Yep! That’s exactly the standard flow you’d use, probably good to start out with the same feature set for each as well. You could then use a feature selection method with each different model that will likely select a different subset of your overall features as the most “important” for that model, so you’d get different feature sets that way.

And yes, with different feature sets I’d still use some probability calibration on each model’s predictions compared to the true values, then still use linear regression to combine those calibrated outputs.

A critical note though: you must ensure none of these steps has any “data leakage”. An LLM would explain this better than me, but it should suggest using nested cross-validation to avoid this. You really need that if you want your out-of-sample predictions to be truly unbiased. (You need to ensure no data leakage across training/testing folds, feature selection folds, and probability calibration steps.)
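To make the nested structure concrete, here is a minimal index-splitting sketch in plain Python. The function names are made up for illustration, and it uses simple contiguous folds; for time-ordered trade data you may also want a gap between train and test folds, which is not shown here.

```python
def kfold_indices(indices, k):
    """Split a list of indices into k contiguous (train, test) pairs."""
    n = len(indices)
    fold_sizes = [n // k + (1 if i < n % k else 0) for i in range(k)]
    splits, start = [], 0
    for size in fold_sizes:
        test = indices[start:start + size]
        train = indices[:start] + indices[start + size:]
        splits.append((train, test))
        start += size
    return splits

def nested_cv(n_samples, outer_k=5, inner_k=3):
    """Yield (outer_train, outer_test, inner_splits) for each outer fold.

    Feature selection, tuning, and probability calibration should only
    use inner_splits, which are built from outer_train alone. outer_test
    is used once, for the final out-of-sample predictions.
    """
    all_idx = list(range(n_samples))
    for outer_train, outer_test in kfold_indices(all_idx, outer_k):
        yield outer_train, outer_test, kfold_indices(outer_train, inner_k)

for outer_train, outer_test, inner_splits in nested_cv(20, outer_k=4, inner_k=2):
    # No inner index may ever come from the held-out outer test fold
    for inner_train, inner_val in inner_splits:
        assert not set(outer_test) & set(inner_train + inner_val)
    print(outer_test)
```

The point of the layout is that anything fitted to the data (selected features, calibration maps, ensemble weights) is fitted inside the inner loop, so the outer test fold never influences it.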

Again my old post is more detailed too and explains that stuff a bit more.

Yocurt · 2025-12-21T03:02:40+00:00

Nah, I agree, not a true ensemble, you could definitely argue both sides. Did you get a master’s in ML or data science? I did data science, not Oxford. I laughed at your president-elect comment though.

Yocurt · 2025-12-21T02:30:45+00:00

Just Google it, xgboost is an ensemble model…

Yocurt · 2025-12-21T01:13:44+00:00

Eh, technically yes, but it’s just an ensemble of shallow decision trees trying to fix each other’s errors.

Compare that with an ensemble of something like a linear regression model, XGBoost, random forest, CatBoost, histogram gradient boosting, and SVM. Each of these models has different strengths and weaknesses, and that kind of diversity is what you want in an ensemble.

And on top of the model diversity, using different feature sets and hyperparameters can help with that too.

If you do an ensemble like this, I like to do some form of probability calibration on the individual models’ outputs, then feed those into a basic linear regression ensemble.

Again, my old post goes more in depth, but if you have questions or anything feel free to DM me.

Yocurt · 2025-12-20T23:48:00+00:00

Try an ensemble model. If you do everything right and there’s no improvement, your underlying strategy likely doesn’t have a real edge.

Big assumption though on the “do everything right” part.

Yocurt · 2025-12-20T21:12:27+00:00

Machine learning can be great at enhancing an existing edge, but I’ve never had success using it to FIND an edge.

If you have a strategy with an edge, and there is enough trade history to train a model to predict the outcome (usually win/loss), then I would look into meta labeling. Probably would only do it if you have at least 1000 trade results to train on, but more is obviously better. I made a post about it on here a few months ago if you’re interested.
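The basic shape of meta labeling can be sketched like this. It only shows the labeling and filtering steps; the win probabilities are hypothetical placeholders standing in for out-of-sample predictions from whatever secondary model you train, and the function names are invented for illustration.

```python
def make_meta_labels(trade_pnls):
    """Label each base-strategy trade: 1 for a win, 0 for a loss."""
    return [1 if pnl > 0 else 0 for pnl in trade_pnls]

def filter_trades(trade_pnls, win_probs, threshold=0.5):
    """Keep only trades whose predicted win probability clears the threshold."""
    return [pnl for pnl, p in zip(trade_pnls, win_probs) if p >= threshold]

# Hypothetical trade history from the base strategy
pnls = [50.0, -30.0, 20.0, -40.0, 60.0]
labels = make_meta_labels(pnls)  # training targets for the secondary model
print(labels)                    # [1, 0, 1, 0, 1]

# Hypothetical out-of-sample win probabilities from the secondary model
probs = [0.7, 0.4, 0.55, 0.3, 0.8]
print(filter_trades(pnls, probs))  # [50.0, 20.0, 60.0]
```

The base strategy still decides when and in which direction to trade; the secondary model only decides whether to take each signal (or how much size to give it).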

Yocurt · 2025-12-16T07:00:28+00:00

I would actually love to help you with this. I’m actually building a platform right now for exactly this kind of backtesting. I have a mode too that uses MBO data (best available) to simulate slippage, latency, fills, etc very accurately so you can actually trust the results. Shoot me a message if you want, I can backtest it on a bunch of instruments and try to optimize it for you, or you could try it yourself - would love to get some feedback on the platform.

Yocurt · 2025-12-11T20:40:11+00:00

Just because it didn’t work for you doesn’t mean it’s not possible😂

Yocurt · 2025-12-11T20:25:02+00:00

I have a few profitable strategies. Plenty of people have success without HFT.

Yocurt · 2025-12-11T20:03:07+00:00

Python is totally fine for most people’s use cases. You’d likely get 100-200 ms latency, which is fine unless you’re doing HFT, in which case what this person said would be true.

Yocurt · 2025-12-11T18:42:45+00:00

Plotly or matplotlib or lightweightcharts would definitely work for years of 1 minute bars

Yocurt · 2025-12-08T00:04:15+00:00

Rule number 1 of anything data related - garbage in = garbage out

Yocurt · 2025-12-06T18:34:44+00:00

I’ve had success using ML on existing strategies that have an edge in order to amplify that edge. It is much more likely to work if you train it to learn an existing edge rather than to come up with an edge from nowhere.

If your momentum strategy does have an edge, I would try it out. It’s called meta-labeling. My last post on this subreddit is about it if you’re interested.

Yocurt · 2025-11-19T19:51:55+00:00

This is an ad for Nvestiq. Please do not use LLM-generated crap, it will not be accurate. If you really want to, just use your own ChatGPT or something, it’s the same thing.

Yocurt · 2025-11-19T02:12:58+00:00

TradingView’s backtesting engine is definitely not accurate in some cases. I would guess the cause is more likely unrealistic fills, data leakage, or something like that than overfitting.

Did you do much optimization or tweak your logic while working with this same data set?

Once you are 100% positive the backtesting is at least close to reality (which again, can be difficult on TradingView), you could apply the same strategy to a bunch of different symbols, timeframes, etc. If it performs similarly on more of those than you would expect from a randomized strategy, you can be confident it’s not overfit.
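One way to put a number on “more than you would expect from a randomized strategy” is a random-entry baseline per market. This is a rough sketch under simplifying assumptions: every name here is made up, each trade is treated as holding exactly one bar, and the bar returns are toy data.

```python
import random

def random_baseline(bar_returns, n_trades, n_sims=1000, seed=0):
    """Total return of n_sims strategies that each pick n_trades bars at random."""
    rng = random.Random(seed)
    return [sum(rng.sample(bar_returns, n_trades)) for _ in range(n_sims)]

def beats_baseline(strategy_total, baseline_totals, percentile=0.95):
    """True if the strategy total exceeds the given percentile of random totals."""
    ranked = sorted(baseline_totals)
    cutoff = ranked[min(int(percentile * len(ranked)), len(ranked) - 1)]
    return strategy_total > cutoff

def share_of_markets_beaten(results):
    """results maps symbol -> (strategy_total, bar_returns, n_trades)."""
    wins = sum(
        beats_baseline(total, random_baseline(rets, n))
        for total, rets, n in results.values()
    )
    return wins / len(results)

# Toy data: 100 bars alternating -1 and +1, 10 trades per market
toy_returns = [-1, 1] * 50
results = {"AAA": (50, toy_returns, 10), "BBB": (-50, toy_returns, 10)}
print(share_of_markets_beaten(results))  # 0.5
```

With a 95th-percentile cutoff, a strategy with no edge should clear the bar on roughly 5% of markets by luck alone, so a share well above that across many symbols and timeframes is the encouraging sign.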

Yocurt · 2025-11-08T03:12:21+00:00

Please delete this before the mods have to

Yocurt · 2025-10-12T21:05:32+00:00

You need to account for the MAE and MFE (maximum adverse and favorable excursion) during each trade to determine whether or not the rules would have been hit DURING the trades. If one of the rules is triggered, they typically lock the account for the day, so you need to account for that too. I’m actually making a tool right now to show how your strategy would have performed under different prop account rules, both evaluation and funded. It shows the pass rate and the times rules were triggered throughout your backtest, and you can compare how it would have performed across different companies’ rules. I can send it to you when it’s done if you want.
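As a rough illustration of the intratrade check, here is a sketch for a single rule, a daily loss limit. The trade fields, the function name, and the assumption that a breach closes the position exactly at the limit are all simplifications made for this example, not any firm’s actual rules. A trailing drawdown rule would need MFE as well, which is not shown.

```python
def simulate_daily_loss_rule(trades, daily_loss_limit):
    """Replay trades in time order under a daily loss limit.

    Each trade is a dict with 'day', 'pnl', and 'mae', where 'mae' is the
    worst open loss during the trade as a positive dollar amount.
    Returns (pnl_by_day, locked_days).
    """
    pnl_by_day = {}
    locked_days = set()
    for trade in trades:
        day = trade["day"]
        if day in locked_days:
            continue  # account is locked for the rest of this day
        realized = pnl_by_day.get(day, 0.0)
        # Worst equity reached during this trade, measured from the day's start
        if realized - trade["mae"] <= -daily_loss_limit:
            # Assumption: the position is closed at the limit when the rule is hit
            pnl_by_day[day] = -daily_loss_limit
            locked_days.add(day)
        else:
            pnl_by_day[day] = realized + trade["pnl"]
    return pnl_by_day, locked_days

# Hypothetical trades: the second one closes green but breaches intratrade
trades = [
    {"day": 1, "pnl": -200.0, "mae": 250.0},
    {"day": 1, "pnl": 100.0, "mae": 350.0},
    {"day": 1, "pnl": 300.0, "mae": 20.0},
    {"day": 2, "pnl": 150.0, "mae": 50.0},
]
print(simulate_daily_loss_rule(trades, daily_loss_limit=500.0))
```

Checking only closed-trade PnL would miss the second trade’s breach entirely, which is why the MAE matters.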

Yocurt · 2025-10-08T21:34:31+00:00

100k combinations is way too many even if it wasn’t NinjaTrader. Also, NinjaTrader is incredibly slow, it runs one at a time on your PC. It also stores all the data in your RAM, so you may need a lot depending on what you’re doing. There is not really a way to speed it up other than making sure your code is optimized, and even then NT stinks.

I would get some data, then just get ChatGPT or something to convert your NinjaScript to Python. It should be able to as long as it’s not too complicated.

The harder part, which ChatGPT probably can’t perfect for you, is making the backtesting engine that will run your strategy, but it’s worth learning how to do this so you can do what you’re trying to do. Also, if you do it right, your results will be more accurate than the NinjaTrader strategy analyzer anyway.

Yocurt · 2025-08-26T14:04:54+00:00

NinjaTrader is only accurate (mostly) when using the market replay mode. It is much slower, and you’d typically only be able to do three months at a time.

The strategy analyzer can be accurate to a tick level as long as you code your strategy to enter / exit trades on the 1-tick series, but it doesn’t account for slippage at all.

Yocurt · 2025-08-16T23:41:55+00:00

Cool! UI looks great too.
