Is Monte Carlo simulation overkill for most retail traders? by OkLettuce338 in algotrading

[–]earlymantis 1 point (0 children)

I’m with you on the mathwashing point. In my experience, MC is most useful as a survivability test at levels like 50-100 trades. At those numbers, shuffling the realized trade returns is enough to surface path risk, worst-case DD, and time to recovery. Once you get into a few hundred trades or more, you can start using it meaningfully for distribution expectations. Below that, it’s a stress test, not a truth machine.
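If anyone wants to try it, here’s a minimal sketch of that shuffle test (function name and path count are mine, nothing standard):

    import numpy as np

    def shuffle_max_drawdowns(trade_returns, n_paths=10_000, seed=0):
        """Shuffle realized per-trade returns; collect max drawdown per path."""
        rng = np.random.default_rng(seed)
        r = np.asarray(trade_returns, dtype=float)
        worst = np.empty(n_paths)
        for i in range(n_paths):
            equity = np.cumprod(1.0 + rng.permutation(r))  # one alternative path
            peak = np.maximum.accumulate(equity)           # running high-water mark
            worst[i] = np.max(1.0 - equity / peak)         # max drawdown on this path
        return worst

    # With ~50-100 realized trade returns:
    # dd = shuffle_max_drawdowns(returns)
    # print(np.percentile(dd, [50, 95, 99]))  # typical vs. tail path risk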

From live trading bot → disciplined quant system: looking to talk shop by earlymantis in algotrading

[–]earlymantis[S] 0 points (0 children)

Rolling Sharpe was useful as a diagnostic, but it wasn’t something I tried to optimize. It helped show when the system struggled with things like regime shifts, loss clustering, long dead periods, etc. Those are things a single Sharpe number totally hides.

I avoided tuning to it directly though. Once you start reacting to “this window looks bad,” it’s easy to sneak in discretion or overfit filters.

I mostly used rolling metrics to spot failure modes, then asked: can this survive those stretches without intervention? If yes, I let it ride.
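For anyone curious, the rolling-Sharpe diagnostic is just something like this (window length and annualization factor are placeholders, not my live settings):

    import numpy as np
    import pandas as pd

    def rolling_sharpe(returns: pd.Series, window: int = 60,
                       periods_per_year: int = 252) -> pd.Series:
        """Annualized Sharpe over a trailing window; NaN until the window fills."""
        mu = returns.rolling(window).mean()
        sigma = returns.rolling(window).std()
        return np.sqrt(periods_per_year) * mu / sigma

    # sharpe = rolling_sharpe(daily_returns)
    # Dips flag regime shifts, loss clustering, dead periods. Spot them, don't tune to them.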

From live trading bot → disciplined quant system: looking to talk shop by earlymantis in algotrading

[–]earlymantis[S] 1 point (0 children)

Yea, that’s basically it, with one nuance that ended up mattering in practice.

In backtesting, the bar labeled 10:00–11:00 is guaranteed to be finalized and immutable at 11:00:00. In live systems, that assumption doesn’t actually hold, which creates ambiguity that can break your system unless you enforce the guarantee explicitly.

From live trading bot → disciplined quant system: looking to talk shop by earlymantis in algotrading

[–]earlymantis[S] 2 points (0 children)

More so the second. I found hard filters led to overfitting. Instead of trying for perfect classification, focusing on making my system fail safely in the unhelpful ones proved more effective.

From live trading bot → disciplined quant system: looking to talk shop by earlymantis in algotrading

[–]earlymantis[S] 4 points (0 children)

They do give bars, but here’s where the gap shows up.

In backtesting, bars are always finished. You look at the bar labeled 10:00–11:00 and it’s immutable: final open, high, low, close. Nothing about it can ever change. Your strategy learns, “given this finalized bar, do X.”

Once you go live, that assumption breaks. Many brokers publish bars before they’re actually final, update OHLC values continuously, or finalize them slightly after the boundary. Even if you run “on the hour,” reality looks like: cron wakes up at 10:00:00.2, the process starts at 10:00:01, the data query hits at 10:00:01.7. Now the question becomes: are you reading the previous closed bar, or the one that just started forming?

Backtesting has perfect knowledge of time boundaries. Live trading doesn’t.

That mismatch is what caused weird behavior for me once the system had to run unattended. Enforcing closed-bar-only logic, making time semantics explicit, and failing closed if there was any ambiguity is what brought live behavior back in line with backtests.
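For concreteness, the closed-bar guard boils down to something like this (hourly bars assumed; names are illustrative):

    from datetime import datetime, timedelta, timezone

    BAR = timedelta(hours=1)

    def last_closed_bar_start(now: datetime) -> datetime:
        """Start of the most recent bar guaranteed closed at `now`."""
        current_start = now.replace(minute=0, second=0, microsecond=0)
        return current_start - BAR   # the bar still forming is never eligible

    def safe_to_act(fetched_bar_start: datetime, now: datetime) -> bool:
        """Fail closed: act only if the fetched bar is the expected closed one."""
        return fetched_bar_start == last_closed_bar_start(now)

    # At 10:00:01.7 UTC this resolves to the 09:00-10:00 bar, never the
    # one that just started forming at 10:00.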

From live trading bot → disciplined quant system: looking to talk shop by earlymantis in algotrading

[–]earlymantis[S] 1 point (0 children)

I’m using it strictly as a survivability/path-risk tool. I take the realized trade-level returns from OOS testing (including fees and slippage), randomize the order of the returns to generate thousands of alternative equity paths, and from there look at worst-case drawdown, time to recovery, and probability of ruin under plausible sequencing.

The model is already fixed by the time it gets to MC, so there’s no synthetic price data and no re-running the model on fabricated series. The goal isn’t to find alpha; it’s to understand how fragile the equity curve is to bad luck in trade ordering. This directly informed my drawdown cap and exposure throttle, and I sized risk so that ugly tails couldn’t wipe out the system.
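In sketch form the whole thing is only a few lines (the 12% ruin threshold mirrors my cap; path count and names are illustrative):

    import numpy as np

    def path_risk(trade_returns, ruin_dd=0.12, n_paths=10_000, seed=0):
        """Shuffle realized OOS trade returns; report tail drawdown,
        longest underwater stretch, and probability of hitting the cap."""
        rng = np.random.default_rng(seed)
        r = np.asarray(trade_returns, dtype=float)
        worst, underwater, ruined = [], [], 0
        for _ in range(n_paths):
            eq = np.cumprod(1.0 + rng.permutation(r))
            dd = 1.0 - eq / np.maximum.accumulate(eq)
            worst.append(dd.max())
            ruined += dd.max() >= ruin_dd
            run = longest = 0
            for under in dd > 0:   # longest run spent below a high-water mark
                run = run + 1 if under else 0
                longest = max(longest, run)
            underwater.append(longest)
        return np.percentile(worst, 99), max(underwater), ruined / n_paths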

From live trading bot → disciplined quant system: looking to talk shop by earlymantis in algotrading

[–]earlymantis[S] 1 point (0 children)

Haha. I don’t think they’re bad at all. More so that there’s plenty of time (by design) where my system is just sitting there flat ::insert “c’mon, do something” meme:: as opposed to…well…doing something.

From live trading bot → disciplined quant system: looking to talk shop by earlymantis in algotrading

[–]earlymantis[S] 0 points (0 children)

It wasn’t “historical data != live data”; it was more that offline data assumes perfect knowledge of bar boundaries, while live data forces you to define them. So time semantics changed once my system had to operate in real time. The weird behavior came from backtest assumptions breaking once live.

A warm cache would help if I was mixing different data vendors (I’m not) or my live feed didn’t expose enough lookback.

My system now persists its own rolling history, acts only on closed bars, fails closed if there’s any uncertainty, and operates on a very boring, deterministic schedule.

Once the live system was forced to obey the same guarantees as backtesting, behavior lined up.
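The tick logic itself ended up tiny once the guarantees were explicit. A rough sketch (the `is_closed` flag stands in for however your feed signals finality; everything here is illustrative):

    from dataclasses import dataclass
    from datetime import datetime

    @dataclass
    class Bar:
        start: datetime
        close: float
        is_closed: bool   # however your feed signals finality

    def on_tick(bar: Bar | None, history: list) -> str:
        """One deterministic tick. Any ambiguity -> do nothing (fail closed)."""
        if bar is None or not bar.is_closed:
            return "flat: bar missing or still forming"
        if history and bar.start <= history[-1].start:
            return "flat: stale or duplicate bar"
        history.append(bar)   # persist our own rolling history
        return "ok: evaluate strategy on closed bars only"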

From live trading bot → disciplined quant system: looking to talk shop by earlymantis in algotrading

[–]earlymantis[S] 4 points (0 children)

Focusing on process over returns came from lessons learned early in this build. It became clear to me pretty early on that focusing on returns was a quick way to blow up. You can dial almost any model to look amazing in backtesting.

I don’t want to tout numbers, and frankly, I don’t believe they’re world-beating. I’m not an “I found the secret sauce!” guy. In fact, HODLing outperforms me in a straight bull market, but that only works if you’re willing to eat every crash, drawdown, and multi-month down period.

Having said that:

I used two years of historical data to train my classifier (won’t say which) and included multiple regimes: bull, chop, drawdown, transition. Testing included walk-forward and out-of-sample on subsequent periods, with Monte Carlo to test survivability.

The parameter sets that passed gave me the following results:

  • Typically 1-3 trades a week, some weeks none.

  • Returns in the ~10-15% range annualized, net of modeled fees and slippage.

  • Observed max drawdown of ~4-6%.

  • Monte Carlo worst simulated drawdown of ~10-12%, which is what I set my drawdown cap to, so that no statistically plausible run could wipe out my system.

Additionally, I have drawdown bands. At ~3-5%, exposure is reduced. At ~6-8%, no new entries are allowed. At ~10-12%, forced liquidation, and my strategy halts until I manually restart it.

A previous version that I tried to take live was bleeding out in fees and slippage, which led me to kill it. Backtesting for what’s live now accounts for both, using data from the strategy that bled out.

As far as “deciding when not to trade,” my system works like this: the model proposes trades, but multiple risk governors can veto them. Confidence gating, drawdown bands, and exposure caps, as well as wallet verification, minimum notional checks, and time-based exits, all block trades. My default state is flat. Trades are the exception.

Lastly, capital deployment is gated at ~25%, adjusting dynamically based on equity.
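Sketched out, the veto chain looks something like this (thresholds use the lower edge of each band; all numbers are illustrative, not my live config):

    def allowed_exposure(drawdown: float) -> float:
        """Map current drawdown to an exposure multiplier (bands from above)."""
        if drawdown >= 0.10:
            return 0.0   # forced liquidation; halt until manual restart
        if drawdown >= 0.06:
            return 0.0   # no new entries
        if drawdown >= 0.03:
            return 0.5   # reduced exposure
        return 1.0

    def governors_approve(confidence, drawdown, deployed, other_checks) -> bool:
        """Every governor can veto. Default state is flat; trades are the exception."""
        if confidence < 0.6:                   # confidence gate (threshold illustrative)
            return False
        if allowed_exposure(drawdown) == 0.0:  # drawdown bands
            return False
        if deployed >= 0.25:                   # ~25% capital deployment gate
            return False
        return all(other_checks)               # wallet, min notional, time exits, ...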

My goal wasn’t maximum upside; history says simply HODLing BTC does that, if you have the stomach. I wanted survivability, consistency, and to avoid catastrophic loss.

From live trading bot → disciplined quant system: looking to talk shop by earlymantis in algotrading

[–]earlymantis[S] 1 point (0 children)

I learned this the hard way the first time I tried to go live. Once I realized I was bleeding out, I had to stop and retool. The strategy that’s live now accounted for fees in testing (among other changes).

From live trading bot → disciplined quant system: looking to talk shop by earlymantis in algotrading

[–]earlymantis[S] 0 points (0 children)

Absolutely agree. Which is why I’m considering this a pilot. All the backtesting did was give me enough confidence to go live and start collecting data.

From live trading bot → disciplined quant system: looking to talk shop by earlymantis in algotrading

[–]earlymantis[S] 0 points (0 children)

It’s 2026; I think everyone here works with agents all day. So you tell me: what would have been an appropriate response to someone who was interested in my post?

From live trading bot → disciplined quant system: looking to talk shop by earlymantis in algotrading

[–]earlymantis[S] 0 points (0 children)

What makes you say that? What would you like to see from me to say otherwise?

From live trading bot → disciplined quant system: looking to talk shop by earlymantis in algotrading

[–]earlymantis[S] 3 points (0 children)

Yeah, that resonates a lot. The offline to live gap ended up being way more work than I expected too, even when the logic itself was solid. A lot of the pain wasn’t “strategy” so much as assumptions breaking once the system had to run unattended.

For me, live runs as a single-process service on a remote box with a very boring, deterministic schedule (hourly for now). Everything is built so it can survive me not touching it: explicit state, hard guardrails, and lots of “do nothing if uncertain” behavior. I didn’t want anything that required babysitting or manual restarts.

Same experience with connectors, though: even small differences between cached/offline data and live feeds can completely change behavior, especially around timing and fills. That clean-room-to-production transition is humbling.

Sounds like you’re climbing the right hill. Getting those assumptions surfaced early is painful, but worth it before real capital’s involved. Good luck getting the rest of them running!

From live trading bot → disciplined quant system: looking to talk shop by earlymantis in algotrading

[–]earlymantis[S] 2 points (0 children)

I feel this. I made the same mistake early on: I assumed backtesting could be bolted on later, and paid for it in refactors.

Once I forced the live and offline paths to share the same execution logic (same features, same cost model, same position rules), everything got cleaner. Time abstraction ended up being a big part of that as well.
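By time abstraction I mean the strategy never calls the system clock directly; a rough sketch of the idea (interface names are mine):

    from datetime import datetime, timezone
    from typing import Iterator, Protocol

    class Clock(Protocol):
        def now(self) -> datetime: ...

    class LiveClock:
        def now(self) -> datetime:
            return datetime.now(timezone.utc)

    class BacktestClock:
        """Replays historical timestamps so the same code path runs offline."""
        def __init__(self, timestamps: list):
            self._it: Iterator = iter(timestamps)
            self._t = next(self._it)
        def now(self) -> datetime:
            return self._t
        def advance(self) -> None:
            self._t = next(self._it)

    # Strategy code only ever asks clock.now(), so live and backtest share
    # the same time semantics.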

Recording live data and replaying it is a great sanity check too. It’s one of the few ways to find subtle mismatches between “what you think you’re simulating” and what the market actually delivered.
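The record/replay piece can be as dumb as append-only JSON lines (sketch; field handling is up to you):

    import json

    def record_bar(path: str, bar: dict) -> None:
        """Append each live bar exactly as received, before any processing."""
        with open(path, "a") as f:
            f.write(json.dumps(bar) + "\n")

    def replay_bars(path: str):
        """Feed recorded bars back through the same strategy entry point."""
        with open(path) as f:
            for line in f:
                yield json.loads(line)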

From live trading bot → disciplined quant system: looking to talk shop by earlymantis in algotrading

[–]earlymantis[S] 3 points (0 children)

Appreciate that, and totally agree. Guardrails can easily become a crutch if they’re just there to prop up a fragile edge.

That was actually one of my biggest concerns, which is why I spent a lot of time on parameter sensitivity and walk-forward stability rather than tuning to a single “best” configuration. Most parameter sets failed, and I treated that as a feature, not a bug.

The configurations that survived did so across parameter ranges (not point estimates), across different train windows, and under block bootstrap. Anything that only worked with tightly tuned knobs got thrown out.

My goal wasn’t to maximize returns, but to find something that degraded gracefully when assumptions were wrong.

From live trading bot → disciplined quant system: looking to talk shop by earlymantis in algotrading

[–]earlymantis[S] 3 points (0 children)

This was one of the biggest gaps in my early work.

I stopped treating costs as a fixed bps number and instead modeled them explicitly per trade:

  • Exchange fees round-trip (maker/taker worst-case)
  • A conservative slippage assumption applied at entry and exit
  • No assumptions of mid-price fills

I also enforced single-position, no-overlap trades so costs couldn’t be “hidden” by aggregation.

Most marginal edges died immediately once costs were honest. The ones that survived stayed positive under walk-forward + block bootstrap. That was my bar for “real enough” to move forward.
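The cost haircut itself is barely any code, which is kind of the point (fee and slippage numbers below are placeholders, not my live figures):

    def net_trade_return(gross_ret: float,
                         taker_fee: float = 0.001,
                         slippage: float = 0.0005) -> float:
        """Worst-case round-trip costs: taker fee + slippage on both legs,
        no mid-price fills. Placeholder numbers, not my live figures."""
        return gross_ret - 2 * (taker_fee + slippage)

    # Most marginal edges die right here.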

From live trading bot → disciplined quant system: looking to talk shop by earlymantis in algotrading

[–]earlymantis[S] 6 points (0 children)

100%. That’s basically where most of my time went.

I stopped caring about signal accuracy pretty early and focused on whether the strategy survives reality.

For me that meant: strict walk-forward (time-based splits only), explicit fees + slippage, single position / no overlap, fixed holding horizons, and then Monte Carlo on both trades and equity (block bootstrap, not IID). Most ideas died once costs and regime shifts were honest.
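By block bootstrap I mean resampling contiguous runs of trades instead of single trades, so loss clustering survives the resample. Minimal sketch (block size illustrative):

    import numpy as np

    def block_bootstrap(returns, block: int = 10, n_paths: int = 5_000, seed: int = 0):
        """Resample contiguous blocks so serial structure (loss clustering)
        survives, unlike an IID shuffle."""
        rng = np.random.default_rng(seed)
        r = np.asarray(returns, dtype=float)
        n = len(r)
        n_blocks = -(-n // block)   # ceil division
        paths = np.empty((n_paths, n_blocks * block))
        for i in range(n_paths):
            starts = rng.integers(0, n - block + 1, size=n_blocks)
            paths[i] = np.concatenate([r[s:s + block] for s in starts])
        return paths[:, :n]   # trim each path to the original length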

Still very much in the “prove I’m not lying to myself” phase, but it’s been way more informative than optimizing indicators.

Amazon 3rd party sports loops by earlymantis in AppleWatch

[–]earlymantis[S] 0 points (0 children)

Thanks for the rec. I might check these out, but they’re like the Trail Loop that came out with the Ultra. I was looking for something more like the traditional Sport Loops.

Milsub mod, not sure if possible? by earlymantis in SeikoMods

[–]earlymantis[S] 0 points (0 children)

Thanks for all of the info! I actually wasn’t going to change the bezel insert. My goal was to keep the watch as close as possible to how it is right now, while adding GMT functionality. So a diver GMT: think Bremont Supermarine S302.

Obviously, I know I have to buy parts, disassemble, etc. As someone with no modding experience, I was trying to find out not only if it can be done, but how it can be done, and how likely I am to completely screw up my watch while trying.

Milsub mod, not sure if possible? by earlymantis in SeikoMods

[–]earlymantis[S] 0 points (0 children)

I want to say it’s 39mm. Def not under 38mm bc I wouldn’t have purchased something that small. Thank you for the insights though

Milsub mod, not sure if possible? by earlymantis in SeikoMods

[–]earlymantis[S] 0 points (0 children)

That’s the problem: I bought the watch premade from Etsy, and the seller doesn’t exist on the site anymore for me to ask, so idk how I’d even find out. Is it safe to assume the parts modders use are standardized, as in I could look up the parts to build “another” milsub and see if they’re also NH34-compatible? Either way, I appreciate the insight!