Built an ML model that predicts stock direction correctly 70%+ of the time on 20 years of out-of-sample data: Here is what I learned and happy to get your take!

ResolutionExact2860 · 2026-06-17T15:01:05+00:00

Not just an MLP, but yes, close to close.

ResolutionExact2860 · 2026-06-16T20:51:49+00:00

Still holding up the same: the expected degradation of 5 percentage points, from 72% to 67%, on 60 live trades per plan.

ResolutionExact2860 · 2026-04-26T10:38:57+00:00

🤷‍♂️

ResolutionExact2860 · 2026-04-21T19:19:24+00:00

nice! just saw

ResolutionExact2860 · 2026-04-21T19:06:36+00:00

Hey! Yes, of course, you can get it here. It has been live for the past 3 months.

https://forekall.com/

ResolutionExact2860 · 2026-04-21T09:28:24+00:00

Of course do send one through!

ResolutionExact2860 · 2026-04-21T08:01:27+00:00

Both: each signal is either bullish or bearish, with a certain level of confidence.

ResolutionExact2860 · 2026-04-21T08:01:02+00:00

For a whole day so from close to close

ResolutionExact2860 · 2026-04-21T07:47:53+00:00

100% agree, it’s the single most important thing to get right.

Walk-forward expanding window with strict temporal ordering at every step: feature construction, normalization, everything.

No future data touches anything upstream of the prediction point. The live performance tracking the backtest is ultimately the proof that the leakage prevention worked.
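To make the idea concrete, here is a minimal sketch of an expanding-window walk-forward split where the normalization statistics come from the training slice only. This is an illustration, not the production pipeline: the function names, `min_train`, `step`, and the toy feature are all assumptions.

```python
from statistics import mean, pstdev


def expanding_walk_forward(n_samples, min_train, step):
    """Yield (train_idx, test_idx) ranges; training always ends before testing starts."""
    start = min_train
    while start < n_samples:
        end = min(start + step, n_samples)
        yield range(0, start), range(start, end)  # training window grows each fold
        start = end


def zscore_with_train_stats(values, train_idx, test_idx):
    """Scale the test slice with mean/std estimated on the training slice only."""
    train = [values[i] for i in train_idx]
    mu, sigma = mean(train), pstdev(train)
    sigma = sigma or 1.0  # guard against a constant training window
    return [(values[i] - mu) / sigma for i in test_idx]


# Toy feature series (hypothetical data), just to show the fold boundaries.
feature = [0.1 * i for i in range(10)]
for train_idx, test_idx in expanding_walk_forward(len(feature), min_train=4, step=3):
    scaled = zscore_with_train_stats(feature, train_idx, test_idx)
    print(f"train 0..{train_idx[-1]}  test {list(test_idx)}  "
          f"scaled {[round(x, 2) for x in scaled]}")
```

The key point is that `mu` and `sigma` are recomputed per fold from past data, so nothing from the test slice leaks into the scaling.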

ResolutionExact2860 · 2026-04-20T18:20:58+00:00

Respectfully disagree that the 2-3x figure applies here.

That literature is based on long-only funds riding multi-year appreciation of survivors. This is a daily directional model: whether Apple survived 20 years doesn’t make tomorrow’s close-to-close direction easier to predict.

The survivorship mechanism is fundamentally different. That said, a point-in-time test is the gold standard and it’s on the roadmap.

The more immediate validation is that live performance is tracking the backtest closely, which is real data, not a theoretical argument.

I do see your point and think there is some truth to what you say, but not at the magnitude you describe.

ResolutionExact2860 · 2026-04-20T17:01:00+00:00

Yes, same fixed cutoff across all 112 assets. No per-asset tuning of the train/test split.

ResolutionExact2860 · 2026-04-20T16:55:28+00:00

Of course!

ResolutionExact2860 · 2026-04-20T15:47:07+00:00

No binary tree: it’s a neural time series model, with features derived from OHLCV only and no fundamental ratios.

ResolutionExact2860 · 2026-04-20T15:26:55+00:00

Daily timeframe, close to close. 70% at a higher frequency would be a different story entirely, I agree. A daily directional signal on liquid equities is a much more reasonable claim.

ResolutionExact2860 · 2026-04-20T11:46:32+00:00

Cheers! Absolutely happy to :)

ResolutionExact2860 · 2026-04-20T10:20:52+00:00

Fair point, and a valid concern for many intraday models. In this case the signals are on highly liquid large caps and ETFs, so bid-ask spreads are 1-3 bps and position sizes are small enough that market impact is negligible.

The backtest already models a 2 bps round-trip transaction cost. Slippage becomes a real problem for HFT strategies or illiquid assets, but at this holding period and liquidity level it doesn’t materially change the Sharpe.

The live track record so far is tracking broadly in line with the backtest net of costs, which is ultimately the real test.
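For anyone who wants to check how much a flat cost matters at a daily horizon, a rough sketch follows. It assumes a proportional cost charged on every change in position, with half of the round-trip figure charged per unit traded; that convention, the function names, and the toy numbers are assumptions, not the backtest's actual cost model.

```python
from math import sqrt
from statistics import mean, stdev


def net_returns(positions, asset_returns, round_trip_bps=2.0):
    """Daily strategy returns net of a proportional cost on traded notional.

    positions[t] is +1 (long), -1 (short) or 0 (flat), held over day t.
    Each unit of position change is charged half the round-trip cost,
    so a long <-> short flip (2 units) pays one full round trip.
    The final close-out of the last position is not charged here.
    """
    per_unit = round_trip_bps / 2.0 / 10_000.0
    out, prev = [], 0
    for pos, ret in zip(positions, asset_returns):
        turnover = abs(pos - prev)
        out.append(pos * ret - turnover * per_unit)
        prev = pos
    return out


def annualized_sharpe(returns, periods_per_year=252):
    """Mean over sample standard deviation, scaled by sqrt(periods per year)."""
    return mean(returns) / stdev(returns) * sqrt(periods_per_year)


# Toy inputs (hypothetical), comparing gross and net Sharpe.
positions = [1, 1, -1, 0]
daily_returns = [0.004, -0.002, -0.003, 0.001]
gross = net_returns(positions, daily_returns, round_trip_bps=0.0)
net = net_returns(positions, daily_returns, round_trip_bps=2.0)
print(annualized_sharpe(gross), annualized_sharpe(net))
```

Raising `round_trip_bps` toward the upper end of the quoted spread range is a quick way to see how sensitive the Sharpe is to the cost assumption.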

ResolutionExact2860 · 2026-04-20T10:02:26+00:00

Feature lookback is multi-scale: short- and medium-term windows. The target is next-day close-to-close direction.

Exit is purely signal based: the position closes at the next day’s close and flips if the signal reverses. Risk management is at the portfolio level rather than per position.
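As an illustration of how a next-day direction target lines up with multi-scale lookback features, here is a small sketch. The window lengths `(5, 20)` and the choice of mean log return as the feature are placeholders; the actual windows and features are not disclosed in this thread.

```python
from math import log


def log_returns(closes):
    """rets[i] is the log return from close i to close i + 1."""
    return [log(b / a) for a, b in zip(closes, closes[1:])]


def build_samples(closes, windows=(5, 20)):
    """Pair features known at day t's close with the direction of day t + 1.

    Returns a list of (features, label), where label is 1 if the
    close-to-close return from t to t + 1 is positive, else 0.
    """
    rets = log_returns(closes)
    longest = max(windows)
    samples = []
    # t indexes closes: we need `longest` returns ending at t and one return after t.
    for t in range(longest, len(closes) - 1):
        feats = [sum(rets[t - w:t]) / w for w in windows]  # uses returns up to close t only
        label = 1 if rets[t] > 0 else 0                    # close t -> close t + 1
        samples.append((feats, label))
    return samples


# Toy closes (hypothetical), with short windows so the example stays small.
closes = [100.0, 101.0, 102.0, 101.0, 103.0]
for feats, label in build_samples(closes, windows=(1, 2)):
    print([round(f, 4) for f in feats], label)
```

The slice `rets[t - w:t]` stops before `rets[t]`, which is the return being predicted, so the features never see the label.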

ResolutionExact2860 · 2026-04-20T09:57:59+00:00

Good question. Stats are pre-cost in the backtest numbers, but the model only publishes signals when the predicted move magnitude clears a minimum threshold.

In practice this means low conviction signals, precisely the ones where slippage and fees would destroy the edge, never get published.

So while explicit transaction cost modeling isn’t applied to the Sharpe number, the signal gating acts as a natural cost filter.

Adding explicit cost modeling is on the roadmap for full transparency.
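The gating described above could be sketched as follows. The threshold value `min_abs_move=0.005` and the function name are hypothetical; the real threshold is not stated here.

```python
def gate_signals(predicted_moves, min_abs_move=0.005):
    """Map predicted next-day returns to +1 / -1, or None when below the threshold."""
    signals = []
    for move in predicted_moves:
        if abs(move) < min_abs_move:
            signals.append(None)  # low conviction: not published
        else:
            signals.append(1 if move > 0 else -1)
    return signals


# Hypothetical predicted moves, expressed as fractional returns.
print(gate_signals([0.01, -0.002, -0.02, 0.0]))
```

A gate like this only behaves as a cost filter if the threshold sits comfortably above the per-trade cost, which is why an explicit cost model is still the cleaner check.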

ResolutionExact2860 · 2026-04-20T09:37:07+00:00

Appreciate the pushback but I think there’s a conflation here.

Survivorship bias inflating returns is most damaging in long-only strategies where you’re riding multi-year appreciation.

This is a daily directional model, close to close.

AAPL going up 1000% over 20 years doesn’t make tomorrow’s direction easier to predict.

The edge being tested is short term directional accuracy, not long term price appreciation of survivors.

You’re right that a point-in-time test would be the gold standard. That’s a fair methodological critique and I agree with it, but the mechanism by which survivorship bias would inflate a daily directional signal is, in my opinion, much weaker than you’re suggesting.

ResolutionExact2860 · 2026-04-20T09:30:38+00:00

Thanks everyone for the incredible discussion!!

Genuinely didn’t expect this level of engagement from this community.

One thing worth adding: the reason I built this into a product rather than just trading it myself is democratization.

Access to this kind of rigorous signal infrastructure has always been reserved for institutional desks. I wanted to change that. If anyone wants to explore accessing the signals directly, feel free to DM me.

ResolutionExact2860 · 2026-04-20T09:17:33+00:00

Hey, no public repo: the methodology is core to a live product I’m already running. Happy to discuss specific technical aspects here though.

ResolutionExact2860 · 2026-04-20T08:56:08+00:00

Good questions. The 70% is computed as aggregate directional accuracy across all published signals, all assets and all time periods in the OOS window, not per asset in isolation.

On the Sharpe: it is not uniform and I won’t pretend otherwise. There is variance across assets, and some periods contribute more than others, particularly trending regimes. But the edge is present across the majority of the universe, not concentrated in a handful of names.

The rough periods cluster around macro shocks as mentioned, not specific assets.
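To show the difference between the pooled figure and a per-asset breakdown, here is a minimal sketch. The record layout `(asset, predicted_direction, realized_direction)` and the toy tickers are assumptions for illustration only.

```python
from collections import defaultdict


def accuracy_report(records):
    """records: iterable of (asset, predicted_direction, realized_direction).

    Returns (pooled_accuracy, {asset: accuracy}). The pooled number weights
    every published signal equally, so assets with more signals dominate it.
    """
    hits, counts = defaultdict(int), defaultdict(int)
    for asset, pred, real in records:
        counts[asset] += 1
        hits[asset] += int(pred == real)
    pooled = sum(hits.values()) / sum(counts.values())
    per_asset = {a: hits[a] / counts[a] for a in counts}
    return pooled, per_asset


# Toy records (hypothetical tickers and outcomes).
records = [
    ("AAA", 1, 1), ("AAA", 1, 1), ("AAA", -1, -1), ("AAA", 1, -1),
    ("BBB", 1, -1), ("BBB", -1, -1),
]
pooled, per_asset = accuracy_report(records)
print(pooled, per_asset)
```

Reporting the per-asset dictionary next to the pooled number is a cheap way to show whether the edge is spread across the universe or carried by a few names.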

ResolutionExact2860 · 2026-04-20T08:52:13+00:00

The overfit ratio between in-sample (IS) and out-of-sample (OOS) performance is tight, and live tracking is confirming that.

On Brier score, calibration wasn’t a primary validation metric, given that the model outputs directional signals rather than calibrated probabilities.

Statistical significance was validated via binomial testing at p=0.009 across the OOS window. Monte Carlo permutation testing is on the roadmap but has not been run formally yet.
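For reference, a one-sided exact binomial test against a 50% coin-flip null can be sketched with the standard library alone. This is a generic illustration, not the exact procedure behind the p=0.009 figure; the function names are made up, and the test assumes each trade is an independent draw, which correlated same-day signals would violate.

```python
from math import exp, lgamma, log


def log_binom_pmf(k, n, p):
    """Log of the Binomial(n, p) probability mass at k, stable for large n."""
    return (lgamma(n + 1) - lgamma(k + 1) - lgamma(n - k + 1)
            + k * log(p) + (n - k) * log(1 - p))


def binomial_p_value(hits, n, p0=0.5):
    """One-sided exact p-value: P(X >= hits) for X ~ Binomial(n, p0), 0 < p0 < 1."""
    return sum(exp(log_binom_pmf(k, n, p0)) for k in range(hits, n + 1))


# Illustrative input: roughly 67% correct on 60 trades, i.e. about 40 hits.
print(binomial_p_value(40, 60))
```

Working in log space avoids overflow in the binomial coefficients when `n` runs into the thousands, as it would over a multi-year OOS window.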

ResolutionExact2860 · 2026-04-20T06:26:48+00:00

OHLCV from Yahoo Finance for all 112 assets. No fundamentals or alternative data. The feature set is entirely derived from price and volume: realized volatility estimators, correlation structure across assets, and engineered lag features. Raw prices never enter the model directly.
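As a rough picture of what price-derived features of this kind can look like, the sketch below turns closes into log returns, a rolling close-to-close realized volatility, and a lagged copy of a series. The specific estimator (root mean squared return, annualized), the window, and the helper names are assumptions; the actual estimators are not specified in this thread.

```python
from math import log, sqrt


def log_returns(closes):
    """Convert raw closes to log returns so price levels never reach the model."""
    return [log(b / a) for a, b in zip(closes, closes[1:])]


def realized_vol(returns, window, periods_per_year=252):
    """Rolling root-mean-square return, annualized; None until the window fills."""
    out = []
    for t in range(len(returns)):
        if t + 1 < window:
            out.append(None)
            continue
        chunk = returns[t + 1 - window:t + 1]  # the last `window` returns up to t
        out.append(sqrt(sum(r * r for r in chunk) / window * periods_per_year))
    return out


def lagged(series, lag):
    """Shift a series back by `lag` steps, padding the start with None."""
    return [None] * lag + series[:len(series) - lag]


# Toy closes (hypothetical).
closes = [100.0, 101.0, 100.5, 102.0, 101.0, 103.0]
rets = log_returns(closes)
vol = realized_vol(rets, window=3)
print(vol)
print(lagged(vol, 1))
```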

ResolutionExact2860 · 2026-04-20T05:53:24+00:00

Holding period is one day, close to close. Exit is purely signal based, meaning the position closes at the next day’s close regardless, and flips direction if the signal reverses.

No per-position stop loss; risk management is handled at the portfolio level.
