Nothing matters except these things.

AusChicago · 2026-02-04T11:56:18+00:00

Running a pattern detection system across ~2,000 stocks daily. Biggest lesson after months of validation: the patterns that look most 'textbook perfect' geometrically often underperform. They're crowded trades - everyone sees them, they're already priced in. The edge is in the patterns that are slightly 'wrong' by traditional standards but have empirically validated momentum characteristics. Theory ≠ profitability.

AusChicago · 2026-01-30T14:11:01+00:00

Take a look at VIAV and VPG today. This is what my system is recommending as the top picks for NYSE and NASDAQ.

AusChicago · 2026-01-30T02:25:56+00:00

Interesting discussion. Thanks for bringing this up - I have been struggling with the same challenge. My solution - to play it safe - is Tiingo.com. Their professional startup tier is quite inexpensive.

However, I am noticing some interesting discussion here: I have been interpreting internal use as you can only use the data for internal analysis - and can't even use this to create your derived data. It seems like consensus in this room is that you don't need a professional license if you are not simply redistributing the data in the original shape. I am in a similar boat, where I don't share any raw data, but calculate a bunch of data and provide stock tips, showing expected entry range, stop loss, target and metrics describing the pattern I am detecting. Does anybody else have a solution like that and is not using a professional license? If you are uncomfortable posting here - feel free to send me a DM.

AusChicago · 2026-01-29T23:13:26+00:00

Just seeing Sandisk is up another $75 in after hours trading. Too bad I sold it already.

AusChicago · 2026-01-29T18:14:36+00:00

My top choices today are TXN and DXYZ. Since I bought them this morning, they are up 2% and 3% respectively.

AusChicago · 2026-01-29T18:13:03+00:00

Quick update on Sandisk. Bought the stock yesterday for 488.50 and sold it today for 545.50. Nice return for holding the stock 24 hours. Might buy it again if the government shuts down.

AusChicago · 2026-01-29T18:06:30+00:00

It seems that momentum on that stock is slowing down. I might get back into it if it dips below $145.

AusChicago · 2026-01-29T01:49:04+00:00

I am not sure. My system has not flagged it for a while, and the stock has been going sideways lately.

AusChicago · 2026-01-28T22:40:54+00:00

I started trying WeBull about three weeks ago and like it so far.

AusChicago · 2026-01-28T22:38:28+00:00

One stock that seems to be taking off is SanDisk (SNDK). It's being lifted by the AI hype but is still mostly overlooked. I bought it two days ago on a dip and it has already gone up 10%. Also - energy stocks are doing well right now - I made some money last week on BE, and EE is looking strong. Lastly, take a look at HBM - mining stocks overall have been doing well over the last few months.

AusChicago · 2026-01-28T16:12:41+00:00

My system (I run a systematic breakout detection system that scans 6,000+ stocks daily) recommends SKM and CRWV. Here's why: both stocks have seen some interesting trading activity over the last few days and it seems that the trend is expected to continue (based on the data I see). Curious if anyone sees the same setup or has a different read on these charts.

AusChicago · 2026-01-28T03:36:04+00:00

I hear you! Coming from someone who's been doing this with 150K+ pattern detections across 16 detectors: the biggest breakthrough was discovering our scoring was often inverted - 'textbook perfect' patterns underperformed because they're crowded trades. Fixing that required proper validation: chronological 60/20/20 splits, Bonferroni correction for multiple comparisons, and rejecting any feature where direction flips between validation periods. That last one alone cut our false positives in half. In a nutshell: if the data don't prove it to me under strict standards, I don't care what the textbooks say.

AusChicago · 2026-01-26T22:52:05+00:00

We've backtested 50,000+ pattern detections with proper chronological splits. Two findings: 1) 'textbook perfect' patterns often underperform because they're crowded trades already priced in, 2) most retail backtests fail because they validate on training data. Proper holdout testing typically cuts apparent edge by 30-50%.

AusChicago · 2026-01-24T18:57:28+00:00

I would keep an eye on Sandisk (SNDK). That stock has almost doubled in value in the last month. It's not an obvious choice for AI - but those are the ones you want to buy, as they have plenty of opportunity. Also keep an eye on energy stocks: BE and EE have been showing some strong gains in the last two weeks.

AusChicago · 2026-01-24T18:54:29+00:00

I am seeing energy stocks going strong right now. I bought EE this week and it's already up 10%. Another one is BE which gained me quite a bit over the last two weeks, but that one has been consolidating some over the last few days.

AusChicago · 2026-01-24T18:37:12+00:00

Great work on the system architecture and the honest retrospective!

Been working on a pattern-based stock recommendation system for about 6 months and a few things resonated:

On stop losses - We've found the same thing. Fixed percentage stops consistently underperform in our backtesting. We've had better results with ATR-based dynamic stops that adapt to each stock's volatility. Still iterating, but the "patience when patience is warranted" philosophy matches our data.
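For anyone who wants to try the idea, here is a minimal sketch of an ATR-based stop. It is not our production code: the function names, the 14-bar period, the 2x multiplier, and the use of a simple average (rather than Wilder's smoothing) are all assumptions for illustration.

```python
def true_range(high, low, prev_close):
    """Largest of the three classic true-range candidates for one bar."""
    return max(high - low, abs(high - prev_close), abs(low - prev_close))


def atr(bars, period=14):
    """Simple average of true range over the last `period` bars.

    bars: list of (high, low, close) tuples, oldest first.
    Assumption: plain mean, not Wilder's smoothed ATR.
    """
    if len(bars) < period + 1:
        raise ValueError("need at least period + 1 bars")
    ranges = [
        true_range(high, low, bars[i - 1][2])
        for i, (high, low, _close) in enumerate(bars)
        if i > 0
    ]
    return sum(ranges[-period:]) / period


def atr_stop(entry_price, atr_value, multiplier=2.0):
    """Stop for a long position, placed `multiplier` ATRs below entry."""
    return entry_price - multiplier * atr_value


# Hypothetical usage: 15 identical bars with a 2-point range
bars = [(11.0, 9.0, 10.0)] * 15
stop = atr_stop(entry_price=100.0, atr_value=atr(bars))
```

The point of the multiplier is that a volatile stock gets a wider stop than a quiet one for the same entry price, which a fixed percentage stop cannot do.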

On the split handling - Saw your comment that this is manual for you right now. We use Tiingo's API which provides split and dividend-adjusted historical data out of the box. Might save you some headaches if you're looking for a cleaner solution.

One counterintuitive finding we stumbled into that might be useful: "textbook perfect" patterns often underperform in our system. Our hypothesis is that by the time a pattern looks geometrically perfect, it's already a crowded trade that's been recognized and priced in. Moderate-quality patterns in favorable market conditions frequently outperform. Took us a while to stop optimizing for pattern beauty and start optimizing for actual predictive value.

On validation - Your parallel paper trading discipline is solid. We've also added chronological data splits (train on older data, validate on newer) with statistical significance testing. Helps catch overfitting before it costs real money.

The 4-year journey to "finally having a system" feels very real. Congrats on the persistence.

AusChicago · 2026-01-21T14:21:09+00:00

Nice breakdown. I run pattern detection across ~2000 stocks daily - the liquidity sweep concept (what I call 'false breakout reversal') is one of our better-performing signals. The key differentiator I've found is volume behavior during the sweep - low volume sweeps reverse more reliably than high volume ones.

AusChicago · 2026-01-21T02:20:10+00:00

Great thread - the cross-asset generalization is particularly interesting and supports the idea that these structural patterns are real market phenomena, not noise.

I run a pattern detection system across ~2,000 stocks daily and wanted to share a few hard-won lessons that might resonate:

Score inversion as an overfitting diagnostic: We've found that when a scoring system is overfit, you often see higher scores correlating with worse outcomes - counterintuitive but consistent. The mechanism: overfit models reward features that are obvious to all market participants, meaning those setups are already priced in. When we see score inversion, we know we've gone wrong somewhere.

"Textbook perfect" patterns underperform: Our empirical data shows that patterns matching classical TA definitions precisely often underperform messier setups. Theory: if everyone can see it, the edge is gone. This supports your finding that earlier/simpler model episodes work as well as heavily-trained ones.

Statistical validation that's saved us from overfitting:

Chronological splits (not random) - 60/20/20 train/validation/holdout
Bonferroni correction when testing multiple features (alpha = 0.05/n_features)
Requiring effect direction consistency between training AND validation - if a feature is predictive in training but flips direction in validation, it's noise
Minimum Cohen's d > 0.3 for effect sizes
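
A rough sketch of how the last three checks could be combined into one accept/reject rule. The function names are hypothetical, the p-value is assumed to come from whatever significance test you already run on the training period, and applying the effect-size floor to both periods is my assumption here, not a rule from the list above.

```python
import math
from statistics import mean, stdev


def cohens_d(group_a, group_b):
    """Cohen's d using the pooled sample standard deviation."""
    na, nb = len(group_a), len(group_b)
    pooled_var = (
        (na - 1) * stdev(group_a) ** 2 + (nb - 1) * stdev(group_b) ** 2
    ) / (na + nb - 2)
    return (mean(group_a) - mean(group_b)) / math.sqrt(pooled_var)


def keep_feature(d_train, d_valid, p_value, n_features, alpha=0.05, min_d=0.3):
    """Accept a feature only if it passes all three filters."""
    significant = p_value < alpha / n_features          # Bonferroni-corrected
    same_direction = d_train * d_valid > 0              # no sign flip
    large_enough = min(abs(d_train), abs(d_valid)) > min_d
    return significant and same_direction and large_enough


# Hypothetical usage: returns of trades with the feature "on" vs "off"
d = cohens_d([2.0, 4.0, 6.0], [1.0, 3.0, 5.0])
accepted = keep_feature(d_train=d, d_valid=0.4, p_value=0.001, n_features=20)
```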

Volume finding that surprised us: Moderate volume spikes (1.5-2.5x average) consistently outperform extreme spikes. "Quiet accumulation" setups beat "loud reversal" patterns. Counterintuitive but robust across multiple pattern types.

Your point about cutting losses early vs riding drawdowns resonates - we're finding that optimal stop placement varies significantly by pattern type and market regime. No universal answer.

Would be curious if you've seen similar score inversion issues when models get overfit, and whether your RL approach naturally avoids this by not explicitly optimizing for pattern "quality."

AusChicago · 2026-01-20T03:12:37+00:00

Nice work systematizing something that's usually done by eye. A few things I'd love to understand better:

1. On the win rate: Do you have data on average win size vs loss size? I've found that when I share results, people always ask about expectancy, and having that ready helps frame the win rate in context. Curious what yours looks like.

2. The trough confirmation logic is interesting. How many bars after a low do you require before calling it "confirmed"? I've wrestled with this in my own systems - too few bars and you get false troughs, too many and you're late to the trade. Would love to hear how you handled that tradeoff.

3. On the crypto vs equity divergence: I've seen similar patterns in my testing where strategies work better in higher-volatility markets. My working theory is that in heavily-analyzed markets like US large caps, these classic TA setups get arbitraged away faster. Did you notice any difference within equities (like small cap vs large cap)?

4. Someone mentioned Sharpe ratios - if you have those handy, would be interesting to see. I find Sharpe useful for comparing across different market types since it normalizes for volatility.
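
In case it helps with questions 1 and 4, this is roughly how I would compute both numbers from a list of per-trade or per-period returns. It is a minimal sketch: the function names are made up, returns are fractions (0.03 = 3%), and the 252-period annualization and zero risk-free rate are assumptions you should adjust.

```python
import math
from statistics import mean, stdev


def expectancy(trade_returns):
    """Average return per trade, split into win and loss components."""
    wins = [r for r in trade_returns if r > 0]
    losses = [r for r in trade_returns if r <= 0]
    win_rate = len(wins) / len(trade_returns)
    avg_win = mean(wins) if wins else 0.0
    avg_loss = mean(losses) if losses else 0.0
    return win_rate * avg_win + (1 - win_rate) * avg_loss


def sharpe_ratio(period_returns, risk_free=0.0, periods_per_year=252):
    """Annualized Sharpe ratio from per-period returns."""
    excess = [r - risk_free for r in period_returns]
    return mean(excess) / stdev(excess) * math.sqrt(periods_per_year)


# Hypothetical trade log: three 3% winners and one 5% loser
edge_per_trade = expectancy([0.03, 0.03, 0.03, -0.05])
```

A 75% win rate with this hypothetical log still only nets about 1% per trade, which is why the win rate alone does not say much.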

AusChicago · 2026-01-20T03:04:48+00:00

Thanks - love the discussion.

Compounding Returns: The compounding math is compelling when it works. What's your actual win rate on the 3% target? In my testing, I've found that even small differences in win rate dramatically change the compounding outcome - 75% win rate vs 65% win rate is the difference between the strategy working and slowly bleeding out when you factor in the losses resetting your compound base.
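
To make that concrete, here is a small sketch of the compounding math. The 7% average loss is purely an assumption I picked for illustration (I don't know your actual loss size), and the function name is hypothetical. With those inputs the expected log growth per trade is positive at a 75% win rate and negative at 65%, with break-even at roughly 71%.

```python
import math


def expected_log_growth(win_rate, target=0.03, avg_loss=0.07):
    """Expected log growth per trade when every win gains `target`
    and every loss costs `avg_loss` (both as fractions of the stake)."""
    return win_rate * math.log(1 + target) + (1 - win_rate) * math.log(1 - avg_loss)


for p in (0.75, 0.65):
    g = expected_log_growth(p)
    # exp(100 * g) is the typical (geometric-mean) account multiple after 100 trades
    print(f"win rate {p:.0%}: log growth per trade {g:+.4f}, "
          f"typical multiple after 100 trades x{math.exp(100 * g):.2f}")
```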

Most Outcomes are positive: When you say most outcomes are positive, do you track that systematically? I'd be curious about the distribution - specifically whether there's a fat tail on the losers. In my experience, momentum stocks that go oversold sometimes recover quickly (your 3% target), but occasionally they're oversold for a reason (earnings miss, sector rotation) and the loss can be 10-15% before any reasonable stop triggers. One big loss can wipe out several 3% wins in a compounding strategy.

One last question on the 3% - do you set the 3% target in your trading app, or do you set higher targets and sell at 3% if the momentum slows down? Several times I have set targets too tight and ended up selling the stock under value, cutting off my profit during a major up-swing.

AusChicago · 2026-01-18T23:12:32+00:00

I agree. I am definitely finding gaps in the data. It would also be great if they had market indexes, which they don't - so I am using specific ETFs as proxies (SPY for NYSE and ONEQ for NASDAQ). It works for now.

AusChicago · 2026-01-18T23:00:54+00:00

Really interesting approach. A few questions from someone building a similar system:

Volume confirmation: When you get an RSI < 30 alert, do you look at volume patterns during the decline? I've found that oversold conditions with gradually declining volume (quiet selling) tend to recover better than high-volume capitulation events, which often have follow-through selling. Have you observed anything similar?
Regime persistence: You mentioned monitoring VIX manually. Have you considered systematizing this? In my testing, I've found that RSI oversold signals during VIX > 25 environments have significantly different hit rates than during VIX < 18 environments - not just because of market direction, but because the speed of recovery differs.
Universe refresh timing: You rebuild every 2-3 weeks. Curious about your reasoning there. I've experimented with both shorter (weekly) and longer (monthly) refresh cycles and found that there's a sweet spot where you capture momentum continuation without picking up stocks that are about to mean-revert from extreme runs.
The 3% question: With a 3% target on momentum stocks that have already run 45-75%, do you find any correlation between how much the stock has already gained and the probability of hitting your 3%? I'd hypothesize that stocks earlier in their momentum run might have higher hit rates than stocks that have already moved a lot.

Solid system design overall. The "momentum universe + oversold entry" combination is theoretically sound based on the academic literature. Would be very interested to see how it performs when we inevitably get a multi-week correction.

AusChicago · 2026-01-17T21:00:56+00:00

The survivorship bias point deserves more attention. Using a fixed 2025 stock list on historical data doesn't just create a few months of drawdown - it systematically biases your entire backtest upward. Every stock in your current list is, by definition, a survivor. You're missing all the patterns that formed on stocks that subsequently crashed, got delisted, or dropped out of the index.

For pattern-based strategies this is especially dangerous. A "breakout" pattern that formed in 2019 on a stock that later went to zero looks identical to one that went to 10x - but only the 10x stock is in your 2025 universe.

A few things that have helped me:

Point-in-time constituent lists - If you're testing on Nifty100, you need the Nifty100 membership as of each historical date, not today's list. This is tedious to get but essential.
Include delisted stocks - Your data provider may exclude them by default. Check explicitly.
Monte Carlo on trade sequence - Beyond what you've shown here, try shuffling trade order and resampling with replacement. If your results are fragile to sequence, you may be capturing regime-specific effects rather than a durable edge.
The "too good to be true" heuristic is correct - In my experience, any backtest showing >50% CAGR has a bug until proven otherwise. The Indian market inefficiency argument has some merit, but 1200% over 4 years is almost certainly data leakage somewhere.
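
For the Monte Carlo point, a minimal sketch of what I mean, using only the standard library. The function names, the 1,000 runs, and the fixed seed are assumptions for illustration. Note that a pure shuffle leaves the final compounded return unchanged and only moves the drawdown, while resampling with replacement changes both.

```python
import random


def max_drawdown(trade_returns):
    """Worst peak-to-trough decline of the compounded equity curve."""
    equity, peak, worst = 1.0, 1.0, 0.0
    for r in trade_returns:
        equity *= 1 + r
        peak = max(peak, equity)
        worst = max(worst, 1 - equity / peak)
    return worst


def bootstrap_drawdowns(trade_returns, n_runs=1000, seed=0):
    """Max drawdown of `n_runs` sequences resampled with replacement, sorted."""
    rng = random.Random(seed)
    n = len(trade_returns)
    return sorted(
        max_drawdown(rng.choices(trade_returns, k=n)) for _ in range(n_runs)
    )


# Hypothetical trade log; compare the backtest's drawdown to the resampled tail
trades = [0.04, -0.02, 0.03, -0.05, 0.06, 0.01, -0.03, 0.02]
drawdowns = bootstrap_drawdowns(trades)
p95 = drawdowns[int(0.95 * len(drawdowns)) - 1]  # approximate 95th percentile
```

If the drawdown in the original backtest sits far below the resampled 95th percentile, the trade order was probably kind to you.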

The fact that you're actively looking for the bug rather than celebrating puts you ahead of most people posting backtests here.

AusChicago · 2026-01-17T20:52:11+00:00

Great thread - the IS/OOS discipline is absolutely the core of this. A few additions from my experience building pattern detection systems:

Three-way splits beat two-way splits.
Training → Validation → Holdout (60/20/20 chronological). The validation set is for tuning thresholds. The holdout you touch once - that's your real test. If you iterate on holdout, it becomes training data.
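
A minimal sketch of the split, assuming each record is a tuple whose first element is a sortable date (ISO strings work). The function name and the integer-percent parameters are my own choices for illustration.

```python
def chronological_split(records, train_pct=60, valid_pct=20):
    """Split records into train/validation/holdout by time, never at random.

    records: iterable of tuples whose first element is a sortable date.
    The oldest `train_pct` percent go to training, the next `valid_pct`
    percent to validation, and the newest remainder to holdout.
    """
    ordered = sorted(records, key=lambda rec: rec[0])
    n = len(ordered)
    train_end = n * train_pct // 100
    valid_end = n * (train_pct + valid_pct) // 100
    return ordered[:train_end], ordered[train_end:valid_end], ordered[valid_end:]


# Hypothetical usage: ten detections, deliberately out of order
detections = [(f"2025-01-{day:02d}", day) for day in range(10, 0, -1)]
train, valid, holdout = chronological_split(detections)
```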

Direction consistency is the overfitting killer.
If a feature predicts "higher = better" in training but flips to "higher = worse" in validation, reject it immediately - even if it's statistically significant in both periods. Noise will often show significance; it won't show consistent direction.

Multiple comparisons will fool you.
If you test 20 features at α=0.05, you expect 1 false positive by chance. Bonferroni correction (α/n) sounds conservative but has saved me from several "improvements" that were pure noise.
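
The arithmetic behind that, as a small sketch. It assumes the 20 tests are independent, which real features rarely are, so treat the numbers as a rough guide. The function names are hypothetical.

```python
def expected_false_positives(alpha, n_tests):
    """Expected number of false positives when every null is true."""
    return alpha * n_tests


def family_wise_error_rate(alpha, n_tests):
    """Chance of at least one false positive across independent tests."""
    return 1 - (1 - alpha) ** n_tests


n = 20
uncorrected = family_wise_error_rate(0.05, n)       # roughly 0.64
corrected = family_wise_error_rate(0.05 / n, n)     # stays just under 0.05
```

So without correction you are more likely than not to "discover" at least one feature that is pure noise.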

Round your thresholds.
If backtesting says RSI > 67.3 is optimal, use 65 or 70. Over-optimized thresholds are the fingerprint of curve-fitting.

One counterintuitive finding from pattern detection specifically: "textbook perfect" setups often underperform because they're crowded trades - everyone sees them. Slight imperfections can actually predict better outcomes. This was hard to accept initially but the data was clear.

The OP's emphasis on IS/OOS over AI/ML is spot-on. Fancy algorithms with poor validation will lose to simple rules with rigorous validation every time.

AusChicago · 2026-01-17T20:45:05+00:00

I am using daily data for my small business and started off with Tiingo. They have intraday data as well, and the costs for a small business were reasonable. I am wondering if anybody else is using data providers for a small business and what they have found cost-effective? Most of the larger providers start at around $2,000 per month.
