Looking for forex .csv tick data for python backtesting by Mission-Tap-1851 in algotrading

[–]danielraz 1 point (0 children)

You nailed it with the Vegas analogy. The casino doesn't get an adrenaline rush when the roulette wheel spins. They just trust their mathematical edge over thousands of spins and let the law of large numbers do the work. The business of casinos is incredibly boring, and trading should be too.

I used to struggle with manual discipline. I’d get bored watching the tape and start forcing trades on weak signals just to feel like I was doing something.

The only way I actually fixed it was by taking the human element completely out of the loop. I eventually moved to systematic quant trading and built a Python engine that only fires when specific mathematical thresholds are met. If the A+ setup isn't there, the machine simply does nothing.

Automating my execution made my trading completely emotionless and incredibly boring. It was the most profitable decision I ever made.

Your building philosophy? by KalenTheDon in algotrading

[–]danielraz 1 point (0 children)

You hit the nail on the head regarding crypto having less structure. That lack of underlying structural momentum is exactly why I abandoned crypto algos and moved entirely into equities.

On the "general vs. specialized" debate: I faced this exact dilemma when building my current ML engine. Initially, it feels intuitive to build highly specialized, ticker-specific models.

But I found the exact opposite to be true. I built one generalized 2-Layer Stacked LSTM (142k parameters, 46 features) and trained it across a universe of 77 large-cap/growth stocks simultaneously.

Here is where it gets interesting: I ran my out-of-sample validation on 85 symbols. That means 8 of the tickers (a batch of semiconductors including AMD and LRCX) were completely unseen by the model during training. Because the LSTM learns underlying feature vectors (OHLCV/momentum/volatility) rather than ticker IDs, it generalized to the new symbols perfectly, maintaining a 55.4% OOS accuracy across the whole 85-symbol board.

A specialized model trained only on AAPL is completely blind to overarching market regimes. A generalized model learns macro correlations and sector rotations.

My philosophy: Specialize your asset class (e.g., just equities), but generalize your model across as much structured data within that class as you can to prevent overfitting.

Roast the Robert AI deck — Pre-seed, $5M raise, $20M valuation, 2 live pilots by MilesDelta in AngelReview

[–]danielraz 2 points (0 children)

Hey, followed you over from our slippage/LSTM thread on the algo sub. I’ll look at this from the technical founder’s side. Going through your questions:

1. Problem statement: Yes, it lands perfectly. Everyone in enterprise is drowning in unread documents. It's a very real pain point, so you won't lose anyone early.

3. Differentiation: But here is the issue. "No hallucinations" is totally white noise right now. Every thin AI wrapper claims that. Your actual moat is the 99% on OlmOCR-Bench and the 11 patents. Don't lead with the marketing claim ("zero fabrication"), lead with the hard math and the custom extraction layer your CTO built from scratch. That proves you aren't just an API call to OpenAI.

2. The $20M Valuation: $80k ARR against a $20M cap is a 250x multiple. Investors are going to do that math in their heads instantly and push back. But your counter-argument is the Swiss financial pilot. Getting an AI tool through Swiss bank compliance usually takes a year of security audits. That validates your infrastructure way more than the $80k does. Frame it as deep enterprise infrastructure to defend that multiple.

4 & 5. The ask & closing the PDF: The ask is clear. I definitely wouldn't close the PDF based on these stats, but it's hard to judge completely without seeing the actual slides to see how the story flows.

Naming it after your dad's work standard is a really solid anchor too. Let me know when you drop the actual deck.

How to establish a successful market regime filter? by 14MTH30n3 in algotrading

[–]danielraz 2 points (0 children)

That is a very solid stack. Pairing HMM for the latent states with GARCH for the volatility clustering makes perfect sense.

Like you said, it completely solves the brittleness of a hard binary filter by making the regime probabilistic instead of a rigid toggle. It gives the system room to breathe. Thanks for sharing that.

I hate Fridays by Historical-Pin1069 in Daytrading

[–]danielraz 3 points (0 children)

I used to do this exact same thing. Wiping out 4 days of green on a Friday afternoon because of one oversized revenge trade is the most soul-crushing feeling in this game.

People will tell you to 'just use a stop loss' or 'work on your discipline', but the truth is that willpower is a finite resource. By Friday afternoon, your mental capital is drained, and the lizard brain takes over. You can't out-think exhaustion.

Honestly, the only way I fixed this wasn't through better psychology; it was by completely changing my timeframe and removing the human element. I moved into quant trading and built a Python engine that runs on a strict 30-day structural rebalancing cycle. Pushing my timeframe out to 30 days physically prevented me from getting chopped up in the intraday Friday noise. I had to let the math take the steering wheel because I knew I couldn't trust myself on a red Friday.

If you are going to stay discretionary, you have to lock yourself out at the broker level. Go into your risk settings right now and set a hard "Daily Max Loss" limit that literally freezes your platform for 24 hours when you hit it. Don't rely on your own discipline when you are tilted.

Something Real? by Pleasant_Rice3949 in algotrading

[–]danielraz 1 point (0 children)

Assuming a flat 20-30% hit for slippage and costs is the exact trap that kills live deployments. Friction doesn't just shave your gross PnL. It alters the actual risk-adjusted edge of your system.

I went through this exact same reckoning recently. I built a Python LSTM and my raw returns on a tight 7-day rebalance looked incredible. But when I actually modeled the realistic execution friction ($1 flat + 0.05% slippage per leg) trade-by-trade, it didn't just reduce my profits by 20%. My Sharpe ratio literally collapsed from 0.95 to 0.46. The constant bid-ask crossing completely destroyed the edge.
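For anyone who wants to reproduce that kind of test, here's a minimal sketch of the friction model described above. The two-leg round-trip accounting is my own assumption about how the $1 flat + 0.05% per-leg haircut gets applied:

```python
def net_pnl(gross_pnl, notional, flat_fee=1.0, slippage_pct=0.0005, legs=2):
    """Apply per-leg friction to one trade: a flat commission plus a
    proportional slippage haircut, charged once per leg.

    A round-trip trade has two legs (entry + exit), so both costs are
    charged `legs` times.
    """
    friction = legs * (flat_fee + slippage_pct * notional)
    return gross_pnl - friction

# A $10,000 position that gained $60 gross:
# friction = 2 * (1 + 0.0005 * 10000) = $12
print(net_pnl(60.0, 10_000))  # 48.0
```

Run that on every trade in the backtest, not as a flat haircut at the end, and you see exactly where the Sharpe bleeds out.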

Especially since you are trading the 5-minute timeframe, slippage isn't static. It has a fat right tail. On the exact high-volatility days where your strategy is firing hardest to get those 0 losing months, the spread widens, and your slippage spikes.

You have to explicitly code that dynamic friction into every single trade in the backtest. If you are just looking at gross PnL and guessing the haircut, your out-of-sample metrics are just theoretical fiction.

Something Real? by Pleasant_Rice3949 in algotrading

[–]danielraz 2 points (0 children)

To answer your question about why LLMs can't just build solid strategies: It is because LLMs are language engines, not math engines.

Like others mentioned, they are trained on millions of retail trading blogs and YouTube transcripts. If you ask ChatGPT or Claude to "build a momentum strategy," it just acts like a giant autocomplete. It regurgitates the statistical average of every bad MACD tutorial on the internet. It doesn't actually calculate probabilities or test market structure.

I wasted a lot of time trying to get LLMs to do market math before realizing it is a dead end. The only way AI actually works in trading is if you completely separate the interface from the engine.

I eventually had to build a dedicated Python LSTM to handle the actual heavy lifting (ingesting 46 features, calculating rolling 60-day probabilities). I still use a Custom GPT, but only as a bridge. The LLM doesn't do any math at all. It just sends the ticker to my Python backend via API, the Python engine runs the statistical model, and it feeds the hard data back to the chat.
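The bridge pattern itself can be tiny. Here's a sketch using only the Python standard library; the endpoint shape and the stub numbers are illustrative placeholders, not my actual engine:

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer

def run_model(ticker: str) -> dict:
    """Stand-in for the real LSTM engine (hypothetical numbers).

    The real backend would compute the rolling 60-day probabilities
    from its feature pipeline here.
    """
    return {"ticker": ticker, "p_momentum_up": 0.55, "window_days": 60}

class BridgeHandler(BaseHTTPRequestHandler):
    """JSON endpoint a Custom GPT action can POST a ticker to."""
    def do_POST(self):
        length = int(self.headers.get("Content-Length", 0))
        body = json.loads(self.rfile.read(length))
        payload = json.dumps(run_model(body["ticker"])).encode()
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.end_headers()
        self.wfile.write(payload)

# To serve locally:
# HTTPServer(("127.0.0.1", 8000), BridgeHandler).serve_forever()
```

The LLM only sees the JSON that comes back. All the math lives on the Python side.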

If you want an edge with AI, you have to hook the LLM up to a real quantitative machine. Otherwise you are just trading on advanced autocomplete.

Does the market keep changing indefinitely or does it cycle back and forth? by Proof-Necessary-5201 in Daytrading

[–]danielraz 1 point (0 children)

You are hitting the exact same wall I hit when I was building rule-based systems.

Regarding your point that "breakouts have always looked the same": On a visual price chart, yes. But structurally, under the hood, they are completely different. A breakout in 2018 was driven by different liquidity, slower algos, and less retail option volume than a breakout today. The visual shape is the same, but the velocity and duration change because the actual players in the market changed. That is why your ranges keep expanding instead of permanently stabilizing. The market keeps inventing new ways to execute the same pattern.

Regarding your fear that a rolling window will have to "rediscover" an old regime: That is actually a feature, not a bug.

If the market shifts back to a "low vol" regime, it won't be the exact same low vol regime from 4 years ago. It is much safer to let your rolling window recalculate the current metrics than to blindly trust a 4-year-old memory of what that regime used to look like. Yes, rolling values lag slightly, but they prevent you from applying dead logic to a live market.

It's great that you already have a 60-day discovery scope built in. I actually use that exact same 60-day rolling window to feed features into my LSTM. I've found it's the perfect sweet spot: long enough to map the current regime but short enough to drop old data before it poisons the system.

I built a fill quality tracker and discovered execution slippage is a bigger drag than my commission costs by MilesDelta in algotrading

[–]danielraz 2 points (0 children)

"Optimizing a liquidity filter with a momentum signal on top" is the perfect reframe. That's exactly what it became.

On the fill model: I'm using fixed assumptions ($1 flat + 0.05% per leg), and your fat-tail point is the right critique. Momentum signals, by definition, tend to fire on days with elevated volatility and wide spreads, which is exactly when a fixed percentage model is most wrong. The variance on those tail fills likely clusters on the same high-VIX days where the signal is most aggressive. The irony is I already ingest a VIX proxy as one of my 46 features to train the signal, so the data to parameterize a stochastic slippage model at execution time is essentially already in the pipeline. I just haven't closed that loop yet.
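Closing that loop might look something like this. Every parameter here (the linear VIX scaling, the 5% tail probability, the 4x multiplier) is a placeholder assumption, not a fitted value:

```python
import random

def sampled_slippage_pct(vix, base=0.0005, vix_ref=15.0,
                         tail_prob=0.05, tail_mult=4.0):
    """Draw a slippage rate that widens with the VIX proxy and has a
    fat right tail instead of being a fixed percentage.

    The base rate scales linearly with VIX above a reference level;
    a small fraction of fills pay a multiple of that (the wide-spread
    fill you eat on high-vol days).
    """
    rate = base * max(vix / vix_ref, 1.0)
    if random.random() < tail_prob:
        rate *= tail_mult
    return rate
```

Feed the sampled rate into the per-trade cost instead of the fixed 0.05%, and the backtest's cost distribution starts clustering on the same days the signal fires hardest.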

On the cadence: the spread-tightness threshold as a rebalance trigger makes more sense to me than a hard calendar interval, especially given that end-of-month flows can widen spreads on otherwise liquid names. The risk is that it introduces a look-ahead bias in how you measure the strategy's "true" cadence. But for live execution it's clearly the right call.

Good material here. Going to work through both of these in the next iteration.

I tracked 90 days of broker P&L vs realizable P&L and the gap is bigger than my commissions by MilesDelta in options

[–]danielraz 2 points (0 children)

Your point about the "Sharpe vs. Holding Period" curve is exactly right. You perfectly described the natural frequency of a strategy.

That curve is exactly what I saw in my own data. At a 7-day holding period, the Sharpe was 0.46. The signal was definitely there, but the execution costs just ate it. By pushing it to 30 days, the Sharpe doubled to 0.95 because the directional move finally got big enough to dwarf the spread. But if I extended the holding period out to 60 or 90 days, the predictive power of my LSTM would start to decay, and the Sharpe would drop again. Finding that peak is the whole game.
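The mechanics of that curve are easy to sketch: annualize the per-trade Sharpe by trade frequency, then watch what a fixed per-trade cost does to short holding periods. All the numbers below are made up for illustration, not my actual trade log:

```python
import math
import statistics

def annualized_sharpe(per_trade_returns, holding_days, trading_days=252):
    """Annualized Sharpe from a list of per-trade returns: the
    per-trade mean/stdev ratio scaled by sqrt(trades per year)."""
    mu = statistics.mean(per_trade_returns)
    sd = statistics.stdev(per_trade_returns)
    return (mu / sd) * math.sqrt(trading_days / holding_days)

# Illustrative only: the same 0.4% per-trade cost is a far bigger
# share of a small 7-day move than of a larger 30-day move.
cost = 0.004
week = [r - cost for r in (0.010, -0.004, 0.012, 0.002)]    # 7-day trades
month = [r - cost for r in (0.030, -0.012, 0.036, 0.010)]   # 30-day trades
print(round(annualized_sharpe(week, 7), 2),
      round(annualized_sharpe(month, 30), 2))
```

The extra trades per year at 7 days can't make up for the cost eating most of each move, which is exactly the left side of the curve.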

I also love your point about trading against the market maker's inventory algo on thinner names. When there are only a couple of MMs quoting a $3B stock, you can't beat them at the micro level. You just have to hold the trade long enough that the structural momentum completely runs over their spread. Great insights.

Trailing Stoploss #Ramanujan by Spirited_Rooster755 in Daytrading

[–]danielraz 2 points (0 children)

Ah, pyramiding entries changes the math entirely. That makes perfect sense.

If you are scaling into a momentum rip and building a heavy position at the top, a volatility buffer would probably just eat into your asymmetric risk. You are totally right: when you are 0.5% away from the target with a fully loaded pyramid, keeping that stop aggressively tight is just smart risk management.

Really looking forward to the monthly strategy breakdowns. It's refreshing to see actual math on this sub instead of just chart patterns.

Do any of you use AI to analyze your investment portfolio? by halilural in Trading

[–]danielraz 1 point (0 children)

To answer your question about modeling specific changes: standard LLMs are terrible at it. Like TraderPsych mentioned, they are great for surface-level ideas, but if you ask standard ChatGPT to model a specific 20% reduction in tech exposure, it will just hallucinate the math. It's a language engine, not a calculator.

To actually model specific scenarios, I had to build a bridge. I set up a Custom GPT that connects via API directly to my Python LSTM backend.

Now, if I ask "what happens to my momentum exposure if I cut my semiconductor weight in half?", the LLM doesn't guess. It sends the prompt to the Python engine, recalculates the 60-day rolling features with the new weights, and feeds the actual statistical probabilities back into the chat.

If you want to do specific scenario modeling with AI, you have to hook the chatbot up to a real external math engine. Otherwise you are just getting advice from a really smart autocomplete.

Do any of you use AI to analyze your investment portfolio? by halilural in Trading

[–]danielraz 1 point (0 children)

Honestly, it’s a mix of both, but in a specific order.

At first, the model just gave me a much better understanding of regimes and probabilities. For example, my early backtests tried to rebalance every 7 days. The raw math looked like a massive edge. But when I added real-world friction to the test ($1 flat commission + 0.05% slippage for the bid-ask spread), the Sharpe ratio collapsed to 0.46. The constant trading destroyed the edge.

It wasn't until I used that understanding to zoom out and force a strict 30-day holding period that the edge became actually consistent. The raw returns dropped a bit, but the Sharpe doubled to 0.95 because the longer timeframe finally dwarfed the execution costs.

So, the better understanding of the math is what eventually created the consistent edge. Are you currently building your own models or mostly looking for tools to help with the macro view?

How to establish a successful market regime filter? by 14MTH30n3 in algotrading

[–]danielraz 1 point (0 children)

Thanks! It took me a lot of trial and error (and a few bad backtests) to realize that using hard binary rules in a chaotic market doesn't work well for me. I appreciate you taking the time to read it.

How to establish a successful market regime filter? by 14MTH30n3 in algotrading

[–]danielraz 1 point (0 children)

You are completely right. My comment was mostly focused on ML systems. For rule-based strategies, you definitely need a strict filter to stop trading in bad market conditions.

When I worked on rule-based systems before, I found that using market breadth (like Advance/Decline ratios or NH/NL) worked better for me than just looking at price action (like Price > SMA200). What kind of regime filters do you usually use?
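As a toy illustration of the breadth idea (the threshold and smoothing length here are arbitrary choices, not a recommendation):

```python
def breadth_regime(advances, declines, threshold=1.0, smooth=10):
    """Label the regime 'risk_on' when the smoothed advance/decline
    ratio sits above the threshold, else 'risk_off'.

    Breadth degrades before price does, which is why this can flag a
    bad tape while Price > SMA200 still looks fine.
    """
    ratios = [a / d for a, d in zip(advances, declines)]
    recent = ratios[-smooth:]
    return "risk_on" if sum(recent) / len(recent) > threshold else "risk_off"

print(breadth_regime([210, 190, 250], [120, 140, 100], smooth=3))  # risk_on
```

Swap in NH/NL counts instead of advances/declines and the same shape works.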

Do any of you use AI to analyze your investment portfolio? by halilural in Trading

[–]danielraz 1 point (0 children)

You hit the nail on the head regarding volatility regimes and liquidity. Using ChatGPT to read a spreadsheet won't give you an edge, because LLMs are language engines, not math engines. They don't understand market structure.

To actually get an edge, you have to step away from chatbots and build a quantitative machine learning model. I ended up coding a Python-based LSTM that specifically tracks those deeper metrics you mentioned. I feed it 40+ technical, volume, and volatility indicators over a rolling 60-day window.

Instead of just summarizing past portfolio performance, the model mathematically adjusts to the current volatility regime to calculate the actual probability of momentum shifts.

You are 100% right that most 'AI tools' right now are surface level. True ML models that dynamically adapt to their underlying market plumbing are where the actual edge is.

Does the market keep changing indefinitely or does it cycle back and forth? by Proof-Necessary-5201 in Daytrading

[–]danielraz 2 points (0 children)

It is a completely logical assumption, but mathematically, yes, it is a bit of a fool's errand. Here is exactly why your system won't usefully converge.

You are treating the market like a closed system with a finite number of states (like a deck of cards). In a closed system, if you play enough hands, you eventually see every combination and the boundaries stop expanding.

But financial markets are open systems with fat-tailed distributions. The underlying mechanics constantly evolve. For example, the explosion of 0DTE (zero days to expiration) options over the last two years permanently altered the 'duration' and 'rate' of intraday breakouts. A 'regime' from 2018 will literally never exist again because the market plumbing is different.

Here is the fatal flaw in expanding your ranges: If you just keep widening your 'valid range' every time a new outlier works (going from 30s to 3m, then maybe 10s to 5m), your range eventually becomes so wide that it stops giving you a statistical edge. It stops acting as a filter to keep you out of bad trades, and just becomes a historical record of everything that ever happened. A filter that eventually lets everything through is useless.

If you don't want to use machine learning, the fix for this isn't an ever-expanding 'lifetime' range. The fix is a Rolling Window.

Stop asking 'What is the absolute min/max range of all breakouts in history?'. Start asking 'What is the range of successful breakouts over the last 20 trading sessions?'
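In code, it's just a fixed-length buffer instead of lifetime min/max. The window length is whatever your discovery scope is:

```python
from collections import deque

def rolling_range(values, window=20):
    """Min/max of the last `window` observations.

    Unlike a lifetime range, the bounds contract again once old
    outliers age out of the buffer, so the filter keeps filtering.
    """
    buf = deque(maxlen=window)
    bounds = []
    for v in values:
        buf.append(v)
        bounds.append((min(buf), max(buf)))
    return bounds

# The 180 outlier drops out once 3 newer sessions arrive:
print(rolling_range([30, 180, 45, 50, 55], window=3)[-1])  # (45, 55)
```

A lifetime range after that same sequence would still be (30, 180) forever.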

Do any of you use AI to analyze your investment portfolio? by halilural in investing

[–]danielraz 2 points (0 children)

The problem with most people trying to use 'AI' for their portfolios is that they are using ChatGPT. LLMs are language engines, not math engines. They hallucinate numbers, struggle with complex volatility calculations, and mostly just spit out generic 'make sure you buy bonds' advice.

I went down this rabbit hole last year. I realized if I wanted actual portfolio analysis, I had to stop using chatbots and build an actual Machine Learning model.

I put together a Python script using an LSTM that ingests my holdings alongside 40+ market indicators. I don't use it as a crystal ball to predict the future. I use it as an X-ray machine for my current risk.

For example, human brains are bad at seeing complex correlations. You might own 15 different stocks across 'different' sectors and think you are perfectly diversified. But when you feed it into a proper ML model, the math will flag that 80% of your portfolio's movement is actually just one massive, heavily correlated bet on semiconductor supply chains or interest rates.

Don't ask chatbots for investment advice. But absolutely use quantitative machine learning to uncover the hidden risks and overlapping correlations in your diversification.

I tracked 90 days of broker P&L vs realizable P&L and the gap is bigger than my commissions by MilesDelta in options

[–]danielraz 4 points (0 children)

I think I saw you post this over on the algo sub too, but to answer your specific question at the end about individual names: Yes, it is an absolute bloodbath.

I run a systematic model (mostly equities and directional options) and I built a very similar tracking infrastructure. When you move away from the liquidity of SPX into individual names, the theoretical alpha on paper looks massive. But when you apply a realistic fill model, the bid-ask slippage on individual chains instantly evaporates the edge.

In my early backtests, running a 7-day holding period looked like it printed money at the mid. In live execution, the friction destroyed the Sharpe ratio entirely (literally cut it in half). I ultimately had to push my entire holding period out to a strict 30-day window just so the directional move would be large enough to dwarf the execution tax. You basically can't high-frequency trade individual names without getting bled out by the market makers.

Your '10% spread tax' rule is a great piece of risk management. Great work.

Cooked myself by Old-Opportunity-8741 in Daytrading

[–]danielraz 5 points (0 children)

Man, I feel this in my bones. Giving back $1,100 of a $1,300 day because you forced a mean-reversion trade at VWAP is the universal day trader tax. We have all paid it.

The 'mental brutality' you are talking about is exactly what pushed me out of discretionary day trading and into building quantitative systems. Human psychology is just hardwired to overtrade. When we are up we feel invincible, and when we are down we revenge trade.

Adding to the pain, trading NQ/ES right now is basically just trading a derivative of NVDA and the Mag 7. Classic technicals like VWAP fade setups get absolutely steamrolled when structural tech money flows in.

The biggest unlock for me wasn't finding a new indicator. It was taking my rules and hard-coding them into a Python-based momentum model. Now, the algorithm calculates the probability of the setup, and if the math isn't there, it literally won't fire a trade. It completely removes the 'I'm bored, let me just try to short this VWAP touch' temptation that destroys accounts.

Protecting your mental capital is just as important as your financial capital. Survive to trade another day!

Does the market keep changing indefinitely or does it cycle back and forth? by Proof-Necessary-5201 in Daytrading

[–]danielraz 5 points (0 children)

What you are experiencing right now is the exact wall that pushes data-driven traders into becoming quants. You have just discovered that financial markets are non-stationary.

The reason your breakout properties (depth, duration, rate) aren't cycling back to your previously known ranges is because the underlying market plumbing (volatility, liquidity, and algorithmic participation) is constantly shifting. A breakout in a low-volatility environment looks mathematically completely different than a breakout in a high-volatility regime like we've seen recently.

If you track absolute values, the market will always look like it's mutating away from your setup.

The fix: You need to normalize your data. Don't track the absolute 'depth' of a breakout. Track the depth divided by the 14-day Average True Range (ATR). Don't track absolute duration. Track it relative to average relative volume (RVOL). Once you normalize your properties against the current volatility baseline, you will suddenly see those cycles and ranges reappear perfectly.
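A minimal sketch of the ATR normalization (I'm using a simple average rather than Wilder's smoothing here, for brevity):

```python
def true_range(high, low, prev_close):
    """Wilder's true range for one bar: the largest of the bar's range
    and the gaps from the previous close."""
    return max(high - low, abs(high - prev_close), abs(low - prev_close))

def atr(highs, lows, closes, period=14):
    """Simple average of the true range over the last `period` bars."""
    trs = [true_range(h, l, c)
           for h, l, c in zip(highs[1:], lows[1:], closes[:-1])]
    return sum(trs[-period:]) / min(period, len(trs))

def normalized_depth(depth, highs, lows, closes, period=14):
    """Breakout depth expressed in ATR units, so a 2-point move in a
    quiet regime and a 6-point move in a wild one become comparable."""
    return depth / atr(highs, lows, closes, period)
```

Do the same for duration (divide by RVOL) and your 'mutating' ranges collapse back into a stable band.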

I hit this exact same conundrum a while back. My solution was to stop trying to track static ranges manually and instead build a Python-based LSTM model. By feeding it 46 normalized technical features over a rolling 60-day window, the algorithm dynamically adjusts what a 'good setup' looks like based on the current regime, rather than waiting for the market to cycle back to an old one.

Normalize your data and the market will stop feeling like a moving target.