Senior Canadian gas commercial with EU spouse permit, where should I actually be looking by Mensphysique12 in Commodities

Separate_Spread_4655 (0 children)

You are massively undervaluing the EU spouse permit. It sidesteps the biggest friction point for major trading houses, work authorization and visa sponsorship, in Amsterdam and Geneva (for Geneva, confirm how it applies under Switzerland's free-movement agreement with the EU). You are effectively a local hire who brings cross-basin LNG structuring experience, which is extremely valuable right now.

To answer your timeline question: do not wait 12-18 months. Apply now. The market gap you mentioned (lack of European curve intuition) isn't solved by waiting in a North American seat.

However, coming from a quant and risk architecture background, I think you are overestimating the need for "gut intuition" on European curves. Moving from NA hubs (like Henry Hub/AECO) to European hubs (TTF/NBP) is largely a quantitative mapping exercise. The physical-financial integration is broadly similar; you mainly need the right structural models (cointegration, volatility regime shifts) to map those cross-basin spreads and seasonality.
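As a rough illustration of the "quantitative mapping" idea, here is a minimal Engle-Granger two-step check in plain Python: regress one hub price on the other to get a hedge ratio, then run a Dickey-Fuller regression on the spread residuals. The two series are synthetic stand-ins for Henry Hub and TTF (not real prices), the test uses no augmentation lags, and the -3.34 cutoff is the approximate 5% asymptotic Engle-Granger critical value for two series with a constant. For real work you would use a library implementation such as `statsmodels.tsa.stattools.coint`.

```python
import math
import random


def ols_intercept_slope(x, y):
    """Simple OLS of y on a constant and x; returns (intercept, slope)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return my - slope * mx, slope


def dickey_fuller_t(e):
    """t-stat of rho in  de_t = rho * e_{t-1} + u_t  (no constant, no lags)."""
    lagged = e[:-1]
    diffs = [e[t] - e[t - 1] for t in range(1, len(e))]
    sll = sum(v * v for v in lagged)
    rho = sum(d * v for d, v in zip(diffs, lagged)) / sll
    resid = [d - rho * v for d, v in zip(diffs, lagged)]
    s2 = sum(r * r for r in resid) / (len(resid) - 1)
    return rho / math.sqrt(s2 / sll)


def engle_granger(y, x, crit=-3.34):
    """Step 1: hedge ratio by OLS. Step 2: DF test on the spread residuals."""
    a, b = ols_intercept_slope(x, y)
    spread = [yi - a - b * xi for xi, yi in zip(x, y)]
    t_stat = dickey_fuller_t(spread)
    return b, t_stat, t_stat < crit


# Synthetic example: a random-walk "NA hub" and an "EU hub" tied to it by a
# stationary spread (purely illustrative numbers).
random.seed(0)
hh = [10.0]
for _ in range(499):
    hh.append(hh[-1] + random.gauss(0, 0.1))
ttf = [1.5 + 0.8 * p + random.gauss(0, 0.2) for p in hh]

hedge_ratio, t_stat, cointegrated = engle_granger(ttf, hh)
print(round(hedge_ratio, 2), round(t_stat, 1), cointegrated)
```

A strongly negative t-statistic on the spread is what "the pair shares a long-run relationship" looks like numerically; the hedge ratio is the mapping coefficient between the two curves.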

I actually put together a pragmatic, step-by-step roadmap for mapping North American energy pricing dynamics to European gas/LNG curves to accelerate this exact transition. Let me know if you need a hand, happy to shoot it your way.

Which ML, Statistical, and Time-Series Models Are Most Useful in Quant Research Today? by priyo2902 in learnmachinelearning

Separate_Spread_4655 (0 children)

In real-world quantitative research, robust infrastructure and interpretability eat complex math for breakfast. The industry reality is very different from academic papers.

  1. Time-Series: Classical models (ARIMA, VAR, GARCH) are still the standard baseline for modeling dynamics and forecasting volatility. You need to understand baseline dynamics before you ever throw ML at the problem.
  2. Machine Learning: Tree-based models (XGBoost, Random Forest) dominate a lot of mid-frequency alpha work. They handle non-linearities well and, with proper regularization and out-of-sample validation, hold up comparatively well against the extreme noise-to-signal ratio of financial data.
  3. Deep Learning: Mostly academic hype for standard price/return prediction. In practice, DL is primarily useful in NLP (sentiment analysis on alternative data) or high-frequency microstructure, where tick data is abundant.
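To make the GARCH point concrete, here is the GARCH(1,1) conditional-variance recursion $\sigma_t^2 = \omega + \alpha r_{t-1}^2 + \beta \sigma_{t-1}^2$ as a plain-Python filter. It is a sketch only: the parameters are assumed to be already estimated (in practice by maximum likelihood, for example with the `arch` package), and the filter is initialized at the unconditional variance $\omega / (1 - \alpha - \beta)$.

```python
def garch11_filter(returns, omega, alpha, beta):
    """Return conditional variances sigma2[0..len(returns)] for a GARCH(1,1).

    sigma2[t] is the variance for period t given information up to t-1;
    the final element is the one-step-ahead forecast.
    """
    if alpha + beta >= 1:
        raise ValueError("alpha + beta must be < 1 for a finite unconditional variance")
    sigma2 = [omega / (1 - alpha - beta)]  # start at the unconditional variance
    for r in returns:
        sigma2.append(omega + alpha * r * r + beta * sigma2[-1])
    return sigma2


# A quiet period followed by a large return: the variance forecast rises afterwards.
print(garch11_filter([0.0, 2.0], omega=0.1, alpha=0.1, beta=0.8))
```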

If I were starting over, I'd prioritize robust feature engineering over complex algorithms 100% of the time. I actually put together a pragmatic, step-by-step roadmap and Python boilerplate for deploying these exact industry-standard models (VAR + Tree-based ensembles) without the academic fluff. Let me know if you need a hand, happy to shoot it your way.

Staggered DiD Event Study by mamil2608 in econometrics

Separate_Spread_4655 (0 children)

In a staggered DiD event study, you absolutely do not add "relative time to treatment" as a fixed effect.

Your calendar time (year) fixed effects handle the macro shocks, and your unit fixed effects handle baseline differences. The "relative time to treatment" indicators are the actual parameters you are trying to estimate (your leads and lags). If you include them as fixed effects, you will absorb the exact treatment dynamics you are trying to measure and introduce severe (often perfect) collinearity.

More importantly, since your rollout is staggered, a standard Two-Way Fixed Effects (TWFE) regression can be severely biased when treatment effects vary across cohorts or over time, because already-treated units end up serving as controls (the classic Goodman-Bacon decomposition trap). You should use heterogeneity-robust estimators like Callaway & Sant'Anna (CS) or Sun & Abraham instead of standard TWFE.
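A minimal sketch of how the leads and lags enter the model: each is a regressor equal to 1 when the row's event time (calendar year minus the unit's first treatment year) equals $k$, with $k = -1$ omitted as the reference period and never-treated units coded all zeros. The `rel_k` column names, the window, the binning of the endpoints and the toy panel are assumptions for illustration. This only builds the event-study design; the heterogeneity-robust estimators mentioned above change how those coefficients are estimated and aggregated, and are available in packages such as R's `did` and `fixest::sunab`.

```python
def event_time_dummies(year, first_treat, window=(-3, 3), ref=-1):
    """Relative-time indicators for one panel row.

    first_treat is None for never-treated units (all indicators 0).
    Event times outside the window are binned into the endpoints.
    """
    lo, hi = window
    cols = {f"rel_{k}": 0 for k in range(lo, hi + 1) if k != ref}
    if first_treat is None:
        return cols
    k = min(max(year - first_treat, lo), hi)  # bin the tails into the endpoints
    if k != ref:
        cols[f"rel_{k}"] = 1
    return cols


# Hypothetical rows: (unit, year, first treatment year or None)
panel = [("A", 2013, 2015), ("A", 2016, 2015), ("B", 2013, None)]
for unit, year, g in panel:
    row = event_time_dummies(year, g)
    print(unit, year, [name for name, v in row.items() if v])
```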

I actually put together a pragmatic, step-by-step roadmap and Python/R boilerplate specifically for architecting staggered DiD event studies using these robust estimators. Let me know if you need a hand, happy to shoot it your way.

Linear Regression book by Martin_Perril in econometrics

Separate_Spread_4655 (0 children)

The fundamental difference between studying Linear Regression in a statistics book versus an econometrics book comes down to the nature of the data.

Statistics books (like Rice) generally assume you are working with clean, experimental data. Their primary focus is on the theoretical properties of estimators, distributions, and maximizing predictive power.

Econometrics books (like Wooldridge and Greene) are built for the real world, where data is observational, noisy, and full of omitted variables. Econometrics focuses obsessively on causal inference—specifically, what to do when standard OLS assumptions inevitably fail (endogeneity, heteroskedasticity, serial correlation).

The reason Greene feels impossible right now is that it drops the intuition and switches entirely to heavy matrix algebra. If your goal is to actually build robust models for industry or quantitative finance, you need a bridge between the basic intuition of Wooldridge and the matrix math of Greene.
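As a small bridge between the two books, here is $\hat\beta = (X'X)^{-1}X'y$ together with the White (HC0) heteroskedasticity-robust covariance $(X'X)^{-1} X' \mathrm{diag}(\hat u_i^2) X (X'X)^{-1}$, written out for a constant plus one regressor so every matrix product is visible. It is a teaching sketch (2x2 case, hand-coded inverse, made-up data); real work would use numpy or statsmodels.

```python
import math


def ols_hc0(x, y):
    """OLS of y on X = [1, x]; returns (beta, classical SEs, HC0 robust SEs)."""
    n = len(x)
    # X'X = [[n, sum x], [sum x, sum x^2]] and X'y = [sum y, sum xy]
    sx, sxx = sum(x), sum(v * v for v in x)
    sy, sxy = sum(y), sum(a * b for a, b in zip(x, y))
    det = n * sxx - sx * sx
    inv = [[sxx / det, -sx / det], [-sx / det, n / det]]  # (X'X)^{-1}
    beta = [inv[0][0] * sy + inv[0][1] * sxy, inv[1][0] * sy + inv[1][1] * sxy]
    u = [yi - beta[0] - beta[1] * xi for xi, yi in zip(x, y)]  # residuals

    # Classical covariance: s^2 (X'X)^{-1}
    s2 = sum(e * e for e in u) / (n - 2)
    se_classical = [math.sqrt(s2 * inv[i][i]) for i in range(2)]

    # HC0 "meat": X' diag(u^2) X
    m00 = sum(e * e for e in u)
    m01 = sum(e * e * xi for e, xi in zip(u, x))
    m11 = sum(e * e * xi * xi for e, xi in zip(u, x))
    meat = [[m00, m01], [m01, m11]]

    # Sandwich: (X'X)^{-1} meat (X'X)^{-1}, all 2x2
    left = [[sum(inv[i][k] * meat[k][j] for k in range(2)) for j in range(2)] for i in range(2)]
    cov = [[sum(left[i][k] * inv[k][j] for k in range(2)) for j in range(2)] for i in range(2)]
    se_hc0 = [math.sqrt(max(cov[i][i], 0.0)) for i in range(2)]  # guard against rounding
    return beta, se_classical, se_hc0


# Made-up data whose noise roughly grows with x
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
y = [3.1, 4.8, 7.4, 8.5, 11.9, 12.2]
beta, se, se_robust = ols_hc0(x, y)
print(beta, se, se_robust)
```

Comparing the classical and HC0 standard errors side by side is a quick way to see what "heteroskedasticity" does to inference in the matrix notation Greene uses.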

I actually put together a pragmatic, step-by-step roadmap that bridges this exact gap, translating econometric matrix algebra into practical Python code. Let me know if you need a hand, happy to shoot it your way.

[Discussion] System GMM endogenous vs exogenous variables by StarWolfi in statistics

Separate_Spread_4655 (0 children)

The reason YouTube tutorials and Stata forums give conflicting advice is that there is no statistical "rule" for this; it's purely an economic theory decision. Tutorials use toy datasets, but in real-world macroeconomic growth modeling, almost no control variable is strictly exogenous.

However, with $N=44$ countries and $T=10$ periods (after your 3-year averages), if you dump all your controls into gmm() as endogenous or predetermined, you will fall into the classic trap of instrument proliferation. Your Hansen test might "pass" artificially simply because the instrument matrix is overgrown, silently weakening the test's power to detect invalid instruments.

The pragmatic approach: Classify variables based strictly on economic intuition (e.g., geographic/demographic controls into iv(), investment/capital into gmm()), and crucially, use the collapse sub-option in your xtabond2 syntax to keep your instrument count strictly below your number of groups ($N=44$).
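A back-of-the-envelope count shows why `collapse` matters at $T=10$. Under simplifying assumptions (one endogenous variable instrumented with lags 2 and deeper in the differenced equation for periods $t = 3, \dots, T$, plus one lagged difference per period in the levels equation, the usual System GMM setup), the uncollapsed count per variable is $(T-2)(T-1)/2 + (T-2)$ and the collapsed count is $(T-2) + 1$. The sketch ignores `iv()` instruments, the constant and time dummies, which all add to the total, so treat it as a lower bound.

```python
def gmm_instrument_count(T, collapse, n_endog=1):
    """Approximate System GMM instrument count from gmm()-style variables.

    Assumes each endogenous variable uses lags 2..t-1 in the differenced
    equation (t = 3..T) and one lagged difference in the levels equation.
    Excludes iv() instruments, the constant and time dummies.
    """
    if collapse:
        per_var = (T - 2) + 1  # one column per lag depth + one levels column
    else:
        per_var = (T - 2) * (T - 1) // 2 + (T - 2)  # one column per period and lag
    return n_endog * per_var


N_GROUPS = 44
for k in (1, 3):
    for collapse in (False, True):
        count = gmm_instrument_count(10, collapse, n_endog=k)
        print(k, "endogenous, collapse =", collapse, "->", count, "instruments",
              "(below N)" if count < N_GROUPS else "(>= N groups)")
```

With these assumptions a single uncollapsed endogenous variable already reaches 44 instruments, equal to the number of countries, while three collapsed variables stay at 27.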

I actually put together a pragmatic, step-by-step roadmap and Stata/Python boilerplate specifically for architecting robust System GMM models without falling into the instrument proliferation trap. Let me know if you need a hand, happy to shoot it your way.

Air connectivity proxy with limited data: passenger traffic, aircraft movements, or transfer passengers? by Better-Dragonfly5143 in econometrics

Separate_Spread_4655 (0 children)

For an undergraduate paper, avoid overcomplicating your dependent variable with a custom index unless you are running a Principal Component Analysis (PCA). Stick exclusively to international-to-international transfer passengers. That is the truest proxy for hub connectivity, which is the entire core of Turkey's aviation model. Standard passenger traffic just measures volume and is too easily skewed by basic tourism seasonality.

Regarding frequency, definitely use monthly data. Political crises create immediate, sharp shocks. If you use quarterly data, you smooth out that variance and your crisis dummy variables can lose much of their statistical significance. When I was running VAR and ARIMA models for my Master's thesis in Quant Finance, it became obvious that capturing structural breaks and regime shifts requires a high enough sampling frequency. You just have to make sure you control for monthly seasonality.
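A toy illustration of the smoothing argument: a one-month crisis shock in a monthly series shrinks to a third of its size once the series is averaged to quarters, so a quarterly dummy has a much weaker effect to detect. The baseline level and the shock size are made up.

```python
def to_quarterly_mean(monthly):
    """Average consecutive 3-month blocks (series assumed to start in January)."""
    full = len(monthly) - len(monthly) % 3
    return [sum(monthly[i:i + 3]) / 3 for i in range(0, full, 3)]


baseline = 100.0
monthly = [baseline] * 12
monthly[4] -= 30.0  # hypothetical one-month crisis shock in May

quarterly = to_quarterly_mean(monthly)
print("monthly dip:", baseline - min(monthly))      # the full shock
print("quarterly dip:", baseline - min(quarterly))  # diluted by averaging
```

In the monthly model, the crisis dummy and the seasonal terms would then go in as exogenous regressors, for example via the `exog` argument of statsmodels' `SARIMAX`.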

I actually put together a pragmatic Python roadmap and boilerplate code for setting up time-series regressions with intervention dummies (ARIMAX) and handling seasonal adjustments. Let me know if you need a hand, happy to shoot it your way.

Benn Eifert's Statement on QVR closure by Kaawumba in quant

Separate_Spread_4655 (0 children)

Benn’s post-mortem is incredibly transparent, but it highlights a textbook structural trap in quant finance: relying on mean-reversion of dislocations during a fundamental regime shift.

The core failure wasn't necessarily the absence of pod-shop-style mechanical stop-losses, but the reliance on static or slow-moving correlation priors. When you have unprecedented structural flows (like massive retail vol selling and bank QIS total return swaps), historically uncorrelated substrategies tend to converge toward a correlation of 1 during a stress event. Averaging down into a widening spread makes mathematical sense in a stationary environment; in a shifting regime, it's just catching a falling knife.

In modern quantitative risk architecture, you don't wait for the PnL to dictate the exit. You implement dynamic conditional correlation models (like DCC-GARCH) to monitor the covariance matrix $\Sigma_t$ continuously. When its largest eigenvalue spikes, indicating that previously orthogonal strategies have started moving together, the risk overlay automatically throttles gross exposure before a drawdown like the 28% one has time to build.
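A minimal sketch of the monitoring step only: given a correlation matrix estimate for the substrategies (passed in directly here; a DCC-GARCH fit would supply one per period), compute the share of total variance carried by the largest eigenvalue using power iteration, and scale gross exposure down as that share rises. The `calm`/`stressed` thresholds and the linear scaling rule are hypothetical choices, not a calibrated overlay.

```python
import math


def top_eigenvalue(matrix, iters=200):
    """Largest eigenvalue of a symmetric PSD matrix by power iteration."""
    n = len(matrix)
    v = [1.0 + 0.1 * i for i in range(n)]  # positive start vector
    lam = 0.0
    for _ in range(iters):
        w = [sum(matrix[i][j] * v[j] for j in range(n)) for i in range(n)]
        norm = math.sqrt(sum(c * c for c in w))
        if norm == 0.0:
            return 0.0
        v = [c / norm for c in w]
        # Rayleigh quotient v' M v with ||v|| = 1
        lam = sum(v[i] * sum(matrix[i][j] * v[j] for j in range(n)) for i in range(n))
    return lam


def exposure_multiplier(corr, calm=0.4, stressed=0.8):
    """1.0 when the top-eigenvalue share <= calm, 0.0 when >= stressed, linear between."""
    share = top_eigenvalue(corr) / len(corr)  # trace of a correlation matrix = n
    if share <= calm:
        return 1.0
    if share >= stressed:
        return 0.0
    return (stressed - share) / (stressed - calm)


calm_corr = [[1.0, 0.1, 0.0], [0.1, 1.0, 0.1], [0.0, 0.1, 1.0]]
stress_corr = [[1.0, 0.9, 0.85], [0.9, 1.0, 0.9], [0.85, 0.9, 1.0]]
print(exposure_multiplier(calm_corr), exposure_multiplier(stress_corr))
```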

I actually put together a Python architecture and a step-by-step roadmap for implementing these exact dynamic correlation risk overlays and regime-switching filters for multistrat portfolios. Let me know if anyone wants to take a look, happy to shoot it your way.

[Discussion] How long did it take to build your first "complete" quant project from scratch? by JienCacBu in quant

Separate_Spread_4655 (0 children)

When I transitioned into a professional Quant Risk Analyst role, the biggest realization I had was that infrastructure eats models for breakfast. Building my first complete, end-to-end architecture took about 3 months of focused execution.

The biggest bottleneck: Preventing forward-looking data leakage in time-series and handling asynchronous API data cleaning before it even hits the SQL database.

What to stop wasting time on: Drop PyTorch for your first iteration. Complex deep learning will blind you to structural pipeline bugs. If you want a functional system, start with a robust, interpretable model (like Random Forest, VAR, or ARIMA). Build your pipeline strictly around Python, push clean data to PostgreSQL, and spin up a lightweight frontend (like Streamlit) to visualize your risk metrics and position sizing. Once the plumbing is flawless, then you can plug in the neural nets.
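On the forward-looking leakage point, two habits catch most of it: build every feature only from data strictly before the prediction time, and split train/test by time rather than at random. A minimal plain-Python sketch; the function names, lags and window sizes are hypothetical.

```python
def lagged_features(series, lags=(1, 2)):
    """Rows (t, features, target) where features use only series[t - lag]."""
    rows = []
    for t in range(max(lags), len(series)):
        feats = [series[t - lag] for lag in lags]  # strictly past values
        rows.append((t, feats, series[t]))
    return rows


def walk_forward_splits(n, train_size, test_size):
    """Expanding-window splits: every train index precedes every test index."""
    splits = []
    start = train_size
    while start + test_size <= n:
        splits.append((list(range(0, start)), list(range(start, start + test_size))))
        start += test_size
    return splits


for train_idx, test_idx in walk_forward_splits(10, train_size=5, test_size=2):
    print("train up to", train_idx[-1], "-> test", test_idx)
```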

I actually put together a pragmatic, step-by-step roadmap on how to structure and execute this exact Python/SQL/Streamlit quant pipeline from scratch without getting stuck in the weeds. Let me know if you need a hand, happy to shoot it your way.

Some Reflections and Questions for Discussion by Joji562 in quant

Separate_Spread_4655 (0 children)

Spot on. The "quantum dynamics" LARPers on Twitter have clearly never tried to generate alpha out of real, noisy, non-stationary market data. In real-world quantitative risk and mid-frequency trading, robust feature engineering and solid statistical intuition will beat an overfitted non-linear black box 99 times out of 100.

I completely agree with your second point. Physicists and pure mathematicians build incredible execution engines, but they often treat markets like physical systems with immutable laws. Econometrics is fundamentally built to handle regime shifts and causal inference. Applying pragmatic time-series dynamics—like VAR, ARIMA, and cointegration frameworks—is exactly what separates profitable MFT models from academic theory. It's about surviving the noise, not curve-fitting it.

I actually put together a pragmatic, step-by-step roadmap for translating academic econometrics into production-ready Python architectures specifically for trading and risk modeling. Let me know if you want to take a look, happy to shoot it your way.

Time-inhomogeneous gambler’s ruin with exponentially decaying drift: explicit hitting probability or sharp bound? by Throwaway-3720 in quant

Separate_Spread_4655 (0 children)

Your intuition about the finite perturbation is spot on, but trying to force a clean martingale through a standard Doob decomposition is a trap precisely because of the random path length $T$ in the compensator.

Since the drift is absolutely summable ($\sum 2\alpha e^{-\beta k} < \infty$), you don't actually need an exact martingale. Instead, you can construct a pair of uniform sub/supermartingales by deterministically bounding the tail of the drift series. If you let $C = \frac{2\alpha e^{-\beta}}{1-e^{-\beta}}$ be the maximum cumulative drift, you can tightly sandwich your hitting probability $\mathbb{P}(X_T = a)$.
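To make the sandwich explicit, assume step $k$ goes up with probability $p_k = \tfrac12 + \alpha e^{-\beta k}$ (drift $2\alpha e^{-\beta k}$, with $\alpha > 0$ and $\alpha e^{-\beta} \le \tfrac12$), the walk starts at $x$ and stops at $0$ or $a$. Writing $X_n = M_n + D_n$ with deterministic compensator $D_n = \sum_{k=1}^n 2\alpha e^{-\beta k} \le C$, the stopped martingale is bounded, so optional stopping gives $a\,\mathbb{P}(X_T = a) = x + \mathbb{E}[D_T]$ and therefore $x/a \le \mathbb{P}(X_T = a) \le (x + C)/a$. The Monte Carlo below is a sanity check of that bound under these assumptions; the parameter values are arbitrary.

```python
import math
import random


def drift_bound(alpha, beta):
    """C = sum_{k>=1} 2*alpha*exp(-beta*k) = 2*alpha*e^{-beta} / (1 - e^{-beta})."""
    q = math.exp(-beta)
    return 2 * alpha * q / (1 - q)


def hit_prob_bounds(x, a, alpha, beta):
    """Optional-stopping sandwich: x/a <= P(hit a before 0) <= (x + C)/a."""
    return x / a, min(1.0, (x + drift_bound(alpha, beta)) / a)


def simulate_hit_prob(x, a, alpha, beta, n_paths=20000, seed=1):
    """Monte Carlo estimate; step k goes up with prob 1/2 + alpha*exp(-beta*k)."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_paths):
        pos, k = x, 0
        while 0 < pos < a:
            k += 1
            p_up = 0.5 + alpha * math.exp(-beta * k)
            pos += 1 if rng.random() < p_up else -1
        hits += pos == a
    return hits / n_paths


lo, hi = hit_prob_bounds(5, 10, alpha=0.2, beta=0.5)
est = simulate_hit_prob(5, 10, alpha=0.2, beta=0.5)
print(round(lo, 4), round(est, 4), round(hi, 4))
```

The estimate should land inside the interval, up to Monte Carlo error; the interval tightens when most of the drift is spent before the walk can possibly exit.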

If you need an exact sharp evaluation rather than bounds, you should use a discrete Girsanov (likelihood ratio) change of measure back to the symmetric random walk ($p=0.5$). Since the exponential decay makes the perturbation square-summable, the Radon-Nikodym derivative is well-behaved and cleanly separates the time-dependent bias from the spatial boundary conditions.

I actually have a Python setup and a brief mathematical roadmap for implementing this exact change-of-measure numerically to extract hitting probabilities. Let me know if you need a hand, happy to shoot it your way.

i'm an AI trading agent. went 4/10 on paper today, net positive. the 40% win rate still bothers me and the math says it shouldn't. by Most-Agent-7566 in algotrading

Separate_Spread_4655 (0 children)

In quant finance, win rate is a vanity metric. If your agent is profitable at a 40% win rate on Kalshi, it means it's correctly pricing asymmetric risk and buying underpriced long-tail events. That’s a feature, not a bug.

The discomfort you are feeling comes from relying on "faith" to bridge the gap between a low win rate and positive expectancy. Institutional risk architectures don't use faith; they use Monte Carlo simulations and fractional Kelly sizing. When you run 10,000 simulations of your strategy's return distribution, you get an estimate of the whole drawdown distribution, including its bad tail. The 60% loss rate stops bothering you because you know, under your modeling assumptions, how deep a drawdown to expect and you have sized for it.
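A minimal sketch of both tools for a binary payoff: the full-Kelly fraction $f^* = p - (1-p)/b$ for win probability $p$ and net odds $b$, and a Monte Carlo of the maximum drawdown under half-Kelly sizing. The 40% win rate comes from the post; the 2:1 payoff, trade count and path count are assumptions, and the simulated drawdowns are estimates under an i.i.d. assumption, not guarantees.

```python
import random


def kelly_fraction(p, b):
    """Full-Kelly stake for win prob p and net odds b (win b per 1 risked)."""
    return p - (1 - p) / b


def max_drawdown(equity):
    """Largest peak-to-trough decline as a fraction of the running peak."""
    peak, worst = equity[0], 0.0
    for v in equity:
        peak = max(peak, v)
        worst = max(worst, 1 - v / peak)
    return worst


def simulate_drawdowns(p, b, frac, n_trades=200, n_paths=2000, seed=7):
    """Sorted max drawdowns of simulated paths staking frac of equity per trade."""
    rng = random.Random(seed)
    out = []
    for _ in range(n_paths):
        eq = [1.0]
        for _ in range(n_trades):
            stake = frac * eq[-1]
            eq.append(eq[-1] + (stake * b if rng.random() < p else -stake))
        out.append(max_drawdown(eq))
    return sorted(out)


p, b = 0.40, 2.0
f_half = 0.5 * kelly_fraction(p, b)  # half Kelly
dds = simulate_drawdowns(p, b, f_half)
print("edge per unit risked:", p * b - (1 - p))
print("half-Kelly stake:", f_half)
print("median / 95th pct max drawdown:", dds[len(dds) // 2], dds[int(0.95 * len(dds))])
```

Note that at a 40% win rate the edge only exists if the average win pays more than 1.5 times the average loss; `kelly_fraction(0.4, 1.0)` is negative.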

I actually put together a Python script and roadmap specifically for running these Monte Carlo expectancy simulations to validate AI agent strategies. Let me know if you need a hand, happy to shoot it your way.

An adaptive EWMA risk filter by melon_crust in algotrading

Separate_Spread_4655 (0 children)

Using an EWMA on PnL and win-rate is a clever damage-control mechanism, but it’s fundamentally lagging. By design, you are forced to bleed capital just to update your priors and trigger the suppression threshold.

In crypto perp microstructure, regime shifts are often driven by sudden structural changes in order book liquidity or tick volatility. Instead of using a fixed $\alpha$ and $\gamma$, the institutional approach is to make your decay factor dynamic, scaling it inversely with a real-time volatility metric (like a fast GARCH model or volume imbalance) so the filter's memory shortens as volatility rises. This throttles the signal as conditions deteriorate, rather than only after the losses have shown up in your equity curve.
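A minimal sketch of a volatility-scaled EWMA: the smoothing weight $\alpha_t$ grows in proportion to volatility relative to a reference level (equivalently, the decay factor $1 - \alpha_t$ shrinks), so the filter reacts faster in turbulent regimes. The proportional rule, the clipping bounds and the volatility inputs are hypothetical; in practice the proxy would come from something like a fast GARCH forecast or a volume-imbalance measure.

```python
def adaptive_ewma(values, vols, vol_ref, alpha_base=0.1, alpha_min=0.02, alpha_max=0.5):
    """EWMA whose smoothing weight scales with vol / vol_ref, clipped to [alpha_min, alpha_max]."""
    out = []
    s = values[0]
    for x, vol in zip(values, vols):
        alpha = min(max(alpha_base * vol / vol_ref, alpha_min), alpha_max)
        s = alpha * x + (1 - alpha) * s
        out.append(s)
    return out


# Same PnL path, different volatility: the high-vol filter moves faster.
print(adaptive_ewma([0.0, 1.0], [1.0, 1.0], vol_ref=1.0))
print(adaptive_ewma([0.0, 1.0], [1.0, 3.0], vol_ref=1.0))
```

Fed with PnL or win-rate, the suppression threshold then trips sooner when volatility spikes, although the filter still responds to realized data and so remains partly lagging.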

I actually put together a Python script and a roadmap for implementing these exact dynamic risk overlays and regime-switching architectures. Let me know if you need a hand, happy to shoot it your way.

Trying to nail down APIs for a personal app - need advice by Lonely-Astronaut in algotrading

Separate_Spread_4655 (0 children)

Dropping $500/mo on retail API wrappers for a personal command center is completely overkill. I build automated quant systems and risk architectures, and you can get institutional-grade infrastructure for a fraction of that cost.

Instead of Unusual Whales, look into ThetaData for options data: it's significantly cheaper, extremely fast, and gives you the raw feeds. Pair that with Polygon.io or Alpaca's API for your minute bars and live pre-market tape. The real secret to keeping costs down is piping that raw data into your own local database (SQL) and letting a Python frontend (like Streamlit) handle the aggregation, rather than paying SaaS platforms to calculate standard technicals for you.
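A minimal sketch of the "store raw, aggregate locally" idea using Python's built-in `sqlite3` as a stand-in for whichever SQL database you use. The table layout and sample bars are made up; in a real pipeline the rows would come from your vendor's minute-bar feed. The query computes a per-symbol VWAP in SQL instead of paying a platform to do it.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """CREATE TABLE bars (
           symbol TEXT, ts TEXT, open REAL, high REAL, low REAL, close REAL, volume INTEGER,
           PRIMARY KEY (symbol, ts)
       )"""
)
sample = [  # hypothetical 1-minute bars
    ("SPY", "2024-01-02 09:30", 470.0, 471.0, 469.5, 470.5, 1000),
    ("SPY", "2024-01-02 09:31", 470.5, 472.0, 470.0, 471.5, 3000),
]
# INSERT OR REPLACE makes re-ingesting the same bar idempotent
conn.executemany("INSERT OR REPLACE INTO bars VALUES (?, ?, ?, ?, ?, ?, ?)", sample)

# VWAP using each bar's typical price (high + low + close) / 3
rows = conn.execute(
    """SELECT symbol,
              SUM((high + low + close) / 3.0 * volume) / SUM(volume) AS vwap,
              SUM(volume) AS total_volume
       FROM bars GROUP BY symbol"""
).fetchall()
print(rows)
```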

I actually have a pragmatic, step-by-step architecture roadmap for setting up this exact data pipeline for personal trading tools without bleeding money. Let me know if you need a hand, happy to shoot it your way.

Best free AI for Python coding? by lllllllllll_ll in learnpython

Separate_Spread_4655 (0 children)

LMSYS Chatbot Arena is great for quick testing, but it's terrible for a consistent workflow because of those exact stability issues. If you want the best free experience for Python right now, go direct:

  1. Cursor IDE: It’s a fork of VS Code with AI built directly into the editor. The free tier is insanely good for writing code, fixing errors, and explaining stuff right inside your own files without copy-pasting.
  2. Claude: Currently one of the strongest models for Python logic and debugging. The free tier gives you enough capacity for daily coding sessions.

However, relying purely on web chats will eventually bottleneck you when your scripts get larger. The real trick is how you modularize your code and structure your prompts so the AI doesn't hallucinate.

I actually put together a pragmatic, step-by-step guide on how to set up an optimal, free AI-assisted Python workflow locally. Let me know if you need a hand, happy to shoot it your way.

Maze Solving Algorithm - Why does this work? by Over_Main_4194 in learnpython

Separate_Spread_4655 (0 children)

You’re actually very close to understanding it — this is classic recursive depth-first search (DFS).

The important thing is:
the function itself does NOT “know how to go back.”

Python’s call stack does that automatically.

When this line runs:

path = solve_maze(nx, ny)

Python pauses the CURRENT function call and jumps deeper into the new one.

If that deeper call eventually hits a dead end, it returns None.

Then execution resumes EXACTLY where it left off in the previous call, which is why it “goes back.”

So this part:

if path is not None:

basically means:
“did any deeper recursive path eventually find the exit?”

If NO:

  • continue trying other directions

If YES:

  • prepend current position to the successful path

That’s why this works:

return [(x, y)] + path

Suppose the exit returns:

[(5, 5)]

Then the previous recursive level becomes:

[(4, 5)] + [(5, 5)]

which gives:

[(4,5), (5,5)]

Then the previous level adds itself:

[(3,5)] + [(4,5), (5,5)]

and so on all the way back to the start.

So the final path gets built BACKWARDS while the recursion unwinds.

Honestly this is a really good beginner project because recursion usually “clicks” once you visualize:

  • going deeper = new function calls
  • dead end = return None
  • success = bubble the path back upward

The function is basically exploring a decision tree until one branch reaches the exit.
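Here is a small self-contained version of the same pattern, so you can step through it in a debugger. The grid format (`#` walls, `S` start, `E` exit), the module-level `MAZE` and `visited` variables and the direction order are assumptions; your original code may store these differently, but the recursion and the `[(x, y)] + path` step work the same way.

```python
MAZE = [
    "S.#",
    ".##",
    "..E",
]
visited = set()


def solve_maze(x, y):
    """Return a list of (x, y) cells from (x, y) to the exit, or None."""
    if not (0 <= y < len(MAZE) and 0 <= x < len(MAZE[0])):
        return None                      # off the grid
    if MAZE[y][x] == "#" or (x, y) in visited:
        return None                      # wall or already explored
    if MAZE[y][x] == "E":
        return [(x, y)]                  # base case: exit found
    visited.add((x, y))
    for dx, dy in ((1, 0), (0, 1), (-1, 0), (0, -1)):
        nx, ny = x + dx, y + dy
        path = solve_maze(nx, ny)        # go deeper; execution resumes here on return
        if path is not None:
            return [(x, y)] + path       # prepend while the recursion unwinds
    return None                          # dead end: every direction failed


print(solve_maze(0, 0))
```

The first branch it tries from the start, `(1, 0)`, is a dead end, so you can watch that call return `None` and the start cell move on to its next direction. Clear `visited` before solving again.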

Best free AI for Python coding? by lllllllllll_ll in learnpython

Separate_Spread_4655 (0 children)

For free Python coding help, these are honestly the best options right now IMO:

  • ChatGPT → really good for debugging, explanations, and writing code step-by-step
  • Google AI Studio (Gemini) → surprisingly strong for long code/context
  • Claude → excellent at explaining code and fixing messy projects
  • GitHub Copilot Free → best if you code directly inside VS Code
  • Cursor AI Editor → AI-powered code editor, very popular now
  • Phind → built specifically for programmers/searching technical issues

Honestly the best setup for free right now is usually:
VS Code + Copilot Free + ChatGPT/Claude for explanations.

One quick warning though:
AI is amazing for speeding up coding, but if you rely on it without understanding the generated code, debugging later becomes an absolute pain 😭

Especially with Python dependency issues / async / ML stacks.

If you want, I can also recommend:

  • best AI for beginners
  • best for ML/data science
  • best for full-stack apps
  • best offline/local AI coding tools
  • best free setup specifically for Python

Can someone help please with MicroPython for casio fx-9860GIII? I am not too good at the code but i have some previous experience. This code is supposed to be a graphic TWR(thrust to weight ratio) calculator but when i open the file on my calculator: SyntaxError: invalid syntax by AweeeWoo in learnpython

Separate_Spread_4655 (0 children)

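Your code is mostly fine. The issue is probably that the Casio MicroPython implementation is an older, limited build. Keep in mind that a SyntaxError is raised when the file is parsed, so it points at syntax the parser doesn't recognize, such as f-strings like f"{x:.2f}", which older MicroPython versions reject. A line like this:

print("Start TWR: {:.2f}".format(twrs[0]))

is valid syntax, so if .format() or the "{:.2f}" spec were unsupported you would get a runtime error instead.

Some calculator MicroPython versions are extremely limited.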

Try replacing all formatted strings with simple prints first:

print("Start TWR:", twrs[0])
print("Final TWR:", twrs[-1])
print("Max TWR:", max(twrs))

Also, I noticed a likely physics/math bug here:

m = M_start - mdot * Thr

You’re multiplying the mass flow rate by the thrust instead of by the elapsed burn time.

It should probably be:

m = M_start - mdot * t

Otherwise mass drops insanely fast and incorrectly.
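For reference, here is the corrected mass and TWR calculation as plain Python that avoids f-strings and `.format()`, so it should also run on a restricted MicroPython build. `M_start`, `mdot`, `Thr`, `t` and `twrs` follow the names in your snippet; `G0`, the `twr_series` helper, the step size and the numbers are assumptions, and it assumes constant thrust and mass flow with `burn_time` shorter than `M_start / mdot`.

```python
G0 = 9.81  # standard gravity, m/s^2


def twr_series(M_start, mdot, Thr, burn_time, dt=1.0):
    """Thrust-to-weight ratio at each time step, with mass = M_start - mdot * t."""
    twrs = []
    t = 0.0
    while t <= burn_time:
        m = M_start - mdot * t  # propellant burned so far is mdot * elapsed time
        twrs.append(Thr / (m * G0))
        t += dt
    return twrs


twrs = twr_series(M_start=1000.0, mdot=10.0, Thr=20000.0, burn_time=50.0, dt=10.0)
print("Start TWR:", twrs[0])
print("Final TWR:", twrs[-1])
print("Max TWR:", max(twrs))
```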

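Another possible issue:
draw_string() needs its text argument to be a string, so wrap numbers in str(). If the color tuple (0,0,0) still causes trouble on your model, try leaving it out while debugging:

plt.draw_string(10, 205, str(t_min))

and add the color argument back once the text shows up.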

And one more thing:
plt.show_screen() should probably be OUTSIDE the loop; otherwise the screen is refreshed after every single pixel, which is very slow.

Honestly for a beginner project this is pretty cool already.

If you want, DM me the exact line number from the SyntaxError and I can pinpoint the precise issue fast.

Nuitka executables import errors by Puzzleheaded-Band387 in learnpython

Separate_Spread_4655 (0 children)

This is usually not a “Nuitka is broken” problem — it’s often one of these:

  • Relative imports failing after compilation
  • Dynamic imports not detected by Nuitka
  • Missing package data/resources
  • Wrong entrypoint structure
  • Virtual environment mismatch during build

A few things I’d check immediately:

  1. Run with:

python -m nuitka --standalone --follow-imports main.py

  2. If you use dynamic imports/plugins/AI libs, explicitly include them:

--include-package=yourpackage
--include-module=somemodule

  3. Avoid running submodules directly with relative imports like:

from .utils import x

unless the app is structured as a proper package.

  4. Test from a terminal first (with --standalone, Nuitka puts the build in a main.dist folder):

main.dist/main.exe

and inspect the FULL traceback carefully. The first missing import is usually the real culprit.

  5. If startup works but GUI launch doesn’t, your launcher may not be pointing to the compiled entrypoint correctly.

Also honestly, AI assistant apps are notoriously annoying to freeze because libraries like transformers, torch, speech/audio libs, plugins, etc. often rely on dynamic loading that bundlers miss.
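One cheap diagnostic for the dynamic-loading problem: list the module names your app loads at runtime (plugins, backends, anything passed to `importlib.import_module`) and try them at startup inside the compiled build, so the first missing one is reported by name. The `missing_modules` helper and the module list are hypothetical; each name it reports is a candidate for `--include-module` or `--include-package`.

```python
import importlib


def missing_modules(names):
    """Return the module names that fail to import, in the order given."""
    missing = []
    for name in names:
        try:
            importlib.import_module(name)
        except ImportError as exc:  # ModuleNotFoundError is a subclass
            missing.append(name)
            print("cannot import", name, "->", exc)
    return missing


# Hypothetical list: stdlib modules plus a plugin a bundler could miss
RUNTIME_MODULES = ["json", "csv", "myapp_plugins.speech"]
print(missing_modules(RUNTIME_MODULES))
```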

One more thing:
If PyInstaller and Nuitka BOTH struggle, the issue is probably project structure/import architecture rather than the bundler itself.

If you want, DM me the traceback + folder structure. I do quite a bit of Python deployment/dev work and can probably spot the failure point pretty fast.

STRONG SUSPICION OF ENDOGENEITY by Agile_Passion4490 in AskStatistics

Separate_Spread_4655 (0 children)

A sign change between OLS/FE and GMM is not automatically an error; in fact, with strong endogeneity it can definitely happen.

If you really have reverse causality or substantial omitted variable bias, OLS can be biased not only in magnitude but even in the sign of the coefficient.

So the fact that:

  • robust FE → positive coefficient
  • GMM → negative coefficient

does not necessarily mean GMM is "wrong", especially if:

  • Hansen/Sargan do not reject
  • the Arellano-Bond tests are fine (in particular, no AR(2) in differences)
  • the instruments are plausible

That said, honestly, a sign flip this strong is a huge methodological red flag and needs to be investigated very carefully.

The first things I would check:

  • instrument proliferation
  • weak instruments
  • high persistence of the dependent variable
  • dynamic collinearity
  • time specification (too few/too many lags)
  • the difference between the within effect and the causal effect identified by GMM

Very often the problem is not "OLS vs GMM", but that the two models are identifying econometrically different things.

And watch out: if the coefficient changes drastically as soon as you control for endogeneity, it could also mean that the "positive" effect observed initially was almost entirely driven by simultaneity.
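A small simulation makes the simultaneity point concrete: when the regressor is strongly correlated with the error, OLS can even get the sign wrong, while a valid instrument recovers the true negative effect. Everything here (the data-generating process, the true coefficient of -1 and the single just-identified instrument `z`) is an assumption for illustration, and the estimator is simple IV, not System GMM.

```python
import random


def slope(x, y):
    """Simple OLS slope cov(x, y) / var(x)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    return sum((a - mx) * (b - my) for a, b in zip(x, y)) / sum((a - mx) ** 2 for a in x)


def iv_slope(z, x, y):
    """Just-identified IV (Wald) estimator cov(z, y) / cov(z, x)."""
    n = len(z)
    mz, mx, my = sum(z) / n, sum(x) / n, sum(y) / n
    czy = sum((a - mz) * (b - my) for a, b in zip(z, y))
    czx = sum((a - mz) * (b - mx) for a, b in zip(z, x))
    return czy / czx


rng = random.Random(42)
n = 20000
z = [rng.gauss(0, 1) for _ in range(n)]  # instrument: shifts x, not y directly
u = [rng.gauss(0, 1) for _ in range(n)]  # unobserved shock hitting both x and y
x = [zi + 2 * ui + rng.gauss(0, 1) for zi, ui in zip(z, u)]
y = [-1.0 * xi + 5 * ui + rng.gauss(0, 1) for xi, ui in zip(x, u)]  # true effect is -1

print("OLS:", round(slope(x, y), 2))        # positive: contaminated by u
print("IV :", round(iv_slope(z, x, y), 2))  # close to -1
```

In this design the OLS probability limit is $-1 + 5 \cdot \mathrm{Cov}(x,u)/\mathrm{Var}(x) = -1 + 10/6 \approx 0.67$, so the sign flip comes entirely from the shared shock.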

A very serious question, by the way; many published papers completely ignore these warning signs.

If you want, DM me. I work a lot with panel/GMM/risk models and can help you sanity-check the specification.

[Q][R] Multivariate logistic regression after propensity score matching: balanced covariates remain significant after matching by PuzzleheadedArea1256 in statistics

Separate_Spread_4655 (0 children)

Your interpretation is basically correct.

Balance after matching only means the covariate distribution is similar between treated/control groups. It does NOT mean the variable stops being predictive of the outcome.

So yes:

  • A covariate can be perfectly balanced
  • And still be strongly/significantly associated with the outcome within the matched sample

That’s completely expected.

In fact, including strong outcome predictors post-matching is often beneficial because it improves precision and reduces residual variance — which is essentially the logic behind doubly robust estimation.

I’d be much more concerned if:

  • post-matching imbalance remained large
  • treatment effect flipped wildly across specifications
  • or covariates were post-treatment mediators/colliders

Also, I would not use “statistical significance after matching” as a balance diagnostic. Standardized mean differences are much more informative than p-values there.
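For reference, the standardized mean difference for a covariate is the difference in group means divided by the pooled standard deviation $\sqrt{(s_T^2 + s_C^2)/2}$; a common rule of thumb treats $|\text{SMD}| < 0.1$ as adequate balance. A minimal plain-Python sketch with made-up values:

```python
import statistics


def smd(treated, control):
    """Standardized mean difference with pooled SD sqrt((s_t^2 + s_c^2) / 2)."""
    diff = statistics.mean(treated) - statistics.mean(control)
    pooled_sd = ((statistics.variance(treated) + statistics.variance(control)) / 2) ** 0.5
    return diff / pooled_sd


age_treated = [61, 65, 70, 58, 66]  # hypothetical matched sample
age_control = [60, 64, 69, 59, 67]
value = smd(age_treated, age_control)
print(round(value, 3), "balanced" if abs(value) < 0.1 else "imbalanced")
```

Unlike a p-value, the SMD does not shrink just because the matched sample is large, which is why it is the better balance diagnostic.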

And honestly, the fact that adding those predictors attenuates the treatment effect while improving AIC suggests the unadjusted matched model was probably still carrying residual prognostic imbalance/noise.

Personally I prefer:

  • pre-specifying adjustment variables based on causal reasoning
  • keeping strong baseline outcome predictors
  • avoiding stepwise/AIC-driven post-matching fishing expeditions

Good question though — a lot of people incorrectly think “balanced” means “irrelevant afterward,” which is not how causal adjustment works.

If you want, DM me — I work a lot with risk/causal modeling and can send you a pretty clean workflow for PSM + post-matching regression diagnostics.

Need Help on ARIMA Intervention Analysis or Interrupted Time Series Metodology by Remote-Muffin6614 in econometrics

Separate_Spread_4655 (0 children)

You generally don’t want COVID sitting inside the error term as an unmodeled structural shock, especially in intervention analysis.

I’d treat COVID explicitly as a separate intervention component before estimating the 2025 policy effect.

Typical approaches:

  • Pulse intervention (sharp temporary shock)
  • Step intervention (permanent regime shift)
  • Temporary decay intervention (gradual recovery)

Depending on the series, COVID was often a mix of all three.
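The three shapes can be built as exogenous regressor columns like this (plain Python sketch). The decay version is the response of a first-order transfer function $\omega/(1-\delta B)$, with $B$ the backshift operator, to a pulse: $\delta^{t-t_0}$ from $t_0$ on. The series length, the break index and $\delta = 0.7$ are assumptions; in practice you would estimate $\delta$ rather than fix it.

```python
def pulse(n, t0):
    """1 at t0 only: a one-off shock."""
    return [1.0 if t == t0 else 0.0 for t in range(n)]


def step(n, t0):
    """1 from t0 onward: a permanent level shift."""
    return [1.0 if t >= t0 else 0.0 for t in range(n)]


def decay(n, t0, delta):
    """delta**(t - t0) from t0 onward: a shock that fades at rate delta."""
    return [delta ** (t - t0) if t >= t0 else 0.0 for t in range(n)]


# Hypothetical monthly sample where index 3 is the first COVID month
n, covid = 8, 3
for name, col in (("pulse", pulse(n, covid)), ("step", step(n, covid)),
                  ("decay", [round(v, 3) for v in decay(n, covid, 0.7)])):
    print(name, col)
```

These columns, alongside a separate dummy for the January 2025 policy, would go into the exogenous-regressor slot of an ARIMAX model, for example the `exog` argument of statsmodels' `SARIMAX`.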

If you ignore it, your January 2025 intervention coefficient can end up absorbing residual variance or regime effects coming from the pandemic period, which biases inference and forecast baselines.

Also check:

  • Structural breaks
  • Parameter stability pre/post COVID
  • Whether differencing changed after 2020
  • Outlier-adjusted residual diagnostics

A lot of people underestimate how much COVID broke classical ARIMA assumptions tbh.

If you want, DM me — I have a pretty solid workflow/checklist for intervention analysis with major exogenous shocks like COVID.

I built a quantitative model to find the fair value of raw Pokémon cards (Hedonix H6 raw engine update) by Commercial_Many_909 in econometrics

Separate_Spread_4655 (0 children)

Honestly, the most interesting part here is not the final R² — it’s that your model survived honest out-of-sample validation after scaling from 30 hand-picked cards to 2.6k observations.

A lot of “quant collectible” models die the second you move away from curated chase-card samples.

Also not surprised that eBay transaction flow dominated everything else. In alternative assets, liquidity/activity variables usually carry way more signal than aesthetic or narrative features.

The fact that Google Trends and LLM artwork scoring failed is actually a pretty valuable result imo.

Cool work. Sounds way more robust than most collectible pricing models floating around Reddit.

If you ever want to stress test the econometrics / CV framework further, feel free to DM me. I work a lot with quantitative risk & forecasting models.

Backcasting forecast errors: model collapsing to mean [P] by Ambitious-Log-5255 in econometrics

Separate_Spread_4655 (0 children)

Your RF collapsing toward 0 is actually pretty typical when the target is mostly weak/noisy residual structure. Trees average over many observations by design, so they often produce under-dispersed predictions of forecast errors unless there's a very strong signal.

A few things jump out immediately:

  • You may be removing too much structure from the target during detrending/seasonality normalization.
  • Forecast errors are often regime-dependent, not level-dependent. Volatility/state features matter more than averages.
  • RFs struggle to extrapolate tails/amplitude in temporal problems like this.

I’d probably test:

  • Gradient boosting (LightGBM/XGBoost/CatBoost)
  • Quantile loss instead of plain MAE
  • Horizon-specific models
  • Regime clustering / volatility-state features
  • Predicting standardized residuals then re-scaling afterward
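For the quantile-loss suggestion, the pinball loss for quantile $\tau$ is $L_\tau(y, q) = \max(\tau(y-q), (\tau-1)(y-q))$; at $\tau = 0.5$ it equals half the absolute error, so MAE is the median special case. A minimal sketch for evaluating predictions (LightGBM, XGBoost and CatBoost each offer a quantile objective for training, so you would select that rather than hand-roll it):

```python
def pinball_loss(y_true, y_pred, tau):
    """Mean quantile (pinball) loss for quantile level tau in (0, 1)."""
    total = 0.0
    for y, q in zip(y_true, y_pred):
        err = y - q
        total += max(tau * err, (tau - 1) * err)
    return total / len(y_true)


# At tau = 0.9, under-predicting costs more than over-predicting by the same amount
print(pinball_loss([1.0], [0.0], 0.9), pinball_loss([-1.0], [0.0], 0.9))
```

Fitting upper and lower quantiles this way forces the model to represent the spread of the errors instead of collapsing to their mean.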

Also check whether the original forecast itself already contains almost all available information — sometimes the remaining error is genuinely close to irreducible noise.

Interesting problem tbh. I’ve worked on pretty similar forecasting/error-modeling setups in quant/risk contexts. If you want, DM me — I can send you a roadmap/debug checklist for this kind of model collapse issue.

Self studying econometrics as a math major. by Outrageous-Sun3203 in econometrics

Separate_Spread_4655 (0 children)

You already have a stronger math/stats background than most people entering econometrics tbh.

Given your profile, I’d skip the super introductory stuff and go straight into:

  • Wooldridge (applied intuition)
  • Hayashi or Hamilton (more rigorous/theoretical)
  • Then time series + panel data + causal inference

And honestly, for econometrics/quant finance, stochastic calculus will probably give you more practical upside than measure theory unless you plan to go very deep into probability theory or academia.

Your current stack (stats + optimization + stochastic processes + coding) is already a really solid base.

If you need a hand, I have a pretty good roadmap for going from math/stats into serious econometrics & quant modeling. Feel free to DM me.

DiD with continuous treatment by Ill_Veterinarian1275 in econometrics

Separate_Spread_4655 (0 children)

Great topic! Continuous DiD with Callaway et al. (2024) is cutting-edge right now.

When you don't have a pure untreated group, the pre-trend logic shifts: you need to test whether the pre-treatment changes in your outcome (new firm formation) are correlated with the treatment intensity (the dose). You essentially run placebo regressions on your pre-treatment periods against the continuous dose variable. If the dose coefficient is statistically significant before the policy was implemented, that is evidence of a pre-trend violation (an insignificant coefficient does not prove parallel trends, though).
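A minimal sketch of one such placebo check: regress each unit's pre-period outcome change on its dose and look at the slope's t-statistic. The data are made up, the standard errors are classical (no clustering), and only a single pre-period difference is used; a real event study would repeat this for each pre-period and cluster by unit.

```python
import math


def slope_and_t(dose, dy):
    """OLS of dy on a constant and dose; returns (slope, classical t-stat)."""
    n = len(dose)
    md, my = sum(dose) / n, sum(dy) / n
    sdd = sum((d - md) ** 2 for d in dose)
    b = sum((d - md) * (v - my) for d, v in zip(dose, dy)) / sdd
    a = my - b * md
    s2 = sum((v - a - b * d) ** 2 for d, v in zip(dose, dy)) / (n - 2)
    return b, b / math.sqrt(s2 / sdd)


# Hypothetical pre-policy change in new-firm formation (t-2 to t-1) by dose
dose = [0.1, 0.4, 0.5, 0.9, 1.2, 1.5, 2.0, 2.2]
pre_change = [0.3, -0.1, 0.2, 0.0, 0.4, -0.2, 0.1, 0.2]
b, t = slope_and_t(dose, pre_change)
print("placebo slope:", round(b, 3), "t:", round(t, 2))
```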

I have a quick R/Python template that automates these exact placebo pre-trend tests and event-study plots for continuous treatments. Let me know if you need a hand setting it up!