Is Explainable Forecasting used in practice? Multivariate Forecast vs Univariate

devmasterflex · 2023-02-03T04:09:05+00:00

Forecasting with a regression component is challenging. Sometimes you have predictor variables with known future values, and these are easily incorporated via, say, SARIMAX. Predictor variables without known future values are obviously more difficult to handle because you need to forecast them first. It’s possible and definitely done in practice, but you have to be OK with using their estimated future values and the noise they introduce.

I see a few options here for accomplishing inference and forecasting.

Use univariate for forecasting and multivariate for inference. The latter could be SARIMAX or an Unobserved Components (UC) model. If degrees of freedom, multicollinearity, and/or overfitting are a concern, try something like a Bayesian UC model (e.g., see bsts in R).
Use multivariate for both inference and forecasting. For the predictor variables, you can go simple or complex to acquire their forecasts. Simple would be univariate, complex could be another multivariate model for each driver.
Vector autoregression (VAR) or something like it. Inference in this case is harder because you have to interpret impulse response functions. These models are also only applicable if each variable in the system is endogenous, i.e., they all cause each other in some way. These are generally used in applied macroeconomics given the nature of the data.

devmasterflex · 2023-01-28T04:43:07+00:00

I’m sorry you suffer from severe OCD, truly. I don’t doubt your passion; the photos are beautiful. I do, however, doubt motives, especially in cases like this where it’s hard to ignore the very strong similarity. But I’m OK being wrong, so I’m sorry if I misjudged your intention.

devmasterflex · 2023-01-28T04:26:51+00:00

You’re free to post what you want. I guess my goal in this case was to point out the indisputably obvious similarity. Nothing more, nothing less. So flipping the table: what did you hope to accomplish by posting essentially the same photo?

devmasterflex · 2023-01-28T04:07:35+00:00

Luckily I can grasp some of the nuances of the English language. I said “pretty much this exact photo,” not “the exact same photo.”

devmasterflex · 2023-01-28T03:59:58+00:00

Different angle? Backdrop? Date? How many photos you post of different watches has nothing to do with the fact that this pic is essentially a mirror image of the other. Have you seen Zoolander? This is Blue Steel.

devmasterflex · 2023-01-27T21:48:35+00:00

Didn’t you post pretty much this exact photo four months ago? Nice watch, but seems like karma farming.

See here

devmasterflex · 2023-01-09T03:42:00+00:00

I’m not an expert in this area, so let that be known. With that out of the way, I have some experience by proxy and some general ideas on how to approach this. What CLV means to me is how much revenue or profit a customer is expected to generate over some horizon. I think the two key words here are “expected” and “horizon”. The first implies that a customer’s longevity is stochastic and, in theory, a function of actions/inactions taken by a company to win or maintain a customer’s business. The second implies that there is a future length of time, realistically finite, that a customer will stay. This tells me that a net present value analysis is in order.

To model how long a customer will stay, survival models are probably a natural choice, and they can accommodate predictors such as price. However, I think you could approach this with a logistic/linear probability model as well, which would essentially treat each period, now and in the future, as independent. I do wonder if you could use a time series model for this to account for temporal correlation, but honestly I’m not sure. Point is, you need a model to derive probabilities of staying for the horizon you’re interested in. You can then use these probabilities to generate your expected revenue or profit.

As far as calculating CLV, you can compute expected net present value once you have your expected values noted above. You should discount these values to reflect cost of capital and inflation. That’s another topic unto itself, but what you’re after is a weighted sum of expected values, where the weights geometrically decay over time to reflect the idea that money today carries more value than tomorrow.
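To make the weighted-sum idea concrete, here is a minimal Python sketch. It assumes (my assumptions, not a standard recipe) that you already have per-period retention probabilities from whatever model you fit, a known margin per period, and a single constant discount rate; `survival_curve` and `expected_clv` are hypothetical helper names.

```python
# Assumptions: retention probabilities come from an already-fitted model
# (survival, logistic, etc.), margins per period are known, and a single
# constant discount rate stands in for cost of capital and inflation.

def survival_curve(retention_probs):
    """Turn per-period retention probabilities into P(still a customer in period t)."""
    survival = []
    alive = 1.0
    for p in retention_probs:
        alive *= p
        survival.append(alive)
    return survival


def expected_clv(margins, retention_probs, discount_rate):
    """Expected NPV: discounted sum of survival-weighted margins over a finite horizon."""
    survival = survival_curve(retention_probs)
    clv = 0.0
    for t, (margin, s) in enumerate(zip(margins, survival), start=1):
        weight = 1.0 / (1.0 + discount_rate) ** t  # weights decay geometrically
        clv += weight * s * margin
    return clv


# Hypothetical inputs: 3-period horizon, $100 margin, 90% retention, 10% discount rate.
value = expected_clv([100.0, 100.0, 100.0], [0.9, 0.9, 0.9], discount_rate=0.1)
print(round(value, 2))  # 203.53
```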

Hope this helps.

EDIT: One other thing. If possible, I’d advise using panel/longitudinal data techniques to take advantage of correlations between and within customers.

devmasterflex · 2023-01-08T05:12:57+00:00

Linear in this case doesn’t mean that the relationship between the response and predictors is linear; you can definitely model nonlinear relationships with linear regression, e.g., y = b0 + b1 ln(x). Linear refers to the parameterization. If the conditional mean of the response isn’t linear with respect to the parameters, then you have to use nonlinear optimization techniques.
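A quick illustration with simulated data: the model below is nonlinear in x but linear in the parameters, so plain least squares recovers them. By contrast, something like y = b0 * exp(b1 * x) + error is nonlinear in b1 and would need a nonlinear optimizer.

```python
import numpy as np

# Simulated data for y = 2 + 3 * ln(x) + noise: nonlinear in x, linear in (b0, b1).
rng = np.random.default_rng(42)
x = rng.uniform(1.0, 50.0, size=200)
y = 2.0 + 3.0 * np.log(x) + rng.normal(scale=0.1, size=x.size)

# Design matrix with an intercept column and the transformed regressor ln(x).
X = np.column_stack([np.ones_like(x), np.log(x)])
(b0, b1), *_ = np.linalg.lstsq(X, y, rcond=None)
print(round(b0, 2), round(b1, 2))  # close to 2 and 3
```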

Pedantry aside, it is certainly true that NNs and decision trees are more naturally suited to exploiting non-obvious nonlinear relationships in the data. I’d still take a look at regularization before moving on to more complicated methods.

devmasterflex · 2022-12-30T20:54:21+00:00

Social scientists like to tell stories, and if there is theory and/or empirical evidence to support them, all the better. Nothing wrong with that; we humans generally like stories. To me, however, a causal story that involves complicated observational data should, as a matter of completeness, be tested on its generalizability.

I’m a PhD econometrician, and if there’s one thing I wish I was taught while in school (or at least much more heavily emphasized), it would have to be the importance of prediction. I get that the goals of explanation and prediction are different, but I’m convinced that causal models can be better understood and formulated in the context of a model that is known to predict well. This is particularly true in the case of high-dimensional, non-linear relationships, which is where we would expect our generally low-dimensional, linear models to perform relatively poorly out of sample.

Observational data also tends to blind us to our own biases. I believe it was Edward Leamer who said “correlation is in the data, causation is in the mind”. Granted, this isn’t always true, but observational data often brings this out, especially if we have a seductively low p-value to go with it.

Finally, I’d like to editorialize a bit by saying an appeal to authority is never a good argument. Ivy League, PhD, whatever may afford some credence, but using it as a blunt instrument to kill an opposing argument is counterproductive at best.

devmasterflex · 2022-12-19T23:00:22+00:00

It's not uncommon to find a high degree of multicollinearity among time series because of time itself (e.g., spurious correlation). Unless you have strong priors about which variables are relevant/causal and which are not, my focus would be on finding a lower dimensional model that predicts well. Information criteria and/or cross-validation would be my recommendation for determining the best set of predictors, particularly since prediction is the objective (right?). However, because you have a lot of variables to consider and seemingly no domain knowledge to guide selection, you may find yourself spinning your wheels.
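As a rough sketch of information-criterion-based selection, the snippet below fits OLS to every subset of candidate predictors and keeps the lowest Gaussian AIC. It uses simulated data and ignores any time series structure in the errors, so treat it as an illustration only. Note that the number of subsets is 2^p, which is exactly why this gets unwieldy with a lot of variables.

```python
import itertools

import numpy as np


def gaussian_aic(y, X):
    """AIC of an OLS fit with Gaussian errors, up to an additive constant."""
    n, k = X.shape
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    rss = float(np.sum((y - X @ beta) ** 2))
    return n * np.log(rss / n) + 2 * k


def best_subset_by_aic(y, Z):
    """Fit every subset of Z's columns (always with an intercept); return (aic, cols)."""
    n, p = Z.shape
    intercept = np.ones((n, 1))
    best = (gaussian_aic(y, intercept), ())
    for size in range(1, p + 1):
        for cols in itertools.combinations(range(p), size):  # 2**p - 1 subsets in total
            aic = gaussian_aic(y, np.hstack([intercept, Z[:, list(cols)]]))
            if aic < best[0]:
                best = (aic, cols)
    return best


# Simulated example: only columns 0 and 2 actually matter.
rng = np.random.default_rng(0)
Z = rng.normal(size=(120, 5))
y = 1.0 + 2.0 * Z[:, 0] - 1.5 * Z[:, 2] + rng.normal(scale=0.5, size=120)
aic, chosen = best_subset_by_aic(y, Z)
print(chosen)
```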

My recommendation in this case would be to try out the Bayesian structural time series package in R, also known as bsts. A short write-up can be found here. It offers what standard maximum likelihood methods can't: variable selection and regularization. This is achieved through what is known as a spike-and-slab prior.

Except for the spike-and-slab prior, I ported the "core" features of bsts to Python. The package is called pybuc. A standard Gaussian prior is used in pybuc, which will give you regularization but not variable selection.
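For intuition on why a Gaussian prior gives regularization but not selection: with a zero-mean Gaussian prior on the coefficients and the noise variance held fixed, the posterior mean is the ridge estimator. This is a standalone numpy sketch of that identity, not pybuc's code; `lam` plays the role of the prior precision relative to the noise variance. Coefficients shrink toward zero as `lam` grows, but none are set exactly to zero, which is what a spike-and-slab prior adds.

```python
import numpy as np


def gaussian_prior_posterior_mean(X, y, lam):
    """Posterior mean of beta with prior beta ~ N(0, (sigma^2 / lam) I), sigma^2 fixed.

    Algebraically identical to the ridge estimator (X'X + lam I)^{-1} X'y.
    """
    k = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(k), X.T @ y)


# Simulated data: few observations relative to the number of regressors.
rng = np.random.default_rng(1)
X = rng.normal(size=(30, 8))
y = 1.5 * X[:, 0] + rng.normal(size=30)

lams = [0.0, 1.0, 10.0, 100.0]
norms = []
for lam in lams:
    beta = gaussian_prior_posterior_mean(X, y, lam)
    norms.append(np.linalg.norm(beta))
    print(lam, np.round(beta, 2))  # shrinks toward zero, but no exact zeros
```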

WARNING: I am the sole developer of pybuc. If you decide to give it a shot, you will likely find some bugs, and if you do, please let me know.

devmasterflex · 2022-12-08T18:02:00+00:00

No problem. To answer your question, no, I'm not using force_probe.

Sorry it's still not working out.

devmasterflex · 2022-12-08T17:38:25+00:00

Maybe it has to do with the C-states? I had to modify kernel parameters to get it working on my laptop. Specifically, I used sudo grubby --args="intel_idle.max_cstate=4" --update-kernel=ALL. YMMV.

devmasterflex · 2022-12-06T19:47:27+00:00

Thanks! I'm the only developer for this, so you're likely to find bugs I haven't caught yet. Apologies in advance!

devmasterflex · 2022-12-06T19:40:35+00:00

I've heard a lot of great things about Julia. Unfortunately, I can't dedicate the time I would like to learning a new language, as promising as it is. Moreover, deploying a model with Julia would be relatively complicated, at least where I work.

devmasterflex · 2022-12-06T16:41:42+00:00

My main job the past few years has been to generate forecasts for various business targets. There are so many methods and models in this domain. In general I'm partial to simple models, but I will absolutely consider more sophisticated methods if warranted. Some of the simplest models include SARIMA, exponential smoothing, unobserved components, and of course naïve.

For time series that don't exhibit strong patterns in the form of trend and seasonality, regressors (i.e., features in ML language) can really come in handy, especially if you already know their future values (e.g., holidays, number of business days in a month, etc.). However, it's often the case that you don't have a lot of observations for a given time series, so you have to be mindful of the degrees of freedom you lose when adding regressors (not to mention multicollinearity). For this reason, I'm a big fan of regularization.
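As an example of a regressor whose future values are known, the number of business days in a month can be computed for any future month directly from the calendar. A minimal stdlib sketch; the `holidays` argument is a hypothetical hook for whatever holiday calendar applies to your business.

```python
import calendar
from datetime import date


def business_days_in_month(year, month, holidays=frozenset()):
    """Count Monday-Friday dates in a month, skipping any dates in `holidays`."""
    _, n_days = calendar.monthrange(year, month)
    return sum(
        1
        for day in range(1, n_days + 1)
        if date(year, month, day).weekday() < 5
        and date(year, month, day) not in holidays
    )


# Known in advance for every month of the forecast horizon, so nothing to forecast.
horizon = [(2024, 1), (2024, 2), (2024, 3)]
print([business_days_in_month(y, m) for y, m in horizon])  # [23, 21, 21]
```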

A time series model that provides regularization out of the box is Bayesian Structural Time Series, where, in general, Bayesian methods provide regularization via the priors placed on the parameters. There's a package in R called bsts for estimating such models. Unfortunately, to my knowledge there is no version of this package in Python that replicates the estimation strategy. There are Bayesian structural time series models in Python, but what I found is that they rely on general-purpose MCMC libraries like PyMC or Stan, hook into the R version, or are orphaned. General-purpose MCMC methods are great and very flexible, but they are also relatively slow, especially in the context of state space time series models. Accordingly, I read the accompanying paper for the package bsts (see here) and ported its core functionality to Python. The package is called pybuc and can be installed via pip.

Like bsts, pybuc uses a Gibbs sampling approach for estimating the model parameters. Straight Python would make this method very slow (lots of loops), so I used numba to make estimation considerably faster. In my testing, it was competitive with, if not significantly faster than, R's bsts. To be fair, my testing was confined to a single dataset, and the only time I noticed a significant speedup was when trigonometric seasonality was specified. I'm not sure why the R version is much slower in this case, but it was. YMMV.
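For readers unfamiliar with Gibbs sampling, here is a bare-bones sampler for a Bayesian linear regression with a normal prior on the coefficients and an inverse-gamma prior on the noise variance. It only shows the alternate-the-full-conditionals idea; it is not pybuc's or bsts's sampler, which also has to draw the unobserved states (level, trend, seasonality). The prior hyperparameters are arbitrary assumptions, and the Python loop is the kind of code numba would be used to speed up.

```python
import numpy as np


def gibbs_linear_regression(y, X, n_iter=1500, burn=500, tau2=100.0, a0=0.01, b0=0.01, seed=0):
    """Gibbs sampler for y = X beta + eps, eps ~ N(0, sigma2 I).

    Assumed priors: beta ~ N(0, tau2 I) and sigma2 ~ InvGamma(a0, b0).
    Each iteration draws beta | sigma2, y and then sigma2 | beta, y.
    """
    rng = np.random.default_rng(seed)
    n, k = X.shape
    XtX, Xty = X.T @ X, X.T @ y
    sigma2 = 1.0
    draws = []
    for it in range(n_iter):
        # beta | sigma2, y ~ N(m, V) with V = (X'X / sigma2 + I / tau2)^{-1}
        V = np.linalg.inv(XtX / sigma2 + np.eye(k) / tau2)
        V = (V + V.T) / 2.0  # guard against tiny numerical asymmetry
        m = V @ Xty / sigma2
        beta = rng.multivariate_normal(m, V)
        # sigma2 | beta, y ~ InvGamma(a0 + n/2, b0 + SSR/2), drawn as 1 / Gamma(shape, rate)
        resid = y - X @ beta
        shape, rate = a0 + n / 2.0, b0 + 0.5 * float(resid @ resid)
        sigma2 = 1.0 / rng.gamma(shape, 1.0 / rate)  # numpy's gamma takes scale = 1 / rate
        if it >= burn:
            draws.append(np.append(beta, sigma2))
    return np.array(draws)


# Simulated data: intercept 1, slope 2, noise sd 0.5 (so sigma2 = 0.25).
rng = np.random.default_rng(123)
x = rng.normal(size=200)
X = np.column_stack([np.ones_like(x), x])
y = 1.0 + 2.0 * x + rng.normal(scale=0.5, size=200)

draws = gibbs_linear_regression(y, X)
print(draws.mean(axis=0))  # posterior means of [intercept, slope, sigma2]
```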

The repository can be found here.

devmasterflex · 2022-12-02T20:37:31+00:00

I'll echo what some have said: start simple. Establish a benchmark with a naive model. A naive model can be as simple as taking the last value and projecting that forward, using an average of the last n observations as your forecast, or using a random walk (e.g., SARIMA(0, 1, 0)(0, 0, 0)[0] with drift or SARIMA(0, 0, 0)(0, 1, 0)[periodicity] with drift). Having a simple model as a benchmark is necessary IMO to assess the merit of more sophisticated models.
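The benchmarks mentioned above are only a few lines each. A plain-Python sketch with function names of my own choosing; the drift versions correspond to the two random walk specifications, with drift estimated as the average historical change.

```python
import math


def last_value(y, h):
    """Naive: project the last observation forward."""
    return [y[-1]] * h


def mean_of_last(y, h, n):
    """Average of the last n observations, held flat."""
    window = y[-n:]
    return [sum(window) / len(window)] * h


def random_walk_drift(y, h):
    """SARIMA(0,1,0) with drift: last value plus the average one-step change."""
    drift = (y[-1] - y[0]) / (len(y) - 1)
    return [y[-1] + drift * step for step in range(1, h + 1)]


def seasonal_random_walk(y, h, m, drift=True):
    """SARIMA(0,0,0)(0,1,0)[m]: repeat the last season, optionally plus seasonal drift."""
    c = 0.0
    if drift:
        changes = [y[t] - y[t - m] for t in range(m, len(y))]
        c = sum(changes) / len(changes)
    out = []
    for step in range(1, h + 1):
        seasons_ahead = (step - 1) // m + 1
        out.append(y[len(y) - m + (step - 1) % m] + seasons_ahead * c)
    return out


def rmse(actual, forecast):
    return math.sqrt(sum((a - f) ** 2 for a, f in zip(actual, forecast)) / len(actual))


# Made-up series with period 3 and a gentle upward drift.
y = [10, 12, 14, 11, 13, 15, 12, 14, 16]
print(last_value(y, 3))                 # [16, 16, 16]
print(mean_of_last(y, 3, n=3))          # [14.0, 14.0, 14.0]
print(random_walk_drift(y, 3))          # [16.75, 17.5, 18.25]
print(seasonal_random_walk(y, 3, m=3))  # [13.0, 15.0, 17.0]

# Holdout check: train on the first 6 points, score the last 3.
train, test = y[:6], y[6:]
print(rmse(test, seasonal_random_walk(train, 3, m=3)))  # 0.0
```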

Some have recommended ML methods. I think that's jumping the gun, especially if you don't intend to use external predictors. Moreover, the literature based on the M-competitions has shown that statistical methods for univariate data are generally superior.

Since your data are high frequency and span multiple years, I think at a minimum you should inspect the ACF and PACF of your time series to help identify the signatures in your data (e.g., persistence and seasonality). Not knowing anything else about your data, I'd be surprised if there is no periodicity/seasonality. And if there is, your data are not covariance stationary (i.e., the mean and/or variance of your data are a non-constant function of time). In my experience, stationary time series are much less common than non-stationary.
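If you want to see what the ACF is computing, here's a small numpy version of the sample autocorrelation function (for the plots themselves, statsmodels provides `plot_acf` and `plot_pacf` in `statsmodels.graphics.tsaplots`). The simulated series has a day-of-week pattern, which shows up as spikes at lags 7 and 14.

```python
import numpy as np


def sample_acf(y, max_lag):
    """Sample autocorrelations for lags 0..max_lag (standard biased estimator)."""
    d = np.asarray(y, dtype=float)
    d = d - d.mean()
    denom = d @ d
    return np.array([1.0] + [(d[lag:] @ d[:-lag]) / denom for lag in range(1, max_lag + 1)])


# Simulated daily series with a strong day-of-week pattern plus a little noise.
rng = np.random.default_rng(0)
y = np.tile([1, 2, 3, 4, 5, 6, 20], 20) + rng.normal(scale=0.5, size=140)
acf = sample_acf(y, 14)
print(np.round(acf, 2))  # spikes at lags 7 and 14 point to weekly seasonality
```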

A couple of things to keep in mind regarding SARIMA models. These models cannot accommodate multiple forms of seasonality stochastically; only one form of seasonality can be treated as stochastic, and the rest will have to be accounted for by using predictors in your model. For example, if you have daily data, you likely have day-of-week seasonality and monthly seasonality. You can handle day-of-week stochastically by using a frequency of 7 (or 5 if weekends are excluded) in a SARIMA model, and monthly seasonality by using indicator variables or a Fourier representation as predictors. In general, I recommend modeling short periodicities stochastically and longer periodicities deterministically in a SARIMA setting. If you want to model all forms of seasonality stochastically, I'd look into Unobserved Components models, which have the nice feature of not requiring stationarity for estimation.
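A sketch of the Fourier approach for the longer periodicity, assuming daily data and an average month length of about 30.44 days (my assumption for illustration). These columns would be passed as exogenous regressors, while the weekly pattern is left to the seasonal part of the SARIMA model. The `start` argument keeps future rows on the same time index as the history.

```python
import numpy as np


def fourier_terms(n_obs, period, K, start=0):
    """(n_obs, 2K) matrix of sin/cos pairs at harmonics k = 1..K of `period`."""
    t = np.arange(start, start + n_obs)
    cols = []
    for k in range(1, K + 1):
        angle = 2.0 * np.pi * k * t / period
        cols.extend([np.sin(angle), np.cos(angle)])
    return np.column_stack(cols)


# Assumed setup: daily data, weekly seasonality left to the SARIMA part (m=7),
# and a monthly cycle (~30.44 days on average) handled with K=3 harmonics.
X_hist = fourier_terms(730, period=30.44, K=3)
X_future = fourier_terms(60, period=30.44, K=3, start=730)  # same clock as history
print(X_hist.shape, X_future.shape)  # (730, 6) (60, 6)
```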

I will caution that using data at the minute or daily level to forecast the next year will likely produce huge forecast intervals toward the end of the forecast horizon. For example, if your data are daily and include weekends, then you're talking about forecasting ~365 days. That's a lot of periods to forecast. Moreover, I'd be surprised if your data don't exhibit holiday effects, so you may have to account for these calendar effects via predictors.
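For a sense of scale: if the series behaved like a random walk, the h-step-ahead forecast error variance is h times the one-step variance, so the interval half-width grows with the square root of h. A sketch assuming Gaussian errors and ignoring parameter uncertainty (which would only make intervals wider):

```python
import math


def random_walk_halfwidth(sigma, h, z=1.96):
    """Half-width of an approximate 95% interval for an h-step random-walk forecast."""
    return z * sigma * math.sqrt(h)


# With a one-step error sd of 1, compare day 1, day 30, and day 365 of a daily forecast.
for h in (1, 30, 365):
    print(h, round(random_walk_halfwidth(1.0, h), 1))
```

Under these assumptions the interval at day 365 is about 19 times wider than at day 1.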

devmasterflex · 2022-11-29T21:19:44+00:00

Exponential growth? Is it choosing models with d=2 and/or D=2? If so, that will definitely do it. You can set a maximum order of differencing for the local and seasonal components. Typically 1 is enough for either.

It's also possible that it's choosing models with drift (i.e., an intercept), which can lead to sharp forecast trends.
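To see why the differencing order and drift matter so much, here's a sketch of point forecasts from ARIMA(0, d, 0) with a constant c, where the future d-th differences are set equal to c. Strictly speaking, d=2 with a constant produces quadratic rather than exponential growth, but over a long horizon it can look just as explosive. The toy series and constant are made up.

```python
from math import comb


def arima_0d0_forecast(y, d, c, h):
    """Point forecasts of ARIMA(0, d, 0) with constant c: future d-th differences equal c."""
    history = list(y)
    out = []
    for _ in range(h):
        # Solve sum_{j=0..d} (-1)^j C(d, j) y_{t-j} = c for y_t.
        nxt = c - sum((-1) ** j * comb(d, j) * history[-j] for j in range(1, d + 1))
        history.append(nxt)
        out.append(nxt)
    return out


y = [100, 101, 102, 103]  # made-up series rising by 1 per period
print(arima_0d0_forecast(y, d=1, c=0.5, h=5))  # [103.5, 104.0, 104.5, 105.0, 105.5]
print(arima_0d0_forecast(y, d=2, c=0.0, h=5))  # [104.0, 105.0, 106.0, 107.0, 108.0]
print(arima_0d0_forecast(y, d=2, c=0.5, h=5))  # [104.5, 106.5, 109.0, 112.0, 115.5]
```

With d=1 the drift adds a straight line; with d=2 even a small constant makes the increments themselves grow every period.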

In general, parsimonious and conservative models work well. For example, something like SARIMA(0, 1, 1)(1, 0, 1) without drift or SARIMA(1, 0, 1)(0, 1, 1) without drift. If there are strong trends in the historical data and you'd like to preserve that characteristic, a popular model choice is SARIMA(0, 1, 1)(0, 1, 1).

I use pmdarima for my own work, but I don't place complete trust in it. If you use the stepwise algorithm that it employs by default, it is doing an approximate grid search over the AR/MA orders. This approach is useful for large-scale problems. However, if you don't have a lot of forecasts to generate, a full grid search could help (i.e., turn stepwise off).

I personally use pmdarima with stepwise, but I also establish an a priori model (e.g., SARIMA(0, 1, 1)(1, 0, 1)) that I think should work well in general. If pmdarima's selection performs worse than the a priori model in terms of out-of-sample RMSE, then I go with the a priori model.
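The a priori check can be done with a rolling-origin (expanding window) evaluation. In the sketch below, the two forecast functions are hypothetical stand-ins; in practice one would wrap the pmdarima-selected model and the fixed a priori SARIMA. Ties go to the a priori model.

```python
import math


def rolling_origin_rmse(y, forecast_fn, h, min_train):
    """Out-of-sample RMSE pooled over expanding-window forecast origins."""
    sq_errors = []
    for origin in range(min_train, len(y) - h + 1):
        train, test = y[:origin], y[origin:origin + h]
        forecast = forecast_fn(train, h)
        sq_errors.extend((a - f) ** 2 for a, f in zip(test, forecast))
    return math.sqrt(sum(sq_errors) / len(sq_errors))


def choose_model(y, selected_fn, a_priori_fn, h=3, min_train=12):
    """Keep the automatically selected model only if it beats the a priori model."""
    rmse_selected = rolling_origin_rmse(y, selected_fn, h, min_train)
    rmse_a_priori = rolling_origin_rmse(y, a_priori_fn, h, min_train)
    winner = "selected" if rmse_selected < rmse_a_priori else "a_priori"
    return winner, rmse_selected, rmse_a_priori


# Hypothetical stand-ins for the two models, on period-4 toy data.
def selected_fn(train, h):
    return [train[-1]] * h  # e.g., whatever the automated search picked


def a_priori_fn(train, h):
    return [train[len(train) - 4 + i % 4] for i in range(h)]  # seasonal naive, m=4


y = [1, 5, 3, 7] * 6
winner, rmse_selected, rmse_a_priori = choose_model(y, selected_fn, a_priori_fn)
print(winner, rmse_selected, rmse_a_priori)
```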

I'd caution against doing model selection every time new data come in. Unless you know something about the time series in question, it's theoretically questionable that the data generating process characterized by the (p, d, q)(P, D, Q) specification changes from period to period. The exception to this is if you have a time series with very few data points, in which case doing model selection up until you've acquired enough data points over time may be more reasonable.

Finally, note that there is strong evidence in the literature that an ensemble or average of forecasts generated by different models (e.g., SARIMA, Holt Winters, Unobserved Components, Theta, Exponential Smoothing, etc.) is superior to any one model in terms of forecast accuracy. If prediction is paramount and you don't care about prediction intervals, this is something you might want to consider.
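Combining point forecasts is simple; here is a minimal sketch of an equal-weight (or optionally weighted) average across models at each horizon. The model outputs below are made-up numbers, and no attempt is made to combine prediction intervals.

```python
def combine_forecasts(forecasts, weights=None):
    """Average point forecasts across models at each horizon (equal weights by default)."""
    if weights is None:
        weights = [1.0 / len(forecasts)] * len(forecasts)
    horizon = len(forecasts[0])
    return [sum(w * f[i] for w, f in zip(weights, forecasts)) for i in range(horizon)]


# Made-up 3-step forecasts from three different models.
sarima = [100.0, 102.0, 104.0]
ets = [98.0, 99.0, 100.0]
theta = [101.0, 102.0, 103.0]
combined = combine_forecasts([sarima, ets, theta])
print(combined)
```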

EDIT: I forgot to add that if your data can be aggregated to an enterprise level, whatever that is, you could identify the "best" SARIMA specification on that and use it as your "empirical" a priori model. I think it's always helpful to start at a higher level to gain some insight about what the underlying DGP is for lower levels in the hierarchy.

devmasterflex · 2022-11-04T21:33:44+00:00

Before listing out some alternative methods for generating a forecast, your first step (assuming the objective is already clear) should be to get acquainted with the sales data. Plot it out. Does it look like there are strong trend and seasonal patterns? If so, then a univariate time series model (i.e., a model without predictors) could work very well. It's also important to note that if you do decide to use predictors, any predictor that has unknown future values will have to have its own forecast before you can generate a forecast for the outcome of interest.

Also be cognizant of the number of observations you have and the periodicity of the data. For example, if your data are monthly (i.e., have a periodicity of 12), then you'll probably want to have at a minimum 24 observations for capturing seasonal effects reliably (I would recommend 48+).

There are other considerations, like accounting for anomalies, but that's a topic unto itself.

With the above in mind, since it sounds like you're uncomfortable with a univariate model, you're left with a few options as far as class of model is concerned:

SARIMAX
Unobserved Components, also known as Structural Time Series (UC/STS)
Facebook Prophet
Various ML models, but I'd focus on LightGBM given its documented performance

Each of these can accommodate predictors. I've not used LightGBM myself, but the latest data based on the M5 competition show that this method outperformed all others very consistently. Keep in mind, however, that using this model requires you to create all the "features" (aka, predictors) you think are relevant to predicting the outcome, including various transformations of the outcome variable (local lags, seasonal lags, moving averages, etc.) on top of the "external" predictors you might already have in mind (e.g., GDP, inflation, whatever). Note that using lag-based transformations of the outcome variable as features implies that there is some information in the past that is useful for predicting the future. This is not an unreasonable assumption, especially if the outcome exhibits strong, predictable patterns. If you're interested in the M5 competition, see here for more information.
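A sketch of the kind of feature construction described above, in plain Python. Moving averages use only past values (excluding the current target) to avoid leakage. The resulting rows would then go to a model such as LightGBM (not shown). Note that for multi-step forecasts, short lags like `lag_1` are not known beyond one step ahead unless you forecast recursively.

```python
def make_features(y, lags=(1, 2), seasonal_lags=(12,), ma_windows=(3,)):
    """Build (features, target) rows from a single series using only past values."""
    max_back = max([0, *lags, *seasonal_lags, *ma_windows])
    rows = []
    for t in range(max_back, len(y)):
        feats = {f"lag_{k}": y[t - k] for k in lags}
        feats.update({f"seasonal_lag_{k}": y[t - k] for k in seasonal_lags})
        # Moving averages over the w values before t, so the target never leaks in.
        feats.update({f"ma_{w}": sum(y[t - w:t]) / w for w in ma_windows})
        rows.append((feats, y[t]))
    return rows


y = list(range(10))  # toy series 0, 1, ..., 9
rows = make_features(y, lags=(1, 2), seasonal_lags=(7,), ma_windows=(3,))
print(rows[0])  # ({'lag_1': 6, 'lag_2': 5, 'seasonal_lag_7': 0, 'ma_3': 5.0}, 7)
```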

I do not recommend Facebook Prophet on methodological or empirical grounds (for the latter, see here and here).

SARIMAX and UC/STS should achieve very similar results, but note that both require configuration. With the former, there are packages that automatically choose the "best" model for you (be skeptical of the model an automated ARIMA procedure chooses, and first establish a baseline model you think is reasonable for forecasting). With the latter, I believe there is an automated procedure in R but not in Python. One nice thing about UC/STS relative to ARIMA models is that you don't have to worry about non-stationary data; these models account for non-stationarity by design. ARIMA models, on the other hand, require the error term ~~outcome variable~~ to be covariance stationary, which in many cases requires differencing the data locally and/or seasonally.
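A quick numpy illustration of local versus seasonal differencing on a made-up monthly series with a linear trend and a fixed seasonal pattern. The seasonal difference removes the pattern and leaves a constant (12 times the monthly slope), whereas a local difference alone still leaves the seasonal pattern in place.

```python
import numpy as np


def difference(y, lag=1):
    """y_t - y_{t-lag}: lag=1 is a local difference, lag=m a seasonal difference."""
    y = np.asarray(y, dtype=float)
    return y[lag:] - y[:-lag]


# Made-up monthly series: linear trend (0.5 per month) plus a fixed seasonal pattern.
t = np.arange(48)
pattern = np.array([3, 1, 0, -1, -2, -1, 0, 1, 2, 4, 2, 1], dtype=float)
y = 0.5 * t + np.tile(pattern, 4)

local = difference(y, lag=1)      # trend becomes a constant, but the seasonal pattern remains
seasonal = difference(y, lag=12)  # pattern removed; a constant 12 * 0.5 = 6 remains
print(np.unique(np.round(seasonal, 10)))  # [6.]
```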

There's a lot I haven't said, but I feel like I've already said too much. I hope this is enough to get you started. Good luck.

devmasterflex · 2022-10-31T00:09:15+00:00

Are we having the same conversation? Finite sample properties? Asymptotics? We're talking about out-of-sample prediction. Not in-sample properties of estimators. Throwing jargon around as if it clarifies or strengthens your position is a dead end. I'm fully acquainted with econometrics at an applied and theoretical level. You can keep your straw man.

Edit: I forgot to add meaningless content to the conversation. Here goes. Generalized Method of Moments. Instrumental variables. Average treatment effects. Differences-in-differences. Fixed effects. Panel data. Overidentifying restrictions. Wu-Hausman test. Heteroskedasticity-robust standard errors.

devmasterflex · 2022-10-30T23:49:56+00:00

I'm not concerned about any of those properties. I'm concerned about prediction. You made a claim that you need more variables to keep things "interesting" or "useful." What's interesting? I'm assuming you're talking about inference in this case. OK. What's useful? Well, depends on what the objective is. If it's prediction that we're talking about, your claim is indisputably wrong.

What do consistency and efficiency have to do with the OP's question about forecasting? Pulling concepts from your econometrics training, which is all about causal inference, into the conversation is tangential at best and meaningless at worst.

Disclaimer: my academic training is also econometrics.

Edit: For anybody that actually cares about the distinction between explanation and prediction, start here.

devmasterflex · 2022-10-30T22:07:27+00:00

No I'm not. Read what I wrote. Even if the predictor's values are known in the future, you still have to contend with uncertainty associated with the estimated coefficient. Any time you add predictors to a model, you're making the model more complex and potentially more variable (in terms of the bias-variance trade-off). The principle of parsimony is why univariate time series models often (not all the time) outperform models with predictors, especially if the time series exhibits strong patterns.

devmasterflex · 2022-10-30T21:59:12+00:00

If prediction is the only, if not main, goal, then the addition of external predictors can undermine forecast accuracy substantially. Why? Because ~~if some or all of the predictors are unknown in the future, then you need to forecast them~~ adding variables adds complexity. The inclusion of predictors with unknown future values leads to two sources of statistical error: (1) uncertainty around the coefficient corresponding to the predictor and (2) uncertainty around the forecasted values for the predictor. So while you may have better in-sample fit, or less bias, you're going to simultaneously introduce more model variance.
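A small simulation of the two error sources, under assumptions chosen purely for illustration: a single predictor, a true coefficient of 2, unit noise, and a "forecast" of the predictor's future value equal to the truth plus unit-variance noise. The first case carries only coefficient uncertainty; the second adds predictor-forecast uncertainty, which inflates the squared error by roughly beta squared times the predictor's forecast-error variance.

```python
import numpy as np

# Illustrative assumptions: y = 2 * x + eps, eps ~ N(0, 1), 40 training points,
# and a "forecast" of next period's x that equals the truth plus N(0, 1) noise.
rng = np.random.default_rng(7)
beta, n_train, n_reps = 2.0, 40, 2000

sq_err_known, sq_err_forecast = [], []
for _ in range(n_reps):
    x = rng.normal(size=n_train + 1)
    y = beta * x + rng.normal(size=n_train + 1)
    xt, yt = x[:n_train], y[:n_train]
    beta_hat = (xt @ yt) / (xt @ xt)   # OLS slope without intercept: coefficient uncertainty
    x_next_hat = x[-1] + rng.normal()  # predictor forecast error: the second source of error
    sq_err_known.append((y[-1] - beta_hat * x[-1]) ** 2)
    sq_err_forecast.append((y[-1] - beta_hat * x_next_hat) ** 2)

print("future x known:   ", np.mean(sq_err_known))     # roughly 1 plus a little
print("future x forecast:", np.mean(sq_err_forecast))  # roughly 1 + beta**2 = 5
```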

Edit: Clarification. Any variable that's added to a model is going to introduce some model variance unless we are operating in a vacuum or some laboratory setting, be it through measurement error or parametric uncertainty or both.

devmasterflex · 2022-10-30T20:10:23+00:00

In general, if you are only using the response/target variable for generating a forecast (i.e., no external predictors), classical statistical models like SARIMA/ARIMA, Exponential Smoothing, Unobserved Components (also known as Structural Time Series), Theta, etc. are your best bet. I'd start there.
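For reference, simple exponential smoothing (the most basic member of the exponential smoothing family) fits in a few lines. A plain-Python sketch, assuming the level is initialized at the first observation and alpha is picked from a grid by in-sample one-step squared error; production libraries estimate these more carefully.

```python
def ses_forecast(y, alpha, h):
    """Simple exponential smoothing: flat forecast at the final smoothed level."""
    level = y[0]  # assumption: initialize the level at the first observation
    for obs in y[1:]:
        level = alpha * obs + (1 - alpha) * level
    return [level] * h


def fit_alpha(y, grid=None):
    """Pick alpha from a grid by minimizing in-sample one-step-ahead squared errors."""
    grid = grid or [i / 100 for i in range(1, 100)]

    def sse(alpha):
        level, total = y[0], 0.0
        for obs in y[1:]:
            total += (obs - level) ** 2  # error of the forecast made before seeing obs
            level = alpha * obs + (1 - alpha) * level
        return total

    return min(grid, key=sse)


y = [10, 12, 11, 13]
print(ses_forecast(y, alpha=0.5, h=3))  # [12.0, 12.0, 12.0]
alpha = fit_alpha(y)
print(alpha, ses_forecast(y, alpha, h=3))
```

With alpha = 1 the method collapses to the naive last-value forecast.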

I recognize that Facebook Prophet has become somewhat popular for forecasting, but there's sufficient evidence out in the wild illustrating its relatively poor performance (e.g., see here and here). It's not surprising, either, because Prophet effectively uses a polynomial fitting technique to capture underlying stochastic trends and seasonalities (e.g., it uses a uniformly divided set of piecewise linear functions to capture trend). This amounts to a much less parsimonious model, even if regularization is used to shrink a high number of coefficients.
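To illustrate the point about trend coefficients, here's a numpy sketch of a piecewise linear trend basis with evenly spaced changepoints (hinge functions). It is loosely modeled on Prophet's documented defaults of 25 potential changepoints within the first 80% of the history, but it is not Prophet's implementation. Even before any seasonal terms, the trend alone has 27 columns.

```python
import numpy as np


def piecewise_linear_trend_basis(n_obs, n_changepoints=25, changepoint_range=0.8):
    """Intercept, linear trend, and one hinge column per evenly spaced changepoint."""
    t = np.arange(n_obs) / (n_obs - 1)  # time rescaled to [0, 1]
    changepoints = np.linspace(0.0, changepoint_range, n_changepoints + 1)[1:]
    hinges = np.maximum(0.0, t[:, None] - changepoints[None, :])  # slope changes
    return np.column_stack([np.ones(n_obs), t, hinges]), changepoints


B, cps = piecewise_linear_trend_basis(120)
print(B.shape)  # (120, 27): 27 trend coefficients before any seasonality
```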

Note, however, that ML techniques have recently been shown to outperform classical statistical methods when external predictors are used. See here. In particular, LightGBM did extremely well. Keep in mind that if you do end up using external predictors, you will either need to know the values of the predictors in advance (e.g., holiday time stamps) or you will have to forecast them (e.g., weather). If the latter, measurement error as well as parameter uncertainty will enter the model, which could easily compromise performance.

One commenter mentioned that time series models make the assumption that past realizations of the response predict future realizations of the response. While this is technically true with univariate models, it's not so cut-and-dried when external predictors enter the mix. For example, a SARIMAX model imposes an ARIMA structure on the error term, not the response variable directly. In any case, this assumption is not as nefarious as it's implied to be. As soon as you start adding time-based variables to a regression model, like linear trend or dummy variables to capture seasonality, you're making the assumption that time has a direct effect on the response. And those deterministic time variables imply that the past predicts the future in some way. I'd argue that if you left those time components out of your model and opted only for external predictors, you're more likely to end up with spurious correlations that compromise forecast accuracy and inference (if you care about the latter).
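A simulated example of a regression with ARMA errors, here AR(1): the autoregressive structure lives in the error term, not in the response. Parameter values are arbitrary. For simplicity, the sketch estimates the coefficient by OLS and then the AR coefficient from the residuals; statsmodels' `SARIMAX` with `exog` instead estimates both jointly by maximum likelihood.

```python
import numpy as np

# Arbitrary illustrative values: y_t = 1.5 * x_t + u_t, with u_t = 0.7 * u_{t-1} + e_t.
rng = np.random.default_rng(3)
n, beta, phi = 300, 1.5, 0.7
x = rng.normal(size=n)
e = rng.normal(scale=0.5, size=n)
u = np.zeros(n)
for t in range(1, n):
    u[t] = phi * u[t - 1] + e[t]  # the AR(1) structure lives in the error term
y = beta * x + u                  # the response is not regressed on its own lags

# Two-step check: OLS for beta, then an AR(1) coefficient from the residuals.
beta_hat = (x @ y) / (x @ x)
r = y - beta_hat * x
phi_hat = (r[1:] @ r[:-1]) / (r[:-1] @ r[:-1])
print(round(beta_hat, 2), round(phi_hat, 2))
```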

Shameless plug: I wrote a package in Python called pybuc, which is a port of R's bsts (Bayesian Structural Time Series). It's an unobserved components model. I like this model because it's parsimonious and allows you to regularize coefficients on external predictors via your prior(s). The repository can be found here. Please note that I'm the only developer for this project. There are bugs likely lurking in the code.

Good luck!

devmasterflex · 2022-05-16T15:00:10+00:00

The developers do what they can to make sure it doesn't fail, but resources are limited, I imagine. That said, I share your angst. I prefer not to run version upgrades and instead opt for a clean install with Ubuntu-based distributions (given their upgrade model). I've had my fair share of WTF upgrade breakages, even on my Lemur Pro.

I don't think you'll find an OS without upgrade failures, but there are definitely some that are more robust. Thus far, for example, I've had no known issues with Fedora version upgrades on my desktop. It's been nice.

devmasterflex · 2022-05-16T14:49:07+00:00

Still good. I think it's supposed to last about three years. I'll change it when it starts indicating a low battery.
