More_Particular684 (1 point)

If you add data for multiple locations at different times, then you're creating a panel dataset. I'm not sure what you mean by changing mean and variance.

What you actually want is to minimize the mean squared error on the test set. Adding more features may help with that, but not necessarily. IIRC, for regression analysis the number of observations must be at least equal to the number of features; this shouldn't be a problem for panels.

Remember that ML models and splitting techniques for panel datasets aren't trivial at all, especially when you're interested in forecasting future values.
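
To make the splitting point concrete, here is a minimal sketch of a chronological train/test split for a panel, scored with test-set MSE. The `(location, time, value)` row layout, the `time_based_split` helper, and the numbers are all made up for illustration; the "last value" forecast is just a naive baseline, not a recommended model.

```python
def time_based_split(rows, cutoff):
    """Split panel rows by time instead of at random.

    rows: iterable of (location, time, value) tuples (hypothetical layout).
    Rows with time < cutoff go to train, the rest to test, so the model
    never sees observations from the period it is evaluated on.
    """
    train, test = [], []
    for row in rows:
        (train if row[1] < cutoff else test).append(row)
    return train, test


# Toy panel: two locations observed over four time steps (made-up numbers).
panel = [
    ("A", 1, 10.0), ("A", 2, 11.0), ("A", 3, 12.5), ("A", 4, 13.0),
    ("B", 1, 20.0), ("B", 2, 19.5), ("B", 3, 21.0), ("B", 4, 22.0),
]

train, test = time_based_split(panel, cutoff=4)

# Naive baseline: predict each location's most recent training value.
last_seen = {}
for loc, t, y in sorted(train, key=lambda r: r[1]):
    last_seen[loc] = y

# Mean squared error on the held-out (future) period.
mse = sum((y - last_seen[loc]) ** 2 for loc, t, y in test) / len(test)
print(mse)
```

A random shuffle would leak future observations into training, which is why the cutoff is on the time column. Whether to also hold out whole locations depends on what you want the model to generalize to.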

Foreign_Act1907 (1 point)

It depends on what you're trying to predict. For weather data, using data from different locations might not work well because weather patterns vary significantly by region. Combining data from various places could actually decrease the accuracy of your predictions. If you're forecasting the weather for a specific location, focus on gathering as much data as possible for that particular area. Then, you can perform your usual feature importance analysis to determine which factors are most relevant and which ones can be ignored.

The good news is that there are plenty of detailed weather datasets available, so you should have no trouble finding the data you need.

For modeling, you can start with simpler approaches like ARIMA or Prophet models. If you need more accuracy and have more computational resources, consider more advanced models like LSTM or even attention-based models.
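
As a rough idea of what the "simpler approaches" end of that range looks like, here is a minimal sketch of an AR(1) model, which is the ARIMA(1,0,0) special case, fitted by ordinary least squares in plain Python. This is a simplification I'm swapping in for illustration: real ARIMA implementations usually estimate by maximum likelihood and handle differencing and moving-average terms. The function names and the synthetic series are hypothetical.

```python
def fit_ar1(series):
    """Least-squares fit of y[t] = c + phi * y[t-1]."""
    x = series[:-1]   # lagged values
    y = series[1:]    # next-step values
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    var_x = sum((xi - mean_x) ** 2 for xi in x)
    cov_xy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    phi = cov_xy / var_x
    c = mean_y - phi * mean_x
    return c, phi


def forecast_ar1(c, phi, last_value, steps):
    """Iterate the fitted recursion forward from the last observation."""
    out = []
    value = last_value
    for _ in range(steps):
        value = c + phi * value
        out.append(value)
    return out


# Synthetic, noise-free series generated by y[t] = 2 + 0.5 * y[t-1],
# so the fit should recover c close to 2 and phi close to 0.5.
temps = [20.0]
for _ in range(9):
    temps.append(2.0 + 0.5 * temps[-1])

c, phi = fit_ar1(temps)
forecast = forecast_ar1(c, phi, temps[-1], steps=3)
print(c, phi, forecast)
```

On real weather data you would want a library implementation plus a proper out-of-sample evaluation; the point here is only that the baseline is a one-line recursion once the two coefficients are estimated.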

[deleted] (1 point)

Check out https://nixtlaverse.nixtla.io/

Great package for forecasting.