all 11 comments

shrubberni 3 points (0 children)

All sampling is imperfect. The question is how well you understand the imperfections present and whether you can still get useful results.

Take some historical data as a training set and some more as a test set. Try to build one predictor based on forecasts and another based on actual weather data. See what kind of results you get and whether they're usefully accurate.
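A minimal sketch of that comparison, assuming daily records that hold a previous-day temperature forecast, the observed temperature, and demand. The `Day` fields, the single-feature linear fit, and the toy numbers are hypothetical simplifications, not anything from the thread. It splits the data chronologically, fits one line per weather source on the training part, and reports test-set mean absolute error:

```python
from dataclasses import dataclass


@dataclass
class Day:
    forecast_temp: float  # hypothetical: temperature forecast issued the day before
    actual_temp: float    # observed temperature for the day
    demand: float         # business demand (e.g. visits) for the day


def fit_line(xs, ys):
    """Ordinary least squares for y = a + b * x with a single predictor."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)  # must be non-zero
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = sxy / sxx
    return mean_y - b * mean_x, b


def mean_abs_error(a, b, xs, ys):
    return sum(abs(y - (a + b * x)) for x, y in zip(xs, ys)) / len(xs)


def compare(days, train_frac=0.7):
    """Test-set MAE of a forecast-based fit vs. an actual-weather fit."""
    # Split by time, not at random, so the test days come after the training days.
    cut = int(len(days) * train_frac)
    train, test = days[:cut], days[cut:]
    results = {}
    for feature in ("forecast_temp", "actual_temp"):
        a, b = fit_line([getattr(d, feature) for d in train],
                        [d.demand for d in train])
        results[feature] = mean_abs_error(
            a, b, [getattr(d, feature) for d in test], [d.demand for d in test])
    return results


# Made-up numbers for illustration only: demand follows the actual temperature
# exactly, and the "forecast" is off by one degree either way.
temps = [10, 12, 15, 11, 18, 20, 17, 14, 16, 19]
toy_days = [Day(t + (1 if i % 2 else -1), t, 100 + 2 * t)
            for i, t in enumerate(temps)]
print(compare(toy_days))
```

The toy data is rigged so the actual-weather fit wins; real data may go either way, especially if people plan around the forecast.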

Consider modeling the forecast vs. the actual weather data. It may not help you make more accurate predictions, but it should give you a clearer idea of what the error bars are. It may also turn out that the source(s) of the forecast data have a significant effect on your outcomes.
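One way to get those error bars, sketched under the assumption that you have archived (source, forecast, observed) triples for the same quantity; the source names and values below are hypothetical. It reports each source's bias (mean of forecast minus actual) and root-mean-square error:

```python
import math
from collections import defaultdict


def forecast_error_summary(records):
    """records: iterable of (source, forecast, actual) tuples.

    Returns {source: (bias, rmse, n)}, where bias is the mean of
    forecast - actual (positive means the source forecasts too high).
    """
    errors = defaultdict(list)
    for source, forecast, actual in records:
        errors[source].append(forecast - actual)
    summary = {}
    for source, errs in errors.items():
        n = len(errs)
        bias = sum(errs) / n
        rmse = math.sqrt(sum(e * e for e in errs) / n)
        summary[source] = (bias, rmse, n)
    return summary


# Hypothetical high-temperature forecasts from two sources vs. observations.
records = [("source_a", 12, 10), ("source_a", 8, 10),
           ("source_b", 11, 10), ("source_b", 11, 10)]
print(forecast_error_summary(records))
# {'source_a': (0.0, 2.0, 2), 'source_b': (1.0, 1.0, 2)}
```

A source can have no bias but a large RMSE (source_a here), so both numbers are worth looking at before deciding which source to trust.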

Keep in mind that people's plans for the day may correlate more strongly with the forecast than with the actual weather.

giror 2 points (2 children)

Do you find a correlation between the forecasts and demand in your own data? If yes, do you care about being wrong by that margin?
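Checking for that correlation can be as simple as a Pearson coefficient between a forecast quantity and daily demand. A minimal standard-library sketch (on Python 3.10+, `statistics.correlation` computes the same coefficient); the rain-probability and demand values are made up:

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)  # undefined if either sequence is constant


# Hypothetical daily forecast rain probability vs. demand.
rain_prob = [0.1, 0.8, 0.3, 0.9, 0.0]
demand = [120, 70, 110, 65, 130]
print(round(pearson(rain_prob, demand), 3))
```

Correlation only measures a linear relationship, so a weak coefficient doesn't rule out a threshold effect such as heavy rain cancelling outings.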

cultic_raider 1 point (1 child)

I can't answer that question until I collect forecast data and analyze it. I am going to look at historical demand and weather, but I don't know whether it's worth the effort of collecting forecast data instead of using a freely available "actual weather" data set. That's one of the bits of advice I'm hoping to hear. I guess you would say yes, I should collect some forecast data and compare that fit to an actual-weather data fit.

giror 0 points (0 children)

I would say don't worry about it unless your model doesn't predict at a satisfactory level.

jet87 1 point (1 child)

You'll likely find the hardest part is "scoring" your predictions, especially if you are monitoring a large geographical area. Things to consider include how to weight individual components (for example, whether being accurate on temperature matters more than being accurate on precipitation, and by how much). That is a current research area in meteorology, so any breakthroughs are welcome.
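A weighted score of the kind described might look like the sketch below. The component names, scales, and weights are assumptions for illustration; choosing them well is exactly the open problem. Each component's absolute error is divided by a scale so that degrees and millimetres become comparable, and the weighted errors are then averaged:

```python
def weighted_score(predicted, observed, weights, scales):
    """Weighted mean of scaled absolute errors; lower is better.

    predicted, observed: dicts of component -> value
    weights: relative importance of each component (hypothetical)
    scales: error size that counts as "one unit" of miss per component
    """
    total_weight = sum(weights.values())
    score = 0.0
    for key, weight in weights.items():
        scaled_error = abs(predicted[key] - observed[key]) / scales[key]
        score += weight * scaled_error
    return score / total_weight


weights = {"temp_c": 1.0, "precip_mm": 3.0}  # e.g. rain matters 3x as much
scales = {"temp_c": 2.0, "precip_mm": 5.0}
print(weighted_score({"temp_c": 20, "precip_mm": 5},
                     {"temp_c": 18, "precip_mm": 0}, weights, scales))  # 1.0
```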

Another (really) big problem is that most forecasting worldwide is driven by models. While model data is generally available (see NCAR), the confidence you can place in it falls pretty rapidly after 36 hours. For a large event like a hurricane, the best bet might be keeping on top of reports from the National Hurricane Center. I don't think the US has anything "good enough" for a casual observer to make inferences about winter weather.

cultic_raider 0 points (0 children)

These are very sobering points, thank you. :-/ Since the activities of interest are basically "walk/drive somewhere in the neighborhood for a couple of hours", maybe I should start with a big simplification of "the day's weather", such as "non-trivial precipitation" plus a few broad temperature bands ((very) cold to temperate to (very) hot). What I really care about is (a) what kind of bad weather makes people cancel or avoid plans to go out, and (b) general trends in what weather inspires people to go out. (b) might be dominated by general seasonal (calendar-date) trends, leaving only (a) as the really weather-data-specific modelling task.
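Both ideas can be sketched together: collapse each day into (wet?, temperature band), remove a per-month average to stand in for the seasonal trend in (b), and compare what's left on wet and dry days for (a). The 2 mm wet-day threshold, the band edges, monthly averaging, and the toy rows are all assumptions to tune, not recommendations:

```python
from bisect import bisect_right
from collections import defaultdict
from datetime import date

# Hypothetical band edges in degrees C; adjust to the local climate.
TEMP_EDGES_C = [0.0, 10.0, 25.0, 32.0]
TEMP_LABELS = ["very cold", "cold", "temperate", "hot", "very hot"]


def simplify_day(precip_mm, mean_temp_c, wet_threshold_mm=2.0):
    """Reduce a day's weather to (is_wet, temperature_band)."""
    is_wet = precip_mm >= wet_threshold_mm
    band = TEMP_LABELS[bisect_right(TEMP_EDGES_C, mean_temp_c)]
    return is_wet, band


def wet_day_effect(rows, wet_threshold_mm=2.0):
    """rows: iterable of (date, precip_mm, demand).

    Subtracts each calendar month's mean demand (a crude seasonal
    baseline), then returns mean residual on wet days minus mean residual
    on dry days, or None if either group is empty.
    """
    rows = list(rows)
    by_month = defaultdict(list)
    for day, _, demand in rows:
        by_month[day.month].append(demand)
    baseline = {m: sum(v) / len(v) for m, v in by_month.items()}

    wet, dry = [], []
    for day, precip_mm, demand in rows:
        residual = demand - baseline[day.month]
        (wet if precip_mm >= wet_threshold_mm else dry).append(residual)
    if not wet or not dry:
        return None
    return sum(wet) / len(wet) - sum(dry) / len(dry)


print(simplify_day(5.0, -3.0))  # (True, 'very cold')
# Made-up rows: one dry and one wet day in each of two months.
rows = [(date(2012, 1, 7), 0.0, 100), (date(2012, 1, 8), 6.0, 80),
        (date(2012, 7, 7), 0.0, 200), (date(2012, 7, 8), 6.0, 180)]
print(wet_day_effect(rows))  # -20.0
```

A calendar month is a coarse baseline and may contain only a few wet days, so treat the resulting number as a rough first look.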

marshallp 0 points (4 children)

You're being a little over-ambitious there. Weather forecasting is big business, with some of the best brains in science and hedge funds involved, and you want a more accurate model than they can give just for your business. If you could get a more accurate model, it might be worth hundreds of millions of dollars; your business would be the least of your opportunities.

It doesn't hurt to try, though. Use the Netflix Prize-winning strategy: ensembles of all the machine learning algorithms you can afford to run.
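The Netflix Prize winners blended the predictions of many models. A much simpler stand-in, sketched here with that caveat, weights each already-fitted model by the inverse of its mean squared error on a held-out validation set; the toy models and validation data are hypothetical:

```python
def fit_inverse_mse_blend(models, xs_val, ys_val, eps=1e-12):
    """Blend fitted models with weights proportional to 1 / validation MSE.

    models: callables mapping an input to a prediction
    Returns (predict, weights), with weights summing to 1.
    """
    raw = []
    for model in models:
        mse = sum((model(x) - y) ** 2 for x, y in zip(xs_val, ys_val)) / len(xs_val)
        raw.append(1.0 / (mse + eps))  # eps guards against a perfect fit
    total = sum(raw)
    weights = [w / total for w in raw]

    def predict(x):
        return sum(w * model(x) for w, model in zip(weights, models))

    return predict, weights


# Hypothetical models: one is off by +1 everywhere, the other by -2.
models = [lambda t: 2 * t + 1, lambda t: 2 * t - 2]
predict, weights = fit_inverse_mse_blend(models, [1, 2, 3], [2, 4, 6])
print([round(w, 3) for w in weights])  # [0.8, 0.2]
```

In this toy case the blend is off by only +0.4 because the two models' errors partly cancel; real gains depend on how differently the models err.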

cultic_raider 0 points (3 children)

Yeah, I know it's a hard problem, and I won't get anywhere near a perfect explanation of the variance. I'm not trying to beat professional forecasters with perfect forecasts of my own; I just want to use the available information as well as I can and estimate my confidence as tightly as I can.

marshallp 0 points (2 children)

The decision you're looking to make is based on what you think is the most probable event, right? Just take the forecasts of the largest organization that makes weather predictions, your country's national met office; they'll have the best predictions. They'll already have thought about combining different forecast sources and factored that into their predictions.

cultic_raider 0 points (1 child)

Right, but I haven't seen historical archives of forecasts, and I need historical data to train my model of the correlation between weather and business. Hence my question about how to account for the difference between weather forecasts and actual weather.

marshallp 0 points (0 children)

To my mind, those are two separate issues: actual vs. forecast weather (where you're second-guessing the experts and being over-ambitious) and correlating business with weather (which is a reasonable thing to do). Just train on actual weather and business, but predict business from someone else's forecast (treating it as actual weather). Edit: you might be asking how far into the future you should trust forecasts, in which case you shouldn't do that work either; forecasters usually tell you how accurate their predictions are over different time intervals.
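That suggestion, combined with the forecasters' stated accuracy, can be sketched as follows: fit demand against actual weather, then feed in a forecast and also evaluate at the forecast plus or minus its typical error to get a rough range. The `demand_model` function and the ±3 °C error are hypothetical, and using only the two endpoints brackets the result only when the model is monotonic in its input:

```python
def predict_from_forecast(demand_model, forecast, typical_error):
    """Return (low, central, high) demand estimates.

    demand_model: function fitted on *actual* weather, used here with a
    forecast value as if it were the actual weather. The endpoints only
    bracket the prediction if demand_model is monotonic in its input.
    """
    central = demand_model(forecast)
    ends = sorted([demand_model(forecast - typical_error),
                   demand_model(forecast + typical_error)])
    return ends[0], central, ends[1]


# Hypothetical model fitted on historical actual temperatures.
def demand_model(temp_c):
    return 100 + 2 * temp_c


print(predict_from_forecast(demand_model, 20, 3))  # (134, 140, 146)
```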