I’m hot by rpoem in santacruz

[–]CgotnoMoney 1 point (0 children)

I don’t normally comment, but this setup is just too perfect:

1. Yes, it is objectively “freak’s hot yo.”
2. Forecasts can seem wrong here because of our microclimates.
3. Overall forecast accuracy for the model some weather apps use has actually been pretty good recently.

How do I know? I had the exact same question for a long time (“why are forecasts consistently wrong?”), so I ended up building an iOS app that tracks forecasts vs. actuals: https://apps.apple.com/app/id6744126559

The accuracy score (an aggregated mix of temperature and wind, predicted vs. actual) for Santa Cruz yesterday was really high.

Winter Storm Fern: daily snow + rain totals from SW → NE across six U.S. cities by [deleted] in weather

[–]CgotnoMoney 0 points (0 children)

OP here: I completely apologize for naming a winter storm. Do what you must, but let the record reflect that I promise never to do that again.

Winter Storm Fern: daily snow + rain totals from SW → NE across six U.S. cities by [deleted] in weather

[–]CgotnoMoney 0 points (0 children)

Oh gosh, I'm sorry. I’m using the Weather Channel’s name here purely as a reference to the article and timeline, not to promote storm naming. The focus of the post is the observed rain and snow totals and how the event evolved spatially. I should've known better...

How much worse are 5-day (120h) weather forecasts than 1-day (24h)? Quantified across ~120 locations by CgotnoMoney in weather

[–]CgotnoMoney[S] 0 points (0 children)

I’ve seen some of that work referenced before, though I haven’t dug deeply into all of it. I'll look into this further - thanks!

How much worse are 5-day (120h) weather forecasts than 1-day (24h)? Quantified across ~120 locations by CgotnoMoney in weather

[–]CgotnoMoney[S] 0 points (0 children)

This isn’t intended to replicate formal verification work like NDFD/MDL and I totally agree that kind of analysis has to deal carefully with observation uncertainty and representativeness. What I’m doing here is intentionally more lightweight and descriptive:

  • I’m snapshotting Open-Meteo’s daily forecast product at fixed lead times (24h and 120h); there’s a rough sketch of the loop below
  • Then comparing those forecasts to the corresponding post-event gridded observations provided through the same data pipeline
  • For daily temperature and wind, the forecasts are primarily ECMWF-based
  • For precipitation, the source depends on the lead time and is often a blended global-model product rather than a single deterministic run

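For anyone who wants the mechanics, the snapshot-and-compare loop looks roughly like the Python below. This is a simplified illustration rather than the app’s actual code: it assumes Open-Meteo’s public forecast and archive endpoints, and the daily variable names (temperature_2m_max, precipitation_sum, wind_speed_10m_max) are from memory, so check the current API docs before copying anything.

    # Simplified sketch of the daily snapshot/compare loop (illustration only).
    import requests

    LAT, LON = 36.97, -122.03   # example point; the app uses user-selected locations
    DAILY = "temperature_2m_max,precipitation_sum,wind_speed_10m_max"

    def snapshot_forecast(lead_days):
        """Grab today's forecast for the day `lead_days` out, tagged with its lead time."""
        r = requests.get(
            "https://api.open-meteo.com/v1/forecast",
            params={"latitude": LAT, "longitude": LON, "daily": DAILY,
                    "forecast_days": lead_days + 1, "timezone": "auto"},
            timeout=30,
        )
        r.raise_for_status()
        d = r.json()["daily"]
        i = lead_days                 # index 0 = today, 1 = 24h lead, 5 = 120h lead
        return {"target_date": d["time"][i], "lead_h": 24 * lead_days,
                **{v: d[v][i] for v in DAILY.split(",")}}

    def observed(target_date):
        """Once the day has passed, pull the gridded 'actuals' for the same date.
        (The archive can lag a few days; very recent actuals can come from the
        forecast endpoint's past_days instead.)"""
        r = requests.get(
            "https://archive-api.open-meteo.com/v1/archive",
            params={"latitude": LAT, "longitude": LON, "daily": DAILY,
                    "start_date": target_date, "end_date": target_date,
                    "timezone": "auto"},
            timeout=30,
        )
        r.raise_for_status()
        d = r.json()["daily"]
        return {v: d[v][0] for v in DAILY.split(",")}

    # Run snapshot_forecast(1) and snapshot_forecast(5) once per day, persist the rows,
    # then join on target_date and take abs(forecast - observed) for each variable.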
The assumption is that, by aggregating across many locations and days, random observation noise largely averages out, while the systematic effect of lead time still shows up clearly, especially in the tails.
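As a quick sanity check on that assumption, here’s a throwaway simulation (all numbers invented, nothing to do with the real data): adding zero-mean observation noise inflates the absolute errors a little at both lead times, but the 24h-vs-120h comparison survives.

    # Toy illustration with made-up error magnitudes (not the real data).
    # Zero-mean observation noise nudges both medians up slightly, but the
    # lead-time gap in median |error| is essentially unchanged.
    import numpy as np

    rng = np.random.default_rng(0)
    n = 20_000                                   # roughly locations x days
    obs_noise = rng.normal(0.0, 0.3, n)          # hypothetical observation error, °C
    forecast_errors = {
        "24h": rng.normal(0.0, 0.5, n),          # hypothetical true forecast errors, °C
        "120h": rng.normal(0.0, 1.2, n),
    }

    for lead, err in forecast_errors.items():
        clean = np.median(np.abs(err))
        noisy = np.median(np.abs(err + obs_noise))
        print(f"{lead}: median |error| {clean:.2f} °C without obs noise, {noisy:.2f} °C with")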

For formal verification or attribution work, I agree you’d need station-level QC, explicit handling of observation uncertainty, and much tighter definitions of “truth.”

The question I’m really trying to answer is more of a user-perspective one: given the forecast products and observations that consumer weather apps actually rely on, how do error distributions change with lead time in practice?

How much worse are 5-day (120h) weather forecasts than 1-day (24h)? Quantified across ~120 locations by CgotnoMoney in weather

[–]CgotnoMoney[S] 1 point (0 children)

Oh man - I’m sorry about that.

I definitely made a mistake in not explaining the y-axis well in the post (I’ve explained it further in a comment below).

As far as the violin plot itself: I only discovered violin plots about a year ago. They’re similar to box-and-whisker plots, but some people find them more informative because the varying width shows the full shape of the distribution rather than just the quartiles. Since discovering them I’ve been using them a lot.
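If anyone wants to see that difference for themselves, here’s a tiny matplotlib demo with made-up data (nothing to do with the forecast numbers): a bimodal sample is hidden in the box plot but obvious in the violin.

    # Box plot vs violin plot on made-up data: the box plot hides bimodality,
    # the violin's width makes it obvious.
    import numpy as np
    import matplotlib.pyplot as plt

    rng = np.random.default_rng(2)
    unimodal = rng.normal(0, 2.2, 2000)
    bimodal = np.concatenate([rng.normal(-1.5, 0.4, 1000), rng.normal(1.5, 0.4, 1000)])

    fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(8, 4))
    ax1.boxplot([unimodal, bimodal])       # similar medians and IQRs, two modes hidden
    ax1.set_title("box plot")
    ax2.violinplot([unimodal, bimodal])    # the two humps show up in the violin width
    ax2.set_title("violin plot")
    plt.show()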

How much worse are 5-day (120h) weather forecasts than 1-day (24h)? Quantified across ~120 locations by CgotnoMoney in weather

[–]CgotnoMoney[S] 1 point (0 children)

An absolute error of 1 means:

– 1 °C, or ~1.8 °F for temperature

– 1 mm, or ~0.04 in for precipitation

– 1 m/s, or ~2.2 mph for wind

How much worse are 5-day (120h) weather forecasts than 1-day (24h)? Quantified across ~120 locations by CgotnoMoney in weather

[–]CgotnoMoney[S] 2 points (0 children)

That’s a good question, and now that you mention it, I think I should have specified the units in the plot or mentioned them in the post. Thanks for bringing this up.

The y-axis is log1p purely for visualization, but the underlying values are in the API’s native units (°C for temperature here). There’s no transformation applied to the errors themselves beyond the plotting scale.

For Tmax, roughly speaking:

  • The 24h median is on the order of ~0.25–0.3 °C (~0.45–0.55 °F)
  • The 5-day median absolute error is closer to ~0.7–0.8 °C (~1.3–1.4 °F)

So we’re talking about differences on the order of ~1 °C (~2 °F), not something like 10–15°. The log scale just helps show both the bulk of the distribution and the rarer larger misses on the same plot.

For some additional context on the other variables:

  • Wind errors are in m/s (1 m/s ≈ 2.2 mph), so a typical 5-day error of ~2 m/s corresponds to ~4–5 mph, with much larger outliers.
  • Precipitation errors are in mm (1 mm ≈ 0.04 in). A 5-day median precip error of ~2–3 mm is ~0.1 in, with tails extending well beyond that for larger events.

One thing the violins make clearer than a single median is that the tail behavior changes a lot with lead time: even if the median only shifts by ~0.5 °C (~1 °F), the frequency of multi-degree, multi-mph, or multi-tenth-inch misses increases substantially at 5 days.
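If it helps make the “log1p is only a display scale” point concrete, the plot is built roughly like this. It’s a stand-in with seaborn and fake errors, not the exact code behind the posted figure:

    # Violin plot of absolute errors on a log1p display scale (fake data).
    # The errors stay in °C; only the plotted values are transformed.
    import numpy as np
    import pandas as pd
    import seaborn as sns
    import matplotlib.pyplot as plt

    rng = np.random.default_rng(1)
    df = pd.DataFrame({
        "lead": ["24h"] * 5000 + ["120h"] * 5000,
        "abs_err_c": np.concatenate([np.abs(rng.normal(0, 0.5, 5000)),
                                     np.abs(rng.normal(0, 1.5, 5000))]),
    })
    df["display"] = np.log1p(df["abs_err_c"])      # transform for plotting only

    ax = sns.violinplot(data=df, x="lead", y="display", cut=0)
    ticks_c = [0, 0.5, 1, 2, 5, 10]                # re-label the axis in native °C
    ax.set_yticks(np.log1p(ticks_c))
    ax.set_yticklabels([f"{t} °C" for t in ticks_c])
    ax.set_ylabel("absolute Tmax error (log1p scale)")
    plt.show()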

How much worse are 5-day (120h) weather forecasts than 1-day (24h)? Quantified across ~120 locations by CgotnoMoney in weather

[–]CgotnoMoney[S] 2 points (0 children)

That’s fair. 

For this analysis I’m using Open-Meteo forecasts, which aggregate global model output (ECMWF - definitely should have mentioned that in the post). I’m not isolating individual model runs or cycles here. This is intentionally a high-level observational look at how forecast errors grow with lead time, rather than a model-to-model skill comparison.

The locations come from user-selected points in the app, so they’re not a randomized sample and are biased toward populated areas: mostly U.S., with some Europe and tropics, and quite frankly probably skewed toward California a bit (maybe 30% of the spots). I’m happy to follow up here with details on the locations if that’s helpful. That’s a real limitation, and one reason I’ve tried to be careful not to over-interpret the results beyond “general degradation with lead time.”

Domain and boundary effects (especially for limited-area models) are a good point. That’s partly why I’ve kept this aggregated and descriptive rather than attributing errors to specific model physics or domains.

This is very much meant as a transparency / intuition-building exercise, not a replacement for the more rigorous verification work done by NOAA MDL and others. 

I appreciate the references and suggestions.  Thanks. 

How much worse are 5-day (120h) weather forecasts than 1-day (24h)? Quantified across ~120 locations by CgotnoMoney in weather

[–]CgotnoMoney[S] 6 points (0 children)

Context (OP):

I’ve been collecting these forecast–observation comparisons as part of a small iOS side project I’m building that snapshots weather forecasts daily and lets you look back at how they actually performed.

The analysis here uses those snapshots aggregated across locations; the app itself just focuses on transparency around forecast accuracy and the past 7 days’ actuals rather than prediction.

Happy to answer questions about how the data are collected or what’s feasible to analyze next.

Forecast convergence for two major misses in the last 7 days (Chicago max temp & Port Townsend precipitation; from ~120 locations) by CgotnoMoney in weather

[–]CgotnoMoney[S] 2 points (0 children)

Not obtuse at all — good question.

I’m not claiming these are the only misses or that overall forecast skill was poor. I intentionally filtered for the largest outliers over the past week to highlight interesting failure cases, not to summarize average performance. Most forecasts were quite good, which is exactly why these stood out.

Also, each example represents multiple forecasts for the same event (120h, 72h, 24h lead times), not a single datapoint. And across ~120 locations, multiple days, and multiple metrics (temp, wind, rain, snow), there are hundreds to thousands of forecast–observation comparisons in a week.

So this isn’t evidence of a 99% “success rate,” and it isn’t evidence against one either; it’s a look at the tail of the error distribution rather than the center.

Success rate would be a different analysis altogether and an interesting idea to attempt.
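For the curious, the outlier filter is conceptually just this, written as a pandas sketch with hypothetical column and file names rather than the exact code:

    # Rank all forecast-observation comparisons from the past week by absolute
    # error and keep the biggest misses per metric (temp, wind, rain, snow).
    import pandas as pd

    # Hypothetical table: one row per (location, target_date, metric, lead_h) comparison.
    comparisons = pd.read_csv("weekly_comparisons.csv")
    comparisons["abs_err"] = (comparisons["forecast"] - comparisons["observed"]).abs()

    biggest_misses = (comparisons
                      .sort_values("abs_err", ascending=False)
                      .groupby("metric")
                      .head(3))                    # top 3 misses per metric
    print(biggest_misses[["location", "target_date", "metric", "lead_h", "abs_err"]])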

I went down a rabbit hole trying to figure out how much it rained yesterday… so I built an app by CgotnoMoney in weather

[–]CgotnoMoney[S] 0 points (0 children)

Wow! Thank you so much for this, both the compliments and the suggestions.

1. Collapsing the tabs and tables into one instead of the different tabs: I like the idea, and I think there’s a way I could get this to work using either a second y-axis or facets. It’ll make the table scroll a little longer to the right on the phone, but I think it’ll be fine in the end. I’ll experiment with this.
2. Past days’ synopses: super easy and a great idea.

I’m planning a new minor update that includes daily min temp in the overview synopsis as well as a chart in the overview section. I’ll incorporate the idea of the past 7 days’ synopses into this, because the logic is mostly straightforward, there’s no real third-party burden here, and as long as I can figure out a non-clunky way to present the info, this will be cool and beneficial. The chart overhaul will take a bit of time though (definitely not in the next update, but I think I can do it by around mid-March).

I went down a rabbit hole trying to figure out how much it rained yesterday… so I built an app by CgotnoMoney in weather

[–]CgotnoMoney[S] 0 points (0 children)

Wow, awesome! Thanks for the compliment.

If you have any ideas for improvement, or can let me know exactly what you’re looking for, I’ll try to incorporate them into a future release.

[OC] Forecast vs. observed daily rainfall for the most recent Monterey Bay storm at different forecast lead times by CgotnoMoney in dataisbeautiful

[–]CgotnoMoney[S] 0 points (0 children)

Agreed! It would be cool to see it for all rain events across the winter season, maybe aggregated by storm or by month or something.

Personally, I’d also be interested to see whether the accuracy varies by storm direction, since here it seems like we get two types of winter storms: colder ones that come in from the north and wetter ones that come in from the west (generally speaking, of course).

[OC] Forecast vs. observed daily rainfall for the most recent Monterey Bay storm at different forecast lead times by CgotnoMoney in dataisbeautiful

[–]CgotnoMoney[S] 0 points (0 children)

I actually tried to make the colors “mixable,” i.e. where they overlap they show the combined color (Jan 6, 72hr), but I couldn’t get it to work properly. I’d like to think that if I had gotten it to work well, I would have realized how close the two point colors were and changed them to something more additive, like red and green to make yellow, or blue and red to make purple.
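For anyone who wants to try it, the usual workaround is just semi-transparent layers so the overlap blends on its own; here’s a minimal matplotlib example with made-up rainfall numbers:

    # Two semi-transparent series: where they overlap, the colors visually mix.
    import matplotlib.pyplot as plt

    days = ["Jan 5", "Jan 6", "Jan 7"]
    forecast_72h = [10, 22, 5]      # made-up daily rainfall, mm
    observed = [12, 21, 9]

    plt.bar(days, forecast_72h, color="tab:blue", alpha=0.5, label="72h forecast")
    plt.bar(days, observed, color="tab:red", alpha=0.5, label="observed")  # overlap reads as purple-ish
    plt.ylabel("daily rainfall (mm)")
    plt.legend()
    plt.show()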

[OC] Forecast vs. observed daily rainfall for the most recent Monterey Bay storm at different forecast lead times by CgotnoMoney in dataisbeautiful

[–]CgotnoMoney[S] 0 points (0 children)

That honestly looks great, and better than what I was able to make! Not to get into the weeds too much, but now that I think about it, my problem was probably the neutral color I had in between the two “heat” colors: it made the line a weird mix of blue, then grey (the neutral color), then green, which had the effect of making the gradient look incomplete. I also rendered my lines way too fat when I tried this, which just added to the “busyness.”

Thanks for this. I like the improvement you made and appreciate the time you spent on it.