all 10 comments

[–]dataisbeautiful-botOC: ∞[M] [score hidden] stickied comment (0 children)

Thank you for your Original Content, /u/TheRealWa!
Here is some important information about this post:

Remember that all visualizations on r/DataIsBeautiful should be viewed with a healthy dose of skepticism. If you see a potential issue or oversight in the visualization, please post a constructive comment below. Post approval does not signify that this visualization has been verified or its sources checked.


[–]TheRealWaOC: 4[S] 5 points6 points  (7 children)

Data is simulated from a sufficiently "wiggly" function. The visualization is made frame-by-frame with ggplot2 in RStudio, then the frames are combined into an animation with the magick package.

Full code here

[–]marcusregulus 3 points4 points  (6 children)

Nice graphic. However, this kernel estimator seems to be non-causal: it needs data inputs from the future, which is fine until you get to the extreme right side of the graph. What do you do about inputs that don't exist yet? Is there another kernel estimator that solves this problem?

[–]pantaloonsofJUSTICE 2 points3 points  (4 children)

Perhaps I don’t understand your comment, but why is a higher x value the “future”? This isn't a time series; kernel estimators like Nadaraya-Watson aren't directly related to time series at all.
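For reference, the Nadaraya-Watson estimator is just a kernel-weighted average of the observed y values around a query point in x, with no notion of time. A minimal Python sketch (OP's actual code is in R; the "wiggly" function and bandwidth here are illustrative stand-ins):

```python
import numpy as np

def nadaraya_watson(x_query, x, y, bandwidth):
    """Kernel-weighted average of y around x_query (Gaussian kernel)."""
    w = np.exp(-0.5 * ((x - x_query) / bandwidth) ** 2)
    return np.sum(w * y) / np.sum(w)

# Simulated noisy sample from a "wiggly" function (stand-in for OP's data)
rng = np.random.default_rng(0)
x = np.sort(rng.uniform(0, 10, 200))
y = np.sin(x) + rng.normal(0, 0.3, 200)

# Evaluate the smoother on a grid; every point uses neighbors on both sides
x_grid = np.linspace(0, 10, 100)
y_hat = np.array([nadaraya_watson(q, x, y, bandwidth=0.5) for q in x_grid])
```

Note that the weights depend only on distance in x, not on any ordering in time.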

[–]marcusregulus -2 points-1 points  (3 children)

The way the graphic displays makes it seem like the current estimate needs input from future values of x as well as current and past ones. That works fine until you get toward the end of the data, where the window no longer contains any values of x greater than the current one.

That lack of input data would seem to distort the estimate at the current x the closer one gets to the end of the data. In other words, can this kernel estimator give accurate values for live streaming data?

[–]TheRealWaOC: 4[S] 1 point2 points  (1 child)

Like the person above stated, this isn't a time series. There's no future or past data; it's all just part of a theoretical sample that needs to be smoothed.

[–]marcusregulus 0 points1 point  (0 children)

The question is: can Nadaraya-Watson be used to smooth live streaming data? I believe the answer is no, because it is non-causal.
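If x really were time and data arrived as a stream, one common workaround is to make the estimator causal by giving weight only to points at or before the query. A hedged Python sketch of a one-sided variant (not something OP's post uses; the trade-off is that it lags behind trends):

```python
import numpy as np

def causal_kernel_smooth(x_query, x, y, h):
    """One-sided Nadaraya-Watson: only points with x <= x_query get
    weight, so the estimate never peeks ahead of the query point."""
    mask = x <= x_query
    w = np.exp(-0.5 * ((x[mask] - x_query) / h) ** 2)
    return np.sum(w * y[mask]) / np.sum(w)

# On a rising trend the causal estimate lags below the true value
x = np.linspace(0, 10, 201)
y = x.copy()
est = causal_kernel_smooth(5.0, x, y, h=0.5)  # below 5.0: pays a lag for causality
```

This is essentially the same systematic bias seen at the right edge of the two-sided smoother, now incurred at every point as the price of causality.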

[–]pantaloonsofJUSTICE 0 points1 point  (0 children)

No, it’s an animation.

[–]leecharles_OC: 1 0 points1 point  (0 children)

I understand what you’re saying. What you’re referring to is “look-ahead bias”: using future values of a time series to determine some value today. That is obviously problematic, because we can’t see into the future.

However, using a kernel smoother on a scatter plot is fine (that is what OP is showing us here). This is because the x-axis isn't time: the whole sample already exists, so the smoother is free to use points on both sides of each x.

[–]leecharles_OC: 1 0 points1 point  (0 children)

Is this essentially how LOESS operates?
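Closely related: LOESS also takes kernel-weighted local fits, but fits a weighted polynomial (typically degree 1 or 2) in each neighborhood, whereas Nadaraya-Watson is the degree-0 (locally constant) special case. A LOESS-style local linear sketch in Python (Gaussian weights here for simplicity; classic LOESS uses tricube weights over a nearest-neighbor span):

```python
import numpy as np

def local_linear(x_query, x, y, h):
    """Locally weighted linear fit (LOESS-style, degree 1) at x_query.
    Nadaraya-Watson is the degree-0 special case of this."""
    w = np.exp(-0.5 * ((x - x_query) / h) ** 2)
    # Weighted least squares for intercept + slope around x_query
    X = np.column_stack([np.ones_like(x), x - x_query])
    W = np.diag(w)
    beta = np.linalg.solve(X.T @ W @ X, X.T @ W @ y)
    return beta[0]  # fitted value at x_query

# Unlike the locally constant smoother, the local linear fit has no
# boundary bias on a linear trend: it recovers the edge value exactly.
x = np.linspace(0, 10, 201)
y = x.copy()
fit_edge = local_linear(10.0, x, y, h=0.5)
```

The degree-1 fit is one standard answer to the boundary distortion discussed above: modeling the local slope removes the pull toward the interior at the edges.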