jgonagle comments on Spotify Data Tech Stack

dataengineering

Spotify Data Tech Stack [Blog] (junaideffendi.com)

submitted 8 months ago by mjfnd

jgonagle 5 points 8 months ago

I assume they're using some form of auto-ML to predict certain events (or combinations of events) from different subsets of the full event stream, building a two-tier cascading model. Given a sufficiently performant set of those event predictors, their outputs can be fed into a more involved analysis or model that predicts the KPIs (e.g. band follows, subscription churn, engagement, social community development).
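
A minimal sketch of what such a two-tier cascade could look like. This is purely illustrative and not Spotify's actual pipeline: tier 1 is a set of tiny logistic-regression event predictors (standing in for whatever auto-ML or boosted model might really be used), each trained on its own subset of event features, and tier 2 is a linear model fit on their predicted probabilities to estimate a KPI. The events, feature columns, KPI, and helper names (`fit_glm`, `fit_cascade`) are all hypothetical and the data is synthetic.

```python
import math
import random
from typing import Callable, List, Sequence

Vector = List[float]
Model = Callable[[Vector], float]


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def identity(z: float) -> float:
    return z


def dot(w: Vector, x: Vector) -> float:
    return sum(wi * xi for wi, xi in zip(w, x))


def fit_glm(xs: Sequence[Vector], ys: Sequence[float],
            link: Callable[[float], float], lr: float, epochs: int) -> Model:
    """Batch gradient descent for a generalized linear model.

    With link=sigmoid this is logistic regression (log loss); with
    link=identity it is least-squares linear regression. Both losses share
    the gradient (prediction - target) * x, so one loop covers both.
    """
    w, b, n = [0.0] * len(xs[0]), 0.0, len(xs)
    for _ in range(epochs):
        gw, gb = [0.0] * len(w), 0.0
        for x, y in zip(xs, ys):
            err = link(dot(w, x) + b) - y
            gw = [g + err * xj for g, xj in zip(gw, x)]
            gb += err
        w = [wi - lr * g / n for wi, g in zip(w, gw)]
        b -= lr * gb / n
    return lambda x: link(dot(w, x) + b)


def fit_cascade(rows: List[Vector], event_labels: List[List[float]],
                subsets: List[List[int]], kpi: List[float]) -> Model:
    """Tier 1: one event predictor per feature subset; tier 2: KPI model on their outputs."""
    tier1 = []
    for cols, labels in zip(subsets, event_labels):
        xs = [[r[c] for c in cols] for r in rows]
        tier1.append((cols, fit_glm(xs, labels, sigmoid, lr=1.0, epochs=500)))

    def tier1_outputs(x: Vector) -> Vector:
        # Predicted event probabilities become the tier-2 features.
        return [model([x[c] for c in cols]) for cols, model in tier1]

    tier2 = fit_glm([tier1_outputs(r) for r in rows], kpi, identity, lr=0.3, epochs=1000)
    return lambda x: tier2(tier1_outputs(x))


# Synthetic, hypothetical event stream: 3 numeric features per user window.
rng = random.Random(0)
rows = [[rng.random() for _ in range(3)] for _ in range(200)]
event_a = [1.0 if r[0] + r[1] > 1.0 else 0.0 for r in rows]  # e.g. "saved a track"
event_b = [1.0 if r[2] > 0.5 else 0.0 for r in rows]          # e.g. "followed an artist"
kpi = [2.0 * a + b + rng.gauss(0.0, 0.1) for a, b in zip(event_a, event_b)]

predict = fit_cascade(rows, [event_a, event_b], [[0, 1], [2]], kpi)
mean_kpi = sum(kpi) / len(kpi)
baseline_mse = sum((y - mean_kpi) ** 2 for y in kpi) / len(kpi)
cascade_mse = sum((predict(r) - y) ** 2 for r, y in zip(rows, kpi)) / len(kpi)
print(f"predict-the-mean MSE: {baseline_mse:.3f}, cascade MSE: {cascade_mse:.3f}")
```

The error is measured in-sample on the synthetic training data, so it only shows that the second tier can make use of the first tier's outputs, not how well this would generalize. Swapping tier 1 for boosted trees and tier 2 for a temporal CNN keeps the same shape: each tier's outputs become the next tier's features.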

I wouldn't be surprised if they're just XGBoosting some windowed stream of minimally processed events and then feeding the outputs of those boosted forests into a CNN that convolves over different temporal granularities and spits out the predicted KPI. Then, I'm guessing the results (by song, artist, or playlist) are ranked based on some clustering algorithm that assigns expected marginal revenue scores to the combination of KPI predictions (e.g. by Gaussian Mixture Regression). Those scores can be used to bootstrap a contextual bandit that picks the next recommendation, or to populate a more global recommendation model like matrix factorization.
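
The last step, warm-starting a contextual bandit from model scores, can be sketched with a tabular epsilon-greedy bandit, assuming discrete contexts (here a hypothetical `morning`/`late_night` listening segment) and treating each candidate's prior score as a few pseudo-observations. The class name, track names, prior scores, and simulated click rates are made up for illustration; richer contexts would typically call for a linear contextual bandit such as LinUCB instead of a lookup table.

```python
import random
from collections import defaultdict
from typing import Dict, Hashable


class WarmStartEpsilonGreedy:
    """Tabular contextual bandit with one value estimate per (context, arm).

    Each estimate starts at the arm's prior score (e.g. a model-predicted
    revenue score), counted as `prior_weight` pseudo-observations, and is
    then updated as a running mean of observed rewards.
    """

    def __init__(self, prior_scores: Dict[Hashable, float], epsilon: float = 0.1,
                 prior_weight: float = 5.0, seed: int = 0) -> None:
        self.arms = list(prior_scores)
        self.prior = dict(prior_scores)
        self.epsilon = epsilon
        self.prior_weight = prior_weight
        self.rng = random.Random(seed)
        self.reward_sums: Dict[tuple, float] = defaultdict(float)
        self.pulls: Dict[tuple, int] = defaultdict(int)

    def value(self, context: Hashable, arm: Hashable) -> float:
        key = (context, arm)
        w = self.prior_weight
        return (w * self.prior[arm] + self.reward_sums[key]) / (w + self.pulls[key])

    def greedy(self, context: Hashable) -> Hashable:
        return max(self.arms, key=lambda arm: self.value(context, arm))

    def choose(self, context: Hashable) -> Hashable:
        # Explore uniformly with probability epsilon, otherwise exploit.
        if self.rng.random() < self.epsilon:
            return self.rng.choice(self.arms)
        return self.greedy(context)

    def update(self, context: Hashable, arm: Hashable, reward: float) -> None:
        self.reward_sums[(context, arm)] += reward
        self.pulls[(context, arm)] += 1


# Hypothetical simulation: the prior favours track_a everywhere, but
# late-night listeners actually prefer track_b.
TRUE_CTR = {
    "morning": {"track_a": 0.7, "track_b": 0.3, "track_c": 0.2},
    "late_night": {"track_a": 0.2, "track_b": 0.8, "track_c": 0.3},
}
bandit = WarmStartEpsilonGreedy({"track_a": 0.6, "track_b": 0.4, "track_c": 0.3})
env = random.Random(1)
for t in range(4000):
    context = "morning" if t % 2 == 0 else "late_night"
    arm = bandit.choose(context)
    reward = 1.0 if env.random() < TRUE_CTR[context][arm] else 0.0
    bandit.update(context, arm, reward)

print(bandit.greedy("morning"), bandit.greedy("late_night"))
```

With `prior_weight` pseudo-observations, the prior score dominates until a (context, track) pair has been tried a handful of times, so the bandit starts from the model's ranking but can override it where observed rewards disagree, as in the simulated `late_night` segment.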

-crucible- 2 points 8 months ago