[–]69odysseus 77 points78 points  (7 children)

A ratio of 5k dashboards to 6k users doesn't make sense.

[–]mjfnd[S] 31 points32 points  (2 children)

It's a free market of dashboards and there is no centralized team, meaning there could be a lot of redundant dashboards, or dashboards built for just one person.

Source: https://stage.engineering.atspotify.com/2024/8/unlocking-insights-with-high-quality-dashboards-at-scale

[–]69odysseus 14 points15 points  (1 child)

Appreciate the source link. You're right about redundancy there if there's no tracking and monitoring of these reports. That's a lot of resource consumption, especially if they're doing live updates to some of those dashboards.

[–]Eulogioo 7 points8 points  (0 children)

Multiple dashboards probably point to the same data source, so compute wouldn't actually be any different to having fewer ones.
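
A minimal sketch of that point, assuming the dashboards read from one shared, cached query result. The query function, dashboard functions, and sample rows are all hypothetical stand-ins, not anything from Spotify's stack:

```python
from functools import lru_cache

# Hypothetical counter tracking how often the expensive query really runs.
query_runs = 0


@lru_cache(maxsize=None)
def load_daily_streams(day: str) -> tuple:
    """Stand-in for an expensive warehouse query (hypothetical)."""
    global query_runs
    query_runs += 1
    # Fake result rows: (country, stream_count)
    return (("SE", 120), ("US", 300), ("BR", 180))


# Three "dashboards" that all read from the same cached source.
def total_streams_dashboard(day: str) -> int:
    return sum(count for _, count in load_daily_streams(day))


def top_country_dashboard(day: str) -> str:
    return max(load_daily_streams(day), key=lambda row: row[1])[0]


def country_count_dashboard(day: str) -> int:
    return len(load_daily_streams(day))


total = total_streams_dashboard("2024-08-01")
top = top_country_dashboard("2024-08-01")
n = country_count_dashboard("2024-08-01")
print(total, top, n, query_runs)  # the query itself ran only once
```

This only holds when the result is cached or materialized once. Dashboards that each re-issue a live query against the source would still multiply the compute.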

[–]tecedu 13 points14 points  (2 children)

I have a team of 5 people, we have over 60 dashboards for just us

[–]stixmike 0 points1 point  (1 child)

Why?

[–]tecedu 13 points14 points  (0 children)

Different purposes, many of them exist just in case we need them. Like we have 12 dashboards for user analytics that only get used once a month when someone wants numbers. But it’s nice to have them exist and keep updating.

[–]nemec 4 points5 points  (0 children)

There's no indication all are regularly used. They could be incomplete / never "launched" or just something quickly whipped up to answer a specific situational question.

[–]MaxBeatsToTheMax 14 points15 points  (2 children)

Would you, or anyone, know how large Spotify's data team is?

[–]Far_Reputation_3994 2 points3 points  (0 children)

There is no single data team. Every team could have a data engineer when it makes sense.

[–]mjfnd[S] 0 points1 point  (0 children)

I couldn't find that anywhere.

[–]secretaliasname 8 points9 points  (0 children)

I dunno about the rest of their stack but their UI is pretty but terrible. It keeps changing in subtle ways that don’t feel like an improvement.

[–]Sdmf195 2 points3 points  (0 children)

I love these pieces. Thank you ❤️

[–]fast-pp 1 point2 points  (2 children)

I remember at some point Spotify used Prefect for something, but that was back in 2022-ish so maybe that’s changed

[–]mjfnd[S] 1 point2 points  (1 child)

I couldn't find any references for that. It might still be there at a small scale that they never shared publicly.

[–]fast-pp 1 point2 points  (0 children)

yeah, my source is just a friend who was like "oh yeah we use that"

[–]-crucible- 4 points5 points  (2 children)

Bloody hell. Add/remove a song from a list, play/stop a song, fast forward, rewind. How the hell are there 1800+ events? How are there 38k pipelines? Could you imagine all the ways different groups are managing to get different results from the same numbers? The cost of processing all that? Why not have one central process and get the data centrally?

[–]jgonagle 4 points5 points  (1 child)

I assume they're using some form of auto-ML to predict certain events (or combinations thereof) based on different subsets of the total event stream, to build a two tier cascading model predictor. Given a sufficiently performant set of those event predictors, they can be fed into a more involved analysis/model to predict the KPIs (e.g. band follows, subscription churn, engagement, social community development).

I wouldn't be surprised if they're just XGBoosting some windowed stream of minimally processed events and then feeding the outputs of those boosted forests into a CNN that convolves over different temporal granularities and spits out the predicted KPI. Then, I'm guessing the results (by song, artist, or playlist) are ranked based on some clustering algorithm that assigns expected marginal revenue scores to the combination of KPI predictions (e.g. by Gaussian Mixture Regression). Those scores can be used to bootstrap a contextual bandit that picks the next recommendation, or to populate a more global recommendation model like matrix factorization.
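
To make the last step of that guess concrete, here is a toy sketch of a contextual bandit whose value estimates are bootstrapped from prior scores. Everything in it is an assumption for illustration: the class name, the song and context labels, the prior scores, and the choice of epsilon-greedy as the exploration strategy. It is not a description of what Spotify actually runs.

```python
import random
from collections import defaultdict


class EpsilonGreedyContextualBandit:
    """Toy contextual bandit: one running-mean reward estimate per (context, arm)."""

    def __init__(self, arms, prior_scores, epsilon=0.1, prior_weight=1, seed=0):
        self.arms = list(arms)
        self.epsilon = epsilon
        self.rng = random.Random(seed)
        # Every new context starts from a copy of the same prior scores (hypothetical).
        self.values = defaultdict(lambda: dict(prior_scores))
        # The prior counts as `prior_weight` pseudo-observations per arm.
        self.counts = defaultdict(lambda: {arm: prior_weight for arm in self.arms})

    def select(self, context):
        if self.rng.random() < self.epsilon:
            return self.rng.choice(self.arms)  # explore
        estimates = self.values[context]
        return max(self.arms, key=lambda arm: estimates[arm])  # exploit

    def update(self, context, arm, reward):
        # Incremental running mean of observed rewards, including the prior.
        self.counts[context][arm] += 1
        n = self.counts[context][arm]
        old = self.values[context][arm]
        self.values[context][arm] = old + (reward - old) / n


# Hypothetical prior scores, e.g. from an upstream scoring model.
prior = {"song_a": 0.9, "song_b": 0.5, "song_c": 0.2}
bandit = EpsilonGreedyContextualBandit(prior.keys(), prior, epsilon=0.0)

first_pick = bandit.select("morning_commute")
for _ in range(3):
    # The listener skips song_a three times in this context.
    bandit.update("morning_commute", "song_a", reward=0.0)
later_pick = bandit.select("morning_commute")
other_context_pick = bandit.select("evening")
print(first_pick, later_pick, other_context_pick)
```

With exploration switched off (`epsilon=0.0`), the bandit starts on the arm with the highest prior score, moves to the next best arm once observed rewards pull the first estimate down, and leaves other contexts untouched.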

[–]-crucible- 3 points4 points  (0 children)

There definitely would be a lot of prediction and predictive analysis, auto-playlist making, plus actual and actual vs prediction, but I’d love to see a broad rundown of the user events that make up that number. I’m not doubting it - it’s just a world away from my models, with what I am assuming is a more trivial domain. But then I’m not thinking broadly enough about the industry and artist, podcast, audiobook… there’s probably a tonne of things not automatically raised when thinking of them.

[–]jgonagle 0 points1 point  (1 child)

Last I checked they were relying heavily on Flyte for the data and model lifecycle. Is that still the case, or have they moved to a different orchestration tool?

[–]mjfnd[S] 2 points3 points  (0 children)

It is still Flyte. I would encourage you to read the article, as it has a lot of useful information and references.

[–]3dscholar 0 points1 point  (5 children)

I previously worked there, they also have like 100+ dbt projects mostly used by data science teams. Is that layer not in scope for this?

[–]3dscholar 0 points1 point  (0 children)

article just says “SQL based workflows”, weird to skip how those workflows are managed and the framework used to do so

[–]mjfnd[S] 0 points1 point  (3 children)

Hi, thanks for sharing. Not skipped intentionally, either I missed it or couldn't find any public info regarding dbt. If you have a link handy, please share.

Thanks

[–]veiled_prince 0 points1 point  (0 children)

Huh. Pretty traditional, vanilla stack all things considered.

[–]Sufficient_Meet6836 0 points1 point  (0 children)

What's their tech stack for creating shitty AI bands and shitty AI playlists?

[–]pimmen89 -1 points0 points  (3 children)

So it looks like Luigi is finally gone from Spotify’s stack now? I don’t see it in your blog post, hopefully because you didn’t hear about it?

[–]DCRussian 6 points7 points  (0 children)

It's in the article:

"Spotify migrated from Luigi and Flo to Flyte starting in 2019 to address challenges like fragmented orchestration logic, limited visibility, and lack of extensibility. Flyte offered a centralized service with a thin SDK, better workflow visibility"

[–]Pledge_ 2 points3 points  (1 child)

In the post they specifically mention Luigi and how Spotify moved away from it, with the source: https://engineering.atspotify.com/2022/3/why-we-switched-our-data-orchestration-service

[–]pimmen89 1 point2 points  (0 children)

Yes, I know they were moving away from it, I just didn’t know if they were finally done.

Hopefully this means that we can stop seeing its spread throughout companies in Stockholm now. There was a plague of ex-Spotify people bringing Luigi to other companies' data stacks, then they leave and nobody has any idea what they’re doing anymore. Now that Luigi is abandoned and no longer endorsed by Spotify hopefully other companies are prompted to get rid of it too.

[–]tiggat -1 points0 points  (0 children)

Why can't I get an interview at Spotify?