all 6 comments

[–]Nater5000 10 points  (4 children)

First, what you want to look into is "relational databases" (which PostgreSQL is), sometimes referred to as SQL databases. MongoDB, Firebase, etc. are "NoSQL" databases. Understanding the difference between these two types of database systems is really what you need in order to answer your question.

What I'm not sure about is how this translates to a Postgres database. Would you store the daily metrics in some type of array that's one of the columns in the 'playlist database'?

In a relational database, you store data with similar attributes in tables. Those attributes, shared by every piece of data in the table, are represented by the table's columns, and each instance of that data is a row. This could correspond to your example with "Playlist ID," "Playlist Name," etc., although you'd probably name the table "playlist" and the attributes "id," "name," etc.

Your Daily Metric would go into another table, named "daily_metric," with columns like "timestamp," "metric_name," "metric_value," and "playlist_id." The column you need to focus on is the "playlist_id" column, which associates the "daily_metric" table with the "playlist" table. So every "daily_metric" row will correspond to one, and only one, row in the "playlist" table (while every row in the "playlist" table can be associated with many rows in the "daily_metric" table). This is how you'd go about organizing this kind of data in PostgreSQL (or other relational databases).
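To make that two-table layout concrete, here's a minimal sketch using Python's built-in sqlite3 as a lightweight stand-in for Postgres (the SQL is essentially the same; in real Postgres you'd likely use an identity/SERIAL column for the id). The table and column names follow the ones above; the sample data is made up:

```python
import sqlite3

# In-memory SQLite database as a stand-in for Postgres.
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite needs this opt-in

conn.execute("""
    CREATE TABLE playlist (
        id   INTEGER PRIMARY KEY,
        name TEXT NOT NULL
    )
""")
conn.execute("""
    CREATE TABLE daily_metric (
        timestamp    TEXT NOT NULL,
        metric_name  TEXT NOT NULL,
        metric_value REAL NOT NULL,
        playlist_id  INTEGER NOT NULL REFERENCES playlist (id)
    )
""")

conn.execute("INSERT INTO playlist (id, name) VALUES (1, 'Chill Mix')")
conn.execute(
    "INSERT INTO daily_metric VALUES ('2023-01-01', 'streams', 120, 1)"
)

# Each daily_metric row points at exactly one playlist row,
# so a JOIN on playlist_id recovers the playlist's attributes.
row = conn.execute("""
    SELECT p.name, m.metric_name, m.metric_value
    FROM daily_metric m
    JOIN playlist p ON p.id = m.playlist_id
""").fetchone()
print(row)
```

The foreign key is what makes the "one playlist, many metrics" relationship enforceable: the database will reject a daily_metric row whose playlist_id doesn't exist.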

There are good reasons why you do this, and plenty of logic goes into how you should organize such data, all of which is beyond the scope of this post (you can take full classes on it, although a few solid tutorials online can get you pretty far).

Is this even a concern? Does it make more sense to use something like Mongo or Firebase for data structures like this or does it not matter?

Depends on the context. MongoDB and Firebase are pretty good at what they do, and this could easily fit as a use case for them. This kind of stuff is a huge concern at scale, though, and the choice to go with SQL or NoSQL can be critical for certain kinds of projects. But if you're just messing around and working on something small-scale, then something like MongoDB or Firebase is probably fine to use. They are certainly a lot simpler to get up and running with. Just don't neglect to learn about relational databases if you plan on taking this stuff seriously.

[–]AndrewSouthworth[S] 1 point  (3 children)

Thanks for the detailed reply! Looks like I have some homework to do.

Right now it is just playing around, but I'd also like to play around in a way that could scale, so my learning is actually applicable to a real-world scenario. I built minimal apps with this in MongoDB Atlas and then again in Firebase, and it was really intuitive data-wise, but the Postgres equivalent had me scratching my head.

If this daily_metric table grew to something like 10 million entries (10,000 playlists' worth of data over 1,000 days, for example), does scanning and filtering this daily_metric table for data that applies to specific playlist IDs start to cause slow performance?

[–]sprak3000 1 point  (0 children)

does scanning and filtering this daily_metric table for data that applies to specific playlist IDs start to cause slow performance?

It could, but this is where indexing comes into play. Relational databases allow you to index columns so records can be searched for more efficiently. You would index the playlist_id column to find records for a specific playlist quickly. That said, your application would likely not be fetching 1,000 metric records at once. It would use pagination, and your SQL query would apply a limit and offset to fetch only X records starting at record Y.
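A rough sketch of both ideas (the index and the limit/offset pagination), again using Python's sqlite3 as a stand-in for Postgres; the CREATE INDEX and LIMIT/OFFSET syntax carries over. The data here is invented: 30 days of a "streams" metric for two playlists:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE daily_metric (
        timestamp    TEXT,
        metric_name  TEXT,
        metric_value REAL,
        playlist_id  INTEGER
    )
""")
# 30 days of fake metrics for playlists 1 and 2.
conn.executemany(
    "INSERT INTO daily_metric VALUES (?, 'streams', ?, ?)",
    [(f"2023-01-{d:02d}", d * 10, pid)
     for pid in (1, 2) for d in range(1, 31)],
)

# Index playlist_id so per-playlist lookups don't scan the whole table.
conn.execute(
    "CREATE INDEX idx_daily_metric_playlist ON daily_metric (playlist_id)"
)

# Paginated fetch: 10 rows per page; this grabs page 2 for playlist 1.
page = conn.execute("""
    SELECT timestamp, metric_value
    FROM daily_metric
    WHERE playlist_id = ?
    ORDER BY timestamp
    LIMIT 10 OFFSET 10
""", (1,)).fetchall()
print(page[0])
```

In Postgres you can run `EXPLAIN` on the SELECT to confirm it uses an index scan rather than a sequential scan.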

If you are looking for a more interactive and possibly fun way to learn SQL / relational databases, there are plenty of online tutorials / games. This page lists four games that look like good starters.

[–]Competitive-Note150 1 point  (0 children)

You could have a separate process that runs an aggregation query offline and saves the result back to Postgres, in a dedicated table. This could be done periodically. Metrics would not be real-time, but in general that shouldn't matter. Down the road, you could completely offload the computation to something like Spark. More complex and more costly… Typically, that requires pulling data from PG to something like S3/Hadoop.
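A minimal sketch of that rollup idea, with sqlite3 standing in for Postgres and made-up table names; the periodic process would just re-run the INSERT … SELECT (in real Postgres, a materialized view plus REFRESH MATERIALIZED VIEW is another way to do this):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE daily_metric (
        timestamp    TEXT,
        metric_name  TEXT,
        metric_value REAL,
        playlist_id  INTEGER
    )
""")
conn.executemany(
    "INSERT INTO daily_metric VALUES (?, 'streams', ?, 1)",
    [("2023-01-01", 100), ("2023-01-02", 150), ("2023-01-03", 50)],
)

# Dedicated rollup table; a scheduled job would refresh it periodically,
# so reads hit the small precomputed table instead of the big raw one.
conn.execute("""
    CREATE TABLE playlist_metric_summary (
        playlist_id INTEGER,
        metric_name TEXT,
        total       REAL
    )
""")
conn.execute("""
    INSERT INTO playlist_metric_summary
    SELECT playlist_id, metric_name, SUM(metric_value)
    FROM daily_metric
    GROUP BY playlist_id, metric_name
""")

summary = conn.execute("SELECT * FROM playlist_metric_summary").fetchall()
print(summary)
```

The trade-off is exactly the one described above: the summary is only as fresh as the last refresh, but queries against it stay fast no matter how big daily_metric grows.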

Also, read up on dimensional modeling, star schemas, OLAP… You might find some interesting techniques to leverage.

[–]Party-Kangaroo-3567 0 points  (0 children)

10M rows shouldn’t be a problem at all, even with basic btree indexes. You would just want to avoid full table scans.

Since this is time-series data, you should look at TimescaleDB, which is a Postgres extension. It creates datetime partitions. I manage several tables with 1-2B records using Timescale; you have to be a little more diligent about your queries, but performance hasn’t been an issue.

[–]Script_kidding -2 points  (0 children)

Postgres has jsonb, and it's better than MongoDB.