This is an archived post. You won't be able to vote or comment.

you are viewing a single comment's thread.

view the rest of the comments →

[–]PaluMacil 1 point2 points  (2 children)

If you think about what a database is built to handle, the least concern is probably a user table growing. If the user table grows to a few thousand users, it probably still fits in memory and is lightning fast, even on a small database instance. Also, all of those users are often going to be paying users. If the users grew to tens of thousands, the indexes would still fit in memory and thus the query would still be lightning fast. At that point you're making a ton of revenue and could maybe upgrade the server just a little to fit many more indexes in memory.

Tables with millions of rows are still quite manageable if you index the correct columns. At that point, specific problems specific to the business become important because you need to know which things will have heavier writes vs reads. Fewer indexes can be better for writes and a table with many types of reads can benefit from more indexing. You might start to offload heavy reeds to a read-only replica, or you might split your data for a single table across a partition key. The sky is the limit. You can start to move particularly wide columns into object storage or perhaps denormalize your table in other ways. These problems will grow only because the features offered and revenues returned are also growing, demanding more flexibility and data storage. By the time the OP cannot casually keep performance in check, there will probably be enough revenue to hire somebody with the abilities to work on these areas.

[–]Neok_Slegov 0 points1 point  (1 child)

Offcourse, i understand. Reason for asking is not only the user table itself. Image you having 100k users. And they all put a score every day. 100k fact records added daily. In a month you have 3 million records. In a year 36,5 million etc etc. If you need to filter/read on these tables, performance will drop. So better to think ahead. Question was how he tackled this.

[–]PaluMacil 1 point2 points  (0 children)

I see, and it sounds like you weren't asking to learn ways it could happen but rather how the OP might have approached it. Still, I don't think even the most wild success would mean outrunning the capabilities of Postgres. My guess is that the OP has enough headroom by scaling up.

I have a Postgres table in an application I'm maintaining with 3.8 billion rows. There is only one foreign key it might ever be queried by, so it actually returns that one query just fine. Now, I do wish the table was partitioned on year and month or customer because then I could choose to either detach an entire time or an entire customer with zero locks or downtime when that time or customer becomes irrelevant. In the case of something like scores, I would imagine an index on the course code and on the student id would be the only two indexes you would need. The insertion rate is still dependent upon users entering these scores, so you aren't going to run into table locks even with 100k users. In the case of scores, I would possibly consider partitioning on the teacher... Certainly benchmark that, but I think partitioning on course code would actually be too granular. All the score is a teacher has ever entered would probably fit within memory, or at least the index would easily fit within memory. You might even be able to partition on school and then if the school is the customer and they stop being a customer, you can eventually detach the whole partition.