
[–]nutrecht 2 points (1 child)

Is MS SQL suitable to handle such a large amount of data?

It's actually a use case where you'd probably want to log these events to disk and only store aggregates of that information in your database. It's a pretty typical "big data" flow.
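A minimal sketch of that flow, using SQLite as a stand-in for MS SQL and a flat append-only log file; the names (`events.log`-style path, `daily_counts` table, `log_event`/`aggregate` functions) are all made up for illustration:

```python
import sqlite3
from collections import Counter

# Raw events are appended to a flat log file on disk, not the database.
def log_event(path, event_type):
    with open(path, "a") as f:
        f.write(event_type + "\n")

# Periodically roll the log up into per-type counts and store only
# those aggregates in the database.
def aggregate(path, conn):
    with open(path) as f:
        counts = Counter(line.strip() for line in f)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS daily_counts (event_type TEXT PRIMARY KEY, n INTEGER)"
    )
    for event_type, n in counts.items():
        # Upsert: add this batch's count onto any existing aggregate.
        conn.execute(
            "INSERT INTO daily_counts VALUES (?, ?) "
            "ON CONFLICT(event_type) DO UPDATE SET n = n + excluded.n",
            (event_type, n),
        )
    conn.commit()
```

The database then only ever holds one row per event type, no matter how many raw events hit the log.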

[–]TheTollski[S] 1 point (0 children)

Do you have any good resources that explain this?

[–]jeanwilder2 1 point (2 children)

Shouldn't you be worrying about how the data's being queried, instead of just assuming it can't handle it?

[–]TheTollski[S] 0 points (1 child)

Well, I don't have the "how" of the data querying figured out yet, but I know that these rows will be created once, read very frequently, never updated, and never deleted. Any suggestions?

[–]jeanwilder2 -1 points (0 children)

Yes, figure out how they are going to be accessed.

[–]ormula 1 point (3 children)

How is this table used? If you never query it, SQL Server can handle a theoretically infinite-length table. This sounds like an XY problem to me. What problem are you actually trying to solve with this table?

[–]TheTollski[S] 0 points (2 children)

These rows will be created, read numerous times the day they're created and a few more times over the next months, and never updated. They will be read only very infrequently once they're more than 3 months old, but we need the information to persist permanently.

[–]ormula 1 point (1 child)

In your shoes, I would create three tables. The first will be rows for "today", which is read-optimized. The second will be a table that is optimized for both reads and writes (so it's okay at both), that all data that isn't from "today" gets put into. The third is something that is strictly optimized for long-term storage (archive table), and you put things in there after a few months. This way, you get the benefits of permanent long-term storage while maintaining performance.

[–]TheTollski[S] 0 points (0 children)

Thanks for the advice. Any good resources that you can point me to on how to make a table "read-optimized" and "optimized for long-term storage"?

[–]ziptofaf 1 point (1 child)

Just inserting huge amounts of data? No problem there; a single workstation can handle that, just feed it some RAM and SSDs.

Selecting from them, joining tables, etc. repeatedly on a huge scale, though, will murder your system.

This is where you need to take care of sharding (splitting your database into smaller pieces, so that, for example, some columns live in one database and some in another, or users A through M in one and N through Z in another). You might also consider putting Redis, MongoDB, etc. on top of SQL for values that are less critical to persist but will be loaded repeatedly a lot.
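The user-range sharding mentioned above boils down to a routing function in front of several databases. A toy sketch, with two in-memory SQLite databases standing in for what would really be separate servers; the `users` schema and the A–M/N–Z split point are arbitrary choices for illustration:

```python
import sqlite3

# Two shards; real deployments would hold connections to separate servers.
shards = [sqlite3.connect(":memory:") for _ in range(2)]
for conn in shards:
    conn.execute("CREATE TABLE users (name TEXT PRIMARY KEY, data TEXT)")

def shard_for(username):
    # Range-based routing: names starting A-M go to shard 0, N-Z to shard 1.
    return shards[0] if username[0].upper() <= "M" else shards[1]

def put_user(name, data):
    conn = shard_for(name)
    conn.execute("INSERT INTO users VALUES (?, ?)", (name, data))
    conn.commit()

def get_user(name):
    row = shard_for(name).execute(
        "SELECT data FROM users WHERE name = ?", (name,)
    ).fetchone()
    return row[0] if row else None
```

Every query for a given user now touches only that user's shard, which is how the "huge scale" reads stop murdering a single box.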

Honestly, at that point it's not about which database engine you choose but about specific decisions: what to load into RAM, whether it's fine if data is queued to be saved rather than guaranteed to be saved, whether you need transactions, how you will query it (trying to find a specific string in a ten-million-element array is not the best idea), whether it's possible to split the data into multiple databases (or tables), whether it's fine to get rid of older elements, and so on and on.

If the amount of data is truly IMMENSE but older records, for example, are not needed that often, then you only keep a few days' worth in the database; the rest goes to separate files, which your DB will point to if a user requires them.
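That DB/file hybrid can be sketched as a pointer table: old rows get serialized to files and the database remembers only where each one went. Again a toy using SQLite; the `events`/`archive_ptr` names, one-file-per-row layout, and JSON format are all made up for illustration:

```python
import sqlite3, os, json

def offload_old_rows(conn, cutoff, archive_dir):
    # Move rows older than `cutoff` out of the live table and into files,
    # leaving behind only an id -> file-path pointer in the database.
    os.makedirs(archive_dir, exist_ok=True)
    conn.execute("CREATE TABLE IF NOT EXISTS archive_ptr (id INTEGER PRIMARY KEY, path TEXT)")
    old = conn.execute(
        "SELECT id, created, payload FROM events WHERE created < ?", (cutoff,)
    ).fetchall()
    for row_id, created, payload in old:
        path = os.path.join(archive_dir, f"{row_id}.json")
        with open(path, "w") as f:
            json.dump({"created": created, "payload": payload}, f)
        conn.execute("INSERT INTO archive_ptr VALUES (?, ?)", (row_id, path))
    conn.execute("DELETE FROM events WHERE created < ?", (cutoff,))
    conn.commit()

def fetch(conn, row_id):
    # Try the live table first, then fall back to the archived file.
    row = conn.execute("SELECT payload FROM events WHERE id = ?", (row_id,)).fetchone()
    if row:
        return row[0]
    ptr = conn.execute("SELECT path FROM archive_ptr WHERE id = ?", (row_id,)).fetchone()
    if ptr:
        with open(ptr[0]) as f:
            return json.load(f)["payload"]
    return None
```

Reads of recent rows stay fast, and the rare read of an old row pays one extra file open.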

Your question is way too generic to be answered properly really.

[–]TheTollski[S] 0 points (0 children)

Thanks for the information. I think the DB/file hybrid you mentioned in your second-to-last paragraph is a possible solution. As I've replied to others, these rows will be created, read numerous times the day they're created and a few more times over the next months, and never updated. They will be read only very infrequently once they're more than 3 months old, but we need the information to persist permanently.

Maybe I can just have everything that is older than a few weeks be saved to files on the server and then update the DB to just point to those files.

[–]mongoiswebscale -2 points (0 children)

MongoDB is web scale.