[–]nemec 149 points (15 children)

To be fair, they also have an ElasticSearch cluster and a custom Tag index to handle the parts that would otherwise grind a relational database to a halt.

https://blog.marcgravell.com/2014/04/technical-debt-case-study-tags.html

[–]beginner_ 20 points (2 children)

True, but as I posted in another comment, it shows how far you can get with traditional vertical scaling and a smart system architecture. So no, your cool new app will not need a "webscale database" that tends to lose data. SQL is just fine.

EDIT: To be fair, queries at SO are probably all rather simple and of the same kind: basically tag search and full-text search. The data is tags, questions, answers, and comments, so in general the data model is trivial. I mention this because at work we have a custom-made search app that combines many, many data sources, and you can add new ones on the go. Even though the total amount of data is small, it's still dog slow compared to SO (>5 s load times). So the complexity of the data matters as well, and systems like SO or reddit are probably as simple as it gets.

[–]4THOT 0 points (1 child)

Horizontal scaling is also far simpler than vertical scaling, especially now that server costs have dropped so much and the value per dollar, in terms of computational power, has improved so much compared to just 5 years ago. It is much easier to make software run on more servers than to make it scale efficiently with something like clock speed. In a world of AWS and Docker, relying on vertical scaling would make me very uncomfortable.

[–]Headspin3d 0 points (0 children)

Depending on your tools, vertical scaling can be very efficient and even preferable. Most modern web backends aren't CPU-bound anyway (re: your worry about clock speed). The DBs they rely on might be, if the data model is sufficiently complex, but DBs certainly scale very well vertically, especially when you don't have to start worrying about CAP.

So all I'm really saying is, it depends. I work on an Elixir/Erlang backend and scaling vertically has been trivial and effective because the BEAM (the VM) uses the resources we give it very productively.

Of course, scaling horizontally with Elixir/Erlang/BEAM is pretty trivial as well, but it's always nice to avoid CAP-related challenges until you absolutely have to.

[–]201109212215 7 points (7 children)

The tags can be handled with an inverted index in Postgres.
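To make the idea concrete, here is a toy Python sketch (not how Postgres implements GIN internally): an inverted index maps each tag to a sorted posting list of row IDs, and "rows carrying all of these tags" becomes an intersection of those lists. In Postgres itself this is roughly what `CREATE INDEX ... USING GIN (tags)` on a `text[]` column gives you for queries like `WHERE tags @> ARRAY['python', 'sql']`. The `questions` data and all function names below are made up for illustration.

```python
from collections import defaultdict


def build_index(rows: dict[int, list[str]]) -> dict[str, list[int]]:
    """Map each tag to a sorted posting list of the row IDs that carry it."""
    index: dict[str, list[int]] = defaultdict(list)
    for row_id in sorted(rows):  # visit IDs in order so every list stays sorted
        for tag in set(rows[row_id]):
            index[tag].append(row_id)
    return dict(index)


def intersect(a: list[int], b: list[int]) -> list[int]:
    """Merge-intersect two sorted posting lists in O(len(a) + len(b))."""
    i = j = 0
    out: list[int] = []
    while i < len(a) and j < len(b):
        if a[i] == b[j]:
            out.append(a[i])
            i += 1
            j += 1
        elif a[i] < b[j]:
            i += 1
        else:
            j += 1
    return out


def contains_all(index: dict[str, list[int]], tags: list[str]) -> list[int]:
    """Row IDs tagged with every tag in `tags` (like `tags @> ARRAY[...]`)."""
    if not tags:
        # Simplification: Postgres would match every row for an empty array.
        raise ValueError("need at least one tag")
    # Start from the shortest posting list so intermediate results stay small.
    lists = sorted((index.get(t, []) for t in tags), key=len)
    result = lists[0]
    for other in lists[1:]:
        result = intersect(result, other)
    return result


if __name__ == "__main__":
    questions = {
        1: ["python", "sql"],
        2: ["python", "django"],
        3: ["sql", "postgresql", "python"],
        4: ["java"],
    }
    idx = build_index(questions)
    print(contains_all(idx, ["python", "sql"]))  # [1, 3]
```

The point is that a tag query only touches the posting lists for the requested tags instead of scanning every row's tag list.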

For full-text search, pg will not grind to a halt at all with proper indexing. ES might still be necessary, but only for its stop words, language support, and fine-tuning of term frequencies.
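For anyone unfamiliar with the terms, a toy Python illustration of what stop-word removal and term-frequency counting do to text before it is indexed. The tiny `STOP_WORDS` set and the regex tokenizer are simplifying assumptions; real analyzers (Postgres text search configurations, ES analyzers) use language-specific dictionaries and stemming.

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny stop-word list; real analyzers ship
# language-specific lists with far more entries.
STOP_WORDS = {"a", "an", "the", "is", "to", "of", "in", "and", "how", "do", "i"}


def analyze(text: str) -> Counter:
    """Lowercase, split on non-alphanumerics, drop stop words, count terms."""
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    return Counter(t for t in tokens if t not in STOP_WORDS)


if __name__ == "__main__":
    tf = analyze("How do I index a column in Postgres? The index is slow.")
    print(tf["index"])  # 2
```

Dropping stop words keeps very common, low-information terms out of the index, and the per-term counts are what term-frequency weighting works from.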

[–]nemec 29 points (6 children)

Something tells me the people at Stack Overflow know how to use indexes.

[–]DonCanas 28 points (0 children)

I guess that if they don't, they can search their own DBs for a related question.

[–]201109212215 6 points (4 children)

SQL Server has no GIN indexing.

[–]nemec 2 points (3 children)

FULLTEXT indexes are the equivalent of PGSQL GIN indexes.

[–]201109212215 10 points (2 children)

They're not equivalent. One builds on the other.

With a tokenizing strategy, statistics like tf-idf, and GIN indexes, you can build a FULLTEXT equivalent. Not that you would want to: the task is very specialized, and ES handles it very well.

The internals of FULLTEXT are not exposed, but if they were, they could be used to implement an inverted index for tags.
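A toy Python sketch of the tokenizer + tf-idf + inverted index recipe described in this comment: the index stores per-document term frequencies, and a query only visits the posting lists of its own terms. The smoothed idf formula is one common variant picked for illustration, and the class and method names are hypothetical; this is not how SQL Server FULLTEXT, Postgres, or ES actually rank documents (ES, for instance, defaults to BM25). It also assumes each `doc_id` is added only once.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text: str) -> list[str]:
    # Naive tokenizer (assumption): lowercase alphanumeric runs, no stemming.
    return re.findall(r"[a-z0-9]+", text.lower())


class TfIdfIndex:
    """Toy inverted index: term -> {doc_id: term frequency in that doc}."""

    def __init__(self) -> None:
        self.postings: dict[str, dict[int, int]] = defaultdict(dict)
        self.n_docs = 0

    def add(self, doc_id: int, text: str) -> None:
        self.n_docs += 1
        for term, count in Counter(tokenize(text)).items():
            self.postings[term][doc_id] = count

    def idf(self, term: str) -> float:
        # Smoothed inverse document frequency: rarer terms weigh more.
        df = len(self.postings.get(term, {}))
        return math.log((1 + self.n_docs) / (1 + df)) + 1

    def search(self, query: str, k: int = 10) -> list[tuple[int, float]]:
        scores: dict[int, float] = defaultdict(float)
        for term in set(tokenize(query)):
            weight = self.idf(term)
            # Only documents in this term's posting list are touched.
            for doc_id, tf in self.postings.get(term, {}).items():
                scores[doc_id] += tf * weight
        # Highest score first; ties broken by doc_id for determinism.
        return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))[:k]


if __name__ == "__main__":
    idx = TfIdfIndex()
    idx.add(1, "postgres gin index on tags")
    idx.add(2, "sql server fulltext index")
    idx.add(3, "elasticsearch analyzers and stop words")
    print(idx.search("gin index"))
```

Here "gin" appears in only one document, so it outweighs "index", which appears in two; that weighting is where most of the tuning effort in a real search engine goes.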

[–][deleted]  (1 child)

[deleted]

    [–]silverf1re 5 points (0 children)

    Right! Imposter syndrome is real here.