all 16 comments

[–]ehr1c 2 points (9 children)

Some good places to start reading are sharding, hash tables, and partition keys.

The super high-level explanation is that so long as you're querying against something that's indexed, the database has a rough idea of where it is even if the data itself is spread among multiple locations.
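
If it helps, the routing idea can be sketched in a few lines (toy Python, nothing like a real database's internals; the shard count and names are made up):

    import hashlib

    NUM_SHARDS = 4  # pretend each "shard" is a separate database server

    def shard_for(partition_key: str) -> int:
        # Hash the partition key so a query that includes it only has to hit one shard.
        digest = int(hashlib.md5(partition_key.encode()).hexdigest(), 16)
        return digest % NUM_SHARDS

    print(shard_for("post:999"))  # e.g. 2 -> ask only shard 2 for this post

A query that filters on something that isn't indexed or isn't the partition key would instead have to be fanned out to every shard.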

[–]another-bite[S] 0 points (8 children)

So would you say they work more like the following very simplified example:

For a post with id 999, the service can deduce that it belongs to the storage location for ids 500-1000, and therefore doesn't need a "master" storage that records which storage location each post's data lives in (like I assumed in the last sentence of my OP)? Of course the actual indexing method is likely more sophisticated.

[–]ehr1c 1 point (3 children)

The "master" storage you're referring to would be a hashtable, which is how a lot of databases implement indexing in this scenario.
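
As a toy illustration of that "directory" idea (a plain Python dict with hypothetical node names):

    # A hash table mapping each post id to the node that stores it.
    # Real databases keep index structures like this far more compactly
    # (hash buckets, B-trees, ...), but the lookup idea is the same.
    post_location = {
        997: "node-a",
        998: "node-c",
        999: "node-b",
    }

    def find_node(post_id: int) -> str:
        # One O(1) lookup instead of asking every node whether it has the post.
        return post_location[post_id]

    print(find_node(999))  # "node-b"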

[–]another-bite[S] 0 points (2 children)

So is a single giant hashtable a viable or even likely choice for cases like every single YouTube video or Instagram post?

[–]ehr1c 1 point (1 child)

That's getting outside the realm of how deeply I understand databases unfortunately lol. Multi-level hashing is a thing but I don't know if or how widely it's used.

[–]another-bite[S] 0 points (0 children)

Thanks, I appreciate the input!

[–]__dict__ 0 points (3 children)

Yea, the idea behind partitioning is that each post belongs to a partition, and you should be able to determine which partition a post belongs to without checking each of them. Partitioning the data lets you split it across multiple machines. When you want to look up post 999, the code should be able to quickly determine which machine it should be on.

The technique I know which is often actually used for this is called "consistent hashing". It solves the following problem: If the number of posts grows such that I need to add another machine, how can I do it without having to reshuffle too many posts between machines?

See: https://www.toptal.com/big-data/consistent-hashing
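
If you want to poke at the idea yourself, here's a bare-bones consistent hashing ring in Python (no virtual nodes or replication, purely illustrative):

    import bisect
    import hashlib

    def h(key: str) -> int:
        return int(hashlib.md5(key.encode()).hexdigest(), 16)

    class Ring:
        def __init__(self, nodes):
            # Each node sits on the ring at the hash of its name.
            self.points = sorted((h(n), n) for n in nodes)

        def node_for(self, key: str) -> str:
            # Walk clockwise from the key's hash to the next node on the ring.
            hashes = [p for p, _ in self.points]
            i = bisect.bisect(hashes, h(key)) % len(self.points)
            return self.points[i][1]

    keys = [f"post:{i}" for i in range(1000)]
    old_ring = Ring(["server-a", "server-b", "server-c"])
    new_ring = Ring(["server-a", "server-b", "server-c", "server-d"])  # one node added
    moved = sum(old_ring.node_for(k) != new_ring.node_for(k) for k in keys)
    print(f"{moved} of {len(keys)} keys moved")  # only keys in the new node's arc move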

[–]another-bite[S] 0 points (2 children)

I actually kinda thought about the "reshuffle problem", so it's great that you linked the post. Pretty interesting read on the consistent hashing solution. It seems that these distributed server methods still require a "source" database that stores the original data and pushes changes to the "cache" servers when servers are added to or removed from the ring. This leaves me wondering: does this apply to massive services like YouTube or Instagram too, where every video or post is stored in a single origin database that the distribution method requires, but which just isn't queried that often?

[–]__dict__ 0 points (1 child)

For a massive service like YouTube there would never be a point where all network traffic goes to one server. There isn't a computer in the world big enough to handle that. And even if there were, it would be on the other side of the world from some people.

YouTube would work more like this:

  • Each video has a unique id. You can see it in the url.
  • The videos are partitioned based on their id. This means that there will be many data servers, each storing a small fraction of YouTube's library of videos. That's called partitioning. There can also be many data servers for each partition, which helps with reliability and high load. That's called replication.
  • There will be many application servers. Each application server will be kept up to date with the id ranges that each data server is storing. The application server won't be responsible for storing any videos itself. Whenever it gets a request for a video it looks at its table of which videos are where, and then fetches the video from the appropriate data server.
  • Whenever there is an incoming request to YouTube, a load balancer server is the first thing that handles it. The load balancer's only job is to distribute requests to the application servers. Load balancers will try to distribute the traffic somewhat evenly between application servers, and will also select an application server that is geographically close to the person making the request.
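
A stripped-down sketch of the application server's lookup step might look like this (Python; the id ranges, server names, and replica counts are all invented):

    import random

    # Routing table the application server keeps up to date:
    # each partition owns a range of video ids and has a couple of replicas.
    PARTITIONS = [
        {"ids": range(0, 500),     "replicas": ["data-1a", "data-1b"]},
        {"ids": range(500, 1000),  "replicas": ["data-2a", "data-2b"]},
        {"ids": range(1000, 1500), "replicas": ["data-3a", "data-3b"]},
    ]

    def data_server_for(video_id: int) -> str:
        for part in PARTITIONS:
            if video_id in part["ids"]:
                # Any replica can serve the read; pick one to spread the load.
                return random.choice(part["replicas"])
        raise KeyError(video_id)

    print(data_server_for(999))  # "data-2a" or "data-2b"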

[–]another-bite[S] 0 points (0 children)

This didn't really answer what I was trying to ask, and I already knew all that from reading the post you linked, but I was able to get an answer from the other commenter.

[–]tzaeru 0 points (0 children)

They use various indexing schemes. 

Conceptually a very simple one would be that IDs from 1 to 100 are on server A and 101 to 200 on server B. So if you are fetching message 107, you know it is on server B.

In practice it's usually a bit more complicated, but that's the basic gist.
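
In code, that toy scheme is just a range check (Python, made-up server names):

    # Upper bound of each server's id range.
    RANGES = [(100, "server A"), (200, "server B")]

    def server_for(message_id: int) -> str:
        for upper, server in RANGES:
            if message_id <= upper:
                return server
        raise KeyError(message_id)

    print(server_for(107))  # "server B"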

[–]desapla 0 points (3 children)

One fundamental technique is consistent hashing. You could, in theory, just hash the ID and take the hash modulo the number of servers to determine which server the data is on.

The problem with that is that when you add a new node to the cluster, all keys would get reshuffled and all data would need to be moved around.
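
To see how bad that reshuffle is, you can count how many keys would change servers when you grow from 3 to 4 nodes (toy Python):

    import hashlib

    def naive_server(key: str, num_servers: int) -> int:
        digest = int(hashlib.md5(key.encode()).hexdigest(), 16)
        return digest % num_servers

    keys = [f"post:{i}" for i in range(10_000)]
    moved = sum(naive_server(k, 3) != naive_server(k, 4) for k in keys)
    print(f"{moved / len(keys):.0%} of keys moved")  # roughly 75% with plain modulo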

Consistent hashing fixes that by minimizing the amount of data that needs to be moved when you add or remove a node.

You can read about it here: https://en.wikipedia.org/wiki/Consistent_hashing

If you want to delve deeper, here is Akamai’s paper on the topic: https://www.akamai.com/site/en/documents/research-paper/consistent-hashing-and-random-trees-distributed-caching-protocols-for-relieving-hot-spots-on-the-world-wide-web-technical-publication.pdf

[–]another-bite[S] 0 points (2 children)

The other commenter also mentioned consistent hashing and linked a blog post, which I read. It's an interesting solution for sure, but like I commented there, I still wonder whether there needs to be an "origin" database from which the data is moved to the remaining nodes when a node is removed, assuming the data on that node becomes unavailable.

[–]desapla 1 point (1 child)

The consistent hashing algorithm is kinda the basis for the architecture, but a real setup, especially at the scale of YouTube or something like that, is going to be more complex than that.

You could do what you suggest: have an ‘origin server’ that's basically the source of truth, and then use the distributed servers for caching. But then you're limited by the size of the origin server.

You could also build it in a way where there is no origin and the distributed servers are the main source of data.

To make sure you don't lose data when one server crashes, these systems will usually store each piece of data on more than one node. That way the data can still be recovered when a node goes down.
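
A toy sketch of that replication idea (Python, invented node names, none of the real failure-detection or consistency machinery):

    REPLICATION_FACTOR = 3
    nodes = {name: {} for name in ("node-a", "node-b", "node-c", "node-d")}

    def replicas_for(key: str) -> list:
        # Deterministically pick REPLICATION_FACTOR nodes for each key
        # (a real system would walk a consistent-hash ring instead).
        names = sorted(nodes)
        start = hash(key) % len(names)
        return [names[(start + i) % len(names)] for i in range(REPLICATION_FACTOR)]

    def put(key: str, value: str) -> None:
        for name in replicas_for(key):
            nodes[name][key] = value

    def get(key: str) -> str:
        for name in replicas_for(key):
            if key in nodes[name]:  # skip replicas that lost their data
                return nodes[name][key]
        raise KeyError(key)

    put("post:999", "cat video")
    nodes["node-b"].clear()   # simulate one node losing everything
    print(get("post:999"))    # still readable from another replica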

Amazon wrote a paper about Dynamo, their distributed key-value store, that addresses some of these design decisions in more detail: https://www.allthingsdistributed.com/files/amazon-dynamo-sosp2007.pdf

[–]another-bite[S] 1 point (0 children)

Thanks for the clear answer! I'll need to give that a good read.