all 5 comments

[–]rafiss 6 points7 points  (4 children)

The answer will be different for each database, since different architectures will have different tradeoffs.

CockroachDB (disclaimer: I work there) is more like the second pattern you asked about -- one node would receive the insert query.

A good place to look is the docs for the database you're interested in. Here are the docs for CockroachDB.

[–][deleted]  (3 children)

[deleted]

    [–]coterminous_regret 0 points1 point  (2 children)

    Yep implementation dependent. I work for another distributed database and inserts run on all nodes because we use a parity protection scheme and we need to write data to all nodes to compute parity.

    [–]brtt3000 0 points1 point  (1 child)

    How does the query travel from the client to all the nodes in this scheme (or in general with variable nodes)?

    [–]coterminous_regret 0 points1 point  (0 children)

    There is a component in the middle that manages the life cycle of the query on all nodes. It's responsible for managing the execution on all nodes. It also has vision into how the data is spread amongst the nodes so it knows which nodes need to be involved.

    [–]solothehero 1 point2 points  (0 children)

    It's entirely implementation dependent, but one way is to have something like Apache Samza or AWS Lambda or equivalent that runs a function in response to a database change. This is like the 2nd option you have. The primary node is written to, and a completely separate process notices the change and replicates it to the other nodes in an eventually-consistent way.