Day 41/100

Designing Data-Intensive Applications [Book Highlights]

[ Part II : Chapter V ] Distributed Data

Multi-Leader Replication

  • Multi-datacenter operation - we have a database with replicas in several different datacenters. With a normal leader-based replication setup, the leader has to be in one of the datacenters, and all writes must go through that datacenter.
  • In a multi-leader configuration, you can have a leader in each datacenter.
  • Within each datacenter, regular leader– follower replication is used; between datacenters, each datacenter’s leader replicates its changes to the leaders in other datacenters.

image.png

  • Performance - In a multi-leader configuration, every write can be processed in the local datacenter and is replicated asynchronously to the other datacenters
  • Tolerance of datacenter outages - In a multi-leader configuration, each datacenter can continue operating independently of the others, and replication catches up when the failed datacenter comes back online
  • Tolerance of network problems - A multi-leader configuration with asynchronous replication can usually tolerate network problems better: a temporary network interruption does not prevent writes being processed
  • it also has a big downside: the same data may be concurrently modified in two different datacenters, and those write conflicts must be resolved

Handling Write Conflicts

  • The biggest problem with multi-leader replication is that write conflicts can occur, which means that conflict resolution is required.

image.png

  • In a single-leader database, the second writer will either block and wait for the first write to complete, or abort the second write transaction, forcing the user to retry the write.

Multi-Leader Replication Topologies

image.png

  • The most general topology is all-to-all, in which every leader sends its writes to every other leader.
  • circular topology, in which each node receives writes from one node and forwards those writes to one other node.
  • In circular and star topologies, a write may need to pass through several nodes before it reaches all replicas
  • A problem with circular and star topologies is that if just one node fails, it can interrupt the flow of replication messages between other nodes, causing them to be unable to communicate until the node is fixed