Day 37/100

Designing Data-Intensive Applications [Book Highlights]

[ Part II ] Distributed Data

  • why we distribute a database across multiple machines
    • Scalability - the data volume or read/write load grows beyond what a single machine can handle
    • Fault Tolerance / High Availability - we want the application/database to continue operating even if a node, the network, or an entire datacenter goes down.
    • Latency - avoid making user requests travel halfway around the world and back for results.

Shared-Memory Architecture or Scaling Vertically or Scaling Up

  • Many CPUs, many RAM chips, and many disks can be joined together under one operating system, and a fast interconnect allows any CPU to access any part of the memory or disk.
  • Problems with a shared-memory architecture,
    • the cost grows faster than linearly
    • a machine twice the size cannot necessarily handle twice the load (see the sketch after this list)
    • limited fault tolerance
    • limited to a single geographic location.
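
One standard way to see why a machine twice the size can't handle twice the load is Amdahl's law; the sketch below is illustrative only, and the 10% serial fraction is a made-up assumption, not a figure from the book.

```python
# Illustrative only: an Amdahl's-law-style model of why a machine with
# twice the CPUs rarely handles twice the load. The 10% serial fraction
# (locking, memory-bus contention, etc.) is a made-up assumption.

def speedup(num_cpus: int, serial_fraction: float) -> float:
    """Speedup over 1 CPU when serial_fraction of the work cannot parallelize."""
    return 1.0 / (serial_fraction + (1.0 - serial_fraction) / num_cpus)

for cpus in (1, 2, 4, 8, 16):
    print(f"{cpus:2d} CPUs -> {speedup(cpus, 0.10):.2f}x speedup")
# 2 CPUs give ~1.8x, 16 CPUs only ~6.4x: throughput grows sublinearly,
# while the price of ever-bigger shared-memory machines grows superlinearly.
```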

Shared-Nothing Architectures or Scaling Horizontally or Scaling Out

  • In this approach, each machine or virtual machine running the database software is called a node. Each node uses its CPUs, RAM, and disks independently.
  • Coordination between nodes is done at the software level, using a conventional network.
  • You can use whatever machines have the best price/performance ratio.
  • You can distribute data across multiple geographic regions, reducing latency for users, and potentially survive the loss of an entire datacenter (a minimal routing sketch follows below).
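
A minimal sketch of the shared-nothing idea, assuming a toy in-process Node/Cluster model (illustrative names, not any real database's API): each node owns its own storage, and coordination is just software routing each key to a node.

```python
# Toy shared-nothing cluster: every node owns its own storage (a private
# dict standing in for its own CPU/RAM/disk), and coordination happens
# purely in software by routing each key to one node.
from hashlib import md5

class Node:
    def __init__(self, name: str):
        self.name = name
        self.storage: dict[str, str] = {}  # this node's own "disk"

    def put(self, key: str, value: str) -> None:
        self.storage[key] = value

    def get(self, key: str) -> str | None:
        return self.storage.get(key)

class Cluster:
    def __init__(self, nodes: list[Node]):
        self.nodes = nodes

    def _node_for(self, key: str) -> Node:
        # Simple hash-mod placement; real systems prefer consistent hashing
        # or range partitioning so that adding a node moves little data.
        h = int(md5(key.encode()).hexdigest(), 16)
        return self.nodes[h % len(self.nodes)]

    def put(self, key: str, value: str) -> None:
        self._node_for(key).put(key, value)  # in reality, a network call

    def get(self, key: str) -> str | None:
        return self._node_for(key).get(key)

cluster = Cluster([Node("n1"), Node("n2"), Node("n3")])
cluster.put("user:42", "alice")
print(cluster.get("user:42"))  # -> alice, served by whichever node owns the key
```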

Replication vs Partitioning

  • Two common ways of distributing data across multiple nodes: replication (keeping a copy of the same data on several nodes) and partitioning/sharding (splitting one big database into smaller subsets); the two are often combined, as sketched below.
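
A hedged, data-only sketch of the distinction (keys, values, and node names are made up):

```python
# Illustrative data layouts only (keys, values, and node names are made up).
dataset = {"a": 1, "b": 2, "c": 3, "d": 4}

# Replication: every node holds a full copy of the same data.
replicated = {
    "node1": dict(dataset),
    "node2": dict(dataset),
    "node3": dict(dataset),
}

# Partitioning (sharding): each node holds only a subset of the data.
partitioned = {
    "node1": {"a": 1, "b": 2},
    "node2": {"c": 3, "d": 4},
}

# In practice the two are combined: partition the dataset, then keep
# several replicas of each partition for fault tolerance.
```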

Replication

  • keeping a copy of the same data on multiple machines that are connected via a network
  • Why we might do it,
    • To keep data geographically close to your users (and thus reduce latency)
    • To allow the system to continue working even if some of its parts have failed (and thus increase availability)
    • To scale out the number of machines that can serve read queries (and thus increase read throughput)
  • The main challenge of replication is handling changes to the replicated data, as sketched below
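
A minimal sketch of why changes are the hard part, assuming leader-based replication (one common approach; the Leader/Replica classes are illustrative, not a real system's API): a write commits on the leader and must then be shipped to every follower, and until a follower applies it, that follower serves stale data.

```python
# Toy leader-based replication: a write commits on the leader, then is
# shipped to each follower.
class Replica:
    def __init__(self) -> None:
        self.data: dict[str, str] = {}

    def apply(self, key: str, value: str) -> None:
        self.data[key] = value

class Leader(Replica):
    def __init__(self, followers: list[Replica]) -> None:
        super().__init__()
        self.followers = followers

    def write(self, key: str, value: str) -> None:
        self.apply(key, value)           # commit locally first
        for follower in self.followers:  # in real systems this is often
            follower.apply(key, value)   # asynchronous -> replication lag

f1, f2 = Replica(), Replica()
leader = Leader([f1, f2])
leader.write("x", "1")
print(f1.data)  # {'x': '1'} here - but with asynchronous shipping, a
                # follower could briefly lag behind and serve a stale read
```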