Scylla Repairs
- Scylla nodes may drift out of sync over time, increasing entropy. Typical causes:
  - Network issues
  - Node issues
  - Rolling upgrades
- The impact on Scylla is inconsistent data and data resurrection
- Anti-entropy tools:
  - Repair, read repair and hinted handoff
Inconsistent Data
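nodetool repair -pr
- The command reconciles differences between the replicas of the data. With -pr it repairs only the primary token ranges of the node it runs on, so it has to be run on every node.
- After data reconciliation, the nodes are consistent with each other again.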
The resurrection problem
- A delete fails to propagate to all replicas
- After gc_grace_seconds (default is 10 days), tombstones are removed
- The node that never received the delete still holds the data; with the tombstones gone, that stale data is propagated back to the other two nodes
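
The sequence above can be walked through with a toy model. This is only a sketch, assuming three replicas of a single row and a last-write-wins merge; the names reconcile, purge_tombstone and simulate are made up for illustration and are not Scylla APIs.

```python
GC_GRACE_SECONDS = 10 * 24 * 3600  # the 10-day default mentioned above

def reconcile(cells):
    """Last-write-wins over the cells that still exist (None = replica has nothing)."""
    present = [c for c in cells if c is not None]
    return max(present, key=lambda c: c["ts"]) if present else None

def purge_tombstone(cell, now):
    """Drop a tombstone once it is older than gc_grace_seconds."""
    if cell is not None and cell["deleted"] and now - cell["ts"] > GC_GRACE_SECONDS:
        return None
    return cell

def simulate(repair_before_purge):
    """Return True if the deleted row comes back to life."""
    # t=0: all three replicas hold the live row.
    live = {"ts": 0, "deleted": False}
    replicas = [dict(live), dict(live), dict(live)]
    # t=100: the delete reaches replicas 0 and 1 only.
    tomb = {"ts": 100, "deleted": True}
    replicas[0], replicas[1] = dict(tomb), dict(tomb)
    if repair_before_purge:
        # Repair inside the grace period copies the tombstone to replica 2.
        winner = reconcile(replicas)
        replicas = [dict(winner) for _ in replicas]
    # Past gc_grace_seconds: tombstones are purged.
    now = 100 + GC_GRACE_SECONDS + 1
    replicas = [purge_tombstone(c, now) for c in replicas]
    # A later repair spreads whatever survived.
    winner = reconcile(replicas)
    return winner is not None and not winner["deleted"]

print(simulate(repair_before_purge=False))
print(simulate(repair_before_purge=True))
```

Without a repair inside the grace period, the only surviving copy is the stale live row, so it wins the later reconciliation. Repairing before the tombstones expire avoids this in the model.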
What are repairs?
- Whenever there is a write, Scylla tries to write to all replicas, regardless of the consistency level
- The consistency level only says how many acknowledgements to wait for; the state of the remaining replicas is unknown
- Reads may therefore return stale data, depending on the read CL
- During repair, the database is scanned and the differences between replicas are fixed
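
As a rough illustration of "scan and fix differences", here is a toy repair over in-memory replicas. It is a sketch only: each replica is assumed to be a dict of key -> (timestamp, value), the newest timestamp wins, and digest and repair are hypothetical helpers. It does not reflect how Scylla implements repair internally.

```python
import hashlib

def digest(rows):
    """Hash a replica's rows in key order so equal content gives an equal digest."""
    h = hashlib.sha256()
    for key in sorted(rows):
        ts, value = rows[key]
        h.update(f"{key}|{ts}|{value}".encode())
    return h.hexdigest()

def repair(replicas):
    """Reconcile replicas in place; return the keys that had to be fixed."""
    if len({digest(r) for r in replicas}) == 1:
        return []  # digests match, nothing to do
    fixed = []
    for key in sorted(set().union(*replicas)):
        versions = [r[key] for r in replicas if key in r]
        newest = max(versions, key=lambda v: v[0])  # highest timestamp wins
        if any(r.get(key) != newest for r in replicas):
            fixed.append(key)
            for r in replicas:
                r[key] = newest
    return fixed

a = {"k1": (1, "a"), "k2": (5, "new")}
b = {"k1": (1, "a"), "k2": (2, "old")}  # missed the latest write
c = {"k1": (1, "a")}                    # missed k2 entirely
print(repair([a, b, c]))
```

Comparing digests first keeps the common case (replicas already in sync) cheap; only mismatching data is walked key by key.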
Do we really need to run repair?
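- Single datacenter: safe to skip if all of the following hold
  - CLread + CLwrite > RF for all operations
  - No delete operations
  - No TTL'd data
- Multi datacenter
  - LOCAL_QUORUM writes + LOCAL_QUORUM reads are not enough
  - EACH_QUORUM writes + LOCAL_QUORUM reads are OK, but consider the impact on HA: the write fails if any datacenter cannot reach quorum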
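
The overlap rules above can be checked with a little arithmetic. The sketch below is one reading of those bullets, assuming two datacenters with RF=3 each; quorum, single_dc_safe and read_sees_write are illustrative helpers, not part of any driver or of Scylla.

```python
def quorum(rf):
    """Replicas needed for a quorum at replication factor rf."""
    return rf // 2 + 1

def single_dc_safe(cl_read, cl_write, rf):
    """CLread + CLwrite > RF: every read set intersects every write set."""
    return cl_read + cl_write > rf

def read_sees_write(write_acks, read_dc, rf_per_dc):
    """True if a LOCAL_QUORUM read in read_dc must hit a replica that acked the write.

    write_acks maps datacenter name to the number of acknowledgements the
    write is guaranteed to have collected there (an assumption of this model).
    """
    rf = rf_per_dc[read_dc]
    return write_acks.get(read_dc, 0) + quorum(rf) > rf

rf_per_dc = {"dc1": 3, "dc2": 3}
# LOCAL_QUORUM write coordinated in dc1: only dc1 acknowledgements are guaranteed.
local_quorum_write = {"dc1": quorum(3)}
# EACH_QUORUM write: a quorum of acknowledgements in every datacenter.
each_quorum_write = {dc: quorum(rf) for dc, rf in rf_per_dc.items()}

print(single_dc_safe(quorum(3), quorum(3), 3))
print(read_sees_write(local_quorum_write, "dc2", rf_per_dc))
print(read_sees_write(each_quorum_write, "dc2", rf_per_dc))
```

In this model a LOCAL_QUORUM read in dc2 has no guaranteed overlap with a LOCAL_QUORUM write coordinated in dc1, while an EACH_QUORUM write guarantees overlap in every datacenter at the cost of needing a quorum everywhere.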