Day 14/100

Scylla Repairs

  • Scylla nodes may drift out of sync over time, increasing entropy. Common causes:
    • Network issues
    • Node issues
    • Rolling upgrades
  • The impact on Scylla: inconsistent data and data resurrection
  • Anti-entropy tools
    • Repair, read repair, and hinted handoff

Inconsistent Data

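  • The nodetool repair -pr command repairs the node's primary token ranges, reconciling differences between the replicas of that data; with -pr it must be run on every node to cover the full ring
  • After data reconciliation, the replicas are consistent again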

The resurrection problem

  • A delete fails to propagate to all replicas
  • After gc_grace_seconds (default: 10 days), tombstones become eligible for removal during compaction
  • The replica that never received the delete still holds the data; with the tombstones gone, that data is propagated back to the other two nodes
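
The timeline above can be sketched as a toy simulation. This is only an illustrative model, not how Scylla stores data: each replica is a plain dict of key to (timestamp, value), and the names purge_tombstones, repair, and simulate are made up for this example.

```python
GC_GRACE_SECONDS = 10 * 24 * 3600  # default gc_grace_seconds: 10 days
TOMBSTONE = None                   # toy marker for a deleted value

def purge_tombstones(replica, now):
    """Drop tombstones older than gc_grace_seconds, as compaction eventually would."""
    for key in list(replica):
        ts, value = replica[key]
        if value is TOMBSTONE and now - ts > GC_GRACE_SECONDS:
            del replica[key]

def repair(replicas):
    """Copy the newest (timestamp, value) cell of every key to all replicas."""
    for key in set().union(*replicas):
        newest = max((r[key] for r in replicas if key in r), key=lambda cell: cell[0])
        for r in replicas:
            r[key] = newest

def simulate(repair_before_purge):
    # All three replicas hold the row written at t=0.
    a, b, c = ({"k": (0, "v")} for _ in range(3))
    # The delete at t=100 reaches a and b, but never reaches c.
    a["k"] = b["k"] = (100, TOMBSTONE)
    if repair_before_purge:
        repair([a, b, c])  # c receives the tombstone in time
    later = 100 + GC_GRACE_SECONDS + 1
    for r in (a, b, c):
        purge_tombstones(r, later)
    repair([a, b, c])      # a late repair after the tombstones are gone
    return [r.get("k") for r in (a, b, c)]

resurrected = simulate(repair_before_purge=False)  # the old row comes back on every replica
clean = simulate(repair_before_purge=True)         # the row stays deleted everywhere
```

In this model, repairing within gc_grace_seconds is what keeps the deleted row from coming back.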

What are repairs?

  • On every write, Scylla tries to write to all replicas, regardless of the consistency level
  • The consistency level only says how many replicas must acknowledge the write; we don't know whether the remaining replicas received it
  • Reads may return stale data depending on the read CL
  • Repair scans the database and fixes the differences between replicas
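
A minimal sketch of the points above, assuming a toy model where replicas are dicts and the newest timestamp wins. The write, read, and repair helpers are hypothetical and only mimic the behavior described in these notes, not Scylla's actual implementation.

```python
def write(replicas, up, key, value, ts, cl):
    """Send the write to every replica; succeed if at least `cl` of them ack."""
    acks = 0
    for replica, is_up in zip(replicas, up):
        if is_up:
            replica[key] = (ts, value)
            acks += 1
    return acks >= cl

def read(replicas, contacted, key):
    """Return the newest value among the contacted replicas only."""
    cells = [replicas[i][key] for i in contacted if key in replicas[i]]
    return max(cells, key=lambda cell: cell[0])[1] if cells else None

def repair(replicas):
    """Scan every key and copy the newest cell to all replicas."""
    for key in set().union(*replicas):
        newest = max((r[key] for r in replicas if key in r), key=lambda cell: cell[0])
        for r in replicas:
            r[key] = newest

replicas = [{}, {}, {}]  # RF = 3
write(replicas, [True, True, True], "k", "old", ts=1, cl=2)

# The third replica is down, yet a CL=2 (QUORUM) write still succeeds.
ok = write(replicas, [True, True, False], "k", "new", ts=2, cl=2)

stale = read(replicas, [2], "k")  # a CL=ONE read that happens to hit the stale replica
repair(replicas)
fresh = read(replicas, [2], "k")  # the same read after repair
```

The write is acknowledged even though one replica missed it, which is why a low read CL can see old data until repair runs.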

Do we really need to run repair?

  • Single datacenter: safe to skip if all of these hold
    • CLread + CLwrite > RF for all operations
    • No delete operations
    • No TTL'd data
  • Multi datacenter
    • LOCAL_QUORUM + LOCAL_QUORUM is not enough
    • EACH_QUORUM + LOCAL_QUORUM is OK, but consider the HA impact (an EACH_QUORUM write fails when any datacenter cannot reach quorum)
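
The overlap rule above can be checked with a little arithmetic. This is a simplified sketch: quorum and read_overlaps_write are illustrative helper names, RF = 3 per datacenter is an assumption, and the multi-datacenter case is modeled only by counting how many acks a write guarantees in the datacenter that serves the read.

```python
def quorum(rf):
    """Replicas needed for a majority of rf replicas."""
    return rf // 2 + 1

def read_overlaps_write(cl_read, write_acks, rf):
    """True when every read set must intersect every acknowledged write set."""
    return cl_read + write_acks > rf

RF = 3  # replication factor per datacenter (assumed for this example)

# Single datacenter
single_quorum = read_overlaps_write(quorum(RF), quorum(RF), RF)  # QUORUM read + QUORUM write
single_one = read_overlaps_write(1, quorum(RF), RF)              # ONE read + QUORUM write

# Two datacenters, with the read served by a different DC than the one that took the write.
# A LOCAL_QUORUM write guarantees no acks in the remote DC; EACH_QUORUM guarantees a quorum there.
remote_after_local = read_overlaps_write(quorum(RF), 0, RF)
remote_after_each = read_overlaps_write(quorum(RF), quorum(RF), RF)
```

With RF = 3, QUORUM + QUORUM gives 2 + 2 > 3, so reads always touch at least one replica that has the latest write. A LOCAL_QUORUM read in a remote datacenter has that guarantee only when the write used EACH_QUORUM.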