Day 13/100

Scylla Compaction Strategies

Size-Tiered Compaction Strategy [STCS]

  • STCS organises SSTables into tiers.
  • Tiers are based on SSTable size, on an exponential scale.
  • Compacting several SSTables produces a single SSTable.
    • It may be as large as the union of its inputs, in which case it moves to the next tier,
    • or it may be much smaller because of deletes and expirations, potentially dropping to a lower tier.
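The tier movement described above can be sketched in a few lines. This is only an illustration: `tier_of`, `compact`, the base of 4, and `surviving_fraction` are hypothetical names and values, and real STCS groups SSTables by similar size rather than by a strict power-of-base rule.

```python
def tier_of(size, base=4):
    """Hypothetical tier index: sizes in [base**t, base**(t+1)) share tier t."""
    tier = 0
    while size >= base:
        size //= base
        tier += 1
    return tier


def compact(sizes, surviving_fraction=1.0):
    """Merge several SSTables into one; return (merged_size, new_tier).

    surviving_fraction models data lost to deletes and expirations
    (an assumption made for this sketch).
    """
    merged = max(1, int(sum(sizes) * surviving_fraction))
    return merged, tier_of(merged)


# Four SSTables of size 4 all sit in tier 1.
print(compact([4, 4, 4, 4]))       # nothing dropped: the output moves up a tier
print(compact([4, 4, 4, 4], 0.1))  # mostly deleted or expired: the output drops down
```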

STCS Space Amplification

  • STCS needs disk space of at least twice the data size. This overhead is called space amplification, and it has two sources:
    • temporary space used while a compaction runs (inputs and output coexist on disk), and
    • accumulation of updates and deletes across different tiers.
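A small worked calculation shows where the factor of two comes from. The sizes are made up and the function names are hypothetical; the sketch assumes the worst case where the inputs share no overwritten or deleted data, so the output is as large as all inputs combined.

```python
def peak_disk_usage(input_sizes, output_size):
    """Inputs stay on disk until the output is fully written."""
    return sum(input_sizes) + output_size


def space_amplification(on_disk, live_data):
    """Ratio of disk space used to the size of the live data."""
    return on_disk / live_data


inputs = [25, 25, 25, 25]  # four SSTables of 25 GB each (illustrative)
live = 100                 # nothing overlaps, so all 100 GB is live
peak = peak_disk_usage(inputs, output_size=live)
print(peak, space_amplification(peak, live))
```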

Leveled Compaction Strategy [LCS]

  • Compaction is triggered when a level has more than 10 SSTables.
  • LCS picks one SSTable of size X from level i to compact.
  • It then finds roughly 10 SSTables in the next level that overlap with that SSTable and compacts all of them together.
  • It writes the resulting run to the next level; the run size is bounded by (1+10)*X.
  • LCS limits space amplification, but at the cost of higher write amplification.
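The overlap step and the (1+10)*X bound can be illustrated with a toy model. The `SSTable` tuple, the integer key ranges, and the sizes below are assumptions made for the sketch, not Scylla's data structures.

```python
from typing import NamedTuple


class SSTable(NamedTuple):
    first: int  # smallest key in the SSTable
    last: int   # largest key in the SSTable
    size: int   # size in MB (illustrative)


def overlapping(candidate, next_level):
    """SSTables in the next level whose key range intersects the candidate's."""
    return [s for s in next_level
            if s.first <= candidate.last and candidate.first <= s.last]


def compaction_input_size(candidate, next_level):
    """Total data rewritten when the candidate is pushed down one level."""
    return candidate.size + sum(s.size for s in overlapping(candidate, next_level))


X = 160
candidate = SSTable(first=0, last=99, size=X)
# The next level holds narrower key ranges: 0-9, 10-19, ..., 110-119.
next_level = [SSTable(i * 10, i * 10 + 9, X) for i in range(12)]

print(len(overlapping(candidate, next_level)))       # SSTables that overlap
print(compaction_input_size(candidate, next_level))  # equals (1 + 10) * X here
```

Rewriting about 11 times X to move X worth of data down one level is the source of the higher write amplification.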

Time Window Compaction Strategy [TWCS]

  • Memtables have a write time, and the SSTables flushed from them inherit it.
  • Only SSTables that belong to the same window are compacted together
CREATE TABLE twcs.example (
    id int,
    value int,
    text_value text,
    PRIMARY KEY (id, value)
) WITH CLUSTERING ORDER BY (value ASC)
    AND compaction = {
        -- one compaction window per day
        'class' : 'TimeWindowCompactionStrategy',
        'compaction_window_unit' : 'DAYS',
        'compaction_window_size' : '1'
    };
  • TWCS assumes that all data in a partition is inserted in the same window or in a small number of windows.
  • Deletes and writes:
    • Data written 1 year ago sits in SSTable 1.
    • A delete issued now writes its tombstone to SSTable N.
    • Reads have to read both.
    • Because the two SSTables belong to different windows, they are never compacted together, so the data is never really gone.
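The windowing rule can be sketched as a bucketing function over write times. The names, the epoch-second timestamps, and the single write time per SSTable are simplifying assumptions for illustration.

```python
from collections import defaultdict

SECONDS_PER_UNIT = {"MINUTES": 60, "HOURS": 3600, "DAYS": 86400}


def window_start(write_time, unit="DAYS", size=1):
    """Round a write time (epoch seconds) down to the start of its window."""
    width = SECONDS_PER_UNIT[unit] * size
    return write_time - (write_time % width)


def group_by_window(sstables, unit="DAYS", size=1):
    """Map each window start to the names of the SSTables written in it."""
    windows = defaultdict(list)
    for name, write_time in sstables:
        windows[window_start(write_time, unit, size)].append(name)
    return dict(windows)


year = 365 * 86400
sstables = [
    ("sstable_1", 10),        # old data point
    ("sstable_2", 86399),     # same day as sstable_1
    ("sstable_n", year + 5),  # tombstone written a year later
]
print(group_by_window(sstables))
```

Only SSTables that share a key in the result are candidates for compaction together, so `sstable_1` and `sstable_n` never meet.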

Incremental Compaction Strategy [ICS]

  • Observed problems with the legacy compaction strategies:
    • STCS has high space amplification and LCS has high write amplification.
  • ICS works on runs, where a run is a sorted set of SSTables.
  • The SSTables in a run are non-overlapping;
    • they are called fragments.
  • A run is equivalent to one large SSTable split into several smaller SSTables.
  • Fragments are disjoint and sorted with respect to each other, so runs can be scanned fragment by fragment and compacted incrementally.
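Fragment-by-fragment compaction can be sketched as a lazy merge. This is a simplified model, not Scylla's implementation: fragments are plain sorted lists of keys, duplicates stand in for overwritten rows, and `compact_runs` and `fragment_size` are hypothetical names.

```python
import heapq
from itertools import chain


def iter_run(run):
    """Fragments are disjoint and sorted, so chaining them yields sorted keys."""
    return chain.from_iterable(run)


def compact_runs(runs, fragment_size):
    """Lazily merge several runs into one new run, one output fragment at a time."""
    merged = heapq.merge(*(iter_run(run) for run in runs))
    sentinel = object()
    last = sentinel
    fragment = []
    for key in merged:
        if last is not sentinel and key == last:
            continue  # the same key appears in more than one run; keep one copy
        last = key
        fragment.append(key)
        if len(fragment) == fragment_size:
            yield fragment
            fragment = []
    if fragment:
        yield fragment


run_a = [[1, 3], [5, 7]]
run_b = [[2, 3], [8]]
print(list(compact_runs([run_a, run_b], fragment_size=2)))
```

Because the merge is a generator, each output fragment is produced after reading only the input fragments it needs, which is the property that makes incremental compaction possible.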