Day 27/100

Designing Data-Intensive Applications [Book Highlights]

[Part I : Chapter III] Storage and Retrieval

  • A database fundamentally needs to do two things: when you give it some data, it should store the data, and when you ask for it later, it should give the data back to you.

  • Consider the world’s simplest database, implemented as two Bash functions:

#!/bin/bash

# Append the key and value as a comma-separated line to the end of the data file.
db_set () {
    echo "$1,$2" >> database
}

# Scan the whole file for lines matching the key, strip the "key," prefix,
# and return the last (i.e. most recent) value.
db_get () {
    grep "^$1," database | sed -e "s/^$1,//" | tail -n 1
}
  • This is how it works:
$ db_set 123456 '{"name":"London","attractions":["Big Ben","London Eye"]}'

$ db_set 42 '{"name":"San Francisco","attractions":["Golden Gate Bridge"]}'

$ db_get 42
{"name":"San Francisco","attractions":["Golden Gate Bridge"]}
  • Similar to what db_set does, many databases internally use a log, which is an append-only data file.
  • A log is simply an append-only sequence of records. It doesn’t have to be human-readable; it might be binary and intended only for other programs to read.
  • db_get has to scan the entire database file from beginning to end, looking for occurrences of the key. In algorithmic terms, the cost of a lookup is O(n): if you double the number of records n in your database, a lookup takes twice as long. That’s not good.
  • An index is an additional structure that is derived from the primary data.
  • Many databases allow you to add and remove indexes, and this doesn’t affect the contents of the database; it only affects the performance of queries.
  • Any kind of index usually slows down writes, because the index also needs to be updated every time data is written.
  • This is an important trade-off in storage systems: well-chosen indexes speed up read queries, but every index slows down writes.

Hash Indexes

  • Let’s say our data storage consists only of appending to a file
  • Then the simplest possible indexing strategy is this: keep an in-memory hash map where every key is mapped to a byte offset in the data file—the location at which the value can be found. A minimal sketch follows the figure below.

[Figure] Storing a log of key-value pairs in a CSV-like format, indexed with an in-memory hash map.
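
A minimal sketch of this idea on top of the toy Bash database above (my own illustration, not from the book; db_set_indexed, db_get_indexed, and the index array are made-up names, and associative arrays need Bash 4 or later):

#!/bin/bash
# Sketch only: an in-memory hash index for the toy database above.
# The associative array "index" maps each key to the byte offset of its
# most recent record in the data file.
declare -A index
touch database                        # make sure the file exists for wc -c

db_set_indexed () {
    local offset
    offset=$(wc -c < database)        # current end of file = offset of the new record
    echo "$1,$2" >> database
    index["$1"]=$offset               # remember where this key's latest value lives
}

db_get_indexed () {
    local offset=${index["$1"]}
    [ -z "$offset" ] && return 1      # key was never written (or the index was lost)
    # Jump straight to the recorded offset instead of scanning the whole file
    tail -c +$((offset + 1)) database | head -n 1 | sed -e "s/^$1,//"
}

Because the index lives only in the shell's memory, it disappears when the process exits, which is exactly the crash-recovery concern discussed further down.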

  • This approach is useful in workloads where there are a lot of writes, but there are not too many distinct keys—you have a large number of writes per key, but it’s feasible to keep all keys in memory.
  • In this approach, we only ever append to a file.
  • To avoid eventually running out of disk space, a good solution is to break the log into segments of a certain size, closing a segment file when it reaches that size and writing subsequent records to a new segment file.
  • We can then perform compaction on these segments, as illustrated in the image below; a rough Bash sketch follows the figure.

[Figure] Compaction of a key-value update log (counting the number of times each cat video was played), retaining only the most recent value for each key.
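
As a rough sketch of how segmentation and compaction could look for the toy database (again my own illustration, not the book's; it assumes GNU coreutils for tac and that keys never contain commas):

#!/bin/bash
# Sketch only: close the active log when it grows past a size limit, then
# compact a closed segment by keeping just the newest record for each key.

SEGMENT_LIMIT=1048576   # 1 MiB; an arbitrary threshold for this sketch

maybe_roll_segment () {
    # Start a fresh, empty log once the current one is "full".
    if [ "$(wc -c < database)" -ge "$SEGMENT_LIMIT" ]; then
        mv database "segment.$(date +%s)"
        : > database
    fi
}

compact_segment () {
    # $1 = closed segment file, $2 = compacted output file.
    # Read the segment newest-first (tac), keep the first record seen for
    # each key (i.e. its most recent value), then restore the original order.
    tac "$1" | awk -F, '!seen[$1]++' | tac > "$2"
}

Several closed segments could be merged in the same pass by concatenating them oldest-to-newest before the tac pipeline, since later records then still win.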

  • The merging and compaction of frozen segments can be done in a background thread, and while it is going on, we can still continue to serve read and write requests as normal, using the old segment files.
  • File format - CSV is not the best format for a log. It’s faster and simpler to use a binary format that first encodes the length of a string in bytes, followed by the raw string (without need for escaping).
  • Deleting records - If you want to delete a key and its associated value, you have to append a special deletion record to the data file (sometimes called a tombstone). When log segments are merged, the tombstone tells the merging process to discard any previous values for the deleted key.
  • Crash recovery - If the database is restarted, the in-memory hash maps are lost. In principle, you can restore each segment’s hash map by reading the entire segment file and noting the offset of every record as you go (see the sketch after this list).
  • Partially written records - The database may crash at any time, including halfway through appending a record to the log. Bitcask files include checksums, allowing such corrupted parts of the log to be detected and ignored.
  • Concurrency control - As writes are appended to the log in a strictly sequential order, a common implementation choice is to have only one writer thread. Data file segments are append-only and otherwise immutable, so they can be read concurrently by multiple threads.
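
Two of these points can be sketched directly on top of the toy database: deletion as a tombstone append, and crash recovery by rescanning the file to rebuild the in-memory index (my own illustration, not Bitcask's actual format; the __tombstone__ marker is an arbitrary choice, and the byte counting assumes single-byte characters and values without newlines):

#!/bin/bash
# Sketch only, building on db_set_indexed and the "index" array from above.
TOMBSTONE="__tombstone__"   # arbitrary deletion marker chosen for this example

db_delete () {
    # Deletion is just another append; a later compaction pass can drop the
    # key entirely once it sees that its most recent record is a tombstone.
    db_set_indexed "$1" "$TOMBSTONE"
}

rebuild_index () {
    # Crash recovery: rescan the data file front to back, so later records
    # for the same key overwrite earlier offsets in the in-memory index.
    local offset=0 line key
    while IFS= read -r line; do
        key=${line%%,*}
        index["$key"]=$offset
        offset=$((offset + ${#line} + 1))   # +1 for the newline; ${#line} counts
                                            # characters, so this assumes single-byte text
    done < database
}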

  • Advantages

    • Appending and segment merging are sequential write operations, which are generally much faster than random writes, especially on magnetic spinning-disk hard drives.
    • Concurrency and crash recovery are much simpler if segment files are append-only or immutable.
    • Merging old segments avoids the problem of data files getting fragmented over time.
  • Drawbacks

    • The hash table must fit in memory, so if you have a very large number of keys, you’re out of luck.
    • Range queries are not efficient. For example, you cannot easily scan over all keys between kitty00000 and kitty99999—you’d have to look up each key individually in the hash maps.