Day 24/100

Day 24/100

Designing Data-Intensive Applications [Book Highlights]

[Part I : Chapter II] Data Models and Query Languages

Query Languages for Data

  • An imperative language tells the computer to perform certain operations in a certain order.
  • In a declarative query language, like SQL, you just mention what conditions the data should meet and what transformations. but not how.
  • A declarative query language is attractive, as it hides implementation details of engine, making the database system able to introduce performance improvements without requiring any changes to queries.
  • Declarative languages have a better chance of getting faster in parallel execution because they specify only the pattern of the results, not the algorithm.

MapReduce Querying

  • MapReduce is a programming model for processing large amounts of data in bulk across many machines, popularized by Google
  • A limited form of MapReduce is supported by some NoSQL datastores, including MongoDB and CouchDB, as a mechanism for performing read-only queries across many documents.
  • MapReduce is neither a declarative query language nor a fully imperative query API, but somewhere in between
  • It is based on the map (also known as collect) and reduce (also known as fold or inject) functions
  • A usability problem with MapReduce is that you have to write two carefully coordinated functions,

Graph-Like Data Models

  • if many-to-many relationships are very common in your data, the connections within your data become more complex, it becomes more natural to start modeling your data as a graph.

image.png Example of graph-structured data (boxes represent vertices, arrows represent edges).

Property Graphs

  • In the property graph model, each vertex consists of:
    • A unique identifier
    • A set of outgoing edges
    • A set of incoming edges
    • A collection of properties (key-value pairs)
  • Each edge consists of:

    • A unique identifier
    • The vertex at which the edge starts (the tail vertex)
    • The vertex at which the edge ends (the head vertex)
    • A label to describe the kind of relationship between the two vertices
    • A collection of properties (key-value pairs)
  • Any vertex can have an edge connecting it with any other vertex. There is no schema that restricts which kinds of things can or cannot be associated.

  • Given any vertex, you can efficiently find both its incoming and its outgoing edges, and thus traverse the graph
  • By using different labels for different kinds of relationships, you can store several different kinds of information in a single graph