Day 26/100

Designing Data-Intensive Applications [Book Highlights]

[Part I : Chapter II] Data Models and Query Languages

The Cypher Query Language

  • Cypher is a declarative query language for property graphs, created for the Neo4j graph database
  • Example 2-4. Cypher query to find people who emigrated from the US to Europe
    MATCH
       (person) -[:BORN_IN]-> () -[:WITHIN*0..]-> (us:Location {name:'United States'}),
       (person) -[:LIVES_IN]-> () -[:WITHIN*0..]-> (eu:Location {name:'Europe'})
    RETURN person.name
    
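  • With a declarative query language, you don’t need to specify execution details when writing the query (for example, whether to start by scanning all people, or to start from the two Location vertices and work backward); the query optimizer chooses the strategy it predicts to be most efficient.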
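
To make the hidden "execution details" a little more concrete, here is a minimal Python sketch that evaluates the same pattern by hand over a tiny in-memory graph. The data and the helper names (`out`, `within_closure`, `located_in`, `emigrated`) are made up for illustration, the `Location` label check is skipped, and a real query optimizer may well pick a different strategy.

```python
# Hypothetical toy graph: edges are (tail, label, head); some vertices have a name.
edges = [
    ("lucy", "BORN_IN", "idaho"),
    ("lucy", "LIVES_IN", "london"),
    ("alain", "BORN_IN", "beaune"),
    ("alain", "LIVES_IN", "london"),
    ("idaho", "WITHIN", "usa"),
    ("london", "WITHIN", "england"),
    ("england", "WITHIN", "europe"),
    ("beaune", "WITHIN", "france"),
    ("france", "WITHIN", "europe"),
]
names = {"lucy": "Lucy", "alain": "Alain", "usa": "United States", "europe": "Europe"}


def out(vertex, label):
    """Heads of all edges with the given label leaving `vertex`."""
    return [h for (t, l, h) in edges if t == vertex and l == label]


def within_closure(vertex):
    """`vertex` plus everything reachable via zero or more WITHIN edges."""
    seen, stack = set(), [vertex]
    while stack:
        v = stack.pop()
        if v not in seen:
            seen.add(v)
            stack.extend(out(v, "WITHIN"))
    return seen


def located_in(person, label, place_name):
    """True if person -[label]-> () -[:WITHIN*0..]-> a vertex named place_name."""
    return any(
        names.get(region) == place_name
        for start in out(person, label)
        for region in within_closure(start)
    )


def emigrated(from_name, to_name):
    """Names of people born in `from_name` who now live in `to_name`."""
    # Strategy chosen by hand here: scan every person that has a BORN_IN edge.
    people = sorted({t for (t, l, _) in edges if l == "BORN_IN"})
    return [
        names[p]
        for p in people
        if located_in(p, "BORN_IN", from_name) and located_in(p, "LIVES_IN", to_name)
    ]


print(emigrated("United States", "Europe"))  # ['Lucy']
```

The sketch hard-codes one strategy (scan all people, then walk outward). The Cypher query above says nothing about that choice, which is the point of a declarative language.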

Triple-Stores and SPARQL

  • In a triple-store, all information is stored in the form of very simple three-part statements: (subject, predicate, object).
  • For example, in the triple (Jim, likes, bananas), Jim is the subject, likes is the predicate (verb), and bananas is the object.
  • The subject of a triple is equivalent to a vertex in a graph. The object is one of two things:
    • A value in a primitive datatype, such as a string or a number. In that case, the predicate and object of the triple are equivalent to the key and value of a property on the subject vertex. For example, (lucy, age, 33) is like a vertex lucy with properties {"age":33}.
    • Another vertex in the graph. In that case, the predicate is an edge in the graph, the subject is the tail vertex, and the object is the head vertex. For example, in (lucy, marriedTo, alain) the subject and object lucy and alain are both vertices, and the predicate marriedTo is the label of the edge that connects them.
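
The two cases can be sketched in a few lines of Python. This is only an illustration of the mapping described above, not how any particular triple-store works; the `Ref` wrapper and the `to_property_graph` function are hypothetical names used here to tell vertex references apart from primitive values.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Ref:
    """Hypothetical marker: the object of a triple is another vertex."""
    id: str


# Toy triples based on the examples above.
triples = [
    ("lucy", "age", 33),                  # primitive object -> property
    ("lucy", "name", "Lucy"),             # primitive object -> property
    ("lucy", "marriedTo", Ref("alain")),  # vertex object    -> edge
    ("alain", "name", "Alain"),
]


def to_property_graph(triples):
    """Split triples into vertex properties and labeled edges."""
    vertices = {}  # vertex id -> {property key: value}
    edges = []     # (tail vertex, label, head vertex)
    for subject, predicate, obj in triples:
        vertices.setdefault(subject, {})
        if isinstance(obj, Ref):
            # Object is a vertex: the predicate becomes an edge label.
            vertices.setdefault(obj.id, {})
            edges.append((subject, predicate, obj.id))
        else:
            # Object is a primitive value: the predicate becomes a property key.
            vertices[subject][predicate] = obj
    return vertices, edges


vertices, edges = to_property_graph(triples)
print(vertices)  # {'lucy': {'age': 33, 'name': 'Lucy'}, 'alain': {'name': 'Alain'}}
print(edges)     # [('lucy', 'marriedTo', 'alain')]
```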

The semantic web

  • The Resource Description Framework (RDF) [41] was intended as a mechanism for different websites to publish data in a consistent format, allowing data from different websites to be automatically combined into a web of data—a kind of internet-wide “database of everything.”
  • Unfortunately, the semantic web was overhyped in the early 2000s but so far hasn’t shown any sign of being realized in practice.

The SPARQL query language

  • SPARQL is a query language for triple-stores using the RDF data model
  • SPARQL predates Cypher, and Cypher’s pattern matching is borrowed from SPARQL, so the two look quite similar.
  • Example 2-9. The same query as Example 2-4, expressed in SPARQL
    PREFIX : <urn:example:>

    SELECT ?personName WHERE {
       ?person :name ?personName.
       ?person :bornIn / :within* / :name "United States".
       ?person :livesIn / :within* / :name "Europe".
    }
  • Because RDF doesn’t distinguish between properties and edges but just uses predicates for both, you can use the same syntax for matching properties.
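
A small Python sketch can show why one syntax is enough: over a set of triples, following the `name` property and following the `bornIn` edge are the same lookup. The triples and the helpers `step`, `star`, `matches`, and `select_person_names` are assumptions made for this illustration; this is not how a SPARQL engine is implemented, and the toy data does not distinguish string literals from vertex identifiers.

```python
# Hypothetical triples using the vocabulary of the example query.
triples = {
    ("lucy", "name", "Lucy"),
    ("lucy", "bornIn", "idaho"),
    ("lucy", "livesIn", "london"),
    ("idaho", "within", "usa"),
    ("usa", "name", "United States"),
    ("london", "within", "england"),
    ("england", "within", "europe"),
    ("europe", "name", "Europe"),
}


def step(nodes, predicate):
    """Follow one predicate from every node; works for properties and edges alike."""
    return {o for (s, p, o) in triples if s in nodes and p == predicate}


def star(nodes, predicate):
    """Follow a predicate zero or more times, like `:within*`."""
    reached, frontier = set(nodes), set(nodes)
    while frontier:
        frontier = step(frontier, predicate) - reached
        reached |= frontier
    return reached


def matches(person, first, place_name):
    """Evaluate `?person :first / :within* / :name place_name`."""
    return place_name in step(star(step({person}, first), "within"), "name")


def select_person_names():
    """Rough equivalent of the SELECT in the example query."""
    subjects = {s for (s, _, _) in triples}
    return sorted(
        name
        for person in subjects
        if matches(person, "bornIn", "United States")
        and matches(person, "livesIn", "Europe")
        for name in step({person}, "name")
    )


print(select_person_names())  # ['Lucy']
```

Note that `step` is called with `"name"` (a property) and with `"bornIn"` or `"within"` (edges) without any special casing.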

Summary

  • Historically, data started out being represented as one big tree (the hierarchical model), but that wasn’t good for representing many-to-many relationships, so the relational model was invented to solve that problem.
  • “NoSQL” datastores have diverged in two main directions:
    • Document databases target use cases where data comes in self-contained documents and relationships between one document and another are rare.
    • Graph databases go in the opposite direction, targeting use cases where anything is potentially related to everything.
  • All three models (document, relational, and graph) are widely used today, and each is good in its respective domain.
  • Document and graph databases typically don’t enforce a schema for the data they store, which can make it easier to adapt applications to changing requirements.