Day 2/100

HDFS [Hadoop Distributed File System] - Part 2

Snapshots

  • Snapshots let you save the current state of the file system, so that a rollback during an upgrade is possible.
  • Only one snapshot can exist at a time. The NameNode reads the existing checkpoint and journal and writes a new checkpoint with an empty journal at a different location, so the old checkpoint and journal are never modified.
  • Having chosen to roll back, there is no provision to roll forward.
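The single-snapshot mechanism above can be sketched roughly as follows. This is a simplified illustration, not the actual NameNode code; the file names `fsimage` (checkpoint) and `edits` (journal) follow Hadoop's conventions, but the directory layout here is an assumption.

```python
import shutil
from pathlib import Path

def take_snapshot(current_dir: str, snapshot_dir: str) -> None:
    """Illustrative sketch of HDFS's single-snapshot idea: the existing
    checkpoint is preserved at a new location and paired with an empty
    journal, so subsequent edits never touch the old state."""
    cur = Path(current_dir)
    snap = Path(snapshot_dir)
    snap.mkdir(parents=True, exist_ok=True)
    # Keep a copy of the old checkpoint unchanged at the snapshot location.
    shutil.copy2(cur / "fsimage", snap / "fsimage")
    # Start a fresh, empty journal so new edits accumulate separately.
    (snap / "edits").write_bytes(b"")
```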

File I/O

  • HDFS implements a single-writer, multiple-reader model.
  • An HDFS client that opens a file for writing is granted a lease for the file; no other client can write to the file.
  • When the file is closed, the lease is revoked.
  • The writer's lease does not prevent other clients from reading; a file may have many concurrent readers.
  • When a client creates an HDFS file, it computes the checksum sequence for each block and sends it to the DataNodes along with the data.
  • When HDFS reads a file, each block's data and checksums are shipped to the client, which verifies the data against the checksums.
  • When a client opens a file to read, it fetches the list of blocks and the locations of each block replica from the NameNode.
  • The design of HDFS I/O is particularly optimized for batch processing systems
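The checksum flow above can be sketched in a few lines. HDFS checksums data in fixed-size chunks (CRC32 over 512-byte chunks by default); the function names here are illustrative, not HDFS APIs.

```python
import zlib

CHUNK_SIZE = 512  # HDFS checksums data per fixed-size chunk (512 bytes by default)

def chunk_checksums(data: bytes, chunk_size: int = CHUNK_SIZE) -> list[int]:
    """Compute a CRC32 per chunk of a block, as a writing client does
    before shipping data and checksums to the DataNodes."""
    return [zlib.crc32(data[i:i + chunk_size])
            for i in range(0, len(data), chunk_size)]

def verify(data: bytes, checksums: list[int], chunk_size: int = CHUNK_SIZE) -> bool:
    """Re-compute checksums on read and compare, as a reading client does
    to detect corruption."""
    return chunk_checksums(data, chunk_size) == checksums
```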

Block Placement

  • For a large cluster, a flat topology is not practical; nodes are spread across multiple racks. Nodes on a rack share a switch, and rack switches are connected by one or more core switches.
  • Communication between two nodes in different racks has to go through multiple switches.
  • The NameNode runs a configured script to decide which rack each node belongs to.
  • The first replica is placed on the node where the writer is located.
  • The second and third replicas are placed on two different nodes in a different rack.
  • This policy reduces inter-rack and inter-node write traffic and generally improves write performance.
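The default placement policy for three replicas can be sketched as below. The `topology` mapping (rack name to node list) is an assumed input shape; the real NameNode resolves racks via its topology script.

```python
import random

def place_replicas(writer_node: str, topology: dict[str, list[str]]) -> list[str]:
    """Sketch of the default HDFS policy for 3 replicas: first replica on
    the writer's node, second and third on two different nodes in one
    other rack."""
    writer_rack = next(r for r, nodes in topology.items() if writer_node in nodes)
    # Pick a remote rack that can host two distinct replicas.
    other_racks = [r for r in topology if r != writer_rack and len(topology[r]) >= 2]
    remote_rack = random.choice(other_racks)
    second, third = random.sample(topology[remote_rack], 2)
    return [writer_node, second, third]
```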

Replication

  • The NameNode detects that a block has become under- or over-replicated when a block report from a DataNode arrives.
  • For an over-replicated block, the NameNode chooses a replica to remove.
  • For an under-replicated block, the block is put in a replication priority queue; a block with only one remaining replica has the highest priority. HDFS places the next replica on a different rack.
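The classification and prioritization above can be sketched with a min-heap keyed on the surviving replica count. Function names are illustrative, not NameNode APIs.

```python
import heapq

def classify(replicas: int, target: int = 3) -> str:
    """Classify a block's replication state from a block report."""
    if replicas < target:
        return "under-replicated"
    if replicas > target:
        return "over-replicated"
    return "ok"

def enqueue_under_replicated(queue: list, block_id: str, replicas: int) -> None:
    """Fewer surviving replicas means higher priority: a block with a
    single replica sorts to the head of the queue."""
    heapq.heappush(queue, (replicas, block_id))
```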

Balancer

  • The balancer is a tool that balances disk space usage on an HDFS cluster.
  • It takes a threshold value as an input parameter, a fraction between 0 and 1: a node is considered balanced if its disk utilization differs from the cluster-wide utilization by no more than the threshold.
  • The balancer optimizes the balancing process by minimizing inter-rack data copying.
  • A configuration parameter limits the bandwidth consumed by rebalancing operations. The higher the allowed bandwidth, the faster a cluster can reach the balanced state, but with greater competition with application processes.
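The balancer's stopping condition can be sketched as a simple utilization check. The dict-based inputs are an assumed shape for illustration, not the balancer's actual interface.

```python
def is_balanced(node_used: dict[str, float], node_capacity: dict[str, float],
                threshold: float) -> bool:
    """Sketch of the balancer's goal state: every node's utilization must
    lie within `threshold` of the cluster-wide utilization, both expressed
    as fractions of capacity."""
    cluster_util = sum(node_used.values()) / sum(node_capacity.values())
    return all(abs(node_used[n] / node_capacity[n] - cluster_util) <= threshold
               for n in node_used)
```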

Block Scanner

  • Each DataNode runs a block scanner that periodically scans its block replicas and verifies that stored checksums match the block data.
  • Whenever a read client or a block scanner detects a corrupt block, it notifies the NameNode. The NameNode marks the replica as corrupt, but does not schedule deletion of the replica immediately.
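The scanner loop can be sketched as follows. The `report_corrupt` callback stands in for the notification to the NameNode and is hypothetical; note that nothing here deletes the replica, matching the behavior above.

```python
import zlib

def scan_replicas(replicas: dict[str, tuple[bytes, int]], report_corrupt) -> None:
    """Sketch of a DataNode block scanner: re-verify each stored replica
    against its stored CRC32 and notify the NameNode on mismatch (via the
    hypothetical `report_corrupt` callback), without deleting the replica."""
    for block_id, (data, stored_crc) in replicas.items():
        if zlib.crc32(data) != stored_crc:
            report_corrupt(block_id)
```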

Random

  • DistCp is a tool for large inter- and intra-cluster parallel copying.
  • A key observation from production clusters is that about 0.8 percent of nodes fail each month.