Snapshots let you save the current state of the filesystem, so that a rollback during an upgrade is possible.
Only one snapshot can exist at a time. Creating it reads the existing checkpoint and writes a new checkpoint with an empty journal at a different location, so the old checkpoint and journal are never modified.
Once the system rolls back, there is no provision to roll forward.
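The snapshot idea above can be sketched as a copy-then-fresh-journal operation. The directory layout and file names (`fsimage`, `edits`) are illustrative, not the actual NameNode storage format:

```python
import pathlib
import shutil
import tempfile

def take_snapshot(current_dir, snapshot_dir):
    """Copy the existing checkpoint to a new location and start an
    empty journal there, so the old checkpoint and journal stay
    untouched. Only one snapshot may exist at a time."""
    src = pathlib.Path(current_dir)
    dst = pathlib.Path(snapshot_dir)
    if dst.exists():
        raise RuntimeError("only one snapshot may exist")
    dst.mkdir(parents=True)
    shutil.copy(src / "fsimage", dst / "fsimage")  # existing checkpoint
    (dst / "edits").write_bytes(b"")               # new, empty journal

# Demo on a temporary directory standing in for NameNode storage.
root = pathlib.Path(tempfile.mkdtemp())
cur = root / "current"
cur.mkdir()
(cur / "fsimage").write_bytes(b"namespace-checkpoint")
take_snapshot(cur, root / "snapshot")
```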
File I/O
HDFS implements a single-writer, multiple-reader model.
An HDFS client that opens a file for writing is granted a lease for the file; no other client can write to it.
When the file is closed, the lease is revoked.
The writer's lease does not prevent other clients from reading; a file may have many concurrent readers.
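The single-writer lease model can be sketched as a small lease table; the class, method names, and lease duration below are illustrative, not the actual NameNode implementation:

```python
import time

class LeaseManager:
    """Toy sketch of HDFS's single-writer lease model."""

    LEASE_DURATION = 60.0  # seconds; assumed value for illustration

    def __init__(self):
        self.leases = {}  # path -> (client_id, expiry_time)

    def open_for_write(self, path, client_id, now=None):
        now = time.monotonic() if now is None else now
        holder = self.leases.get(path)
        # Reject a second concurrent writer while the lease is live.
        if holder and holder[0] != client_id and holder[1] > now:
            raise PermissionError(f"{path} is leased to {holder[0]}")
        self.leases[path] = (client_id, now + self.LEASE_DURATION)

    def close(self, path, client_id):
        # Closing the file revokes the writer's lease.
        if self.leases.get(path, (None,))[0] == client_id:
            del self.leases[path]

mgr = LeaseManager()
mgr.open_for_write("/logs/a.txt", "client-1")
try:
    mgr.open_for_write("/logs/a.txt", "client-2")  # second writer rejected
except PermissionError as e:
    print("rejected:", e)
mgr.close("/logs/a.txt", "client-1")
mgr.open_for_write("/logs/a.txt", "client-2")      # succeeds after close
```

Note that reads never consult the lease table, matching the many-concurrent-readers behavior above.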
When a client creates an HDFS file, it computes a checksum sequence for each block and sends it to the DataNodes along with the data.
When HDFS reads a file, each block's data and checksums are shipped to the client.
When a client opens a file to read, it fetches the list of blocks and the locations of each block replica from the NameNode.
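The per-block checksum sequence can be sketched as one CRC32 per fixed-size chunk; the 512-byte chunk size matches HDFS's default `io.bytes.per.checksum`, and the function names are illustrative:

```python
import zlib

BYTES_PER_CHECKSUM = 512  # HDFS default chunk size for checksumming

def checksum_sequence(block: bytes):
    """Compute one CRC32 per chunk, as the writing client does
    before shipping data to DataNodes."""
    return [zlib.crc32(block[i:i + BYTES_PER_CHECKSUM])
            for i in range(0, len(block), BYTES_PER_CHECKSUM)]

def verify(block: bytes, checksums):
    """Reader-side check: recompute the sequence and compare."""
    return checksum_sequence(block) == checksums

data = b"x" * 1300               # spans three 512-byte chunks
sums = checksum_sequence(data)
assert verify(data, sums)
assert not verify(data[:-1] + b"y", sums)  # one flipped byte is caught
```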
The design of HDFS I/O is particularly optimized for batch processing systems, like MapReduce, which require high throughput for sequential reads and writes.
Block Placement
For a large cluster, a flat topology is not practical; nodes are spread across multiple racks. Nodes on a rack share a switch, and rack switches are connected by one or more core switches.
Communication between two nodes in different racks has to go through multiple switches.
The NameNode runs a configured script to decide which rack a node belongss to.
The first replica is placed on the node where the writer is located;
the second and third replicas are placed on two different nodes in a different rack.
The policy reduces the inter-rack and inter-node write traffic and generally improves write performance
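The default placement policy above can be sketched as follows; `topology` (a rack-to-nodes map) and the function name are illustrative, and the real NameNode applies additional constraints this sketch omits:

```python
import random

def place_replicas(writer_node, topology, rng=random.Random(0)):
    """Sketch of the default policy: first replica on the writer's
    node, second and third on two different nodes of one other rack.
    `topology` maps rack name -> list of node names."""
    writer_rack = next(r for r, nodes in topology.items()
                       if writer_node in nodes)
    # Candidate racks: any other rack with at least two nodes.
    other_racks = [r for r in topology
                   if r != writer_rack and len(topology[r]) >= 2]
    remote_rack = rng.choice(other_racks)
    second, third = rng.sample(topology[remote_rack], 2)
    return [writer_node, second, third]

topo = {"rack1": ["n1", "n2"], "rack2": ["n3", "n4"], "rack3": ["n5", "n6"]}
print(place_replicas("n1", topo))
```

The three replicas end up on exactly two racks, which is what cuts the inter-rack write traffic.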
Replication
The NameNode detects that a block has become under- or over-replicated when a block report from a DataNode arrives.
For over-replicated blocks, the NameNode chooses a replica to remove.
For under-replicated blocks, the block is put in the replication priority queue; HDFS places the next replica on a different rack.
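Processing a block report can be sketched as below. The priority rule (fewer surviving replicas means higher priority) follows the text; the removal choice here is a placeholder, since the real NameNode picks a replica so as not to reduce the number of racks holding the block:

```python
import heapq

REPLICATION_FACTOR = 3  # assumed target replication

def scan_block_report(block_replicas):
    """Classify blocks from a block report. Under-replicated blocks
    go into a min-heap keyed by remaining replica count, so the most
    endangered block is popped first. Over-replicated blocks yield a
    replica chosen for removal (simplified: the last one reported)."""
    queue, removals = [], {}
    for block, replicas in block_replicas.items():
        n = len(replicas)
        if n < REPLICATION_FACTOR:
            heapq.heappush(queue, (n, block))
        elif n > REPLICATION_FACTOR:
            removals[block] = replicas[-1]
    return queue, removals

report = {
    "blk_1": ["dn1"],                       # one replica left: urgent
    "blk_2": ["dn1", "dn2"],                # mildly under-replicated
    "blk_3": ["dn1", "dn2", "dn3", "dn4"],  # over-replicated
}
queue, removals = scan_block_report(report)
print(heapq.heappop(queue))  # (1, 'blk_1') comes out first
```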
Balancer
The balancer is a tool that balances disk space usage on an HDFS cluster.
It takes a threshold value as an input parameter, a fraction between 0 and 1; a node is considered balanced when its disk utilization differs from the cluster-wide utilization by at most this threshold.
The balancer optimizes the balancing process by minimizing inter-rack data copying.
A configuration parameter limits the bandwidth consumed by rebalancing operations. The higher the allowed bandwidth, the faster a cluster can reach the balanced state, but at the cost of greater competition with application traffic.
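The threshold test can be written out directly; the function name and the example numbers are illustrative:

```python
def is_balanced(node_used, node_capacity,
                cluster_used, cluster_capacity, threshold):
    """A node is balanced when its utilization is within `threshold`
    (a fraction between 0 and 1) of the cluster-wide utilization."""
    node_util = node_used / node_capacity
    cluster_util = cluster_used / cluster_capacity
    return abs(node_util - cluster_util) <= threshold

# Cluster 60% full overall, threshold 0.10:
print(is_balanced(72, 100, 600, 1000, 0.10))  # False: 72% is 12 points off
print(is_balanced(65, 100, 600, 1000, 0.10))  # True: 65% is within 10 points
```

The balancer iteratively moves replicas from nodes failing this test with higher utilization to ones with lower utilization.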
Block Scanner
Each DataNode runs a block scanner that periodically scans its block replicas and verifies that stored checksums match the block data.
Whenever a read client or a block scanner detects a corrupt block, it notifies the NameNode. The NameNode marks the replica as corrupt but does not schedule its deletion immediately; it first replicates a good copy of the block, so a readable corrupt replica is removed only once enough good replicas exist.
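The scanner's verify-and-report loop can be sketched as below; the data structures are illustrative, and checksums are reduced to one CRC32 per replica for brevity:

```python
import zlib

def scan_replicas(replicas, stored_checksums, report):
    """Block-scanner sketch: verify each stored replica against its
    stored checksum and notify the NameNode (via `report`) on a
    mismatch. The replica is only reported, never deleted here."""
    for block_id, data in replicas.items():
        if zlib.crc32(data) != stored_checksums[block_id]:
            report(block_id)  # NameNode marks it corrupt; deletion deferred

# Demo: blk_2's on-disk bytes no longer match the stored checksum.
corrupt = []
stored = {"blk_1": zlib.crc32(b"good"), "blk_2": zlib.crc32(b"good")}
scan_replicas({"blk_1": b"good", "blk_2": b"rot!"}, stored, corrupt.append)
print(corrupt)  # ['blk_2']
```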
Random
DistCp: a tool for large inter-/intra-cluster parallel copying, implemented as a MapReduce job.
A key observation: about 0.8 percent of nodes fail each month.
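To see what that rate means in practice, here is the arithmetic for an assumed 3,500-node cluster (the cluster size is an example, not from the text):

```python
FAILURE_RATE = 0.008   # ~0.8% of nodes fail per month (from the text)
CLUSTER_SIZE = 3500    # assumed example cluster size

expected_failures = FAILURE_RATE * CLUSTER_SIZE
print(expected_failures)  # 28.0 -> roughly one node failure per day
```

This is why re-replication of lost blocks has to be a routine background activity rather than an exceptional event.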