HDFS (Hadoop Distributed File System) is the storage layer of the Hadoop framework. It is a distributed file system designed to run on commodity hardware and to store very large, often unstructured, datasets. It is highly fault-tolerant: each block of data is replicated across multiple nodes, so if one storage location fails, the same data can easily be fetched from another. HDFS began as part of the Apache Nutch web-crawler project but is today part of the top-level Apache Hadoop project. It is a major constituent of Hadoop alongside Hadoop YARN, Hadoop MapReduce, and Hadoop Common.
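The block replication that underpins this fault tolerance is configurable. As a minimal sketch, the cluster-wide default is set with the `dfs.replication` property in `hdfs-site.xml` (Hadoop's default value is 3; the value shown here simply restates that default):

```xml
<!-- hdfs-site.xml: how many copies HDFS keeps of each block -->
<configuration>
  <property>
    <name>dfs.replication</name>
    <value>3</value>
  </property>
</configuration>
```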

HDFS key features

| Key feature | Description |
| --- | --- |
| Stores bulks of data | Capable of storing terabytes and petabytes of data |
| Minimum intervention | Manages thousands of nodes without operator intervention |
| Computing | Delivers the benefits of distributed and parallel computing at once |
| Scaling out | Scales out (by adding nodes) rather than scaling up, without downtime |
| Rollback | Allows returning to the previous version after an upgrade |
| Data integrity | Detects corrupted data and recovers it from healthy replicas |
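The "Data integrity" behaviour above can be sketched in miniature: HDFS stores a checksum alongside each block and, when a read detects a mismatch, serves the data from another replica instead. The snippet below is a simplified, hypothetical model of that idea, not Hadoop code; the names `Replica` and `read_with_fallback` are illustrative, and SHA-256 stands in for the per-chunk CRC32C checksums HDFS actually uses.

```python
import hashlib
from dataclasses import dataclass


@dataclass
class Replica:
    """One stored copy of a block (hypothetical stand-in for an HDFS replica)."""
    data: bytes


def checksum(data: bytes) -> str:
    # HDFS checksums chunks with CRC32C; SHA-256 is used here for simplicity.
    return hashlib.sha256(data).hexdigest()


def read_with_fallback(replicas, expected_checksum):
    """Return the first replica whose data matches the expected checksum."""
    for replica in replicas:
        if checksum(replica.data) == expected_checksum:
            return replica.data
    raise IOError("all replicas corrupted")


# One block stored three times; the first copy has silently bit-rotted.
good = b"block-0 contents"
expected = checksum(good)
replicas = [Replica(b"block-0 cont3nts"), Replica(good), Replica(good)]

print(read_with_fallback(replicas, expected) == good)  # True
```

The corrupted first replica is skipped because its checksum no longer matches, and the read succeeds from the second copy; in a real cluster the NameNode would also schedule re-replication of the bad block.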

The servers in HDFS are fully connected and communicate over TCP-based protocols. Although it is designed for huge datasets rather than general-purpose use, HDFS presents a familiar hierarchical namespace, so its files can be browsed much like those on a conventional file system (FAT, NTFS).
