HDFS is the storage layer of the Hadoop framework: a distributed file system designed to run on commodity hardware and to store very large, often unstructured, datasets. It is highly fault-tolerant: each block of data is replicated across multiple machines, so if one storage location fails the same data can be served from another. HDFS began under the Apache Nutch project and is today part of the top-level Apache Hadoop project, a major constituent of Hadoop alongside Hadoop YARN, Hadoop MapReduce and Hadoop Common.
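The fault-tolerance idea above can be sketched in a few lines. This is a toy simulation, not Hadoop code: the `ToyCluster` class, node names, and replication factor are all illustrative, chosen only to show how keeping several replicas lets a read survive a node failure.

```python
REPLICATION_FACTOR = 3  # HDFS's common default is also 3 copies per block

class ToyCluster:
    """Toy model of block replication: each block is written to several
    'nodes'; if one node fails, a surviving replica still serves the read."""

    def __init__(self, node_ids):
        self.nodes = {n: {} for n in node_ids}  # node -> {block_id: data}
        self.failed = set()

    def write_block(self, block_id, data):
        # Place replicas on the first REPLICATION_FACTOR live nodes.
        live = [n for n in self.nodes if n not in self.failed]
        for n in live[:REPLICATION_FACTOR]:
            self.nodes[n][block_id] = data

    def fail_node(self, node_id):
        self.failed.add(node_id)

    def read_block(self, block_id):
        # Fetch the block from any live node that holds a replica.
        for n, blocks in self.nodes.items():
            if n not in self.failed and block_id in blocks:
                return blocks[block_id]
        raise IOError(f"no live replica of {block_id}")

cluster = ToyCluster(["dn1", "dn2", "dn3", "dn4"])
cluster.write_block("blk_1", b"hello")
cluster.fail_node("dn1")          # one replica is lost...
print(cluster.read_block("blk_1"))  # ...but the block is still readable
```

In real HDFS the NameNode tracks which DataNodes hold each block and re-replicates blocks that drop below the configured replication factor.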

HDFS key features

- Stores bulk data: capable of storing terabytes and petabytes of data.
- Minimal intervention: manages thousands of nodes without operator intervention.
- Computing: delivers the benefits of distributed and parallel computing at once.
- Scaling out: grows by scaling out (adding nodes) rather than scaling up, without downtime.
- Rollback: allows returning to the previous version after an upgrade.
- Data integrity: detects corrupted data and recovers it from healthy replicas.
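The data-integrity point deserves a concrete sketch. HDFS checksums data in fixed-size chunks (512 bytes by default) and verifies the checksums on read, falling back to another replica on a mismatch. The snippet below is a minimal illustration of that idea using Python's `zlib.crc32`, not Hadoop's actual implementation (which uses CRC-32C and stores checksums in separate metadata files):

```python
import zlib

CHUNK = 512  # HDFS checksums data in fixed-size chunks; 512 bytes is the default

def make_checksums(data: bytes):
    # One CRC per chunk, kept alongside the data (HDFS stores these
    # in separate checksum metadata).
    return [zlib.crc32(data[i:i + CHUNK]) for i in range(0, len(data), CHUNK)]

def verify(data: bytes, checksums):
    # On read, recompute and compare; a mismatch means this replica is
    # corrupt and the client should read from another replica instead.
    return make_checksums(data) == checksums

payload = b"x" * 1500
sums = make_checksums(payload)
print(verify(payload, sums))                            # intact copy passes
corrupted = payload[:700] + b"y" + payload[701:]        # flip one byte
print(verify(corrupted, sums))                          # corruption is caught
```

Because checksums are per-chunk, a single flipped byte only invalidates the chunk containing it, which keeps verification cheap even for very large files.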

The servers in HDFS are fully connected and communicate over TCP-based protocols. Although HDFS is designed for huge datasets, it does not replace ordinary file systems: each node stores its blocks on the host's native local file system.
