HDFS (Hadoop Distributed File System) is the storage layer of the Hadoop framework. It is a distributed file system designed to run on commodity hardware and to store large volumes of unstructured data. It is highly fault-tolerant: each block of data is replicated across multiple nodes, so if one storage location fails, the same data can be fetched from another. HDFS began under the Apache Nutch project but is today part of the top-level Apache Hadoop project, where it is a major constituent alongside Hadoop YARN, Hadoop MapReduce, and Hadoop Common.
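The block-splitting and replication described above can be sketched in a few lines of Python. This is an illustrative simplification, not the actual HDFS implementation: the tiny block size, the node names, and the round-robin placement are assumptions chosen for demonstration (real HDFS defaults to 128 MB blocks, a replication factor of 3, and rack-aware placement).

```python
# Sketch: how a file might be split into fixed-size blocks and each block
# replicated across datanodes, so that one node's failure loses no data.
# Values are scaled down for illustration (HDFS defaults: 128 MB blocks,
# replication factor 3).

BLOCK_SIZE = 4                                    # bytes per block (toy value)
REPLICATION = 3                                   # copies of each block
DATANODES = ["dn1", "dn2", "dn3", "dn4", "dn5"]   # hypothetical node names

def split_into_blocks(data: bytes, block_size: int = BLOCK_SIZE):
    """Split a byte string into fixed-size blocks, as HDFS does with files."""
    return [data[i:i + block_size] for i in range(0, len(data), block_size)]

def place_replicas(num_blocks: int, nodes=DATANODES, replication: int = REPLICATION):
    """Assign each block to `replication` distinct datanodes (round-robin)."""
    return {
        b: [nodes[(b + r) % len(nodes)] for r in range(replication)]
        for b in range(num_blocks)
    }

blocks = split_into_blocks(b"hello hdfs world")
placement = place_replicas(len(blocks))

# Even if one datanode fails, every block still has surviving replicas.
failed = "dn1"
for b, replicas in placement.items():
    survivors = [n for n in replicas if n != failed]
    assert survivors, f"block {b} lost all replicas"
```

Running the sketch shows that after a single node failure, every block is still reachable from at least one surviving replica, which is the core of HDFS's fault tolerance.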
| HDFS key feature | Description |
| --- | --- |
| Bulk data storage | Capable of storing terabytes and petabytes of data |
| Minimal intervention | Manages thousands of nodes with little operator intervention |
| Computing | Delivers the benefits of distributed and parallel computing at once |
| Scaling out | Scales out (adding nodes) rather than scaling up, without downtime |
| Rollback | Allows returning to the previous version after an upgrade |
| Data integrity | Detects corrupted data and restores it from healthy replicas |
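The data-integrity row above can be illustrated with a short sketch. This is an assumed simplification, not the HDFS source: `zlib.crc32` stands in for the checksums HDFS stores alongside each block, and the replica list stands in for copies held on different datanodes.

```python
# Sketch: HDFS keeps a checksum with each block; on read, the stored data is
# verified against it. A mismatch marks that replica corrupt, and the client
# transparently falls back to another replica.
import zlib

def store_block(data: bytes):
    """Store a block together with its checksum, as a (data, crc) pair."""
    return (data, zlib.crc32(data))

def read_block(replicas):
    """Return the first replica whose checksum verifies; skip corrupt copies."""
    for data, crc in replicas:
        if zlib.crc32(data) == crc:
            return data
    raise IOError("all replicas corrupt")

good = store_block(b"block-0 payload")
corrupt = (b"block-0 paXload", good[1])   # bit-flipped copy with stale checksum

# Corruption in one replica is detected; the healthy replica is served instead.
assert read_block([corrupt, good]) == b"block-0 payload"
```

The design point is that integrity checking and replication work together: checksums detect corruption, and replicas make recovery possible without operator intervention.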
The servers in an HDFS cluster are fully connected and communicate through TCP-based protocols. Although HDFS is designed for massive data sets, it sits on top of ordinary local file systems: each datanode ultimately stores its blocks as regular files on the host's native file system.