Basics of the Hadoop Distributed File System (HDFS)

Jul 27, 2020

HDFS is the storage layer of Hadoop. Data in any format (for example, CSV), irrespective of its size (say 1 GB, 1 TB, or 1 PB), is divided into chunks (called blocks in HDFS). The chunks are stored across several data nodes, with multiple copies of each chunk kept on different nodes. This is done so that the data is preserved even if a single data node fails. The node that keeps track of which chunks make up each file, and where their copies are stored, is the name node.
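To make the idea concrete, here is a minimal sketch in Python of splitting data into chunks and spreading copies over data nodes. It is a toy model, not how HDFS is implemented: the function names, the node names (dn1 to dn4), the tiny chunk size, and the simple round-robin placement are all assumptions made for illustration. Real HDFS typically uses 128 MB blocks, a replication factor of 3, and a rack-aware placement policy.

```python
def split_into_chunks(data: bytes, chunk_size: int) -> list[bytes]:
    """Cut the data into fixed-size chunks; the last one may be shorter."""
    return [data[i:i + chunk_size] for i in range(0, len(data), chunk_size)]


def place_copies(num_chunks: int, data_nodes: list[str], copies: int) -> dict[int, list[str]]:
    """Assign each chunk to `copies` distinct data nodes (round-robin, hypothetical policy).

    The returned mapping plays the role of the name node's bookkeeping:
    chunk id -> data nodes holding a copy of that chunk.
    """
    if copies > len(data_nodes):
        raise ValueError("cannot keep more copies than there are data nodes")
    return {
        chunk_id: [data_nodes[(chunk_id + r) % len(data_nodes)] for r in range(copies)]
        for chunk_id in range(num_chunks)
    }


def readable_after_failure(chunk_map: dict[int, list[str]], failed_node: str) -> bool:
    """True if every chunk still has at least one copy on a surviving node."""
    return all(
        any(node != failed_node for node in nodes)
        for nodes in chunk_map.values()
    )


# Toy numbers: 1000 bytes of data, 256-byte chunks, 4 data nodes, 3 copies each.
data = b"x" * 1000
chunks = split_into_chunks(data, chunk_size=256)
nodes = ["dn1", "dn2", "dn3", "dn4"]
chunk_map = place_copies(len(chunks), nodes, copies=3)

for chunk_id, holders in chunk_map.items():
    print(f"chunk {chunk_id} ({len(chunks[chunk_id])} bytes) -> {holders}")

print("still readable if dn2 fails:", readable_after_failure(chunk_map, "dn2"))
```

Because every chunk lives on three different nodes in this sketch, losing any single node still leaves two copies of each chunk, which is the property described above.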

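Now, what if the name node fails? Its record of where every chunk lives would be at risk. To guard against this, a secondary name node periodically takes checkpoints of the name node's metadata, from which the file system state can be restored. Note that the secondary name node is a checkpointing helper, not a live standby that takes over automatically.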

Why use HDFS?

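Scalable: Large volumes of data are spread across many inexpensive servers, and capacity grows by adding more data nodes.
Flexible: It stores data in any format.
Cost-effective: It runs on inexpensive servers.
Resilient to failure: Each chunk is stored as multiple copies on different data nodes, so a single node failure does not lose data.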

Written by Aki Kapoor