Basics of Hadoop Distributed System

1 min readJul 27, 2020

Hadoop is a type of storage system. Data in any format (example: csv) is irrespective of its size (say 1 GB, 1 TB, 1 PB) is divided into various chunks. The data is stored in various data nodes with multiple copies of each chunk throughout. This is done so that the data is preserved even if a single data node fails. The system which has all the tracks and copies is name-node.

Now, what if the name node collapse? This problem is solved by adding a secondary name node which acts as a backup for name node.

Why to use HDFS?

Scalable :- It is scalable as it has huge data spread across inexpensive servers.
It is flexible.
It is cost effective.
It is resilient to failure.

Sign up to discover human stories that deepen your understanding of the world.

Free

Membership

Written by Aki Kapoor

No responses yet