Aki Kapoor
1 min readJul 27, 2020

--

Basics of Hadoop Distributed System

Hadoop is a type of storage system. Data in any format (example: csv) is irrespective of its size (say 1 GB, 1 TB, 1 PB) is divided into various chunks. The data is stored in various data nodes with multiple copies of each chunk throughout. This is done so that the data is preserved even if a single data node fails. The system which has all the tracks and copies is name-node.

Hadoop distributed system

Now, what if the name node collapse? This problem is solved by adding a secondary name node which acts as a backup for name node.

Why to use HDFS?

  1. Scalable :- It is scalable as it has huge data spread across inexpensive servers.
  2. It is flexible.
  3. It is cost effective.
  4. It is resilient to failure.

--

--

Aki Kapoor

Masters in Applied data science, University of Canterbury, New Zealand. Data scientist who loves to play with the data and make sense from it.