Basics of the Hadoop Distributed File System (HDFS)
HDFS (Hadoop Distributed File System) is the storage layer of Hadoop. A file in any format (for example, CSV) and of any size (say 1 GB, 1 TB, or 1 PB) is divided into fixed-size chunks called blocks (128 MB by default in recent Hadoop versions). The blocks are spread across multiple data nodes, and several copies of each block (three by default) are kept on different nodes, so the data is preserved even if a data node fails. The node that keeps track of which blocks make up each file and where their copies are stored is the name node.
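The splitting and replication described above can be sketched in a few lines of Python. This is only a toy model, not HDFS code: the function names, the data node names, and the round-robin placement are assumptions made for illustration (real HDFS placement is rack-aware), and the 128 MB block size and replication factor of 3 are the usual defaults rather than fixed values.

```python
BLOCK_SIZE = 128 * 1024 * 1024  # assumed block size: 128 MB
REPLICATION = 3                 # assumed number of copies per block


def split_into_blocks(file_size, block_size=BLOCK_SIZE):
    """Return the sizes (in bytes) of the blocks a file is cut into."""
    if file_size <= 0:
        return []
    full, rest = divmod(file_size, block_size)
    # Every block is full-size except possibly the last one.
    return [block_size] * full + ([rest] if rest else [])


def place_replicas(num_blocks, data_nodes, replication=REPLICATION):
    """Toy name-node table: block index -> data nodes holding a copy.

    Placement here is simple round-robin (hypothetical policy).
    """
    if replication > len(data_nodes):
        raise ValueError("need at least as many data nodes as replicas")
    table = {}
    for b in range(num_blocks):
        table[b] = [data_nodes[(b + r) % len(data_nodes)]
                    for r in range(replication)]
    return table


def readable_after_failure(table, failed_node):
    """True if every block still has a copy on some surviving node."""
    return all(any(node != failed_node for node in nodes)
               for nodes in table.values())


if __name__ == "__main__":
    blocks = split_into_blocks(1024 ** 3)            # a 1 GB file
    table = place_replicas(len(blocks), ["dn1", "dn2", "dn3", "dn4"])
    print(len(blocks), "blocks")
    print("block 0 is stored on", table[0])
    print("readable if dn1 fails:", readable_after_failure(table, "dn1"))
```

With three copies per block, losing any single data node leaves at least two copies of every block. With only one copy, the same failure would make some blocks unreadable.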
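Now, what if the name node fails? Hadoop addresses this partly with a secondary name node, which periodically saves a checkpoint of the name node's metadata. Despite its name, it is not a live standby: if the name node is lost, the latest checkpoint helps rebuild its metadata, but changes made after that checkpoint may be missing. For automatic failover, Hadoop 2 and later can also run a standby name node in a high-availability setup.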
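The secondary name node's periodic work can be pictured as merging two things: a snapshot of the file system metadata (HDFS calls it the fsimage) and a log of the changes made since that snapshot (the edit log). The sketch below is a simplified model under that assumption. The function names, the dictionary layout, and the two operation names are hypothetical and do not reflect the real on-disk formats.

```python
def apply_edits(fsimage, edit_log):
    """Return a new namespace snapshot with every logged edit applied in order.

    fsimage:  dict mapping file path -> list of block ids (toy model)
    edit_log: list of (operation, path, blocks) tuples
    """
    namespace = dict(fsimage)  # copy, so the old snapshot is left untouched
    for op, path, blocks in edit_log:
        if op == "create":
            namespace[path] = list(blocks)
        elif op == "delete":
            namespace.pop(path, None)
        else:
            raise ValueError(f"unknown operation: {op}")
    return namespace


def checkpoint(fsimage, edit_log):
    """Merge the edit log into the snapshot; return (new_fsimage, empty_log)."""
    return apply_edits(fsimage, edit_log), []


if __name__ == "__main__":
    fsimage = {"/data/a.csv": [0, 1]}
    edits = [("create", "/data/b.csv", [2]),
             ("delete", "/data/a.csv", None)]
    new_image, new_log = checkpoint(fsimage, edits)
    print(new_image)
    print(new_log)
```

After a checkpoint the edit log starts empty again, which keeps it short and makes a name node restart faster, since fewer edits have to be replayed on top of the snapshot.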
Why use HDFS?
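- Scalable: capacity grows by adding more inexpensive servers to the cluster.
- Flexible: it stores data in any format, structured or unstructured.
- Cost-effective: it runs on inexpensive commodity hardware.
- Resilient to failure: each block is copied to several data nodes, so losing one node does not lose data.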