Basics of the Hadoop Distributed File System (HDFS)

HDFS is the storage layer of Hadoop. Data in any format (for example, CSV), irrespective of its size (say 1 GB, 1 TB, or 1 PB), is divided into chunks called blocks. Each block is stored on several data nodes, so multiple copies of it exist across the cluster (three by default). This is done so that the data is preserved even if a single data node fails. The node that keeps track of which blocks make up each file, and where the copies of each block live, is the name node.
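To make the idea of chunks and copies concrete, here is a minimal toy sketch in Python. It is not how Hadoop itself is implemented: the 128 MB block size and three copies per block are assumed settings, the node names `dn1` to `dn4` are made up, and the round-robin placement is a simplification chosen only for illustration.

```python
BLOCK_SIZE = 128 * 1024 * 1024  # assumed block size (128 MB)
REPLICATION = 3                 # assumed number of copies per block


def split_into_blocks(file_size, block_size=BLOCK_SIZE):
    """Return the size in bytes of each block of a file."""
    full, rest = divmod(file_size, block_size)
    return [block_size] * full + ([rest] if rest else [])


def place_replicas(num_blocks, data_nodes, replication=REPLICATION):
    """Toy name node: map each block id to the data nodes holding a copy.

    Round-robin placement is a simplification made for this sketch.
    """
    if replication > len(data_nodes):
        raise ValueError("need at least as many data nodes as replicas")
    block_map = {}
    for block_id in range(num_blocks):
        block_map[block_id] = [
            data_nodes[(block_id + i) % len(data_nodes)]
            for i in range(replication)
        ]
    return block_map


def surviving_blocks(block_map, failed_node):
    """Blocks that still have at least one copy after one node fails."""
    return {
        block_id
        for block_id, nodes in block_map.items()
        if any(node != failed_node for node in nodes)
    }


blocks = split_into_blocks(1024 ** 3)  # a 1 GB file
block_map = place_replicas(len(blocks), ["dn1", "dn2", "dn3", "dn4"])
survivors = surviving_blocks(block_map, "dn2")

print(len(blocks))                     # number of blocks in the file
print(block_map[0])                    # nodes holding copies of block 0
print(len(survivors) == len(blocks))   # is every block still readable?
```

The dictionary returned by `place_replicas` plays the role of the name node's bookkeeping: it holds no file data, only the mapping from blocks to the data nodes that store them.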

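Now, what if the name node fails? This is where the secondary name node helps, although despite its name it is not a live standby. It periodically merges the name node's log of recent changes into a checkpoint of the file system metadata, and that checkpoint can be used to help recover the name node. For automatic failover, Hadoop 2 and later also support running a standby name node in a high-availability setup.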
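The backup work of the secondary name node is usually described as checkpointing: it takes the last saved image of the file system metadata, applies the log of changes made since then, and saves the result as a new image. Below is a rough sketch of that merge step. The set-of-paths image, the `(operation, path)` log entries, and the function names are all assumptions made for illustration, not Hadoop's actual on-disk formats or API.

```python
def apply_edit(namespace, edit):
    """Apply one logged change to the set of known file paths."""
    operation, path = edit
    if operation == "create":
        namespace.add(path)
    elif operation == "delete":
        namespace.discard(path)
    else:
        raise ValueError(f"unknown operation: {operation}")


def checkpoint(image, edit_log):
    """Merge the edit log into a copy of the image.

    Returns the new image and an empty log, since every logged
    change is now reflected in the image itself.
    """
    new_image = set(image)
    for edit in edit_log:
        apply_edit(new_image, edit)
    return new_image, []


image = {"/data/a.csv"}
edit_log = [("create", "/data/b.csv"), ("delete", "/data/a.csv")]

image, edit_log = checkpoint(image, edit_log)
print(sorted(image))  # paths known after the checkpoint
print(edit_log)       # the log starts empty again
```

Keeping the log short this way means that a restarted name node has less to replay before it is ready to serve requests.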

Why use HDFS?

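  1. Scalable :- Huge data sets are spread across many inexpensive servers, and capacity grows by adding more of them.
  2. Flexible :- It stores data in any format, whatever its size.
  3. Cost effective :- It runs on inexpensive servers rather than specialised storage hardware.
  4. Resilient to failure :- Every chunk has multiple copies, so the data survives the failure of a single data node.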
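The trade-off behind these points can be shown with some quick arithmetic: keeping several copies costs extra raw disk space, which is why inexpensive servers matter. The sketch below assumes a 128 MB block size and three copies per block; both are settings that a real cluster may configure differently, and the function name is made up for this example.

```python
BLOCK_SIZE = 128 * 1024 ** 2  # assumed block size in bytes (128 MB)
REPLICATION = 3               # assumed number of copies per block


def storage_plan(file_size, block_size=BLOCK_SIZE, replication=REPLICATION):
    """Return (number of blocks, raw bytes needed across the cluster)."""
    num_blocks = -(-file_size // block_size)  # ceiling division
    raw_bytes = file_size * replication       # every byte is stored `replication` times
    return num_blocks, raw_bytes


GB = 1024 ** 3
for label, size in [("1 GB", GB), ("1 TB", 1024 * GB), ("1 PB", 1024 ** 2 * GB)]:
    num_blocks, raw_bytes = storage_plan(size)
    print(f"{label}: {num_blocks} blocks, {raw_bytes // GB} GB of raw storage")
```

Under these assumptions a 1 GB file becomes 8 blocks and takes 3 GB of raw storage, so resilience is bought with disk space rather than with more expensive hardware.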

Masters in Applied data science, University of Canterbury, New Zealand. Data scientist who loves to play with the data and make sense from it.

Aki Kapoor