Basics of Hadoop Distributed System

Hadoop is a type of storage system. Data in any format (example: csv) is irrespective of its size (say 1 GB, 1 TB, 1 PB) is divided into various chunks. The data is stored in various data nodes with multiple copies of each chunk throughout. This is done so that the data is preserved even if a single data node fails. The system which has all the tracks and copies is name-node.

Now, what if the name node collapse? This problem is solved by adding a secondary name node which acts as a backup for name node.

Why to use HDFS?

  1. Scalable :- It is scalable as it has huge data spread across inexpensive servers.
  2. It is flexible.
  3. It is cost effective.
  4. It is resilient to failure.

--

--

--

Masters in Applied data science, University of Canterbury, New Zealand. Data scientist who loves to play with the data and make sense from it.

Love podcasts or audiobooks? Learn on the go with our new app.

Recommended from Medium

Cast Update: New Publisher Categories, New Podcast Types and Self-Serve Invoices

Bokeh Interactive Plots: Part 3

Effort pie: how to have your pie and eat it too!

Top 7 Replies by Programmers when their programs don’t work:

You should know how to Vim

Static type checking in the programmable programming language (Lisp)

Sand Dunes & Digital Mires cont.

IT Department for non-IT Companies

Get the Medium app

A button that says 'Download on the App Store', and if clicked it will lead you to the iOS App store
A button that says 'Get it on, Google Play', and if clicked it will lead you to the Google Play store
Aki Kapoor

Aki Kapoor

Masters in Applied data science, University of Canterbury, New Zealand. Data scientist who loves to play with the data and make sense from it.

More from Medium

Elasticsearch: Guide to Sharding an Index

The Benefits of AI-Assisted Software Development

Deploy IBM Sterling Order Manager Software on OpenShift — Part 2

7 Reasons to Publish a Unique Database Performance Ranking