Imagine someone with a brain that can store and process zettabytes of data. No normal human has such a powerful brain. But even though you don't possess that kind of massive processing and storage capacity, you can use a tool that does the processing and storing on your behalf.
Yes, you guessed it right - HADOOP is what I am talking about!
What is Hadoop?
It's a tool that can store and process huge amounts of data. It processes data in parallel and keeps multiple copies of the data on different machines, so that if one machine fails, you don't lose your data.
Hadoop Architecture
There are four main components of Hadoop:
- MapReduce
- Hadoop Distributed File System/HDFS
- Yet Another Resource Negotiator/YARN
- Utilities/Java Libraries
MapReduce
This component works in two phases, Map and Reduce. First, the input data set goes to the map function, which turns it into tuples (key-value pairs). These key-value pairs are then shuffled and sorted and passed to the reduce function, which aggregates them, and the final output is written to a file with the help of a record writer. A small example is shown below.
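To make the two phases concrete, here is a minimal word-count sketch using Hadoop's Java MapReduce API. The class names and the input/output paths passed on the command line are illustrative, not something from a real cluster: the mapper emits a (word, 1) pair for every word, and the reducer sums the counts for each key after the shuffle and sort.

```java
import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class WordCount {

  // Map phase: emit (word, 1) for every word in the input split.
  public static class TokenizerMapper extends Mapper<Object, Text, Text, IntWritable> {
    private static final IntWritable ONE = new IntWritable(1);
    private final Text word = new Text();

    @Override
    public void map(Object key, Text value, Context context)
        throws IOException, InterruptedException {
      StringTokenizer itr = new StringTokenizer(value.toString());
      while (itr.hasMoreTokens()) {
        word.set(itr.nextToken());
        context.write(word, ONE);
      }
    }
  }

  // Reduce phase: after the shuffle and sort, sum the counts for each word.
  public static class IntSumReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
    private final IntWritable result = new IntWritable();

    @Override
    public void reduce(Text key, Iterable<IntWritable> values, Context context)
        throws IOException, InterruptedException {
      int sum = 0;
      for (IntWritable val : values) {
        sum += val.get();
      }
      result.set(sum);
      context.write(key, result);
    }
  }

  public static void main(String[] args) throws Exception {
    Job job = Job.getInstance(new Configuration(), "word count");
    job.setJarByClass(WordCount.class);
    job.setMapperClass(TokenizerMapper.class);
    job.setCombinerClass(IntSumReducer.class);
    job.setReducerClass(IntSumReducer.class);
    job.setOutputKeyClass(Text.class);
    job.setOutputValueClass(IntWritable.class);
    FileInputFormat.addInputPath(job, new Path(args[0]));    // input directory in HDFS
    FileOutputFormat.setOutputPath(job, new Path(args[1]));  // output directory in HDFS
    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}
```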
HDFS
HDFS is the storage layer. It follows a master-slave architecture, with the NameNode as the master and DataNodes as slaves. The NameNode tells the DataNodes where and how to store data; it holds only the metadata, whereas the DataNodes store the actual data blocks. A small sketch of talking to HDFS from Java follows.
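Here is a minimal sketch of how a client writes a file to HDFS and reads it back using Hadoop's Java FileSystem API. The NameNode address (hdfs://localhost:9000) and the file path are assumptions for illustration; the client asks the NameNode for metadata and block locations, while the actual bytes flow to and from the DataNodes.

```java
import java.nio.charset.StandardCharsets;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IOUtils;

public class HdfsExample {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    conf.set("fs.defaultFS", "hdfs://localhost:9000"); // assumed NameNode address

    try (FileSystem fs = FileSystem.get(conf)) {
      Path file = new Path("/user/demo/hello.txt");    // hypothetical path

      // Write: the NameNode records the metadata, the DataNodes store the blocks.
      try (FSDataOutputStream out = fs.create(file, true)) {
        out.write("Hello, HDFS!".getBytes(StandardCharsets.UTF_8));
      }

      // Read the file back and copy its contents to stdout.
      try (FSDataInputStream in = fs.open(file)) {
        IOUtils.copyBytes(in, System.out, 4096, false);
      }
    }
  }
}
```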
YARN
Its main tasks are job scheduling and resource management. YARN decides when and where each job or task runs, and how much memory and CPU is allocated to it. A sketch of requesting resources for a job is shown below.
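As a rough illustration of how a MapReduce job asks YARN for resources, the sketch below sets standard Hadoop configuration properties for container memory and CPU before submitting a job. The specific numbers are made up; sensible values depend on your cluster.

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.mapreduce.Job;

public class YarnResourceExample {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();

    // Memory (in MB) YARN should allocate to each map and reduce task container.
    conf.setInt("mapreduce.map.memory.mb", 2048);
    conf.setInt("mapreduce.reduce.memory.mb", 4096);

    // CPU vcores per task container.
    conf.setInt("mapreduce.map.cpu.vcores", 1);
    conf.setInt("mapreduce.reduce.cpu.vcores", 2);

    // Memory for the ApplicationMaster that negotiates these containers with YARN.
    conf.setInt("yarn.app.mapreduce.am.resource.mb", 1536);

    Job job = Job.getInstance(conf, "yarn resource demo");
    // ... set mapper, reducer and input/output paths as in the word-count sketch ...
    // YARN's ResourceManager then schedules containers that satisfy these requests.
  }
}
```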
Utilities/Java libraries
To run the other Hadoop components (MapReduce, HDFS and YARN) smoothly, we need these Java libraries and utilities. They help make sure that, in case of hardware failure or other unfortunate circumstances, Hadoop doesn't crash.
References
Want to know more about Hadoop? Click on the links below.
HDFS Architecture
YARN Architecture