DEV Community

shubham mishra

Posted on • Originally published at developerindian.com

Types of data in Hadoop

Hadoop was first developed by Doug Cutting and Mike Cafarella in 2005 and is licensed under the Apache License 2.0.
Apache Hadoop is an open-source software framework for storage and large-scale processing of data sets on clusters of commodity hardware. The Hadoop Distributed File System (HDFS) is Hadoop's storage layer. Each file is divided into fixed-size blocks (128 MB by default in recent versions), and these blocks are distributed and replicated across the worker machines (DataNodes) of the cluster.
Cloudera Hadoop, Cloudera's open-source platform, is one of the most widely used distributions of Hadoop and related projects.
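
To make the HDFS block idea above concrete, here is a minimal Python sketch of how a file could be cut into blocks and how each block could be copied to several machines. It is a simulation, not the HDFS API: the function names `split_into_blocks` and `place_blocks` and the node names `dn1`…`dn4` are made up for illustration, and the round-robin placement is a simplifying assumption (real HDFS placement is rack-aware).

```python
BLOCK_SIZE = 128 * 1024 * 1024  # 128 MB, the default block size in recent Hadoop versions
REPLICATION = 3                 # default HDFS replication factor


def split_into_blocks(file_size, block_size=BLOCK_SIZE):
    """Return the sizes (in bytes) of the blocks a file would occupy."""
    full_blocks, remainder = divmod(file_size, block_size)
    # Every block is full-size except possibly the last one.
    return [block_size] * full_blocks + ([remainder] if remainder else [])


def place_blocks(num_blocks, nodes, replication=REPLICATION):
    """Assign each block to `replication` distinct nodes.

    Hypothetical round-robin placement, used here only for illustration.
    """
    if replication > len(nodes):
        raise ValueError("not enough nodes for the requested replication")
    return {
        block_id: [nodes[(block_id + i) % len(nodes)] for i in range(replication)]
        for block_id in range(num_blocks)
    }


MB = 1024 * 1024
blocks = split_into_blocks(300 * MB)          # a 300 MB file
placement = place_blocks(len(blocks), ["dn1", "dn2", "dn3", "dn4"])
for block_id, holders in placement.items():
    print(f"block {block_id}: {blocks[block_id] // MB} MB on {holders}")
```

A 300 MB file ends up as two full 128 MB blocks plus one 44 MB block, and each block lives on three different nodes.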

**What is big data?**
Big data is a collection of data that is huge in volume and keeps growing exponentially with time. Its size and complexity are such that traditional data management tools cannot store or process it efficiently. To work with it we can use Apache Hadoop and Cloudera Hadoop.

Types of data used in big data (Apache Hadoop / Spark)

**Structured** – data that can be stored in rows and columns, like relational data sets.

**Semi-structured** – data such as XML that can be read by both machines and humans.

**Unstructured** – data that cannot be stored in rows and columns, like video, images, etc.
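
The three categories can be illustrated with a small Python sketch that uses only the standard library. The sample records (names, cities) are invented for the example, and the parsing shown is ordinary Python rather than anything specific to Hadoop or Spark.

```python
import csv
import io
import xml.etree.ElementTree as ET

# Structured: fixed columns, every row has the same fields.
csv_text = "id,name,city\n1,Asha,Pune\n2,Ravi,Delhi\n"
rows = list(csv.DictReader(io.StringIO(csv_text)))

# Semi-structured: self-describing tags, and records may carry different fields.
xml_text = (
    "<users>"
    "<user id='1'><name>Asha</name></user>"
    "<user id='2'><name>Ravi</name><city>Delhi</city></user>"
    "</users>"
)
root = ET.fromstring(xml_text)
users = [
    {"id": user.get("id"), **{child.tag: child.text for child in user}}
    for user in root
]

# Unstructured: raw bytes with no row/column schema (for example an image).
image_bytes = bytes([0x89, 0x50, 0x4E, 0x47])  # first four bytes of a PNG file

print(rows)
print(users)
print(len(image_bytes), "bytes of unstructured data")
```

Note how the second XML record has a `city` field that the first one lacks. That flexibility is what separates semi-structured data from a fixed table.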

About Apache Hadoop

Hadoop is one of the most important frameworks for working with big data. Its biggest strength is scalability: a cluster can grow from a single node to thousands of nodes.
Advantages of Hadoop
Apache Hadoop stores data in a distributed fashion, which allows the data to be processed in parallel on a cluster of nodes.

In short, Apache Hadoop is an open-source framework best known for its fault tolerance and high availability.
Apache Hadoop clusters are scalable.
The Apache Hadoop framework is easy to use.
In HDFS, fault tolerance refers to the robustness of the system in the event of failure. HDFS is highly fault-tolerant: if a machine fails, another machine holding a copy of the same data serves it instead.
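
The failover behaviour described above can be sketched as a tiny simulation. This is not the HDFS client API: `read_block`, the `placement` map, and the node names are hypothetical, and the sketch only shows the idea that a read succeeds as long as one replica is on a live machine.

```python
def read_block(block_id, placement, live_nodes):
    """Return the first live node that holds a replica of the block."""
    for node in placement[block_id]:
        if node in live_nodes:
            return node
    raise RuntimeError(f"all replicas of block {block_id} are unavailable")


# Each block is stored on three nodes (assumed replication factor of 3).
placement = {0: ["dn1", "dn2", "dn3"], 1: ["dn2", "dn3", "dn4"]}
live = {"dn1", "dn2", "dn3", "dn4"}

before = read_block(0, placement, live)
live.discard("dn1")  # simulate a machine failure
after = read_block(0, placement, live)

print(f"block 0 served by {before} before the failure and {after} after it")
```

Only when every node holding a replica is down does the read fail, which is why keeping several copies of each block makes the loss of a single machine harmless.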
