shubham mishra

Posted on • Originally published at developerindian.com

Types of Data in Hadoop

First developed by Doug Cutting and Mike Cafarella in 2005, and licensed under the Apache License 2.0, Apache Hadoop is an open-source software framework for the storage and large-scale processing of data sets on clusters of commodity hardware. The Hadoop Distributed File System (HDFS) is Hadoop's storage layer. Data is divided into blocks based on file size, and these blocks are then distributed and stored across slave machines housed on multiple servers.
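The block-splitting idea can be sketched in a few lines of Python. This is a simplified illustration, not real HDFS code: the tiny block size here is an assumption for demonstration, while HDFS defaults to 128 MB blocks.

```python
# Toy illustration of HDFS-style block splitting (hypothetical 8-byte
# block size for readability; real HDFS defaults to 128 MB per block).
BLOCK_SIZE = 8

def split_into_blocks(data: bytes, block_size: int = BLOCK_SIZE):
    """Divide a byte stream into fixed-size blocks, as HDFS does per file."""
    return [data[i:i + block_size] for i in range(0, len(data), block_size)]

blocks = split_into_blocks(b"hello hadoop distributed file system")
print(len(blocks))   # 36 bytes / 8-byte blocks -> 5 blocks
print(blocks[0])     # b'hello ha'
```

In real HDFS the NameNode tracks which DataNodes hold each block; here we only show the splitting step itself.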
Cloudera Hadoop, Cloudera's open-source platform, is the most popular distribution of Hadoop and related projects in the world.

**What is big data?**

Big Data is a collection of data that is huge in volume and growing exponentially with time. Its size and complexity are so large that no traditional data management tool can store or process it efficiently. To handle such data, we can use Apache Hadoop and Cloudera Hadoop.

Types of data used in big data (Apache Hadoop / Spark)

**Structured** – data that can be stored in rows and columns, like relational data sets

**Semi-structured** – data, such as XML, that can be read by both machines and humans

**Unstructured** – data that cannot be stored in rows and columns, like video, images, etc.
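A short Python sketch can make the three categories concrete. This is an illustrative example only (the sample values are made up), not part of Hadoop itself:

```python
# Illustrating the three kinds of data a big data platform must handle.
import csv
import io
import xml.etree.ElementTree as ET

# Structured: rows and columns, like a relational table or CSV file.
structured = list(csv.reader(io.StringIO("id,name\n1,alice\n2,bob\n")))

# Semi-structured: self-describing markup such as XML (or JSON),
# readable by both machines and humans.
semi = ET.fromstring("<user><name>alice</name></user>").find("name").text

# Unstructured: raw bytes of an image, video, or free text -- no schema.
unstructured = b"\x89PNG...raw image bytes..."

print(structured[1])   # ['1', 'alice']
print(semi)            # alice
```

Structured data fits relational tools directly; the other two are where Hadoop-style processing becomes valuable.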

About Apache Hadoop

Hadoop is the most important framework for working with Big Data. Its biggest strength is scalability: it can grow seamlessly from a single node to thousands of nodes.
Advantages of Hadoop

Apache Hadoop stores data in a distributed fashion, which allows it to be processed in parallel on a cluster of nodes.

In short, Apache Hadoop is an open-source framework best known for its fault tolerance and high availability.
Apache Hadoop clusters are scalable.
The Apache Hadoop framework is easy to use.
In HDFS, fault tolerance signifies the robustness of the system in the event of failure. HDFS is highly fault-tolerant: if any machine fails, another machine containing a copy of that data automatically becomes active.
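The replication idea behind this fault tolerance can be simulated in a few lines. This is a simplified model with a hypothetical round-robin placement policy, not HDFS's actual rack-aware algorithm; only the replication factor of 3 matches the HDFS default:

```python
# Toy simulation of HDFS replication: each block is copied to several
# nodes, so losing one node does not lose the data.
REPLICATION_FACTOR = 3  # HDFS default

def place_replicas(block_id, nodes, replication=REPLICATION_FACTOR):
    """Assign a block to `replication` distinct nodes (round-robin placement,
    a simplification of HDFS's rack-aware policy)."""
    return [nodes[(block_id + i) % len(nodes)] for i in range(replication)]

nodes = ["node1", "node2", "node3", "node4"]
replicas = place_replicas(0, nodes)

# If node1 fails, the copies on the other nodes keep the block available.
surviving = [n for n in replicas if n != "node1"]
print(replicas)    # ['node1', 'node2', 'node3']
print(surviving)   # ['node2', 'node3']
```

Because every block lives on multiple machines, the cluster can serve reads and re-replicate data even while a node is down.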
