DEV Community

Optisol Business

6 things you should know about Azure Data Lake

1) What is Azure Data Lake storage?

It’s an enterprise-wide repository for big data analytics workloads. The data stored can be of any type and any size.
• A single store for all data
• Data at every stage can be stored, from raw data to highly transformed data

Data Lake Store is a no-limit cloud data lake built so enterprises can unlock value from unstructured, semi-structured, and structured data.

Data Lake Analytics is a cloud analytics service for developing and running massively parallel data transformation and processing programs in U-SQL, R, Python, and .NET over petabytes of data.

Azure HDInsight is a cloud service that allows cost-effective data processing using open-source frameworks such as Hadoop, Spark, Hive, Storm, and Kafka, among others.

2) How does Azure Data Lake work?

• Ingest all data, whether or not there is a defined requirement for it yet
• Store all data in its native format, without defining a schema up front
• Later, analyze the data with Hadoop, Spark, R, or Azure Data Lake Analytics (ADLA)
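
The three steps above are often described as "schema on read". A minimal sketch of the idea in plain Python follows. It does not call any Azure service: the `lake` dictionary, the `ingest` and `read_*` helpers, and the file paths are all hypothetical stand-ins used only for illustration.

```python
import csv
import io
import json

# Hypothetical in-memory stand-in for a data lake: path -> raw bytes.
lake = {}

def ingest(path, raw_bytes):
    """Store data exactly as received; no schema is checked at write time."""
    lake[path] = raw_bytes

def read_json_lines(path):
    """Interpret the stored bytes as JSON lines only when they are read."""
    text = lake[path].decode("utf-8")
    return [json.loads(line) for line in text.splitlines() if line.strip()]

def read_csv(path):
    """Interpret the stored bytes as CSV only when they are read."""
    text = lake[path].decode("utf-8")
    return list(csv.DictReader(io.StringIO(text)))

# Ingest two differently shaped sources without declaring a schema.
ingest("raw/clicks.jsonl", b'{"user": "a", "clicks": 3}\n{"user": "b", "clicks": 5}\n')
ingest("raw/orders.csv", b"user,amount\na,10.5\nb,4.0\n")

# Later, the analysis step decides how each file should be interpreted.
total_clicks = sum(rec["clicks"] for rec in read_json_lines("raw/clicks.jsonl"))
total_amount = sum(float(row["amount"]) for row in read_csv("raw/orders.csv"))
print(total_clicks, total_amount)
```

Because the raw bytes are kept untouched, a different reader can be written later for the same file without ingesting it again.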

3) How is data stored in Azure Data Lake?

A data lake is a storage repository that holds a large amount of data in its native, raw format. One advantage of a data lake is that data is never thrown away, because it is stored in its raw format.

4) What does Azure Data Lake do?

• Stores petabyte-size files and trillions of objects, with no fixed limit on total data.
• Lets you develop massively parallel programs.
• Charges per job.
• Lets you debug and optimize big data programs.
• Starts jobs within seconds, because there are no virtual machines or clusters to provision and wait for.
• Uses U-SQL to scale jobs out and run them massively in parallel.
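
To make "massively parallel" a little more concrete, here is a small sketch of the general pattern such jobs follow: split the input into partitions, process each partition independently, then merge the partial results. It is plain Python, not U-SQL or an Azure Data Lake Analytics job, and the function names and sample data are assumptions made for this example. Threads are used only to show the structure; a real service distributes partitions across many machines.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor

def count_words(partition):
    """Process one partition: count the words in its lines."""
    counts = Counter()
    for line in partition:
        counts.update(line.lower().split())
    return counts

def run_job(lines, partitions=4):
    """Split the input, process each partition concurrently, merge results."""
    # Round-robin split: partition i gets lines i, i + partitions, ...
    chunks = [lines[i::partitions] for i in range(partitions)]
    with ThreadPoolExecutor(max_workers=partitions) as pool:
        partials = list(pool.map(count_words, chunks))
    total = Counter()
    for partial in partials:
        total.update(partial)
    return total

sample_lines = ["lake store lake", "analytics lake", "store analytics", "lake"]
word_totals = run_job(sample_lines)
print(word_totals.most_common())
```

The result does not depend on how many partitions are used, which is what allows the same job to be scaled up or down.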

5) What is Data Lake architecture?

A data lake is a storage repository that can store large amounts of structured, semi-structured, and unstructured data.
Unlike a hierarchical data warehouse, where data is stored in files and folders, a data lake has a flat architecture.

6) How do Azure Data Factory, Azure Data Lake and Power BI work together?

U-SQL
The “U” in U-SQL stands for “Unified”, an apt name because the language is designed to run parallel queries across distributed relational and unstructured data sources using SQL-style syntax.

U-SQL in Azure
U-SQL is a language that combines declarative SQL with imperative C# to let you process data at any scale. Through the scalable, distributed-query capability of U-SQL, you can efficiently analyse data across relational stores such as Azure SQL Database.
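
The idea of mixing declarative SQL with imperative code can be illustrated without U-SQL itself. The sketch below is only an analogy: it uses Python's built-in `sqlite3` module and registers an ordinary Python function so that a SQL query can call it. The table name `signups`, the helper `domain_of`, and the sample rows are assumptions invented for this example.

```python
import sqlite3

def domain_of(email):
    """Imperative helper: extract the lower-cased domain of an e-mail address."""
    return email.rsplit("@", 1)[-1].lower()

conn = sqlite3.connect(":memory:")
# Make the Python function callable from SQL under the name domain_of.
conn.create_function("domain_of", 1, domain_of)

conn.execute("CREATE TABLE signups (email TEXT)")
conn.executemany(
    "INSERT INTO signups (email) VALUES (?)",
    [("ann@Example.com",), ("bob@example.com",), ("cy@contoso.com",)],
)

# Declarative part: grouping and counting. Imperative part: domain_of().
rows = conn.execute(
    "SELECT domain_of(email) AS domain, COUNT(*) AS n "
    "FROM signups GROUP BY domain_of(email) ORDER BY domain"
).fetchall()
conn.close()
print(rows)
```

In U-SQL the imperative part would be written in C# rather than Python, but the division of labour is similar: the query states what to compute, and the custom function handles logic that is awkward to express in SQL.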

Power BI
Power BI is a powerful business intelligence platform. It is known for its ability to connect to various data sources, its tools for aggregating and analyzing data, and its rich library of visualizations with many styling options.
Power BI can connect to Azure Data Lake Store (ADLS), which is one of the most popular storage products for massive datasets.

Why Power BI
Microsoft Power BI is used to find insights within an organization’s data. Power BI can help connect disparate data sets, transform and clean the data into a data model and create charts or graphs to provide visuals of the data.

Author Bio:

B. Anitha Letchumi, BI Lead at OptiSol Business Solutions, has 10 years of experience in Business Intelligence and has been working with OptiSol for the last 7 years. Areas of expertise include Microsoft BI, Power BI, and SQL programming, along with a couple of projects involving Azure Data Lake Storage and Azure Data Factory.
