DEV Community

Newanga  Wickramasinghe

Posted on • Originally published at

Azure DP-900 Short Notes: Explore Core Data Concepts

👉 Learn Module: Explore Core Data Concepts

Identify the need for data solutions

👉Data is a collection of facts, such as numbers, descriptions, and observations, used to make decisions.

👉Three types of data

  1. Structured data
    • Tabular data that is represented by rows and columns in a database.
    • Databases that hold tables in this form are called relational databases.
  2. Semi-structured data
    • Information that doesn't reside in a relational database but still has some structure to it.
    • Ex: JSON documents, key-value stores, and graph databases
  3. Unstructured data
    • Data with no proper structure.
    • Ex: Audio, video, and binary data files
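
The three categories above can be sketched in a few lines of Python. This is only an illustration: the sample rows, documents, and bytes are hypothetical and are not taken from the Learn module.

```python
import json

# Structured: every record has the same columns (hypothetical sample rows).
columns = ("id", "name", "city")
rows = [(1, "Ana", "Lisbon"), (2, "Ben", "Oslo")]

# Semi-structured: JSON documents whose fields can differ from record to record.
documents = [
    json.loads('{"id": 1, "name": "Ana", "phones": ["555-0100"]}'),
    json.loads('{"id": 2, "name": "Ben", "address": {"city": "Oslo"}}'),
]

# Unstructured: raw bytes with no schema that a data store can interpret.
blob = bytes([0x52, 0x49, 0x46, 0x46])

# Every structured row matches the column list; the JSON documents do not share one shape.
same_shape = all(len(row) == len(columns) for row in rows)
field_sets = [sorted(doc.keys()) for doc in documents]
```

Here `same_shape` is true for the tabular rows, while the two entries in `field_sets` differ, which is the "some structure, but not a fixed schema" property of semi-structured data.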

👉Based on the type of data, there are multiple ways to store and access data in Azure cloud.

👉Stored data needs to be processed. There are two types of data processing solutions.

  1. Transaction processing systems
    • Recording transactions is the primary function of business computing.
    • The work performed by transactional systems is often referred to as Online Transactional Processing (OLTP).
    • Data is divided into small pieces for faster processing.
    • For example, in a relational database, tables are split into separate groups of columns so that each table holds only the columns a task needs. This is called normalization.
  2. Analytical systems
    • Support business users who need to query data and gain a big picture view.
    • Capture raw data and use it to generate insights that inform future business decisions.
    • Common tasks of an analytical system
      1. Data Ingestion - Capturing the raw data.
      2. Data Processing - Converting captured data into a common format to be processed.
      3. Data Querying - Querying data to analyze.
      4. Data Visualization - Generating charts, such as bar charts and line charts, from the queried data.
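
The four analytical tasks listed above can be sketched as a tiny pipeline. This is a minimal illustration using only the Python standard library; the `RAW` sample data and the function names are hypothetical, and a real Azure solution would use dedicated services for each stage.

```python
import csv
import io
from collections import defaultdict

# Hypothetical raw input, standing in for data captured from a source system.
RAW = "region,amount\nnorth,10\nsouth,5\nnorth,7\n"

def ingest(text):
    """Data ingestion: capture the raw records as-is (all values are strings)."""
    return list(csv.DictReader(io.StringIO(text)))

def process(records):
    """Data processing: convert records into a common, typed format."""
    return [{"region": r["region"], "amount": int(r["amount"])} for r in records]

def query(records):
    """Data querying: aggregate the amount per region."""
    totals = defaultdict(int)
    for r in records:
        totals[r["region"]] += r["amount"]
    return dict(totals)

def visualize(totals):
    """Data visualization: render a crude text bar chart, one line per region."""
    return [f"{region:<6} {'#' * total}" for region, total in sorted(totals.items())]

totals = query(process(ingest(RAW)))
chart = visualize(totals)
```

Calling `print("\n".join(chart))` shows one bar per region; the point is only that each stage consumes the output of the previous one.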

Identify types of data and data storage

👉Relational Data and Non-relational Data have different characteristics.

  1. Relational Data

    • Most well-understood model for holding data.
    • Data normalization helps to reduce any data redundancy within the database.
  2. Non-relational Data

    • Store data in a format that more closely matches the original structure.
    • Data duplication is often present, which increases the storage required.
    • Because of this duplication, a single modification may require updating the same data in multiple locations.
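
To make the trade-off above concrete, here is a small sketch comparing a normalized layout with a denormalized, document-style layout. The order and customer records and the helper names are hypothetical.

```python
# Denormalized (document-style): customer details are copied into every order.
orders_docs = [
    {"order_id": 1, "customer": {"id": 7, "email": "ana@example.com"}},
    {"order_id": 2, "customer": {"id": 7, "email": "ana@example.com"}},
]

# Normalized (relational-style): customer details are stored once and referenced by key.
customers = {7: {"email": "ana@example.com"}}
orders = [{"order_id": 1, "customer_id": 7}, {"order_id": 2, "customer_id": 7}]

def update_email_normalized(customers, customer_id, email):
    """One write: the email lives in exactly one place."""
    customers[customer_id]["email"] = email
    return 1

def update_email_denormalized(docs, customer_id, email):
    """One write per document that holds a copy of the customer."""
    touched = 0
    for doc in docs:
        if doc["customer"]["id"] == customer_id:
            doc["customer"]["email"] = email
            touched += 1
    return touched

normalized_writes = update_email_normalized(customers, 7, "ana@new.example")
denormalized_writes = update_email_denormalized(orders_docs, 7, "ana@new.example")
```

The normalized layout needs a single write, while the denormalized one needs a write for every copy. In exchange, each order document can be read without a lookup in a second structure.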

👉Two different types of workloads.

  1. Transactional workloads

    • A transaction is a sequence of operations that is atomic.
    • Most commonly run on relational databases.
    • A transactional database must adhere to the ACID properties.
      1. Atomicity = A transaction is treated as a single unit, which either succeeds completely or fails completely.
      2. Consistency = A transaction can only take the data in the database from one valid state to another.
      3. Isolation = Concurrent execution of transactions leaves the database in the same state that would have been obtained if the transactions were executed sequentially.
      4. Durability = Once a transaction has been committed, it will remain committed even if there's a system failure.
  2. Analytical workloads

    • Typically read-only systems that store vast volumes of historical data or business metrics.
    • Used for data analysis and decision making.
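
The atomicity and consistency properties described above can be observed with Python's built-in `sqlite3` module. This is a sketch under stated assumptions: the `accounts` table, its `CHECK` constraint, and the `transfer` helper are hypothetical, and SQLite stands in for whatever relational database a transactional workload would actually use.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# The CHECK constraint defines what counts as a valid state: no negative balances.
conn.execute(
    "CREATE TABLE accounts (name TEXT PRIMARY KEY, "
    "balance INTEGER CHECK (balance >= 0))"
)
conn.executemany("INSERT INTO accounts VALUES (?, ?)", [("alice", 100), ("bob", 50)])
conn.commit()

def transfer(conn, src, dst, amount):
    """Move money as one unit: both updates are committed, or neither is."""
    try:
        with conn:  # commits on success, rolls back if an exception is raised
            conn.execute(
                "UPDATE accounts SET balance = balance + ? WHERE name = ?",
                (amount, dst),
            )
            conn.execute(
                "UPDATE accounts SET balance = balance - ? WHERE name = ?",
                (amount, src),
            )
        return True
    except sqlite3.IntegrityError:
        return False

def balances(conn):
    return dict(conn.execute("SELECT name, balance FROM accounts"))

ok_small = transfer(conn, "alice", "bob", 30)   # valid transfer
ok_large = transfer(conn, "alice", "bob", 500)  # would overdraw alice
final = balances(conn)
```

The second transfer credits `bob` before the debit on `alice` violates the constraint, yet the rollback undoes the credit as well, so the database never exposes a half-applied transaction.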

Describe the difference between batch and streaming data

👉Data processing is converting data into meaningful information.

👉There are two types of data processing.

  1. Batch Processing
    • New data elements are collected into a group and the whole group is then processed at a future time as a batch.
    • Data Scope = Process all the data in the dataset.
    • Data Size = large datasets.
    • Performance = latency is typically a few hours.
    • Analysis = performing complex analytics.
  2. Streaming and real-time data
    • In stream processing, each new piece of data is processed when it arrives.
    • Beneficial for dynamic data.
    • Ideal for time-critical operations that require an instant real-time response.
    • Data Scope = Access to the most recent data received.
    • Data Size = Individual records or micro batches.
    • Performance = latency in the order of seconds or milliseconds.
    • Analysis = simple response functions.
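
The contrast between the two processing styles above can be sketched in plain Python. The sensor readings, the threshold, and the function names are hypothetical; a production system would use a scheduler for the batch job and a streaming service for the event path.

```python
# Hypothetical sensor readings, in arrival order.
readings = [3, 8, 5, 12, 7]

def batch_average(collected):
    """Batch: wait until the whole group is collected, then process all of it."""
    return sum(collected) / len(collected)

def stream_alerts(events, threshold):
    """Streaming: handle each record as it arrives, seeing only the current one."""
    for value in events:
        if value > threshold:
            yield f"alert: {value}"

# Batch result is available only after every reading has been gathered.
average = batch_average(readings)

# Streaming results are produced one at a time while the iterator is consumed.
alerts = list(stream_alerts(iter(readings), threshold=10))
```

The batch function needs the full dataset and can run a more complex calculation over it, while the streaming generator applies a simple response to each record without waiting for the rest.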
