
Ayush Kumar


Azure Synapse Analytics Security: Data Protection

Introduction

Data is the lifeblood of any organization. Whether you're dealing with sensitive customer information or financial records, safeguarding your data is non-negotiable.

Many organizations face challenges such as:

  • How do you protect the data if you don't know where it is?
  • What level of protection is needed? Some datasets require more protection than others.

Azure Synapse Analytics offers powerful features to help you achieve this, ensuring confidentiality, integrity, and availability.

In this blog, we’ll explore the Data Encryption capabilities integrated into Azure Synapse Analytics, discussing encryption techniques for data at rest and in transit, as well as approaches for detecting and categorizing sensitive data in your Synapse workspace.


What is Data Discovery and Classification?

Imagine your company has massive amounts of information stored in its databases, but some columns need extra protection, such as Social Security numbers or financial records. Manually finding this sensitive data is a time-consuming nightmare.

Here's the good news: there's a better way! Azure Synapse offers a feature called Data Discovery that automates this process.

How does Data Discovery work?

Think of Data Discovery as a super-powered scanner. It automatically scans the columns in your database, looking for patterns that might indicate sensitive information. Just like a smart assistant, it can identify potentially sensitive data and classify those columns for you.

Once the data discovery process is complete, it provides classification recommendations based on a predefined set of patterns, keywords, and rules. You can review these recommendations and then apply sensitivity-classification labels to the appropriate columns. This process is known as Classification.
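To make the idea of keyword-based recommendations concrete, here is a toy sketch in Python. It is not the actual classification engine: the `RULES` table, the `recommend` function, and the sample column names are all made up for illustration, and the real engine's rules are maintained by Microsoft.

```python
from typing import Optional, Tuple

# Hypothetical rules: keyword found in a column name -> (information type, label).
# The real discovery logic is richer; this only illustrates the idea.
RULES = {
    "ssn": ("National ID", "Highly Confidential"),
    "email": ("Contact Info", "Confidential"),
    "salary": ("Financial", "Confidential"),
    "credit_card": ("Credit Card", "Highly Confidential"),
}


def recommend(column_name: str) -> Optional[Tuple[str, str]]:
    """Return a suggested (information type, label) for a column, or None."""
    normalized = column_name.lower()
    for keyword, suggestion in RULES.items():
        if keyword in normalized:
            return suggestion
    return None


# Sample column names (made up) to scan for recommendations.
columns = ["CustomerSSN", "EmailAddress", "OrderDate", "BaseSalary"]
recommendations = {c: recommend(c) for c in columns if recommend(c)}

for column, (info_type, label) in recommendations.items():
    print(f"{column}: {info_type} / {label}")
```

Columns that match no rule, such as `OrderDate` here, simply get no recommendation, which mirrors how you only review the columns the scan flags.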

What happens after applying sensitivity labels to columns?

Sensitivity-classification labels are metadata attributes that have been added to the SQL Server database engine. So, after applying sensitivity labels to columns, the organization can leverage these labels to:

  • Implement fine-grained access controls, so that only authorized people with the necessary clearance can access sensitive data.
  • Mask sensitive data when it is accessed by users who do not have the necessary permissions, so that they see only anonymized versions of the data.
  • Monitor access and modification activities on sensitive data (auditing access to sensitive data), so that any unusual or unauthorized activity can be flagged for investigation.
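The masking idea from the list above can be sketched in a few lines of Python. This is only an illustration of the concept, not how Synapse implements masking: `mask_partial`, `read_column`, and the `user_can_unmask` flag are hypothetical names invented for this example.

```python
def mask_partial(value: str, visible_suffix: int = 4, pad: str = "X") -> str:
    """Hide all but the last `visible_suffix` characters of a value."""
    if visible_suffix <= 0 or len(value) <= visible_suffix:
        # Nothing safe to reveal: mask the whole value.
        return pad * len(value)
    hidden = len(value) - visible_suffix
    return pad * hidden + value[-visible_suffix:]


def read_column(value: str, user_can_unmask: bool) -> str:
    """Return the real value only to users with the (hypothetical) unmask right."""
    return value if user_can_unmask else mask_partial(value)


print(read_column("123-45-6789", user_can_unmask=False))  # masked view
print(read_column("123-45-6789", user_can_unmask=True))   # full view
```

The stored value never changes; only what an unprivileged reader gets back does.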

Steps for discovering, classifying, and labelling columns that contain sensitive data in your database

The classification includes two metadata attributes:

  1. Labels: The main classification attributes, used to define the sensitivity level of the data stored in the column.

  2. Information types: Attributes that provide more granular information about the type of data stored in the column.

Step 1 -> Choose an Information Protection policy based on your requirements

IPP Mode

The SQL Information Protection policy is a built-in set of sensitivity labels and information types with discovery logic, which is native to the SQL logical server. You can also customize the policy according to your organization's needs. For more information, see Customize the SQL information protection policy in Microsoft Defender for Cloud (Preview).

Step 2 -> View and apply classification recommendations

The classification engine automatically scans your database for columns containing potentially sensitive data and provides a list of recommended column classifications.

Classification Recommendation

  • To accept a recommendation for a column, select its check box in the left column, and then select Accept selected recommendations to apply it.

You can also classify columns manually, as an alternative or in addition to the recommendation-based classification.

classify columns manually

To complete your classification, select Save in the Classification page.
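If you prefer scripting over the portal, classification can also be expressed in T-SQL with the `ADD SENSITIVITY CLASSIFICATION` statement. The sketch below only builds the statement text in Python; the helper names and the `dbo.Customers.SSN` column are hypothetical, and you should confirm in the documentation that the statement is supported on your pool type before running it.

```python
def quote_ident(name: str) -> str:
    """Bracket-quote a T-SQL identifier, escaping any closing brackets."""
    return "[" + name.replace("]", "]]") + "]"


def quote_literal(text: str) -> str:
    """Single-quote a T-SQL string literal, escaping embedded quotes."""
    return "'" + text.replace("'", "''") + "'"


def classification_statement(schema: str, table: str, column: str,
                             label: str, information_type: str) -> str:
    """Build an ADD SENSITIVITY CLASSIFICATION statement for one column."""
    target = ".".join(quote_ident(part) for part in (schema, table, column))
    return (
        f"ADD SENSITIVITY CLASSIFICATION TO {target} "
        f"WITH (LABEL = {quote_literal(label)}, "
        f"INFORMATION_TYPE = {quote_literal(information_type)});"
    )


# Hypothetical table and column, used only to show the generated text.
statement = classification_statement(
    "dbo", "Customers", "SSN", "Highly Confidential", "National ID"
)
print(statement)
```

The label and information type map to the two metadata attributes described earlier. Applied classifications can typically be read back from the `sys.sensitivity_classifications` catalog view.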

Note: Another option for data discovery and classification is Microsoft Purview, a unified data governance solution that helps manage and govern on-premises, multicloud, and software-as-a-service (SaaS) data. It can automate data discovery, lineage identification, and data classification. By producing a unified map of data assets and their relationships, it makes data easily discoverable.


Data Encryption

Data encryption is a fundamental component of data security, ensuring that information is safeguarded both at rest and in transit. Azure Synapse takes care of this responsibility for us by leveraging robust encryption technologies to protect data.

Data at Rest

Azure offers various methods of encryption across its different services.

Azure Storage Encryption

By default, Azure Storage encrypts all data at rest using server-side encryption (SSE). It's enabled for all storage types (including ADLS Gen2) and cannot be disabled. SSE uses AES 256 to encrypt and decrypt data transparently. AES 256 stands for 256-bit Advanced Encryption Standard. It is one of the strongest block ciphers available and is FIPS 140-2 compliant.

Well, I know these sound like hacking terms 😅. But the platform itself manages the encryption key, so we don't have to understand them in depth. Storage encryption forms the first layer of data encryption, and it applies to both user and system databases, including the master database.

Note: For additional security, Azure offers the option of double encryption. Infrastructure encryption uses a platform-managed key in conjunction with the SSE key, encrypting data twice with two different encryption algorithms and keys. This provides an extra layer of protection, ensuring that data at rest is highly secure.

Double the Protection with Transparent Data Encryption (TDE)

TDE is an industry-standard methodology that encrypts the underlying files of the database rather than the data itself. This adds a second layer of data encryption. TDE performs real-time I/O encryption and decryption of the data at the page level, using a symmetric key called the Database Encryption Key (DEK) to encrypt the storage of an entire database. When data is written to the database, it is organized into pages, and TDE encrypts each page with the DEK before it is written to disk, which makes it unreadable without the key. When a page is read from disk into memory, TDE decrypts it with the DEK, making the data readable for normal database operations.

Why do we call it transparent?
Because applications and users cannot tell whether the data is encrypted; they would only notice if they did not have access to it. Encryption and decryption happen at the database engine level, without requiring application awareness or involvement.
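The page-level flow can be pictured with a small Python toy. To stay self-contained it uses a SHA-256-based keystream as a stand-in cipher, which is NOT AES and NOT secure; the `ToyEncryptedStore` class and its methods are invented for this sketch and say nothing about how the database engine is really built.

```python
import hashlib
import secrets


def _keystream(dek: bytes, page_no: int, length: int) -> bytes:
    """Derive a toy keystream from the DEK and page number (stand-in for AES)."""
    out = bytearray()
    counter = 0
    while len(out) < length:
        seed = dek + page_no.to_bytes(8, "big") + counter.to_bytes(8, "big")
        out.extend(hashlib.sha256(seed).digest())
        counter += 1
    return bytes(out[:length])


def _xor(data: bytes, stream: bytes) -> bytes:
    """XOR two equal-length byte strings."""
    return bytes(a ^ b for a, b in zip(data, stream))


class ToyEncryptedStore:
    """Encrypts pages on write and decrypts on read, so callers see plaintext."""

    def __init__(self, dek: bytes) -> None:
        self._dek = dek
        self.disk = {}  # page number -> encrypted bytes ("what is on disk")

    def write_page(self, page_no: int, plaintext: bytes) -> None:
        stream = _keystream(self._dek, page_no, len(plaintext))
        self.disk[page_no] = _xor(plaintext, stream)

    def read_page(self, page_no: int) -> bytes:
        encrypted = self.disk[page_no]
        stream = _keystream(self._dek, page_no, len(encrypted))
        return _xor(encrypted, stream)


dek = secrets.token_bytes(32)  # 256-bit symmetric key, playing the DEK
store = ToyEncryptedStore(dek)
store.write_page(1, b"name=Alice;ssn=123-45-6789")

print(store.read_page(1))  # the caller gets plaintext back
print(store.disk[1])       # the stored bytes are unreadable without the key
```

The caller only ever uses `write_page` and `read_page` and never handles the key, which is the "transparent" part.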

Enabling TDE

By default, TDE protects the database encryption key (DEK) with a built-in server certificate managed by Azure. However, organizations can opt for Bring Your Own Key (BYOK), where the key is securely stored in Azure Key Vault, offering enhanced control over encryption keys.

Data in transit

Data encryption in transit is equally crucial to protect sensitive information as it moves between clients and servers. Azure Synapse utilizes Transport Layer Security (TLS) to secure data in motion.

Azure Synapse dedicated SQL pool and serverless SQL pool use the Tabular Data Stream (TDS) protocol to communicate between the SQL pool endpoint and a client machine. TDS depends on Transport Layer Security (TLS) for channel encryption, ensuring all data packets are secured and encrypted between the endpoint and the client machine. The signed server certificate used for TLS encryption is issued by a Certificate Authority (CA) and managed by Microsoft. Azure Synapse supports data encryption in transit with TLS v1.2, using AES 256 encryption.
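On the client side, this usually comes down to asking the driver for an encrypted channel and for validation of the server certificate. Here is a minimal sketch that builds an ODBC-style connection string in Python. The workspace name `myworkspace` and database `salesdb` are hypothetical, authentication settings are left out on purpose, and the driver name assumes ODBC Driver 18 is installed.

```python
def synapse_connection_string(server: str, database: str) -> str:
    """Build an ODBC connection string that requests an encrypted TDS channel."""
    parts = {
        "Driver": "{ODBC Driver 18 for SQL Server}",
        "Server": f"tcp:{server},1433",
        "Database": database,
        "Encrypt": "yes",                # require TLS on the connection
        "TrustServerCertificate": "no",  # validate the server certificate
    }
    return ";".join(f"{key}={value}" for key, value in parts.items()) + ";"


# Hypothetical workspace endpoint and database name.
conn_str = synapse_connection_string(
    "myworkspace.sql.azuresynapse.net", "salesdb"
)
print(conn_str)
```

Keeping `TrustServerCertificate=no` means the client checks the Microsoft-managed certificate instead of accepting any certificate the endpoint presents.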
