
Alec Dutcher


DP-203 Study Guide - Implement data security

Study guide

Implement data masking

  • Dynamic data masking
    • Prevents unauthorized access by limiting exposure of sensitive data
    • Configure masking policies on database fields to designate how much data to reveal to nonprivileged users
    • Available in:
      • Azure SQL Database
      • Azure SQL Managed Instance
      • Azure Synapse Analytics Dedicated SQL pools
      • SQL Server on Azure VMs
  • Masking policies
    • Created in the Security section in the portal or via T-SQL
    • Components of a policy
      • SQL users excluded from masking
      • Masking rule
      • Masking function
  • Masking functions
    • Default - predefined, fully masks a field according to its data type (for example XXXX for strings, 0 for numeric types)
    • Email - exposes the first letter and replaces the rest of the address (for example aXXX@XXXX.com)
    • Credit card - replaces everything but the last four digits with XXXX
    • Custom text - replaces with a custom string (consists of exposed prefix, padding string, and exposed suffix)
    • Random number - replaces a numeric value with a randomly generated number within a specified range (T-SQL function 'random(1,45)', where 1 and 45 are the low and high ends of the range)
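
The masking functions above are attached to a column with an `ALTER TABLE ... ADD MASKED WITH` statement. As a minimal sketch, the Python helper below only builds those T-SQL strings. The helper name `masking_statement`, the table `dbo.Customers`, and its column names are hypothetical, and the credit card mask is written as a `partial()` call, which is an assumption about how that portal option is usually expressed in T-SQL.

```python
def masking_statement(table: str, column: str, function: str) -> str:
    """Build the T-SQL that attaches a masking function to an existing column."""
    return (
        f"ALTER TABLE {table} ALTER COLUMN {column} "
        f"ADD MASKED WITH (FUNCTION = '{function}');"
    )


# Masking functions as they are written in T-SQL.
MASKS = {
    "default": "default()",
    "email": "email()",
    "random": "random(1, 45)",
    # partial(exposed prefix length, padding string, exposed suffix length)
    "custom": 'partial(1, "XXXX", 2)',
    "credit_card": 'partial(0, "XXXX-XXXX-XXXX-", 4)',
}

# Hypothetical table and columns, used only for illustration.
statements = [
    masking_statement("dbo.Customers", "Email", MASKS["email"]),
    masking_statement("dbo.Customers", "CardNumber", MASKS["credit_card"]),
    masking_statement("dbo.Customers", "Age", MASKS["random"]),
]

for statement in statements:
    print(statement)
```

Users who should see unmasked values are typically excluded with `GRANT UNMASK TO <user>;`.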

Encrypt data at rest and in motion

  • Encryption
    • Uses a key to encrypt and decrypt data
    • Disguises the data through a process of symmetric encryption
    • Encryption key is stored in a secure location such as Key Vault
  • Encryption at rest
    • Encrypts data where it is physically stored (disks, data files, backups)
    • Azure Storage uses managed keys (customer or Microsoft managed)
    • Azure SQL and Synapse SQL use transparent data encryption (TDE) also with service- and customer-managed keys
      • TDE
        • Real-time I/O encryption and decryption of data files, log files, and backups
        • Prevents a malicious party from restoring a stolen backup or data file without the key
        • In the portal (Azure SQL) under the Security section, there is a TDE option with an on/off toggle
  • Always Encrypted
    • Protects data in Azure SQL, Azure SQL Managed Instance, and SQL Server databases by encrypting it inside the client application and never revealing the encryption key to the database engine
    • Allows separation of people who view data and those who manage data
    • Uses two types of keys
      • Column encryption keys (CEK) - used to encrypt data in a column
      • Column master keys (CMK) - used to encrypt a CEK
    • Supports two types of encryption
      • Randomized encryption
        • less predictable
        • more secure
        • prevents searching, grouping, indexing, and joining on encrypted columns
      • Deterministic encryption
        • always generates the same encryption value for a given plain text value
        • more predictable and less secure than randomized, but allows equality searches, grouping, indexing, and joining on encrypted columns
  • Encryption in motion
    • Securing data moving from one network location to another
    • Solution is transport layer security (TLS)
      • Enabled by default in Azure Synapse SQL
      • Can be enforced in Azure Storage with the "Secure transfer required" setting under Settings and Configuration

Implement row-level and column-level security

  • Row-level security
    • Restricts records in a table based on user running query
    • Not permission based, but predicate based (rows are hidden based on whether predicate condition is true or false)
    • Security policy defines users and predicates (inline table-valued functions)
    • Policies are created in T-SQL
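
A security policy and its predicate function can be sketched as the T-SQL batches below, held as Python strings in the order they would be run. This is an illustrative sketch, not the code from the original screenshots: the schema `Security`, the function `fn_securitypredicate`, the policy `SalesFilter`, the table `dbo.Sales`, its `SalesRep` column, and the `Manager` user are all assumed names. Sending the batches to the database (for example through a SQL driver) is left out.

```python
from textwrap import dedent

# Setup batches, in execution order. All object names are hypothetical.
RLS_SETUP = [
    # 1) Separate schema that holds the predicate function.
    "CREATE SCHEMA Security;",
    # 2) Inline table-valued function: returns a row only when the
    #    predicate is true for the user running the query.
    dedent("""\
        CREATE FUNCTION Security.fn_securitypredicate(@SalesRep AS sysname)
            RETURNS TABLE
        WITH SCHEMABINDING
        AS
            RETURN SELECT 1 AS fn_securitypredicate_result
            WHERE @SalesRep = USER_NAME() OR USER_NAME() = 'Manager';
        """),
    # 3) Security policy that binds the function to a table column.
    dedent("""\
        CREATE SECURITY POLICY SalesFilter
        ADD FILTER PREDICATE Security.fn_securitypredicate(SalesRep)
        ON dbo.Sales
        WITH (STATE = ON);
        """),
]

# Teardown: security policy first, then table, function, and schema.
RLS_TEARDOWN = [
    "DROP SECURITY POLICY SalesFilter;",
    "DROP TABLE dbo.Sales;",
    "DROP FUNCTION Security.fn_securitypredicate;",
    "DROP SCHEMA Security;",
]

for batch in RLS_SETUP:
    print(batch)
```

With this policy on, a sales rep querying `dbo.Sales` would see only rows where `SalesRep` matches their user name, while the assumed `Manager` user would see every row.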

  • RLS best practices

    • Create separate schema for security predicate function
    • Avoid recursion in predicate functions to prevent performance degradation
    • Components need to be dropped in a specific order if RLS is no longer used
      • 1) Security policy
      • 2) Table
      • 3) Function
      • 4) Schemas
    • Avoid excessive table joins in predicate function
  • Column-level security

    • Controls access to columns based on user context
    • Configured using GRANT SELECT statement and specifying columns and user
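
As a small sketch of the `GRANT SELECT` form described above, the Python helper below builds a column-scoped grant. The helper name `grant_column_select`, the table `dbo.Membership`, its columns, and the user `TestUser` are hypothetical.

```python
def grant_column_select(table: str, columns: list[str], user: str) -> str:
    """Build a GRANT SELECT statement limited to the listed columns."""
    column_list = ", ".join(columns)
    return f"GRANT SELECT ON {table} ({column_list}) TO {user};"


# Hypothetical example: TestUser may read names but not other columns.
statement = grant_column_select(
    "dbo.Membership", ["MemberID", "FirstName", "LastName"], "TestUser"
)
print(statement)
```

A `SELECT *` by that user against the table would then fail, because it touches columns that were not granted.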

Implement Azure role-based access control (RBAC)

  • Authorization for Azure Data Lake is controlled via
    • Shared key authorization
    • Shared access signature (SAS)
    • Role-based access control (RBAC)
    • Access Control Lists (ACL)
  • RBAC
    • Uses role assignment to apply permissions to security principals (users, groups, managed identities, etc)
    • Can limit access to files, folders, containers, and accounts
    • Roles
      • Storage Blob Data Owner (full container access)
      • Storage Blob Data Contributor (read, write, delete)
      • Storage Blob Data Reader (read, list)

Implement POSIX-like access control lists (ACLs) for Data Lake Storage Gen2

  • ACL
    • Holds rules that grant or deny access to specific files and directories
    • RBAC is coarse-grained (rice) vs ACL which is fine-grained (sugar)
    • Roles are determined before ACL is applied (if user has RBAC the operation succeeds, if not it falls to ACL)
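
The evaluation order above can be modeled in a few lines of Python. This is a deliberately simplified sketch: the function `is_authorized` and its parameters are hypothetical, and the real evaluation also distinguishes the owning user, owning group, named entries, mask, and "other" permissions.

```python
# Map each operation to its POSIX-style permission letter.
ACL_BITS = {"read": "r", "write": "w", "execute": "x"}


def is_authorized(operation: str, rbac_operations: set[str], acl_permissions: str) -> bool:
    """Return True if the operation is allowed by RBAC or, failing that, by the ACL.

    rbac_operations: operations covered by the caller's role assignments.
    acl_permissions: the ACL entry that applies to the caller, e.g. "r-x".
    """
    # Step 1: a role assignment that covers the operation is sufficient.
    if operation in rbac_operations:
        return True
    # Step 2: otherwise fall back to the ACL on the file or directory.
    return ACL_BITS[operation] in acl_permissions


# A reader role plus a read-only ACL entry cannot write.
print(is_authorized("write", {"read"}, "r--"))
# No role at all, but the ACL entry grants read.
print(is_authorized("read", set(), "r-x"))
```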

Implement a data retention policy

  • Can be set on Azure SQL (long-term retention) or Azure Storage (Lifecycle Management)
  • Long-term retention
    • Automatically retain backups in separate blob container for up to 10 years
    • Can be used to recover a database through the portal, CLI, or PowerShell
    • Enabled by defining policy with four parameters
      • Weekly (W)
      • Monthly (M)
      • Yearly (Y)
      • Week of the year (WeekofYear)
  • Lifecycle Management
    • Automated way to tier down files to cool and archive based on modified date
      • Enabled by creating a policy with one or more rules
      • Choose from number of days since blob was created, modified, or accessed (can enable access tracking)
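
A lifecycle management policy is a JSON document with a list of rules. The sketch below builds one in Python as an illustration: the rule name `tier-down-logs`, the prefix `logs/`, and the day thresholds are assumptions, not recommendations.

```python
import json

# One rule that tiers block blobs under a prefix to cool, then archive,
# then deletes them, based on days since last modification.
policy = {
    "rules": [
        {
            "enabled": True,
            "name": "tier-down-logs",  # hypothetical rule name
            "type": "Lifecycle",
            "definition": {
                "filters": {
                    "blobTypes": ["blockBlob"],
                    "prefixMatch": ["logs/"],  # hypothetical container/prefix
                },
                "actions": {
                    "baseBlob": {
                        "tierToCool": {"daysAfterModificationGreaterThan": 30},
                        "tierToArchive": {"daysAfterModificationGreaterThan": 90},
                        "delete": {"daysAfterModificationGreaterThan": 365},
                    }
                },
            },
        }
    ]
}

print(json.dumps(policy, indent=2))
```

The printed JSON is the shape that would be supplied when creating the policy in the portal's code view or through tooling.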

Implement secure endpoints (private and public)

  • Endpoint is an address exposed by a web app to communicate with external entities
  • Service Endpoint
    • Secure and direct access to Azure service/resource over the Azure network
    • Firewall security feature
    • Virtual network rule
    • Traffic from the VNet uses private source IPs, but the service is still reached at its public address
    • Works on Azure SQL, Synapse, and Storage
  • Private link
    • Carries traffic privately so traffic between virtual network and Azure Service travels through the Microsoft Network
    • Uses private address on VNet instead of public address like Service Endpoint

Implement resource tokens in Azure Databricks

  • Token-based authentication uses a personal access token (PAT) to connect via the REST API
  • PAT
    • Can be used instead of passwords
    • Enabled by default
    • Set expiration date or indefinite lifetime
    • Disabled, monitored, and revoked by workspace admins
  • Create the PAT in the Databricks portal, then use it when setting up the linked service in Azure Synapse
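
To show how a PAT is used against the REST API, here is a minimal Python sketch that builds an authenticated request without sending it. The workspace URL and the token value are placeholders, and the helper `build_request` is hypothetical. The token is passed as a bearer token in the `Authorization` header.

```python
import urllib.request


def build_request(workspace_url: str, token: str,
                  path: str = "/api/2.0/clusters/list") -> urllib.request.Request:
    """Build a GET request that authenticates with a personal access token."""
    return urllib.request.Request(
        url=workspace_url.rstrip("/") + path,
        headers={"Authorization": f"Bearer {token}"},
        method="GET",
    )


# Placeholder workspace URL and token, for illustration only.
req = build_request(
    "https://adb-1234567890123456.7.azuredatabricks.net",
    "dapiEXAMPLETOKEN",
)
print(req.full_url)
# urllib.request.urlopen(req) would send the request; it is omitted here
# because it needs a real workspace and a valid token.
```

In practice the token would be read from a secret store such as Key Vault rather than written into code.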

Load a DataFrame with sensitive information

  • Done through encryption using Fernet
  • Fernet

    • Symmetric authenticated cryptography (uses a secret key)
    • from cryptography.fernet import Fernet
    • encryptionKey = Fernet.generate_key()
  • Create master key
  • Create UDFs to encrypt/decrypt
  • Use UDFs to encrypt/decrypt
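
The three steps above can be sketched as follows. This is not the code from the original screenshots: the function names `encrypt_val`, `decrypt_val`, and `with_encrypted_column` are assumptions, and the sample value is made up. The encrypt and decrypt functions are plain Python so they can be tried without a cluster, while the PySpark wiring sits in a separate helper that needs a Spark environment.

```python
from cryptography.fernet import Fernet


def encrypt_val(clear_text, master_key):
    """Encrypt one string value with Fernet; None passes through unchanged."""
    if clear_text is None:
        return None
    return Fernet(master_key).encrypt(clear_text.encode("utf-8")).decode("ascii")


def decrypt_val(cipher_text, master_key):
    """Decrypt a value produced by encrypt_val using the same key."""
    if cipher_text is None:
        return None
    return Fernet(master_key).decrypt(cipher_text.encode("ascii")).decode("utf-8")


def with_encrypted_column(df, column, master_key):
    """Return df with `column` replaced by its encrypted form (requires PySpark)."""
    from pyspark.sql.functions import col, lit, udf
    from pyspark.sql.types import StringType

    encrypt_udf = udf(encrypt_val, StringType())
    # The key is passed as a literal column so the UDF receives it per row.
    return df.withColumn(column, encrypt_udf(col(column), lit(master_key)))


# Step 1: create the key (kept as text so it can be passed to lit()).
encryption_key = Fernet.generate_key().decode("ascii")

# Steps 2 and 3 on a single made-up value, without Spark.
token = encrypt_val("123-45-6789", encryption_key)
restored = decrypt_val(token, encryption_key)
```

Fernet output differs on every call even for the same input, so encrypted columns cannot be compared or joined directly. In a real workspace the key would be stored in a secret scope or Key Vault rather than generated inline.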

Write encrypted data to tables or Parquet files

  • Write encrypted data to a table
    • df.write.format("delta").mode("overwrite").option("overwriteSchema", "true").saveAsTable("Table")
  • Write encrypted data to a parquet file
    • encrypted.write.mode("overwrite").parquet("container_address/file_path")

Manage sensitive information

  • Data discovery and classification - discovering, classifying, labeling, and reporting the sensitive data in your databases
  • Capabilities
    • Discovery and recommendations
    • Labeling - apply sensitive classification labels to columns using metadata attributes
    • Query result-set sensitivity - calculates the sensitivity of a query result in real-time
    • Visibility - view DB classification state in a dashboard
  • Defender for Cloud
    • Cloud-native application protection platform (CNAPP)
    • Set of security measures and practices to protect cloud-based apps
    • Continuous monitoring, alerts, and threat mitigation
    • Separate services for Storage and SQL
  • Defender for SQL
    • Discover and mitigate database vulnerabilities
    • Alerts on anomalous activities
    • Performs vulnerability assessments and Advanced Threat Protection
  • Defender for Storage
    • Detects potential threats to storage accounts
    • Prevents three major impacts
      • Malicious file uploads
      • Sensitive data exfiltration
      • Data corruption
    • Includes
      • Activity monitoring
      • Sensitive data threat detection
      • Malware scanning
