DEV Community

丁久
Posted on • Originally published at dingjiu1989-hue.github.io

Data Classification

This article was originally published on AI Study Room. For the full version with working code examples and related articles, visit the original post.

Why Classify Data?

Data classification ensures sensitive information receives appropriate protection. Without classification, you either over-protect everything (wasting resources) or under-protect critical data (inviting breaches).

Classification Levels

Define clear tiers:

| Level | Label | Examples | Controls |
|-------|-------|----------|----------|
| 4 | Restricted | PII, trade secrets | Encryption, MFA, DLP |
| 3 | Confidential | Financial reports | Encryption at rest |
| 2 | Internal | HR policies | Access control |
| 1 | Public | Marketing materials | No restrictions |
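The tiers can also be represented in code so that levels are comparable. The following is a minimal sketch, not part of the original article: the names `Level`, `CONTROLS`, and `effective_level` are hypothetical, and the rule that combined data inherits the strictest tier is an assumption rather than something the table states.

```python
from enum import IntEnum


class Level(IntEnum):
    """Classification tiers from the table; a higher value is more sensitive."""
    PUBLIC = 1
    INTERNAL = 2
    CONFIDENTIAL = 3
    RESTRICTED = 4


# Controls per tier, copied from the table.
CONTROLS = {
    Level.RESTRICTED: ["Encryption", "MFA", "DLP"],
    Level.CONFIDENTIAL: ["Encryption at rest"],
    Level.INTERNAL: ["Access control"],
    Level.PUBLIC: [],
}


def effective_level(levels):
    """Return the strictest tier among the given ones (hypothetical helper).

    Assumption: data combined from several sources inherits the highest tier.
    An empty input falls back to PUBLIC.
    """
    return max(levels, default=Level.PUBLIC)


if __name__ == "__main__":
    combined = effective_level([Level.INTERNAL, Level.RESTRICTED])
    print(combined.name, CONTROLS[combined])
```

Using `IntEnum` keeps the numeric ordering from the table, so checks such as `level >= Level.CONFIDENTIAL` read naturally.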

Automated Classification

Use content inspection to classify data automatically:

    import re


    class DataClassifier:
        def __init__(self):
            # Regexes for common sensitive-data formats.
            self.patterns = {
                "ssn": r"\d{3}-\d{2}-\d{4}",
                "email": r"[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}",
                "credit_card": r"\b(?:\d[ -]*?){13,16}\b",
            }

        def classify_document(self, content, metadata):
            """Return (level, findings) for a document's text and metadata."""
            score = 0
            findings = []
            for label, pattern in self.patterns.items():
                matches = re.findall(pattern, content)
                if matches:
                    # Each match adds 10 points to the sensitivity score.
                    score += len(matches) * 10
                    findings.append({"type": label, "count": len(matches)})

            # More than 5 matches is restricted; 2 to 5 is confidential.
            # A single match scores 10 and does not exceed the confidential
            # cutoff, so tune these thresholds to your own risk tolerance.
            if score > 50:
                return "restricted", findings
            elif score > 10:
                return "confidential", findings
            elif metadata.get("internal"):
                return "internal", findings
            return "public", findings
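The `credit_card` pattern matches any run of 13 to 16 digits, so order numbers and similar identifiers can inflate the score. One common way to cut false positives is a Luhn checksum on each candidate. The sketch below is an illustration under that assumption, not part of the original classifier; `luhn_valid` and `find_card_numbers` are hypothetical helper names.

```python
import re

# Same credit-card pattern as in the classifier.
CARD_PATTERN = re.compile(r"\b(?:\d[ -]*?){13,16}\b")


def luhn_valid(candidate):
    """Return True if the digits in `candidate` pass the Luhn checksum."""
    digits = [int(ch) for ch in candidate if ch.isdigit()]
    if not 13 <= len(digits) <= 16:
        return False
    total = 0
    # Walk right to left, doubling every second digit.
    for i, d in enumerate(reversed(digits)):
        if i % 2 == 1:
            d *= 2
            if d > 9:
                d -= 9  # Same as summing the two digits of the product.
        total += d
    return total % 10 == 0


def find_card_numbers(content):
    """Return regex matches that also pass the Luhn check."""
    return [m for m in CARD_PATTERN.findall(content) if luhn_valid(m)]


if __name__ == "__main__":
    text = "pay 4111 1111 1111 1111 ref 1234 5678 9012 3456"
    print(find_card_numbers(text))
```

A classifier could count only the results of `find_card_numbers` instead of the raw regex matches. A passing checksum still does not prove a string is a real card number; it only filters out most random digit runs.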

Handling Procedures

Define procedures for each classification level:

    # handling-policies.yaml
    restricted:
      storage: encrypted_bucket_kms
      transmission: require_tls_1.3
      retention: 7_years
      destruction: shred_and_degauss
      sharing: require_nda_and_approval
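Once a document has a label, code can look up its handling policy. The following is a minimal sketch under stated assumptions: the policy is held in a plain dict that mirrors the `restricted` entry above (no YAML parser is used), and `policy_for` and `retention_years` are hypothetical helpers. Policies for the other levels are not shown in this excerpt, so they are left undefined here rather than guessed.

```python
# Mirrors the `restricted` entry from handling-policies.yaml.
# Other levels are omitted because the excerpt does not define them.
HANDLING_POLICIES = {
    "restricted": {
        "storage": "encrypted_bucket_kms",
        "transmission": "require_tls_1.3",
        "retention": "7_years",
        "destruction": "shred_and_degauss",
        "sharing": "require_nda_and_approval",
    },
}


def policy_for(level):
    """Return the handling policy for a classification label.

    Raises KeyError instead of falling back to a weaker default, so a level
    without a defined policy is never handled silently.
    """
    try:
        return HANDLING_POLICIES[level]
    except KeyError:
        raise KeyError(f"no handling policy defined for level {level!r}") from None


def retention_years(level):
    """Parse a retention value such as '7_years' into an integer."""
    value = policy_for(level)["retention"]
    number, unit = value.split("_", 1)
    if unit != "years":
        raise ValueError(f"unsupported retention unit: {unit!r}")
    return int(number)


if __name__ == "__main__":
    print(policy_for("restricted")["transmission"])
    print(retention_years("restricted"))
```

Failing loudly on an unknown level is a deliberate choice in this sketch: defaulting to the least restrictive policy would under-protect exactly the data that classification is meant to catch.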

