Ntombizakhona Mabaso

for AWS Community Builders

Posted on Jan 25

Explain Methods To Secure AI Systems

#aws #ai #aipractitioner #cloud

🤖 Exam Guide: AI Practitioner
Domain 5: Security, Compliance, and Governance for AI Solutions
📘Task Statement 5.1

🎯 Objectives

Domain 5 focuses on protecting data, models, and AI-powered applications in the real world. You’re expected to understand the main AWS security building blocks (IAM, encryption, PrivateLink, Macie, shared responsibility), plus AI-specific risks like prompt injection and data leakage, and how documentation (lineage, catalogs, model cards) supports governance.

1) AWS Services And Features Used To Secure AI Systems

1.1 IAM Roles, Policies, and Permissions

Control who/what can access AI services, data, and model endpoints.
Best Practice: least privilege aka grant only what’s needed and use roles instead of long-lived credentials.

1.2 Encryption

Protect sensitive data:
at rest: stored in S3, databases, logs
in transit: moving between services/users
Know the concept and why it matters for privacy and compliance.

1.3 Amazon Macie

Helps discover and protect sensitive data (PII) stored in Amazon S3.
Amazon Macie is Useful for reducing accidental exposure of regulated data in training sets, logs, or retrieval corpora.

1.4 AWS PrivateLink

Provides private connectivity to AWS services without traversing the public internet.
AWS PrivateLink is useful when AI workloads must remain within private networks for security/compliance reasons.

1.5 AWS Shared Responsibility Model

AWS secures the underlying cloud infrastructure and customers secure what they build/configure such as data, IAM, apps, network settings.
Know what is AWS’s responsibility vs the customer’s responsibility in AI solutions.

2) Source Citation And Documenting Data Origins

Governance Foundations

2.1 Source Citation

Attaching references to where an answer came from especially in RAG systems.
Benefits: improves trust, auditability, and helps users verify correctness.

2.2 Documenting Data Origins

1 Data lineage: tracking where data came from and how it changed over time.
2 Data cataloging: organizing datasets with metadata such as owner, sensitivity, schema, allowed use.
3 SageMaker Model Cards: Document model purpose, training/eval context, limitations, and considerations—helpful for governance and audits.

3) Best Practices For Secure Data Engineering

3.1 Assess Data Quality

Prevent “garbage in, garbage out” and reduce risk of training on corrupted or biased data.
Assessing Data Quality includes checking label quality, duplicates, missing values, outliers, and integrity.

3.2 Privacy-Enhancing Technologies (PETs)

Techniques that reduce privacy risk while enabling analytics/ML.
Examples: anonymization/pseudonymization, tokenization, differential privacy.

3.3 Data Access Control

Limit who can read/write sensitive datasets, features, prompts, and logs.
Use IAM + resource policies + least privilege and segment access by environment (dev/test/prod, etc).

3.4 Data Integrity

Ensure data is not altered maliciously or accidentally.
Data Integrity includes versioning, checksums/hashes, controlled pipelines, audit logs.

4) Security And Privacy Considerations For AI Systems

4.1 Application Security

Secure the app layer: authentication, authorization, input validation, secure APIs, rate limiting.

4.2 Threat Detection

Detect abnormal access patterns, data exfiltration attempts, suspicious usage spikes, or policy violations.

4.3 Vulnerability Management

Patch dependencies, scan containers, and manage CVEs in runtime environments and libraries.

4.4 Infrastructure Protection

Network segmentation, private connectivity where needed, least privilege IAM, secure endpoints.

4.5 Prompt Injection

Malicious input tries to override system instructions or exfiltrate data which is especially risky with RAG.
Mitigations:
1 input filtering,
2 strict tool permissions,
3 grounding rules,
4 guardrails,
5 isolating sensitive context,
6 and not trusting retrieved/user text as “instructions.”

4.6 Encryption At Rest And In Transit

Protects data used for:
1 training/fine-tuning datasets
2 embedding/vector stores
3 prompts and conversation history
4 logs and monitoring outputs

💡 Quick Questions

1 What AWS feature is used to enforce least-privilege access to AI resources?
2 What does Amazon Macie help you detect in S3?
3 Why is source citation valuable in a RAG-based assistant?
4 Name two secure data engineering practices that reduce privacy or integrity risk.
5 What is prompt injection, and why is it a security concern for GenAI apps?

Additional Resources

✅ Answers to Quick Questions

1 IAM roles and policies or permissions.

2 Sensitive data such as PII and other regulated/sensitive content in S3.

3 It improves trust and auditability by showing where answers came from so users can verify them and you can trace issues.

4 Data access control through least privilege and privacy-enhancing techniques such anonymization or tokenization.
also valid: data quality checks, integrity controls like versioning/audit logs.

5 Malicious prompts that try to override instructions or extract secrets, it’s dangerous because untrusted user/retrieved text can cause unsafe actions, data leakage, or policy violations.

DEV Community