By Tarun Chauhan (Senior Software Engineer at AWS)
Security plays a critical role in the adoption of AI data analytics platforms by enterprises. In this article, we will discuss the unique security challenges faced by data analytics platforms and the design principles to keep in mind while building an AI data analytics platform that enterprises can trust. As a Senior Software Engineer at AWS, I have built multiple critical data security services for data analytics products. I relied on these tenets as guiding principles while designing those services for the AWS OpenSearch and Amazon FinSpace teams, so they are battle-tested and proven to work at the massive scale big enterprises require.
Why Trust Is the Bottleneck for AI Data Analytics –
Models are becoming more powerful every day and data is everywhere, yet enterprise adoption of AI products has been slow due to a lack of trust among enterprise customers.
For enterprises, proprietary data is their most valuable asset, so protecting it is a top priority when integrating any AI data analytics system.
Trust is earned through robust and fail-safe security architectures.
Platforms that fail to treat security as a first-class design concern do not survive the serious scrutiny that enterprise customers apply.
I have seen this first-hand with AWS Bedrock, where a customer’s number one concern when onboarding to the platform is the guardrails and security measures surrounding their data.
Why “Security as a Feature” Fails in AI Data Analytics –
Analytics platforms built with the intention of adding security as a feature later often fail at scale for enterprise use cases. It is therefore important to make security a key tenet of the platform's architecture from the start.
Systems that are poorly designed from a security standpoint often result in data leaks and compliance issues, with potentially severe consequences for the enterprise customer. If the wrong users can access data, or if permissions are applied inconsistently across pipelines, the analytics output itself becomes untrustworthy.
At AWS, before the first line of code is even written, architectural designs are reviewed for security vulnerabilities. This helps us identify potential issues early on. This level of early review has helped AWS gain industry leadership in security and is a practice that should be followed when building any new analytics product.
Unique security challenges faced by AI analytics platforms –
AI data analytics platforms face unique security challenges compared to CRUD (Create, Read, Update, Delete) applications.
Some of these challenges are –
They aggregate data from a variety of sources: internal systems, third-party APIs, user-generated data, and derived datasets. Each source may have different access constraints and schemas. At AWS, this often involved managing data received from various services like Amazon DynamoDB, Amazon Kinesis Streams, and external vendors.
Analytics systems generate derived insights from raw data. Even if the raw data is protected, model outputs can expose sensitive information through inference. During the development and testing of the AWS Bedrock platform, I observed that without proper guardrails and security measures, models could sometimes leak sensitive data this way.
AI pipelines are long-lived. Data persists, changes, and gets reinterpreted over time. A permission mistake early in the pipeline can propagate silently across the system and cause issues years later. At AWS, we have pipelines that are several years old whose original engineers have left, so it is often hard to regain context and fix underlying issues. One can imagine how similar gaps can wreak havoc on permission-sensitive data pipelines.
Analytics platforms have to serve many roles simultaneously: analysts, executives, automated systems, and external customer integrations. Static role-based access models cannot handle such complex access requirements. Even at AWS, while the AWS IAM service provided robust static role permissioning, we still had to build specialized security services for granular access within the OpenSearch data analytics product.
Secure-by-Design Principles for AI Analytics Platforms –
The following principles should serve as guidelines for building a secure data analytics platform –
1. Data-Aware Access Control –
Traditional role-based access control works for applications with simple data boundaries, but analytics platforms need control at the level of the data itself, such as –
Which rows of data a user is allowed to see
Which attributes are sensitive
The context in which insights are generated
Hence, securing a data analytics system requires data-aware access control in addition to user-aware access control. Without these controls, systems either overexpose data or restrict access so aggressively that the analytics lose value. At AWS, we had to build a data access security service for AWS OpenSearch with granularity down to individual Amazon DynamoDB row items, which shows the level of precision modern data analytics products require.
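To make the row- and attribute-level idea above concrete, here is a minimal sketch of data-aware filtering. The policy model (`AccessPolicy`, `filter_records`, the `region` and `ssn` fields) is entirely hypothetical and does not mirror any AWS API; it only illustrates applying row-level visibility first and then masking sensitive attributes.

```python
from dataclasses import dataclass, field

# Hypothetical policy model for illustration only.
@dataclass
class AccessPolicy:
    allowed_regions: set                                  # row-level: which rows the user may see
    masked_attributes: set = field(default_factory=set)   # attribute-level: which fields are sensitive

def filter_records(records, policy):
    """Apply row-level filtering, then mask sensitive attributes."""
    visible = [r for r in records if r["region"] in policy.allowed_regions]
    return [
        {k: ("***" if k in policy.masked_attributes else v) for k, v in r.items()}
        for r in visible
    ]

records = [
    {"region": "us-east-1", "revenue": 120, "ssn": "123-45-6789"},
    {"region": "eu-west-1", "revenue": 80,  "ssn": "987-65-4321"},
]
analyst = AccessPolicy(allowed_regions={"us-east-1"}, masked_attributes={"ssn"})
print(filter_records(records, analyst))
# Only the us-east-1 row survives, with 'ssn' masked
```

The key design point is the ordering: rows the user may not see are dropped before any masking, so a sensitive attribute never reaches a masking bug for a row the user should not know exists.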
2. Ease of Data Audit –
In AI analytics, transparency is part of security. Ease of audit, i.e. knowing where data came from, how it was transformed, and which models touched it, is not just an observability concern; it is a security requirement. At AWS, we often have to perform data audits during major outages and operational reviews, so making that process easy is usually a primary concern during initial design reviews for data analytics services.
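A lineage record capturing those three questions (origin, transformation, model) can be sketched as below. The record schema and the `audit_record` helper are assumptions for illustration, not any specific AWS service's format; the checksum simply makes tampering with past entries detectable.

```python
import hashlib
import json
import time

# Illustrative audit/lineage record; field names are assumptions.
def audit_record(dataset_id, source, transformation, model=None):
    record = {
        "dataset_id": dataset_id,
        "source": source,                  # where the data came from
        "transformation": transformation,  # how it was transformed
        "model": model,                    # which model touched it, if any
        "timestamp": time.time(),
    }
    # Hash over a canonical serialization so any later edit is detectable
    record["checksum"] = hashlib.sha256(
        json.dumps(record, sort_keys=True).encode()
    ).hexdigest()
    return record

entry = audit_record("sales_q1", "Amazon DynamoDB", "daily aggregate")
print(entry["checksum"][:16], "...")
```

Appending such records to an immutable store at every pipeline stage is what turns "where did this number come from?" from a forensic exercise into a simple lookup.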
3. Model Access Is Not the Same as Data Access –
A common mistake many platforms make is equating model access with data access.
Allowing a user or system to query a model does not mean it should have visibility into the underlying data. Without clear separation, model interfaces can become unintended backdoors for data leaks.
Secure analytics platforms should treat model invocation, training, and inspection as distinct permission domains. At AWS Bedrock, we developed special guardrail services to prevent unauthorized data access while still allowing model access, and a similar design can be followed here as well.
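Treating invocation, training, and inspection as separate permission domains can be sketched as a simple action namespace. The `PERMISSIONS` table, role names, and `check` helper are hypothetical; the point is only that `model:invoke` does not imply `data:read`.

```python
# Hypothetical role-to-action table; not a real IAM policy.
PERMISSIONS = {
    "analyst": {"model:invoke"},
    "ml_eng":  {"model:invoke", "model:train"},
    "auditor": {"model:inspect", "data:read"},
}

def check(role, action):
    """Raise unless the role is explicitly granted the action."""
    if action not in PERMISSIONS.get(role, set()):
        raise PermissionError(f"{role} may not perform {action}")

check("analyst", "model:invoke")   # allowed: querying the model
try:
    check("analyst", "data:read")  # model access does not grant data access
except PermissionError as e:
    print(e)
```

Because the default is deny, adding a new action (say, `model:export`) grants nothing until a role is explicitly given it, which is exactly the property that keeps model interfaces from becoming backdoors.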
4. Isolated Execution Is a Security Boundary –
Containerized execution can provide an additional layer of security for analytics applications by enforcing strong isolation boundaries.
In public cloud–based applications and services, it becomes essential to ensure that customer data is processed only within the containerized execution environment and does not escape those boundaries.
This approach provides stronger assurances to customers that their data remains confined within the defined security isolation and is protected throughout the analytics workflow.
At AWS FinSpace (a financial analytics product) and Bedrock, this container-based approach was frequently used for isolated execution, providing an extra layer of security for highly confidential data such as financial records and other proprietary company data.
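The boundary idea can be illustrated at a much smaller scale than containers: run an untrusted analytics step in a child process with a scrubbed environment so host credentials never leak in. This is only a sketch of the principle; real deployments would rely on containers (namespaces, cgroups, seccomp profiles) rather than a bare subprocess.

```python
import subprocess
import sys

# Minimal isolation sketch: the child gets an empty environment, so
# secrets like AWS credential variables from the host are not inherited.
result = subprocess.run(
    [sys.executable, "-c", "import os; print(sorted(os.environ))"],
    env={},                 # nothing from the parent environment crosses over
    capture_output=True,
    text=True,
    timeout=10,
)
print(result.stdout.strip())  # the child sees (almost) no variables
```

The same principle, "data and credentials enter the execution environment only by explicit decision," is what containerized isolation enforces at production scale.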
5. Network Boundaries Encode Trust Assumptions –
In enterprise analytics systems, network architecture is a core part of the security design.
Virtual private networks and isolated network segments are critical to analytics system architecture as they help define clear trust boundaries.
Analytics pipelines that span data ingestion, transformation, model execution, and consumption layers need to respect these boundaries explicitly.
When data is allowed to move freely across network domains without well-defined controls, it becomes much harder to audit access rules later.
Treating network boundaries as a first-class security control gives enterprises a clearer picture of data exposure, compliance scope, and how failures are contained.
At AWS, Amazon VPC is among the most widely used services, and no secure design is complete without it.
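The idea of encoding trust boundaries explicitly can be sketched as an allow-list of segment-to-segment flows that every pipeline is validated against. The segment names and `ALLOWED_FLOWS` table below are toy assumptions, not a real network configuration.

```python
# Hypothetical trust-boundary graph: only these hops are approved.
ALLOWED_FLOWS = {
    ("ingestion", "transformation"),
    ("transformation", "model_execution"),
    ("model_execution", "consumption"),
}

def validate_pipeline(path):
    """Reject any pipeline whose hops cross an unapproved boundary."""
    for src, dst in zip(path, path[1:]):
        if (src, dst) not in ALLOWED_FLOWS:
            return False, f"blocked: {src} -> {dst}"
    return True, "ok"

print(validate_pipeline(["ingestion", "transformation", "model_execution"]))
print(validate_pipeline(["ingestion", "consumption"]))  # tries to skip controls
```

Checking flows against an explicit graph like this, whether in code review or in infrastructure-as-code linting, is one practical way to keep network trust assumptions auditable instead of implicit.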
My Lessons from Operating at Scale at AWS –
Systems running at scale often expose security issues only later. Trust boundaries that appear clear early on eventually break down. Defaults that initially feel safe turn into liabilities when handling millions of requests. Shared infrastructure introduces ambiguity that makes it increasingly difficult to maintain clear security boundaries, especially under operational stress.
I have seen this first-hand through multiple outages and COEs (Corrections of Errors) caused by bad configurations, improper classification of services on shared EC2 instances, throttling misconfigurations that caused excessive throttling, and similar issues.
At scale, security failures aren't always loud or obvious. They are usually quiet, slow-moving problems that aren't noticeable until the damage is already done. A truly secure-by-design system doesn't just work in a perfect world. It assumes that configurations will drift, credentials will leak, and parts of the system will fail. The goal isn't just to prevent these things on paper; it is to limit the blast radius so the damage is contained when the inevitable happens. At AWS, multiple outages and COEs have embedded this reality in our design philosophy, and our early design reviews now specifically incorporate these lessons to prevent future failures.
The Hidden Risk of Shared Analytics Infrastructure –
Many analytics platforms rely on shared clusters and execution environments to optimize for cost. While efficient, this approach weakens security guarantees. When multiple datasets, teams, and models share execution contexts, isolation becomes theoretical rather than enforced in actual production environments. Over time, it becomes unclear which workloads can observe which data, and under what conditions.
Production-ready analytics platforms enforce isolation at the execution and network layers, even when it is operationally expensive. I have seen multiple outages and COEs at AWS caused by multiple services running on the same EC2 instance in a bid to reduce operational cost. Ultimately, those services had to be separated because of the operational and security challenges that followed.
Why Startups Underestimate Enterprise Security Requirements –
Startups are under pressure to deliver products and features quickly. Security features are often deferred with the assumption that they can be addressed once traction is achieved. In analytics platforms, however, this assumption is very risky.
Enterprises judge analytics solutions not only on the quality of their insights but also on their security liabilities. Platforms that cannot clearly demonstrate access restrictions, easy auditing, and governance often fail enterprises' first security reviews. Security shortcuts taken early often become architectural constraints that are expensive, and sometimes impossible, to undo.
I have seen these challenges first-hand with AWS FinSpace, which built financial analytics products for large financial institutions, and with how difficult it is to pass their rigorous security checks before a product is even considered for adoption.
Conclusion: Trust Is the Real Competitive Advantage in AI Analytics –
The future of AI analytics won't be won by model complexity alone. The platforms that succeed will be the ones that enterprises actually trust with their most sensitive data. This requires a system where security is a foundational requirement, not something bolted on at the end. In this industry, trust isn't a marketing slogan; it is the direct result of how the architecture is built.
About the Author
Tarun Chauhan is a Senior Software Engineer at AWS (Amazon) with 11 years of experience designing and building end-to-end, large-scale distributed systems using cloud (AWS), Android/iOS, and backend technologies. He has designed and built critical data security and data infrastructure services for AWS OpenSearch, AWS FinSpace, and AWS Bedrock.
This post was originally published on https://thedatascientist.com/