We live in a time when artificial intelligence has moved from being an emerging technology to a critical infrastructure. Algorithms decide what information we see, what transactions are suspicious, what diagnoses are likely, and increasingly, what decisions are optimal for organizations and governments. Companies like OpenAI, Google DeepMind, and Anthropic have accelerated the development of advanced models, making artificial intelligence a central element of the digital economy. In this context, security can no longer be an afterthought or a reactive mechanism. It must be an integral part of the system architecture from the moment of its conception.
Over the past two decades, digital transformation has been accelerated by cloud, mobility, and big data. In recent years, however, artificial intelligence (AI) has become the main driver of innovation.
But this revolution comes at a cost: the attack surface is growing exponentially.
Security by Design is not just about installing additional controls or enforcing stricter access policies. It means deeply integrating security principles into the technological DNA of a system. In the AI era, this approach becomes vital, because the complexity of models, their dependence on massive data, and the often opaque nature of algorithms create an environment in which vulnerabilities can be subtle but devastating.
The concept of Security by Design (SbD) involves integrating security from the design phase of a system, not as a later stage or a “patch” after vulnerabilities appear. In the AI era, this approach becomes critical because:
- AI models are dependent on massive data.
- Algorithms can be manipulated.
- Automated decisions can have a major impact on users.
- AI systems can be exploited in new ways (prompt injection, model inversion, data poisoning).
Security by Design in the AI era means responsible design, technical resilience, and solid governance.
From traditional security to algorithmic security
The concept of Security by Design is not new. It has been promoted since the 1990s in the field of application security and IT infrastructure. It initially appeared in the field of software security and IT infrastructures, being supported and formalized by organizations such as the National Institute of Standards and Technology (NIST) that developed frameworks such as the NIST Cybersecurity Framework, which promotes the integration of security into all stages of the life cycle. The idea was simple: it is more efficient and safer to prevent vulnerabilities by design than to correct them after the system has been compromised.
Classic principles include:
- Principle of least privilege
- Defense in depth
- Fail secure
- Zero Trust
In traditional systems, security focused on perimeter protection, authentication, encryption and vulnerability management. In the case of artificial intelligence, however, the object of protection is no longer only the infrastructure, but also the model itself. The algorithm becomes a critical asset. Training data becomes an attack surface. Automated decisions can have major legal and ethical implications.
Thus, security must be extended beyond servers and networks, to the algorithmic and epistemic level of the system, respectively, in the AI context, traditional principles must be extended to cover:
- Training data security
- Model integrity
- Robustness against adversarial attacks
- Explicability and auditability
Vulnerabilities specific to the AI era
Data poisoning
An AI system can be compromised not only through unauthorized access, but also through manipulation of the data on which it relies. Data poisoning attacks, in which malicious data is introduced into the training set, can subtly alter the behavior of the model. The result is not an obvious error, but a strategic degradation of performance or an intentional deviation of decisions.
Data poisoning is a type of attack in which the adversary compromises the integrity of an artificial intelligence model by deliberately introducing malicious data into the training set. Unlike attacks that target the system after implementation, data poisoning acts “at the source”, affecting the learning process of the model and influencing its long-term behavior. The injected data can be designed either to degrade the overall performance (unavailability or decreased accuracy) or to create a “backdoor” so that the model reacts erroneously only in the presence of a specific pattern or trigger. For example, a spam detection system could be trained with manipulated messages so that certain malicious expressions are later considered legitimate. The vulnerability is amplified in scenarios where the data comes from open, collaborative or automatically collected sources, without rigorous validation. From a Security by Design perspective, preventing data poisoning involves strict verification of the provenance of the data, auditing of datasets, anomaly detection mechanisms and controlled separation of data streams used for training.
Adversarial Attacks
Adversarial attacks are another sophisticated category of threats in which an adversary subtly manipulates the input data of an artificial intelligence model to cause systematic classification or decision errors, without the changes being obvious to human users. In the case of computer vision models, for example, adding minimal perturbations - invisible to the naked eye - to an image can cause the system to misidentify an object (a traffic sign can be misclassified, with serious implications for autonomous vehicles). Similarly, in natural language processing, the insertion of specially constructed linguistic structures or tokens can skew the interpretation of the model. These attacks exploit the mathematical sensitivity of neural networks to small variations in the multidimensional data space, demonstrating that high performance on standard data does not guarantee robustness under adversarial conditions. From a Security by Design perspective, countermeasures include adversarial training, robust input validation, and monitoring for abnormal model behavior in production.
Model Inversion & Data Extraction
Model Inversion is an attack technique in which an adversary attempts to reconstruct sensitive information about the data used to train a model, using only access to the final (black-box or sometimes white-box) model. The central idea is that machine learning models “retain” to some extent statistical characteristics of the training data. If the model is strategically interrogated, the attacker can approximate individual data or sensitive features associated with a particular user. For example, in a facial recognition system or a medical model trained on clinical data, an attacker could reconstruct facial features or information about a specific patient. The vulnerability arises especially when the model is overfitted or when its responses provide detailed probability scores, which can be exploited for reverse inference.
Data Extraction (or model extraction / training data extraction) goes even further, aiming at directly extracting fragments of data stored by the model during training. In the case of large language models, this can mean regenerating portions of sensitive texts accidentally included in datasets (personal data, API keys, confidential information). Attackers use iterative queries, strategic formulations or optimization techniques to “squeeze” the model of stored information. The risk is amplified when the models are integrated into public applications and provide very detailed answers. From a Security by Design perspective, protection against these attacks involves limiting the granularity of the outputs, using regularization and differential privacy techniques in the training phase, as well as implementing robust mechanisms for monitoring and filtering the generated answers.
Prompt Injection (in LLMs)
Unlike traditional attacks on web applications or infrastructure, prompt injection does not exploit a classic programming bug, but the very probabilistic and contextual nature of the model.
Prompt Injection is a technique by which a user inserts malicious instructions into an apparently legitimate input, with the aim of modifying the behavior of the model and causing it to ignore the rules or restrictions established by the system.
Typically, a system based on LLM works as follows:
- There is a system prompt (internal instructions, invisible to the user).
- There is a user prompt (user input).
- The model generates a response based on the entire context.
The vulnerability arises because the model treats all text as a sequence of tokens, without making a rigid structural distinction between system and user instructions. Thus, a user can try to "overwrite" the initial rules with a specially constructed input.
Security by Design in the AI era involves anticipating these scenarios and designing the system so that it is robust, resilient and able to detect behavioral deviations.
Integrating Security into the AI Lifecycle
An AI system goes through several stages: data collection, model training, validation, deployment, and ongoing operation. Each of these phases should be treated as a critical control point.
In the data collection phase, security means verifying sources, ensuring the integrity of datasets, and protecting sensitive data through anonymization or pseudonymization. Data is the foundation of the model; if this foundation is compromised, the entire system becomes fragile.
In the training phase, the infrastructure must be isolated and monitored. Models must be versioned, and each experiment must be documented to allow for later auditing. The integrity of generated artifacts must be verified through cryptographic mechanisms, and access to computing resources must be limited according to the principle of least privilege.
At the time of deployment, API security becomes essential. Rate limiting, multifactor authentication, and monitoring for abnormal behavior are measures that reduce the risk of system exploitation. In the operational phase, continuous monitoring of model performance and detection of model drift are essential to prevent degradation or manipulation of its behavior.
Security by Design therefore means a continuity of control, not a single event.
Zero Trust and the new paradigm of trust
In the AI era, the concept of trust must be rethought. The Zero Trust model, which assumes that no user or system is implicitly trusted, becomes extremely relevant. Access to AI models and associated data should only be granted based on clear and verifiable policies.
This approach is all the more important as AI systems are integrated into complex ecosystems, distributed in the cloud, and connected to multiple data sources. The lack of segmentation or granular access controls can turn a minor incident into a major breach.
Zero Trust applied to AI is not limited to authentication; it also involves continuous validation of system behavior, verification of model integrity, and permanent analysis of user interactions.
Regulation, governance and accountability
As AI becomes critical infrastructure, regulation becomes inevitable. The European Union has introduced the AI Act, which classifies AI systems according to risk and imposes strict requirements for those considered high-risk. In parallel, the GDPR establishes clear obligations on data protection and the right to explanation.
These legislative frameworks are not obstacles to innovation, but catalysts for the adoption of Security by Design. They force organizations to document processes, implement audit mechanisms and ensure transparency.
AI governance thus becomes a central element of security. It is not enough for a system to perform; it must be accountable, explainable and compliant with legal norms.
The ethical dimension of security
Security in the age of AI is not just a technical issue. It is also an ethical one. A model that discriminates or produces biased results can generate damage as serious as a data breach.
Companies like Microsoft and IBM have developed Responsible AI frameworks that include principles of fairness, transparency, and accountability. These initiatives show that Security by Design must also include protection against social and moral risks.
Ultimately, security is not just about protecting the system, but also protecting the people affected by its decisions.
Red Teaming and Operational Resilience
A secure system is not one that has not been attacked, but one that has been rigorously tested and proven resilient.
Red Teaming in the context of AI systems is a structured process through which specialized teams simulate real-world attacks to identify vulnerabilities before they are exploited in the operational environment, directly contributing to increasing the resilience of the system. Unlike traditional testing, red teaming involves a creative adversarial approach, in which experts try to circumvent model restrictions through prompt injection, adversarial attacks, data exfiltration attempts, or manipulation of algorithmic behavior. The goal is not just to discover specific technical errors, but to assess the ability of the entire architecture - model, infrastructure, access controls, and organizational processes - to withstand real-world pressures. By integrating red teaming into the continuous development and operations (MLOps) cycle, organizations can transform security from a reaction to incidents into a proactive mechanism for strengthening operational resilience, ensuring the safe and stable operation of AI systems in dynamic and potentially hostile conditions.
This practice transforms security from a defensive to a proactive process. Instead of reacting to incidents, organizations anticipate and model risk scenarios.
Future Challenges
As AI evolves towards autonomous systems and agents capable of making complex decisions independently, the attack surface will continue to grow. The integration of AI into critical infrastructure, financial systems, or healthcare systems will amplify the potential impact of vulnerabilities.
Security by Design must evolve with technology. Collaboration between engineers, security experts, lawyers, ethicists, and policymakers will be required. Without this interdisciplinary approach, the complexity of AI systems may outstrip our ability to control them.
Conclusion: Security as the Foundation of Trust
In the age of artificial intelligence, security is no longer a technical detail, but the foundation of digital trust. Without Security by Design, AI systems can become vulnerable, manipulable, and potentially dangerous tools. With Security by Design, they can become catalysts for progress, supporting innovation in a responsible and sustainable way.
Building secure AI is not about slowing down development, but making it sustainable. In a world where algorithms increasingly influence reality, security becomes the invisible architecture that supports the future.
Bibliography
-
Pearlson, Keri & Novaes Neto, Nelson. „What is Secure-by-Design AI?”
-
ETSI EN 304 223 - Securing Artificial Intelligence (SAI)
-
UK Government - Code of Practice for the Cyber Security of AI
-
Prasad, Anand. „A Policy Roadmap for Secure by Design AI: Building Trust Through Security-First Development”
-
Abbas, Rianat et al. „Secure by design - enhancing software products with AI-Driven security measures”
Top comments (0)