Keeping AI-Generated Code Secure: Balancing Risk and Efficiency

Artificial intelligence has made truly impressive strides in accelerating software development recently. I frequently use AI-assisted code generation for the backend of my side products or when developing new modules in a production ERP system. However, this speed comes at a cost: security. How secure automatically generated code really is has always been an open question for someone like me, who has operated systems for many years and has personally lived through operational issues.

In this post, I will explain how I keep AI-generated code secure, the risks I've encountered, and how I balance efficiency and security. Finding a middle ground between the strict security policies I observed in an internal banking platform I set up a few years ago and the flexibility in my own small projects has always required experience.

The Two Faces of AI-Assisted Code Generation: Speed and Hidden Dangers

The speed AI offers in code generation is truly incredible. I can get a working draft in seconds, especially for boilerplate code, data model definitions, or simple CRUD operations. In a production ERP system, creating basic API endpoints or a database schema for a new module could take days, but with AI, this time can be reduced to hours. This is an invaluable advantage for rapid prototyping or projects with tight deadlines. Last month, on a data integration project, I saved about 2 days by having AI generate the models and basic processing logic for data from 15 different sources.

However, behind this speed lie serious security risks. AI models can carry vulnerabilities, bad practices, or misconfigurations from their training datasets into the code. Sometimes the generated code completely bypasses the authorization mechanism for an endpoint, validates tokens incorrectly, or suggests a structure that logs sensitive data. Once, in a code snippet I received from AI for a backend service of my mobile application, I found a simple SQL injection vulnerability: user input was inserted directly into a SQL query. This was most likely a reflection of bad examples in the data the model was trained on. Therefore, every line of code from AI needs to be reviewed as if it came from a junior developer, or even more carefully. I can't afford to dismiss this, because every security vulnerability that explodes in production means hours of debugging and fixing.
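
To make the SQL injection case concrete, here is a minimal sketch using Python's built-in sqlite3 module. It is not the snippet I received; the table, the column names, and all three function names are hypothetical and exist only to contrast string concatenation with a parameterized query.

```python
import sqlite3


def find_user_unsafe(conn: sqlite3.Connection, username: str) -> list:
    # VULNERABLE: user input is concatenated straight into the SQL text.
    query = f"SELECT id, username FROM users WHERE username = '{username}'"
    return conn.execute(query).fetchall()


def find_user_safe(conn: sqlite3.Connection, username: str) -> list:
    # Parameterized query: the driver treats the value as data, never as SQL.
    query = "SELECT id, username FROM users WHERE username = ?"
    return conn.execute(query, (username,)).fetchall()


def make_demo_db() -> sqlite3.Connection:
    # Hypothetical in-memory schema used only for this demonstration.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, username TEXT)")
    conn.executemany(
        "INSERT INTO users (username) VALUES (?)", [("alice",), ("bob",)]
    )
    return conn
```

With an input such as `x' OR '1'='1`, the unsafe version returns every row in the table, while the parameterized version looks for a user literally named that way and finds none.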

⚠️ Unexpected Risks in AI Code

The content of AI-generated code depends on the quality and security of the model's training data. Therefore, it is possible for AI-provided code to contain hidden vulnerabilities, outdated dependencies, or bad practices. Considering these risks during the development process and implementing comprehensive security checks is critically important.

Security Vulnerability Injection and Validation Mechanisms

AI models can sometimes introduce security vulnerabilities into code, whether through deliberate manipulation or by accident. I call this "AI security vulnerability injection." It can happen when malicious input is fed into the prompt, or when the model generates incorrect or weak code during a "hallucination." For example, AI-generated code for an API endpoint might use a user-provided parameter directly as a file path or send critical data without encryption. To catch such situations, I use a multi-layered approach.
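
The file path case can be guarded with a small check like the one below. This is a sketch under assumptions: the function name `resolve_inside` and the idea of a single allowed base directory are mine, and a real service may need additional rules (symlink policy, allowed extensions).

```python
from pathlib import Path


def resolve_inside(base_dir: str, user_path: str) -> Path:
    """Return user_path resolved under base_dir, or raise ValueError."""
    base = Path(base_dir).resolve()
    # Joining with an absolute user_path discards base entirely, so the
    # result must be resolved and then re-checked against base.
    candidate = (base / user_path).resolve()
    try:
        candidate.relative_to(base)
    except ValueError:
        raise ValueError(f"path escapes base directory: {user_path!r}") from None
    return candidate
```

Both `../../etc/passwd` and an absolute path like `/etc/passwd` are rejected here, because the resolved result no longer sits under the base directory.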

First, I use static analysis tools (SAST). I've integrated tools like Bandit or Semgrep into my CI/CD pipeline for a FastAPI project. I run every AI-generated code block through these tools. Last week, I noticed that a Python function generated by AI used the shell=True argument by default in a subprocess.run() call. Bandit immediately warned me:

# Example AI-generated code (simple example)
import subprocess

def run_command_ai(command):
    # CAUTION: shell=True can pose a security risk!
    return subprocess.run(command, shell=True, capture_output=True, text=True)

# Warning Bandit reports when scanning this code (abridged):
# B602: subprocess call with shell=True identified, security issue. (CWE-78)

Thanks to this warning, I once again saw that AI sometimes opts for the easy way and can overlook security risks. Second, manual code review is indispensable. Automated tools cannot catch everything. An experienced eye is needed, especially for business logic-specific security vulnerabilities. Finally, in critical areas, I write unit and integration tests to cover not only functionality but also security scenarios. For example, I have tests that check whether fail2ban is triggered for invalid password attempts on a login endpoint or whether excessive requests from a specific IP are blocked by the rate limiting mechanism. These tests allow me to verify that the AI-generated code complies with my defined security policies.
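
For comparison, a safer variant of the function above might look like the following sketch. The allowlist contents and the name `run_command_safe` are assumptions for illustration; the key points are passing an argument list and leaving `shell=False`.

```python
import shlex
import subprocess

# Hypothetical allowlist; a real service would define its own.
ALLOWED_COMMANDS = frozenset({"echo", "date", "uptime"})


def run_command_safe(command: str, timeout: float = 10.0) -> subprocess.CompletedProcess:
    """Run an allowlisted command without invoking a shell."""
    args = shlex.split(command)
    if not args or args[0] not in ALLOWED_COMMANDS:
        raise ValueError(f"command not allowed: {command!r}")
    # With an argument list and shell=False, characters such as ';', '|'
    # and '$()' reach the program as literal text instead of being
    # interpreted by a shell.
    return subprocess.run(
        args,
        shell=False,
        capture_output=True,
        text=True,
        timeout=timeout,
        check=False,
    )
```

An input like `echo hello; id` no longer runs two commands: `echo` simply receives `hello;` and `id` as plain arguments.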

Dependency Management and Outdated Libraries

A significant risk often overlooked when generating code with AI is dependency management. AI models typically use the most popular or most up-to-date libraries available at the time they were trained when creating code examples. However, specific versions of these libraries can become vulnerable or deprecated over time. Using an old and vulnerable library in a system running in a production environment can lead to disaster. Once, in the backend of my side product, I realized that a version of the requests library (around 2.20.0, I think) recommended by AI contained a known CVE (Request smuggling). If I hadn't had automated control mechanisms, it would have been much harder to detect this vulnerability.

To prevent such situations, I have two main strategies:

Automated Dependency Scanning: I regularly use tools like pip-audit or safety for my Python projects, and npm audit or yarn audit commands for my JavaScript projects. These tools compare my project's dependencies against known vulnerability databases and provide me with a report.
```
# Simplified, illustrative pip-audit report (the real tool prints a table)
$ pip-audit -r requirements.txt

Found 1 known vulnerability in 1 package
Package: urllib3
Version: 1.25.7
CVE: CVE-2020-26137
Description: CRLF injection via the HTTP request method in urllib3
Fix: Upgrade to urllib3>=1.25.9
```
Such outputs allow me to immediately see the potential risks of an AI-recommended or automatically added dependency.
Strict Dependency Policies and Updates: When adding a new dependency, I always prefer the latest stable and security-patched version. Additionally, I receive automated dependency update pull requests from tools like dependabot and review them regularly. This reduces manual workload and keeps my systems up-to-date. Even when a new version of a library introduces breaking changes, I prefer to make that adaptation rather than accept the security risk. In one client project, a microservice working with PostgreSQL still had an old driver version among its dependencies, and that version contained a bug that caused WAL bloat during replication. Such situations, even if they are not direct security vulnerabilities, can lead to operational instability and are among the risks posed by old dependencies.

With this approach, I try to minimize the hidden dependency risks that AI might bring while providing me with speed.
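
As a small complement to the scanners, a pre-commit style check can flag requirements that are not pinned to an exact version. This is a simplified sketch: the function name `find_unpinned` is hypothetical, and the parsing deliberately ignores environment markers, URLs, and other less common requirement syntax.

```python
import re
from typing import List

# Matches "name==version", optionally with extras such as "name[extra]==1.2.3".
PINNED = re.compile(r"^[A-Za-z0-9._-]+(\[[^\]]+\])?==[^=\s]+")


def find_unpinned(requirements_text: str) -> List[str]:
    """Return requirement lines that are not pinned with '=='."""
    unpinned = []
    for raw in requirements_text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and whitespace
        if not line or line.startswith("-"):
            # Skip blank lines and pip options such as -r, -e, --hash.
            continue
        if not PINNED.match(line):
            unpinned.append(line)
    return unpinned
```

Pinning alone does not make a dependency safe, but it makes the audit reports reproducible: the scanner checks exactly the versions that will be deployed.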

Prompt Engineering Strategies for Secure Output

The quality of the code we get from AI depends on the quality of the prompts we give it. This rule also applies to security. Instead of just saying "write me a login API," saying "write a JWT-based login API compliant with OWASP Top 10 principles, protected against SQL injection and XSS attacks" will result in a much more secure output. When working with AI, I enrich my prompts with security-focused terms.

Some prompt engineering strategies I use include:

Specifying Security Constraints: I communicate security requirements directly to the AI with phrases like "adhere to the principle of least privilege," "don't forget input validation," "encrypt all sensitive data," "implement rate limiting."
Role Assignment: Sometimes I ask the AI to take on roles like "review this code as a security expert and identify potential vulnerabilities" or "write a login flow in the role of an OWASP Top 10 auditor." This helps the model generate code from a security perspective.
Providing Example Secure Code: I can give the AI a small code snippet containing a specific security mechanism (e.g., using bcrypt for password hashing) and say, "follow this pattern." This helps the model learn correct implementations.
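
A reference snippet of the kind described in the last strategy could look like this. Note the substitution: the text mentions bcrypt, which is a third-party package, so this sketch uses PBKDF2 from Python's standard library instead to stay self-contained. The function names, the storage format, and the iteration count are assumptions.

```python
import hashlib
import hmac
import os

ITERATIONS = 600_000  # assumed work factor; tune for your hardware


def hash_password(password: str) -> str:
    """Return 'iterations$salt_hex$digest_hex' for the given password."""
    salt = os.urandom(16)  # fresh random salt per password
    digest = hashlib.pbkdf2_hmac(
        "sha256", password.encode("utf-8"), salt, ITERATIONS
    )
    return f"{ITERATIONS}${salt.hex()}${digest.hex()}"


def verify_password(password: str, stored: str) -> bool:
    """Check a password against a value produced by hash_password."""
    iterations, salt_hex, digest_hex = stored.split("$")
    candidate = hashlib.pbkdf2_hmac(
        "sha256", password.encode("utf-8"), bytes.fromhex(salt_hex), int(iterations)
    )
    # Constant-time comparison avoids leaking how many bytes matched.
    return hmac.compare_digest(candidate, bytes.fromhex(digest_hex))
```

Pasting a pattern like this into the prompt and asking the model to follow it tends to be more reliable than describing the requirement in words alone.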

For example, for a FastAPI endpoint:

# Bad prompt example:
"Write a FastAPI endpoint that adds products with a POST request to `/items`."

# Security-focused prompt example:
"Write a FastAPI endpoint that adds products with a POST request to `/items`.
This endpoint must be protected with JWT authentication.
Perform strict input validation for the `name` and `description` fields in the incoming JSON body.
Implement measures against SQL injection and XSS.
Add float validation and positive number checks for the `price` value from the user.
Ensure all sensitive data is sanitized before saving to the database."

With such a detailed prompt, the security level of the initial code draft from AI increases significantly. In my experience, these types of prompts reduced the number of security vulnerabilities in the first version of AI-generated code by 60-70%. Of course, this doesn't mean I skip the final checks, but it puts the starting point on a much more solid foundation. In addition, by using the RAG (Retrieval-Augmented Generation) pattern, I can feed my internal security guidelines or CVE data to the AI model to get more up-to-date and context-specific security recommendations.
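
The validation rules requested in the security-focused prompt can be sketched without any framework, which also makes them easy to unit test. The class name `ItemInput` and the length limits are assumptions; in a FastAPI project the same rules would normally live in the request model.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class ItemInput:
    name: str
    description: str
    price: float

    def __post_init__(self) -> None:
        if not isinstance(self.name, str) or not 1 <= len(self.name.strip()) <= 100:
            raise ValueError("name must be 1-100 characters")
        if not isinstance(self.description, str) or len(self.description) > 1000:
            raise ValueError("description must be at most 1000 characters")
        # bool is a subclass of int, so reject it explicitly.
        if isinstance(self.price, bool) or not isinstance(self.price, (int, float)):
            raise ValueError("price must be a number")
        # Rejects NaN, infinity, zero and negative values.
        if not math.isfinite(self.price) or self.price <= 0:
            raise ValueError("price must be a finite positive number")
```

Checks like these cover input shape only; protection against SQL injection still comes from parameterized queries, and XSS protection from escaping on output.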

Human Oversight and Automated Security Tests: Hand in Hand

No matter how good the code generated by AI is, human oversight and automated security tests are indispensable. My philosophy is to view AI as an assistant, never as a decision-maker. In particular, every piece of code destined for production systems needs to pass through several layers.

The first layer is manual code review. In a production ERP system or a critical financial calculator, it is essential for someone else, preferably an experienced security expert or senior developer, to review an AI-generated code block. This is critical for catching business logic-specific vulnerabilities (e.g., an error in the authorization flow or a possible race condition in a transaction). Even in my own side products, I review critical AI-generated code blocks multiple times myself, considering different scenarios.

The second layer is automated security tests. This goes beyond static analysis (SAST) tools to include dynamic analysis (DAST), penetration tests, and dependency scanning. This is a crucial part of my CI/CD pipeline:

SAST (Static Application Security Testing): Analyzes the source code before it is compiled or run (like Bandit, Semgrep I mentioned above).
DAST (Dynamic Application Security Testing): Finds vulnerabilities by simulating external attacks while the application is running. In a client project, fuzzing requests sent to an AI-generated REST API with a DAST tool caught an unexpected Null Pointer Exception and an accompanying data leak.
SCA (Software Composition Analysis): Scans for known vulnerabilities in third-party libraries used (pip-audit, npm audit).
Configuration Audit: Checks if server, container, or network configurations comply with security standards. For example, whether TLS versions are correctly set in an Nginx reverse proxy configuration or whether security headers like X-Frame-Options are added.
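
A configuration audit for response headers can start very small. In the sketch below, only X-Frame-Options comes from the example above; the other required headers and the function name `missing_security_headers` are assumptions, and the input is simply a dictionary of headers taken from any HTTP client.

```python
from typing import Dict, List

# X-Frame-Options comes from the text; the other entries are assumed additions.
REQUIRED_HEADERS = (
    "X-Frame-Options",
    "X-Content-Type-Options",
    "Strict-Transport-Security",
    "Content-Security-Policy",
)


def missing_security_headers(response_headers: Dict[str, str]) -> List[str]:
    """Return the required security headers absent from a response."""
    # HTTP header names are case-insensitive, so compare in lowercase.
    present = {name.lower() for name in response_headers}
    return [h for h in REQUIRED_HEADERS if h.lower() not in present]
```

Run against a staging endpoint in CI, a non-empty result can fail the build before a misconfigured proxy reaches production.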

ℹ️ Automated Security Integration with CI/CD

Passing AI-generated code through automated security tests before it reaches the production environment is vital to minimize risks. By integrating SAST, DAST, and SCA tools into your CI/CD pipeline, you can ensure continuous security auditing. This saves time for teams working with rapid iterations and facilitates early detection of security vulnerabilities.

These two layers allow me to use AI's speed without compromising security. I take AI-generated code for rapid prototyping, but on its journey to production, this code must pass through both human eyes and automated controls. Last month, while optimizing PostgreSQL, I first tested a GIN index definition recommended by AI with different queries in the test environment, then checked its cost with EXPLAIN ANALYZE, and finally had a senior DBA friend review it. This approach applies not only to security but also to performance.

AI Code in Production: Monitoring and Emergency Plans

Even after an AI-generated code snippet passes all security checks and reaches the production environment, my work isn't over. My operational experience has taught me that the best way to understand a system's true behavior is to monitor it in production. This also applies to AI-written code.

Monitoring: I implement comprehensive observability strategies to closely monitor the behavior of AI-generated code.

Metrics: I collect application performance metrics (latency, error rate, throughput). For example, for an AI-assisted production planning module, I track the duration and error rate of each planning cycle. If AI code causes a performance degradation, metrics immediately notify me. Once, I noticed that an AI-generated query was causing an N+1 problem in PostgreSQL when the query latency metrics on the dashboard suddenly spiked.
Logs: I forward logs from journald to a central system and monitor for anomalies. I especially link critical log levels (ERROR, CRITICAL) and specific keywords (e.g., "unauthorized access," "SQL error") to alarms. In case AI-generated code throws an unexpected error or a security breach attempt, logs instantly send me notifications.
Traces: In distributed systems, I use distributed tracing to understand the interaction of AI code with other services. This helps me visualize where a request gets stuck in the system or where an error originates.
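
The log rule described above (critical levels plus specific keywords) can be expressed as a simple filter. This is only a sketch: in practice the matching would usually live in the log pipeline or alerting tool, and the function name `lines_to_alert` is hypothetical.

```python
import re
from typing import Iterable, List

ALERT_LEVELS = ("ERROR", "CRITICAL")
ALERT_KEYWORDS = ("unauthorized access", "sql error")  # matched case-insensitively

# Levels are matched as whole uppercase words, so "no errors found" is ignored.
LEVEL_RE = re.compile(r"\b(" + "|".join(ALERT_LEVELS) + r")\b")


def lines_to_alert(lines: Iterable[str]) -> List[str]:
    """Return log lines that should trigger an alarm."""
    hits = []
    for line in lines:
        lowered = line.lower()
        if LEVEL_RE.search(line) or any(k in lowered for k in ALERT_KEYWORDS):
            hits.append(line)
    return hits
```

Keeping the keyword list short matters: every extra pattern adds noise, and noisy alarms get ignored.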

Emergency Plans (Incident Response & Rollback): Despite the best precautions, problems can always occur. Therefore, I have a robust emergency plan for potential failures of AI-generated code:

Rollback Mechanisms: When deploying new AI-assisted code, I use blue-green or canary deployment strategies. This allows me to quickly revert to the old version if the new code causes a problem. In a production ERP system, when I saw CPU usage increase by 30% after deploying a new AI-enabled operator screen, I rolled back to the old version within 5 minutes, preventing an outage.
System Resource Limits: I keep cgroup limits tight for containerized AI services. By setting memory.high and memory.max values, I prevent AI code from running out of control and affecting the entire server. Last month, when I saw a service OOM-killed after writing sleep 360, I once again understood the importance of these limits and switched to a polling-wait mechanism.
Automated Alerts: I set up automated alerts on all this monitoring data with tools like Prometheus or Grafana. Database alarms like a WAL rotation alarm or abnormal rate limiting triggers in Nginx access logs can indicate potential security or performance issues triggered by AI code.
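
The rollback decision itself can be reduced to a comparison between the old version and the canary. In this sketch the 30% CPU threshold mirrors the ERP anecdote above, while the error-rate threshold, the `Snapshot` type, and the function name are assumptions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Snapshot:
    cpu_percent: float
    error_rate: float  # fraction of failed requests, 0.0-1.0


def should_roll_back(
    baseline: Snapshot,
    canary: Snapshot,
    max_cpu_increase: float = 0.30,  # relative growth, taken from the anecdote
    max_error_rate: float = 0.01,    # assumed threshold
) -> bool:
    """Decide whether the canary should be reverted to the baseline version."""
    # Guard against division by zero when the baseline is idle.
    cpu_growth = (canary.cpu_percent - baseline.cpu_percent) / max(
        baseline.cpu_percent, 1e-9
    )
    return cpu_growth >= max_cpu_increase or canary.error_rate > max_error_rate
```

Writing the rule down as code, even a trivial one, removes the hesitation during an incident: the thresholds were agreed on before the deployment, not during it.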

These monitoring and emergency plans allow me to leverage the efficiency AI provides while keeping potential operational and security risks under control. The real world always behaves differently from our expectations, and I prefer to be prepared for those differences.

Conclusion: Taking Steps Forward with Known Risks

AI-driven code generation promises a revolutionary change in the software development world. In my more than 20 years of experience, technology advancing this rapidly is a rare sight. However, instead of blindly following this progress, understanding and managing the risks it brings is critical for long-term success. AI provides speed and efficiency, but security and stability have always been my priority, especially in enterprise environments or for my critical side products.

As I mentioned in this post, getting more secure outputs from AI with prompt engineering, implementing comprehensive automated security tests (SAST, DAST, SCA), not skipping manual code review, and finally having detailed monitoring and emergency plans in production are the keys to safely using AI-generated code. Let's not forget that AI is a powerful tool; like an axe, it can both cut down a forest and build a house. How we use it determines the final outcome. I will continue to benefit from AI by knowing the risks, in a measured and controlled manner. In my next post, I will share my PostgreSQL connection pool tuning and read replica routing strategies.