<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Jérôme Van Der Linden</title>
    <description>The latest articles on DEV Community by Jérôme Van Der Linden (@jeromevdl).</description>
    <link>https://dev.to/jeromevdl</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F189803%2F88b26679-fa0d-4f52-96b6-104c2cf50a1c.jpeg</url>
      <title>DEV Community: Jérôme Van Der Linden</title>
      <link>https://dev.to/jeromevdl</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/jeromevdl"/>
    <language>en</language>
    <item>
      <title>AI‑Assisted Software Development — 6 Pitfalls to Avoid</title>
      <dc:creator>Jérôme Van Der Linden</dc:creator>
      <pubDate>Mon, 05 Jan 2026 13:26:54 +0000</pubDate>
      <link>https://dev.to/jeromevdl/ai-assisted-software-development-6-pitfalls-to-avoid-2k3k</link>
      <guid>https://dev.to/jeromevdl/ai-assisted-software-development-6-pitfalls-to-avoid-2k3k</guid>
      <description>&lt;p&gt;Generative AI isn't just another tool, it's rewriting how we build software. Tools like Cursor and Claude Code, endless LinkedIn hype threads, and YouTube '10x productivity' demos have flooded the space. But most teams chasing the hype are about to learn a hard lesson: speed without discipline creates chaos faster than it creates value.&lt;/p&gt;

&lt;p&gt;Most people see the vibe-coding demos and think "let me trade my VS Code for an AI version of it (Kiro, Cursor, etc…) and I'll be the king". They'll produce code faster, sure. But without good practices, proper specifications and design, test harness and review process, they'll write &lt;strong&gt;10x faster code that's 10x messier&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;This blog post details 6 concrete pitfalls I've seen - and sometimes fallen into myself - and recommendations to avoid them.&lt;/p&gt;

&lt;p&gt;Disclaimer: this blog post represents my personal thoughts and is not endorsed by AWS.&lt;/p&gt;

&lt;h2&gt;
  
  
  1. Vibe-coding without a plan
&lt;/h2&gt;

&lt;p&gt;I wanted to title this blog post "AI-Driven Development — 6 Pitfalls to Avoid", but that's wrong: developers drive the AI, not the other way around!&lt;/p&gt;

&lt;p&gt;Vibe-coding is a bit like engaging your car's auto-pilot: you might arrive at your destination, but it won't necessarily be the safest or shortest trip.&lt;/p&gt;

&lt;p&gt;With a powerful assistant in your editor, it's tempting to jump straight into prompting and code generation: "Generate a service that does X, Y, Z." Ten minutes later, you have a shiny pull request with ten new files that introduce a new logging library, violate your coding standards, don't respect the project structure, and so on.&lt;br&gt;
Then you end up in a long conversation where requirements (if we can call them that), design (if any) and implementation details are all mixed together, resulting in endless back-and-forth between the developer and the AI.&lt;/p&gt;

&lt;p&gt;Of course, you can also ship the code directly without any review; it will come back sooner than expected, like a boomerang, in the form of a production bug. In the end, it's a complete waste of time: what was supposed to make you several times faster ultimately slows you down (bug fixes, rewrites, etc.).&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Recommendations&lt;/strong&gt;:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;You will see "spec-driven development" everywhere, as if it were the new "agile", but let's face it: it's not new! Go back 20+ years to the "&lt;a href="https://medium.com/r/?url=https%3A%2F%2Fen.wikipedia.org%2Fwiki%2FV-model" rel="noopener noreferrer"&gt;V model&lt;/a&gt;" and you'll see nothing was invented: you start by defining the requirements and design before any implementation. AI-assisted development is no different: if you want the AI to understand what you want and respect your rules, &lt;strong&gt;you need to specify things upfront&lt;/strong&gt;.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;You also need to make sure to properly design what you want&lt;/strong&gt;: how it integrates in the project architecture, what the inputs and outputs are, the business rules, etc.&lt;/li&gt;
&lt;li&gt;And as a last step before actually implementing anything, &lt;strong&gt;you need to plan the work so the AI doesn't diverge&lt;/strong&gt;. A detailed list of tasks avoids the back-and-forth we mentioned before and helps the AI stay focused.&lt;/li&gt;
&lt;/ul&gt;
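&lt;p&gt;As an illustration, the three artifacts above (requirements, design, task plan) can be as simple as three markdown files checked into the repo. The feature, file names and layout below are made up, not a prescribed format:&lt;/p&gt;

```markdown
specs/password-reset/requirements.md
------------------------------------
- The user receives a single-use reset link by email, valid 15 minutes.
- Three failed attempts lock the account and notify the user.

specs/password-reset/design.md
------------------------------
- New endpoint POST /password-reset in the existing auth service.
- Store the token hashed; reuse the existing TokenRepository.

specs/password-reset/tasks.md
-----------------------------
1. Token generation and persistence (unit tests first).
2. POST /password-reset endpoint.
3. Email notification behind the existing Mailer interface.
```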

&lt;blockquote&gt;
&lt;p&gt;&lt;em&gt;If you don't know it yet, I encourage you to have a look at &lt;a href="https://medium.com/r/?url=https%3A%2F%2Fkiro.dev%2F" rel="noopener noreferrer"&gt;Kiro&lt;/a&gt;, a VS Code-based IDE with built-in "&lt;a href="https://medium.com/r/?url=https%3A%2F%2Fkiro.dev%2Fdocs%2Fspecs%2Fconcepts%2F" rel="noopener noreferrer"&gt;spec-driven development&lt;/a&gt;". Kiro will help you define your requirements, produce detailed specifications, design documents and task lists.&lt;/em&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h2&gt;
  
  
  2. Providing insufficient context
&lt;/h2&gt;

&lt;p&gt;You can see this pitfall as an extension of the previous one. Most often, specifications and design aren't enough. Without well-defined rules, the AI assistant will produce code that doesn't respect your standards, system architecture, or constraints.&lt;/p&gt;

&lt;p&gt;You will have to repeat things you've already said (when I started, I had to ask every time for &lt;code&gt;import&lt;/code&gt;s to be placed at the top of Python files). You will have to request change after change before getting the desired code. At that point, it would have been faster to write it yourself.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Recommendations:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Not only do you need to define your requirements, but also the constraints to work with&lt;/strong&gt;: architecture, compliance, coding standards, non-functional requirements, etc.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Create steering documents to specify your constraints&lt;/strong&gt;. Invest in them early. Treat them as code and share them in the repo. You can even share them more broadly at the enterprise level. Keep them alive: update them when the AI makes mistakes (just like you'd add a unit test to ensure non-regression after a bug fix).&lt;/li&gt;
&lt;li&gt;In addition to steering docs, you can give the assistant access to broader information (enterprise policies, security guidelines, compliance rules, etc…) &lt;strong&gt;from &lt;a href="https://medium.com/r/?url=https%3A%2F%2Fawslabs.github.io%2Fmcp%2Fservers%2Fbedrock-kb-retrieval-mcp-server" rel="noopener noreferrer"&gt;a Knowledge base through MCP tools or structured docs&lt;/a&gt;&lt;/strong&gt;.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Make your code "discoverable"&lt;/strong&gt;. Models perform better when projects have a clear structure, consistent naming and conventions. Note this isn't specific to AI; it's simply good practice that also makes a project easier for humans to maintain.&lt;/li&gt;
&lt;/ul&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;em&gt;In Kiro, you can use steering documents to provide rules and constraints. You can also plug in MCP servers so the assistant can retrieve additional information, either from third parties (AWS for example) or from your own servers.&lt;/em&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h2&gt;
  
  
  3. Amassing too much context
&lt;/h2&gt;

&lt;p&gt;On the contrary, sometimes you provide the assistant with too much context: a very big codebase, lots of MCP servers, and an endless conversation spanning many different topics. The session accumulates partially related requirements, refactors, bug fixes, and side explorations. Over time, the AI starts mixing up domains and quality gradually degrades.&lt;/p&gt;

&lt;p&gt;Some tools provide auto-summarization/compaction of the context, which can help, but it doesn't reliably reconstruct a clean, task-oriented context.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Recommendations:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Have one major task per session&lt;/strong&gt;: treat each significant feature, bug, or refactor independently as a separate AI session. For each session, provide only the required assets (code, specs, steering docs, MCP servers) - no more. &lt;strong&gt;Clear or restart the session when moving to the next task&lt;/strong&gt; or if you start to say "ignore this" to the assistant.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Reduce the scope of the requirement&lt;/strong&gt;: don't ask the AI to build a complete application. Instead, focus on a specific feature, or even smaller: a specific service. This prevents the AI from going in many directions and generating tons of poor, useless code.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Capture decisions in documentation or steering documents&lt;/strong&gt;, not in endless conversations. These assets are saved in the repo, while the conversation will be lost at some point. As the context grows, the LLM will have trouble "remembering" and applying everything discussed. Instead, you (and your colleagues) will be able to leverage steering docs in future sessions. It's also a great way to document the project.&lt;/li&gt;
&lt;/ul&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;em&gt;Kiro recently introduced the notion of "&lt;a href="https://medium.com/r/?url=https%3A%2F%2Fkiro.dev%2Fdocs%2Fpowers%2F" rel="noopener noreferrer"&gt;powers&lt;/a&gt;". Powers let you load MCP servers dynamically, based on the user query, rather than preloading all of them and filling the context with hundreds of tool definitions.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;For steering documents, Kiro lets you specify an "&lt;a href="https://medium.com/r/?url=https%3A%2F%2Fkiro.dev%2Fdocs%2Fsteering%2F%23inclusion-modes" rel="noopener noreferrer"&gt;inclusion mode&lt;/a&gt;" defining when to load them, thus avoiding adding every steering document to the context:&lt;/em&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;em&gt;always (for common best practices and rules)&lt;/em&gt;&lt;/li&gt;
&lt;li&gt;&lt;em&gt;based on the presence of certain files (for example "&lt;code&gt;components/**/*.tsx&lt;/code&gt;" for React components)&lt;/em&gt;&lt;/li&gt;
&lt;li&gt;&lt;em&gt;manual (using the #steering-file-name)&lt;/em&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/blockquote&gt;
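&lt;p&gt;For example, a steering file scoped to React components could declare a file-match inclusion mode in its front matter. This is an illustrative sketch; check the Kiro steering documentation linked above for the exact syntax:&lt;/p&gt;

```markdown
---
inclusion: fileMatch
fileMatchPattern: "components/**/*.tsx"
---
# React component rules (example content)
- Functional components only, with typed props.
- Co-locate each component's test file in the same folder.
```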

&lt;h2&gt;
  
  
  4. Treating AI as a developers-only tool
&lt;/h2&gt;

&lt;p&gt;Unfortunately, today, these tools generally land only in the hands of developers. Product owners still write fuzzy user stories, architects produce PowerPoints and wiki pages, security arrives at the end with a PDF of "blocking issues", and QA becomes the new bottleneck, flooded by dozens of new features released by the developers.&lt;/p&gt;

&lt;p&gt;It's not new: organization has always been a success (or failure) factor. Agile tried to gather people and competencies into multi-skilled autonomous teams. DevOps tried to reduce the distance between developers and ops. But with the developers' (expected) increased velocity, this becomes even more critical. AI is central here - not leading, but supporting the process.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Recommendations:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;As discussed above, the context provided to the AI is key to getting the right product and good code. Most of &lt;strong&gt;the actors listed above must participate in generating this context: specifications and validation (PO, business, QA), design (architects), security and compliance (security and architects)&lt;/strong&gt;.&lt;/li&gt;
&lt;li&gt;But it's not just about giving developers markdown files instead of PowerPoint or Word documents to be more AI-readable. It's about &lt;strong&gt;these actors using AI themselves&lt;/strong&gt;, in close collaboration with developers, to provide the best possible input: 

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Product owners, business, QA should now participate in the requirements generation with the developers&lt;/strong&gt;. Keep an iterative approach and continuously refine them during sprints. Notice that sprints might be shorter than before, one week or even less. &lt;/li&gt;
&lt;li&gt;During the specification phase, prepare the validation by &lt;strong&gt;creating BDD scenarios ("Given / When / Then") that everyone signs off before any code is generated&lt;/strong&gt;. Leveraging these scenarios and BDD frameworks like Cucumber will drastically help QA validation. I will cover the "test" topic in the next part.&lt;/li&gt;
&lt;li&gt;During the design phase, developers can certainly design their software, with the help of well-crafted steering documents, but when integrating with other system parts, or handling non-functional requirements, &lt;strong&gt;architects and security should join developers to ensure alignment&lt;/strong&gt;.&lt;/li&gt;
&lt;/ol&gt;
&lt;/li&gt;
&lt;li&gt;During all these steps, &lt;strong&gt;AI generates content, but it's the teams' responsibility to review, enrich, correct it, to produce the most accurate context, free of ambiguity&lt;/strong&gt;, for the development phase.&lt;/li&gt;
&lt;li&gt;With this "tool" comes a new organization. Don't throw away what you built with agile, reinforce it: &lt;strong&gt;build a strong product-oriented, multi-skilled team, co-build and share the same AI context&lt;/strong&gt; (specs, design, steering docs, tests) and &lt;strong&gt;co-own the result&lt;/strong&gt;. AI becomes an assistant of the team, not just of the developers.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  5. Testing after
&lt;/h2&gt;

&lt;p&gt;We briefly touched on tests in the previous part. AI is quite good at generating tests… that pass against its own generated code. You will tell me it's still better than what most teams do: no tests at all. The problem: if we forgot a business rule in the code, it will also be absent from the tests. Such tests confirm potential biases in the code; they don't challenge them.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Recommendations:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Write tests before writing code, between the specification/design and development phases&lt;/strong&gt;. It's not new; this is called &lt;a href="https://medium.com/r/?url=https%3A%2F%2Fwww.amazon.com%2Fgp%2Fproduct%2F0321146530%2Fref%3Das_li_tl%3Fie%3DUTF8%26camp%3D1789%26creative%3D9325%26creativeASIN%3D0321146530%26linkCode%3Das2%26tag%3Dmartinfowlerc-20" rel="noopener noreferrer"&gt;Test-Driven Development (TDD)&lt;/a&gt;. This practice, as good as it is, was rarely adopted outside a few purists and craftsmen: the mental model is very different, and writing tests before code feels counterintuitive. But now, with AI, you can generate them easily. This ensures every requirement is backed by one or more tests before development starts; later, the AI-generated code is validated against those tests.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Write tests even earlier, during specification&lt;/strong&gt;. These tests, written in a dedicated language (&lt;a href="https://medium.com/r/?url=https%3A%2F%2Fcucumber.io%2Fdocs%2Fgherkin%2Freference" rel="noopener noreferrer"&gt;&lt;strong&gt;Gherkin&lt;/strong&gt;&lt;/a&gt;), can then be "implemented" (with Cucumber for example) to test the application. These "executable specifications" can be used by QA to validate the behavior of the application; this is called Behavior-Driven Development (BDD). Gherkin and BDD have existed for a very long time but, like TDD, the practice did not spread as it should have: POs/QA often struggle to write good Gherkin and developers struggle with the glue code. AI can solve both:

&lt;ol&gt;
&lt;li&gt;Generate Gherkin specifications with the help/review of POs and QA during specification. &lt;/li&gt;
&lt;li&gt;Write the glue code (with Cucumber or the framework of your choice) during development. 
POs and QA get instant validation and avoid post-development bottlenecks.&lt;/li&gt;
&lt;/ol&gt;
&lt;/li&gt;
&lt;/ul&gt;
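&lt;p&gt;To make this concrete, here is what such an executable specification could look like in Gherkin (a made-up scenario, not from a real project):&lt;/p&gt;

```gherkin
Feature: Password reset
  Scenario: Reset link expires after 15 minutes
    Given a user requested a password reset 16 minutes ago
    When the user opens the reset link
    Then the link is rejected as expired
    And a new reset link can be requested
```

&lt;p&gt;PO, QA and developers can all read and sign off on this before any code or glue code is generated.&lt;/p&gt;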

&lt;h2&gt;
  
  
  6. Over-trusting AI-generated code
&lt;/h2&gt;

&lt;p&gt;As we saw, providing good context and writing tests first to get better AI-generated code is essential. But that's still not enough! What? You thought that with AI, you'd have almost nothing left to do?! You thought "developer" was already a prehistoric job?! Not at all!&lt;/p&gt;

&lt;p&gt;Even with solid specs, steering docs, and TDD/BDD in place, AI can still hallucinate, miss requirements, forget some edge cases or introduce unexpected changes in your codebase. Your job now is to control what gets in (context and tests) and what gets out (review and guardrails).&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Recommendations:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Humans always review AI-generated code&lt;/strong&gt;.&lt;/li&gt;
&lt;li&gt;Not just review, but &lt;strong&gt;understand the generated code&lt;/strong&gt;. If you can't explain a function to your colleagues during PR review, don't commit it.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Human review alone is not enough. Enforce automated guardrails&lt;/strong&gt;: 

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;CI/CD quality gates&lt;/strong&gt;: minimum test coverage, linter, security scans, dependency checks, performance tests, documentation, no hardcoded secrets, etc.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Staged rollout&lt;/strong&gt; (e.g. 10% every 10 minutes) and automatic rollback.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Manual gates for critical paths&lt;/strong&gt; (auth, payment, external integrations): keep a human in the loop; AI cannot decide these alone.&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;/ul&gt;
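&lt;p&gt;As a sketch, such quality gates could look like this in a CI pipeline (GitHub Actions syntax; the tools and thresholds are examples to adapt to your stack):&lt;/p&gt;

```yaml
# Hypothetical quality gates for a Python project - adapt tools/thresholds.
name: quality-gates
on: [pull_request]
jobs:
  gates:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - run: pip install ruff pytest pytest-cov pip-audit
      - run: ruff check .                      # linter
      - run: pytest --cov --cov-fail-under=80  # minimum test coverage
      - run: pip-audit                         # vulnerable dependencies
      - uses: gitleaks/gitleaks-action@v2      # no hardcoded secrets
```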

&lt;p&gt;Don't forget: &lt;strong&gt;you're the owner of the code&lt;/strong&gt;, even if your IDE generated it.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Frcuzc80z3s86rzemqgov.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Frcuzc80z3s86rzemqgov.jpg" alt=" " width="500" height="561"&gt;&lt;/a&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>sdlc</category>
      <category>softwaredevelopment</category>
      <category>kiro</category>
    </item>
    <item>
      <title>Anonymize your data using Amazon S3 Object Lambda</title>
      <dc:creator>Jérôme Van Der Linden</dc:creator>
      <pubDate>Fri, 09 Jul 2021 14:59:12 +0000</pubDate>
      <link>https://dev.to/aws-ch/anonymize-your-data-using-amazon-s3-object-lambda-19ba</link>
      <guid>https://dev.to/aws-ch/anonymize-your-data-using-amazon-s3-object-lambda-19ba</guid>
      <description>&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fzo8r9dv6t778mnodl76o.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fzo8r9dv6t778mnodl76o.jpg" alt="Privacy please"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Introduction
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://en.wikipedia.org/wiki/Personal_data" rel="noopener noreferrer"&gt;Personal Data&lt;/a&gt; in Europe, Personal Identifiable Information (PII) in the US, Client Identifying Data (CID) here in Switzerland, … Whatever the name you give it, the definition is slightly the same: it defines a category of information about an individual that can be used to unambiguously distinguish or trace his/her identity.&lt;br&gt;
For example, passport numbers, social security numbers, IBAN or biometric records, known as direct identifying data, clearly identify an individual. Full names, addresses, phone numbers, dates of birth or emails can also be used to identify someone. However, as they can be shared by several people, you need to combine them to explicitly identify an individual - we say they are indirect identifying data.&lt;br&gt;
This data, when maintained by a company, especially a highly regulated one (Financial Services, Healthcare, …), or a government agency, must comply with security standards and compliance requirements (GDPR, HIPAA, FINMA circulars, …). These require this kind of data to be highly protected, from public leakage of course, but also internally from your own employees.&lt;br&gt;
In this article, I will mainly focus on this second point. Today more than ever, data is key to making appropriate decisions, creating new services or improving existing ones. And if you want to share data internally, in order to build clever solutions leveraging analytics and machine learning for example, you need to keep control of that data and ensure it remains compliant with the aforementioned requirements.&lt;/p&gt;
&lt;h2&gt;
  
  
  Anonymization / pseudonymization
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Anonymization&lt;/strong&gt; and &lt;strong&gt;pseudonymization&lt;/strong&gt; are techniques commonly adopted to protect such data. In both cases, you want to remove the ability to identify someone and, more importantly, the link to his/her personal information (financial, health, preferences…), while keeping the data practically useful. Anonymization consists of removing any direct (and part of the indirect) identifying data. Pseudonymization does not remove this information but modifies it so that it cannot be linked back to the original individual.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fh4c2s3zw4s6be35dv95j.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fh4c2s3zw4s6be35dv95j.jpg" alt="Anonymous face"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Multiple papers, algorithms (&lt;a href="https://en.wikipedia.org/wiki/K-anonymity" rel="noopener noreferrer"&gt;k-anonymity&lt;/a&gt;) and techniques exist to perform anonymization and pseudonymization. AWS also provides 2 functions — available in the Serverless Application Repository (SAR) — that use &lt;a href="https://amzn.to/3ke6Frl" rel="noopener noreferrer"&gt;Amazon Comprehend and its ability to detect PII&lt;/a&gt;:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F1og9x8zc8k9e3ctzexv7.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F1og9x8zc8k9e3ctzexv7.png" alt="Amazon Comprehend Pii detection"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;On my side, as the input file is pretty straightforward, I don’t need Comprehend to detect sensitive information.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F7h53d8wfiecfp4yqbt9l.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F7h53d8wfiecfp4yqbt9l.png" alt="Data with Pii"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Here is my (naive) approach:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Remove any (identifying) field that is not useful to the downstream process. In my example, the SSN (social security number) is clearly useless for a data analytics application or to perform machine learning. Same thing for the phone number, address and name.&lt;/li&gt;
&lt;li&gt;Remove some precision, by extracting only the meaningful part. For example, we don’t need the exact date of birth, an age may be enough.&lt;/li&gt;
&lt;li&gt;If for any reason we need to keep some identifying fields, then we must pseudonymize them. For example, we can replace the name with another, randomly generated one.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;After this process, we should end up with the following information, free of any identifying data (names have been replaced):&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fe33tdwd0qrzh7xs0tpt7.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fe33tdwd0qrzh7xs0tpt7.png" alt="Anonymized data"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Now that we know what we want to do, let’s see it in the context of our workload.&lt;/p&gt;
&lt;h2&gt;
  
  
  Architecture
&lt;/h2&gt;

&lt;p&gt;We have 3 main components in our workload:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;A confidential application that deals with this data, used by doctors and other medical staff. Here, the data is not anonymized.&lt;/li&gt;
&lt;li&gt;A storage area (Amazon S3), where the data is kept as CSV files for further analytics. Raw data (with identifying information) is kept and protected with appropriate policies.&lt;/li&gt;
&lt;li&gt;Another application, used to perform analytics on this data (without identifying information). Actually, there could be many more applications like this, each with its own specific requirements and compliance rules.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;To provide anonymized data to these applications, we have several options:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Create and maintain as many copies as there are applications with different requirements so that each one has its own version of the data.&lt;/li&gt;
&lt;li&gt;Build and manage a proxy layer with additional infrastructure, so that you can manage this anonymization process between S3 and the target application.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Both options add complexity and costs. So this is where I introduce &lt;a href="https://amzn.to/3AJK7V4" rel="noopener noreferrer"&gt;S3 Object Lambda&lt;/a&gt;, a capability recently announced by AWS that acts as exactly this proxy, except that you don’t have to manage any infrastructure, just your Lambda function(s).&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fdcutk7hli9v8ggdy2eun.gif" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fdcutk7hli9v8ggdy2eun.gif" alt="Architecture"&gt;&lt;/a&gt;&lt;/p&gt;
&lt;h2&gt;
  
  
  Implementation
&lt;/h2&gt;

&lt;p&gt;Let’s implement this solution. The first thing to do is create a Lambda function, using your preferred framework (SAM, Serverless, CDK, …). I use SAM and my function is written in Python 3.8.&lt;/p&gt;

&lt;p&gt;The function must have the &lt;code&gt;WriteGetObjectResponse&lt;/code&gt; permission in order to return the response to the downstream application(s). Note that this action is not in the &lt;code&gt;s3&lt;/code&gt; namespace but in &lt;code&gt;s3-object-lambda&lt;/code&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"Action"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt;  &lt;/span&gt;&lt;span class="s2"&gt;"s3-object-lambda:WriteGetObjectResponse"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"Resource"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"*"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"Effect"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"Allow"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"Sid"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"WriteS3GetObjectResponse"&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;


&lt;p&gt;The full, commented code of my function is available in the GitHub repository linked at the end of this article.&lt;/p&gt;
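&lt;p&gt;To give an idea of the shape of such a function, here is a minimal sketch of an S3 Object Lambda handler performing this anonymization. The CSV column names and the fake-name list are assumptions for illustration; refer to the GitHub repository linked at the end for the actual code:&lt;/p&gt;

```python
import csv
import io
import random
import urllib.request
from datetime import date

# Assumed CSV columns -- adapt to your actual file layout.
DROPPED = {"ssn", "phone", "address"}  # useless downstream: remove entirely
FAKE_NAMES = ["Alice Martin", "John Miller", "Maria Garcia"]  # pseudonyms

def anonymize_row(row):
    """Drop direct identifiers, reduce precision, pseudonymize the rest."""
    out = {k: v for k, v in row.items() if k not in DROPPED}
    if "birthdate" in out:
        # Keep only the age (from the birth year), not the exact date.
        out["age"] = str(date.today().year - int(out.pop("birthdate")[:4]))
    if "name" in out:
        # Pseudonymize: replace the real name with a randomly picked one.
        out["name"] = random.choice(FAKE_NAMES)
    return out

def anonymize_csv(text):
    """Anonymize a whole CSV document and return it as a string."""
    rows = [anonymize_row(r) for r in csv.DictReader(io.StringIO(text))]
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=list(rows[0].keys()))
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()

def handler(event, context):
    """S3 Object Lambda entry point: transform the object on the fly."""
    import boto3  # available in the Lambda runtime
    ctx = event["getObjectContext"]
    # S3 provides a presigned URL to fetch the original, raw object.
    original = urllib.request.urlopen(ctx["inputS3Url"]).read().decode("utf-8")
    boto3.client("s3").write_get_object_response(
        Body=anonymize_csv(original),
        RequestRoute=ctx["outputRoute"],
        RequestToken=ctx["outputToken"],
    )
    return {"status_code": 200}
```

&lt;p&gt;Keeping the pure &lt;code&gt;anonymize_csv&lt;/code&gt; function separate from the handler makes the transformation unit-testable without any AWS dependency.&lt;/p&gt;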



&lt;p&gt;My Lambda function is really simple; if you would like something more production-ready, I encourage you to have a look at the AWS samples mentioned above.&lt;/p&gt;

&lt;p&gt;Once the function is created and deployed, we need to &lt;a href="https://amzn.to/2TUZrh1" rel="noopener noreferrer"&gt;create an Access Point&lt;/a&gt;. &lt;a href="https://amzn.to/36p8mtH" rel="noopener noreferrer"&gt;Amazon S3 Access Points&lt;/a&gt; simplify managing data access for applications using shared data sets on S3, exactly what we want to do here. Using the AWS CLI:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;aws s3control create-access-point &lt;span class="nt"&gt;--account-id&lt;/span&gt; 012345678912 &lt;span class="nt"&gt;--name&lt;/span&gt; anonymized-access &lt;span class="nt"&gt;--bucket&lt;/span&gt; my-bucket-with-cid
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;


&lt;p&gt;Then we create the &lt;a href="https://amzn.to/3hQS426" rel="noopener noreferrer"&gt;Object Lambda Access Point&lt;/a&gt;. It makes the Lambda function act as a proxy in front of your access point. To do so with the AWS CLI, we need a JSON configuration file. Be sure to replace the account id, region, access point name (previously created) and function ARN with your own:&lt;/p&gt;


&lt;div class="ltag_gist-liquid-tag"&gt;
  
&lt;/div&gt;
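&lt;p&gt;The configuration follows the &lt;code&gt;ObjectLambdaConfiguration&lt;/code&gt; shape; here is a sketch (the function name is an example, and the ARNs must be replaced with yours):&lt;/p&gt;

```json
{
  "SupportingAccessPoint": "arn:aws:s3:eu-central-1:012345678912:accesspoint/anonymized-access",
  "TransformationConfigurations": [
    {
      "Actions": ["GetObject"],
      "ContentTransformation": {
        "AwsLambda": {
          "FunctionArn": "arn:aws:lambda:eu-central-1:012345678912:function:anonymize-function"
        }
      }
    }
  ]
}
```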



&lt;p&gt;Finally, we &lt;a href="https://amzn.to/3e0v2V8" rel="noopener noreferrer"&gt;create the Object Lambda Access Point&lt;/a&gt; using the following command:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;aws s3control create-access-point-for-object-lambda &lt;span class="nt"&gt;--account-id&lt;/span&gt; 012345678912 &lt;span class="nt"&gt;--name&lt;/span&gt; anonymize-lambda-accesspoint &lt;span class="nt"&gt;--configuration&lt;/span&gt; file://anonymize-lambda-accesspoint.json
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;And that’s it! You can now test your access point and the anonymization process with a simple get. Note that you don’t perform a get directly on the S3 bucket, but on the access point previously created, using its ARN, just like that:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;aws s3api get-object &lt;span class="nt"&gt;--bucket&lt;/span&gt; arn:aws:s3-object-lambda:eu-central-1:012345678912:accesspoint/anonymize-lambda-accesspoint &lt;span class="nt"&gt;--key&lt;/span&gt; patients.csv ./anonymized.csv
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;You can now provide this access point ARN to the analytics application so it can retrieve anonymized data and perform whatever it needs to.&lt;/p&gt;

&lt;h2&gt;
  
  
  Conclusion
&lt;/h2&gt;

&lt;p&gt;In this article, I’ve shared how to leverage S3 Object Lambda in order to anonymize your data. In just a few commands and a bit of code, we can safely share data containing identifying information with other applications without duplicating it or building a complex infrastructure.&lt;/p&gt;

&lt;p&gt;Note that you can use the same technique to enrich data (retrieving information from a database), modify it on the fly (e.g. image resizing), or change its format (e.g. XML to JSON, CSV to Parquet, …). I’m sure you will find your own use cases too.&lt;/p&gt;

&lt;p&gt;The code of this article is available on &lt;a href="https://bit.ly/36u9yMc" rel="noopener noreferrer"&gt;GitHub&lt;/a&gt;, together with a full &lt;a href="https://bit.ly/3wuHG5s" rel="noopener noreferrer"&gt;SAM template&lt;/a&gt; to create everything (bucket, access points and Lambda function).&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Photos by &lt;a href="https://unsplash.com/@jdent?utm_source=dev.to&amp;amp;utm_medium=referral"&gt;Jason Dent&lt;/a&gt; and &lt;a href="https://unsplash.com/@tar1k?utm_source=dev.to&amp;amp;utm_medium=referral"&gt;Tarik Haiga&lt;/a&gt; on &lt;a href="https://unsplash.com/?utm_source=dev.to&amp;amp;utm_medium=referral"&gt;Unsplash&lt;/a&gt;&lt;/em&gt;&lt;/p&gt;

</description>
      <category>security</category>
      <category>aws</category>
      <category>cloud</category>
      <category>serverless</category>
    </item>
  </channel>
</rss>
