<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Dakota Riley</title>
    <description>The latest articles on DEV Community by Dakota Riley (@dakotariley).</description>
    <link>https://dev.to/dakotariley</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F1507534%2F929f2dd6-a2f5-4980-80bd-ed4add767ca5.jpeg</url>
      <title>DEV Community: Dakota Riley</title>
      <link>https://dev.to/dakotariley</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/dakotariley"/>
    <language>en</language>
    <item>
      <title>Building an AWS GuardDuty Alert Triage Agent</title>
      <dc:creator>Dakota Riley</dc:creator>
      <pubDate>Sun, 27 Jul 2025 23:28:29 +0000</pubDate>
      <link>https://dev.to/aws-builders/building-an-aws-guardduty-alert-triage-agent-51e9</link>
      <guid>https://dev.to/aws-builders/building-an-aws-guardduty-alert-triage-agent-51e9</guid>
      <description>&lt;h2&gt;
  
  
  Overview
&lt;/h2&gt;

&lt;p&gt;I've been interested in exploring the application of AI Agents to security automation use cases for a while, so I built one.&lt;/p&gt;

&lt;p&gt;AWS GuardDuty is a threat detection service in AWS that produces alerts (called findings) for hundreds of behaviors in AWS that could be considered malicious. From a detection perspective, Cloud APIs are essentially a giant LOLBin-as-a-Service: pretty much everything has a legitimate use case, and most alerts require additional contextualization and investigation to be worth acting on. &lt;/p&gt;

&lt;h2&gt;
  
  
  What makes an AI Agent?
&lt;/h2&gt;

&lt;p&gt;AI Agents have been all the hype lately, but what actually are they? In short, most foundation models support capabilities beyond sending text and getting a text response back, and those capabilities help with building applications around them. For example:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Structured outputs&lt;/strong&gt; - this lets you ask the LLM to return its response matching a provided JSON Schema, which is very useful for programmatically consuming the outputs.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Tool usage&lt;/strong&gt; - you can provide the LLM with a set of "tools", which allow it to request that external code or functionality be executed. This is typically implemented in the form of functions in your agent's code that are called upon request, and the response is returned to the LLM.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Model Context Protocol (MCP) Integrations&lt;/strong&gt; - arguably this counts as tool-calling, but MCP provides a standardized way for services/external systems to advertise capabilities to LLM agents, instead of having to write tool functions in the agent's code itself. &lt;/p&gt;
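
&lt;p&gt;To make the tool usage idea concrete, here is a minimal standard-library sketch of the dispatch step: the model asks for a tool by name, the agent's code runs the matching function, and the result goes back as JSON. The registry, the &lt;code&gt;get_finding_severity&lt;/code&gt; tool, and the request format are all made up for illustration; frameworks such as PydanticAI handle this wiring for you and use their own formats.&lt;/p&gt;

```python
import json

# Hypothetical tool registry: maps tool names to plain Python functions.
TOOLS = {}


def tool(func):
    """Register a function so the agent loop can call it by name."""
    TOOLS[func.__name__] = func
    return func


@tool
def get_finding_severity(finding_id):
    # Stand-in for a real lookup; returns canned data for illustration.
    return {"finding_id": finding_id, "severity": 5.0}


def handle_tool_request(raw_request):
    """Run the tool named in a JSON tool request and return a JSON result."""
    request = json.loads(raw_request)
    func = TOOLS[request["tool"]]
    result = func(**request["args"])
    return json.dumps(result)


# A tool request as a model might emit it, written by hand for this sketch.
model_request = '{"tool": "get_finding_severity", "args": {"finding_id": "abc123"}}'
tool_result = handle_tool_request(model_request)
print(tool_result)
```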

&lt;h2&gt;
  
  
  Building The Agent!
&lt;/h2&gt;

&lt;p&gt;Tech I used:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;&lt;a href="https://ai.pydantic.dev/" rel="noopener noreferrer"&gt;PydanticAI&lt;/a&gt;&lt;/strong&gt;: Library for building the agent. There are multiple frameworks out there however.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;&lt;a href="https://pydantic.dev/logfire" rel="noopener noreferrer"&gt;Pydantic Logfire&lt;/a&gt;&lt;/strong&gt;: Observability offering for monitoring your agent's interactions with a generous free tier, powered by oTel&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Discord&lt;/strong&gt;: Somewhere nice to send the output to&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;AWS GuardDuty&lt;/strong&gt;: Nice but noisy alert generation service 😄&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Foundation model APIs&lt;/strong&gt; (GPT, Claude, etc)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;&lt;a href="https://github.com/DataDog/stratus-red-team" rel="noopener noreferrer"&gt;Stratus Red Team&lt;/a&gt;&lt;/strong&gt;: Detonate different TTPs to trigger GuardDuty findings&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Getting started with PydanticAI is pretty straightforward. I wanted to define my structured outputs first, so I created an AlertAssessment class that represents the format I want the agent to return its output in:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;AlertAssessment&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;BaseModel&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="n"&gt;GuardDutyAlertTitle&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;
    &lt;span class="n"&gt;GuardDutyAlertDescription&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;
    &lt;span class="n"&gt;Conclusion&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;AlertConclusion&lt;/span&gt;
    &lt;span class="n"&gt;Description&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;
    &lt;span class="n"&gt;AlertTimeline&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;List&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;AlertTimelineEvent&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;Field&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="n"&gt;description&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;A chronological list of events that occurred during the alert. This should include resource creation, activities taken by the calling identity, etc&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
    &lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;ActionLog&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;List&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;InvestigationAction&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;Field&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="n"&gt;description&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;A chronological log of investigation actions taken to assess this alert. Include guardduty searches, cloudtrail searches, etc&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
    &lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;RelatedAlerts&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;List&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;RelatedAlert&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;Field&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="n"&gt;description&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Other GuardDuty alerts that may be related to this alert based on timing, resources, or actors involved&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
    &lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;Evidence&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;Optional&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;List&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;EvidenceItem&lt;/span&gt;&lt;span class="p"&gt;]]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;Field&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="n"&gt;default&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;None&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;description&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;A list of evidence items that support the conclusion of the alert. This should include guardduty searches, cloudtrail events IDs, resource arns, etc&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
    &lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The field descriptions are included in the JSON Schema provided to the agent, and they give useful context about what belongs in each field. I also defined an AlertConclusion enum with a set of allowed values, because I want the agent to classify the alert as one of the following:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;AlertConclusion&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;StrEnum&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="n"&gt;MALICIOUS&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;malicious&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
    &lt;span class="n"&gt;RED_TEAM_ACTIVITY&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;red_team_activity&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
    &lt;span class="n"&gt;NON_MALICIOUS&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;non_malicious&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
    &lt;span class="n"&gt;GENERATED_FINDING&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;generated_finding&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
    &lt;span class="n"&gt;INCONCLUSIVE&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;inconclusive&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
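
&lt;p&gt;Restricting the conclusion to an enum means a value outside the allowed set fails validation instead of flowing downstream. The standard-library sketch below shows that behavior on its own. It uses the &lt;code&gt;str, Enum&lt;/code&gt; mixin rather than &lt;code&gt;StrEnum&lt;/code&gt; so it also runs on Python versions before 3.11, and &lt;code&gt;parse_conclusion&lt;/code&gt; is a hypothetical helper; in the agent itself, Pydantic performs this kind of check when it validates the output model.&lt;/p&gt;

```python
from enum import Enum


class AlertConclusion(str, Enum):
    MALICIOUS = "malicious"
    RED_TEAM_ACTIVITY = "red_team_activity"
    NON_MALICIOUS = "non_malicious"
    GENERATED_FINDING = "generated_finding"
    INCONCLUSIVE = "inconclusive"


def parse_conclusion(value):
    """Return the matching enum member, or None if the value is not allowed."""
    try:
        return AlertConclusion(value)
    except ValueError:
        return None


print(parse_conclusion("malicious"))
print(parse_conclusion("probably_fine"))
```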



&lt;p&gt;Instantiating an agent is relatively easy and consistent across most of these LLM agent frameworks. Note that we reference our defined model for AlertAssessment in the &lt;code&gt;output_type&lt;/code&gt; field:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;AnthropicModel&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;AnthropicModel&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;claude-3-5-sonnet-20241022&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="n"&gt;agent&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;Agent&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;instrument&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;True&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;output_type&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;AlertAssessment&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;UserInquiry&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
    &lt;span class="n"&gt;allow_arbitrary_tools&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;True&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;system_prompt&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;You are a cloud security expert triaging AWS GuardDuty alerts. &lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;
        &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;Retrieve and assess the specified alert using provided tools. &lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;
        &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;Skip generated findings (fields starting with &lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Generated&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;). &lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;
     &lt;span class="bp"&gt;...&lt;/span&gt;
     &lt;span class="bp"&gt;...&lt;/span&gt;
     &lt;span class="bp"&gt;...&lt;/span&gt;
    &lt;span class="p"&gt;))&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;I want to give the LLM some tools to help with triaging the alerts. The agent should be able to retrieve the GuardDuty alert in full, query CloudTrail, obtain metadata about resources, and search GuardDuty for related findings. PydanticAI lets you define Python functions as tools by adding a decorator to the functions you want to expose:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="nd"&gt;@agent.tool_plain&lt;/span&gt;
&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;get_guardduty_alert&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;finding_id&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="nb"&gt;dict&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
  &lt;span class="bp"&gt;...&lt;/span&gt;

&lt;span class="nd"&gt;@agent.tool_plain&lt;/span&gt;
&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;get_cloudtrail_events_for_resource_name&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;resource_name&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;startTime&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;datetime&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="bp"&gt;None&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;endTime&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;datetime&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="bp"&gt;None&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;List&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nb"&gt;dict&lt;/span&gt;&lt;span class="p"&gt;]:&lt;/span&gt; 
  &lt;span class="bp"&gt;...&lt;/span&gt;

&lt;span class="nd"&gt;@agent.tool_plain&lt;/span&gt;
&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;get_cloudtrail_events_for_identity&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;userName&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;startTime&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;datetime&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="bp"&gt;None&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;endTime&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;datetime&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="bp"&gt;None&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;List&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nb"&gt;dict&lt;/span&gt;&lt;span class="p"&gt;]:&lt;/span&gt;
  &lt;span class="bp"&gt;...&lt;/span&gt;
&lt;span class="nd"&gt;@agent.tool_plain&lt;/span&gt;
&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;search_guardduty_findings&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;criteria_field&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="bp"&gt;None&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;criteria_values&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;List&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="bp"&gt;None&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;since&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;datetime&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="bp"&gt;None&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;List&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nb"&gt;dict&lt;/span&gt;&lt;span class="p"&gt;]:&lt;/span&gt;
  &lt;span class="bp"&gt;...&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Not shown are the function docstrings, which, like the field descriptions, are passed to the LLM as part of the tool definitions and help it decide when and how to use each tool.&lt;/p&gt;
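
&lt;p&gt;As a rough illustration of why docstrings matter, the sketch below builds a simple tool description from a function's name, docstring, and parameters using only the standard library. The docstring text and the &lt;code&gt;describe_tool&lt;/code&gt; helper are assumptions for this example; PydanticAI builds its own, richer tool schema, and this is not its implementation.&lt;/p&gt;

```python
import inspect


def get_guardduty_alert(finding_id: str) -> dict:
    """Retrieve the full GuardDuty finding for the given finding ID."""
    # Stand-in body; the real tool would call the GuardDuty API.
    return {}


def describe_tool(func):
    """Build a simple tool description from a function's name, docstring, and parameters."""
    signature = inspect.signature(func)
    return {
        "name": func.__name__,
        "description": inspect.getdoc(func),
        "parameters": list(signature.parameters),
    }


tool_definition = describe_tool(get_guardduty_alert)
print(tool_definition)
```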

&lt;p&gt;While developing, I iterated locally by calling my bot from Discord to triage selected GuardDuty finding types and leaned on Logfire to dive into problematic runs. This was particularly useful because I could see the tool calls, arguments, results, and token usage without having to manually instrument the code:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Flgtlw07deg1j5cec71k7.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Flgtlw07deg1j5cec71k7.png" alt=" " width="773" height="617"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  Summarizing what I have built:
&lt;/h3&gt;

&lt;p&gt;A user in a Discord channel can invoke the bot using &lt;code&gt;!triage&lt;/code&gt; followed by a request to triage a specific GuardDuty finding. Both "Triage the latest GuardDuty alert" and "Triage GuardDuty finding 2139f904fk902kf302" work.&lt;/p&gt;

&lt;p&gt;The Agent will retrieve the specific GuardDuty finding using its &lt;code&gt;get_guardduty_alert&lt;/code&gt; and &lt;code&gt;search_guardduty_findings&lt;/code&gt; tools.&lt;/p&gt;

&lt;p&gt;The Agent will use its tools for searching CloudTrail, GuardDuty for related findings, and retrieving metadata about resources (IAM Roles, Users, EC2 Instances) to gain additional context around the alert. Quick note that I used the CloudTrail &lt;a href="https://docs.aws.amazon.com/awscloudtrail/latest/APIReference/API_LookupEvents.html" rel="noopener noreferrer"&gt;LookupEvents&lt;/a&gt; API here instead of a SIEM to keep it simple. &lt;/p&gt;

&lt;p&gt;Finally, the agent will reply with a halfway decent looking structured response in a &lt;a href="https://discordjs.guide/popular-topics/embeds.html#embed-preview" rel="noopener noreferrer"&gt;Discord Embed&lt;/a&gt; and give the user some clickable buttons to escalate or discard the alert:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fjfflft1ocic86w66tm58.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fjfflft1ocic86w66tm58.png" alt=" " width="800" height="360"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Putting it to the test
&lt;/h2&gt;

&lt;p&gt;First, I grabbed one of the IAM Anomalous Behavior GuardDuty findings, which I generated myself during normal activity (pretty sure I triggered this browsing around one of my AWS accounts...)&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Input&lt;/strong&gt;: &lt;code&gt;!triage the latest anomalous access finding involving the IAM User dakota-macbook-aquia&lt;/code&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Result&lt;/strong&gt;:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Flzxpogfudgsi7bu69468.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Flzxpogfudgsi7bu69468.png" alt=" " width="537" height="869"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Okay, so not terrible. The assessment of &lt;code&gt;non_malicious&lt;/code&gt; was correct, and there really wasn't much else to indicate anything about this session being malicious. We also told the agent in the system prompt to bias towards a non_malicious assessment, unless it can prove otherwise.&lt;/p&gt;

&lt;p&gt;Next, I used the &lt;code&gt;aws guardduty create-sample-findings&lt;/code&gt; CLI command to generate a fake finding for an EC2 Instance communicating with a Tor entry node:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Input&lt;/strong&gt;: &lt;code&gt;!triage 362af2710dce4ce294c09e2034092ae4&lt;/code&gt; (The direct finding ID to make it easy)&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Result&lt;/strong&gt;: &lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fqce565qqdrav87756yw8.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fqce565qqdrav87756yw8.png" alt=" " width="534" height="713"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The agent followed its instructions not to attempt further triage through tool calls on generated findings (saving tokens and money). It was also right about the &lt;a href="https://www.google.com/search?q=198.51.100.0+guardduty+site%3Aamazon.com" rel="noopener noreferrer"&gt;particular IP range appearing all over the AWS documentation&lt;/a&gt;, neat!&lt;/p&gt;

&lt;p&gt;Finally, I used Stratus Red Team to detonate the &lt;a href="https://stratus-red-team.cloud/attack-techniques/AWS/aws.credential-access.ec2-steal-instance-credentials/" rel="noopener noreferrer"&gt;Steal EC2 Credentials&lt;/a&gt; technique, which triggers &lt;a href="https://docs.aws.amazon.com/guardduty/latest/ug/guardduty_finding-types-iam.html#unauthorizedaccess-iam-instancecredentialexfiltrationoutsideaws" rel="noopener noreferrer"&gt;&lt;strong&gt;UnauthorizedAccess:IAMUser/InstanceCredentialExfiltration.OutsideAWS&lt;/strong&gt;&lt;/a&gt;. I figured this one would be fun because it could technically be classified as either Malicious or Red Team Activity. I asked our GuardDuty agent friend to triage it: &lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Input&lt;/strong&gt;: &lt;code&gt;!triage aacbb621e81baa25d482ac989736d09f&lt;/code&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Output&lt;/strong&gt;:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fm5l73kqhc1onw5xn5edb.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fm5l73kqhc1onw5xn5edb.png" alt=" " width="533" height="906"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;There is a lot to take in here. The agent classified it as &lt;code&gt;red_team_activity&lt;/code&gt;, which is correct in this scenario. It explicitly calls out the Terraform user agent (which is used by Stratus under the hood for infrastructure provisioning), as well as the naming of the Instance Profile &lt;code&gt;stratus-red-team-ec2-steal-credentials-role&lt;/code&gt;, and tag applied to the instance of &lt;code&gt;StratusRedTeam: true&lt;/code&gt;. &lt;/p&gt;

&lt;p&gt;The most impressive observation was that the IP address that created the EC2 instance was the same one that used the stolen credentials. &lt;/p&gt;

&lt;p&gt;One thing I did notice: the classification for a given GuardDuty alert stayed consistent across repeated runs, but the reasoning and supporting details varied from session to session.&lt;/p&gt;

&lt;h2&gt;
  
  
  Design thoughts and iterations
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Generic Tool Functions vs Specific
&lt;/h3&gt;

&lt;p&gt;At first, I tried to use very generic tool definitions. For example, &lt;code&gt;search_cloudtrail&lt;/code&gt; was a tool I implemented that wrapped the AWS CloudTrail LookupEvents API 1:1. The agent really struggled with the LookupEvents API, passing incorrect fields as arguments. I instead abstracted this away into more specific query functions, &lt;code&gt;get_cloudtrail_events_for_resource_name&lt;/code&gt; and &lt;code&gt;get_cloudtrail_events_for_identity&lt;/code&gt;, and it had much more success. &lt;/p&gt;
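
&lt;p&gt;A specific query function mostly hides the LookupEvents argument shape from the model. Here is a minimal sketch of that idea, not the code from this project: &lt;code&gt;build_identity_lookup&lt;/code&gt; is a hypothetical helper, and the 24-hour default window is an assumption. It only builds the request arguments, so it runs without AWS access.&lt;/p&gt;

```python
from datetime import datetime, timedelta, timezone


def build_identity_lookup(user_name, start_time=None, end_time=None):
    """Build LookupEvents keyword arguments for a single identity."""
    # Assumed defaults: look back 24 hours from now (UTC).
    end_time = end_time or datetime.now(timezone.utc)
    start_time = start_time or end_time - timedelta(hours=24)
    return {
        "LookupAttributes": [
            {"AttributeKey": "Username", "AttributeValue": user_name}
        ],
        "StartTime": start_time,
        "EndTime": end_time,
    }


lookup_kwargs = build_identity_lookup(
    "example-user",
    start_time=datetime(2025, 7, 1, tzinfo=timezone.utc),
    end_time=datetime(2025, 7, 2, tzinfo=timezone.utc),
)
# With boto3 installed and credentials configured, these could be passed as:
# boto3.client("cloudtrail").lookup_events(**lookup_kwargs)
print(lookup_kwargs["LookupAttributes"])
```

&lt;p&gt;The model only has to supply a user name and an optional time range, which leaves far less room for malformed arguments than exposing &lt;code&gt;LookupAttributes&lt;/code&gt; directly.&lt;/p&gt;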

&lt;p&gt;There are several AWS MCP offerings that expose AWS APIs to LLMs and agents, but this is one reason I avoided them here: I didn't want to load the context with more than was needed, and most of the MCP server implementations had the same issue as my &lt;code&gt;search_cloudtrail&lt;/code&gt; tool above: too generic. &lt;/p&gt;

&lt;p&gt;Part of me wonders if my usage of the &lt;code&gt;LookupEvents&lt;/code&gt; API itself was a cause of this, as the arguments for the API are a bit unintuitive, and most security folks out there will be using some sort of SIEM, and not this API directly (meaning there probably wasn't tons of training data on this). &lt;/p&gt;

&lt;h3&gt;
  
  
  Dynamically selecting a markdown playbook and adding that to the context
&lt;/h3&gt;

&lt;p&gt;I had this idea of storing alert playbooks as code (markdown), accessible to the LLM, giving it a function to list all available playbooks, and allowing it to select the appropriate one based on the alert. One could even dynamically provide different sets of tools based on the alert/playbook to help limit the amount of unneeded context. Sadly, this landed in the "for next time" category. &lt;/p&gt;
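
&lt;p&gt;Since this part was never built, the following is only a sketch of how the two playbook functions might look, using the standard library. The function names, the one-markdown-file-per-playbook layout, and the sample playbooks are all assumptions made for illustration.&lt;/p&gt;

```python
import tempfile
from pathlib import Path


def list_playbooks(playbook_dir):
    """Return the names of the markdown playbooks in a directory, sorted."""
    return sorted(path.stem for path in Path(playbook_dir).glob("*.md"))


def load_playbook(playbook_dir, name):
    """Return a playbook's markdown text; only names from list_playbooks are accepted."""
    if name not in list_playbooks(playbook_dir):
        raise ValueError(f"Unknown playbook: {name}")
    return (Path(playbook_dir) / f"{name}.md").read_text(encoding="utf-8")


# Demo with a temporary directory and two made-up playbooks.
with tempfile.TemporaryDirectory() as tmp:
    (Path(tmp) / "tor_client.md").write_text("# Tor client", encoding="utf-8")
    (Path(tmp) / "credential_exfiltration.md").write_text("# Credential exfiltration", encoding="utf-8")
    available = list_playbooks(tmp)
    tor_text = load_playbook(tmp, "tor_client")

print(available)
```

&lt;p&gt;Checking the requested name against the listing keeps the model from reading arbitrary paths if it invents a playbook name.&lt;/p&gt;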

&lt;h3&gt;
  
  
  Deterministic vs letting the LLM handle things
&lt;/h3&gt;

&lt;p&gt;In my code, I have an &lt;code&gt;ActionLog&lt;/code&gt; field that allows the LLM to report the actions it has taken in triaging the alert. If I were to write this again, I would instead make this deterministic via a decorator on all tool functions that records each call, with the resulting log returned alongside the LLM's response. I also had some challenges with the LLM returning dates without a timezone, which made the response confusing. &lt;/p&gt;
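
&lt;p&gt;One way the deterministic version could look is a decorator that appends every tool call to a log, with timestamps always recorded in UTC so the timezone is never ambiguous. This is a sketch of the idea rather than code from the project; &lt;code&gt;ACTION_LOG&lt;/code&gt; and &lt;code&gt;logged_tool&lt;/code&gt; are hypothetical names, and the tool body is a stand-in.&lt;/p&gt;

```python
import functools
from datetime import datetime, timezone

# Hypothetical in-memory log; a real agent would likely scope this per run.
ACTION_LOG = []


def logged_tool(func):
    """Record every call to the wrapped tool with a UTC timestamp."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        ACTION_LOG.append({
            "tool": func.__name__,
            "args": args,
            "kwargs": kwargs,
            "called_at": datetime.now(timezone.utc).isoformat(),
        })
        return func(*args, **kwargs)
    return wrapper


@logged_tool
def get_guardduty_alert(finding_id):
    # Stand-in body for illustration.
    return {"Id": finding_id}


get_guardduty_alert("abc123")
print(ACTION_LOG)
```

&lt;p&gt;Because &lt;code&gt;functools.wraps&lt;/code&gt; preserves the function's name and docstring, a decorator like this could in principle be stacked underneath the framework's tool decorator, although I have not verified that combination here.&lt;/p&gt;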

&lt;p&gt;I would also likely include the raw outputs of log searches in the response to allow the end user to QA/easily dig deeper if needed.&lt;/p&gt;

&lt;p&gt;I didn't really come away with a great answer, but I think recognizing that not everything has to be handled by the LLM and deterministic code still works just fine for a lot of things is a solid takeaway. Engineering is but a series of tradeoffs, right? &lt;/p&gt;

&lt;h3&gt;
  
  
  Pruning unneeded data
&lt;/h3&gt;

&lt;p&gt;I had a fun troubleshooting session where I gave the LLM a tool for looking up CloudTrail events, and it would eat up its entire context with the raw results of those queries across a large date range. I ended up adding logic to the event lookup tools to trim the data down to only the fields that were critical (UserName, UserAgent, EventName, Time, and EventId). If I had been using an actual SIEM, this probably would have been easier, but the challenge of sending only the relevant data to the LLM and preserving valuable context would absolutely still exist. &lt;/p&gt;
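
&lt;p&gt;The trimming step can be sketched as a small function applied to each event before it is returned to the model. The version below assumes the event shape returned by LookupEvents, where the user agent sits inside the JSON string in &lt;code&gt;CloudTrailEvent&lt;/code&gt;; &lt;code&gt;prune_event&lt;/code&gt; and the sample event are made up for illustration.&lt;/p&gt;

```python
import json
from datetime import datetime, timezone


def prune_event(event):
    """Keep only the fields needed for triage from one LookupEvents result."""
    # The full event record is a JSON string; the user agent lives inside it.
    detail = json.loads(event.get("CloudTrailEvent", "{}"))
    event_time = event.get("EventTime")
    return {
        "UserName": event.get("Username"),
        "UserAgent": detail.get("userAgent"),
        "EventName": event.get("EventName"),
        "Time": event_time.isoformat() if hasattr(event_time, "isoformat") else event_time,
        "EventId": event.get("EventId"),
    }


# Fabricated sample event, shaped like a LookupEvents result.
sample_event = {
    "EventId": "11111111-2222-3333-4444-555555555555",
    "EventName": "RunInstances",
    "EventTime": datetime(2025, 7, 1, 12, 0, tzinfo=timezone.utc),
    "Username": "example-user",
    "Resources": [],
    "CloudTrailEvent": json.dumps(
        {"userAgent": "example-agent/1.0", "requestParameters": {"big": "payload"}}
    ),
}
pruned = prune_event(sample_event)
print(pruned)
```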

&lt;h2&gt;
  
  
  Final thoughts:
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;No, I don't think we will be replacing skilled analysts anytime soon with an army of LLM agents. Human in the loop is going to be here for a long time. Even then, who will build the plumbing for all of that? &lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Real environments are dirty, undocumented, and disorganized snowflakes. This was my personal AWS Lab environment, which is much more simplistic.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;On a more positive note: I do feel LLMs have a lot of utility for those who are building SOAR automations. Tons of effort is spent on parsing out specific fields and using them to select the correct actions to take. Instead, we could build playbooks that handle multiple classes of alerts, are resistant to upstream schema modifications (when data source formats change), and can even handle scenarios not previously identified. Using LLMs to route and tie together the correct deterministic actions for an alert, to better help a human make an escalation decision, is where I see this going. &lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;There is LOTS of engineering to be had: managing what should be deterministic vs what the LLM handles, context windows, data engineering, cost management, etc. Turns out your AI Agent is just.... GASP.... software. Oh, and you can even write tests for your agent using &lt;a href="https://ai.pydantic.dev/evals/" rel="noopener noreferrer"&gt;evals&lt;/a&gt;. &lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;I am excited and optimistic about the future here. I also had a blast building this, and am going to continue to iterate on it. &lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Some people say LLMs/GenAI are overhyped, and others say they are an incredibly useful technology. What if both are true?&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

</description>
      <category>security</category>
      <category>aws</category>
      <category>cloud</category>
      <category>genai</category>
    </item>
    <item>
      <title>Tactical Cloud Audit Log Analysis with DuckDB - AWS CloudTrail</title>
      <dc:creator>Dakota Riley</dc:creator>
      <pubDate>Mon, 20 May 2024 22:14:23 +0000</pubDate>
      <link>https://dev.to/aws-builders/tactical-cloud-audit-log-analysis-with-duckdb-aws-cloudtrail-2amk</link>
      <guid>https://dev.to/aws-builders/tactical-cloud-audit-log-analysis-with-duckdb-aws-cloudtrail-2amk</guid>
      <description>&lt;h2&gt;
  
  
  Using DuckDB to query Cloud Provider audit logs when you don't have a SIEM available.
&lt;/h2&gt;

&lt;blockquote&gt;
&lt;p&gt;⚠️ Just want the code? Check out my gist &lt;a href="https://gist.github.com/rileydakota/da5f4e51d6b0f32867324420276d77a0"&gt;here&lt;/a&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;More than once, I have been in a situation where I needed to query CloudTrail logs but was working in a customer environment where they weren’t aggregated to a search interface. Another similar situation is when CloudTrail data events are disabled for cost reasons but need to be temporarily turned on for troubleshooting/audit purposes. While the CloudTrail console offers some (very) limited lookups (for management events only), and Athena is an option, what about &lt;a href="https://duckdb.org"&gt;DuckDB&lt;/a&gt;? &lt;/p&gt;

&lt;p&gt;DuckDB can both read files directly from S3 and parse JSON files into queryable tables. This post documents my working through that process! It assumes you already have DuckDB installed; if not, start &lt;a href="https://duckdb.org/docs/installation/?version=stable&amp;amp;environment=cli&amp;amp;platform=macos&amp;amp;download_method=package_manager"&gt;here&lt;/a&gt;.&lt;/p&gt;




&lt;h2&gt;
  
  
  Setup
&lt;/h2&gt;

&lt;p&gt;Start a DuckDB session. DuckDB can operate either fully in memory or use disk space to process datasets larger than your available memory. For a day of CloudTrail logs from a single account, in-memory mode should be fine, but we can use persistent storage mode so that our tables don't disappear when we exit:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;duckdb cloudtrail-analysis
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Next, load the &lt;a href="https://duckdb.org/docs/extensions/aws.html"&gt;AWS extension&lt;/a&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="n"&gt;INSTALL&lt;/span&gt; &lt;span class="n"&gt;AWS&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="k"&gt;LOAD&lt;/span&gt; &lt;span class="n"&gt;AWS&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This makes loading AWS credentials from your CLI profiles a bit easier than the default workflow. We can use it to load whatever credentials are configured in our environment or AWS CLI profile:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;CALL&lt;/span&gt; &lt;span class="n"&gt;load_aws_credentials&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
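
&lt;p&gt;If your credentials live in a named AWS CLI profile rather than the default one, the same function accepts a profile name. A minimal sketch, where &lt;code&gt;security-audit&lt;/code&gt; is a hypothetical profile name; depending on your DuckDB version, the httpfs extension that handles S3 reads may also need to be installed explicitly:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;-- httpfs provides s3:// support; recent versions usually autoload it,
-- so these two statements are only a safeguard
INSTALL httpfs;
LOAD httpfs;

-- 'security-audit' is a placeholder; substitute a profile from your AWS config
CALL load_aws_credentials('security-audit');
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;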



&lt;p&gt;Before we go down the SQL rabbit hole, let's consider the structure of CloudTrail logs as they get exported to S3:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="err"&gt;#&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;cloudtrail_file.json&lt;/span&gt;&lt;span class="w"&gt;

&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;   
    &lt;/span&gt;&lt;span class="nl"&gt;"Records"&lt;/span&gt;&lt;span class="p"&gt;:[&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
            &lt;/span&gt;&lt;span class="nl"&gt;"eventVersion"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"1.09"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
            &lt;/span&gt;&lt;span class="nl"&gt;"userIdentity"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
                &lt;/span&gt;&lt;span class="nl"&gt;"type"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"IAMUser"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
                &lt;/span&gt;&lt;span class="nl"&gt;"principalId"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"EXAMPLE6E4XEGITWATV6R"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
                &lt;/span&gt;&lt;span class="nl"&gt;"arn"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"arn:aws:iam::123456789012:user/Mary_Major"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
                &lt;/span&gt;&lt;span class="nl"&gt;"accountId"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"123456789012"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
            &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
         &lt;/span&gt;&lt;span class="err"&gt;...&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="err"&gt;...&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The takeaway here: each file is a single JSON object with a top-level key of &lt;strong&gt;Records&lt;/strong&gt;, an array containing the CloudTrail entries we are after. We will need to explode the records out of that array into table rows to make them useful. &lt;/p&gt;

&lt;p&gt;DuckDB’s &lt;strong&gt;read_json&lt;/strong&gt; function by default attempts to infer the schema of JSON files and sets the column data types accordingly. CloudTrail entries share a few common top-level fields but vary widely in their event-specific fields (e.g. requestParameters). We can use the &lt;strong&gt;maximum_depth&lt;/strong&gt; parameter on our read_json call to cap how deep schema detection goes; anything nested below that depth is left as the generic JSON type.&lt;/p&gt;

&lt;p&gt;To avoid redownloading the files from S3 over and over again, we can use the CREATE TABLE … AS statement (aka CTAS in the SQL world) to create a table from our read_json query:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;CREATE&lt;/span&gt; &lt;span class="k"&gt;TABLE&lt;/span&gt; &lt;span class="n"&gt;ct_raw&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;read_json&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'s3://org-cloudtrail-111122223333/AWSLogs/o-123456043/111122223333/CloudTrail/us-east-1/2024/05/19/*.gz'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;maximum_depth&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
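
&lt;p&gt;Before moving on, it can be worth confirming that the load produced what we expect. A quick sanity check, assuming the &lt;code&gt;ct_raw&lt;/code&gt; table from the statement above; since each CloudTrail file is one JSON object, the row count should roughly match the number of files under the prefix:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;-- Show the inferred column names and types
DESCRIBE ct_raw;

-- One row per log file that was read
SELECT count(*) AS files_loaded FROM ct_raw;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;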



&lt;p&gt;This gets us a table with a single column: Records with a data type of an array of JSON objects. Next, we can explode the list using &lt;strong&gt;&lt;a href="https://duckdb.org/docs/sql/query_syntax/unnest.html"&gt;unnest&lt;/a&gt;&lt;/strong&gt; to access the individual events:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;CREATE&lt;/span&gt; &lt;span class="k"&gt;TABLE&lt;/span&gt; &lt;span class="n"&gt;ct&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="k"&gt;unnest&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;Records&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="n"&gt;Event&lt;/span&gt; &lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;ct_raw&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
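
&lt;p&gt;At this point each row of &lt;code&gt;ct&lt;/code&gt; holds one event as a JSON value. A rough way to get a feel for what was captured, sketched here with &lt;code&gt;json_extract_string&lt;/code&gt; so that it does not depend on any later table:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;-- Ten most frequent API calls in the loaded logs
SELECT json_extract_string(Event, '$.eventName') AS eventName,
    count(*) AS events
FROM ct
GROUP BY eventName
ORDER BY events DESC
LIMIT 10;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;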



&lt;p&gt;The JSON data type allows us to access nested values using dot notation, which looks like this: &lt;code&gt;event.userIdentity.arn&lt;/code&gt;. While this offers some limited querying, the JSON data type isn't ideal when we want to filter on those values in a WHERE clause. To finish, we can extract the keys we care about into separate columns using &lt;code&gt;json_extract_string&lt;/code&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;CREATE&lt;/span&gt; &lt;span class="k"&gt;TABLE&lt;/span&gt; &lt;span class="n"&gt;cloudtrail_events&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="k"&gt;SELECT&lt;/span&gt;   &lt;span class="n"&gt;json_extract_string&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;event&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s1"&gt;'$.eventVersion'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="n"&gt;eventVersion&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;json_extract_string&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;event&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s1"&gt;'$.userIdentity.type'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="n"&gt;userType&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;json_extract_string&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;event&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s1"&gt;'$.userIdentity.principalId'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="n"&gt;principalId&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;json_extract_string&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;event&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s1"&gt;'$.userIdentity.arn'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="n"&gt;userArn&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;json_extract_string&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;event&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s1"&gt;'$.userIdentity.accountId'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="n"&gt;accountId&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;json_extract_string&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;event&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s1"&gt;'$.userIdentity.accessKeyId'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="n"&gt;accessKeyId&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;json_extract_string&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;event&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s1"&gt;'$.userIdentity.userName'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="n"&gt;userName&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="k"&gt;CAST&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;json_extract_string&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;event&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s1"&gt;'$.eventTime'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="nb"&gt;TIMESTAMP&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="n"&gt;eventTime&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;    
    &lt;span class="n"&gt;json_extract_string&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;event&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s1"&gt;'$.eventSource'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="n"&gt;eventSource&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;json_extract_string&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;event&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s1"&gt;'$.eventName'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="n"&gt;eventName&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;json_extract_string&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;event&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s1"&gt;'$.awsRegion'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="n"&gt;awsRegion&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;json_extract_string&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;event&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s1"&gt;'$.sourceIPAddress'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="n"&gt;sourceIPAddress&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;json_extract_string&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;event&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s1"&gt;'$.userAgent'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="n"&gt;userAgent&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;json_extract_string&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;event&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s1"&gt;'$.errorCode'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="n"&gt;errorCode&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;json_extract_string&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;event&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s1"&gt;'$.errorMessage'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="n"&gt;errorMessage&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;json_extract&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;event&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s1"&gt;'$.requestParameters'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="n"&gt;requestParameters&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;json_extract&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;event&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s1"&gt;'$.responseElements'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;responseElements&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;json_extract&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;event&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s1"&gt;'$.resources'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="n"&gt;resources&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;ct&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
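
&lt;p&gt;The requestParameters, responseElements, and resources columns stay as JSON because their shape varies by event, but individual keys can still be pulled out at query time. A sketch, assuming S3 events are present in the data; &lt;code&gt;bucketName&lt;/code&gt; is the key that S3 API calls typically record in requestParameters, so adjust the path for other services:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;-- Pull a single key out of the JSON column for S3 events only
SELECT eventTime, eventName,
    json_extract_string(requestParameters, '$.bucketName') AS bucketName
FROM cloudtrail_events
WHERE eventSource = 's3.amazonaws.com'
LIMIT 20;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;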



&lt;h2&gt;
  
  
  Query time!
&lt;/h2&gt;

&lt;p&gt;Some sample queries:&lt;/p&gt;

&lt;p&gt;All actions taken by a particular IAM Principal:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;select&lt;/span&gt; &lt;span class="n"&gt;eventName&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;eventTime&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;userAgent&lt;/span&gt; &lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="n"&gt;cloudtrail_events&lt;/span&gt; &lt;span class="k"&gt;where&lt;/span&gt; &lt;span class="n"&gt;arn&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s1"&gt;'REPLACE_ME'&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;All the unique error codes:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;select&lt;/span&gt; &lt;span class="k"&gt;distinct&lt;/span&gt; &lt;span class="n"&gt;errorCode&lt;/span&gt; &lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="n"&gt;cloudtrail_events&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
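
&lt;p&gt;To see which errors dominate and who is hitting them, a grouped count is often more useful than a distinct list. A sketch using only columns from the &lt;code&gt;cloudtrail_events&lt;/code&gt; table created earlier:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;-- Count failed calls per error code and principal, most frequent first
SELECT errorCode, userArn, count(*) AS occurrences
FROM cloudtrail_events
WHERE errorCode IS NOT NULL
GROUP BY errorCode, userArn
ORDER BY occurrences DESC
LIMIT 20;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;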



&lt;p&gt;Get all events in the past 24 hours:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;select&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="n"&gt;cloudtrail_events&lt;/span&gt; &lt;span class="k"&gt;where&lt;/span&gt; &lt;span class="n"&gt;eventtime&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;=&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;now&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt; &lt;span class="n"&gt;INTERVAL&lt;/span&gt; &lt;span class="s1"&gt;'1 day'&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
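
&lt;p&gt;Because &lt;code&gt;now()&lt;/code&gt; is evaluated at query time, the relative filter returns nothing once the loaded logs are more than a day old. A sketch using an explicit window instead, assuming the 2024/05/19 prefix from the load step (CloudTrail eventTime values are in UTC):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;-- Fixed one-day window; adjust the dates to match the prefix you loaded
SELECT eventTime, eventName, userArn
FROM cloudtrail_events
WHERE eventTime &amp;gt;= TIMESTAMP '2024-05-19 00:00:00'
  AND eventTime &amp;lt; TIMESTAMP '2024-05-20 00:00:00'
ORDER BY eventTime;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;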



&lt;p&gt;Happy querying!&lt;/p&gt;

</description>
      <category>aws</category>
      <category>security</category>
      <category>cloudsecurity</category>
      <category>duckdb</category>
    </item>
  </channel>
</rss>
