DEV Community

Mike Anderson

Building a Local AI SOC Analyst on an M1 MacBook Pro

How I solved a real SOC operations problem across Datadog, AWS, Cloudflare, Sysdig, and PagerDuty with an AI runner, a local AI harness, and a tricky model selection process

Executive Summary

We started with a practical SOC problem: build an AI-based SOC analyst that runs locally on an M1 MacBook Pro and helps with daily security operations across an existing cloud-native monitoring and alerting stack.

The environment already had strong telemetry and alerting coverage:

  • AWS CloudTrail
  • AWS Security Hub
  • Route53 VPC DNS Firewall
  • SES
  • SNS
  • Cloudflare logs
  • Application logs
  • GitHub audit logs crawler
  • Datadog Cloud Security detections
  • Datadog monitors for Kubernetes and AWS metrics
  • Datadog dashboards covering many SOC use cases
  • Sysdig runtime policies for Kubernetes
  • PagerDuty alert routing

The problem was not lack of logs or alerts. The real challenge was analyst workflow. The SOC still needed a repeatable way to review alerts, correlate evidence, summarize findings, identify missing context, and produce daily security notes without manually jumping between tools every time.

The working solution became a local AI SOC analyst pattern:

Ollama                Local model runner
llama3.2:3b           Stable default model for M1 daily SOC work
qwen3:8b              Optional larger model for focused deeper analysis
Python harness        SOC workflow, prompts, guardrails, and integrations
AI runner CLI         Analyst-facing command-line interface
Datadog               Primary log, signal, dashboard, and monitoring source
PagerDuty             Alert and incident routing source
Sysdig                Separate runtime policy signal source
Human analyst         Final decision authority

The important lesson was that the model alone was not the solution. The working solution came from combining the right model, a controlled harness, bounded prompts, use-case-driven analysis, and realistic expectations about local MacBook hardware.


The Original Problem

The goal was to build a local AI-based SOC analyst on an M1 MacBook Pro.

The main telemetry flow looked like this:

AWS CloudTrail
AWS Security Hub
Route53 VPC DNS Firewall
SES
SNS
Cloudflare logs
Application logs
GitHub audit logs crawler
        |
        v
Datadog
        |
        v
Datadog Cloud Security rules
Datadog monitors
Datadog dashboards
        |
        v
PagerDuty

Sysdig was separate:

Kubernetes runtime activity
        |
        v
Sysdig runtime policies
        |
        v
PagerDuty

That distinction mattered. Datadog was the central place for logs, detections, monitors, and dashboards. Sysdig was not sending its logs to Datadog, so Sysdig alerts had to be treated as a separate runtime security signal path.

The expected solution was not a generic local chatbot. The expected solution was a repeatable local SOC assistant that could support:

  • Daily SOC review
  • Alert triage
  • CloudTrail analysis
  • AWS Security Hub finding review
  • Route53 DNS Firewall activity review
  • SES and SNS activity review
  • Cloudflare security event review
  • GitHub audit log review
  • Application log review
  • PagerDuty incident summarization
  • Sysdig runtime alert review
  • SOC note drafting
  • Recommended follow-up queries

Key Design Decision: AI Should Not Replace Detection

We made one important architectural decision early: the local AI model should not become the detector.

Datadog and Sysdig already perform that role:

  • Datadog receives logs and metrics.
  • Datadog Cloud Security rules generate security signals.
  • Datadog monitors detect operational and Kubernetes-related issues.
  • Sysdig runtime policies detect Kubernetes runtime policy violations.
  • PagerDuty routes alerts from Datadog and Sysdig.

The local AI should sit above those systems as a triage and analysis layer.

That means the AI helps answer:

  • What happened?
  • Which user, workload, IP, service, account, repository, or API was involved?
  • Is this likely malicious, expected change, duplicate, benign true positive, or false positive?
  • What evidence is missing?
  • Which Datadog queries should be run next?
  • Should this be escalated?
  • What should the SOC note say?
  • Is containment recommended, and does it require human approval?

This keeps the control boundary clean. Detection stays with Datadog and Sysdig. Alerting stays with PagerDuty. The local AI helps the analyst move faster, ask better questions, and document the investigation more consistently.


Final Architecture

The final working architecture was intentionally simple:

              +------------------------------+
              | AWS / Cloudflare / GitHub    |
              | Apps / SES / SNS / DNS FW    |
              +---------------+--------------+
                              |
                              v
                         +---------+
                         | Datadog |
                         | Logs    |
                         | Signals |
                         | Metrics |
                         | Monitors|
                         +----+----+
                              |
                              v
                         +---------+
                         |PagerDuty|
                         +----+----+

       +------------------+        +---------+
       | Sysdig Runtime   |------->|PagerDuty|
       | Policies         |        +---------+
       +------------------+

                              |
                              v

              +------------------------------+
              | Local AI SOC Analyst         |
              | M1 MacBook Pro               |
              |                              |
              | Ollama                       |
              | llama3.2:3b / qwen3:8b       |
              | Python SOC Harness           |
              | AI Runner CLI                |
              +------------------------------+

The local AI analyst was designed as read-only first.

It can summarize, correlate, recommend, and draft. It should not automatically make production changes.

Human approval should still be required for actions such as:

  • Disabling IAM users
  • Rotating access keys
  • Blocking IPs globally
  • Changing Cloudflare WAF behavior
  • Muting Datadog monitors
  • Resolving PagerDuty incidents
  • Changing Sysdig policies
  • Quarantining Kubernetes workloads
  • Modifying production infrastructure

This matters because a wrong automated containment action can create a larger operational incident than the original alert.
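
One way to express this read-only-first posture in code is a default-deny approval gate. The following is a minimal sketch under stated assumptions, not the harness's actual implementation: the action names and the `gate` helper are hypothetical and exist only for illustration.

```python
from typing import Optional

# Hypothetical action names; the real harness may label actions differently.
READ_ONLY_ACTIONS = {"summarize", "correlate", "recommend", "draft"}


def requires_human_approval(action: str) -> bool:
    """Default-deny: anything not explicitly read-only needs a human."""
    return action not in READ_ONLY_ACTIONS


def gate(action: str, approved_by: Optional[str] = None) -> str:
    """Describe what the harness may do with a proposed action."""
    if not requires_human_approval(action):
        return f"run:{action}"
    if approved_by:
        return f"approved:{action}:by:{approved_by}"
    return f"blocked:{action}:needs_human_approval"


print(gate("summarize"))
print(gate("disable_iam_user"))
print(gate("rotate_access_key", approved_by="analyst"))
```

The default-deny check means a new or misspelled action name is blocked rather than silently executed.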


What the AI Runner Does

The AI runner is the analyst-facing command-line interface.

It is what we run during daily operations.

Examples:

python ai_runner.py triage-json samples/sample_cloudtrail_delete_trail.json \
  --use-case UC-006.3-cloudtrail-logging-disabled
python ai_runner.py security-signals --hours 24
python ai_runner.py pagerduty --hours 24
python ai_runner.py daily --hours 24 --out reports/daily_soc_report.md

The runner coordinates the work:

  1. Pull security data from the configured source.
  2. Select the right SOC prompt.
  3. Build a bounded event bundle.
  4. Send the prompt and evidence to Ollama.
  5. Receive structured analysis from the local model.
  6. Print the result or write a report.
  7. Keep the workflow repeatable.

The runner is not the intelligence layer by itself. Its value is operational discipline. It prevents the analyst from manually copying logs, manually selecting prompts, manually formatting output, and manually saving results every time.
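
The command surface shown above could be wired up with `argparse` roughly as follows. This is a sketch, assuming a subcommand layout that matches the example commands. The real `ai_runner.py` is not shown in this post, so `build_parser` and the default values are illustrative assumptions.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build a CLI parser mirroring the example commands (assumed layout)."""
    parser = argparse.ArgumentParser(prog="ai_runner.py")
    sub = parser.add_subparsers(dest="command", required=True)

    # triage-json <path> --use-case <id>
    triage = sub.add_parser("triage-json", help="Triage one JSON event file")
    triage.add_argument("path")
    triage.add_argument("--use-case", required=True)

    # security-signals / pagerduty share the same time-window flag.
    for name in ("security-signals", "pagerduty"):
        cmd = sub.add_parser(name)
        cmd.add_argument("--hours", type=int, default=24)  # default is assumed

    # daily --hours N --out <report path>
    daily = sub.add_parser("daily", help="Write a daily SOC report")
    daily.add_argument("--hours", type=int, default=24)  # default is assumed
    daily.add_argument("--out", default=None)
    return parser


args = build_parser().parse_args(["daily", "--hours", "6"])
print(args.command, args.hours)
```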


What the Harness Does

The harness is the control layer around the model.

This is the difference between a chatbot and a SOC workflow tool.

The harness handles:

  • Datadog API access
  • PagerDuty API access
  • Optional Sysdig API access
  • Use-case-specific prompts
  • SOC output structure
  • Context size limits
  • Model timeout configuration
  • Evidence-oriented analysis
  • Daily report generation
  • Read-only operating behavior
  • Repeatable command structure

The harness gives the model boundaries.

For SOC operations, this is critical. A local AI model should not receive an unbounded pile of logs and be asked, “Is anything bad?” That produces weak output and increases hallucination risk.

Instead, the harness asks focused questions:

  • Analyze this CloudTrail event for possible defense evasion.
  • Summarize Datadog security signals from the last 24 hours.
  • Review PagerDuty incidents for security relevance.
  • Draft a daily SOC report from bounded evidence.
  • Identify missing evidence and recommended follow-up queries.

The model reasons. The harness controls the task.
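
A bounded event bundle can be as simple as capping both the number of events and the total characters sent to the model. The sketch below illustrates the idea. `build_bounded_bundle` and its default limits are assumptions, not values taken from the actual harness.

```python
import json


def build_bounded_bundle(events, max_events=50, max_chars=12000):
    """Serialize a list of events into a prompt-sized bundle.

    Stops at whichever limit is reached first and reports how many
    events were left out, so the prompt can say so explicitly.
    """
    lines = []
    used = 0
    for event in events[:max_events]:
        line = json.dumps(event, sort_keys=True, separators=(",", ":"))
        # +1 accounts for the newline that joins the lines.
        if used + len(line) + 1 > max_chars:
            break
        lines.append(line)
        used += len(line) + 1
    omitted = len(events) - len(lines)
    return "\n".join(lines), omitted


bundle, omitted = build_bounded_bundle([{"id": i} for i in range(100)], max_events=10)
print(len(bundle.splitlines()), "events kept,", omitted, "omitted")
```

Returning the omitted count lets the prompt tell the model that the evidence is partial, which supports the missing-evidence discipline described here.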


Model Selection Strategy

At first, a larger model such as qwen3:8b looked attractive because the problem involved cloud logs, security reasoning, and structured analysis.

That was a reasonable starting point. Larger models can be useful when the event bundle is small and the question requires deeper reasoning.

However, the target machine was an M1 MacBook Pro, not a dedicated GPU workstation. That changed the practical answer.

During testing, the first small triage workflow succeeded, but the machine became sluggish. Later, the heavier daily report failed with a local Ollama timeout:

ReadTimeout: HTTPConnectionPool(host='127.0.0.1', port=11434): Read timed out. (read timeout=300)

That error was useful because it showed:

  • The Python harness was running.
  • The harness reached Ollama on localhost.
  • Ollama was processing the request.
  • The model did not complete within the configured timeout.

So the issue was not the SOC design. The issue was local inference load: model size, prompt size, timeout, and hardware limits.
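
For reference, a call to the local Ollama endpoint with a configurable timeout might look like the sketch below. It uses only the standard library rather than the `requests` client implied by the error above. The `SOC_OLLAMA_TIMEOUT` variable name and the helper functions are hypothetical.

```python
import json
import os
import urllib.request

OLLAMA_URL = "http://127.0.0.1:11434/api/generate"


def get_timeout_seconds(default: float = 300.0) -> float:
    """Read the timeout from SOC_OLLAMA_TIMEOUT (hypothetical variable name)."""
    return float(os.environ.get("SOC_OLLAMA_TIMEOUT", default))


def build_generate_request(model: str, prompt: str) -> urllib.request.Request:
    """Build a non-streaming request for Ollama's /api/generate endpoint."""
    payload = {"model": model, "prompt": prompt, "stream": False}
    return urllib.request.Request(
        OLLAMA_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def generate(model: str, prompt: str) -> str:
    """Send the prompt and return the model text; raises if the read times out."""
    request = build_generate_request(model, prompt)
    with urllib.request.urlopen(request, timeout=get_timeout_seconds()) as resp:
        return json.loads(resp.read())["response"]
```

Raising the timeout only buys time. If the prompt is too large for the machine, reducing the bundle is usually the more durable fix.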

The model strategy was adjusted:

Task                            Model                                  Why
Smoke testing                   llama3.2:3b                            Fast and stable on M1
Daily SOC report                llama3.2:3b                            More reliable for bounded daily reporting
Focused deeper investigation    qwen3:8b                               Useful when the event bundle is smaller
Large multi-source correlation  Avoid on M1 unless carefully limited   Can cause slowdowns or timeouts

The final default became:

SOC_MODEL=llama3.2:3b
SOC_FAST_MODEL=llama3.2:3b

This was the right operational tradeoff.

A smaller model that finishes reliably is more useful than a larger model that freezes the analyst workstation or times out during daily operations.
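
The model strategy can be captured in a small selection function. This is a sketch only: the task names, the `pick_model` helper, and the event-count threshold of 20 are assumptions for illustration. `SOC_MODEL` and `SOC_FAST_MODEL` are the variables shown above.

```python
import os

DEFAULT_MODEL = "llama3.2:3b"
DEEP_MODEL = "qwen3:8b"


def pick_model(task: str, event_count: int, env=None) -> str:
    """Choose a model for a task, preferring the small default on M1."""
    env = os.environ if env is None else env
    default = env.get("SOC_MODEL", DEFAULT_MODEL)
    fast = env.get("SOC_FAST_MODEL", DEFAULT_MODEL)
    if task == "smoke-test":
        return fast
    # Only reach for the larger model when the bundle is small (assumed threshold).
    if task == "deep-investigation" and event_count <= 20:
        return DEEP_MODEL
    return default


print(pick_model("deep-investigation", 5, env={}))
print(pick_model("daily", 200, env={}))
```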


Hardware Constraint: The M1 MacBook Pro Matters

The M1 MacBook Pro can run useful local AI workflows, but the workflow must be tuned.

The main constraints were:

  • Local model cold start time
  • Memory pressure
  • Swap usage
  • Large prompt size
  • Long generation time
  • Ollama timeout
  • Large 24-hour log bundles

The fix was not to abandon the local approach. The fix was to make the workflow smaller and more controlled:

Use a smaller default model.
Limit daily prompt size.
Start with 6-hour reports.
Increase to 24 hours after validation.
Increase the Ollama timeout where needed.
Avoid sending excessive raw logs to the model.
Use focused use-case prompts.

That is what made the solution usable.


Problems We Hit and How We Fixed Them

1. ollama ps Showing Nothing

When checking which model was running, ollama ps returned nothing.

That does not always mean something is broken.

ollama ps shows models currently loaded in memory. If the model finished and unloaded, it may show nothing.

Useful checks:

ollama list

Shows installed models.

ollama ps

Shows currently loaded models.

ollama run llama3.2:3b

Manually starts a model.

This distinction helped avoid misdiagnosing a normal Ollama state as a failure.
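
The same installed-versus-loaded distinction can be checked programmatically. Ollama's HTTP API exposes `GET /api/tags` for installed models and `GET /api/ps` for loaded models, and both return a `models` list. The sketch below only interprets payloads that have already been fetched. The `diagnose` helper and the trimmed sample payloads are illustrative assumptions.

```python
def model_names(payload: dict) -> set:
    """Collect model names from an /api/tags or /api/ps style payload."""
    return {m["name"] for m in payload.get("models", [])}


def diagnose(installed: dict, loaded: dict, wanted: str) -> str:
    """Explain the state of one model without treating 'idle' as a failure."""
    if wanted not in model_names(installed):
        return "not installed"
    if wanted not in model_names(loaded):
        return "installed, not loaded (normal when idle)"
    return "loaded"


# Trimmed sample payloads; real responses carry more fields per model.
installed = {"models": [{"name": "llama3.2:3b"}, {"name": "qwen3:8b"}]}
loaded = {"models": []}
print(diagnose(installed, loaded, "llama3.2:3b"))
```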


2. The Mac Was Freezing

The Mac became sluggish after running the local model.

The likely cause was local inference load, especially if a larger model was used.

The fix was to run the smaller model first:

SOC_MODEL=llama3.2:3b python ai_runner.py triage-json samples/sample_cloudtrail_delete_trail.json \
  --use-case UC-006.3-cloudtrail-logging-disabled

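For stability, Ollama can also be limited. These variables are read by the Ollama server, so set them in the environment that starts ollama serve: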

export OLLAMA_NUM_PARALLEL=1
export OLLAMA_MAX_LOADED_MODELS=1
export OLLAMA_KEEP_ALIVE=30m

3. Daily Report Timeout

The daily command failed because the model did not return within the configured timeout:

ReadTimeout: HTTPConnectionPool(host='127.0.0.1', port=11434): Read timed out. (read timeout=300)

The fix had three parts:

  1. Use llama3.2:3b for daily reports.
  2. Reduce the daily prompt size.
  3. Increase the local model timeout where appropriate.

A safer first run was:

SOC_MODEL=llama3.2:3b python ai_runner.py daily --hours 6 --out reports/daily_soc_report.md

Then scale to:

SOC_MODEL=llama3.2:3b python ai_runner.py daily --hours 24 --out reports/daily_soc_report.md

The lesson: daily reports should summarize bounded evidence, not feed unlimited raw logs into a local model.
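
One way to summarize before prompting is to collapse raw events into counts and send only the top groups to the model. This is a minimal sketch. The `source` and `event_name` field names and the `summarize_events` helper are assumptions about the event shape, not the harness's actual schema.

```python
from collections import Counter


def summarize_events(events, top_n=10):
    """Collapse raw events into (source, event name) counts for the prompt."""
    counts = Counter(
        (e.get("source", "unknown"), e.get("event_name", "unknown"))
        for e in events
    )
    return [
        {"source": source, "event_name": name, "count": count}
        for (source, name), count in counts.most_common(top_n)
    ]


events = [{"source": "cloudtrail", "event_name": "StopLogging"}] * 3
events += [{"source": "cloudflare", "event_name": "block"}]
for row in summarize_events(events):
    print(row)
```

A few dozen summary rows cost far fewer tokens than thousands of raw log lines, and the analyst can still pull the raw events for any group that looks suspicious.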


First Successful SOC Triage

The first successful test used a sample CloudTrail StopLogging event.

That is a meaningful test because attempts to stop CloudTrail logging may indicate defense evasion, unauthorized administrative activity, or compromised credentials.

The AI produced a high-risk SOC-style result similar to:

{
  "severity": "High",
  "confidence": 85,
  "disposition": "true_positive",
  "summary": "Suspicious attempt to stop CloudTrail logging...",
  "suspicious_indicators": [
    "StopLogging event by IAM user 'svc-deploy'",
    "Source IP 203.0.113.45",
    "User agent python-requests/2.32"
  ]
}

This proved the core workflow:

Local venv works.
Dependencies are installed.
AI runner executes.
Harness builds the prompt.
Ollama receives the request.
Local model returns SOC-style analysis.

The next improvement was to tighten expected output so the model always includes missing evidence and recommended follow-up queries. For production SOC use, those fields matter because they keep the analyst grounded in evidence.
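
Tightening the expected output can start with a simple validator that rejects responses lacking the required fields. The sketch below assumes the five field names from the sample result above plus two hypothetical ones, `missing_evidence` and `recommended_queries`. The real harness may name them differently.

```python
import json

# First five fields come from the sample output; the last two are assumed names.
REQUIRED_FIELDS = {
    "severity": str,
    "confidence": int,
    "disposition": str,
    "summary": str,
    "suspicious_indicators": list,
    "missing_evidence": list,
    "recommended_queries": list,
}


def validate_triage(raw: str) -> list:
    """Return a list of problems; an empty list means the output is usable."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError as exc:
        return [f"not valid JSON: {exc.msg}"]
    if not isinstance(data, dict):
        return ["top-level value is not an object"]
    problems = []
    for field, expected in REQUIRED_FIELDS.items():
        if field not in data:
            problems.append(f"missing field: {field}")
        elif not isinstance(data[field], expected):
            problems.append(f"wrong type for {field}")
    return problems


print(validate_triage('{"severity": "High"}'))
```

When validation fails, the harness could retry with a stricter prompt or flag the result for manual review instead of writing it into the report.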


Example SOC Use Cases

CloudTrail Logging Disabled

Use case:

UC-006.3-cloudtrail-logging-disabled

Purpose:

Investigate possible CloudTrail tampering or defense evasion.

Example command:

python ai_runner.py datadog-query \
  --query 'source:cloudtrail @evt.name:(StopLogging OR DeleteTrail OR UpdateTrail OR PutEventSelectors)' \
  --hours 24 \
  --use-case UC-006.3-cloudtrail-logging-disabled

Follow-up evidence should include:

  • Actor identity
  • Source IP
  • User agent
  • IAM permissions
  • Change ticket
  • Trail status after the event
  • Related IAM changes
  • Security Hub findings
  • Other Datadog signals for the same account or identity
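
Several of these evidence items can be lifted straight from the event before the model sees it. The sketch below assumes the raw CloudTrail record field names (`eventName`, `sourceIPAddress`, `userAgent`, `userIdentity`). Logs that have passed through Datadog may be remapped to different attribute names. The `extract_actor_facts` helper and the sample record are illustrative.

```python
def extract_actor_facts(record: dict) -> dict:
    """Pull observed facts out of a raw CloudTrail record."""
    identity = record.get("userIdentity", {})
    return {
        "event_name": record.get("eventName"),
        "event_time": record.get("eventTime"),
        "actor": identity.get("arn") or identity.get("userName"),
        "source_ip": record.get("sourceIPAddress"),
        "user_agent": record.get("userAgent"),
        "region": record.get("awsRegion"),
    }


# Illustrative record only; values mirror the sample triage output.
sample = {
    "eventName": "StopLogging",
    "sourceIPAddress": "203.0.113.45",
    "userAgent": "python-requests/2.32",
    "userIdentity": {"type": "IAMUser", "userName": "svc-deploy"},
}
print(extract_actor_facts(sample))
```

Fields that come back as `None` are candidates for the missing-evidence section of the triage output.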

IAM Privilege Escalation

Use case:

UC-007-iam-privilege-escalation

Example command:

python ai_runner.py datadog-query \
  --query 'source:cloudtrail @evt.name:(AttachUserPolicy OR PutUserPolicy OR CreateAccessKey OR UpdateAssumeRolePolicy OR PassRole)' \
  --hours 24 \
  --use-case UC-007-iam-privilege-escalation

The AI should help determine whether the activity was expected administration, automated deployment behavior, or suspicious privilege escalation.


Cloudflare WAF Activity

Use case:

UC-011-cloudflare-waf-attack

Example command:

python ai_runner.py datadog-query \
  --query 'source:cloudflare (@action:block OR @action:challenge OR @security_action:block)' \
  --hours 24 \
  --use-case UC-011-cloudflare-waf-attack

The AI should summarize source distribution, attacked paths, WAF actions, spike patterns, and whether any traffic bypassed protections.


Route53 DNS Firewall Activity

Use case:

UC-010-route53-dns-firewall-blocks

Example command:

python ai_runner.py datadog-query \
  --query '(source:route53resolverdnsfirewall OR source:route53) @action:block' \
  --hours 24 \
  --use-case UC-010-route53-dns-firewall-blocks

The AI should help identify suspicious domains, affected workloads, recurring clients, and whether the blocked activity suggests malware, misconfiguration, or expected testing.


GitHub Audit Risk

Use case:

UC-014-github-audit-risk

Example command:

python ai_runner.py datadog-query \
  --query 'source:github (@action:*deploy_key* OR @action:*repo* OR @action:*workflow* OR @action:*branch_protection*)' \
  --hours 24 \
  --use-case UC-014-github-audit-risk

The AI should focus on risky repository changes, workflow changes, deploy key activity, branch protection changes, and unusual administrative actions.

These are only a few examples. Many more use cases are possible, and the same architecture applies to each of them.


Daily SOC Workflow

The stable workflow became:

1. Start Ollama

ollama serve

2. Activate the project environment

cd /Users/tariqual/Documents/local_ai_soc_analyst
source .venv/bin/activate

3. Confirm model availability

ollama list

4. Run a smoke test

python ai_runner.py triage-json samples/sample_cloudtrail_delete_trail.json \
  --use-case UC-006.3-cloudtrail-logging-disabled

5. Run a safe daily report first

SOC_MODEL=llama3.2:3b python ai_runner.py daily --hours 6 --out reports/daily_soc_report.md

6. Run the full daily report after the safe run works

SOC_MODEL=llama3.2:3b python ai_runner.py daily --hours 24 --out reports/daily_soc_report.md

7. Review the output as an analyst

The report should be reviewed for:

  • P0 and P1 items
  • CloudTrail administrative changes
  • Security Hub critical or high findings
  • Cloudflare attack patterns
  • Route53 DNS Firewall blocks
  • SES or SNS abuse indicators
  • GitHub audit activity
  • PagerDuty incidents
  • Sysdig runtime alerts
  • Missing evidence
  • Recommended Datadog queries
  • Escalation or containment recommendations

The daily report is an analyst aid. It is not an automatic incident declaration.


Why This Works

The final solution works because it respects both the SOC workflow and the hardware.

It does not try to make the local model do everything.

It uses the existing security stack correctly:

Datadog detects and stores telemetry.
Sysdig detects runtime policy violations.
PagerDuty routes alerts.
The local AI harness gathers and structures evidence.
The model reasons over bounded context.
The analyst makes the final decision.

That is a realistic AI SOC operating model.


What We Learned

1. The model is only one part of the solution

A strong model without a workflow becomes a chatbot. A smaller model with a strong harness can become a useful SOC assistant.

2. Local hardware must shape the design

The M1 MacBook Pro can support useful local AI workflows, but model size and prompt size must be controlled.

3. Daily SOC reporting needs summarization, not raw log dumping

Large prompts cause slowdowns and timeouts. The better pattern is to query, reduce, summarize, and then report.

4. Read-only first is the right security posture

The AI can recommend containment, but production changes should remain human-approved.

5. Evidence discipline matters

The AI output should separate observed facts, assumptions, missing evidence, and recommended next actions.

6. The harness is the operational control plane

The harness provides repeatability, guardrails, prompts, source integration, and output structure. That is what makes the solution operationally useful.


Final Outcome

We achieved a working local AI SOC analyst solution that fits the original problem set.

The final solution:

  • Runs locally on an M1 MacBook Pro.
  • Uses Ollama as the local model runner.
  • Uses llama3.2:3b as the stable default model.
  • Allows qwen3:8b for focused deeper analysis when the machine can handle it.
  • Uses a Python harness to control prompts, context, and workflows.
  • Uses an AI runner CLI for repeatable SOC commands.
  • Works with Datadog, PagerDuty, and optional Sysdig integration.
  • Supports CloudTrail, Security Hub, Route53 DNS Firewall, SES, SNS, Cloudflare, GitHub audit, application logs, and Kubernetes-related alert review.
  • Produces useful triage output and daily SOC reports.
  • Avoids unsafe automation by keeping containment human-approved.

The biggest success was not just getting a model to run locally. The success was turning local AI into a controlled SOC workflow that works despite hardware limitations.

That is the practical path for introducing AI into security operations: start with a real problem, keep the architecture simple, control the blast radius, tune for the hardware, and make the analyst workflow better.
