Mike Anderson

Posted on May 24

Building a Local AI SOC Analyst on an M1 MacBook Pro

#ai #soc #harness #aimodel

How I solved a real SOC operations problem across Datadog, AWS, Cloudflare, Sysdig, and PagerDuty with an AI runner, a local AI harness, and a tricky model selection process

Executive Summary

We started with a practical SOC problem: build an AI-based SOC analyst that runs locally on an M1 MacBook Pro and helps with daily security operations across an existing cloud-native monitoring and alerting stack.

The environment already had strong telemetry and alerting coverage:

AWS CloudTrail
AWS Security Hub
Route53 VPC DNS Firewall
SES
SNS
Cloudflare logs
Application logs
GitHub audit logs crawler
Datadog Cloud Security detections
Datadog monitors for Kubernetes and AWS metrics
Datadog dashboards covering many SOC use cases
Sysdig runtime policies for Kubernetes
PagerDuty alert routing

The problem was not lack of logs or alerts. The real challenge was analyst workflow. The SOC still needed a repeatable way to review alerts, correlate evidence, summarize findings, identify missing context, and produce daily security notes without manually jumping between tools every time.

The working solution became a local AI SOC analyst pattern:

Ollama                Local model runner
llama3.2:3b           Stable default model for M1 daily SOC work
qwen3:8b              Optional larger model for focused deeper analysis
Python harness        SOC workflow, prompts, guardrails, and integrations
AI runner CLI         Analyst-facing command-line interface
Datadog               Primary log, signal, dashboard, and monitoring source
PagerDuty             Alert and incident routing source
Sysdig                Separate runtime policy signal source
Human analyst         Final decision authority

The important lesson was that the model alone was not the solution. The working solution came from combining the right model, a controlled harness, bounded prompts, use-case-driven analysis, and realistic expectations about local MacBook hardware.

The Original Problem

The goal was to build a local AI-based SOC analyst on an M1 MacBook Pro.

The main telemetry flow looked like this:

AWS CloudTrail
AWS Security Hub
Route53 VPC DNS Firewall
SES
SNS
Cloudflare logs
Application logs
GitHub audit logs crawler
        |
        v
Datadog
        |
        v
Datadog Cloud Security rules
Datadog monitors
Datadog dashboards
        |
        v
PagerDuty

Sysdig was separate:

Kubernetes runtime activity
        |
        v
Sysdig runtime policies
        |
        v
PagerDuty

That distinction mattered. Datadog was the central place for logs, detections, monitors, and dashboards. Sysdig was not sending its logs to Datadog, so Sysdig alerts had to be treated as a separate runtime security signal path.

The expected solution was not a generic local chatbot. The expected solution was a repeatable local SOC assistant that could support:

Daily SOC review
Alert triage
CloudTrail analysis
AWS Security Hub finding review
Route53 DNS Firewall activity review
SES and SNS activity review
Cloudflare security event review
GitHub audit log review
Application log review
PagerDuty incident summarization
Sysdig runtime alert review
SOC note drafting
Recommended follow-up queries

Key Design Decision: AI Should Not Replace Detection

We made one important architectural decision early: the local AI model should not become the detector.

Datadog and Sysdig already perform that role:

Datadog receives logs and metrics.
Datadog Cloud Security rules generate security signals.
Datadog monitors detect operational and Kubernetes-related issues.
Sysdig runtime policies detect Kubernetes runtime policy violations.
PagerDuty routes alerts from Datadog and Sysdig.

The local AI should sit above those systems as a triage and analysis layer.

That means the AI helps answer:

What happened?
Which user, workload, IP, service, account, repository, or API was involved?
Is this likely malicious, expected change, duplicate, benign true positive, or false positive?
What evidence is missing?
Which Datadog queries should be run next?
Should this be escalated?
What should the SOC note say?
Is containment recommended, and does it require human approval?

This keeps the control boundary clean. Detection stays with Datadog and Sysdig. Alerting stays with PagerDuty. The local AI helps the analyst move faster, ask better questions, and document the investigation more consistently.

Final Architecture

The final working architecture was intentionally simple:

              +------------------------------+
              | AWS / Cloudflare / GitHub    |
              | Apps / SES / SNS / DNS FW    |
              +---------------+--------------+
                              |
                              v
                         +---------+
                         | Datadog |
                         | Logs    |
                         | Signals |
                         | Metrics |
                         | Monitors|
                         +----+----+
                              |
                              v
                         +---------+
                         |PagerDuty|
                         +----+----+

       +------------------+        +---------+
       | Sysdig Runtime   |------->|PagerDuty|
       | Policies         |        +---------+
       +------------------+

                              |
                              v

              +------------------------------+
              | Local AI SOC Analyst         |
              | M1 MacBook Pro               |
              |                              |
              | Ollama                       |
              | llama3.2:3b / qwen3:8b       |
              | Python SOC Harness           |
              | AI Runner CLI                |
              +------------------------------+

The local AI analyst was designed as read-only first.

It can summarize, correlate, recommend, and draft. It should not automatically make production changes.

Human approval should still be required for actions such as:

Disabling IAM users
Rotating access keys
Blocking IPs globally
Changing Cloudflare WAF behavior
Muting Datadog monitors
Resolving PagerDuty incidents
Changing Sysdig policies
Quarantining Kubernetes workloads
Modifying production infrastructure

This matters because a wrong automated containment action can create a larger operational incident than the original alert.
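One way to make the read-only-first rule explicit in code is a small authorization check that denies anything it does not recognize. This is a minimal sketch, not the harness's actual implementation: the action names and the `authorize` function are hypothetical and simply mirror the list above.

```python
from typing import Optional

# Hypothetical action names; the real harness may model actions differently.
READ_ONLY_ACTIONS = {"summarize", "correlate", "recommend", "draft"}
APPROVAL_REQUIRED_ACTIONS = {
    "disable_iam_user",
    "rotate_access_key",
    "block_ip_globally",
    "change_cloudflare_waf",
    "mute_datadog_monitor",
    "resolve_pagerduty_incident",
    "change_sysdig_policy",
    "quarantine_k8s_workload",
    "modify_production_infrastructure",
}


def authorize(action: str, approved_by: Optional[str] = None) -> bool:
    """Return True only if the action may proceed."""
    if action in READ_ONLY_ACTIONS:
        return True
    if action in APPROVAL_REQUIRED_ACTIONS:
        # Anything that changes production needs a named human approver.
        return bool(approved_by)
    # Unknown actions are denied by default.
    return False
```

Denying unknown actions by default means a new capability has to be classified deliberately before the harness will run it.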

What the AI Runner Does

The AI runner is the analyst-facing command-line interface.

It is what we run during daily operations.

Examples:

python ai_runner.py triage-json samples/sample_cloudtrail_delete_trail.json \
  --use-case UC-006.3-cloudtrail-logging-disabled

python ai_runner.py security-signals --hours 24

python ai_runner.py pagerduty --hours 24

python ai_runner.py daily --hours 24 --out reports/daily_soc_report.md

The runner coordinates the work:

Pull security data from the configured source.
Select the right SOC prompt.
Build a bounded event bundle.
Send the prompt and evidence to Ollama.
Receive structured analysis from the local model.
Print the result or write a report.
Keep the workflow repeatable.
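The command surface shown above could be wired up with `argparse` roughly as follows. This is a sketch under assumptions: the subcommand and flag names are taken from the example commands, but `build_parser` and the default values are hypothetical, and the real runner may be organized differently.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build a CLI parser that mirrors the example runner commands."""
    parser = argparse.ArgumentParser(prog="ai_runner.py")
    sub = parser.add_subparsers(dest="command", required=True)

    # triage-json <path> --use-case <id>
    triage = sub.add_parser("triage-json", help="Triage one JSON event file")
    triage.add_argument("path")
    triage.add_argument("--use-case", required=True)

    # security-signals / pagerduty share the same time-window flag.
    for name in ("security-signals", "pagerduty"):
        windowed = sub.add_parser(name)
        windowed.add_argument("--hours", type=int, default=24)

    # daily --hours N --out <report path>
    daily = sub.add_parser("daily", help="Generate the daily SOC report")
    daily.add_argument("--hours", type=int, default=24)
    daily.add_argument("--out", default=None)
    return parser
```

For example, `build_parser().parse_args(["daily", "--hours", "6"])` yields a namespace with `command="daily"` and `hours=6`, which the runner can dispatch on.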

The runner is not the intelligence layer by itself. Its value is operational discipline. It prevents the analyst from manually copying logs, manually selecting prompts, manually formatting output, and manually saving results every time.

What the Harness Does

The harness is the control layer around the model.

This is the difference between a chatbot and a SOC workflow tool.

The harness handles:

Datadog API access
PagerDuty API access
Optional Sysdig API access
Use-case-specific prompts
SOC output structure
Context size limits
Model timeout configuration
Evidence-oriented analysis
Daily report generation
Read-only operating behavior
Repeatable command structure

The harness gives the model boundaries.

For SOC operations, this is critical. A local AI model should not receive an unbounded pile of logs and be asked, “Is anything bad?” That produces weak output and increases hallucination risk.

Instead, the harness asks focused questions:

Analyze this CloudTrail event for possible defense evasion.
Summarize Datadog security signals from the last 24 hours.
Review PagerDuty incidents for security relevance.
Draft a daily SOC report from bounded evidence.
Identify missing evidence and recommended follow-up queries.

The model reasons. The harness controls the task.
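A bounded event bundle can be as simple as a character budget applied before the prompt is assembled. The sketch below assumes events arrive as a list of dictionaries; `build_bounded_bundle`, `build_prompt`, and the 12,000-character default are hypothetical names and values chosen for illustration.

```python
import json


def build_bounded_bundle(events: list[dict], max_chars: int = 12000) -> str:
    """Serialize events one per line until the character budget is used up."""
    lines: list[str] = []
    used = 0
    for event in events:
        line = json.dumps(event, sort_keys=True, separators=(",", ":"))
        if used + len(line) + 1 > max_chars:
            break
        lines.append(line)
        used += len(line) + 1  # +1 for the newline separator
    omitted = len(events) - len(lines)
    if omitted:
        # Tell the model the evidence is partial so it does not assume completeness.
        # This marker line is not counted against the budget.
        lines.append(f"[{omitted} events omitted to stay within the context budget]")
    return "\n".join(lines)


def build_prompt(task: str, events: list[dict], max_chars: int = 12000) -> str:
    """Combine one focused task with a bounded evidence bundle."""
    return f"{task}\n\nEvidence:\n{build_bounded_bundle(events, max_chars)}"
```

Recording how many events were dropped keeps the truncation visible in the prompt, so the model can list it under missing evidence instead of treating the bundle as complete.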

Model Selection Strategy

At first, a larger model such as qwen3:8b looked attractive because the problem involved cloud logs, security reasoning, and structured analysis.

That was a reasonable starting point. Larger models can be useful when the event bundle is small and the question requires deeper reasoning.

However, the target machine was an M1 MacBook Pro, not a dedicated GPU workstation. That changed the practical answer.

During testing, the first small triage workflow succeeded, but the machine became sluggish. Later, the heavier daily report failed with a local Ollama timeout:

ReadTimeout: HTTPConnectionPool(host='127.0.0.1', port=11434): Read timed out. (read timeout=300)

That error was useful because it showed:

The Python harness was running.
The harness reached Ollama on localhost.
Ollama was processing the request.
The model did not complete within the configured timeout.

So the issue was not the SOC design. The issue was local inference load: model size, prompt size, timeout, and hardware limits.
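For reference, a call to the local Ollama `/api/generate` endpoint with an explicit timeout might look like the sketch below. It uses only the standard library, so a slow generation would surface as a different exception type than the `ReadTimeout` shown above. The function names and the 300-second default are assumptions, not the harness's actual code.

```python
import json
import urllib.request

OLLAMA_URL = "http://127.0.0.1:11434/api/generate"


def build_generate_request(model: str, prompt: str) -> dict:
    """Build the JSON body for a non-streaming Ollama generate call."""
    return {
        "model": model,
        "prompt": prompt,
        "stream": False,   # one JSON response instead of a token stream
        "format": "json",  # ask Ollama to constrain the output to valid JSON
    }


def ask_ollama(model: str, prompt: str, timeout_s: float = 300.0) -> str:
    """Send the prompt to the local Ollama server and return the generated text."""
    body = json.dumps(build_generate_request(model, prompt)).encode("utf-8")
    request = urllib.request.Request(
        OLLAMA_URL, data=body, headers={"Content-Type": "application/json"}
    )
    # If the model does not finish within timeout_s, this raises a timeout error.
    with urllib.request.urlopen(request, timeout=timeout_s) as response:
        return json.loads(response.read())["response"]
```

Keeping the timeout as a parameter lets the daily report use a longer limit than a quick triage call without changing the rest of the harness.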

The model strategy was adjusted:

Task	Model	Why
Smoke testing	`llama3.2:3b`	Fast and stable on M1
Daily SOC report	`llama3.2:3b`	More reliable for bounded daily reporting
Focused deeper investigation	`qwen3:8b`	Useful when the event bundle is smaller
Large multi-source correlation	Avoid on M1 unless carefully limited	Can cause slowdowns or timeouts

The final default became:

SOC_MODEL=llama3.2:3b
SOC_FAST_MODEL=llama3.2:3b

This was the right operational tradeoff.

A smaller model that finishes reliably is more useful than a larger model that freezes the analyst workstation or times out during daily operations.
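The table above can be expressed as a small selection function. This is a sketch under assumptions: `SOC_MODEL` comes from the configuration shown earlier, while `SOC_DEEP_MODEL`, the task name `deep-investigation`, and the 4,000-character threshold are hypothetical and would need tuning on the actual machine.

```python
import os

DEFAULT_MODEL = "llama3.2:3b"
DEEP_MODEL = "qwen3:8b"
# Hypothetical cutoff: only small bundles are sent to the larger model.
DEEP_MODEL_MAX_CHARS = 4000


def select_model(task: str, bundle_chars: int, env=None) -> str:
    """Pick the model for a task, falling back to the small default."""
    env = os.environ if env is None else env
    if task == "deep-investigation" and bundle_chars <= DEEP_MODEL_MAX_CHARS:
        return env.get("SOC_DEEP_MODEL", DEEP_MODEL)
    # Smoke tests, daily reports, and oversized bundles use the stable default.
    return env.get("SOC_MODEL", DEFAULT_MODEL)
```

Passing `env` explicitly makes the choice easy to test without touching the real environment.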

Hardware Constraint: The M1 MacBook Pro Matters

The M1 MacBook Pro can run useful local AI workflows, but the workflow must be tuned.

The main constraints were:

Local model cold start time
Memory pressure
Swap usage
Large prompt size
Long generation time
Ollama timeout
Large 24-hour log bundles

The fix was not to abandon the local approach. The fix was to make the workflow smaller and more controlled:

Use a smaller default model.
Limit daily prompt size.
Start with 6-hour reports.
Increase to 24 hours after validation.
Increase the Ollama timeout where needed.
Avoid sending excessive raw logs to the model.
Use focused use-case prompts.

That is what made the solution usable.

Problems We Hit and How We Fixed Them

1. `ollama ps` Showing Nothing

When checking which model was running, ollama ps returned nothing.

That does not always mean something is broken.

ollama ps shows models currently loaded in memory. If the model finished and unloaded, it may show nothing.

Useful checks:

ollama list

Shows installed models.

ollama ps

Shows currently loaded models.

ollama run llama3.2:3b

Manually starts a model.

This distinction helped avoid misdiagnosing a normal Ollama state as a failure.
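The same check can be done from the harness through Ollama's HTTP API, where `/api/tags` lists installed models and `/api/ps` lists loaded ones. The helper names and status strings below are hypothetical; this is a sketch of the distinction, not part of the original tooling.

```python
import json
import urllib.request

OLLAMA_BASE = "http://127.0.0.1:11434"


def fetch(path: str, timeout_s: float = 5.0) -> dict:
    """GET a JSON document from the local Ollama server."""
    with urllib.request.urlopen(f"{OLLAMA_BASE}{path}", timeout=timeout_s) as response:
        return json.loads(response.read())


def model_names(payload: dict) -> list[str]:
    """Extract model names from an /api/tags or /api/ps response."""
    return [model["name"] for model in payload.get("models", [])]


def diagnose(installed: list[str], loaded: list[str], wanted: str) -> str:
    """Separate a missing model from one that is simply idle."""
    if wanted not in installed:
        return "not installed"
    if wanted not in loaded:
        return "installed, not loaded (normal when idle)"
    return "loaded"
```

With Ollama running, `diagnose(model_names(fetch("/api/tags")), model_names(fetch("/api/ps")), "llama3.2:3b")` reports which of the three states applies.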

2. Mac Was Freezing

The Mac became sluggish after running the local model.

The likely cause was local inference load, especially if a larger model was used.

The fix was to run the smaller model first:

SOC_MODEL=llama3.2:3b python ai_runner.py triage-json samples/sample_cloudtrail_delete_trail.json \
  --use-case UC-006.3-cloudtrail-logging-disabled

For stability, Ollama can also be limited. These variables are read by the Ollama server, so set them in the shell that starts ollama serve:

export OLLAMA_NUM_PARALLEL=1
export OLLAMA_MAX_LOADED_MODELS=1
export OLLAMA_KEEP_ALIVE=30m

3. Daily Report Timeout

The daily command failed because the model did not return within the configured timeout:

ReadTimeout: HTTPConnectionPool(host='127.0.0.1', port=11434): Read timed out. (read timeout=300)

The fix had three parts:

Use llama3.2:3b for daily reports.
Reduce the daily prompt size.
Increase the local model timeout where appropriate.

A safer first run was:

SOC_MODEL=llama3.2:3b python ai_runner.py daily --hours 6 --out reports/daily_soc_report.md

Then scale to:

SOC_MODEL=llama3.2:3b python ai_runner.py daily --hours 24 --out reports/daily_soc_report.md

The lesson: daily reports should summarize bounded evidence, not feed unlimited raw logs into a local model.
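One way to reduce raw logs before they reach the model is to group events and send counts instead of individual records. The sketch below assumes each event is a dictionary; the field names `source`, `event_name`, and `actor` are hypothetical and would need to match the attributes actually returned by the log source.

```python
from collections import Counter


def reduce_events(events, keys=("source", "event_name", "actor"), top_n=20):
    """Collapse raw events into the top_n most frequent key combinations."""
    counts = Counter(
        tuple(event.get(key, "unknown") for key in keys) for event in events
    )
    return [
        {**dict(zip(keys, combo)), "count": count}
        for combo, count in counts.most_common(top_n)
    ]
```

A few hundred identical WAF blocks then become one row with a count, which keeps the daily prompt small while preserving the pattern the analyst cares about.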

First Successful SOC Triage

The first successful test used a sample CloudTrail StopLogging event.

That is a meaningful test because attempts to stop CloudTrail logging may indicate defense evasion, unauthorized administrative activity, or compromised credentials.

The AI produced a high-risk SOC-style result similar to:

{
  "severity": "High",
  "confidence": 85,
  "disposition": "true_positive",
  "summary": "Suspicious attempt to stop CloudTrail logging...",
  "suspicious_indicators": [
    "StopLogging event by IAM user 'svc-deploy'",
    "Source IP 203.0.113.45",
    "User agent python-requests/2.32"
  ]
}

This proved the core workflow:

Local venv works.
Dependencies are installed.
AI runner executes.
Harness builds the prompt.
Ollama receives the request.
Local model returns SOC-style analysis.

The next improvement was to tighten expected output so the model always includes missing evidence and recommended follow-up queries. For production SOC use, those fields matter because they keep the analyst grounded in evidence.
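Tightening the expected output can start with a validation step that rejects responses lacking those fields. In the sketch below, the first five field names come from the sample result above, while `missing_evidence` and `recommended_queries` are hypothetical names for the two fields the text describes.

```python
import json

REQUIRED_FIELDS = {
    "severity",
    "confidence",
    "disposition",
    "summary",
    "suspicious_indicators",
    "missing_evidence",      # hypothetical field name
    "recommended_queries",   # hypothetical field name
}


def validate_triage(raw: str):
    """Parse model output and return (result, list of problems found)."""
    try:
        result = json.loads(raw)
    except json.JSONDecodeError as exc:
        return {}, [f"not valid JSON: {exc.msg}"]
    if not isinstance(result, dict):
        return {}, ["top-level value is not an object"]
    problems = [
        f"missing field: {name}" for name in sorted(REQUIRED_FIELDS - set(result))
    ]
    confidence = result.get("confidence")
    if not isinstance(confidence, (int, float)) or not 0 <= confidence <= 100:
        problems.append("confidence must be a number from 0 to 100")
    return result, problems
```

Run against the sample result above, this check would flag exactly the two missing fields, which the harness could use to retry the prompt or mark the output as incomplete.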

Example SOC Use Cases

CloudTrail Logging Disabled

Use case:

UC-006.3-cloudtrail-logging-disabled

Purpose:

Investigate possible CloudTrail tampering or defense evasion.

Example command:

python ai_runner.py datadog-query \
  --query 'source:cloudtrail @evt.name:(StopLogging OR DeleteTrail OR UpdateTrail OR PutEventSelectors)' \
  --hours 24 \
  --use-case UC-006.3-cloudtrail-logging-disabled

Follow-up evidence should include:

Actor identity
Source IP
User agent
IAM permissions
Change ticket
Trail status after the event
Related IAM changes
Security Hub findings
Other Datadog signals for the same account or identity
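Behind a `datadog-query` command, the harness would need to translate the query string and time window into a Datadog Logs Search request. The sketch below targets the v2 logs search endpoint; the function names, the `DD_API_KEY` and `DD_APP_KEY` environment variable names, and the default site are assumptions rather than details confirmed by the original harness.

```python
import json
import os
import urllib.request


def build_logs_search(query: str, hours: int, limit: int = 100) -> dict:
    """Build the request body for a bounded, newest-first log search."""
    return {
        "filter": {"query": query, "from": f"now-{hours}h", "to": "now"},
        "sort": "-timestamp",
        "page": {"limit": limit},  # cap the result size before it reaches the model
    }


def search_logs(query: str, hours: int, site: str = "datadoghq.com") -> list:
    """POST the search to Datadog and return the matching log events."""
    request = urllib.request.Request(
        f"https://api.{site}/api/v2/logs/events/search",
        data=json.dumps(build_logs_search(query, hours)).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "DD-API-KEY": os.environ["DD_API_KEY"],
            "DD-APPLICATION-KEY": os.environ["DD_APP_KEY"],
        },
    )
    with urllib.request.urlopen(request, timeout=30) as response:
        return json.loads(response.read()).get("data", [])
```

Capping `limit` at the query stage is the first line of defense against oversized prompts; the harness can still trim further before calling the model.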

IAM Privilege Escalation

Use case:

UC-007-iam-privilege-escalation

Example command:

python ai_runner.py datadog-query \
  --query 'source:cloudtrail @evt.name:(AttachUserPolicy OR PutUserPolicy OR CreateAccessKey OR UpdateAssumeRolePolicy OR PassRole)' \
  --hours 24 \
  --use-case UC-007-iam-privilege-escalation

The AI should help determine whether the activity was expected administration, automated deployment behavior, or suspicious privilege escalation.

Cloudflare WAF Activity

Use case:

UC-011-cloudflare-waf-attack

Example command:

python ai_runner.py datadog-query \
  --query 'source:cloudflare (@action:block OR @action:challenge OR @security_action:block)' \
  --hours 24 \
  --use-case UC-011-cloudflare-waf-attack

The AI should summarize source distribution, attacked paths, WAF actions, spike patterns, and whether any traffic bypassed protections.

Route53 DNS Firewall Activity

Use case:

UC-010-route53-dns-firewall-blocks

Example command:

python ai_runner.py datadog-query \
  --query '(source:route53resolverdnsfirewall OR source:route53) @action:block' \
  --hours 24 \
  --use-case UC-010-route53-dns-firewall-blocks

The AI should help identify suspicious domains, affected workloads, recurring clients, and whether the blocked activity suggests malware, misconfiguration, or expected testing.

GitHub Audit Risk

Use case:

UC-014-github-audit-risk

Example command:

python ai_runner.py datadog-query \
  --query 'source:github (@action:*deploy_key* OR @action:*repo* OR @action:*workflow* OR @action:*branch_protection*)' \
  --hours 24 \
  --use-case UC-014-github-audit-risk

The AI should focus on risky repository changes, workflow changes, deploy key activity, branch protection changes, and unusual administrative actions.

These are only a few examples. Many more use cases are possible, and the same architecture applies to each of them.

Daily SOC Workflow

The stable workflow became:

1. Start Ollama

ollama serve

2. Activate the project environment

cd /Users/tariqual/Documents/local_ai_soc_analyst
source .venv/bin/activate

3. Confirm model availability

ollama list

4. Run a smoke test

python ai_runner.py triage-json samples/sample_cloudtrail_delete_trail.json \
  --use-case UC-006.3-cloudtrail-logging-disabled

5. Run a safe daily report first

SOC_MODEL=llama3.2:3b python ai_runner.py daily --hours 6 --out reports/daily_soc_report.md

6. Run the full daily report after the safe run works

SOC_MODEL=llama3.2:3b python ai_runner.py daily --hours 24 --out reports/daily_soc_report.md

7. Review the output as an analyst

The report should be reviewed for:

P0 and P1 items
CloudTrail administrative changes
Security Hub critical or high findings
Cloudflare attack patterns
Route53 DNS Firewall blocks
SES or SNS abuse indicators
GitHub audit activity
PagerDuty incidents
Sysdig runtime alerts
Missing evidence
Recommended Datadog queries
Escalation or containment recommendations

The daily report is an analyst aid. It is not an automatic incident declaration.
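A report writer that puts P0 and P1 items first could be sketched as follows. The finding fields (`priority`, `title`, `source`, `missing_evidence`) and the report layout are hypothetical; the actual report format produced by the harness is not shown in this article.

```python
def render_daily_report(findings: list[dict], hours: int) -> str:
    """Render findings as Markdown, highest priority first."""
    order = {"P0": 0, "P1": 1, "P2": 2, "P3": 3}
    # Findings without a known priority sort after all rated ones.
    ranked = sorted(findings, key=lambda f: order.get(f.get("priority"), len(order)))
    lines = [f"# Daily SOC Report (last {hours}h)", ""]
    for finding in ranked:
        priority = finding.get("priority", "unrated")
        lines.append(f"## [{priority}] {finding.get('title', 'untitled')}")
        lines.append(f"Source: {finding.get('source', 'unknown')}")
        missing = finding.get("missing_evidence") or ["none recorded"]
        lines.append("Missing evidence: " + "; ".join(missing))
        lines.append("")
    lines.append("Analyst review required. This report is not an incident declaration.")
    return "\n".join(lines)
```

Ending every report with the review reminder keeps the output framed as an aid, whatever the model wrote above it.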

Why This Works

The final solution works because it respects both the SOC workflow and the hardware.

It does not try to make the local model do everything.

It uses the existing security stack correctly:

Datadog detects and stores telemetry.
Sysdig detects runtime policy violations.
PagerDuty routes alerts.
The local AI harness gathers and structures evidence.
The model reasons over bounded context.
The analyst makes the final decision.

That is a realistic AI SOC operating model.

What We Learned

1. The model is only one part of the solution

A strong model without a workflow becomes a chatbot. A smaller model with a strong harness can become a useful SOC assistant.

2. Local hardware must shape the design

The M1 MacBook Pro can support useful local AI workflows, but model size and prompt size must be controlled.

3. Daily SOC reporting needs summarization, not raw log dumping

Large prompts cause slowdowns and timeouts. The better pattern is to query, reduce, summarize, and then report.

4. Read-only first is the right security posture

The AI can recommend containment, but production changes should remain human-approved.

5. Evidence discipline matters

The AI output should separate observed facts, assumptions, missing evidence, and recommended next actions.

6. The harness is the operational control plane

The harness provides repeatability, guardrails, prompts, source integration, and output structure. That is what makes the solution operationally useful.

Final Outcome

We achieved a working local AI SOC analyst solution that fits the original problem set.

The final solution:

Runs locally on an M1 MacBook Pro.
Uses Ollama as the local model runner.
Uses llama3.2:3b as the stable default model.
Allows qwen3:8b for focused deeper analysis when the machine can handle it.
Uses a Python harness to control prompts, context, and workflows.
Uses an AI runner CLI for repeatable SOC commands.
Works with Datadog, PagerDuty, and optional Sysdig integration.
Supports CloudTrail, Security Hub, Route53 DNS Firewall, SES, SNS, Cloudflare, GitHub audit, application logs, and Kubernetes-related alert review.
Produces useful triage output and daily SOC reports.
Avoids unsafe automation by keeping containment human-approved.

The biggest success was not just getting a model to run locally. The success was turning local AI into a controlled SOC workflow that works despite hardware limitations.

That is the practical path for introducing AI into security operations: start with a real problem, keep the architecture simple, control the blast radius, tune for the hardware, and make the analyst workflow better.
