Threat-Forged Sentinel | Custom Log Ingestion | Turning Non-Native Logs into Detection-Grade Intelligence | R.A.H.S.I. Framework™ Analysis
A SOC Engineering Blueprint for Turning Raw Logs into Detection-Grade Intelligence
Most SOC teams treat custom log ingestion as a data onboarding task.
That is the mistake.
In Microsoft Sentinel, custom log ingestion should not be measured only by whether the data lands in a Log Analytics workspace.
It should be measured by whether the data can support:
- Detection
- Investigation
- Hunting
- Entity mapping
- Analytics rules
- Workbooks
- Incident response
- SOC optimization
- Threat coverage
That is the purpose of Threat-Forged Sentinel.
It is a framework for turning non-native logs into detection-grade intelligence using Microsoft Sentinel, Azure Monitor, Data Collection Rules, transformation logic, custom tables, ASIM-style normalization, KQL analytics, entity mapping, hunting workflows, and SOC visibility.
A log is not valuable because it was ingested.
A log becomes valuable when it helps the SOC detect adversary behavior, explain what happened, map affected entities, and drive response.
1. The Core Problem: Raw Logs Are Not Intelligence
Many organizations ingest logs from firewalls, proxies, SaaS platforms, appliances, OT systems, IAM tools, custom applications, and business platforms.
But ingestion alone does not create security value.
A raw log may contain useful evidence, but if it is poorly structured, inconsistently parsed, missing key fields, or disconnected from detection logic, it becomes difficult for analysts to use.
The result is a familiar pattern:
- Logs exist but are not used in detections
- Tables exist but no one queries them
- Fields exist but are not normalized
- Data is ingested but not mapped to entities
- Analysts cannot quickly understand the event
- Hunting queries are difficult to reuse
- Workbooks cannot visualize the signal
- SOC leaders cannot prove coverage
- Storage cost increases without detection value
This is not detection engineering.
This is log storage.
Threat-Forged Sentinel changes the goal.
The goal is not:
Did we ingest the log?
The goal is:
Can this log detect adversary behavior, support investigation, and improve SOC coverage?
2. What Threat-Forged Sentinel Means
Threat-Forged Sentinel is the discipline of engineering custom log pipelines so that non-native telemetry becomes usable security intelligence.
It connects the full pipeline:
Source Log
↓
Collection Method
↓
Data Collection Endpoint
↓
Data Collection Rule
↓
Transformation Logic
↓
Custom Table
↓
Parser
↓
Normalized Schema
↓
KQL Detection
↓
Entity Mapping
↓
Analytics Rule
↓
Hunting Query
↓
Workbook Visibility
↓
SOC Optimization
This is the shift from log ingestion to detection engineering.
The SOC should not only collect data.
It should forge the data into signal.
3. Microsoft Sentinel and Azure Monitor as the Data Foundation
Microsoft Sentinel uses Azure Monitor and Log Analytics as the data foundation.
Logs ingested into Sentinel are stored in a Log Analytics workspace, where Kusto Query Language, or KQL, is used to query data, detect threats, investigate activity, and build analytics.
For native Microsoft sources, many connectors already provide structured tables and content.
For non-native sources, the SOC must often design the ingestion and transformation path.
That design may include:
- Azure Monitor Agent
- Syslog forwarding
- Common Event Format ingestion
- Custom Logs via Azure Monitor Agent
- Logs Ingestion API
- Data Collection Endpoints
- Data Collection Rules
- Custom Log Analytics tables
- Ingestion-time transformations
- Workspace transformations
- ASIM normalization
- Custom parsers
- Sentinel analytics rules
- Hunting queries
- Workbooks
The engineering decision is not simply how to bring the log in.
The engineering decision is how to make the log useful.
4. Custom Log Ingestion Architecture
A strong custom ingestion architecture starts with a clear understanding of the source.
Before onboarding any custom log source, the SOC should define:
- What system produces the log?
- What security behavior does it represent?
- Is the source authoritative?
- Is the timestamp reliable?
- Which fields are required for detection?
- Which fields identify users, hosts, IPs, URLs, files, or cloud resources?
- Which fields should be transformed or enriched?
- Which fields contain sensitive data?
- Which Sentinel table should store the data?
- Which parser should make the data reusable?
- Which detections or hunts will use the data?
- Which workbook will prove visibility?
If these questions are not answered, custom ingestion becomes a technical exercise without operational value.
5. Collection Methods for Non-Native Logs
Non-native logs can enter Microsoft Sentinel through multiple patterns.
The correct method depends on the source system, format, network path, latency requirements, and operational model.
Common ingestion methods include:
| Collection Method | Best Use Case |
|---|---|
| Azure Monitor Agent | Collecting logs from machines, servers, and supported custom text logs |
| Syslog | Linux and network device logging |
| CEF | Security appliances and products that support Common Event Format |
| Logs Ingestion API | Custom applications, platforms, pipelines, and sources that can send JSON over API |
| Data Collection Endpoint | Ingestion endpoint control, especially where private connectivity is required |
| Data Collection Rule | Routing, transformation, and destination logic |
| Custom Table | Storing unique log formats in Log Analytics |
| Built-in Sentinel Connector | Native or partner-supported ingestion |
| Workspace Transformation | Applying transformation logic to supported tables |
Each method should be selected based on the detection objective, not simply on the easiest ingestion path.
6. Data Collection Rules as the Control Plane
Data Collection Rules, or DCRs, are central to custom log ingestion.
A DCR defines how data is collected, transformed, and sent to a destination such as a Log Analytics workspace.
In a Threat-Forged Sentinel model, the DCR becomes the ingestion control plane.
It can define:
- Input stream structure
- Destination workspace
- Target table
- Transformation logic
- Output stream
- Data routing
- Filtering
- Field shaping
- Schema alignment
This matters because raw source data often does not match the target table schema.
The DCR transformation can reshape incoming data before it lands.
That means the SOC can engineer data quality at ingestion time instead of forcing every analyst or rule to handle messy fields later.
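A minimal sketch of what that looks like, assuming a hypothetical input stream whose raw columns event_time and device_name must be aligned to the destination schema:

```kql
// Hypothetical DCR transformation: align an incoming stream to the
// destination table schema before the data lands.
source
| where isnotempty(device_name)                    // drop records with no device context
| extend TimeGenerated = todatetime(event_time)    // map the source timestamp to TimeGenerated
| extend DstHostname = tostring(device_name)       // rename the vendor field to the target column
| project-away event_time, device_name             // remove the raw columns after mapping
```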
7. Data Collection Endpoints
A Data Collection Endpoint, or DCE, is the network endpoint that source systems send their data to before a Data Collection Rule processes it.
It appears in many custom ingestion designs, especially where private connectivity or tighter control over the ingestion path is required.
The relationship can be understood like this:
Source System
↓
Data Collection Endpoint
↓
Data Collection Rule
↓
Transformation
↓
Log Analytics Table
↓
Microsoft Sentinel
The DCE is not the detection layer.
It is part of the ingestion path.
The DCR and transformation logic are where the source data begins to become detection-ready.
8. Logs Ingestion API for Custom Sources
The Logs Ingestion API is important when a source can send data through REST API calls or client libraries.
This is useful for:
- Custom applications
- SaaS platforms
- Internal security tools
- Business applications
- Custom detection pipelines
- Middleware
- Enrichment systems
- Non-standard telemetry sources
The source sends JSON-formatted data to Azure Monitor.
The DCR defines how that data is interpreted and where it is stored.
This provides flexibility because the incoming source format does not always need to match the final table format. Transformation logic can reshape the event into the destination schema.
A typical Logs Ingestion API flow looks like this:
Custom Source
↓
JSON Payload
↓
Logs Ingestion API
↓
DCR Stream Declaration
↓
Transform KQL
↓
Custom Table
↓
Sentinel KQL Detection
This pattern is powerful because it gives the SOC control over the structure, destination, and detection readiness of custom telemetry.
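Once the pipeline is wired up, a quick validation query can confirm that events are arriving. A sketch, assuming a hypothetical destination table named MyAppEvents_CL:

```kql
// Hypothetical destination table: confirm that API-submitted events
// are arriving and check the most recent event time.
MyAppEvents_CL
| where TimeGenerated > ago(15m)
| summarize EventCount = count(), LatestEvent = max(TimeGenerated)
```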
9. Custom Tables in Log Analytics
Custom tables are used when the source data does not fit an existing standard table.
A custom table should not be created casually.
It should be designed around how the SOC will query, detect, hunt, and investigate.
A useful custom table should have:
- Clear naming
- Reliable timestamp field
- Source identifier
- Event type
- User field
- Host field
- IP address fields
- URL or domain fields
- Action field
- Result field
- Severity or risk field
- Raw message field when needed
- Parsed fields for detection
- Consistent data types
- Minimal unnecessary columns
A poor custom table becomes a dumping ground.
A well-designed custom table becomes a detection asset.
10. Recommended Custom Table Design
Below is a practical custom table design model for a non-native security source.
| Column | Type | Purpose |
|---|---|---|
| TimeGenerated | datetime | Event timestamp used by Sentinel and KQL |
| SourceVendor | string | Vendor or platform name |
| SourceProduct | string | Product or service name |
| EventType | string | Type of security event |
| EventResult | string | Success, failure, blocked, allowed, detected |
| EventSeverity | string | Source severity or mapped severity |
| User | string | User identity |
| SrcIpAddr | string | Source IP address |
| DstIpAddr | string | Destination IP address |
| DstHostname | string | Destination host |
| Url | string | URL involved in event |
| Domain | string | Domain involved in event |
| FileName | string | File involved in event |
| Action | string | Action taken by the source system |
| RuleName | string | Source rule or policy name |
| ThreatName | string | Threat or signature name |
| RawMessage | string | Original event payload or message |
| AdditionalFields | dynamic | Flexible extra metadata |
This structure helps detection engineers write reusable KQL.
It also makes the data easier to normalize and map to Sentinel entities.
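As a sketch of that reusability, assuming a hypothetical CustomSource_CL table built on the schema above, a newly onboarded source can be baselined with one short query:

```kql
// Hypothetical table using the schema above: baseline a newly
// onboarded source by event type and result.
CustomSource_CL
| where TimeGenerated > ago(7d)
| summarize EventCount = count(), UniqueUsers = dcount(User) by EventType, EventResult
| order by EventCount desc
```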
11. Ingestion-Time Transformations
Ingestion-time transformations allow data to be shaped before it is stored.
This can include:
- Filtering irrelevant records
- Removing unnecessary columns
- Parsing raw fields
- Creating calculated fields
- Renaming fields
- Normalizing values
- Masking sensitive data
- Routing events to the correct table
- Enriching logs with additional context
This is important because detection quality often depends on data quality.
For example, a raw log may contain the source IP inside a long message string.
A transformation can extract that value into a dedicated field such as SrcIpAddr.
That one engineering decision can make the data more useful for analytics rules, hunting queries, entity mapping, and workbooks.
12. Example Transformation Logic
A simplified transformation might reshape incoming custom source data into a cleaner schema.
```kql
source
| extend EventTime = todatetime(timestamp)    // convert the raw timestamp to datetime
| extend SrcIpAddr = tostring(src_ip)         // promote IPs into dedicated columns
| extend DstIpAddr = tostring(dst_ip)
| extend User = tostring(username)
| extend EventType = tostring(event_type)
| extend EventResult = tostring(result)
| extend Action = tostring(action)
| project                                     // keep only the detection-ready columns
    TimeGenerated = EventTime,
    SrcIpAddr,
    DstIpAddr,
    User,
    EventType,
    EventResult,
    Action,
    RawMessage = tostring(raw_message)
```
This is not just formatting.
This is detection preparation.
The transformation creates fields that KQL detections and entity mapping can use consistently.
13. Filtering Noise at Ingestion
Not every event deserves to be stored in the same way.
Some events are useful for detection.
Some are useful only for audit.
Some are repetitive noise.
Some contain sensitive data that should be masked or removed.
Ingestion-time filtering can help reduce unnecessary data volume and improve signal quality.
Examples include:
- Dropping known health-check events
- Removing duplicate heartbeat logs
- Filtering low-value debug events
- Removing sensitive fields
- Keeping only security-relevant event types
- Routing high-value events to analytics-ready tables
- Sending low-value events to lower-cost storage where appropriate
The objective is not blind data reduction.
The objective is security-focused data shaping.
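A minimal sketch of such a filter, assuming a hypothetical input stream with event_type, severity, and session_token columns:

```kql
// Hypothetical ingestion-time filter: drop known noise and remove a
// sensitive field before the event is stored.
source
| where event_type !in ("health_check", "heartbeat")   // drop repetitive noise
| where severity != "debug"                            // drop low-value debug events
| project-away session_token                           // never store the sensitive token
```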
14. Normalization and ASIM
Normalization is where custom logs become reusable across the SOC.
Microsoft Sentinel supports the Advanced Security Information Model, commonly known as ASIM, to help normalize different source types into common schemas.
This matters because every vendor has its own field names.
One firewall may use src_ip.
Another may use sourceAddress.
Another may use client_ip.
Without normalization, every detection must be rewritten for every source.
With normalization, multiple sources can support common detection and hunting logic.
Normalization helps convert vendor-specific telemetry into a common analytical language.
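A sketch of that convergence in KQL, assuming three hypothetical vendor tables that each carry the source IP under one of the field names mentioned above:

```kql
// Hypothetical normalization: three vendor tables expose the source IP
// under different field names; map each into one normalized column.
union
    (VendorA_CL | extend SrcIpAddr = tostring(src_ip)),
    (VendorB_CL | extend SrcIpAddr = tostring(sourceAddress)),
    (VendorC_CL | extend SrcIpAddr = tostring(client_ip))
| where isnotempty(SrcIpAddr)
| summarize EventCount = count() by SrcIpAddr
| order by EventCount desc
```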
15. Why ASIM-Style Normalization Matters
ASIM-style normalization helps the SOC:
- Reduce vendor-specific query logic
- Create reusable detections
- Improve hunting consistency
- Make workbooks easier to build
- Improve analyst experience
- Compare events across products
- Support cross-source correlation
- Build scalable detection content
For example, normalized network session data can support hunting across firewall, proxy, VPN, and network appliance logs.
Normalized authentication data can support identity-focused detection across Entra ID, VPN, SaaS, and third-party IAM platforms.
The more consistent the schema, the more reusable the detection logic.
16. Parser Engineering
Parsers convert source-specific data into reusable views.
A parser can:
- Rename source fields
- Convert data types
- Extract values from raw messages
- Map vendor fields to normalized names
- Add calculated fields
- Standardize event results
- Normalize user, IP, URL, and host fields
- Hide source complexity from analysts
A good parser allows analysts and rules to query a clean function instead of raw table complexity.
Example concept:
```kql
CustomFirewall_CL
| extend SrcIpAddr = tostring(SourceIP_s)             // normalize vendor field names
| extend DstIpAddr = tostring(DestinationIP_s)
| extend DstPortNumber = toint(DestinationPort_d)     // convert the port to an integer
| extend EventResult = iff(Action_s == "allow", "Success", "Failure")   // standardize the result value
| project
    TimeGenerated,
    SrcIpAddr,
    DstIpAddr,
    DstPortNumber,
    EventResult,
    Action_s,
    RuleName_s
```
The parser makes the data usable.
The detection logic becomes cleaner.
The analyst experience improves.
17. From Custom Logs to KQL Detections
A custom log source becomes valuable when it supports reliable KQL detection logic.
A detection-grade custom log should support queries such as:
- Suspicious authentication failures
- Impossible travel from non-native IAM logs
- Proxy access to suspicious domains
- Firewall deny spikes
- Data exfiltration indicators
- Rare destination access
- Privileged user activity
- Admin policy changes
- Malware detections from security appliances
- OT device anomalies
- SaaS mass download behavior
- API abuse patterns
- Suspicious user-agent activity
- New external destination patterns
The query should model behavior, not only match keywords.
18. Example Detection: Suspicious Repeated Denied Connections
```kql
CustomFirewall_CL
| where TimeGenerated > ago(1h)
| where EventResult_s in~ ("Denied", "Blocked", "Failure")   // case-insensitive match on deny outcomes
| summarize
    DenyCount = count(),
    UniqueDestinations = dcount(DstIpAddr_s),
    FirstSeen = min(TimeGenerated),
    LastSeen = max(TimeGenerated)
    by SrcIpAddr_s, bin(TimeGenerated, 15m)                  // bucket activity into 15-minute windows
| where DenyCount > 50 or UniqueDestinations > 20            // threshold for scan-like behavior
| project
    TimeGenerated,
    SrcIpAddr = SrcIpAddr_s,
    DenyCount,
    UniqueDestinations,
    FirstSeen,
    LastSeen
```
This detection is simple, but it shows the principle.
The custom log is no longer just stored.
It is being converted into behavior-based security signal.
19. Example Detection: Suspicious Proxy Access Pattern
```kql
CustomProxy_CL
| where TimeGenerated > ago(24h)
| where Url_s has_any ("pastebin", "anonfiles", "mega", "telegram", "discord")   // services often abused for exfiltration
| summarize
    RequestCount = count(),
    UniqueUrls = dcount(Url_s),
    FirstSeen = min(TimeGenerated),
    LastSeen = max(TimeGenerated)
    by User_s, SrcIpAddr_s
| where RequestCount >= 10                 // repeated access, not a one-off visit
| project
    User = User_s,
    SrcIpAddr = SrcIpAddr_s,
    RequestCount,
    UniqueUrls,
    FirstSeen,
    LastSeen
```
The value here depends on ingestion quality.
If user, source IP, URL, and timestamp fields are not parsed correctly, the detection becomes weak.
20. Entity Mapping in Sentinel
Entity mapping is critical.
A detection should not only return rows.
It should identify investigation anchors.
Useful Sentinel entities include:
- Account
- Host
- IP address
- URL
- File
- Process
- Cloud application
- Azure resource
- Mailbox
- DNS domain
For custom logs, entity mapping requires field discipline.
If a custom table does not consistently expose user, host, IP, URL, or resource fields, Sentinel incidents become harder to investigate.
Good ingestion engineering makes entity mapping easier.
Bad ingestion engineering pushes complexity onto analysts.
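As a sketch, a detection query shaped for entity mapping simply has to project clean entity columns; assuming the hypothetical CustomProxy_CL table from earlier:

```kql
// Hypothetical query output shaped for entity mapping. Each projected
// column can be bound to a Sentinel entity in the analytics rule:
// User -> Account, SrcIpAddr -> IP, Url -> URL.
CustomProxy_CL
| where TimeGenerated > ago(1h)
| where Url_s has_any ("pastebin", "anonfiles")
| project TimeGenerated, User = User_s, SrcIpAddr = SrcIpAddr_s, Url = Url_s
```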
21. Custom Alert Details
Custom alert details help analysts understand why an alert fired.
For custom log detections, alert details should include:
- Source product
- Event type
- User
- Host
- Source IP
- Destination IP
- URL or domain
- Action
- Detection reason
- Rule name
- Severity
- Count or threshold
- First seen time
- Last seen time
- Raw event reference
This gives the analyst context before they open the full query results.
The alert should explain itself.
22. Hunting with Custom Logs
Not every custom log use case should become an analytics rule immediately.
Some data should first support hunting.
Hunting is useful when:
- The behavior is exploratory
- The source is newly onboarded
- Baselines are not known
- Noise is still being understood
- The SOC is validating field quality
- The detection threshold is not mature
- Analysts are researching adversary behavior
Custom logs can support hunts such as:
- Rare destination access
- New admin activity
- Abnormal SaaS downloads
- New external domains
- Suspicious user-agent strings
- Unusual authentication failures
- Denied connection spikes
- OT device behavior changes
- Privileged account activity
- Suspicious API calls
Hunting helps turn raw telemetry into tested detection logic.
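For example, a rare-destination hunt might compare recent activity against a longer baseline. A sketch using the hypothetical CustomFirewall_CL table:

```kql
// Hypothetical hunt: destinations seen in the last day that never
// appeared in the prior 30-day baseline.
let baseline = CustomFirewall_CL
    | where TimeGenerated between (ago(31d) .. ago(1d))
    | distinct DstIpAddr_s;
CustomFirewall_CL
| where TimeGenerated > ago(1d)
| where DstIpAddr_s !in (baseline)
| summarize Connections = count(), UniqueSources = dcount(SrcIpAddr_s) by DstIpAddr_s
| order by Connections desc
```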
23. Analytics Rules from Custom Logs
Once a hunting query becomes reliable, it can be promoted into an analytics rule.
Before promotion, the SOC should confirm:
- The table is stable
- The schema is reliable
- The fields are consistently populated
- The KQL is accurate
- The detection is actionable
- False positives are understood
- Severity logic is defined
- Entity mapping is configured
- Alert details are useful
- Incident grouping is appropriate
- A response path exists
This is the difference between a query and an engineered detection.
24. Sentinel Workbooks for Custom Log Visibility
Workbooks help prove whether custom logs are operationally useful.
A custom ingestion workbook should show:
- Data volume by source
- Events by event type
- Events by severity
- Ingestion health
- Parsing failures
- Missing key fields
- Top users
- Top hosts
- Top source IPs
- Top destination IPs
- Top URLs or domains
- Detection coverage
- Rule activity
- Hunting usage
- Entity mapping completeness
The SOC should be able to see whether a custom log source is healthy, useful, and contributing to security outcomes.
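Several of those views can be driven by a single health query. A sketch, assuming the hypothetical CustomSource_CL schema from earlier:

```kql
// Hypothetical workbook query: ingestion volume and key-field
// completeness for a custom source, bucketed by hour.
CustomSource_CL
| where TimeGenerated > ago(7d)
| summarize
    Events = count(),
    MissingUser = countif(isempty(User)),
    MissingSrcIp = countif(isempty(SrcIpAddr))
    by bin(TimeGenerated, 1h)
| extend MissingFieldPct = round(100.0 * (MissingUser + MissingSrcIp) / (2 * Events), 1)
```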
25. SOC Optimization for Custom Logs
Custom ingestion should feed SOC optimization.
A custom source should be evaluated by:
- Detection value
- Investigation value
- Hunting value
- Coverage value
- Cost efficiency
- Analyst usability
- Entity mapping quality
- Normalization quality
- Alert fidelity
- Response usefulness
A high-volume source that produces no detection value should be reviewed.
A low-volume source that closes a critical visibility gap may be extremely valuable.
SOC optimization is not about collecting everything.
It is about collecting and engineering the right things.
26. Detection-Grade Custom Log Checklist
A custom log source is detection-grade when it satisfies the following checklist:
| Requirement | Question |
|---|---|
| Source clarity | Do we know what system produced the event? |
| Timestamp quality | Is TimeGenerated accurate and reliable? |
| Schema quality | Are important fields parsed into dedicated columns? |
| Entity support | Can users, hosts, IPs, URLs, files, or resources be mapped? |
| Detection value | Can the log support analytics rules? |
| Hunting value | Can the log support threat hunting? |
| Normalization | Can the log align to a common schema or parser? |
| Noise control | Can irrelevant data be filtered or reduced? |
| Security value | Does this log close a coverage gap? |
| Operational value | Can analysts use the data quickly? |
| Workbook visibility | Can the SOC monitor health and usage? |
| Response mapping | Does the log support incident response? |
If the answer is no across most of these areas, the ingestion pipeline needs more engineering.
27. Common Mistakes in Custom Log Ingestion
SOC teams should avoid these mistakes:
- Ingesting logs without a detection use case
- Creating custom tables with unclear schemas
- Keeping critical values trapped inside raw messages
- Ignoring timestamp quality
- Failing to normalize field names
- Not mapping entities in Sentinel rules
- Writing KQL that depends on inconsistent fields
- Not validating ingestion latency
- Not testing transformations
- Not documenting source ownership
- Not tracking parsing failures
- Not building workbooks for source visibility
- Treating data volume as success
- Ignoring cost impact
- Failing to connect logs to response playbooks
The main mistake is treating ingestion as the finish line.
Ingestion is only the beginning.
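For instance, ingestion latency, one of the overlooked items above, can be validated directly in KQL with the built-in ingestion_time() function. A sketch against a hypothetical custom table:

```kql
// Hypothetical latency check: compare event time to ingestion time to
// confirm the pipeline meets detection latency expectations.
CustomSource_CL
| where TimeGenerated > ago(1d)
| extend LatencySeconds = datetime_diff("second", ingestion_time(), TimeGenerated)
| summarize AvgLatency = avg(LatencySeconds), MaxLatency = max(LatencySeconds) by bin(TimeGenerated, 1h)
```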
28. Recommended Engineering Workflow
A mature SOC should onboard custom logs through a structured workflow.
Step 1: Define the security objective
Identify why the source matters.
Examples:
- Detect firewall deny spikes
- Hunt suspicious proxy access
- Monitor SaaS admin actions
- Detect OT device anomalies
- Track IAM privilege changes
- Identify API abuse
Step 2: Identify required fields
Define the minimum fields needed for detection and investigation.
Examples:
- Timestamp
- User
- Host
- Source IP
- Destination IP
- URL
- Action
- Result
- Event type
- Severity
- Raw message
Step 3: Choose ingestion method
Select the correct ingestion path.
Examples:
- AMA custom logs
- Syslog
- CEF
- Logs Ingestion API
- Data Collection Endpoint
- Built-in connector
Step 4: Design the table
Create a table schema that supports KQL and entity mapping.
Step 5: Build the DCR
Define stream declarations, destination, transformation logic, and output stream.
Step 6: Transform the data
Parse, filter, enrich, mask, and shape the incoming event.
Step 7: Build parsers
Create reusable parser functions where needed.
Step 8: Normalize
Align fields to common schemas or ASIM-style conventions where possible.
Step 9: Build hunts
Use hunting queries to validate value and reduce noise.
Step 10: Promote to analytics rules
Convert reliable hunting logic into scheduled analytics rules.
Step 11: Map entities
Map account, host, IP, URL, file, and resource fields into Sentinel incidents.
Step 12: Build workbook visibility
Create dashboards for source health, data quality, and detection contribution.
Step 13: Optimize continuously
Tune rules, transformations, schemas, and parsers based on analyst feedback and SOC outcomes.
29. R.A.H.S.I. Framework™ Analysis
From the R.A.H.S.I. Framework™ perspective, Threat-Forged Sentinel represents a shift in SOC maturity.
A basic SOC asks:
Did we ingest the log?
A mature SOC asks:
Can this log detect adversary behavior, support investigation, and improve coverage?
That is the difference between raw telemetry and detection-grade intelligence.
A custom log pipeline should be judged by:
- Whether it improves visibility
- Whether it supports KQL detections
- Whether it maps to useful entities
- Whether it helps analysts investigate faster
- Whether it improves hunting
- Whether it closes a coverage gap
- Whether it reduces uncertainty during response
- Whether it can be measured in workbooks
- Whether it supports SOC optimization
The strongest SOCs will not be the ones that ingest the most data.
They will be the ones that engineer the most useful signal.
30. Key Design Principles
1. Start with the detection objective
Do not ingest a source only because it exists.
Ingest it because it supports a security outcome.
2. Design the schema for investigation
Tables should support how analysts search, pivot, and respond.
3. Use DCRs as engineering controls
Treat Data Collection Rules as the control plane for shaping data.
4. Transform early
Parse, filter, enrich, and shape data before analysts and detections depend on it.
5. Normalize for reuse
Use ASIM-style normalization and parsers to make detections scalable.
6. Map entities clearly
A detection should identify the user, host, IP, URL, file, or resource involved.
7. Promote hunts into rules carefully
Not every query should become an alert.
Only reliable, actionable, tested logic should become a production analytics rule.
8. Measure detection value
Custom ingestion should be measured by security usefulness, not only data volume.
Threat-Forged Sentinel is the discipline of turning non-native logs into detection-grade intelligence.
It is not enough to collect firewall, proxy, SaaS, appliance, OT, IAM, or custom application logs.
Those logs must be shaped, normalized, parsed, mapped, tested, hunted, visualized, and connected to response.
In Microsoft Sentinel, this means using the full engineering chain:
- Azure Monitor Agent
- Syslog and CEF
- Logs Ingestion API
- Data Collection Endpoints
- Data Collection Rules
- Transformations
- Custom tables
- ASIM-style normalization
- KQL detections
- Entity mapping
- Analytics rules
- Hunting queries
- Workbooks
- SOC optimization
The goal is not more data.
The goal is better signal.
A log is not intelligence because it exists.
A log becomes intelligence when it helps the SOC detect, understand, and respond to adversary behavior.
Custom log ingestion is now a detection engineering discipline.
