Threat-Forged Sentinel | Custom Log Ingestion | Turning Non-Native Logs into Detection-Grade Intelligence | R.A.H.S.I. Framework™ Analysis
A SOC Engineering Blueprint for Turning Raw Logs into Detection-Grade Intelligence
Most SOC teams treat custom log ingestion as a data onboarding task.
That is the mistake.
In Microsoft Sentinel, custom log ingestion should not be measured only by whether the data lands in a Log Analytics workspace.
It should be measured by whether the data can support:
- Detection
- Investigation
- Hunting
- Entity mapping
- Analytics rules
- Workbooks
- Incident response
- SOC optimization
- Threat coverage
That is the purpose of Threat-Forged Sentinel.
It is a framework for turning non-native logs into detection-grade intelligence using Microsoft Sentinel, Azure Monitor, Data Collection Rules, transformation logic, custom tables, ASIM-style normalization, KQL analytics, entity mapping, hunting workflows, and SOC visibility.
A log is not valuable because it was ingested.
A log becomes valuable when it helps the SOC detect adversary behavior, explain what happened, map affected entities, and drive response.
1. The Core Problem: Raw Logs Are Not Intelligence
Many organizations ingest logs from firewalls, proxies, SaaS platforms, appliances, OT systems, IAM tools, custom applications, and business platforms.
But ingestion alone does not create security value.
A raw log may contain useful evidence, but if it is poorly structured, inconsistently parsed, missing key fields, or disconnected from detection logic, it becomes difficult for analysts to use.
The result is a familiar pattern:
- Logs exist but are not used in detections
- Tables exist but no one queries them
- Fields exist but are not normalized
- Data is ingested but not mapped to entities
- Analysts cannot quickly understand the event
- Hunting queries are difficult to reuse
- Workbooks cannot visualize the signal
- SOC leaders cannot prove coverage
- Storage cost increases without detection value
This is not detection engineering.
This is log storage.
Threat-Forged Sentinel changes the goal.
The goal is not:
Did we ingest the log?
The goal is:
Can this log detect adversary behavior, support investigation, and improve SOC coverage?
2. What Threat-Forged Sentinel Means
Threat-Forged Sentinel is the discipline of engineering custom log pipelines so that non-native telemetry becomes usable security intelligence.
It connects the full pipeline:
Source Log
↓
Collection Method
↓
Data Collection Endpoint
↓
Data Collection Rule
↓
Transformation Logic
↓
Custom Table
↓
Parser
↓
Normalized Schema
↓
KQL Detection
↓
Entity Mapping
↓
Analytics Rule
↓
Hunting Query
↓
Workbook Visibility
↓
SOC Optimization
This is the shift from log ingestion to detection engineering.
The SOC should not only collect data.
It should forge the data into signal.
3. Microsoft Sentinel and Azure Monitor as the Data Foundation
Microsoft Sentinel uses Azure Monitor and Log Analytics as the data foundation.
Logs ingested into Sentinel are stored in a Log Analytics workspace, where Kusto Query Language, or KQL, is used to query data, detect threats, investigate activity, and build analytics.
For native Microsoft sources, many connectors already provide structured tables and content.
For non-native sources, the SOC must often design the ingestion and transformation path.
That design may include:
- Azure Monitor Agent
- Syslog forwarding
- Common Event Format ingestion
- Custom Logs via Azure Monitor Agent
- Logs Ingestion API
- Data Collection Endpoints
- Data Collection Rules
- Custom Log Analytics tables
- Ingestion-time transformations
- Workspace transformations
- ASIM normalization
- Custom parsers
- Sentinel analytics rules
- Hunting queries
- Workbooks
The engineering decision is not simply how to bring the log in.
The engineering decision is how to make the log useful.
4. Custom Log Ingestion Architecture
A strong custom ingestion architecture starts with a clear understanding of the source.
Before onboarding any custom log source, the SOC should define:
- What system produces the log?
- What security behavior does it represent?
- Is the source authoritative?
- Is the timestamp reliable?
- Which fields are required for detection?
- Which fields identify users, hosts, IPs, URLs, files, or cloud resources?
- Which fields should be transformed or enriched?
- Which fields contain sensitive data?
- Which Sentinel table should store the data?
- Which parser should make the data reusable?
- Which detections or hunts will use the data?
- Which workbook will prove visibility?
If these questions are not answered, custom ingestion becomes a technical exercise without operational value.
5. Collection Methods for Non-Native Logs
Non-native logs can enter Microsoft Sentinel through multiple patterns.
The correct method depends on the source system, format, network path, latency requirements, and operational model.
Common ingestion methods include:
| Collection Method | Best Use Case |
|---|---|
| Azure Monitor Agent | Collecting logs from machines, servers, and supported custom text logs |
| Syslog | Linux and network device logging |
| CEF | Security appliances and products that support Common Event Format |
| Logs Ingestion API | Custom applications, platforms, pipelines, and sources that can send JSON over API |
| Data Collection Endpoint | Ingestion endpoint control, especially where private connectivity is required |
| Data Collection Rule | Routing, transformation, and destination logic |
| Custom Table | Storing unique log formats in Log Analytics |
| Built-in Sentinel Connector | Native or partner-supported ingestion |
| Workspace Transformation | Applying transformation logic to supported tables |
Each method should be selected based on the detection objective, not simply on the easiest ingestion path.
6. Data Collection Rules as the Control Plane
Data Collection Rules, or DCRs, are central to custom log ingestion.
A DCR defines how data is collected, transformed, and sent to a destination such as a Log Analytics workspace.
In a Threat-Forged Sentinel model, the DCR becomes the ingestion control plane.
It can define:
- Input stream structure
- Destination workspace
- Target table
- Transformation logic
- Output stream
- Data routing
- Filtering
- Field shaping
- Schema alignment
This matters because raw source data often does not match the target table schema.
The DCR transformation can reshape incoming data before it lands.
That means the SOC can engineer data quality at ingestion time instead of forcing every analyst or rule to handle messy fields later.
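A minimal sketch of what that looks like, assuming a hypothetical input stream whose raw columns event_time and device_name must be aligned to the destination schema:

```kql
// Hypothetical DCR transformation: align an incoming stream to the
// destination table schema before the data lands.
source
| where isnotempty(device_name)                    // drop records with no device context
| extend TimeGenerated = todatetime(event_time)    // map the source timestamp to TimeGenerated
| extend DstHostname = tostring(device_name)       // rename the vendor field to the target column
| project-away event_time, device_name             // remove the raw columns after mapping
```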
7. Data Collection Endpoints
A Data Collection Endpoint, or DCE, is the network endpoint that source systems send their data to before a Data Collection Rule processes it.
It appears in many custom ingestion designs, especially where private connectivity or tighter control over the ingestion path is required.
The relationship can be understood like this:
Source System
↓
Data Collection Endpoint
↓
Data Collection Rule
↓
Transformation
↓
Log Analytics Table
↓
Microsoft Sentinel
The DCE is not the detection layer.
It is part of the ingestion path.
The DCR and transformation logic are where the source data begins to become detection-ready.
8. Logs Ingestion API for Custom Sources
The Logs Ingestion API is important when a source can send data through REST API calls or client libraries.
This is useful for:
- Custom applications
- SaaS platforms
- Internal security tools
- Business applications
- Custom detection pipelines
- Middleware
- Enrichment systems
- Non-standard telemetry sources
The source sends JSON-formatted data to Azure Monitor.
The DCR defines how that data is interpreted and where it is stored.
This provides flexibility because the incoming source format does not always need to match the final table format. Transformation logic can reshape the event into the destination schema.
A typical Logs Ingestion API flow looks like this:
Custom Source
↓
JSON Payload
↓
Logs Ingestion API
↓
DCR Stream Declaration
↓
Transform KQL
↓
Custom Table
↓
Sentinel KQL Detection
This pattern is powerful because it gives the SOC control over the structure, destination, and detection readiness of custom telemetry.
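Once the pipeline is wired up, a quick validation query can confirm that events are arriving. A sketch, assuming a hypothetical destination table named MyAppEvents_CL:

```kql
// Hypothetical destination table: confirm that API-submitted events
// are arriving and check the most recent event time.
MyAppEvents_CL
| where TimeGenerated > ago(15m)
| summarize EventCount = count(), LatestEvent = max(TimeGenerated)
```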
9. Custom Tables in Log Analytics
Custom tables are used when the source data does not fit an existing standard table.
A custom table should not be created casually.
It should be designed around how the SOC will query, detect, hunt, and investigate.
A useful custom table should have:
- Clear naming
- Reliable timestamp field
- Source identifier
- Event type
- User field
- Host field
- IP address fields
- URL or domain fields
- Action field
- Result field
- Severity or risk field
- Raw message field when needed
- Parsed fields for detection
- Consistent data types
- Minimal unnecessary columns
A poor custom table becomes a dumping ground.
A well-designed custom table becomes a detection asset.
10. Recommended Custom Table Design
Below is a practical custom table design model for a non-native security source.
| Column | Type | Purpose |
|---|---|---|
| TimeGenerated | datetime | Event timestamp used by Sentinel and KQL |
| SourceVendor | string | Vendor or platform name |
| SourceProduct | string | Product or service name |
| EventType | string | Type of security event |
| EventResult | string | Success, failure, blocked, allowed, detected |
| EventSeverity | string | Source severity or mapped severity |
| User | string | User identity |
| SrcIpAddr | string | Source IP address |
| DstIpAddr | string | Destination IP address |
| DstHostname | string | Destination host |
| Url | string | URL involved in event |
| Domain | string | Domain involved in event |
| FileName | string | File involved in event |
| Action | string | Action taken by the source system |
| RuleName | string | Source rule or policy name |
| ThreatName | string | Threat or signature name |
| RawMessage | string | Original event payload or message |
| AdditionalFields | dynamic | Flexible extra metadata |
This structure helps detection engineers write reusable KQL.
It also makes the data easier to normalize and map to Sentinel entities.
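As a sketch of that reusability, assuming a hypothetical CustomSource_CL table built on the schema above, a newly onboarded source can be baselined with one short query:

```kql
// Hypothetical table using the schema above: baseline a newly
// onboarded source by event type and result.
CustomSource_CL
| where TimeGenerated > ago(7d)
| summarize EventCount = count(), UniqueUsers = dcount(User) by EventType, EventResult
| order by EventCount desc
```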
11. Ingestion-Time Transformations
Ingestion-time transformations allow data to be shaped before it is stored.
This can include:
- Filtering irrelevant records
- Removing unnecessary columns
- Parsing raw fields
- Creating calculated fields
- Renaming fields
- Normalizing values
- Masking sensitive data
- Routing events to the correct table
- Enriching logs with additional context
This is important because detection quality often depends on data quality.
For example, a raw log may contain the source IP inside a long message string.
A transformation can extract that value into a dedicated field such as SrcIpAddr.
That one engineering decision can make the data more useful for analytics rules, hunting queries, entity mapping, and workbooks.
12. Example Transformation Logic
A simplified transformation might reshape incoming custom source data into a cleaner schema.
```kql
source
| extend EventTime = todatetime(timestamp)    // convert the raw timestamp to datetime
| extend SrcIpAddr = tostring(src_ip)         // promote IPs into dedicated columns
| extend DstIpAddr = tostring(dst_ip)
| extend User = tostring(username)
| extend EventType = tostring(event_type)
| extend EventResult = tostring(result)
| extend Action = tostring(action)
| project                                     // keep only the detection-ready columns
    TimeGenerated = EventTime,
    SrcIpAddr,
    DstIpAddr,
    User,
    EventType,
    EventResult,
    Action,
    RawMessage = tostring(raw_message)
```
This is not just formatting.
This is detection preparation.
The transformation creates fields that KQL detections and entity mapping can use consistently.
13. Filtering Noise at Ingestion
Not every event deserves to be stored in the same way.
Some events are useful for detection.
Some are useful only for audit.
Some are repetitive noise.
Some contain sensitive data that should be masked or removed.
Ingestion-time filtering can help reduce unnecessary data volume and improve signal quality.
Examples include:
- Dropping known health-check events
- Removing duplicate heartbeat logs
- Filtering low-value debug events
- Removing sensitive fields
- Keeping only security-relevant event types
- Routing high-value events to analytics-ready tables
- Sending low-value events to lower-cost storage where appropriate
The objective is not blind data reduction.
The objective is security-focused data shaping.
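A minimal sketch of such a filter, assuming a hypothetical input stream with event_type, severity, and session_token columns:

```kql
// Hypothetical ingestion-time filter: drop known noise and remove a
// sensitive field before the event is stored.
source
| where event_type !in ("health_check", "heartbeat")   // drop repetitive noise
| where severity != "debug"                            // drop low-value debug events
| project-away session_token                           // never store the sensitive token
```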
14. Normalization and ASIM
Normalization is where custom logs become reusable across the SOC.
Microsoft Sentinel supports the Advanced Security Information Model, commonly known as ASIM, to help normalize different source types into common schemas.
This matters because every vendor has its own field names.
One firewall may use src_ip.
Another may use sourceAddress.
Another may use client_ip.
Without normalization, every detection must be rewritten for every source.
With normalization, multiple sources can support common detection and hunting logic.
Normalization helps convert vendor-specific telemetry into a common analytical language.
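A sketch of that convergence in KQL, assuming three hypothetical vendor tables that each carry the source IP under one of the field names mentioned above:

```kql
// Hypothetical normalization: three vendor tables expose the source IP
// under different field names; map each into one normalized column.
union
    (VendorA_CL | extend SrcIpAddr = tostring(src_ip)),
    (VendorB_CL | extend SrcIpAddr = tostring(sourceAddress)),
    (VendorC_CL | extend SrcIpAddr = tostring(client_ip))
| where isnotempty(SrcIpAddr)
| summarize EventCount = count() by SrcIpAddr
| order by EventCount desc
```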
15. Why ASIM-Style Normalization Matters
ASIM-style normalization helps the SOC:
- Reduce vendor-specific query logic
- Create reusable detections
- Improve hunting consistency
- Make workbooks easier to build
- Improve analyst experience
- Compare events across products
- Support cross-source correlation
- Build scalable detection content
For example, normalized network session data can support hunting across firewall, proxy, VPN, and network appliance logs.
Normalized authentication data can support identity-focused detection across Entra ID, VPN, SaaS, and third-party IAM platforms.
The more consistent the schema, the more reusable the detection logic.
16. Parser Engineering
Parsers convert source-specific data into reusable views.
A parser can:
- Rename source fields
- Convert data types
- Extract values from raw messages
- Map vendor fields to normalized names
- Add calculated fields
- Standardize event results
- Normalize user, IP, URL, and host fields
- Hide source complexity from analysts
A good parser allows analysts and rules to query a clean function instead of raw table complexity.
Example concept:
```kql
CustomFirewall_CL
| extend SrcIpAddr = tostring(SourceIP_s)             // normalize vendor field names
| extend DstIpAddr = tostring(DestinationIP_s)
| extend DstPortNumber = toint(DestinationPort_d)     // convert the port to an integer
| extend EventResult = iff(Action_s == "allow", "Success", "Failure")   // standardize the result value
| project
    TimeGenerated,
    SrcIpAddr,
    DstIpAddr,
    DstPortNumber,
    EventResult,
    Action_s,
    RuleName_s
```
The parser makes the data usable.
The detection logic becomes cleaner.
The analyst experience improves.
17. From Custom Logs to KQL Detections
A custom log source becomes valuable when it supports reliable KQL detection logic.
A detection-grade custom log should support queries such as:
- Suspicious authentication failures
- Impossible travel from non-native IAM logs
- Proxy access to suspicious domains
- Firewall deny spikes
- Data exfiltration indicators
- Rare destination access
- Privileged user activity
- Admin policy changes
- Malware detections from security appliances
- OT device anomalies
- SaaS mass download behavior
- API abuse patterns
- Suspicious user-agent activity
- New external destination patterns
The query should model behavior, not only match keywords.
18. Example Detection: Suspicious Repeated Denied Connections
```kql
CustomFirewall_CL
| where TimeGenerated > ago(1h)
| where EventResult_s in~ ("Denied", "Blocked", "Failure")   // case-insensitive match on deny outcomes
| summarize
    DenyCount = count(),
    UniqueDestinations = dcount(DstIpAddr_s),
    FirstSeen = min(TimeGenerated),
    LastSeen = max(TimeGenerated)
    by SrcIpAddr_s, bin(TimeGenerated, 15m)                  // bucket activity into 15-minute windows
| where DenyCount > 50 or UniqueDestinations > 20            // threshold for scan-like behavior
| project
    TimeGenerated,
    SrcIpAddr = SrcIpAddr_s,
    DenyCount,
    UniqueDestinations,
    FirstSeen,
    LastSeen
```
This detection is simple, but it shows the principle.
The custom log is no longer just stored.
It is being converted into behavior-based security signal.
19. Example Detection: Suspicious Proxy Access Pattern
```kql
CustomProxy_CL
| where TimeGenerated > ago(24h)
| where Url_s has_any ("pastebin", "anonfiles", "mega", "telegram", "discord")   // services often abused for exfiltration
| summarize
    RequestCount = count(),
    UniqueUrls = dcount(Url_s),
    FirstSeen = min(TimeGenerated),
    LastSeen = max(TimeGenerated)
    by User_s, SrcIpAddr_s
| where RequestCount >= 10                 // repeated access, not a one-off visit
| project
    User = User_s,
    SrcIpAddr = SrcIpAddr_s,
    RequestCount,
    UniqueUrls,
    FirstSeen,
    LastSeen
```
The value here depends on ingestion quality.
If user, source IP, URL, and timestamp fields are not parsed correctly, the detection becomes weak.
20. Entity Mapping in Sentinel
Entity mapping is critical.
A detection should not only return rows.
It should identify investigation anchors.
Useful Sentinel entities include:
- Account
- Host
- IP address
- URL
- File
- Process
- Cloud application
- Azure resource
- Mailbox
- DNS domain
For custom logs, entity mapping requires field discipline.
If a custom table does not consistently expose user, host, IP, URL, or resource fields, Sentinel incidents become harder to investigate.
Good ingestion engineering makes entity mapping easier.
Bad ingestion engineering pushes complexity onto analysts.
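As a sketch, a detection query shaped for entity mapping simply has to project clean entity columns; assuming the hypothetical CustomProxy_CL table from earlier:

```kql
// Hypothetical query output shaped for entity mapping. Each projected
// column can be bound to a Sentinel entity in the analytics rule:
// User -> Account, SrcIpAddr -> IP, Url -> URL.
CustomProxy_CL
| where TimeGenerated > ago(1h)
| where Url_s has_any ("pastebin", "anonfiles")
| project TimeGenerated, User = User_s, SrcIpAddr = SrcIpAddr_s, Url = Url_s
```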
21. Custom Alert Details
Custom alert details help analysts understand why an alert fired.
For custom log detections, alert details should include:
- Source product
- Event type
- User
- Host
- Source IP
- Destination IP
- URL or domain
- Action
- Detection reason
- Rule name
- Severity
- Count or threshold
- First seen time
- Last seen time
- Raw event reference
This gives the analyst context before they open the full query results.
The alert should explain itself.
22. Hunting with Custom Logs
Not every custom log use case should become an analytics rule immediately.
Some data should first support hunting.
Hunting is useful when:
- The behavior is exploratory
- The source is newly onboarded
- Baselines are not known
- Noise is still being understood
- The SOC is validating field quality
- The detection threshold is not mature
- Analysts are researching adversary behavior
Custom logs can support hunts such as:
- Rare destination access
- New admin activity
- Abnormal SaaS downloads
- New external domains
- Suspicious user-agent strings
- Unusual authentication failures
- Denied connection spikes
- OT device behavior changes
- Privileged account activity
- Suspicious API calls
Hunting helps turn raw telemetry into tested detection logic.
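For example, a rare-destination hunt might compare recent activity against a longer baseline. A sketch using the hypothetical CustomFirewall_CL table:

```kql
// Hypothetical hunt: destinations seen in the last day that never
// appeared in the prior 30-day baseline.
let baseline = CustomFirewall_CL
    | where TimeGenerated between (ago(31d) .. ago(1d))
    | distinct DstIpAddr_s;
CustomFirewall_CL
| where TimeGenerated > ago(1d)
| where DstIpAddr_s !in (baseline)
| summarize Connections = count(), UniqueSources = dcount(SrcIpAddr_s) by DstIpAddr_s
| order by Connections desc
```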
23. Analytics Rules from Custom Logs
Once a hunting query becomes reliable, it can be promoted into an analytics rule.
Before promotion, the SOC should confirm:
- The table is stable
- The schema is reliable
- The fields are consistently populated
- The KQL is accurate
- The detection is actionable
- False positives are understood
- Severity logic is defined
- Entity mapping is configured
- Alert details are useful
- Incident grouping is appropriate
- A response path exists
This is the difference between a query and an engineered detection.
24. Sentinel Workbooks for Custom Log Visibility
Workbooks help prove whether custom logs are operationally useful.
A custom ingestion workbook should show:
- Data volume by source
- Events by event type
- Events by severity
- Ingestion health
- Parsing failures
- Missing key fields
- Top users
- Top hosts
- Top source IPs
- Top destination IPs
- Top URLs or domains
- Detection coverage
- Rule activity
- Hunting usage
- Entity mapping completeness
The SOC should be able to see whether a custom log source is healthy, useful, and contributing to security outcomes.
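Several of those views can be driven by a single health query. A sketch, assuming the hypothetical CustomSource_CL schema from earlier:

```kql
// Hypothetical workbook query: ingestion volume and key-field
// completeness for a custom source, bucketed by hour.
CustomSource_CL
| where TimeGenerated > ago(7d)
| summarize
    Events = count(),
    MissingUser = countif(isempty(User)),
    MissingSrcIp = countif(isempty(SrcIpAddr))
    by bin(TimeGenerated, 1h)
| extend MissingFieldPct = round(100.0 * (MissingUser + MissingSrcIp) / (2 * Events), 1)
```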
25. SOC Optimization for Custom Logs
Custom ingestion should feed SOC optimization.
A custom source should be evaluated by:
- Detection value
- Investigation value
- Hunting value
- Coverage value
- Cost efficiency
- Analyst usability
- Entity mapping quality
- Normalization quality
- Alert fidelity
- Response usefulness
A high-volume source that produces no detection value should be reviewed.
A low-volume source that closes a critical visibility gap may be extremely valuable.
SOC optimization is not about collecting everything.
It is about collecting and engineering the right things.
26. Detection-Grade Custom Log Checklist
A custom log source is detection-grade when it satisfies the following checklist:
| Requirement | Question |
|---|---|
| Source clarity | Do we know what system produced the event? |
| Timestamp quality | Is TimeGenerated accurate and reliable? |
| Schema quality | Are important fields parsed into dedicated columns? |
| Entity support | Can users, hosts, IPs, URLs, files, or resources be mapped? |
| Detection value | Can the log support analytics rules? |
| Hunting value | Can the log support threat hunting? |
| Normalization | Can the log align to a common schema or parser? |
| Noise control | Can irrelevant data be filtered or reduced? |
| Security value | Does this log close a coverage gap? |
| Operational value | Can analysts use the data quickly? |
| Workbook visibility | Can the SOC monitor health and usage? |
| Response mapping | Does the log support incident response? |
If the answer is no across most of these areas, the ingestion pipeline needs more engineering.
27. Common Mistakes in Custom Log Ingestion
SOC teams should avoid these mistakes:
- Ingesting logs without a detection use case
- Creating custom tables with unclear schemas
- Keeping critical values trapped inside raw messages
- Ignoring timestamp quality
- Failing to normalize field names
- Not mapping entities in Sentinel rules
- Writing KQL that depends on inconsistent fields
- Not validating ingestion latency
- Not testing transformations
- Not documenting source ownership
- Not tracking parsing failures
- Not building workbooks for source visibility
- Treating data volume as success
- Ignoring cost impact
- Failing to connect logs to response playbooks
The main mistake is treating ingestion as the finish line.
Ingestion is only the beginning.
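For instance, ingestion latency, one of the overlooked items above, can be validated directly in KQL with the built-in ingestion_time() function. A sketch against a hypothetical custom table:

```kql
// Hypothetical latency check: compare event time to ingestion time to
// confirm the pipeline meets detection latency expectations.
CustomSource_CL
| where TimeGenerated > ago(1d)
| extend LatencySeconds = datetime_diff("second", ingestion_time(), TimeGenerated)
| summarize AvgLatency = avg(LatencySeconds), MaxLatency = max(LatencySeconds) by bin(TimeGenerated, 1h)
```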
28. Recommended Engineering Workflow
A mature SOC should onboard custom logs through a structured workflow.
Step 1: Define the security objective
Identify why the source matters.
Examples:
- Detect firewall deny spikes
- Hunt suspicious proxy access
- Monitor SaaS admin actions
- Detect OT device anomalies
- Track IAM privilege changes
- Identify API abuse
Step 2: Identify required fields
Define the minimum fields needed for detection and investigation.
Examples:
- Timestamp
- User
- Host
- Source IP
- Destination IP
- URL
- Action
- Result
- Event type
- Severity
- Raw message
Step 3: Choose ingestion method
Select the correct ingestion path.
Examples:
- AMA custom logs
- Syslog
- CEF
- Logs Ingestion API
- Data Collection Endpoint
- Built-in connector
Step 4: Design the table
Create a table schema that supports KQL and entity mapping.
Step 5: Build the DCR
Define stream declarations, destination, transformation logic, and output stream.
Step 6: Transform the data
Parse, filter, enrich, mask, and shape the incoming event.
Step 7: Build parsers
Create reusable parser functions where needed.
Step 8: Normalize
Align fields to common schemas or ASIM-style conventions where possible.
Step 9: Build hunts
Use hunting queries to validate value and reduce noise.
Step 10: Promote to analytics rules
Convert reliable hunting logic into scheduled analytics rules.
Step 11: Map entities
Map account, host, IP, URL, file, and resource fields into Sentinel incidents.
Step 12: Build workbook visibility
Create dashboards for source health, data quality, and detection contribution.
Step 13: Optimize continuously
Tune rules, transformations, schemas, and parsers based on analyst feedback and SOC outcomes.
29. R.A.H.S.I. Framework™ Analysis
From the R.A.H.S.I. Framework™ perspective, Threat-Forged Sentinel represents a shift in SOC maturity.
A basic SOC asks:
Did we ingest the log?
A mature SOC asks:
Can this log detect adversary behavior, support investigation, and improve coverage?
That is the difference between raw telemetry and detection-grade intelligence.
A custom log pipeline should be judged by:
- Whether it improves visibility
- Whether it supports KQL detections
- Whether it maps to useful entities
- Whether it helps analysts investigate faster
- Whether it improves hunting
- Whether it closes a coverage gap
- Whether it reduces uncertainty during response
- Whether it can be measured in workbooks
- Whether it supports SOC optimization
The strongest SOCs will not be the ones that ingest the most data.
They will be the ones that engineer the most useful signal.
30. Key Design Principles
1. Start with the detection objective
Do not ingest a source only because it exists.
Ingest it because it supports a security outcome.
2. Design the schema for investigation
Tables should support how analysts search, pivot, and respond.
3. Use DCRs as engineering controls
Treat Data Collection Rules as the control plane for shaping data.
4. Transform early
Parse, filter, enrich, and shape data before analysts and detections depend on it.
5. Normalize for reuse
Use ASIM-style normalization and parsers to make detections scalable.
6. Map entities clearly
A detection should identify the user, host, IP, URL, file, or resource involved.
7. Promote hunts into rules carefully
Not every query should become an alert.
Only reliable, actionable, tested logic should become a production analytics rule.
8. Measure detection value
Custom ingestion should be measured by security usefulness, not only data volume.
Threat-Forged Sentinel is the discipline of turning non-native logs into detection-grade intelligence.
It is not enough to collect firewall, proxy, SaaS, appliance, OT, IAM, or custom application logs.
Those logs must be shaped, normalized, parsed, mapped, tested, hunted, visualized, and connected to response.
In Microsoft Sentinel, this means using the full engineering chain:
- Azure Monitor Agent
- Syslog and CEF
- Logs Ingestion API
- Data Collection Endpoints
- Data Collection Rules
- Transformations
- Custom tables
- ASIM-style normalization
- KQL detections
- Entity mapping
- Analytics rules
- Hunting queries
- Workbooks
- SOC optimization
The goal is not more data.
The goal is better signal.
A log is not intelligence because it exists.
A log becomes intelligence when it helps the SOC detect, understand, and respond to adversary behavior.
Custom log ingestion is now a detection engineering discipline.
