Complex production logs rarely arrive in one clean format. A single line may start with a timestamp, continue with delimiter-separated fields, embed a JSON object, and end with extra wrapper text. If the collector can only apply one parser, the downstream log platform receives either a lossy structure or a raw string that operators must clean later.
The source article presents Tencent Cloud CLS LogListener composite parsing as a collector-side pipeline for this situation. LogListener can run one or more processors in sequence, so a log line can be split, decoded, filtered, enriched, and reassembled before it is uploaded to CLS.
When composite parsing is useful
The source article describes three scenarios that call for composite parsing:
| Scenario | What happens in the log line | What the pipeline does |
|---|---|---|
| Multiple parsing modes are needed | One part is delimiter-separated, while another part is JSON or key-value text | Apply different processors to different fields after the first split |
| Some fields need post-processing | Parsed fields include values that should be dropped, renamed, or supplemented | Run processors such as field dropping or metadata extraction |
| Both patterns appear together | A line needs multiple extraction steps and field-level transformation | Chain processors in order inside the LogListener configuration |
The flow is simple: split the original log into segments, process each segment with the right processor, and output only the fields that should become structured log content.
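You can model this split, process, and output flow as an ordered list of functions, where each function takes a dictionary of fields and returns a new one. The following minimal Python sketch is not LogListener code. The raw key and both step functions are hypothetical and exist only to show that each step works on the output of the step before it.

```python
from typing import Callable, Dict, List

# A record holds the fields extracted so far, keyed by field name.
Record = Dict[str, str]
Processor = Callable[[Record], Record]


def run_pipeline(raw_line: str, processors: List[Processor]) -> Record:
    """Apply each processor in order; each step receives the previous step's output."""
    record: Record = {"raw": raw_line}  # "raw" is a hypothetical key for the unparsed line
    for processor in processors:
        record = processor(record)
    return record


def split_segments(record: Record) -> Record:
    """Hypothetical step 1: split the raw line on ',' into msg1 and msg2."""
    rest = {k: v for k, v in record.items() if k != "raw"}
    first, second = record["raw"].split(",", 1)
    return {**rest, "msg1": first, "msg2": second}


def drop_msg2(record: Record) -> Record:
    """Hypothetical step 2: keep only the fields that should be uploaded."""
    return {k: v for k, v in record.items() if k != "msg2"}


print(run_pipeline("value1,value2", [split_segments, drop_msg2]))  # {'msg1': 'value1'}
```

If you swap the two steps, the drop runs before msg2 exists and has no effect. This is why the order of the processor list matters.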
Processor map from the source article
The source screenshot includes the available processors for this workflow:
| Function | Processor | Use |
|---|---|---|
| Extract fields | processor_log_string | Multi-character line parsing, usually for advanced single-line logs |
| Extract fields | processor_multiline | First-line regex parsing for multiline logs |
| Extract fields | processor_multiline_fullregex | First-line regex plus full multiline regex extraction |
| Extract fields | processor_fullregex | Regex extraction for a single-line field |
| Extract fields | processor_json | Expand a field value as JSON |
| Extract fields | processor_split_delimiter | Split fields by one or more delimiter characters |
| Extract fields | processor_split_key_value | Extract key-value pairs |
| Process fields | processor_drop | Drop selected fields |
| Process fields | processor_timeformat | Parse a source time field, convert the time format, and set the log timestamp |
Pattern 1: drop fields before upload
If a raw log contains three key-value pairs but only key2 is useful, the source article uses processor_drop to remove key1 and key3.
Input:
```text
key1:value1
key2:value2
key3:value3
```
LogListener configuration:
```json
{
  "processors": [
    {
      "type": "processor_drop",
      "detail": {
        "SourceKey": ["key1", "key3"]
      }
    }
  ]
}
```
Output:
key2:value2
Dropping fields at the collector is the simplest optimization available. It reduces both the upload payload and the storage cost by removing fields that do not need to be indexed or analyzed.
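To check locally which fields survive a drop list before you deploy it, you can emulate the described behavior of processor_drop in a few lines. This Python sketch is an assumption-level model, not LogListener's implementation. The helper names parse_key_value_lines and drop_fields are hypothetical, and the key-value parser splits each line on its first colon only.

```python
from typing import Dict, Iterable


def parse_key_value_lines(text: str, delimiter: str = ":") -> Dict[str, str]:
    """Turn 'key:value' lines into a dict, splitting on the first delimiter only."""
    fields: Dict[str, str] = {}
    for line in text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        key, _, value = line.partition(delimiter)
        fields[key.strip()] = value.strip()
    return fields


def drop_fields(fields: Dict[str, str], source_keys: Iterable[str]) -> Dict[str, str]:
    """Remove every key listed in source_keys, as processor_drop is described as doing."""
    to_drop = set(source_keys)
    return {k: v for k, v in fields.items() if k not in to_drop}


raw = "key1:value1\nkey2:value2\nkey3:value3"
print(drop_fields(parse_key_value_lines(raw), ["key1", "key3"]))  # {'key2': 'value2'}
```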
Pattern 2: enrich logs from metadata
The article also shows a metadata-enrichment case. The log body value1,value2 is collected from a file, and the collector extracts the application name, version, log directory, and log file name from that file's path as tag fields. The source notes that meta_processor requires LogListener 2.7.4 or later.
Input:
value1,value2
Path:
/usr/local/loglistener-2.7.4/testdir/test.log
Configuration shape:
```json
{
  "processors": [
    {
      "type": "processor_split_delimiter",
      "detail": {
        "Delimiter": ",",
        "ExtractKeys": ["msg1", "msg2"]
      }
    },
    {
      "type": "meta_processor",
      "detail": {
        "ExtractKeys": ["FILENAME"]
      },
      "processors": [
        {
          "type": "processor_fullregex",
          "detail": {
            "KeepSource": false,
            "SourceKey": "FILENAME",
            "ExtractRegex": "/\\w+/\\w+/(\\w+)-([^/]+)/(\\w+)/(\\w+).*",
            "ExtractKeys": ["app", "ver", "logdir", "logname"]
          }
        }
      ]
    }
  ]
}
```
Output fields shown by the source:
```text
msg1:value1
msg2:value2
__TAG__.app: loglistener
__TAG__.ver: 2.7.4
__TAG__.logname: test
__TAG__.logdir: testdir
```
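Before you deploy the ExtractRegex, you can test it offline against a sample path. The following sketch uses Python's re module with the same pattern after removing the JSON escaping (\\w becomes \w). This assumes that LogListener's regex engine treats this simple pattern the same way, which the source does not state. The helper names tags_from_path and split_body are hypothetical.

```python
import re
from typing import Dict, List

# Same pattern as the ExtractRegex above, with JSON string escaping removed.
PATH_REGEX = re.compile(r"/\w+/\w+/(\w+)-([^/]+)/(\w+)/(\w+).*")
EXTRACT_KEYS: List[str] = ["app", "ver", "logdir", "logname"]


def tags_from_path(path: str) -> Dict[str, str]:
    """Map capture groups onto __TAG__ fields; return {} if the whole path does not match."""
    match = PATH_REGEX.fullmatch(path)
    if match is None:
        return {}
    return {f"__TAG__.{key}": value for key, value in zip(EXTRACT_KEYS, match.groups())}


def split_body(line: str) -> Dict[str, str]:
    """Emulate the processor_split_delimiter step: split on ',' into msg1 and msg2."""
    return dict(zip(["msg1", "msg2"], line.split(",")))


record = {
    **split_body("value1,value2"),
    **tags_from_path("/usr/local/loglistener-2.7.4/testdir/test.log"),
}
for key, value in record.items():
    print(f"{key}: {value}")
```

If a path does not fit the expected layout (for example, a directory name without a hyphenated version), the function returns no tags. That makes the mismatch easy to spot during testing.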
Pattern 3: parse nested fields with child processors
The custom parsing example starts with one comma-separated line:
1571394459,http://127.0.0.1/my/course/4|10.135.46.111|200,status:DEAD,
The pipeline first splits the line into time, msg1, and msg2. Child processors then parse the Unix timestamp in time and set it as the log timestamp, split msg1 by |, and parse msg2 as key-value content.
```json
{
  "processors": [
    {
      "type": "processor_split_delimiter",
      "detail": {
        "Delimiter": ",",
        "ExtractKeys": ["time", "msg1", "msg2"]
      },
      "processors": [
        {
          "type": "processor_timeformat",
          "detail": {
            "KeepSource": true,
            "TimeFormat": "%s",
            "SourceKey": "time"
          }
        },
        {
          "type": "processor_split_delimiter",
          "detail": {
            "KeepSource": false,
            "Delimiter": "|",
            "SourceKey": "msg1",
            "ExtractKeys": ["submsg1", "submsg2", "submsg3"]
          }
        },
        {
          "type": "processor_split_key_value",
          "detail": {
            "KeepSource": false,
            "Delimiter": ":",
            "SourceKey": "msg2"
          }
        }
      ]
    }
  ]
}
```
Output:
```text
time: 1571394459
submsg1: http://127.0.0.1/my/course/4
submsg2: 10.135.46.111
submsg3: 200
status: DEAD
```
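The parent and child steps can be traced in plain Python to see why each field appears or disappears. This sketch rests on several assumptions. Extra segments beyond ExtractKeys (here, the empty segment after the trailing comma) are ignored. KeepSource false is modeled as removing the source field. The timestamp is interpreted as UTC, because the source does not say which time zone LogListener applies. The function names are hypothetical.

```python
from datetime import datetime, timezone
from typing import Dict, List, Tuple


def split_delimiter(value: str, delimiter: str, keys: List[str]) -> Dict[str, str]:
    """Map split segments onto keys. Assumption: extra segments are ignored,
    because zip() stops at the shorter sequence."""
    return dict(zip(keys, value.split(delimiter)))


def split_key_value(value: str, delimiter: str) -> Dict[str, str]:
    """Split one 'key<delimiter>value' pair on its first delimiter."""
    key, _, val = value.partition(delimiter)
    return {key: val}


def parse(line: str) -> Tuple[datetime, Dict[str, str]]:
    # Parent processor: split the whole line on ','.
    record = split_delimiter(line, ",", ["time", "msg1", "msg2"])

    # processor_timeformat with "%s": Unix seconds become the log timestamp.
    # KeepSource is true, so the original "time" field stays in the record.
    log_time = datetime.fromtimestamp(int(record["time"]), tz=timezone.utc)

    # processor_split_delimiter on msg1 (KeepSource false: msg1 is removed).
    record.update(split_delimiter(record.pop("msg1"), "|", ["submsg1", "submsg2", "submsg3"]))

    # processor_split_key_value on msg2 (KeepSource false: msg2 is removed).
    record.update(split_key_value(record.pop("msg2"), ":"))
    return log_time, record


LINE = "1571394459,http://127.0.0.1/my/course/4|10.135.46.111|200,status:DEAD,"
timestamp, fields = parse(LINE)
print(timestamp.isoformat())  # 2019-10-18T10:27:39+00:00
for key, value in fields.items():
    print(f"{key}: {value}")
```

The printed fields match the five output fields above. The time field survives only because its processor sets KeepSource to true.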
Pattern 4: unwrap a real access log with JSON inside
The final source example is a slash-wrapped log line:
2016-01-02 12:59:59/log_start/{"remote_ip":"10.135.46.111","body_sent":23,"responsetime":0.232,"upstreamtime":"0.232","upstreamhost":"unix:/tmp/php-cgi.sock","http_host":"127.0.0.1","method":"POST","url":"/event/dispatch","request":"POST /event/dispatch HTTP/1.1","xff":"-","referer":"http://127.0.0.1/my/course/4","agent":"Mozilla/5.0 (Windows NT 10.0; WOW64; rv:64.0) Gecko/20100101 Firefox/64.0","response_code":"200"}/log_end/
Expected behavior:
- Split the log on / into five segments.
- Keep the first segment as time.
- Drop the wrapper fields (log_start, log_end, and the empty segment after the final /).
- Expand the JSON segment into top-level fields.
```json
{
  "processors": [
    {
      "type": "processor_split_delimiter",
      "detail": {
        "KeepSource": false,
        "Delimiter": "/",
        "ExtractKeys": ["time", "msg2", "msg3", "msg4", "msg5"]
      },
      "processors": [
        {
          "type": "processor_drop",
          "detail": {
            "SourceKey": "msg2"
          }
        },
        {
          "type": "processor_json",
          "detail": {
            "KeepSource": false,
            "SourceKey": "msg3"
          }
        },
        {
          "type": "processor_drop",
          "detail": {
            "SourceKey": "msg4"
          }
        },
        {
          "type": "processor_drop",
          "detail": {
            "SourceKey": "msg5"
          }
        }
      ]
    }
  ]
}
```
The final structured fields include time, agent, body_sent, http_host, method, referer, remote_ip, request, response_code, responsetime, upstreamhost, upstreamtime, url, and xff.
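One detail is easy to miss when you test this example outside the collector. The JSON body itself contains slashes (for example "/event/dispatch" and "unix:/tmp/php-cgi.sock"), so a plain str.split("/") on the whole line produces far more than five segments. The source does not explain how LogListener keeps the JSON segment intact. For a local check, the sketch below therefore uses a different technique: it peels the two wrapper segments off the front with a bounded split, then off the back with a bounded rsplit. The function name unwrap is hypothetical.

```python
import json
from typing import Any, Dict

LINE = '2016-01-02 12:59:59/log_start/{"remote_ip":"10.135.46.111","body_sent":23,"responsetime":0.232,"upstreamtime":"0.232","upstreamhost":"unix:/tmp/php-cgi.sock","http_host":"127.0.0.1","method":"POST","url":"/event/dispatch","request":"POST /event/dispatch HTTP/1.1","xff":"-","referer":"http://127.0.0.1/my/course/4","agent":"Mozilla/5.0 (Windows NT 10.0; WOW64; rv:64.0) Gecko/20100101 Firefox/64.0","response_code":"200"}/log_end/'


def unwrap(line: str) -> Dict[str, Any]:
    # Peel "time/log_start/" off the front; maxsplit=2 leaves the rest intact.
    time_value, start_marker, rest = line.split("/", 2)
    # Peel "/log_end/" off the back; rsplit keeps the slashes inside the JSON.
    body, end_marker, trailing = rest.rsplit("/", 2)
    if (start_marker, end_marker, trailing) != ("log_start", "log_end", ""):
        raise ValueError("unexpected wrapper around the JSON segment")
    # Drop step: the markers are simply not copied into the result.
    # JSON step: expand the object into top-level fields next to time.
    return {"time": time_value, **json.loads(body)}


print(len(LINE.split("/")) > 5)  # True: a naive split breaks the JSON apart
print(sorted(unwrap(LINE)))
```

The sorted key list matches the fourteen structured fields listed above.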
Practical checklist
- Use composite parsing when a single parser cannot express the source log format.
- Keep the first split simple, then apply child processors to specific fields.
- Drop wrapper or low-value fields before upload when they are not needed for search or analysis.
- Convert image-only configuration examples into selectable JSON so future operators can copy and review them.
- Keep the pipeline order explicit; LogListener executes processor configuration in sequence.
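When you review a configuration, it helps to list its processors in the order the file declares them, with child processors indented under their parent. The following Python sketch only walks the JSON structure. The source says processors run in sequence but does not spell out how parents and children interleave, so the listing reflects declaration order, not verified runtime order. The config here is a trimmed copy of Pattern 3, with most detail blocks emptied for brevity, and walk_processors is a hypothetical helper.

```python
import json
from typing import Any, Dict, Iterator, Tuple


def walk_processors(node: Dict[str, Any], depth: int = 0) -> Iterator[Tuple[int, str]]:
    """Yield (nesting depth, processor type) in the order the config lists them."""
    for processor in node.get("processors", []):
        yield depth, processor["type"]
        # Child processors are listed under the parent's own "processors" key.
        yield from walk_processors(processor, depth + 1)


CONFIG = """
{"processors": [{"type": "processor_split_delimiter",
  "detail": {"Delimiter": ",", "ExtractKeys": ["time", "msg1", "msg2"]},
  "processors": [{"type": "processor_timeformat", "detail": {}},
                 {"type": "processor_split_delimiter", "detail": {}},
                 {"type": "processor_split_key_value", "detail": {}}]}]}
"""

for depth, kind in walk_processors(json.loads(CONFIG)):
    print("  " * depth + kind)
```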

