Mixed-format logs are hard to query when collection treats each line as one raw string. A single record can include a timestamp, delimiter-separated fields, embedded JSON, key-value text, wrapper markers, and metadata hidden in the file path. Tencent Cloud Log Service (CLS) uses LogListener composite parsing to structure those logs during collection. You define a processor pipeline, run extraction and transformation steps in order, and upload cleaner structured logs to a CLS log topic.
Best fit for LogListener composite parsing
Use composite parsing when one parsing mode is not enough for the log line.
| Scenario | What happens in the raw log | Pipeline operation |
|---|---|---|
| Multiple formats in one line | A delimiter split reveals a field that contains JSON or key-value text | Split first, then parse the nested field |
| Fields need cleanup | Parsed fields include data that should not be uploaded | Drop, rename, or enrich fields before upload |
| A full record needs several stages | Extraction and field transformation must happen together | Chain processors in the LogListener settings |
The practical result is less cleanup at query time: the log topic receives fields that are already close to the structure operators need.
Collection-time parsing flow
The workflow is a three-step pipeline:
- Split the original log into segments.
- Apply different processors to the segments that need additional parsing.
- Assemble the final structured fields for upload to CLS.
For example, a log can be split into four sections, with each section processed by a different plugin before the final field set is uploaded.
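The three steps can be modeled outside LogListener to reason about a pipeline before writing its configuration. The following Python sketch is only an illustration under stated assumptions: the sample line is made up, and `split_segments`, `uppercase_level`, `parse_pair`, and `run_pipeline` are hypothetical names, not LogListener APIs.

```python
from typing import Callable, Dict, List

Record = Dict[str, str]
Processor = Callable[[Record], Record]


def split_segments(line: str, delimiter: str, keys: List[str]) -> Record:
    """Step 1: split the raw line into named segments."""
    return dict(zip(keys, line.split(delimiter)))


def uppercase_level(record: Record) -> Record:
    """Step 2 (example): normalize one segment in place of its old value."""
    out = dict(record)
    out["level"] = out["level"].upper()
    return out


def parse_pair(record: Record) -> Record:
    """Step 2 (example): expand a 'key=value' segment into its own field."""
    out = dict(record)
    key, _, value = out.pop("pair").partition("=")
    out[key] = value
    return out


def run_pipeline(line: str, processors: List[Processor]) -> Record:
    """Step 3: apply the processors in order and return the assembled fields."""
    record = split_segments(line, ";", ["level", "pair", "message"])
    for processor in processors:
        record = processor(record)
    return record


# Hypothetical sample line, not taken from a real log.
fields = run_pipeline("warn;user=alice;disk almost full", [uppercase_level, parse_pair])
print(fields)
```

The order of the processor list matters in the same way the order of entries in a LogListener pipeline does: each step sees the fields produced by the steps before it.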
Useful LogListener processors for mixed logs
The processors used in this workflow fall into two groups: field extraction and field processing.
| Function | Processor | Use case |
|---|---|---|
| Extract fields | processor_log_string | Parse advanced single-line logs with multiple characters |
| Extract fields | processor_multiline | Parse multiline logs with first-line regex |
| Extract fields | processor_multiline_fullregex | Combine first-line regex with full multiline regex extraction |
| Extract fields | processor_fullregex | Extract fields from a single-line value with regex |
| Extract fields | processor_json | Expand a field value as JSON |
| Extract fields | processor_split_delimiter | Split fields by delimiter characters |
| Extract fields | processor_split_key_value | Extract key-value pairs |
| Process fields | processor_drop | Remove selected fields |
| Process fields | processor_timeformat | Parse a source time field and set the log timestamp |
Dropping unneeded fields before upload
Use processor_drop when a parsed log contains fields that do not need to be reported.
Input:
key1:value1
key2:value2
key3:value3
Goal:
key2:value2
Configuration:
{
  "processors": [
    {
      "type": "processor_drop",
      "detail": {
        "SourceKey": "key1"
      }
    },
    {
      "type": "processor_drop",
      "detail": {
        "SourceKey": "key3"
      }
    }
  ]
}
Problem -> operation -> result:
| Problem | Operation | Result |
|---|---|---|
| The log contains fields that are not important for search or analysis | Drop key1 and key3 | Only key2:value2 is uploaded |
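To check the expected outcome of a drop step before deploying it, the same operation can be simulated on a parsed record. This is a minimal Python sketch, not LogListener code; `drop_fields` is a hypothetical helper introduced for illustration.

```python
from typing import Dict, Iterable


def drop_fields(record: Dict[str, str], keys: Iterable[str]) -> Dict[str, str]:
    """Return a copy of the record without the listed keys.

    Keys that are not present are ignored, and the input is left unchanged.
    """
    unwanted = set(keys)
    return {k: v for k, v in record.items() if k not in unwanted}


parsed = {"key1": "value1", "key2": "value2", "key3": "value3"}
uploaded = drop_fields(parsed, ["key1", "key3"])
print(uploaded)
```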
File-path metadata enrichment
Sometimes the log body lacks context that exists in the file path. For example, a path can include application, version, directory, and log name details.
Input log:
value1,value2
File path:
/usr/local/loglistener-2.7.4/testdir/test.log
Goal:
msg1:value1
msg2:value2
__TAG__.app: loglistener
__TAG__.ver: 2.7.4
__TAG__.logname: test
__TAG__.logdir: testdir
Configuration:
{
  "processors": [
    {
      "type": "processor_split_delimiter",
      "detail": {
        "Delimiter": ",",
        "ExtractKeys": ["msg1", "msg2"]
      }
    },
    {
      "type": "meta_processor",
      "detail": {
        "ExtractKeys": ["FILENAME"]
      },
      "processors": [
        {
          "type": "processor_fullregex",
          "detail": {
            "KeepSource": false,
            "SourceKey": "FILENAME",
            "ExtractRegex": "/\\w+/\\w+/(\\w+)-([^/]+)/(\\w+)/(\\w+).*",
            "ExtractKeys": ["app", "ver", "logdir", "logname"]
          }
        }
      ]
    }
  ]
}
The metadata processor requires LogListener 2.7.4 or later in this workflow.
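A path regex is easy to get wrong, so it can help to test the pattern against a sample path first. The sketch below uses Python's `re` module with the ExtractRegex value from the configuration above (JSON escaping removed). It assumes Python's regex dialect behaves the same as LogListener's for this pattern, which holds for these basic constructs but is not guaranteed in general; `extract_path_tags` is a hypothetical helper.

```python
import re

# Same pattern as ExtractRegex in the configuration, without JSON escaping.
PATTERN = re.compile(r"/\w+/\w+/(\w+)-([^/]+)/(\w+)/(\w+).*")
KEYS = ["app", "ver", "logdir", "logname"]


def extract_path_tags(path: str) -> dict:
    """Map the regex capture groups to the configured key names.

    Returns an empty dict when the path does not match the pattern.
    """
    match = PATTERN.fullmatch(path)
    if match is None:
        return {}
    return dict(zip(KEYS, match.groups()))


tags = extract_path_tags("/usr/local/loglistener-2.7.4/testdir/test.log")
print(tags)
```

Note that the pattern expects exactly two leading directories and a hyphen between the application name and version, so a path with a different layout produces no fields.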
Nested parsing for delimiter, timestamp, and key-value fields
Use nested processors when one delimiter split exposes fields that need their own parsing.
Input:
1571394459,http://127.0.0.1/my/course/4|10.135.46.111|200,status:DEAD,
Goal:
time: 1571394459
submsg1: http://127.0.0.1/my/course/4
submsg2: 10.135.46.111
submsg3: 200
status: DEAD
Configuration:
{
  "processors": [
    {
      "type": "processor_split_delimiter",
      "detail": {
        "Delimiter": ",",
        "ExtractKeys": ["time", "msg1", "msg2"]
      },
      "processors": [
        {
          "type": "processor_timeformat",
          "detail": {
            "KeepSource": true,
            "TimeFormat": "%s",
            "SourceKey": "time"
          }
        },
        {
          "type": "processor_split_delimiter",
          "detail": {
            "KeepSource": false,
            "Delimiter": "|",
            "SourceKey": "msg1",
            "ExtractKeys": ["submsg1", "submsg2", "submsg3"]
          }
        },
        {
          "type": "processor_split_key_value",
          "detail": {
            "KeepSource": false,
            "Delimiter": ":",
            "SourceKey": "msg2"
          }
        }
      ]
    }
  ]
}
This pipeline first splits the line by comma, parses time, splits msg1 by |, and extracts status from msg2.
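The same sequence can be traced by hand in a few lines of Python, which is a convenient way to confirm the expected fields for a sample line. This is a sketch of the intended result, not of LogListener's implementation; in particular, treating `%s` as Unix epoch seconds and rendering it in UTC is an assumption of the sketch.

```python
from datetime import datetime, timezone

line = "1571394459,http://127.0.0.1/my/course/4|10.135.46.111|200,status:DEAD,"

# Outer split by comma. The trailing comma yields an empty fourth piece,
# which zip() ignores because only three keys are named.
record = dict(zip(["time", "msg1", "msg2"], line.split(",")))

# Time step: interpret the value as Unix epoch seconds; the source field is kept.
log_time = datetime.fromtimestamp(int(record["time"]), tz=timezone.utc)

# Split msg1 by "|" and discard the source field (KeepSource: false).
record.update(zip(["submsg1", "submsg2", "submsg3"], record.pop("msg1").split("|")))

# Split msg2 into one key-value pair and discard the source field.
key, _, value = record.pop("msg2").partition(":")
record[key] = value

print(log_time.isoformat())
print(record)
```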
Wrapper removal and embedded JSON expansion
Consider an access log wrapped with markers:
2016-01-02 12:59:59/log_start/{"remote_ip":"10.135.46.111","body_sent":23,"responsetime":0.232,"upstreamtime":"0.232","upstreamhost":"unix:/tmp/php-cgi.sock","http_host":"127.0.0.1","method":"POST","url":"/event/dispatch","request":"POST /event/dispatch HTTP/1.1","xff":"-","referer":"http://127.0.0.1/my/course/4","agent":"Mozilla/5.0 (Windows NT 10.0; WOW64; rv:64.0) Gecko/20100101 Firefox/64.0","response_code":"200"}/log_end/
Goal:
- Keep the timestamp.
- Remove log_start, log_end, and the empty tail field.
- Expand the JSON object into searchable CLS fields.
Configuration:
{
  "processors": [
    {
      "type": "processor_split_delimiter",
      "detail": {
        "KeepSource": false,
        "Delimiter": "/",
        "ExtractKeys": [
          "time",
          "msg2",
          "msg3",
          "msg4",
          "msg5"
        ]
      },
      "processors": [
        {
          "type": "processor_drop",
          "detail": {
            "SourceKey": "msg2"
          }
        },
        {
          "type": "processor_json",
          "detail": {
            "KeepSource": false,
            "SourceKey": "msg3"
          }
        },
        {
          "type": "processor_drop",
          "detail": {
            "SourceKey": "msg4"
          }
        },
        {
          "type": "processor_drop",
          "detail": {
            "SourceKey": "msg5"
          }
        }
      ]
    }
  ]
}
Structured output:
time: 2016-01-02 12:59:59
agent: Mozilla/5.0 (Windows NT 10.0; WOW64; rv:64.0) Gecko/20100101 Firefox/64.0
body_sent: 23
http_host: 127.0.0.1
method: POST
referer: http://127.0.0.1/my/course/4
remote_ip: 10.135.46.111
request: POST /event/dispatch HTTP/1.1
response_code: 200
responsetime: 0.232
upstreamhost: unix:/tmp/php-cgi.sock
upstreamtime: 0.232
url: /event/dispatch
xff: -
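The intended result can be reproduced in Python to validate a sample record before relying on the pipeline. The sketch below uses a shortened version of the record. Because the JSON values themselves contain `/`, it isolates the payload by the `/log_start/` and `/log_end/` markers rather than by splitting on every `/`; it makes no claim about how LogListener's delimiter split handles those inner slashes, so test the real configuration against real data. `unwrap` is a hypothetical helper.

```python
import json

# Shortened version of the sample record, for illustration only.
line = (
    '2016-01-02 12:59:59/log_start/'
    '{"remote_ip":"10.135.46.111","body_sent":23,"url":"/event/dispatch",'
    '"response_code":"200"}'
    '/log_end/'
)


def unwrap(raw: str) -> dict:
    """Keep the timestamp, drop the wrapper markers, expand the JSON payload."""
    timestamp, _, rest = raw.partition("/log_start/")
    payload, _, _tail = rest.rpartition("/log_end/")
    fields = {"time": timestamp}
    fields.update(json.loads(payload))
    return fields


record = unwrap(line)
print(record)
```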
Common pitfalls when building LogListener parsing pipelines
- Parsing JSON before isolating the JSON field.
- Keeping wrapper fields such as log_start and log_end even though they have no search value.
- Forgetting KeepSource: false when the raw field should not remain in the final log body.
- Using one parser for a log line that clearly contains several nested formats.
- Using metadata extraction without checking the LogListener version.
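Several of these pitfalls are easier to spot when the processor tree is printed as an outline before deployment. The following sketch walks a configuration in the shape used in this article (a `processors` list whose entries have `type`, `detail`, and optionally nested `processors`) and lists each processor with its `SourceKey`. `describe_pipeline` is a hypothetical review helper, not part of LogListener, and the embedded configuration is a trimmed copy of the nested example.

```python
import json
from typing import List

CONFIG = """
{
  "processors": [
    {
      "type": "processor_split_delimiter",
      "detail": {"Delimiter": ",", "ExtractKeys": ["time", "msg1", "msg2"]},
      "processors": [
        {"type": "processor_timeformat",
         "detail": {"KeepSource": true, "TimeFormat": "%s", "SourceKey": "time"}},
        {"type": "processor_split_key_value",
         "detail": {"KeepSource": false, "Delimiter": ":", "SourceKey": "msg2"}}
      ]
    }
  ]
}
"""


def describe_pipeline(processors: list, depth: int = 0) -> List[str]:
    """Return one indented line per processor, visiting nested ones depth-first."""
    lines = []
    for proc in processors:
        source = proc.get("detail", {}).get("SourceKey", "<no SourceKey>")
        lines.append(f"{'  ' * depth}{proc['type']} <- {source}")
        lines.extend(describe_pipeline(proc.get("processors", []), depth + 1))
    return lines


outline = describe_pipeline(json.loads(CONFIG)["processors"])
print("\n".join(outline))
```

Reading the outline top to bottom shows whether a JSON or key-value step runs on an isolated field or on a value that has not been split yet.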
FAQ
When should I use processor_json in a LogListener pipeline?
Use processor_json after the JSON content has been isolated into a field. In the wrapped access-log example, the pipeline first splits the record by /, then applies JSON parsing to msg3.
How do I remove fields from logs before they reach CLS?
Use processor_drop and point it to the field that should not be uploaded. This is useful for wrapper markers, empty fields, and values that do not help search or analysis.
Can LogListener parse timestamps during collection?
Yes. Use processor_timeformat with the source time field and the expected time format, such as %s for a Unix timestamp.
How can I add file path context to logs?
Use metadata extraction on FILENAME, then run regex extraction to populate fields such as app, version, log directory, or log name.
Final checklist
- Identify the first split that separates the main log parts.
- Apply nested processors only to the fields that need them.
- Drop wrapper or empty fields before upload.
- Parse time fields with the expected format.
- Expand JSON only after isolating the JSON field.
- Verify the final CLS log record contains searchable fields instead of raw mixed-format text.