Extracting Flow-Level Network Features from PCAPs with Tranalyzer2

#networksecurity #trafficanalysis #machinelearning #encryptedtraffic

Why Flow-Level Feature Extraction Matters

Flow-level representation is a fundamental abstraction in modern network traffic analysis. Instead of operating on individual packets, flows summarize communication behavior between endpoints over time, enabling scalable analysis even for large PCAP datasets. Effective flow feature extraction is therefore a critical prerequisite for downstream tasks such as traffic characterization, anomaly detection, and machine learning–based modeling.

Among the available open-source tools, Tranalyzer2 stands out as a comprehensive framework that combines flow generation, deep protocol awareness, and rich statistical feature extraction within a single pipeline.

Why Tranalyzer2?

Tranalyzer2 is designed specifically for high-performance flow-based traffic analysis. Unlike tools that either focus only on packet inspection or provide minimal NetFlow-style statistics, Tranalyzer2 offers:

Native flow construction from PCAPs

Extensive protocol awareness (L2–L7 via plugins)

Rich statistical, temporal, and behavioral features

Modular plugin-based architecture

Structured outputs suitable for direct analytical use

Its ability to extract hundreds of flow-level attributes in a single pass significantly reduces preprocessing overhead and simplifies large-scale traffic analysis workflows.

Feature Categories Extracted by Tranalyzer2

Tranalyzer2 can extract a wide spectrum of flow features. In this configuration, the extracted attributes span several categories, including but not limited to:

General flow attributes: flow direction, duration, packet counts, byte counts, and inter-arrival metrics
Statistical flow features: minimum, maximum, average, variance, skewness, and kurtosis of packet sizes and inter-arrival times
Connection and state features: flow state indicators, connection patterns, and bidirectional statistics
Transport-layer features: TCP flags, window sizes, retransmission indicators, and sequence behavior
Security-relevant protocol features: TLS/SSL handshake metadata, cipher information, version indicators, and fingerprints
Entropy and payload-derived metrics: entropy ratios and payload distribution statistics useful for encrypted traffic characterization
Advanced timing and distribution features: packet timing dispersion, burstiness, and flow-level behavioral signatures
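
To make the statistical and entropy categories above concrete, here is a minimal Python sketch of how such values can be computed for a single flow. It is an illustration only, not Tranalyzer2's implementation: the function names describe and shannon_entropy are made up for this example, and the population (not sample) definitions of variance, skewness, and excess kurtosis are an assumption.

```python
import math
from collections import Counter


def describe(values):
    """Summary statistics for one flow's packet sizes or inter-arrival times.

    Assumption: population moments and excess kurtosis (normal = 0).
    """
    if not values:
        raise ValueError("need at least one value")
    n = len(values)
    mean = sum(values) / n
    variance = sum((v - mean) ** 2 for v in values) / n
    std = math.sqrt(variance)
    if std == 0:
        # All values identical: higher-order moments are undefined, report 0.
        skewness = kurtosis = 0.0
    else:
        skewness = sum((v - mean) ** 3 for v in values) / (n * std ** 3)
        kurtosis = sum((v - mean) ** 4 for v in values) / (n * std ** 4) - 3.0
    return {
        "min": min(values),
        "max": max(values),
        "mean": mean,
        "variance": variance,
        "skewness": skewness,
        "kurtosis": kurtosis,
    }


def shannon_entropy(payload: bytes) -> float:
    """Shannon entropy of the byte distribution, in bits per byte (0 to 8)."""
    if not payload:
        return 0.0
    n = len(payload)
    counts = Counter(payload)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())
```

Payloads with entropy close to 8 bits per byte look nearly random, which is typical of encrypted or compressed data; this is why entropy-style features are useful for characterizing encrypted traffic.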

Extracting Flow-Level Features Using Tranalyzer2

Tranalyzer2 follows a plugin-driven architecture, where flow-level features are generated by selectively enabling plugins. Each plugin contributes a specific category of features, such as basic flow statistics, transport-layer behavior, protocol metadata, or entropy-based metrics. As a result, effective feature extraction begins with careful plugin selection and configuration.

Step 1: Enable Required Tranalyzer2 Plugins

Before processing any PCAP files, the required plugins must be activated based on the desired feature categories. This typically includes plugins responsible for:

Core flow generation and statistical summaries
Transport-layer behavior and connection dynamics
Security- and protocol-related metadata (e.g., TLS-related attributes)
Entropy and payload-derived metrics
Output sinks for structured data storage

In this workflow, the mysqlSink plugin is enabled to store extracted flow records directly into a MySQL database. This approach enables scalable storage, schema-level control, and flexible downstream data export.

After selecting the required plugins, Tranalyzer2 and the chosen plugins are rebuilt (typically with the bundled t2build script) so that the enabled components are included in the processing pipeline.

Step 2: Process PCAP Files and Generate Flow Records

Once the required plugins are enabled and Tranalyzer2 is rebuilt, PCAP files can be processed using its command-line interface. Each PCAP is handled independently to preserve flow integrity and ensure consistent feature extraction across captures.

As a good practice, separate directories are created for input data and extracted results to keep the workflow organized:

mkdir -p ~/data ~/results

PCAP files placed in the data directory are then processed using the t2 command:

t2 -r ~/data/sample_traffic.pcap -w ~/results/
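
When a directory holds many captures, the same command can be generated once per file. The Python sketch below only builds the argument lists, using the -r and -w options shown above; the helper name build_t2_commands and the t2_bin parameter are hypothetical. It assumes t2 resolves to an executable on the PATH; if t2 is only a shell alias on your system, pass the path to the tranalyzer binary instead.

```python
from pathlib import Path


def build_t2_commands(data_dir, results_dir, t2_bin="t2"):
    """Return one Tranalyzer2 command (as an argv list) per .pcap file.

    Files are sorted by name so that repeated runs process them in the
    same order. Nothing is executed here.
    """
    data = Path(data_dir).expanduser()
    # Keep the trailing slash, mirroring `-w ~/results/` from the text.
    out_prefix = str(Path(results_dir).expanduser()) + "/"
    return [
        [t2_bin, "-r", str(pcap), "-w", out_prefix]
        for pcap in sorted(data.glob("*.pcap"))
    ]


if __name__ == "__main__":
    import subprocess

    for cmd in build_t2_commands("~/data", "~/results"):
        print("running:", " ".join(cmd))
        subprocess.run(cmd, check=True)  # stop at the first failing capture
```

Running one process per capture keeps each PCAP independent, as described above. How the database sink treats an existing table between runs depends on the mysqlSink configuration, so check that before processing several captures into the same table.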

During this step:

Packets are aggregated into bidirectional flows

Plugin-specific flow features are computed as the packets are read

Flow records are written directly into the MySQL database via the mysqlSink plugin

This process converts raw packet-level traffic into structured, flow-level representations enriched with statistical, temporal, and protocol-aware attributes. By leveraging Tranalyzer2’s plugin architecture and database integration, feature extraction remains both scalable and reproducible.
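
The idea of aggregating packets into bidirectional flows can be pictured with a small Python sketch. It is a simplified illustration and does not reproduce Tranalyzer2's internals: there are no timeouts, no TCP state tracking, and no fragment handling. The Packet type, the field names, and the rule that the first packet seen defines the forward direction are all assumptions made for this example.

```python
from typing import NamedTuple


class Packet(NamedTuple):
    ts: float      # capture timestamp in seconds
    src: str       # source IP address
    sport: int     # source port
    dst: str       # destination IP address
    dport: int     # destination port
    proto: int     # IP protocol number (6 = TCP, 17 = UDP)
    length: int    # packet length in bytes


def flow_key(pkt):
    """Direction-independent key: both directions map to the same flow."""
    lo, hi = sorted([(pkt.src, pkt.sport), (pkt.dst, pkt.dport)])
    return (lo, hi, pkt.proto)


def aggregate(packets):
    """Group packets into bidirectional flows with per-direction counters."""
    flows = {}
    for pkt in packets:
        key = flow_key(pkt)
        rec = flows.get(key)
        if rec is None:
            # Assumption: the first packet seen defines the forward direction.
            rec = flows[key] = {
                "initiator": (pkt.src, pkt.sport),
                "first_ts": pkt.ts, "last_ts": pkt.ts,
                "pkts_fwd": 0, "pkts_rev": 0,
                "bytes_fwd": 0, "bytes_rev": 0,
            }
        side = "fwd" if (pkt.src, pkt.sport) == rec["initiator"] else "rev"
        rec[f"pkts_{side}"] += 1
        rec[f"bytes_{side}"] += pkt.length
        rec["last_ts"] = max(rec["last_ts"], pkt.ts)
    for rec in flows.values():
        rec["duration"] = rec["last_ts"] - rec["first_ts"]
    return flows
```

Each resulting record already carries several of the general flow attributes listed earlier (direction, duration, packet and byte counts); the feature-specific plugins add their own columns on top of a record of this kind.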

During extraction, certain statistical attributes, such as high-precision timing values or higher-order moments, may require adjustments to the MySQL database schema. This typically involves increasing numeric precision for duration-related fields and changing the type of columns that hold skewness or kurtosis values, so that inserts neither fail nor truncate. With these schema-level corrections in place, the affected flow features can be stored without loss of precision.
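
As a rough illustration of such a schema adjustment, the Python sketch below generates ALTER TABLE statements from a mapping of column names to target types. The helper name alter_statements, the column names, and the chosen types are hypothetical placeholders; inspect the real table (for example with DESCRIBE flow) to find the columns that actually need changing.

```python
import re


def alter_statements(table, column_types):
    """Build MySQL ALTER TABLE statements that change column types.

    `column_types` maps a column name to its new SQL type. Identifiers
    are restricted to letters, digits, and underscores so that they can
    be safely wrapped in backticks.
    """
    names = [table, *column_types]
    for name in names:
        if not re.fullmatch(r"\w+", name):
            raise ValueError(f"unexpected identifier: {name!r}")
    return [
        f"ALTER TABLE `{table}` MODIFY COLUMN `{col}` {sql_type};"
        for col, sql_type in column_types.items()
    ]


if __name__ == "__main__":
    # Hypothetical column names and types, for illustration only.
    wanted = {"duration": "DECIMAL(20,9)", "example_skew": "DOUBLE"}
    for stmt in alter_statements("flow", wanted):
        print(stmt)
```

The printed statements can be reviewed and then run in the mysql client before processing the captures again.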

Step 3: Export Flow-Level Features to CSV

Once the flow records are successfully stored in MySQL, you can export them into CSV format for further analysis. First, log in to MySQL and check the flow table to ensure all features are present.

Instead of manually specifying all columns, you can export all flow-level features using SELECT *. Here’s a generalized approach:

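# Export all flow records to a file (the mysql client writes tab-separated text)
mysql -u mysql -p -D tranalyzer -e "
SELECT *
FROM flow
" > ~/path/to/output.tsv

This writes every column of the flow table to a text file with a header row. Because the mysql client emits tab-separated values when its output is redirected, the file is tab-delimited rather than comma-delimited: read it with a tab delimiter, or convert it to comma-separated form for tools that expect strict CSV.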
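
The output of a redirected mysql query is tab-separated, so a small conversion step is needed to obtain a true comma-separated file. A minimal Python sketch using only the standard library follows; the function name tsv_to_csv is made up for this example, and it assumes the export fits in memory. Escape sequences that the client writes inside values (such as \t or \n) are left untouched.

```python
import csv
import io


def tsv_to_csv(tsv_text: str) -> str:
    """Convert tab-separated text (header row included) to CSV text.

    Fields that contain commas or quotes are quoted by csv.writer, so
    the result stays parseable even when values contain commas.
    """
    reader = csv.reader(
        io.StringIO(tsv_text), delimiter="\t", quoting=csv.QUOTE_NONE
    )
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerows(reader)
    return out.getvalue()


if __name__ == "__main__":
    import sys

    # Usage: python tsv_to_csv.py < export.tsv > export.csv
    sys.stdout.write(tsv_to_csv(sys.stdin.read()))
```

Alternatively, most data-analysis libraries can read the tab-separated file directly when told that the delimiter is a tab.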

With the flow-level features exported to CSV, your data is now structured and ready for analysis, visualization, or machine learning pipelines. This approach maintains feature integrity and allows you to scale your workflow efficiently. Using Tranalyzer2 in combination with MySQL makes traffic analysis modular, reproducible, and easy to integrate into downstream projects.

For more details and tutorials, check out Tranalyzer2 Tutorials.