merge
Merges flows which were split across multiple records due to active timeout.
usage: python3 -m flow_models.merge [-h] [-i {csv_flow,pipe,nfcapd,binary}]
[-o {csv_flow,binary,append,extend,none}]
[-O OUTPUT] [--skip-in SKIP_IN]
[--count-in COUNT_IN]
[--skip-out SKIP_OUT]
[--count-out COUNT_OUT]
[--filter-expr FILTER_EXPR]
[-I INACTIVE_TIMEOUT] [-A ACTIVE_TIMEOUT]
in_files [in_files ...]
Positional Arguments
- in_files
input files or directories
Named Arguments
- -i, --in-format
Possible choices: csv_flow, pipe, nfcapd, binary
format of input files
Default: 'nfcapd'
- -o, --out-format
Possible choices: csv_flow, binary, append, extend, none
format of output
Default: 'csv_flow'
- -O, --output
file or directory for output
Default: '-'
- --skip-in
number of flows to skip at the beginning of input
Default: 0
- --count-in
limit for number of flows to read from input
- --skip-out
number of flows to skip after filtering
Default: 0
- --count-out
limit for number of flows to output after filtering
- --filter-expr
expression of filter
- -I, --inactive-timeout
inactive timeout in seconds
Default: 15.0
- -A, --active-timeout
active timeout in seconds
Default: 300.0
This tool can be used to merge flow records which were split during the collection into multiple records due to active timeout.
The user should specify the active and inactive timeout values that were used during record collection so that flow records are merged correctly.
To filter flow records, specify a filter expression. The filter expression should use Python syntax. Bitwise operators (&, |, ~) should be used instead of logical ones (and, or, not). The following fields are available:
af, prot, inif, outif, sa0, sa1, sa2, sa3, da0, da1, da2, da3, sp, dp, first, first_ms, last, last_ms, packets, octets, aggs
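As an illustration of the filter rules above, the following minimal sketch checks an expression before it is passed to --filter-expr. The helper name check_filter_expr is hypothetical and is not part of flow_models; it only uses Python's standard ast module to flag logical operators and names that are not in the field list. How the tool itself evaluates the expression is not shown here.

```python
import ast

# Field names copied from the list in this documentation.
FIELDS = {"af", "prot", "inif", "outif", "sa0", "sa1", "sa2", "sa3",
          "da0", "da1", "da2", "da3", "sp", "dp", "first", "first_ms",
          "last", "last_ms", "packets", "octets", "aggs"}


def check_filter_expr(expr):
    """Return a list of problems found in a filter expression.

    Hypothetical helper for illustration; not part of flow_models.
    """
    problems = []
    tree = ast.parse(expr, mode="eval")
    for node in ast.walk(tree):
        if isinstance(node, ast.BoolOp):
            problems.append("logical and/or used; use & or | instead")
        elif isinstance(node, ast.UnaryOp) and isinstance(node.op, ast.Not):
            problems.append("logical not used; use ~ instead")
        elif isinstance(node, ast.Name) and node.id not in FIELDS:
            problems.append(f"unknown field: {node.id}")
    return problems


print(check_filter_expr("(prot == 6) & (dp == 443)"))  # []
print(check_filter_expr("prot == 6 and dp == 443"))
```

Note that in Python the bitwise operators bind more tightly than comparisons, so each comparison should be parenthesized: (prot == 6) & (dp == 443), not prot == 6 & dp == 443.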
Flow records can be skipped with the skip_in, count_in, skip_out and count_out parameters. They specify how many flow records should be skipped (skip_in) and then read (count_in) from the input, and how many should be skipped (skip_out) and then written (count_out) after filtering.
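The order in which these four parameters apply can be sketched as follows. This is only an illustration of the described semantics using itertools.islice; the function select and its keep argument are hypothetical and do not reflect the tool's internal implementation.

```python
from itertools import islice


def select(flows, keep, skip_in=0, count_in=None, skip_out=0, count_out=None):
    """Apply skip_in/count_in, then a filter, then skip_out/count_out.

    Hypothetical helper; 'keep' stands in for the filter expression.
    """
    # Input stage: skip the first skip_in records, then read at most count_in.
    stop_in = None if count_in is None else skip_in + count_in
    read = islice(flows, skip_in, stop_in)
    # Filtering stage.
    kept = (f for f in read if keep(f))
    # Output stage: skip skip_out filtered records, then write at most count_out.
    stop_out = None if count_out is None else skip_out + count_out
    return list(islice(kept, skip_out, stop_out))


# Integers stand in for flow records; the filter keeps even values.
result = select(range(20), lambda f: f % 2 == 0,
                skip_in=2, count_in=10, skip_out=1, count_out=3)
print(result)  # [4, 6, 8]
```

Here records 2 to 11 are read, the even ones (2, 4, 6, 8, 10) pass the filter, the first of those is skipped, and the next three are written.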
Example: (merges flows from the cleaned directory and writes output to the merged directory)
flow_models.merge -i nfcapd -o binary -I 15 -A 300 -O merged cleaned
In all hardware and many software exporters, long-lasting flows may be split by the active timeout and reported as multiple flow records. Such flow records have to be found and merged back in order to obtain accurate flow length, size or duration values. The merge tool available in our framework can be used for that purpose. Additionally, it filters out erroneously split records. The tool processes all flow records sequentially and performs all calculations using only integers to ensure precision and reproducibility. This is possible thanks to Python's arbitrary-precision integer support.
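To show why integer-only arithmetic matters, the sketch below computes a flow duration from the first, first_ms, last and last_ms fields in two ways. The function names and the sample timestamps are made up for illustration; the tool's actual calculations are not reproduced here.

```python
def duration_ms_int(first, first_ms, last, last_ms):
    """Duration in milliseconds using only integers (exact)."""
    return (last * 1000 + last_ms) - (first * 1000 + first_ms)


def duration_s_float(first, first_ms, last, last_ms):
    """Duration in seconds using floats (subject to rounding)."""
    return (last + last_ms / 1000) - (first + first_ms / 1000)


# Sample values chosen for illustration: Unix-epoch seconds plus milliseconds.
args = (1_700_000_000, 1, 1_700_000_300, 2)

print(duration_ms_int(*args))              # 300001
print(duration_s_float(*args) == 300.001)  # False: float timestamps are rounded
```

The integer result is exact regardless of how large the timestamps or counters become, whereas the float result depends on rounding of the intermediate values.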
The tool takes flow records in any supported format as input and outputs merged flow records in binary or CSV format. Each merged flow record contains an aggs field, which tells how many flow records were merged back into that particular aggregate flow record. The user should specify both the active and inactive timeouts used in the collection process when calling the command to ensure the merge operation is correct.
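The idea of merging split records can be sketched as below. This is a simplified illustration, not the tool's actual algorithm: the merge rule is an assumption (a record continues the previous one for the same flow when the previous record lasted at least active timeout minus inactive timeout and the gap between them is shorter than the inactive timeout), and the key field is a hypothetical stand-in for the address, port and protocol fields that identify a flow.

```python
def to_ms(seconds, millis):
    """Combine seconds and milliseconds into integer milliseconds."""
    return seconds * 1000 + millis


def merge_split_records(records, inactive_ms=15_000, active_ms=300_000):
    """Merge records that look like active-timeout splits (assumed rule)."""
    merged = []
    latest = {}  # flow key -> most recent aggregate for that key
    for rec in sorted(records, key=lambda r: to_ms(r["first"], r["first_ms"])):
        start = to_ms(rec["first"], rec["first_ms"])
        end = to_ms(rec["last"], rec["last_ms"])
        agg = latest.get(rec["key"])
        if (agg is not None
                and agg["piece_ms"] >= active_ms - inactive_ms
                and start - agg["end"] < inactive_ms):
            # Continuation of a split flow: extend the aggregate.
            agg["end"] = max(agg["end"], end)
            agg["packets"] += rec["packets"]
            agg["octets"] += rec["octets"]
            agg["aggs"] += 1
        else:
            # A new flow: start a fresh aggregate with aggs = 1.
            agg = {"key": rec["key"], "start": start, "end": end,
                   "packets": rec["packets"], "octets": rec["octets"], "aggs": 1}
            merged.append(agg)
            latest[rec["key"]] = agg
        agg["piece_ms"] = end - start  # duration of the most recent piece
    return merged


records = [
    {"key": "a", "first": 0, "first_ms": 0, "last": 299, "last_ms": 0,
     "packets": 10, "octets": 1000},
    {"key": "a", "first": 301, "first_ms": 0, "last": 400, "last_ms": 0,
     "packets": 5, "octets": 500},
    {"key": "b", "first": 10, "first_ms": 0, "last": 12, "last_ms": 0,
     "packets": 2, "octets": 200},
    {"key": "a", "first": 1000, "first_ms": 0, "last": 1001, "last_ms": 0,
     "packets": 1, "octets": 100},
]
for flow in merge_split_records(records):
    print(flow["key"], flow["aggs"], flow["packets"], flow["octets"])
```

In this sample the first two records of flow a are merged into one aggregate with aggs equal to 2, while the later record of flow a and the record of flow b remain separate flows with aggs equal to 1.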