sort
Sorts flow records according to specified key fields (requires numpy).
usage: python3 -m flow_models.sort [-h] [-i {binary}] [-o {binary}]
[-O OUTPUT] [--skip-in SKIP_IN]
[--count-in COUNT_IN] [-k [KEY_FIELDS ...]]
[-I INDEX_FILE] [--reverse]
[--measure-memory]
in_files [in_files ...]
Positional Arguments
- in_files
input files or directories
Named Arguments
- -i, --in-format
Possible choices: binary
format of input files
Default: 'binary'
- -o, --out-format
Possible choices: binary
format of output
Default: 'binary'
- -O, --output
directory for output
Default: '.'
- --skip-in
number of flows to skip at the beginning of input
Default: 0
- --count-in
limit for number of flows to read from input
- -k, --key-fields
ordered key fields names
- -I, --index-file
index file
- --reverse
reverse order
Default: False
- --measure-memory
collect and print memory statistics
Default: False
This tool can be used to sort flow records in binary format.
Sorting is done according to the specified key fields. Key fields should be given in order of priority; for example, '-k first first_ms' means that records are sorted by the value of first (seconds), and records with the same value of first are then sorted by first_ms (milliseconds).
By default, records are sorted in ascending order. To sort in descending order, use the --reverse parameter.
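The key priority and the descending option described above can be illustrated with a small sketch. It assumes each field is held as a separate numpy array and uses np.lexsort; the function name sort_order and the sample values are made up for illustration, and whether flow_models.sort works this way internally is not stated here.

```python
import numpy as np

def sort_order(columns, key_fields, reverse=False):
    """Return record indices ordered by key_fields (highest priority first)."""
    # np.lexsort treats the LAST key it receives as the primary key,
    # so the user-facing key order has to be reversed before the call.
    keys = [columns[name] for name in reversed(key_fields)]
    order = np.lexsort(keys)
    # Descending order is obtained here by reversing the ascending result.
    return order[::-1] if reverse else order

# Hypothetical records: (first, first_ms) pairs plus a payload column.
columns = {
    "first": np.array([10, 9, 10, 9]),
    "first_ms": np.array([500, 100, 200, 900]),
    "packets": np.array([1, 2, 3, 4]),
}

order = sort_order(columns, ["first", "first_ms"])
sorted_packets = columns["packets"][order]
descending = sort_order(columns, ["first", "first_ms"], reverse=True)
```

Every column has to be reindexed with the same order array so that the fields of one record stay together.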
The output directory can be specified with the -O parameter. When the output directory is the same as the input directory, sorting is done in place and overwrites the input files.
Sorting can be restricted to a part of the input with the --skip-in and --count-in parameters. They specify how many flow records should be skipped (--skip-in) and then read (--count-in) from the input.
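The effect of skipping and then counting can be sketched in plain Python. The helper select_records and the sample data are hypothetical and only mirror the semantics described above; they are not part of the flow_models API.

```python
from itertools import islice

def select_records(records, skip_in=0, count_in=None):
    """Skip the first skip_in records, then yield at most count_in records."""
    # With no count limit, read everything that remains after the skip.
    stop = None if count_in is None else skip_in + count_in
    return islice(records, skip_in, stop)

flows = ["f0", "f1", "f2", "f3", "f4", "f5"]
selected = list(select_records(flows, skip_in=2, count_in=3))
remaining = list(select_records(flows, skip_in=4))
```

Here selected holds the three records that follow the two skipped ones, and remaining holds everything after the first four.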
To filter flow records, specify a filter expression. Filter expressions use Python syntax. Bitwise operators (&, |, ~) should be used instead of logical ones (and, or, not). The following fields are available:
af, prot, inif, outif, sa0, sa1, sa2, sa3, da0, da1, da2, da3, sp, dp, first, first_ms, last, last_ms, packets, octets, aggs
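The reason for preferring bitwise operators can be shown with a short sketch. It assumes that field values are available as numpy arrays, which is suggested by the numpy requirement but not confirmed here; the sample values are invented.

```python
import numpy as np

# Hypothetical field arrays, one element per flow record.
prot = np.array([6, 17, 6, 1])
dp = np.array([443, 53, 80, 0])
packets = np.array([12, 1, 3, 2])

# Bitwise operators work element-wise on arrays. Each comparison needs
# its own parentheses because & and | bind more tightly than ==.
mask = (prot == 6) & ((dp == 80) | (dp == 443))
selected_packets = packets[mask]

# The logical operators try to reduce a whole array to a single truth
# value, which numpy rejects for arrays with more than one element.
try:
    (prot == 6) and (dp == 80)
    logical_failed = False
except ValueError:
    logical_failed = True
```

The mask keeps the TCP records with destination port 80 or 443, while the same condition written with and raises ValueError.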
Example: (sorts flows in the merged directory and saves the output to the sorted directory)
flow_models.sort -i binary -k first first_ms -O sorted merged
During the merging process, flow records may become reordered. This applies especially to long flows, which in some circumstances may stay cached until the end of the merging process. Such flows are dumped at the end of the output files. The purpose of the sort tool is to reorder flow records in a file according to the specified keys, usually flow start or flow end times. This step is unnecessary when further operations are performed on the whole file. However, when only a part of a record file is used, sorting is necessary.
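The following sketch illustrates why a partial read needs sorted input. The start times are invented: the last two values stand for long flows that were dumped at the end of a merged file, and the helper is_sorted is hypothetical.

```python
import numpy as np

# Hypothetical flow start times as they might appear after merging.
first = np.array([100, 101, 103, 104, 102, 100])

def is_sorted(values):
    """Check that every value is not smaller than its predecessor."""
    return bool(np.all(values[:-1] <= values[1:]))

needs_sorting = not is_sorted(first)

# Taking the first four records of the unsorted data misses the two
# early flows stored at the end; after sorting they are included.
unsorted_head = first[:4]
sorted_head = np.sort(first, kind="stable")[:4]
```

With the unsorted data, the first four records skip the flows that started at 100 and 102, so any statistic computed on that part alone would be biased.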