sort

Sorts flow records according to specified key fields (requires numpy).

usage: python3 -m flow_models.sort [-h] [-i {binary}] [-o {binary}]
                                   [-O OUTPUT] [--skip-in SKIP_IN]
                                   [--count-in COUNT_IN] [-k [KEY_FIELDS ...]]
                                   [-I INDEX_FILE] [--reverse]
                                   [--measure-memory]
                                   in_files [in_files ...]

Positional Arguments

in_files: input files or directories

Named Arguments

-i, --in-format

Possible choices: binary

format of input files

Default: 'binary'

-o, --out-format

Possible choices: binary

format of output

Default: 'binary'

-O, --output

directory for output

Default: '.'

--skip-in

number of flows to skip at the beginning of input

Default: 0

--count-in

limit for number of flows to read from input

-k, --key-fields

ordered key field names

-I, --index-file

index file

--reverse

reverse order

Default: False

--measure-memory

collect and print memory statistics

Default: False

This tool can be used to sort flow records in binary format.

Sorting is done according to the specified key fields. Key fields are given in order of priority: for example, '-k first first_ms' means that records are sorted by the value of first (seconds), and records with the same first value are then sorted by first_ms (milliseconds).
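The key priority described above can be illustrated with numpy. This is only a sketch, not the tool's actual implementation: the sample arrays and the helper name sort_order are hypothetical, and np.lexsort is used here as one way to get a multi-key ordering.

```python
import numpy as np

# Hypothetical sample data: start second and millisecond part of four flows.
first = np.array([100, 99, 100, 99], dtype=np.uint32)
first_ms = np.array([500, 10, 20, 700], dtype=np.uint16)

def sort_order(keys):
    """Return indices that sort records by keys, most significant key first."""
    # np.lexsort treats the LAST key as the primary one, so reverse the list.
    return np.lexsort(keys[::-1])

# Equivalent of '-k first first_ms': first is primary, first_ms breaks ties.
order = sort_order([first, first_ms])
sorted_first = first[order]
sorted_first_ms = first_ms[order]
```

Applying the same index array to every field keeps each record's columns aligned after sorting.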

By default, records are sorted in ascending order. To get descending order, use the --reverse parameter.
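As a rough illustration of the two orders, a descending ordering can be obtained by reversing an ascending index. How the tool implements --reverse internally is not shown here; the arrays below are hypothetical sample data.

```python
import numpy as np

# Hypothetical sample data: start second and millisecond part of four flows.
first = np.array([100, 99, 100, 99], dtype=np.uint32)
first_ms = np.array([500, 10, 20, 700], dtype=np.uint16)

# Ascending order by first, then first_ms (lexsort: last key is primary).
ascending = np.lexsort((first_ms, first))

# Reversing the index array yields the descending order.
descending = ascending[::-1]
desc_first = first[descending]
desc_first_ms = first_ms[descending]
```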

The output directory can be specified with the -O parameter. When the output directory is the same as the input directory, sorting is done in place and overwrites the input files.

Sorting can be limited to a part of the input with the --skip-in and --count-in parameters. They specify how many flow records are skipped (--skip-in) and then read (--count-in) from the input.
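The effect of skipping and then counting can be pictured as a slice over the input records. The function select_window below is a hypothetical helper written for illustration, assuming that the skip is applied before the count, as the description above states.

```python
import numpy as np

def select_window(records, skip_in=0, count_in=None):
    """Skip the first skip_in records, then keep at most count_in records."""
    # With no count limit, read everything after the skipped part.
    stop = None if count_in is None else skip_in + count_in
    return records[skip_in:stop]

# Hypothetical input of ten records, identified here by their position.
records = np.arange(10)
window = select_window(records, skip_in=2, count_in=3)
rest = select_window(records, skip_in=7)
```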

To filter flow records, specify a filter expression. Filter expressions use Python syntax. Bitwise operators (&, |, ~) should be used instead of logical ones (and, or, not). The following fields are available:

af, prot, inif, outif, sa0, sa1, sa2, sa3, da0, da1, da2, da3, sp, dp, first, first_ms, last, last_ms, packets, octets, aggs
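The reason for preferring bitwise operators can be seen with plain numpy arrays, where they act element-wise while the logical keywords do not. The sketch below uses hypothetical sample columns named after some of the fields above; it does not show how the tool itself evaluates a filter expression.

```python
import numpy as np

# Hypothetical sample columns for four flow records.
prot = np.array([6, 17, 6, 6], dtype=np.uint8)
dp = np.array([443, 53, 80, 443], dtype=np.uint16)
packets = np.array([10, 1, 3, 250], dtype=np.uint64)

# Element-wise filter: TCP flows to port 443 that do not have < 100 packets.
# Parentheses are required because & and ~ bind tighter than comparisons.
mask = (prot == 6) & (dp == 443) & ~(packets < 100)

# The logical keyword 'and' needs a single truth value and fails on arrays.
try:
    (prot == 6) and (dp == 443)
    logical_and_failed = False
except ValueError:
    logical_and_failed = True
```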

Example (sorts flows in the merged directory and saves the output to the sorted directory):

flow_models.sort -i binary -k first first_ms -O sorted merged

During the merging process, flow records may become reordered. This applies especially to long flows, which in some circumstances may stay cached until the end of the merging process. Such flows are dumped at the end of the output files. The purpose of the sort tool is to reorder flow records in a file according to the specified keys, usually flow start or flow end times. This step is unnecessary when further operations are performed on the whole file. However, when only a part of a record file is used, sorting is necessary.
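Whether a set of records still needs sorting can be checked by testing that a key column never decreases. The helper is_sorted_by and the sample values are hypothetical; the sample imitates a long flow that started early but was written at the end of the file.

```python
import numpy as np

def is_sorted_by(key):
    """Return True if the key column is in non-decreasing order."""
    key = np.asarray(key)
    return bool(np.all(key[:-1] <= key[1:]))

# Hypothetical start times: the last record is a long flow that started
# first but was dumped at the end of merging.
first = np.array([10, 11, 12, 3], dtype=np.uint32)

needs_sorting = not is_sorted_by(first)

# Reading only the first three records of the unsorted data misses the
# earliest flow; after sorting it comes first.
head_unsorted = first[:3]
head_sorted = np.sort(first)[:3]
```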