anonymize

Anonymizes IP addresses in IPv4 flows using the Crypto-PAn algorithm.

usage: python3 -m flow_models.anonymize [-h]
                                        [-i {csv_flow,pipe,nfcapd,binary}]
                                        [-o {csv_flow,binary,append,extend,none}]
                                        [-O OUTPUT] [--skip-in SKIP_IN]
                                        [--count-in COUNT_IN]
                                        [--skip-out SKIP_OUT]
                                        [--count-out COUNT_OUT]
                                        [--filter-expr FILTER_EXPR]
                                        [--key KEY]
                                        in_files [in_files ...]

Positional Arguments

in_files

input files or directories

Named Arguments

-i, --in-format

Possible choices: csv_flow, pipe, nfcapd, binary

format of input files

Default: 'nfcapd'

-o, --out-format

Possible choices: csv_flow, binary, append, extend, none

format of output

Default: 'csv_flow'

-O, --output

file or directory for output

Default: '-'

--skip-in

number of flows to skip at the beginning of input

Default: 0

--count-in

limit for number of flows to read from input

--skip-out

number of flows to skip after filtering

Default: 0

--count-out

limit for number of flows to output after filtering

--filter-expr

filter expression

--key

32-byte encryption key

Default: ''

This tool anonymizes IPv4 addresses using the prefix-preserving Crypto-PAn algorithm.

It works only for IPv4 flows (af==2), so flows of all other address families are filtered out of the output. Both the source (sa3) and destination (da3) IPv4 addresses are anonymized.
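To illustrate what "prefix-preserving" means, here is a minimal sketch of the idea. It is not the tool's implementation: Crypto-PAn derives each output bit from an AES encryption, whereas this sketch substitutes HMAC-SHA256 as the pseudorandom function only because it is available in the Python standard library. The function name `prefix_preserving_anonymize` is hypothetical.

```python
import hashlib
import hmac
import ipaddress


def prefix_preserving_anonymize(address: str, key: bytes) -> str:
    """Illustrative stand-in for Crypto-PAn (HMAC-SHA256 instead of AES)."""
    original = int(ipaddress.IPv4Address(address))
    result = 0
    for i in range(32):
        # The i most significant bits of the original address (0 when i == 0).
        prefix = original >> (32 - i)
        # The PRF input encodes both the prefix length and the prefix value.
        message = bytes([i]) + prefix.to_bytes(4, "big")
        digest = hmac.new(key, message, hashlib.sha256).digest()
        flip = digest[0] >> 7  # one pseudorandom bit that depends only on the prefix
        bit = (original >> (31 - i)) & 1
        result = (result << 1) | (bit ^ flip)
    return str(ipaddress.IPv4Address(result))


key = b"boojahyoo3vaeToong0Eijee7Ahz3yee"  # the 32-byte key from the example in this page
a = prefix_preserving_anonymize("192.168.1.10", key)
b = prefix_preserving_anonymize("192.168.1.200", key)
```

Because each output bit depends only on the bits that precede it in the input, two addresses that share a k-bit prefix map to outputs that also share exactly a k-bit prefix. Here `a` and `b` fall in the same anonymized /24 but differ in the last octet.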

To filter flow records, specify a filter expression with --filter-expr. The expression should use Python syntax. Bitwise operators (&, |, ~) should be used instead of logical ones (and, or, not). The following fields are available:

af, prot, inif, outif, sa0, sa1, sa2, sa3, da0, da1, da2, da3, sp, dp, first, first_ms, last, last_ms, packets, octets, aggs
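The shape of a filter expression can be sketched as follows. This is only an illustration under assumptions: it evaluates the expression one record at a time with `eval`, which may differ from how the tool evaluates it internally, and the sample records and the helper `matches` are hypothetical. The main point is that each comparison needs its own parentheses, because `&` and `|` bind more tightly than `==`.

```python
# Hypothetical flow records; the field names follow the list above.
records = [
    {"af": 2, "prot": 6, "sp": 51000, "dp": 443, "packets": 12},
    {"af": 2, "prot": 17, "sp": 53, "dp": 40000, "packets": 1},
    {"af": 10, "prot": 6, "sp": 51001, "dp": 443, "packets": 7},
]

# IPv4 flows that go to port 443 or come from port 53.
filter_expr = "(af == 2) & ((dp == 443) | (sp == 53))"


def matches(record, expr):
    # Evaluate the expression with the record fields as the only visible names.
    return bool(eval(expr, {"__builtins__": {}}, dict(record)))


selected = [r for r in records if matches(r, filter_expr)]

# Without parentheses, "af == 2 & dp == 443" parses as the chained comparison
# af == (2 & dp) == 443, which does not select the first record.
unparenthesized = matches(records[0], "af == 2 & dp == 443")
```

With these sample records, `selected` holds the first two records and `unparenthesized` is `False`.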

Flow records can be skipped and limited with the skip_in, count_in, skip_out and count_out parameters. They specify how many flow records are skipped (skip_in) and then read (count_in) from the input, and how many are skipped (skip_out) and then written (count_out) after filtering.
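The order in which these four parameters apply can be sketched with `itertools.islice`. This is a minimal sketch of the semantics described above, not the tool's code; the function `select` and its `predicate` argument are hypothetical, and a count of `None` is assumed to mean "no limit".

```python
from itertools import islice


def select(flows, predicate, skip_in=0, count_in=None, skip_out=0, count_out=None):
    # Input stage: skip skip_in records, then read at most count_in records.
    stop_in = None if count_in is None else skip_in + count_in
    read = islice(flows, skip_in, stop_in)
    # Filtering stage: keep only the records that satisfy the filter.
    kept = (flow for flow in read if predicate(flow))
    # Output stage: skip skip_out records, then write at most count_out records.
    stop_out = None if count_out is None else skip_out + count_out
    return list(islice(kept, skip_out, stop_out))


# Integers stand in for flow records; the filter keeps even numbers.
result = select(range(20), lambda n: n % 2 == 0,
                skip_in=2, count_in=10, skip_out=1, count_out=3)
# result == [4, 6, 8]
```

In this run, records 2 through 11 are read, the even ones (2, 4, 6, 8, 10) pass the filter, the first of those is skipped, and the next three are written.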

Example (anonymizes the first 1000 flow records read in binary format and writes them as CSV lines to standard output):

flow_models.anonymize -i binary -O - --count-in 1000 --key boojahyoo3vaeToong0Eijee7Ahz3yee sorted
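A key of the same shape as the one in the example can be generated with the standard `secrets` module. This sketch assumes that --key accepts 32 ASCII characters, as the example key suggests; this page does not state the accepted key encoding, and the helper `generate_key` is hypothetical.

```python
import secrets
import string


def generate_key(length: int = 32) -> str:
    # Letters and digits only, so the key needs no shell quoting
    # and encodes to exactly `length` bytes in ASCII.
    alphabet = string.ascii_letters + string.digits
    return "".join(secrets.choice(alphabet) for _ in range(length))


key = generate_key()
```

The same key must be reused across runs if addresses anonymized in separate runs need to remain comparable.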