flow_models package

flow_models.anonymize module

Anonymizes IP addresses in IPv4 flows using the Crypto-PAn algorithm.

anonymize(in_files, output, in_format='nfcapd', out_format='csv_flow', skip_in=0, count_in=None, skip_out=0, count_out=None, filter_expr=None, key='')

Anonymizes IP addresses in IPv4 flows using the Crypto-PAn algorithm.

Parameters:

in_files (list[pathlib.Path]) – input file paths
output (os.PathLike | io.TextIOWrapper) – output file or directory path or stream
in_format (str, default 'nfcapd') – input format
out_format (str, default 'csv_flow') – output format
skip_in (int, default 0) – number of flows to skip at the beginning of input
count_in (int, default None, meaning all flows) – number of flows to read from input
skip_out (int, default 0) – number of flows to skip after filtering
count_out (int, default None, meaning all flows) – number of flows to output after filtering
filter_expr (CodeType, optional) – filter expression
key (str) – encryption key (32 bytes)
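
The defining property of Crypto-PAn is that it is prefix-preserving: two addresses that share their first k bits still share exactly their first k bits after anonymization. The sketch below illustrates that property only and is not the package's implementation. It swaps the AES step of real Crypto-PAn for HMAC-SHA256 from the standard library, and the names prefix_preserving_anonymize and common_prefix_len are hypothetical.

```python
import hashlib
import hmac
import ipaddress


def prefix_preserving_anonymize(addr: str, key: bytes) -> str:
    """Anonymize an IPv4 address bit by bit, keeping shared prefixes shared."""
    original = int(ipaddress.IPv4Address(addr))
    result = 0
    for i in range(32):
        # The i most significant bits of the original address (0 when i == 0).
        prefix = original >> (32 - i)
        # Pseudorandom bit derived only from the key and that prefix.
        # Assumption: HMAC-SHA256 stands in for the AES step of Crypto-PAn.
        digest = hmac.new(key, f"{i}:{prefix}".encode(), hashlib.sha256).digest()
        flip = digest[0] & 1
        bit = (original >> (31 - i)) & 1
        result = (result << 1) | (bit ^ flip)
    return str(ipaddress.IPv4Address(result))


def common_prefix_len(a: str, b: str) -> int:
    """Number of leading bits two IPv4 addresses have in common."""
    diff = int(ipaddress.IPv4Address(a)) ^ int(ipaddress.IPv4Address(b))
    return 32 - diff.bit_length()
```

Because each output bit depends only on the key and on the preceding input bits, the mapping is deterministic for a given key, so the same address is anonymized identically across files.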

flow_models.convert module

Converts flow records between supported formats. Can be used for filtering and cutting flow record files.

convert(in_files, output, in_format='nfcapd', out_format='csv_flow', skip_in=0, count_in=None, skip_out=0, count_out=None, filter_expr=None)

Convert flow records between supported formats. Can also be used for filtering and cutting flow record files.

Parameters:

in_files (list[pathlib.Path]) – input file paths
output (os.PathLike | io.TextIOWrapper) – output file or directory path or stream
in_format (str, default 'nfcapd') – input format
out_format (str, default 'csv_flow') – output format
skip_in (int, default 0) – number of flows to skip at the beginning of input
count_in (int, default None, meaning all flows) – number of flows to read from input
skip_out (int, default 0) – number of flows to skip after filtering
count_out (int, default None, meaning all flows) – number of flows to output after filtering
filter_expr (CodeType, optional) – filter expression
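
The interaction of the skip, count and filter parameters can be sketched as follows. This is an illustration under assumptions, not the package's code: the order of operations is inferred from the parameter descriptions (input cut first, then filtering, then output cut), flows are represented as plain dicts, and the filter is evaluated with eval against the flow's fields. The name select_flows is hypothetical.

```python
from itertools import islice


def select_flows(flows, skip_in=0, count_in=None, skip_out=0, count_out=None,
                 filter_expr=None):
    """Apply input cut, filter, then output cut to an iterable of flows."""
    # Input cut: skip the first skip_in flows, then read at most count_in.
    stop_in = None if count_in is None else skip_in + count_in
    selected = islice(flows, skip_in, stop_in)
    # Filter: keep flows for which the compiled expression is true.
    # Assumption: field names are visible to the expression as variables.
    if filter_expr is not None:
        selected = (f for f in selected if eval(filter_expr, {}, f))
    # Output cut: skip and count are applied to what survives the filter.
    stop_out = None if count_out is None else skip_out + count_out
    return islice(selected, skip_out, stop_out)


flows = [{"packets": n} for n in range(10)]
only_even = compile("packets % 2 == 0", "<filter>", "eval")
picked = list(select_flows(flows, skip_in=1, count_in=8,
                           skip_out=1, count_out=2, filter_expr=only_even))
```

A compiled expression (CodeType) can be produced with the built-in compile in 'eval' mode, as shown for only_even.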

flow_models.cut module

Cuts binary flow records with dd.

cut(in_files, output, in_format='binary', out_format='binary', skip_in=0, count_in=None, skip_out=0, count_out=None, filter_expr=None)

Cut binary flow records with dd.

Parameters:

in_files (list[pathlib.Path]) – input file paths
output (os.PathLike | io.TextIOWrapper) – output file or directory path or stream
in_format (str, default 'binary') – input format
out_format (str, default 'binary') – output format
skip_in (int, default 0) – number of flows to skip at the beginning of input
count_in (int, default None, meaning all flows) – number of flows to read from input
skip_out (int, default 0) – number of flows to skip after filtering
count_out (int, default None, meaning all flows) – number of flows to output after filtering
filter_expr (CodeType, optional) – not supported
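
Cutting with dd amounts to byte arithmetic on fixed-size records: the block size is the record size, and skip and count are expressed in records. The sketch below shows the same arithmetic in Python. It assumes that a binary file holds fixed-width records; the function name cut_records and the 4-byte record size are hypothetical and chosen only for illustration.

```python
import io


def cut_records(stream, record_size, skip=0, count=None):
    """Return the bytes of `count` records after skipping `skip` records."""
    # Jump over the skipped records in one seek.
    stream.seek(skip * record_size)
    if count is None:
        # No limit: read everything that remains.
        return stream.read()
    return stream.read(count * record_size)


# Sixteen bytes interpreted as four 4-byte records.
stream = io.BytesIO(bytes(range(16)))
chunk = cut_records(stream, record_size=4, skip=1, count=2)
```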

flow_models.fit module

Creates General Mixture Models (GMM) fitted to flow records (requires scipy).

fit(in_file, y_value, max_iter=100, initial=None, max_pareto_w=None, cb=None)

Fit distribution mixture to flow histogram.

Parameters:

in_file (os.PathLike) – input histogram file
y_value (str) – y axis value
max_iter (int, default 100) – maximum number of iterations
initial (dict, optional) – initial mixture
max_pareto_w (float, optional) – maximum pareto weight
cb (function, optional) – callback function to call after each iteration

Returns:

{'mix': result_mix, 'sum': np.sum(weights)}

Return type:

dict

flow_models.generate module

Generates flow records from histograms or mixture models.

generate(in_file, output, out_format='csv_flow', size=1, x_value='length', random_state=None)

Generate flows from mixture or histogram file to output.

Parameters:

in_file (os.PathLike) – csv_hist file or mixture directory
output (os.PathLike | io.TextIOWrapper) – file or directory for output
out_format (str, default 'csv_flow') – output format
size (int, default 1) – number of flows to generate
x_value (str, default 'length') – x axis value
random_state (object, optional) – initial random state

generate_flows(in_file, size=1, x_value='length', random_state=None)

Yield flow tuples generated from mixture or histogram file.

Parameters:

in_file (os.PathLike) – csv_hist file or mixture directory
size (int, default 1) – number of flows to generate
x_value (str, default 'length') – x axis value
random_state (object, optional) – initial random state

Yields:

(int, int, int, int, int, int, int, int, int, int, int, int, int, int, int, int, int, int, int, int, int) – af, prot, inif, outif, sa0, sa1, sa2, sa3, da0, da1, da2, da3, sp, dp, first, first_ms, last, last_ms, packets, octets, aggs
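
A 21-element tuple is easier to work with when its fields are named. The sketch below wraps the tuple layout listed above in a namedtuple. The names FlowRecord and duration_ms are hypothetical, and the duration calculation assumes that first and last are timestamps in seconds with first_ms and last_ms as their millisecond parts.

```python
from collections import namedtuple

# Field order copied from the "Yields" description above.
FlowRecord = namedtuple(
    "FlowRecord",
    "af prot inif outif sa0 sa1 sa2 sa3 da0 da1 da2 da3 "
    "sp dp first first_ms last last_ms packets octets aggs",
)


def duration_ms(flow: FlowRecord) -> int:
    """Flow duration in milliseconds (assumes seconds plus millisecond parts)."""
    return (flow.last - flow.first) * 1000 + (flow.last_ms - flow.first_ms)
```

Each yielded tuple can then be converted with FlowRecord._make(flow_tuple) and accessed by attribute, for example record.packets or record.octets.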

flow_models.hist module

Calculates histograms of flow length, size, duration or rate.

hist(in_files, output, in_format='nfcapd', out_format='csv_hist', skip_in=0, count_in=None, skip_out=0, count_out=None, filter_expr=None, bin_exp=0, x_value='length', additional_columns=None)

Calculate histograms of flow length, size, duration or rate.

Parameters:

in_files (list[os.PathLike]) – input file paths
output (os.PathLike | io.TextIOWrapper) – output file or directory path or stream
in_format (str, default 'nfcapd') – input format
out_format (str, default 'csv_hist') – output format
skip_in (int, default 0) – number of flows to skip at the beginning of input
count_in (int, default None, meaning all flows) – number of flows to read from input
skip_out (int, default 0) – number of flows to skip after filtering
count_out (int, default None, meaning all flows) – number of flows to output after filtering
filter_expr (CodeType, optional) – filter expression
bin_exp (int, default 0) – bin width exponent of 2
x_value (str, default 'length') – x axis value
additional_columns (list[str], optional) – additional columns to sum
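
One simple reading of bin_exp is a fixed bin width of 2 ** bin_exp, so the default of 0 gives one bin per integer value. The sketch below illustrates that reading only. The package may bin differently (for example, with widths that grow with the value), and the function name histogram is hypothetical.

```python
from collections import Counter


def histogram(values, bin_exp=0):
    """Count values in bins of width 2 ** bin_exp, keyed by bin lower edge."""
    width = 2 ** bin_exp
    # Integer division maps each value to the lower edge of its bin.
    counts = Counter((v // width) * width for v in values)
    return dict(sorted(counts.items()))
```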

flow_models.hist_np module

Calculates histograms using multiple threads (requires numpy, much faster, but uses more memory).

hist(in_files, output, in_format='binary', out_format='csv_hist', skip_in=0, count_in=None, skip_out=0, count_out=None, filter_expr=None, bin_exp=0, x_value='length', additional_columns=None)

Calculate histograms of flow length, size, duration or rate.

Parameters:

in_files (list[os.PathLike]) – input file paths
output (os.PathLike | io.TextIOWrapper) – output file or directory path or stream
in_format (str, default 'binary') – input format
out_format (str, default 'csv_hist') – output format
skip_in (int, default 0) – number of flows to skip at the beginning of input
count_in (int, default None, meaning all flows) – number of flows to read from input
skip_out (int, default 0) – not supported
count_out (int, default None, meaning all flows) – not supported
filter_expr (CodeType, optional) – filter expression
bin_exp (int, default 0) – bin width exponent of 2
x_value (str, default 'length') – x axis value
additional_columns (list[str], optional) – additional columns to sum

flow_models.merge module

Merges flows which were split across multiple records due to active timeout.

merge(in_files, output, in_format='nfcapd', out_format='csv_flow', skip_in=0, count_in=None, skip_out=0, count_out=None, filter_expr=None, inactive_timeout=15.0, active_timeout=300.0)

Merge flows split due to timeout.

Parameters:

in_files (list[os.PathLike]) – input file paths
output (os.PathLike | io.TextIOWrapper) – directory path
in_format (str, default 'nfcapd') – input format
out_format (str, default 'csv_flow') – output format
skip_in (int, default 0) – number of flows to skip at the beginning of input
count_in (int, default None, meaning all flows) – number of flows to read from input
skip_out (int, default 0) – number of flows to skip after filtering
count_out (int, default None, meaning all flows) – number of flows to output after filtering
filter_expr (CodeType, optional) – filter expression
inactive_timeout (float, default 15.0) – inactive timeout in seconds
active_timeout (float, default 300.0) – active timeout in seconds
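
The general idea of merging can be sketched as follows: records that carry the same flow key and are separated by less than the inactive timeout are treated as pieces of one flow. This is a simplified illustration, not the package's algorithm. It uses only the inactive timeout and does not model how active_timeout is applied; records are plain dicts, and the key field and the name merge_records are hypothetical.

```python
def merge_records(records, inactive_timeout=15.0):
    """Merge records of the same flow key separated by short gaps."""
    open_flows = {}  # flow key -> merged record that may still be extended
    merged = []
    for rec in sorted(records, key=lambda r: r["first"]):
        current = open_flows.get(rec["key"])
        if current is not None and rec["first"] - current["last"] < inactive_timeout:
            # Gap is shorter than the inactive timeout: same flow, extend it.
            current["last"] = max(current["last"], rec["last"])
            current["packets"] += rec["packets"]
            current["octets"] += rec["octets"]
        else:
            # First record for this key, or the gap is too long: new flow.
            current = dict(rec)
            open_flows[rec["key"]] = current
            merged.append(current)
    return merged
```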

flow_models.plot module

Generates plots from flow records and fitted models (requires pandas and scipy).

flow_models.series module

Generates packets and octets time series from flow records.

series(in_files, output, in_format='nfcapd', out_format='csv_series', skip_in=0, count_in=None, skip_out=0, count_out=None, filter_expr=None)

Generate packets and octets time series from flow records.

Parameters:

in_files (list[pathlib.Path]) – input file paths
output (os.PathLike) – directory path
in_format (str, default 'nfcapd') – input format
out_format (str, default 'csv_series') – output format
skip_in (int, default 0) – number of flows to skip at the beginning of input
count_in (int, default None, meaning all flows) – number of flows to read from input
skip_out (int, default 0) – number of flows to skip after filtering
count_out (int, default None, meaning all flows) – number of flows to output after filtering
filter_expr (CodeType, optional) – filter expression

flow_models.series_plot module

flow_models.sort module

Sorts flow records according to specified key fields (requires numpy).

create_index(path, key_fields, index_file, counters=None, reverse=False)

Create index array, optionally saving it to file.

Parameters:

path (os.PathLike) – path of a directory with key files
key_fields (list[str]) – ordered list of key fields
index_file (str, optional) – index file path
counters (dict[str, int], default None) – dictionary with the following entries:

skip_in (int, default 0) – number of flows to skip at the beginning of input
count_in (int, default None, meaning all flows) – number of flows to read from input
skip_out (int, default 0) – not supported
count_out (int, default None, meaning all flows) – not supported

reverse (bool, default False) – reverse order

Returns:

index array

Return type:

numpy.array

sort(in_files, output, key_fields, in_format='binary', out_format='binary', index_file=None, skip_in=0, count_in=None, skip_out=0, count_out=None, filter_expr=None, reverse=False)

Sorts flow records according to specified key fields.

Parameters:

in_files (list[str]) – input file paths
output (os.PathLike | None) – output directory path
key_fields (list[str]) – ordered list of key fields
in_format (str, default 'binary') – input format
out_format (str, default 'binary') – output format
index_file (str, optional) – index file path
skip_in (int, default 0) – number of flows to skip at the beginning of input
count_in (int, default None, meaning all flows) – number of flows to read from input
skip_out (int, default 0) – not supported
count_out (int, default None, meaning all flows) – not supported
filter_expr (CodeType, optional) – not supported
reverse (bool, default False) – reverse order

sort_array(input_file, output_dir, index_array, counters=None)

Sorts flow records according to an index array.

Parameters:

input_file (os.PathLike) – input file path
output_dir (os.PathLike) – output directory path
index_array (object) – index array
counters (dict[str, int], default None) – dictionary with the following entries:

skip_in (int, default 0) – number of flows to skip at the beginning of input
count_in (int, default None, meaning all flows) – number of flows to read from input
skip_out (int, default 0) – not supported
count_out (int, default None, meaning all flows) – not supported
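
Sorting through an index array separates two steps: first compute the order of record positions from the key fields, then reorder the data with that order. The sketch below shows the idea with plain Python lists instead of numpy arrays. It assumes the records are stored column-wise, one sequence per field, and the names build_index and apply_index are hypothetical.

```python
def build_index(columns, key_fields, reverse=False):
    """Return record positions ordered by the given key fields."""
    n = len(columns[key_fields[0]])
    return sorted(
        range(n),
        # Compare records by their key fields, in the order given.
        key=lambda i: tuple(columns[f][i] for f in key_fields),
        reverse=reverse,
    )


def apply_index(columns, index):
    """Reorder every column with the same index so records stay aligned."""
    return {name: [values[i] for i in index] for name, values in columns.items()}
```

Because the index is computed once and only stores positions, it can be saved and reused to reorder any number of columns without sorting again.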

flow_models.summary module

Produces TeX tables containing summary statistics of a flow dataset (requires scipy).