flow_models package

flow_models.anonymize module

Anonymizes IP addresses in IPv4 flows using the Crypto-PAn algorithm.

anonymize(in_files, output, in_format='nfcapd', out_format='csv_flow', skip_in=0, count_in=None, skip_out=0, count_out=None, filter_expr=None, key='')

Anonymizes IP addresses in IPv4 flows using the Crypto-PAn algorithm.

Parameters:
  • in_files (list[pathlib.Path]) – input file paths

  • output (os.PathLike | io.TextIOWrapper) – output file or directory path or stream

  • in_format (str, default 'nfcapd') – input format

  • out_format (str, default 'csv_flow') – output format

  • skip_in (int, default 0) – number of flows to skip at the beginning of input

  • count_in (int, default None, meaning all flows) – number of flows to read from input

  • skip_out (int, default 0) – number of flows to skip after filtering

  • count_out (int, default None, meaning all flows) – number of flows to output after filtering

  • filter_expr (CodeType, optional) – filter expression

  • key (str) – encryption key (32 bytes)
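The key parameter is typed as str but sized in bytes. The following is a minimal sketch of producing and checking such a key, assuming the string is converted to bytes with a one-byte-per-character encoding such as ASCII (the documentation does not state the encoding). The helpers make_key and check_key are hypothetical and not part of flow_models.

```python
import secrets


def make_key():
    """Hypothetical helper: return a random 32-character ASCII string."""
    # token_hex(16) renders 16 random bytes as 32 hexadecimal characters.
    return secrets.token_hex(16)


def check_key(key):
    """Hypothetical helper: verify that the key occupies exactly 32 bytes."""
    # Assumption: the key string is encoded as ASCII before use.
    if len(key.encode("ascii")) != 32:
        raise ValueError("key must be exactly 32 bytes long")
    return key


key = check_key(make_key())
print(len(key))
```

A hexadecimal key generated this way carries only 128 bits of randomness, so a different generator may be preferable where the full key space matters.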

flow_models.convert module

Converts flow records between supported formats. Can be used for filtering and cutting flow record files.

convert(in_files, output, in_format='nfcapd', out_format='csv_flow', skip_in=0, count_in=None, skip_out=0, count_out=None, filter_expr=None)

Convert flow records between supported formats. Can also be used for filtering and cutting flow record files.

Parameters:
  • in_files (list[pathlib.Path]) – input file paths

  • output (os.PathLike | io.TextIOWrapper) – output file or directory path or stream

  • in_format (str, default 'nfcapd') – input format

  • out_format (str, default 'csv_flow') – output format

  • skip_in (int, default 0) – number of flows to skip at the beginning of input

  • count_in (int, default None, meaning all flows) – number of flows to read from input

  • skip_out (int, default 0) – number of flows to skip after filtering

  • count_out (int, default None, meaning all flows) – number of flows to output after filtering

  • filter_expr (CodeType, optional) – filter expression
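The order in which the counters and the filter apply can be sketched with itertools.islice. This is an illustration of the semantics described in the parameter list, not the library's implementation. The helper select_flows and the dictionary-based flow representation are hypothetical, and evaluating the compiled expression with flow fields as local names is an assumption.

```python
from itertools import islice


def select_flows(flows, skip_in=0, count_in=None, skip_out=0,
                 count_out=None, filter_expr=None):
    """Hypothetical helper mirroring the documented counters."""
    # Input window: skip_in and count_in apply before filtering.
    stop_in = None if count_in is None else skip_in + count_in
    window = islice(flows, skip_in, stop_in)
    # Filter: evaluate the compiled expression against each flow's fields
    # (assumption: fields are exposed to the expression as plain names).
    if filter_expr is not None:
        window = (f for f in window if eval(filter_expr, {}, f))
    # Output window: skip_out and count_out apply after filtering.
    stop_out = None if count_out is None else skip_out + count_out
    return islice(window, skip_out, stop_out)


# Ten made-up flows with packets = 1..10.
flows = [{"packets": p, "octets": 100 * p} for p in range(1, 11)]
# compile() in 'eval' mode returns a CodeType object.
expr = compile("packets % 2 == 0", "<filter>", "eval")
selected = list(select_flows(flows, skip_in=2, count_in=6,
                             skip_out=1, count_out=2, filter_expr=expr))
print([f["packets"] for f in selected])  # prints [6, 8]
```

Here the input window keeps flows 3 to 8, the filter keeps the even ones (4, 6, 8), and the output window drops the first of those and keeps the next two.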

flow_models.cut module

Cuts binary flow records with dd.

cut(in_files, output, in_format='binary', out_format='binary', skip_in=0, count_in=None, skip_out=0, count_out=None, filter_expr=None)

Cut binary flow records with dd.

Parameters:
  • in_files (list[pathlib.Path]) – input file paths

  • output (os.PathLike | io.TextIOWrapper) – output file or directory path or stream

  • in_format (str, default 'binary') – input format

  • out_format (str, default 'binary') – output format

  • skip_in (int, default 0) – number of flows to skip at the beginning of input

  • count_in (int, default None, meaning all flows) – number of flows to read from input

  • skip_out (int, default 0) – number of flows to skip after filtering

  • count_out (int, default None, meaning all flows) – number of flows to output after filtering

  • filter_expr (CodeType, optional) – not supported

flow_models.fit module

Creates General Mixture Models (GMM) fitted to flow records (requires scipy).

fit(in_file, y_value, max_iter=100, initial=None, max_pareto_w=None, cb=None)

Fit distribution mixture to flow histogram.

Parameters:
  • in_file (os.PathLike) – input histogram file

  • y_value (str) – y axis value

  • max_iter (int, default 100) – maximum number of iterations

  • initial (dict, optional) – initial mixture

  • max_pareto_w (float, optional) – maximum pareto weight

  • cb (function, optional) – callback function to call after each iteration

Returns:

{'mix': result_mix, 'sum': np.sum(weights)}

Return type:

dict

flow_models.generate module

Generates flow records from histograms or mixture models.

generate(in_file, output, out_format='csv_flow', size=1, x_value='length', random_state=None)

Generate flows from mixture or histogram file to output.

Parameters:
  • in_file (os.PathLike) – csv_hist file or mixture directory

  • output (os.PathLike | io.TextIOWrapper) – file or directory for output

  • out_format (str, default 'csv_flow') – output format

  • size (int, default 1) – number of flows to generate

  • x_value (str, default 'length') – x axis value

  • random_state (object, optional) – initial random state

generate_flows(in_file, size=1, x_value='length', random_state=None)

Yield flow tuples generated from mixture or histogram file.

Parameters:
  • in_file (os.PathLike) – csv_hist file or mixture directory

  • size (int, default 1) – number of flows to generate

  • x_value (str, default 'length') – x axis value

  • random_state (object, optional) – initial random state

Yields:

(int, int, int, int, int, int, int, int, int, int, int, int, int, int, int, int, int, int, int, int, int) – af, prot, inif, outif, sa0, sa1, sa2, sa3, da0, da1, da2, da3, sp, dp, first, first_ms, last, last_ms, packets, octets, aggs
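A 21-element tuple is easier to work with when its fields are named. The sketch below wraps the field order listed above in a namedtuple. FlowRecord and duration_ms are hypothetical names, the sample values are made up, and treating first/last as whole seconds with first_ms/last_ms as the millisecond parts is an assumption.

```python
from collections import namedtuple

# Field order copied from the "Yields" description above.
FIELDS = ("af prot inif outif sa0 sa1 sa2 sa3 da0 da1 da2 da3 "
          "sp dp first first_ms last last_ms packets octets aggs")
FlowRecord = namedtuple("FlowRecord", FIELDS)


def duration_ms(flow):
    """Flow duration in milliseconds (assumes seconds + millisecond parts)."""
    return (flow.last - flow.first) * 1000 + (flow.last_ms - flow.first_ms)


# Made-up values standing in for one tuple yielded by generate_flows().
raw = (2, 6, 0, 0, 0, 0, 0, 1, 0, 0, 0, 2,
       1234, 80, 100, 250, 103, 750, 10, 5000, 1)
sample = FlowRecord(*raw)
print(sample.packets, sample.octets, duration_ms(sample))
```

Tuples produced by the generator could be wrapped the same way, for example with FlowRecord(*flow) inside the consuming loop.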

flow_models.hist module

Calculates histograms of flow length, size, duration or rate.

hist(in_files, output, in_format='nfcapd', out_format='csv_hist', skip_in=0, count_in=None, skip_out=0, count_out=None, filter_expr=None, bin_exp=0, x_value='length', additional_columns=None)

Calculate histograms of flow length, size, duration or rate.

Parameters:
  • in_files (list[os.PathLike]) – input file paths

  • output (os.PathLike | io.TextIOWrapper) – output file or directory path or stream

  • in_format (str, default 'nfcapd') – input format

  • out_format (str, default 'csv_hist') – output format

  • skip_in (int, default 0) – number of flows to skip at the beginning of input

  • count_in (int, default None, meaning all flows) – number of flows to read from input

  • skip_out (int, default 0) – number of flows to skip after filtering

  • count_out (int, default None, meaning all flows) – number of flows to output after filtering

  • filter_expr (CodeType, optional) – filter expression

  • bin_exp (int, default 0) – bin width exponent of 2

  • x_value (str, default 'length') – x axis value

  • additional_columns (list[str], optional) – additional columns to sum
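The relationship between x_value and additional_columns can be illustrated with a simplified histogram that counts flows per distinct x value and sums the extra columns. This sketch ignores bin_exp entirely (every distinct value gets its own bin) and is not the library's implementation. The helper simple_hist, the output column name "flows", and the use of the packets field as the x value are all assumptions.

```python
def simple_hist(flows, x_field, additional_columns=()):
    """Hypothetical helper: count flows and sum extra columns per x value."""
    hist = {}
    for flow in flows:
        x = flow[x_field]
        # Create the row for this x value on first sight.
        row = hist.setdefault(
            x, {"flows": 0, **{c: 0 for c in additional_columns}})
        row["flows"] += 1
        for c in additional_columns:
            row[c] += flow[c]
    # Return rows ordered by x value.
    return dict(sorted(hist.items()))


# Made-up flows described by packet and octet counts.
flows = [
    {"packets": 1, "octets": 40},
    {"packets": 3, "octets": 1500},
    {"packets": 1, "octets": 60},
]
result = simple_hist(flows, "packets", additional_columns=["octets"])
print(result)
```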

flow_models.hist_np module

Calculates histograms using multiple threads (requires numpy, much faster, but uses more memory).

hist(in_files, output, in_format='binary', out_format='csv_hist', skip_in=0, count_in=None, skip_out=0, count_out=None, filter_expr=None, bin_exp=0, x_value='length', additional_columns=None)

Calculate histograms of flow length, size, duration or rate.

Parameters:
  • in_files (list[os.PathLike]) – input file paths

  • output (os.PathLike | io.TextIOWrapper) – output file or directory path or stream

  • in_format (str, default 'binary') – input format

  • out_format (str, default 'csv_hist') – output format

  • skip_in (int, default 0) – number of flows to skip at the beginning of input

  • count_in (int, default None, meaning all flows) – number of flows to read from input

  • skip_out (int, default 0) – not supported

  • count_out (int, default None, meaning all flows) – not supported

  • filter_expr (CodeType, optional) – filter expression

  • bin_exp (int, default 0) – bin width exponent of 2

  • x_value (str, default 'length') – x axis value

  • additional_columns (list[str], optional) – additional columns to sum

flow_models.merge module

Merges flows which were split across multiple records due to active timeout.

merge(in_files, output, in_format='nfcapd', out_format='csv_flow', skip_in=0, count_in=None, skip_out=0, count_out=None, filter_expr=None, inactive_timeout=15.0, active_timeout=300.0)

Merge flows split due to timeout.

Parameters:
  • in_files (list[os.PathLike]) – input file paths

  • output (os.PathLike | io.TextIOWrapper) – directory path

  • in_format (str, default 'nfcapd') – input format

  • out_format (str, default 'csv_flow') – output format

  • skip_in (int, default 0) – number of flows to skip at the beginning of input

  • count_in (int, default None, meaning all flows) – number of flows to read from input

  • skip_out (int, default 0) – number of flows to skip after filtering

  • count_out (int, default None, meaning all flows) – number of flows to output after filtering

  • filter_expr (CodeType, optional) – filter expression

  • inactive_timeout (float, default 15.0) – inactive timeout in seconds

  • active_timeout (float, default 300.0) – active timeout in seconds
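The idea of rejoining records that belong to one flow can be sketched as gap-based merging: two records with the same key are joined when the second starts less than inactive_timeout seconds after the first ends. This is a deliberate simplification that does not use active_timeout and should not be read as the module's actual algorithm. The helper merge_records and the dictionary-based record layout are hypothetical.

```python
def merge_records(records, inactive_timeout=15.0):
    """Hypothetical helper: join same-key records separated by a short gap."""
    open_flows = {}  # key -> record still eligible for merging
    finished = []
    for rec in sorted(records, key=lambda r: r["first"]):
        prev = open_flows.get(rec["key"])
        if prev is not None and rec["first"] - prev["last"] < inactive_timeout:
            # Continuation of the same flow: extend it and add the counters.
            prev["last"] = max(prev["last"], rec["last"])
            prev["packets"] += rec["packets"]
            prev["octets"] += rec["octets"]
        else:
            # Gap too long (or first sighting): close the old flow, open a new one.
            if prev is not None:
                finished.append(prev)
            open_flows[rec["key"]] = dict(rec)
    finished.extend(open_flows.values())
    return finished


# Made-up records; times are in seconds.
records = [
    {"key": "A", "first": 0, "last": 300, "packets": 10, "octets": 1000},
    {"key": "A", "first": 301, "last": 450, "packets": 5, "octets": 500},
    {"key": "A", "first": 1000, "last": 1010, "packets": 1, "octets": 40},
    {"key": "B", "first": 5, "last": 6, "packets": 2, "octets": 80},
]
merged = merge_records(records)
print(len(merged), merged[0]["packets"], merged[0]["last"])
```

In this example the first two "A" records are joined because the second starts one second after the first ends, while the third "A" record starts 550 seconds later and remains a separate flow.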

flow_models.plot module

Generates plots from flow records and fitted models (requires pandas and scipy).

flow_models.series module

Generates packets and octets time series from flow records.

series(in_files, output, in_format='nfcapd', out_format='csv_series', skip_in=0, count_in=None, skip_out=0, count_out=None, filter_expr=None)

Generate packets and octets time series from flow records.

Parameters:
  • in_files (list[pathlib.Path]) – input file paths

  • output (os.PathLike) – directory path

  • in_format (str, default 'nfcapd') – input format

  • out_format (str, default 'csv_series') – output format

  • skip_in (int, default 0) – number of flows to skip at the beginning of input

  • count_in (int, default None, meaning all flows) – number of flows to read from input

  • skip_out (int, default 0) – number of flows to skip after filtering

  • count_out (int, default None, meaning all flows) – number of flows to output after filtering

  • filter_expr (CodeType, optional) – filter expression

flow_models.series_plot module

flow_models.sort module

Sorts flow records according to specified key fields (requires numpy).

create_index(path, key_fields, index_file, counters=None, reverse=False)

Create index array, optionally saving it to file.

Parameters:
  • path (os.PathLike) – path of a directory with key files

  • key_fields (list[str]) – ordered list of key fields

  • index_file (str, optional) – index file path

  • counters (dict[str, int], optional) – dictionary with the following keys:

    skip_in (int, default 0) – number of flows to skip at the beginning of input

    count_in (int, default None, meaning all flows) – number of flows to read from input

    skip_out (int, default 0) – not supported

    count_out (int, default None, meaning all flows) – not supported

  • reverse (bool, default False) – reverse order

Returns:

index array

Return type:

numpy.array
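An index array lists record positions in sorted order, so the records themselves do not have to move until the final reordering step. The sketch below builds such an index in pure Python for illustration, whereas the module itself requires numpy. The helper build_index and the in-memory column dictionary (standing in for the per-field key files) are assumptions.

```python
def build_index(columns, key_fields, reverse=False):
    """Hypothetical helper: return record positions ordered by key_fields."""
    n = len(columns[key_fields[0]])
    # Compare records by a tuple of their key field values, in the given order.
    return sorted(range(n),
                  key=lambda i: tuple(columns[f][i] for f in key_fields),
                  reverse=reverse)


# Made-up columns: one list per field, one entry per flow record.
columns = {
    "packets": [5, 1, 5, 2],
    "octets": [500, 40, 300, 80],
}
index = build_index(columns, ["packets", "octets"])
print(index)  # prints [1, 3, 2, 0]

# Applying the index to any column yields that column in sorted record order.
sorted_octets = [columns["octets"][i] for i in index]
print(sorted_octets)  # prints [40, 80, 300, 500]
```

Because key_fields is ordered, the two records with packets equal to 5 are ordered by their octets values.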

sort(in_files, output, key_fields, in_format='binary', out_format='binary', index_file=None, skip_in=0, count_in=None, skip_out=0, count_out=None, filter_expr=None, reverse=False)

Sorts flow records according to specified key fields.

Parameters:
  • in_files (list[str]) – input file paths

  • output (os.PathLike | None) – output directory path

  • key_fields (list[str]) – ordered list of key fields

  • in_format (str, default 'binary') – input format

  • out_format (str, default 'binary') – output format

  • index_file (str, optional) – index file path

  • skip_in (int, default 0) – number of flows to skip at the beginning of input

  • count_in (int, default None, meaning all flows) – number of flows to read from input

  • skip_out (int, default 0) – not supported

  • count_out (int, default None, meaning all flows) – not supported

  • filter_expr (CodeType, optional) – not supported

  • reverse (bool, default False) – reverse order

sort_array(input_file, output_dir, index_array, counters=None)

Sorts flow records according to an index array.

Parameters:
  • input_file (os.PathLike) – input file path

  • output_dir (os.PathLike) – output directory path

  • index_array (object) – index array

  • counters (dict[str, int], optional) – dictionary with the following keys:

    skip_in (int, default 0) – number of flows to skip at the beginning of input

    count_in (int, default None, meaning all flows) – number of flows to read from input

    skip_out (int, default 0) – not supported

    count_out (int, default None, meaning all flows) – not supported

flow_models.summary module

Produces TeX tables containing summary statistics of flow dataset (requires scipy).