flow_models.lib package

flow_models.lib.data module

avg_data(data, idx, what)

Calculate average data.

Parameters:
  • data (dict[numpy.array]) – container with PDF data

  • idx (numpy.array) – input vector

  • what (str) – flow feature from container

Returns:

avg_points, avg_line

Return type:

(numpy.array, numpy.array)

calc_minmax(idx, *rest)

Calculate minimum and maximum values from many arrays.

Parameters:
Returns:

xmin, xmax, ymin, ymax

Return type:

(float, float, float, float)

load_data(objects)

Load mixtures or histograms into a structured dictionary.

Parameters:

objects (list[os.PathLike]) – list of paths to load

Returns:

container with loaded data

Return type:

dict

normalize_data(org_data, bin_exp=None)

Normalize exponentially binned data for presentation on a plot.

Parameters:
  • org_data (pandas.DataFrame) – histogram

  • bin_exp (int, optional) – bin width exponent of 2

Returns:

normalized histogram

Return type:

pandas.DataFrame

pdf_from_cdf(data, idx, what)

Calculate a PDF by interpolating CDF.

Parameters:
  • data (dict[numpy.array]) – container with PDF data

  • idx (numpy.array) – input vector

  • what (str) – flow feature from container

Returns:

PDF vector

Return type:

numpy.array
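To get a PDF from a CDF numerically, interpolate the CDF at the input points and then differentiate the result. The following NumPy sketch shows that idea. It assumes the CDF is available as two arrays, one of support points and one of cumulative probabilities. The helper `pdf_from_cdf_sketch` and its arguments are hypothetical and do not reflect the real layout of the data container:

```python
import numpy as np


def pdf_from_cdf_sketch(cdf_x: np.ndarray, cdf_y: np.ndarray, idx: np.ndarray) -> np.ndarray:
    """Hypothetical helper: numerical PDF from a tabulated CDF."""
    # Interpolate the tabulated CDF at the requested input points.
    cdf_at_idx = np.interp(idx, cdf_x, cdf_y)
    # The PDF is the derivative of the CDF; passing idx as coordinates
    # lets np.gradient handle unevenly spaced points.
    return np.gradient(cdf_at_idx, idx)


# Uniform distribution on [0, 10]: the CDF rises linearly from 0 to 1.
idx = np.linspace(0.0, 10.0, 11)
pdf = pdf_from_cdf_sketch(np.array([0.0, 10.0]), np.array([0.0, 1.0]), idx)
```

For a uniform CDF on [0, 10], the sketch returns a constant density of 0.1. On noisy empirical CDFs, differentiation amplifies the noise.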

flow_models.lib.io module

class Formatter(prog, indent_increment=2, max_help_position=24, width=None)

Bases: RawDescriptionHelpFormatter, ArgumentDefaultsHelpFormatter
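Combining these two standard `argparse` formatters has two effects: the description text is kept verbatim, and each option's default value is appended to its help string. Below is a standard-library sketch of an equivalent class, assuming the class adds no behavior of its own. The parser and its `-n` option are illustrative only:

```python
import argparse


class Formatter(argparse.RawDescriptionHelpFormatter,
                argparse.ArgumentDefaultsHelpFormatter):
    """Keep the description as written and show defaults in option help."""


parser = argparse.ArgumentParser(
    description="Line one\nLine two",  # line breaks are preserved
    formatter_class=Formatter,
)
parser.add_argument("-n", type=int, default=5, help="number of flows")
help_text = parser.format_help()
print(help_text)
```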

class IOArgumentParser(**kwargs)

Bases: ArgumentParser

find_array_path(path)

Find an exact numpy array path and type.

Parameters:

path (os.PathLike) – array file path

Returns:

name, dtype, path

Return type:

(str, str, pathlib.Path)

load_array_mv(path, mode='r')

Load a numpy array as memoryview.

Parameters:
  • path (os.PathLike) – array file path

  • mode (str, default 'r') – file open mode
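The standard `mmap` module can memory-map a file and expose it as a typed `memoryview`. This minimal sketch assumes the file is a headerless run of native unsigned 64-bit integers (struct format `Q`). The helper name, the `fmt` parameter and the `.bin` file name are hypothetical:

```python
import array
import mmap
import os
import tempfile


def load_array_mv_sketch(path, fmt="Q", mode="r"):
    """Hypothetical helper: map a raw binary array file as a typed memoryview."""
    access = mmap.ACCESS_READ if mode == "r" else mmap.ACCESS_WRITE
    file_mode = "rb" if mode == "r" else "r+b"
    with open(path, file_mode) as f:
        # The mapping stays valid after the file object is closed.
        mm = mmap.mmap(f.fileno(), 0, access=access)
    # Reinterpret the raw bytes as fixed-size items ('Q' = unsigned 64-bit).
    return memoryview(mm).cast(fmt)


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "octets.bin")
    with open(path, "wb") as f:
        f.write(array.array("Q", [10, 20, 30]).tobytes())
    view = load_array_mv_sketch(path)
    values = view.tolist()
    view.release()  # release the buffer before the directory is removed
```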

load_array_np(path, mode='r')

Load a numpy array as numpy.memmap.

Parameters:
  • path (os.PathLike) – array file path

  • mode (str, default 'r') – file open mode
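`numpy.memmap` maps a file lazily, so only the pages that are actually accessed are read from disk. This minimal sketch assumes a headerless file of `uint64` values. The helper name and the fixed dtype are hypothetical:

```python
import os
import tempfile

import numpy as np


def load_array_np_sketch(path, dtype="uint64", mode="r"):
    """Hypothetical helper: map a raw binary array file as numpy.memmap."""
    # With no shape given, the result is 1-D and its length is
    # file size / itemsize.
    return np.memmap(path, dtype=dtype, mode=mode)


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "packets.bin")
    np.arange(5, dtype=np.uint64).tofile(path)  # raw bytes, no header
    arr = load_array_np_sketch(path)
    total = int(arr.sum())
    del arr  # drop the mapping before the directory is removed
```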

load_arrays(path, fields, counters, filter_expr, require_numpy=False)

Load all binary flow arrays from a directory.

Parameters:
  • path (os.PathLike) – directory path

  • fields (list[str]) – fields to load

  • counters (dict[str, int], default) –

    skip_in (int, default 0) – number of flows to skip at the beginning of input

    count_in (int, default None, meaning all flows) – number of flows to read from input

    skip_out (int, default 0) – not supported

    count_out (int, default None, meaning all flows) – not supported

  • filter_expr (CodeType, optional) – filter expression

  • require_numpy (bool, default False) – load the arrays as numpy arrays instead of memoryviews

Returns:

arrays, filtered, size

Return type:

(dict[str, memoryview | numpy.array], numpy.array, int)
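The skip_in/count_in counters select a contiguous window of flows. That same window has to be applied to every field so that the flows stay aligned across arrays. The minimal sketch below works on in-memory arrays and rests on two assumptions. First, `filter_expr` is a code object compiled in `eval` mode and evaluated with the field names as variables. Second, `filtered` is a boolean mask of the flows that pass the filter. The helper name is also hypothetical:

```python
import numpy as np


def load_arrays_sketch(arrays, counters, filter_expr=None):
    """Hypothetical helper mimicking skip_in/count_in slicing and filtering."""
    skip = counters.get("skip_in", 0)
    count = counters.get("count_in")
    stop = None if count is None else skip + count
    # Slice every field the same way so flows stay aligned across arrays.
    sliced = {name: arr[skip:stop] for name, arr in arrays.items()}
    size = len(next(iter(sliced.values())))
    if filter_expr is None:
        filtered = np.ones(size, dtype=bool)
    else:
        # Assumption: field names are visible as variables in the expression.
        # eval() must only be used with trusted expressions.
        filtered = np.asarray(eval(filter_expr, {"np": np}, sliced), dtype=bool)
    return sliced, filtered, size


arrays = {
    "packets": np.array([1, 5, 2, 8, 3]),
    "octets": np.array([40, 900, 120, 4000, 200]),
}
expr = compile("packets > 2", "<filter>", "eval")
sliced, filtered, size = load_arrays_sketch(arrays, {"skip_in": 1, "count_in": 3}, expr)
```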

prepare_file_list(file_paths)

Prepare a list of files from a list of file paths or directories.

Parameters:

file_paths (list[str | os.PathLike | io.IOBase]) – list of file paths

Returns:

prepared file list

Return type:

list[pathlib.Path | io.IOBase]
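Here is a sketch of how such a list could be prepared. It makes three assumptions: directories expand to the regular files they contain, sorted by name; open streams pass through unchanged; and other entries become `pathlib.Path` objects. The helper name is hypothetical:

```python
import io
import pathlib
import tempfile


def prepare_file_list_sketch(file_paths):
    """Hypothetical helper: expand directories into files, keep streams as-is."""
    prepared = []
    for item in file_paths:
        if isinstance(item, io.IOBase):
            prepared.append(item)  # already-open streams pass through
            continue
        path = pathlib.Path(item)
        if path.is_dir():
            # Assumption: directory contents are used in sorted name order.
            prepared.extend(sorted(p for p in path.iterdir() if p.is_file()))
        else:
            prepared.append(path)
    return prepared


with tempfile.TemporaryDirectory() as tmp:
    for name in ("b.csv", "a.csv"):
        (pathlib.Path(tmp) / name).write_text("")
    stream = io.StringIO("")
    files = prepare_file_list_sketch([tmp, stream])
    names = [f.name for f in files[:2]]
```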

read_flow_binary(in_dir, counters=None, filter_expr=None, fields=None)

Read and yield all flows in a directory containing array files.

Parameters:
  • in_dir (os.PathLike) – directory to read from

  • counters (dict[str, int], default) –

    skip_in (int, default 0) – number of flows to skip at the beginning of input

    count_in (int, default None, meaning all flows) – number of flows to read from input

    skip_out (int, default 0) – number of flows to skip after filtering

    count_out (int, default None, meaning all flows) – number of flows to output after filtering

  • filter_expr (CodeType, optional) – filter expression

  • fields (list[str], optional) – read only these fields; the others may be zeros

Returns:

af, prot, inif, outif, sa0, sa1, sa2, sa3, da0, da1, da2, da3, sp, dp, first, first_ms, last, last_ms, packets, octets, aggs

Return type:

(int, int, int, int, int, int, int, int, int, int, int, int, int, int, int, int, int, int, int, int, int)

read_flow_csv(in_file, counters=None, filter_expr=None, fields=None)

Read and yield all flows in a csv_flow file/stream.

Parameters:
  • in_file (os.PathLike | io.TextIOWrapper) – csv_flow file or stream to read

  • counters (dict[str, int], default) –

    skip_in (int, default 0) – number of flows to skip at the beginning of input

    count_in (int, default None, meaning all flows) – number of flows to read from input

    skip_out (int, default 0) – number of flows to skip after filtering

    count_out (int, default None, meaning all flows) – number of flows to output after filtering

  • filter_expr (CodeType, optional) – filter expression

  • fields (list[str], optional) – read only these fields; the others may be zeros

Returns:

af, prot, inif, outif, sa0, sa1, sa2, sa3, da0, da1, da2, da3, sp, dp, first, first_ms, last, last_ms, packets, octets, aggs

Return type:

(int, int, int, int, int, int, int, int, int, int, int, int, int, int, int, int, int, int, int, int, int)
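The four counters act at two stages. skip_in/count_in apply to the raw input, and skip_out/count_out apply to the flows that pass the filter. The sketch below shows that ordering with `itertools.islice`. It assumes a hypothetical two-column CSV layout (flow id, protocol) and uses a plain Python callable in place of the compiled filter expression. It is not the real csv_flow reader:

```python
import csv
import io
from itertools import islice


def read_flows_sketch(lines, counters=None, keep=None):
    """Hypothetical generator showing how the four counters could interact."""
    counters = counters or {}
    # skip_in/count_in act on the raw input, before any filtering.
    start_in = counters.get("skip_in", 0)
    count_in = counters.get("count_in")
    stop_in = None if count_in is None else start_in + count_in
    raw = islice(csv.reader(lines), start_in, stop_in)
    # The filter drops rows; here it is a plain callable, not a CodeType.
    kept = (row for row in raw if keep is None or keep(row))
    # skip_out/count_out act on the flows that survived the filter.
    start_out = counters.get("skip_out", 0)
    count_out = counters.get("count_out")
    stop_out = None if count_out is None else start_out + count_out
    yield from islice(kept, start_out, stop_out)


data = "1,6\n2,17\n3,6\n4,6\n5,17\n6,6\n"
counters = {"skip_in": 1, "count_in": 4, "skip_out": 1, "count_out": 1}
result = list(read_flows_sketch(io.StringIO(data), counters, keep=lambda row: row[1] == "6"))
ids = [row[0] for row in result]
```

In this run, skip_in=1 and count_in=4 select flows 2 to 5. The protocol filter then keeps the TCP flows 3 and 4, and skip_out=1 with count_out=1 leaves only flow 4.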

read_nfcapd(in_file, counters=None, filter_expr=None, fields=None)

Read and yield all flows in an nfdump nfcapd file.

This function calls the nfdump program to parse the nfcapd file.

Parameters:
  • in_file (os.PathLike) – nfdump nfcapd file to read

  • counters (dict[str, int], default) –

    skip_in (int, default 0) – number of flows to skip at the beginning of input

    count_in (int, default None, meaning all flows) – number of flows to read from input

    skip_out (int, default 0) – number of flows to skip after filtering

    count_out (int, default None, meaning all flows) – number of flows to output after filtering

  • filter_expr (CodeType, optional) – filter expression

  • fields (list[str], optional) – read only these fields; the others may be zeros

Returns:

af, prot, inif, outif, sa0, sa1, sa2, sa3, da0, da1, da2, da3, sp, dp, first, first_ms, last, last_ms, packets, octets, aggs

Return type:

(int, int, int, int, int, int, int, int, int, int, int, int, int, int, int, int, int, int, int, int, int)

read_pipe(in_file, counters=None, filter_expr=None, fields=None)

Read and yield all flows in an nfdump pipe file or stream.

This function calls the nfdump program to parse the nfdump file.

Parameters:
  • in_file (os.PathLike | io.TextIOWrapper) – nfdump pipe file or stream to read

  • counters (dict[str, int], default) –

    skip_in (int, default 0) – number of flows to skip at the beginning of input

    count_in (int, default None, meaning all flows) – number of flows to read from input

    skip_out (int, default 0) – number of flows to skip after filtering

    count_out (int, default None, meaning all flows) – number of flows to output after filtering

  • filter_expr (CodeType, optional) – filter expression

  • fields (list[str], optional) – read only these fields; the others may be zeros

Returns:

af, prot, inif, outif, sa0, sa1, sa2, sa3, da0, da1, da2, da3, sp, dp, first, first_ms, last, last_ms, packets, octets, aggs

Return type:

(int, int, int, int, int, int, int, int, int, int, int, int, int, int, int, int, int, int, int, int, int)

write_flow_binary(output_dir)

Write flow tuples to binary flow files.

Parameters:

output_dir (os.PathLike) – directory to write to

write_flow_csv(output)

Write flow tuples to the output.

Parameters:

output (os.PathLike | io.TextIOWrapper) – file to write or stream

write_line(output, header_line=None)

Write lines to the output.

Parameters:
  • output (os.PathLike | io.TextIOWrapper) – file to write or stream

  • header_line (str, optional) – header line to write at the beginning of the file

flow_models.lib.mix module

avg(data, x, x_val, what)

Calculate the average value of a feature based on distribution mixtures.

Parameters:
  • data (dict[dict[list[list]]]) – container with all distribution mixtures

  • x (numpy.array) – input vector

  • x_val (str) – flow feature being sampled

  • what (str) – average value to calculate

Returns:

vector of average values

Return type:

numpy.array

cdf(mix, x)

Return a CDF from a distribution mixture.

Parameters:
  • mix (dict | list[list]) – distribution mixture

  • x (numpy.array) – input vector

Returns:

CDF vector

Return type:

numpy.array
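A mixture's CDF is the weighted sum of its component CDFs, F(x) = Σ w_i F_i(x). The NumPy-only sketch below assumes a hypothetical mixture format of [weight, name, params] entries. It supports just two illustrative component types, uniform and exponential; the real mixtures may use other distributions and a different layout:

```python
import numpy as np

# Hypothetical component CDFs; the real mixtures may use other distributions.
COMPONENT_CDFS = {
    "expon": lambda x, scale: 1.0 - np.exp(-np.maximum(x, 0) / scale),
    "uniform": lambda x, lo, hi: np.clip((x - lo) / (hi - lo), 0.0, 1.0),
}


def mixture_cdf_sketch(mix, x):
    """Weighted sum of component CDFs: F(x) = sum_i w_i * F_i(x)."""
    # Normalize the weights in case they do not sum exactly to 1.
    total_weight = sum(weight for weight, _, _ in mix)
    result = np.zeros_like(x, dtype=float)
    for weight, name, params in mix:
        result += weight / total_weight * COMPONENT_CDFS[name](x, *params)
    return result


mix = [[0.25, "uniform", [0.0, 4.0]], [0.75, "expon", [2.0]]]
x = np.array([0.0, 2.0, 8.0])
F = mixture_cdf_sketch(mix, x)
```

If the weighted terms are kept separate rather than summed, the result is one curve per component, which is similar in spirit to what cdf_comp returns.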

cdf_comp(mix, x)

Return CDF components from a distribution mixture.

Parameters:
  • mix (dict | list[list]) – distribution mixture

  • x (numpy.array) – input vector

Returns:

dict of named CDF components

Return type:

dict[str, numpy.array]

pdf(mix, x, x_val)

Return a PDF from a distribution mixture.

Parameters:
  • mix (dict | list[list]) – distribution mixture

  • x (numpy.array) – input vector

  • x_val (str) – flow feature being sampled

Returns:

PDF vector

Return type:

numpy.array

pdf_comp(mix, x, x_val)

Return PDF components from a distribution mixture.

Parameters:
  • mix (dict | list[list]) – distribution mixture

  • x (numpy.array) – input vector

  • x_val (str) – flow feature being sampled

Returns:

dict of named PDF components

Return type:

dict[str, numpy.array]

rvs(mix, x_val, size=1, random_state=None)

Return a sample from a distribution mixture.

Parameters:
  • mix (dict | list[list]) – distribution mixture

  • x_val (str) – flow feature being sampled

  • size (int, default 1) – number of elements in sample

  • random_state (object, optional) – random state to use

Returns:

sample

Return type:

numpy.array
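Sampling from a mixture usually takes two stages. First, draw a component index for each sample according to the weights; then draw the value from that component. The sketch below uses NumPy's Generator API with the same hypothetical [weight, name, params] format. It treats random_state as a seed or `numpy.random.Generator`, which is an assumption; the real function may expect a different random state object:

```python
import numpy as np


def mixture_rvs_sketch(mix, size=1, random_state=None):
    """Two-stage sampling: pick a component by weight, then draw from it."""
    rng = np.random.default_rng(random_state)
    weights = np.array([w for w, _, _ in mix], dtype=float)
    # Stage 1: choose a component index for every sample.
    choice = rng.choice(len(mix), size=size, p=weights / weights.sum())
    sample = np.empty(size)
    for i, (_, name, params) in enumerate(mix):
        mask = choice == i
        n = int(mask.sum())
        # Stage 2: draw n values from the chosen (hypothetical) component type.
        if name == "expon":
            sample[mask] = rng.exponential(params[0], n)
        elif name == "uniform":
            sample[mask] = rng.uniform(params[0], params[1], n)
        else:
            raise ValueError(f"unsupported component: {name}")
    return sample


mix = [[0.25, "uniform", [0.0, 4.0]], [0.75, "expon", [2.0]]]
values = mixture_rvs_sketch(mix, size=10_000, random_state=42)
```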

flow_models.lib.ml module

calculate_reduction(octets, octets_predicted, thresholds=None)

Calculate flow reduction curve.

Parameters:
  • octets (numpy.array) – real flow sizes (number of octets/bytes)

  • octets_predicted (numpy.array) – predicted flow sizes (number of octets/bytes)

  • thresholds (int | numpy.array, optional) – flow size thresholds to calculate upon or their number, default [2..2^24]

Returns:

[threshold, traffic_coverage, flow_table_reduction] array containing flow table size reduction and traffic coverage obtained for each flow size threshold

Return type:

np.array[3]
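Here is a sketch of one way to build such a curve. At each threshold, flows whose predicted size reaches the threshold are treated as elephants that get a flow table entry. Traffic coverage is the share of real octets those flows carry. The flow table reduction is assumed here to be the factor all flows / tracked flows. That definition, the helper name and the sample numbers are assumptions, not the library's method:

```python
import numpy as np


def calculate_reduction_sketch(octets, octets_predicted, thresholds):
    """Hypothetical reduction curve: flows predicted >= threshold are tracked."""
    octets = np.asarray(octets, dtype=float)
    octets_predicted = np.asarray(octets_predicted)
    rows = []
    for t in thresholds:
        tracked = octets_predicted >= t
        kept = int(tracked.sum())
        # Share of real traffic carried by the tracked flows.
        coverage = octets[tracked].sum() / octets.sum()
        # Assumption: reduction is the factor all_flows / tracked_flows.
        reduction = len(octets) / kept if kept else np.inf
        rows.append((t, coverage, reduction))
    # Transpose to rows: [threshold, traffic_coverage, flow_table_reduction].
    return np.array(rows, dtype=float).T


threshold, coverage, reduction = calculate_reduction_sketch(
    [100, 200, 10_000, 50_000], [120, 150, 9_000, 60_000], [2**7, 2**13]
)
```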

calculate_reduction_from_mixture(path)

Calculate flow reduction curve from distribution mixture JSON.

Parameters:

path (os.PathLike) – path to a directory with JSON mixture

Returns:

[threshold, flow_table_reduction, traffic_coverage] array containing flow table size reduction and traffic coverage obtained for each flow size threshold

Return type:

np.array[3]

interp_reduction(x, traffic_coverage, flow_table_reduction)

Interpolate flow reduction curve for given traffic coverages.

Parameters:
  • x (numpy.array) – traffic coverage point to interpolate upon

  • traffic_coverage (numpy.array) – traffic coverages corresponding to the input flow table size reductions

  • flow_table_reduction (numpy.array) – input flow table size reductions

Returns:

x, flow_table_reduction_for_x

Return type:

(numpy.array, numpy.array)
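The sketch below assumes linear interpolation with `numpy.interp`. That function needs increasing sample points, and traffic coverage usually falls as the threshold rises, so both arrays are sorted by coverage first. The helper name and sample values are hypothetical:

```python
import numpy as np


def interp_reduction_sketch(x, traffic_coverage, flow_table_reduction):
    """Linear interpolation of flow table reduction as a function of coverage."""
    # np.interp requires increasing xp, so order both arrays by coverage.
    order = np.argsort(traffic_coverage)
    y = np.interp(x, traffic_coverage[order], flow_table_reduction[order])
    return x, y


coverage = np.array([0.99, 0.9, 0.7])
reduction = np.array([1.5, 4.0, 20.0])
x, y = interp_reduction_sketch(np.array([0.8]), coverage, reduction)
```

A coverage of 0.8 lies halfway between 0.7 and 0.9, so the interpolated reduction lies halfway between 20 and 4, giving 12.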

load_arrays(directory)

Load 5-tuple and flow sizes arrays from a directory.

Parameters:

directory (os.PathLike) – directory containing binary flow records

Returns:

sa, da, sp, dp, prot, oc

Return type:

(np.array, np.array, np.array, np.array, np.array, np.array)

make_slice(data, skip=0, count=None)

Make slice of 5-tuple and flow size arrays.

Parameters:
  • data ((np.array, np.array, np.array, np.array, np.array, np.array)) – input data: (sa, da, sp, dp, prot, oc)

  • skip (int, default 0) – number of flows to skip at the beginning

  • count (int, optional) – number of flows to use after skipping

Returns:

sa, da, sp, dp, prot, oc

Return type:

(np.array, np.array, np.array, np.array, np.array, np.array)
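All six arrays must be sliced with the same window so that the 5-tuple fields and flow sizes stay aligned. The minimal sketch below uses plain NumPy basic slicing. The helper name and the stand-in data are hypothetical:

```python
import numpy as np


def make_slice_sketch(data, skip=0, count=None):
    """Apply the same [skip, skip + count) window to every array in the tuple."""
    stop = None if count is None else skip + count
    return tuple(arr[skip:stop] for arr in data)


# Stand-ins for (sa, da, sp, dp, prot, oc): array k holds 100*k + 0..9.
data = tuple(np.arange(10) + k * 100 for k in range(6))
sa, da, sp, dp, prot, oc = make_slice_sketch(data, skip=2, count=3)
```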

prepare_decision(oc, coverage)

Simulate mice/elephant decision to obtain a desired traffic coverage.

Parameters:
  • oc (numpy.array) – flow sizes (number of octets/bytes)

  • coverage (float) – desired traffic coverage

Returns:

classification decision (0 for mice, 1 for elephants)

Return type:

np.array[bool]
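One way to simulate a perfect mice/elephant decision is to sort flows by their real size. The largest flows are then marked as elephants until they carry the requested share of traffic. The oracle sketch below assumes coverage is a fraction in [0, 1]. The helper name and the tie handling are assumptions:

```python
import numpy as np


def prepare_decision_sketch(oc, coverage):
    """Mark the largest flows as elephants until they carry `coverage` of traffic."""
    oc = np.asarray(oc)
    order = np.argsort(oc)[::-1]  # largest flows first
    cumulative = np.cumsum(oc[order]) / oc.sum()
    # Smallest number of top flows whose cumulative share reaches coverage.
    n = int(np.searchsorted(cumulative, coverage)) + 1
    decision = np.zeros(len(oc), dtype=bool)
    decision[order[:n]] = True
    return decision


oc = np.array([10, 500, 40, 300, 150])
decision = prepare_decision_sketch(oc, 0.8)
```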

prepare_input(data, octets=False, bits=False)

Prepare input features array and target (flow sizes) array.

Parameters:
  • data ((np.array, np.array, np.array, np.array, np.array)) – input data: (sa, da, sp, dp, prot)

  • octets (bool, default False) – split IP addresses to separate 1-byte octets

  • bits (bool, default False) – split all input fields to separate bits

Returns:

input features array

Return type:

np.array[5]
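The octets and bits options amount to splitting integer fields into columns. The sketch below splits IPv4 addresses stored as uint32 into four 1-byte columns, and splits any unsigned field into bit columns. It assumes most-significant-first order. Both helper names are hypothetical:

```python
import numpy as np


def split_octets(addr):
    """Split uint32 IPv4 addresses into four 1-byte columns, most significant first."""
    addr = np.asarray(addr, dtype=np.uint32)
    shifts = np.array([24, 16, 8, 0], dtype=np.uint32)
    return ((addr[:, None] >> shifts) & 0xFF).astype(np.uint8)


def split_bits(values, width):
    """Split unsigned integers into `width` bit columns, most significant first."""
    values = np.asarray(values, dtype=np.uint64)
    shifts = np.arange(width - 1, -1, -1, dtype=np.uint64)
    return ((values[:, None] >> shifts) & 1).astype(np.uint8)


sa = np.array([0xC0A80001])  # 192.168.0.1
octets = split_octets(sa)
port_bits = split_bits(np.array([443]), 16)
```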

score_reduction(octets, octets_predicted)

Calculate average flow table size reduction for 80% traffic coverage.

Parameters:
  • octets (numpy.array) – real flow sizes (number of octets/bytes)

  • octets_predicted (numpy.array) – predicted flow sizes (number of octets/bytes)

Returns:

average flow table size reduction for 80% traffic coverage

Return type:

float

top_idx(octets, ratio, seed=None)

Get indices of the largest flows.

Parameters:
  • octets (numpy.array) – flow sizes (number of octets/bytes)

  • ratio (float) – percentage of largest flows

  • seed (int, default None) – seed for random generator

Returns:

indices of the largest flows

Return type:

np.array[int]
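The sketch below selects the largest flows. It assumes that ratio is a fraction in [0, 1] and that the seed is used to break ties between equal-sized flows at random. Both are assumptions, as is the helper name:

```python
import numpy as np


def top_idx_sketch(octets, ratio, seed=None):
    """Indices of the `ratio` fraction of largest flows; ties broken randomly."""
    octets = np.asarray(octets)
    n = int(round(len(octets) * ratio))
    rng = np.random.default_rng(seed)
    # Shuffle first so equal-sized flows end up in random order.
    perm = rng.permutation(len(octets))
    # Stable ascending sort of the shuffled sizes, reversed to descending.
    order = perm[np.argsort(octets[perm], kind="stable")[::-1]]
    return order[:n]


octets = np.array([5, 900, 20, 900, 300])
idx = top_idx_sketch(octets, 0.4, seed=0)
```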

flow_models.lib.util module

bin_calc_log(x, b)

Calculate the bounds of a logarithmic bin.

Parameters:
  • x (int) – value

  • b (int) – bin width exponent of 2

Returns:

bin_lo, bin_hi

Return type:

(int, int)
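The exact binning rule is not documented here. The following is one common logarithmic scheme, shown purely as an assumption. It splits every power-of-two range [2^k, 2^(k+1)) into 2^b equal bins, and falls back to width-1 bins where those bins would otherwise be narrower than 1. The helper name is hypothetical:

```python
def bin_calc_log_sketch(x: int, b: int) -> tuple[int, int]:
    """Hypothetical log binning: 2**b bins per power-of-two range."""
    k = x.bit_length() - 1          # x lies in [2**k, 2**(k + 1))
    shift = max(k - b, 0)           # bin width is 2**shift, at least 1
    width = 1 << shift
    bin_lo = (x >> shift) << shift  # round x down to a multiple of the width
    return bin_lo, bin_lo + width


# [512, 1024) is split into four bins of 128 when b = 2.
print(bin_calc_log_sketch(1000, 2))
```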

bin_calc_one(x, _)

Calculate the bounds of a bin of width 1.

Parameters:
  • x (int) – value

  • _ (int) – not used

Returns:

bin_lo, bin_hi

Return type:

(int, int)

measure_memory(on=False)

Measure and print minimum, average and maximum memory usage.

Parameters:

on (bool, default False) – run measurement