flow_models.lib package

flow_models.lib.data module

avg_data(data, idx, what)

Calculate average data.

Parameters:

data (dict[numpy.array]) – container with PDF data
idx (numpy.array) – input vector
what (str) – flow feature from container

Returns:

avg_points, avg_line

Return type:

(numpy.array, numpy.array)

calc_minmax(idx, *rest)

Calculate minimum and maximum values from many arrays.

Parameters:

idx (numpy.array | pandas.Series | pandas.DataFrame) – container
*rest (list[numpy.array | pandas.Series | pandas.DataFrame]) – rest of containers

Returns:

xmin, xmax, ymin, ymax

Return type:

(float, float, float, float)
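The return names suggest that idx supplies the x range and the remaining containers supply the y range. A minimal sketch under that assumption, using plain Python sequences instead of numpy or pandas objects (minmax_sketch is a hypothetical name, not part of the package):

```python
def minmax_sketch(idx, *rest):
    """Return (xmin, xmax, ymin, ymax) for an x vector and one or more y vectors.

    Assumption: x limits come from idx, y limits from all values in rest.
    Raises ValueError if no y vectors are given.
    """
    xmin, xmax = min(idx), max(idx)
    # Flatten all y vectors into one list before taking the extremes.
    values = [value for series in rest for value in series]
    ymin, ymax = min(values), max(values)
    return xmin, xmax, ymin, ymax


limits = minmax_sketch([1, 2, 3], [0.5, 0.1], [0.9, 0.2])
```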

load_data(objects)

Load mixtures or histograms into a structured dictionary.

Parameters:

objects (list[os.PathLike]) – list of paths to load

Returns:

container with loaded data

Return type:

dict

normalize_data(org_data, bin_exp=None)

Normalize exponentially binned data so that it can be presented on a plot.

Parameters:

org_data (pandas.DataFrame) – histogram
bin_exp (int, optional) – bin width exponent of 2

Returns:

normalized histogram

Return type:

pandas.DataFrame

pdf_from_cdf(data, idx, what)

Calculate a PDF by interpolating CDF.

Parameters:

data (dict[numpy.array]) – container with PDF data
idx (numpy.array) – input vector
what (str) – flow feature from container

Returns:

PDF vector

Return type:

numpy.array
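One way to obtain a PDF from a tabulated CDF is to interpolate the CDF linearly and take the difference between neighbouring points. The sketch below assumes integer-valued features (so the result is the probability mass CDF(x) - CDF(x - 1)) and plain Python lists; the function names are hypothetical and the package may use a different interpolation scheme.

```python
from bisect import bisect_right


def interp_linear(x, xs, ys):
    """Piecewise-linear interpolation of (xs, ys) at x, clamped at both ends."""
    if x <= xs[0]:
        return ys[0]
    if x >= xs[-1]:
        return ys[-1]
    i = bisect_right(xs, x)  # xs[i - 1] <= x < xs[i]
    x0, x1 = xs[i - 1], xs[i]
    y0, y1 = ys[i - 1], ys[i]
    return y0 + (y1 - y0) * (x - x0) / (x1 - x0)


def pdf_from_cdf_sketch(cdf_x, cdf_y, idx):
    """Approximate the PDF at each point of idx as CDF(x) - CDF(x - 1)."""
    return [
        interp_linear(x, cdf_x, cdf_y) - interp_linear(x - 1, cdf_x, cdf_y)
        for x in idx
    ]


# A CDF rising linearly from 0 to 1 over [0, 10] gives a flat PDF.
pdf = pdf_from_cdf_sketch([0, 10], [0.0, 1.0], [1, 5, 10])
```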

flow_models.lib.io module

class Formatter(prog, indent_increment=2, max_help_position=24, width=None)

Bases: RawDescriptionHelpFormatter, ArgumentDefaultsHelpFormatter

class IOArgumentParser(**kwargs)

Bases: ArgumentParser

find_array_path(path)

Find an exact numpy array path and type.

Parameters:

path (os.PathLike) – array file path

Returns:

name, dtype, path

Return type:

(str, str, pathlib.Path)

load_array_mv(path, mode='r')

Load a numpy array as memoryview.

Parameters:

path (os.PathLike) – array file path
mode (str, default 'r') – file open mode

load_array_np(path, mode='r')

Load a numpy array as numpy.memmap.

Parameters:

path (os.PathLike) – array file path
mode (str, default 'r') – file open mode

load_arrays(path, fields, counters, filter_expr, require_numpy=False)

Load all binary flow arrays from a directory.

Parameters:

path (os.PathLike) – directory path
fields (list[str]) – fields to load
counters (dict[str, int]) – flow counters with the following keys:

    skip_in (int, default 0) – number of flows to skip at the beginning of input
    count_in (int, default None, meaning all flows) – number of flows to read from input
    skip_out (int, default 0) – not supported
    count_out (int, default None, meaning all flows) – not supported

filter_expr (CodeType, optional) – filter expression
require_numpy (bool, default False) – require arrays to be loaded as numpy arrays

Returns:

arrays, filtered, size

Return type:

(dict[str, memoryview | numpy.array], numpy.array, int)

prepare_file_list(file_paths)

Prepare a file list from a list of files or a directory.

Parameters:

file_paths (list[str | os.PathLike | io.IOBase]) – list of file paths

Returns:

prepared file list

Return type:

list[pathlib.Path | io.IOBase]

read_flow_binary(in_dir, counters=None, filter_expr=None, fields=None)

Read and yield all flows in a directory containing array files.

Parameters:

in_dir (os.PathLike) – directory to read from
counters (dict[str, int], optional) – flow counters with the following keys:

    skip_in (int, default 0) – number of flows to skip at the beginning of input
    count_in (int, default None, meaning all flows) – number of flows to read from input
    skip_out (int, default 0) – number of flows to skip after filtering
    count_out (int, default None, meaning all flows) – number of flows to output after filtering

filter_expr (CodeType, optional) – filter expression
fields (list[str], optional) – read only these fields; the others may be zeros

Returns:

af, prot, inif, outif, sa0, sa1, sa2, sa3, da0, da1, da2, da3, sp, dp, first, first_ms, last, last_ms, packets, octets, aggs

Return type:

(int, int, int, int, int, int, int, int, int, int, int, int, int, int, int, int, int, int, int, int, int)
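The counter descriptions imply an order of operations: skip_in and count_in select a window of the raw input, the filter is applied to that window, and skip_out and count_out then select from the filtered flows. A minimal sketch of that pipeline, assuming the filter is a plain callable (the package itself takes a compiled expression) and using the hypothetical name apply_counters:

```python
from itertools import islice


def apply_counters(flows, counters=None, keep=None):
    """Yield flows after input windowing, filtering and output windowing.

    Assumption: `keep` is a hypothetical predicate standing in for filter_expr.
    """
    counters = counters or {}
    skip_in = counters.get('skip_in', 0)
    count_in = counters.get('count_in')      # None means all flows
    skip_out = counters.get('skip_out', 0)
    count_out = counters.get('count_out')    # None means all flows

    # 1. Window over the raw input.
    stop_in = None if count_in is None else skip_in + count_in
    selected = islice(flows, skip_in, stop_in)
    # 2. Filter the windowed input.
    if keep is not None:
        selected = filter(keep, selected)
    # 3. Window over the filtered output.
    stop_out = None if count_out is None else skip_out + count_out
    yield from islice(selected, skip_out, stop_out)


counters = {'skip_in': 2, 'count_in': 6, 'skip_out': 1, 'count_out': 1}
result = list(apply_counters(range(10), counters, keep=lambda n: n % 2 == 0))
```

Here the input window is 2..7, the even values in it are 2, 4 and 6, and skipping one of them and taking one leaves a single flow.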

read_flow_csv(in_file, counters=None, filter_expr=None, fields=None)

Read and yield all flows in a csv_flow file/stream.

Parameters:

in_file (os.PathLike | io.TextIOWrapper) – csv_flow file or stream to read
counters (dict[str, int], optional) – flow counters with the following keys:

    skip_in (int, default 0) – number of flows to skip at the beginning of input
    count_in (int, default None, meaning all flows) – number of flows to read from input
    skip_out (int, default 0) – number of flows to skip after filtering
    count_out (int, default None, meaning all flows) – number of flows to output after filtering

filter_expr (CodeType, optional) – filter expression
fields (list[str], optional) – read only these fields; the others may be zeros

Returns:

af, prot, inif, outif, sa0, sa1, sa2, sa3, da0, da1, da2, da3, sp, dp, first, first_ms, last, last_ms, packets, octets, aggs

Return type:

(int, int, int, int, int, int, int, int, int, int, int, int, int, int, int, int, int, int, int, int, int)

read_nfcapd(in_file, counters=None, filter_expr=None, fields=None)

Read and yield all flows in an nfdump nfcapd file.

This function calls the nfdump program to parse the nfcapd file.

Parameters:

in_file (os.PathLike) – nfdump nfcapd file to read
counters (dict[str, int], optional) – flow counters with the following keys:

    skip_in (int, default 0) – number of flows to skip at the beginning of input
    count_in (int, default None, meaning all flows) – number of flows to read from input
    skip_out (int, default 0) – number of flows to skip after filtering
    count_out (int, default None, meaning all flows) – number of flows to output after filtering

filter_expr (CodeType, optional) – filter expression
fields (list[str], optional) – read only these fields; the others may be zeros

Returns:

af, prot, inif, outif, sa0, sa1, sa2, sa3, da0, da1, da2, da3, sp, dp, first, first_ms, last, last_ms, packets, octets, aggs

Return type:

(int, int, int, int, int, int, int, int, int, int, int, int, int, int, int, int, int, int, int, int, int)

read_pipe(in_file, counters=None, filter_expr=None, fields=None)

Read and yield all flows in an nfdump pipe file/stream.

This function calls the nfdump program to parse the nfdump file.

Parameters:

in_file (os.PathLike | io.TextIOWrapper) – nfdump pipe file or stream to read
counters (dict[str, int], optional) – flow counters with the following keys:

    skip_in (int, default 0) – number of flows to skip at the beginning of input
    count_in (int, default None, meaning all flows) – number of flows to read from input
    skip_out (int, default 0) – number of flows to skip after filtering
    count_out (int, default None, meaning all flows) – number of flows to output after filtering

filter_expr (CodeType, optional) – filter expression
fields (list[str], optional) – read only these fields; the others may be zeros

Returns:

af, prot, inif, outif, sa0, sa1, sa2, sa3, da0, da1, da2, da3, sp, dp, first, first_ms, last, last_ms, packets, octets, aggs

Return type:

(int, int, int, int, int, int, int, int, int, int, int, int, int, int, int, int, int, int, int, int, int)

write_flow_binary(output_dir)

Write flow tuples to binary flow files.

Parameters:

output_dir (os.PathLike) – directory to write to

write_flow_csv(output)

Write flow tuples to the output.

Parameters:

output (os.PathLike | io.TextIOWrapper) – file to write or stream

write_line(output, header_line=None)

Write lines to the output.

Parameters:

output (os.PathLike | io.TextIOWrapper) – file to write or stream
header_line (str, optional) – header line to write at the beginning of the file

flow_models.lib.mix module

avg(data, x, x_val, what)

Calculate the average value of a feature based on distribution mixtures.

Parameters:

data (dict[dict[list[list]]]) – container with all distribution mixtures
x (numpy.array) – input vector
x_val (str) – flow feature being sampled
what (str) – average value to calculate

Returns:

vector of average values

Return type:

numpy.array

cdf(mix, x)

Return a CDF from a distribution mixture.

Parameters:

mix (dict | list[list]) – distribution mixture
x (numpy.array) – input vector

Returns:

CDF vector

Return type:

numpy.array
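The CDF of a mixture is the weighted sum of its component CDFs. The sketch below illustrates this with the standard library only. It assumes each component is a [weight, name, (loc, scale)] entry with scipy-style names 'uniform' and 'norm'; the actual mixture layout used by the package is not specified in this section, and mixture_cdf is a hypothetical name.

```python
import math


def component_cdf(name, params, x):
    """CDF of one component; only 'uniform' and 'norm' are handled here."""
    loc, scale = params
    if name == 'uniform':
        # Uniform on [loc, loc + scale], clamped to [0, 1].
        return min(1.0, max(0.0, (x - loc) / scale))
    if name == 'norm':
        return 0.5 * (1.0 + math.erf((x - loc) / (scale * math.sqrt(2.0))))
    raise ValueError(f'unsupported distribution: {name}')


def mixture_cdf(mix, xs):
    """Weighted sum of component CDFs evaluated at each point of xs."""
    return [
        sum(weight * component_cdf(name, params, x) for weight, name, params in mix)
        for x in xs
    ]


# Assumed layout: [weight, distribution name, (loc, scale)]
mix = [[0.5, 'uniform', (0, 10)], [0.5, 'norm', (5, 1)]]
values = mixture_cdf(mix, [-100, 5, 100])
```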

cdf_comp(mix, x)

Return CDF components from a distribution mixture.

Parameters:

mix (dict | list[list]) – distribution mixture
x (numpy.array) – input vector

Returns:

dict of named CDF components

Return type:

dict[str, numpy.array]

pdf(mix, x, x_val)

Return a PDF from a distribution mixture.

Parameters:

mix (dict | list[list]) – distribution mixture
x (numpy.array) – input vector
x_val (str) – flow feature being sampled

Returns:

PDF vector

Return type:

numpy.array

pdf_comp(mix, x, x_val)

Return PDF components from a distribution mixture.

Parameters:

mix (dict | list[list]) – distribution mixture
x (numpy.array) – input vector
x_val (str) – flow feature being sampled

Returns:

dict of named PDF components

Return type:

dict[str, numpy.array]

rvs(mix, x_val, size=1, random_state=None)

Return a sample from a distribution mixture.

Parameters:

mix (dict | list[list]) – distribution mixture
x_val (str) – flow feature being sampled
size (int, default 1) – number of elements in sample
random_state (object, optional) – random state to use

Returns:

sample

Return type:

numpy.array
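Sampling from a mixture can be done in two steps: pick a component with probability equal to its weight, then draw from that component. A minimal sketch, assuming components of the form [weight, name, (loc, scale)] with 'uniform' and 'norm' distributions and an integer seed in place of random_state (all of these are illustrative assumptions, and mixture_sample is a hypothetical name):

```python
import random


def mixture_sample(mix, size=1, seed=None):
    """Draw `size` values from a mixture of 'uniform' and 'norm' components."""
    rng = random.Random(seed)
    weights = [component[0] for component in mix]
    sample = []
    for _ in range(size):
        # Step 1: choose a component according to the mixture weights.
        _, name, (loc, scale) = rng.choices(mix, weights=weights)[0]
        # Step 2: draw one value from the chosen component.
        if name == 'uniform':
            sample.append(rng.uniform(loc, loc + scale))
        elif name == 'norm':
            sample.append(rng.gauss(loc, scale))
        else:
            raise ValueError(f'unsupported distribution: {name}')
    return sample


mix = [[0.3, 'uniform', (0, 1)], [0.7, 'uniform', (10, 1)]]
sample = mixture_sample(mix, size=100, seed=1)
```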

flow_models.lib.ml module

calculate_reduction(octets, octets_predicted, thresholds=None)

Calculate flow reduction curve.

Parameters:

octets (numpy.array) – real flow sizes (number of octets/bytes)
octets_predicted (numpy.array) – predicted flow sizes (number of octets/bytes)
thresholds (int | numpy.array, optional) – flow size thresholds to calculate upon or their number, default [2..2^24]

Returns:

[threshold, traffic_coverage, flow_table_reduction] array containing flow table size reduction and traffic coverage obtained for each flow size threshold

Return type:

np.array[3]
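The following sketch shows one plausible reading of the reduction curve. It assumes that flows whose predicted size reaches the threshold receive flow table entries, that traffic coverage is the share of real octets carried by those flows, and that flow table reduction is the ratio of all flows to flows with entries. These definitions and the name reduction_curve are assumptions made for illustration, not confirmed by this section.

```python
def reduction_curve(octets, octets_predicted, thresholds):
    """Return (threshold, traffic_coverage, flow_table_reduction) rows."""
    total_flows = len(octets)
    total_octets = sum(octets)
    rows = []
    for threshold in thresholds:
        # Real sizes of the flows predicted to be at least `threshold` octets.
        selected = [
            real for real, predicted in zip(octets, octets_predicted)
            if predicted >= threshold
        ]
        coverage = sum(selected) / total_octets
        # With no selected flows the table is empty, so the reduction is unbounded.
        reduction = total_flows / len(selected) if selected else float('inf')
        rows.append((threshold, coverage, reduction))
    return rows


curve = reduction_curve([100, 10, 1, 1], [80, 20, 1, 1], thresholds=[1, 16, 64])
```

Raising the threshold shrinks the flow table (higher reduction) at the cost of covering less traffic.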

calculate_reduction_from_mixture(path)

Calculate flow reduction curve from distribution mixture JSON.

Parameters:

path (os.PathLike) – path to a directory with JSON mixture

Returns:

[threshold, flow_table_reduction, traffic_coverage] array containing flow table size reduction and traffic coverage obtained for each flow size threshold

Return type:

np.array[3]

interp_reduction(x, traffic_coverage, flow_table_reduction)

Interpolate flow reduction curve for given traffic coverages.

Parameters:

x (numpy.array) – traffic coverage points at which to interpolate
traffic_coverage (numpy.array) – traffic coverages corresponding to the input flow table size reductions
flow_table_reduction (numpy.array) – input flow table size reductions

Returns:

x, flow_table_reduction_for_x

Return type:

(numpy.array, numpy.array)

load_arrays(directory)

Load 5-tuple and flow sizes arrays from a directory.

Parameters:

directory (os.PathLike) – directory containing binary flow records

Returns:

sa, da, sp, dp, prot, oc

Return type:

(np.array, np.array, np.array, np.array, np.array, np.array)

make_slice(data, skip=0, count=None)

Make slice of 5-tuple and flow size arrays.

Parameters:

data ((np.array, np.array, np.array, np.array, np.array, np.array)) – input data: (sa, da, sp, dp, prot, oc)
skip (int, default 0) – number of flows to skip at the beginning
count (int, optional) – number of flows to use after skipping

Returns:

sa, da, sp, dp, prot, oc

Return type:

(np.array, np.array, np.array, np.array, np.array, np.array)
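The skip and count parameters describe the same window applied to every array in the tuple. A minimal sketch with plain lists in place of numpy arrays (make_slice_sketch is a hypothetical name):

```python
def make_slice_sketch(data, skip=0, count=None):
    """Apply the same [skip, skip + count) window to every column in data."""
    stop = None if count is None else skip + count
    return tuple(column[skip:stop] for column in data)


# Two columns stand in for the six arrays (sa, da, sp, dp, prot, oc).
sliced = make_slice_sketch(([1, 2, 3, 4], [10, 20, 30, 40]), skip=1, count=2)
```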

prepare_decision(oc, coverage)

Simulate mice/elephant decision to obtain a desired traffic coverage.

Parameters:

oc (numpy.array) – flow sizes (number of octets/bytes)
coverage (float) – desired traffic coverage

Returns:

classification decision (0 for mice, 1 for elephants)

Return type:

np.array[bool]
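One way to simulate such a decision is to mark the largest flows as elephants until their combined size reaches the desired share of total traffic. The sketch below follows that greedy approach with plain lists; it is an illustration under that assumption, and the package may select flows differently (decide_elephants is a hypothetical name).

```python
def decide_elephants(oc, coverage):
    """Return a list of booleans: True for flows marked as elephants."""
    target = coverage * sum(oc)
    # Visit flow indices from the largest flow to the smallest.
    order = sorted(range(len(oc)), key=lambda i: oc[i], reverse=True)
    decision = [False] * len(oc)
    covered = 0
    for i in order:
        if covered >= target:
            break
        decision[i] = True
        covered += oc[i]
    return decision


decision = decide_elephants([1, 100, 10, 1], coverage=0.8)
```

In this example the single 100-octet flow already carries more than 80% of the 112 octets, so it is the only elephant.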

prepare_input(data, octets=False, bits=False)

Prepare input features array and target (flow sizes) array.

Parameters:

data ((np.array, np.array, np.array, np.array, np.array)) – input data: (sa, da, sp, dp, prot)
octets (bool, default False) – split IP addresses to separate 1-byte octets
bits (bool, default False) – split all input fields to separate bits

Returns:

input features array

Return type:

np.array[5]
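The octets and bits options describe two ways of expanding integer fields into more features. A minimal sketch of both splits for a single value, assuming IPv4 addresses stored as 32-bit integers and most-significant-first ordering (the helper names and the ordering are assumptions):

```python
def split_octets(address):
    """Split a 32-bit IPv4 address into four 1-byte octets, most significant first."""
    return [(address >> shift) & 0xFF for shift in (24, 16, 8, 0)]


def split_bits(value, width):
    """Split an integer into `width` separate bits, most significant first."""
    return [(value >> shift) & 1 for shift in range(width - 1, -1, -1)]


octets = split_octets(0xC0A80001)   # 192.168.0.1
bits = split_bits(6, 8)             # protocol number 6 (TCP) as 8 bits
```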

score_reduction(octets, octets_predicted)

Calculate average flow table size reduction for 80% traffic coverage.

Parameters:

octets (numpy.array) – real flow sizes (number of octets/bytes)
octets_predicted (numpy.array) – predicted flow sizes (number of octets/bytes)

Returns:

average flow table size reduction for 80% traffic coverage

Return type:

float

top_idx(octets, ratio, seed=None)

Get indices of the largest flows.

Parameters:

octets (numpy.array) – flow sizes (number of octets/bytes)
ratio (float) – percentage of largest flows
seed (int, default None) – seed for random generator

Returns:

indices of the largest flows

Return type:

np.array[int]

flow_models.lib.util module

bin_calc_log(x, b)

Calculate logarithmic bin size.

Parameters:

x (int) – value
b (int) – bin width exponent of 2

Returns:

bin_lo, bin_hi

Return type:

(int, int)
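As an illustration of logarithmic binning, the sketch below uses unit-width bins for values below 2^b and splits each higher power-of-two range into 2^b equal bins. This scheme and the name log_bin are assumptions chosen for the example; the package's actual bin boundaries may differ.

```python
def log_bin(x, b):
    """Return (bin_lo, bin_hi) for a non-negative integer x.

    Assumed scheme: width 1 below 2**b, then 2**b bins per power-of-two range.
    """
    if x < 2 ** b:
        return x, x + 1
    # x lies in [2**k, 2**(k + 1)) with k = x.bit_length() - 1.
    width = 2 ** (x.bit_length() - 1 - b)
    bin_lo = (x // width) * width
    return bin_lo, bin_lo + width


examples = [log_bin(x, 2) for x in (3, 4, 9, 15, 16)]
```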

bin_calc_one(x, _)

Calculate a bin of width 1.

Parameters:

x (int) – value
_ (int) – not used

Returns:

bin_lo, bin_hi

Return type:

(int, int)

measure_memory(on=False)

Measure and print minimum, average and maximum memory usage.

Parameters:

on (default False) – run measurement