flow_models.lib package
flow_models.lib.data module
- avg_data(data, idx, what)
Calculate average data.
- calc_minmax(idx, *rest)
Calculate minimum and maximum values from many arrays.
- Parameters:
idx (numpy.array | pandas.Series | pandas.DataFrame) – container
*rest (list[numpy.array | pandas.Series | pandas.DataFrame]) – rest of containers
- Returns:
xmin, xmax, ymin, ymax
- Return type:
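The idea behind calc_minmax can be illustrated with a minimal sketch. This is not the library's implementation: it assumes the x range is taken from `idx` and the y range spans the values of all remaining containers, and the helper name `minmax_sketch` is hypothetical.

```python
import numpy as np


def minmax_sketch(idx, *rest):
    """Hypothetical helper: x range from idx, y range over all other arrays."""
    x = np.asarray(idx, dtype=float)
    xmin, xmax = np.nanmin(x), np.nanmax(x)
    ys = [np.asarray(r, dtype=float) for r in rest]
    # Assumption: the y range covers every remaining container
    ymin = min(np.nanmin(y) for y in ys)
    ymax = max(np.nanmax(y) for y in ys)
    return xmin, xmax, ymin, ymax


bounds = minmax_sketch(
    np.array([1, 2, 4, 8]),
    np.array([0.5, 0.2, 0.1, 0.05]),
    np.array([0.4, 0.3, 0.2, 0.01]),
)
```

Such bounds are typically passed to a plotting library to fix common axis limits across several series.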
- load_data(objects)
Load mixtures or histograms into a structured dictionary.
- Parameters:
objects (list[os.PathLike]) – list of paths to load
- Returns:
container with loaded data
- Return type:
- normalize_data(org_data, bin_exp=None)
Normalize exponentially binned data for presentation on a plot.
- Parameters:
org_data (pandas.DataFrame) – histogram
bin_exp (int, optional) – bin width exponent of 2
- Returns:
normalized histogram
- Return type:
flow_models.lib.io module
- class Formatter(prog, indent_increment=2, max_help_position=24, width=None)
Bases:
RawDescriptionHelpFormatter, ArgumentDefaultsHelpFormatter
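Combining these two argparse base classes is a common standard-library pattern: the resulting formatter keeps line breaks in the description and appends default values to argument help. The sketch below shows the effect; the class name `FormatterSketch`, the `demo` program name and the `--skip-in` option are illustrative only and are not taken from flow_models.

```python
import argparse


class FormatterSketch(argparse.RawDescriptionHelpFormatter,
                      argparse.ArgumentDefaultsHelpFormatter):
    """Keeps description line breaks and appends defaults to help strings."""


parser = argparse.ArgumentParser(
    prog="demo",  # illustrative program name
    description="First line.\nSecond line.",
    formatter_class=FormatterSketch,
)
# Illustrative option; the real parser's options are not documented here
parser.add_argument("--skip-in", type=int, default=0, help="flows to skip")
help_text = parser.format_help()
```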
- class IOArgumentParser(**kwargs)
Bases:
ArgumentParser
- find_array_path(path)
Find an exact numpy array path and type.
- Parameters:
path (os.PathLike) – array file path
- Returns:
name, dtype, path
- Return type:
(str, str, pathlib.Path)
- load_array_mv(path, mode='r')
Load a numpy array as a memoryview.
- Parameters:
path (os.PathLike) – array file path
mode (str, default 'r') – file open mode
- load_array_np(path, mode='r')
Load a numpy array as a numpy.memmap.
- Parameters:
path (os.PathLike) – array file path
mode (str, default 'r') – file open mode
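The difference between the two loading styles can be sketched with numpy and the standard library alone. This does not call flow_models; the file name `octets.u64` and the unsigned 64-bit element type are assumptions made for the example.

```python
import mmap
import pathlib
import tempfile

import numpy as np

with tempfile.TemporaryDirectory() as tmp:
    path = pathlib.Path(tmp) / "octets.u64"  # hypothetical file name
    np.arange(5, dtype=np.uint64).tofile(path)

    # numpy.memmap: a lazily loaded array that supports vectorized operations
    arr = np.memmap(path, dtype=np.uint64, mode="r")
    total = int(arr.sum())
    del arr  # release the mapping before the directory is removed

    # memoryview over an mmap: plain indexing, no numpy needed to read
    with open(path, "rb") as f:
        mm = mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ)
        # 'Q' is the struct format code for unsigned 64-bit integers
        with memoryview(mm) as base, base.cast("Q") as mv:
            first_three = list(mv[:3])
        mm.close()
```

Both approaches map the file into memory instead of reading it eagerly, so only the pages that are actually accessed are loaded.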
- load_arrays(path, fields, counters, filter_expr, require_numpy=False)
Load all binary flow arrays from a directory.
- Parameters:
path (os.PathLike) – directory path
counters (dict[str, int], default) –
- skip_in (int, default 0)
number of flows to skip at the beginning of input
- count_in (int, default None, meaning all flows)
number of flows to read from input
- skip_out (int, default 0)
not supported
- count_out (int, default None, meaning all flows)
not supported
filter_expr (CodeType, optional) – filter expression
require_numpy (bool, default False) – require arrays to be loaded as numpy arrays
- Returns:
arrays, filtered, size
- Return type:
(dict[str, memoryview | numpy.array], numpy.array, int)
- prepare_file_list(file_paths)
Prepare a file list from a list of files or a directory.
- Parameters:
file_paths (list[str | os.PathLike | io.IOBase]) – list of file paths
- Returns:
prepared file list
- Return type:
- read_flow_binary(in_dir, counters=None, filter_expr=None, fields=None)
Read and yield all flows in a directory containing array files.
- Parameters:
in_dir (os.PathLike) – directory to read from
counters (dict[str, int], default) –
- skip_in (int, default 0)
number of flows to skip at the beginning of input
- count_in (int, default None, meaning all flows)
number of flows to read from input
- skip_out (int, default 0)
number of flows to skip after filtering
- count_out (int, default None, meaning all flows)
number of flows to output after filtering
filter_expr (CodeType, optional) – filter expression
fields (list[str], optional) – read only these fields; other fields may be zeros
- Returns:
af, prot, inif, outif, sa0, sa1, sa2, sa3, da0, da1, da2, da3, sp, dp, first, first_ms, last, last_ms, packets, octets, aggs
- Return type:
(int, int, int, int, int, int, int, int, int, int, int, int, int, int, int, int, int, int, int, int, int)
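The order in which the counters apply (input window, then filter, then output window) can be sketched as follows. This is an illustration, not the library's code: `apply_counters` is a hypothetical helper, flows are represented as dictionaries for readability although the readers yield tuples, and evaluating the compiled filter expression with field names as variables is an assumption.

```python
import itertools


def apply_counters(flows, counters=None, filter_expr=None):
    """Hypothetical helper mirroring the documented counter semantics."""
    c = {"skip_in": 0, "count_in": None, "skip_out": 0, "count_out": None}
    c.update(counters or {})
    # Input window: applied before filtering
    stop_in = None if c["count_in"] is None else c["skip_in"] + c["count_in"]
    flows = itertools.islice(flows, c["skip_in"], stop_in)
    if filter_expr is not None:
        # Assumption: the expression sees flow field names as variables
        flows = (f for f in flows if eval(filter_expr, {}, f))
    # Output window: applied after filtering
    stop_out = None if c["count_out"] is None else c["skip_out"] + c["count_out"]
    return itertools.islice(flows, c["skip_out"], stop_out)


flows = [
    {"sp": 1000 + i, "dp": 80 if i % 2 else 443, "octets": 100 * i}
    for i in range(10)
]
expr = compile("dp == 80", "<filter>", "eval")  # a CodeType object
selected = list(
    apply_counters(flows, {"skip_in": 2, "count_in": 6, "skip_out": 1}, expr)
)
```

Here flows 2 to 7 are read, those with `dp == 80` (flows 3, 5 and 7) pass the filter, and the first of them is skipped by `skip_out`.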
- read_flow_csv(in_file, counters=None, filter_expr=None, fields=None)
Read and yield all flows in a csv_flow file/stream.
- Parameters:
in_file (os.PathLike | io.TextIOWrapper) – csv_flow file or stream to read
counters (dict[str, int], default) –
- skip_in (int, default 0)
number of flows to skip at the beginning of input
- count_in (int, default None, meaning all flows)
number of flows to read from input
- skip_out (int, default 0)
number of flows to skip after filtering
- count_out (int, default None, meaning all flows)
number of flows to output after filtering
filter_expr (CodeType, optional) – filter expression
fields (list[str], optional) – read only these fields; other fields may be zeros
- Returns:
af, prot, inif, outif, sa0, sa1, sa2, sa3, da0, da1, da2, da3, sp, dp, first, first_ms, last, last_ms, packets, octets, aggs
- Return type:
(int, int, int, int, int, int, int, int, int, int, int, int, int, int, int, int, int, int, int, int, int)
- read_nfcapd(in_file, counters=None, filter_expr=None, fields=None)
Read and yield all flows in an nfdump nfcapd file.
This function calls the nfdump program to parse the nfcapd file.
- Parameters:
in_file (os.PathLike) – nfdump nfcapd file to read
counters (dict[str, int], default) –
- skip_in (int, default 0)
number of flows to skip at the beginning of input
- count_in (int, default None, meaning all flows)
number of flows to read from input
- skip_out (int, default 0)
number of flows to skip after filtering
- count_out (int, default None, meaning all flows)
number of flows to output after filtering
filter_expr (CodeType, optional) – filter expression
fields (list[str], optional) – read only these fields; other fields may be zeros
- Returns:
af, prot, inif, outif, sa0, sa1, sa2, sa3, da0, da1, da2, da3, sp, dp, first, first_ms, last, last_ms, packets, octets, aggs
- Return type:
(int, int, int, int, int, int, int, int, int, int, int, int, int, int, int, int, int, int, int, int, int)
- read_pipe(in_file, counters=None, filter_expr=None, fields=None)
Read and yield all flows in an nfdump pipe file/stream.
This function calls the nfdump program to parse the nfdump file.
- Parameters:
in_file (os.PathLike | io.TextIOWrapper) – nfdump pipe file or stream to read
counters (dict[str, int], default) –
- skip_in (int, default 0)
number of flows to skip at the beginning of input
- count_in (int, default None, meaning all flows)
number of flows to read from input
- skip_out (int, default 0)
number of flows to skip after filtering
- count_out (int, default None, meaning all flows)
number of flows to output after filtering
filter_expr (CodeType, optional) – filter expression
fields (list[str], optional) – read only these fields; other fields may be zeros
- Returns:
af, prot, inif, outif, sa0, sa1, sa2, sa3, da0, da1, da2, da3, sp, dp, first, first_ms, last, last_ms, packets, octets, aggs
- Return type:
(int, int, int, int, int, int, int, int, int, int, int, int, int, int, int, int, int, int, int, int, int)
- write_flow_binary(output_dir)
Write flow tuples to binary flow files.
- Parameters:
output_dir (os.PathLike) – directory to write to
- write_flow_csv(output)
Write flow tuples to the output.
- Parameters:
output (os.PathLike | io.TextIOWrapper) – file to write or stream
- write_line(output, header_line=None)
Write lines to the output.
- Parameters:
output (os.PathLike | io.TextIOWrapper) – file to write or stream
header_line (str, optional) – header line to write at the beginning of the file
flow_models.lib.mix module
- avg(data, x, x_val, what)
Calculate the average value of a feature based on distribution mixtures.
- cdf(mix, x)
Return a CDF from a distribution mixture.
- cdf_comp(mix, x)
Return CDF components from a distribution mixture.
- pdf(mix, x, x_val)
Return a PDF from a distribution mixture.
- pdf_comp(mix, x, x_val)
Return PDF components from a distribution mixture.
- rvs(mix, x_val, size=1, random_state=None)
Return a sample from a distribution mixture.
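The relation between the mixture functions and their component variants can be sketched as a weighted sum: each component contributes its weight times its own CDF or PDF, and the mixture is the sum of these contributions. The representation below, a list of `(weight, mean, std)` tuples of normal components, is an assumption for illustration; the mixture format and the signatures used by flow_models (which also take `x_val`) are not reproduced here.

```python
import math


def normal_cdf(x, mean, std):
    return 0.5 * (1.0 + math.erf((x - mean) / (std * math.sqrt(2.0))))


def normal_pdf(x, mean, std):
    z = (x - mean) / std
    return math.exp(-0.5 * z * z) / (std * math.sqrt(2.0 * math.pi))


def cdf_comp_sketch(mix, x):
    """Weighted CDF of every component, evaluated at x."""
    return [w * normal_cdf(x, m, s) for w, m, s in mix]


def cdf_sketch(mix, x):
    """Mixture CDF: the sum of the weighted component CDFs."""
    return sum(cdf_comp_sketch(mix, x))


def pdf_sketch(mix, x):
    """Mixture PDF: the sum of the weighted component PDFs."""
    return sum(w * normal_pdf(x, m, s) for w, m, s in mix)


# Hypothetical two-component mixture; the weights sum to 1
mix = [(0.7, 0.0, 1.0), (0.3, 5.0, 2.0)]
components_at_zero = cdf_comp_sketch(mix, 0.0)
```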
flow_models.lib.ml module
- calculate_reduction(octets, octets_predicted, thresholds=None)
Calculate flow reduction curve.
- Parameters:
octets (numpy.array) – real flow sizes (number of octets/bytes)
octets_predicted (numpy.array) – predicted flow sizes (number of octets/bytes)
thresholds (int | numpy.array, optional) – flow size thresholds to evaluate, or the number of thresholds; default [2..2^24]
- Returns:
[threshold, traffic_coverage, flow_table_reduction] array containing flow table size reduction and traffic coverage obtained for each flow size threshold
- Return type:
np.array[3]
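A rough sketch of how such a curve could be computed is given below. The definitions are assumptions, not taken from the library: for each threshold, flows whose predicted size reaches the threshold are treated as entering the flow table, traffic coverage is the share of real octets carried by those flows, and flow table reduction is the total number of flows divided by the number of selected flows. The function name `reduction_curve_sketch` is hypothetical.

```python
import numpy as np


def reduction_curve_sketch(octets, octets_predicted, thresholds):
    """Hypothetical helper; see the assumptions stated above."""
    octets = np.asarray(octets, dtype=float)
    predicted = np.asarray(octets_predicted, dtype=float)
    rows = []
    for t in thresholds:
        selected = predicted >= t  # assumed decision rule
        n_selected = int(selected.sum())
        coverage = octets[selected].sum() / octets.sum()
        reduction = len(octets) / n_selected if n_selected else float("inf")
        rows.append((t, coverage, reduction))
    # Rows: threshold, traffic_coverage, flow_table_reduction
    return np.array(rows).T


octets = np.array([100, 200, 700, 9000])
predicted = np.array([150, 100, 800, 8000])
curve = reduction_curve_sketch(octets, predicted, thresholds=[128, 512, 4096])
```

With these toy numbers, raising the threshold from 128 to 4096 shrinks the table from three entries to one while coverage only drops from 98% to 90%, which is the trade-off the curve describes.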
- calculate_reduction_from_mixture(path)
Calculate flow reduction curve from distribution mixture JSON.
- Parameters:
path (os.PathLike) – path to a directory with JSON mixture
- Returns:
[threshold, flow_table_reduction, traffic_coverage] array containing flow table size reduction and traffic coverage obtained for each flow size threshold
- Return type:
np.array[3]
- interp_reduction(x, traffic_coverage, flow_table_reduction)
Interpolate flow reduction curve for given traffic coverages.
- Parameters:
x (numpy.array) – traffic coverage points at which to interpolate
traffic_coverage (numpy.array) – traffic coverages corresponding to the input flow table size reductions
flow_table_reduction (numpy.array) – input flow table size reductions
- Returns:
x, flow_table_reduction_for_x
- Return type:
(numpy.array, numpy.array)
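The interpolation step can be sketched with `numpy.interp`. Linear interpolation is an assumption here, since the method used by the library is not stated. `numpy.interp` expects increasing sample points, so the sketch sorts the curve by coverage first. The name `interp_reduction_sketch` is hypothetical.

```python
import numpy as np


def interp_reduction_sketch(x, traffic_coverage, flow_table_reduction):
    """Hypothetical helper: linear interpolation of the reduction curve."""
    x = np.asarray(x, dtype=float)
    coverage = np.asarray(traffic_coverage, dtype=float)
    reduction = np.asarray(flow_table_reduction, dtype=float)
    order = np.argsort(coverage)  # numpy.interp needs increasing sample points
    return x, np.interp(x, coverage[order], reduction[order])


# Toy curve: higher coverage corresponds to lower reduction
_, reduction_at = interp_reduction_sketch(
    [0.8, 0.95], [1.0, 0.9, 0.7], [1.0, 2.0, 4.0]
)
```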
- load_arrays(directory)
Load 5-tuple and flow size arrays from a directory.
- Parameters:
directory (os.PathLike) – directory containing binary flow records
- Returns:
sa, da, sp, dp, prot, oc
- Return type:
(np.array, np.array, np.array, np.array, np.array, np.array)
- make_slice(data, skip=0, count=None)
Make a slice of the 5-tuple and flow size arrays.
- Parameters:
- Returns:
sa, da, sp, dp, prot, oc
- Return type:
(np.array, np.array, np.array, np.array, np.array, np.array)
- prepare_decision(oc, coverage)
Simulate mice/elephant decision to obtain a desired traffic coverage.
- prepare_input(data, octets=False, bits=False)
Prepare input features array and target (flow sizes) array.
- Parameters:
- Returns:
input features array
- Return type:
np.array[5]
- score_reduction(octets, octets_predicted)
Calculate average flow table size reduction for 80% traffic coverage.
- Parameters:
octets (numpy.array) – real flow sizes (number of octets/bytes)
octets_predicted (numpy.array) – predicted flow sizes (number of octets/bytes)
- Returns:
average flow table size reduction for 80% traffic coverage
- Return type:
- top_idx(octets, ratio, seed=None)
Get indices of the largest flows.
flow_models.lib.util module
- bin_calc_log(x, b)
Calculate logarithmic bin size.
- bin_calc_one(x, _)
Calculate bin size for bins of width 1.
- measure_memory(on=False)
Measure and print minimum, average and maximum memory usage.
- Parameters:
on (default False) – whether to run the measurement