flow_models package
flow_models.anonymize module
Anonymizes IP addresses in IPv4 flows using the Crypto-PAn algorithm.
- anonymize(in_files, output, in_format='nfcapd', out_format='csv_flow', skip_in=0, count_in=None, skip_out=0, count_out=None, filter_expr=None, key='')
Anonymizes IP addresses in IPv4 flows using the Crypto-PAn algorithm.
- Parameters:
in_files (list[pathlib.Path]) – input file paths
output (os.PathLike | io.TextIOWrapper) – output file or directory path or stream
in_format (str, default 'nfcapd') – input format
out_format (str, default 'csv_flow') – output format
skip_in (int, default 0) – number of flows to skip at the beginning of input
count_in (int, default None, meaning all flows) – number of flows to read from input
skip_out (int, default 0) – number of flows to skip after filtering
count_out (int, default None, meaning all flows) – number of flows to output after filtering
filter_expr (CodeType, optional) – filter expression
key (str) – encryption key (32 bytes)
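The property that makes Crypto-PAn useful for flow data is that it preserves prefixes: two addresses sharing their first k bits still share exactly k bits after anonymization. The sketch below illustrates that property only. It is not the module's implementation: it substitutes HMAC-SHA256 for the AES-based function used by Crypto-PAn, and the names `anonymize_ipv4` and `common_prefix_len` are hypothetical.

```python
import hashlib
import hmac
import ipaddress


def anonymize_ipv4(addr: str, key: bytes) -> str:
    """Prefix-preserving sketch; HMAC-SHA256 stands in for AES here."""
    value = int(ipaddress.IPv4Address(addr))
    result = 0
    for i in range(32):
        # Bit string holding the first i bits of the original address.
        prefix = format(value >> (32 - i), "b").zfill(i) if i else ""
        # One pseudorandom bit derived from the key and that prefix.
        flip = hmac.new(key, prefix.encode(), hashlib.sha256).digest()[0] >> 7
        bit = (value >> (31 - i)) & 1
        result = (result << 1) | (bit ^ flip)
    return str(ipaddress.IPv4Address(result))


def common_prefix_len(a: str, b: str) -> int:
    """Number of leading bits shared by two IPv4 addresses."""
    diff = int(ipaddress.IPv4Address(a)) ^ int(ipaddress.IPv4Address(b))
    return 32 - diff.bit_length()


key = b"0123456789abcdef0123456789abcdef"  # 32 bytes, illustrative only
a = anonymize_ipv4("192.168.1.10", key)
b = anonymize_ipv4("192.168.1.200", key)
print(common_prefix_len("192.168.1.10", "192.168.1.200"))  # 24
print(common_prefix_len(a, b))  # 24, the shared prefix length is preserved
```

Because each output bit depends only on the key and the preceding input bits, the mapping is deterministic for a given key, so the same address always anonymizes to the same value.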
flow_models.convert module
Converts flow records between supported formats. Can be used for filtering and cutting flow record files.
- convert(in_files, output, in_format='nfcapd', out_format='csv_flow', skip_in=0, count_in=None, skip_out=0, count_out=None, filter_expr=None)
Convert flow records between supported formats. Can also be used for filtering and cutting flow record files.
- Parameters:
in_files (list[pathlib.Path]) – input file paths
output (os.PathLike | io.TextIOWrapper) – output file or directory path or stream
in_format (str, default 'nfcapd') – input format
out_format (str, default 'csv_flow') – output format
skip_in (int, default 0) – number of flows to skip at the beginning of input
count_in (int, default None, meaning all flows) – number of flows to read from input
skip_out (int, default 0) – number of flows to skip after filtering
count_out (int, default None, meaning all flows) – number of flows to output after filtering
filter_expr (CodeType, optional) – filter expression
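The order in which the skip, count and filter parameters apply can be sketched as follows: `skip_in` and `count_in` act on the raw input, then the filter runs, then `skip_out` and `count_out` act on what remains. This is a minimal illustration under assumptions: flows are modelled as plain dicts, the helper `cut_and_filter` is hypothetical, and evaluating the compiled expression with flow fields as local names is a guess at how a `CodeType` filter might be used, not the library's documented behaviour.

```python
from itertools import islice


def cut_and_filter(flows, skip_in=0, count_in=None,
                   skip_out=0, count_out=None, filter_expr=None):
    """Apply input cut, optional filter, then output cut to an iterable."""
    stop_in = None if count_in is None else skip_in + count_in
    selected = islice(flows, skip_in, stop_in)
    if filter_expr is not None:
        # Assumption: flow fields are visible as names inside the expression.
        selected = (f for f in selected if eval(filter_expr, {}, f))
    stop_out = None if count_out is None else skip_out + count_out
    return islice(selected, skip_out, stop_out)


# Hypothetical flows with made-up field values.
flows = [{"packets": i, "octets": i * 100} for i in range(10)]

# compile(..., "eval") produces a CodeType object.
expr = compile("packets % 2 == 0", "<filter>", "eval")

result = list(cut_and_filter(flows, skip_in=1, count_in=8,
                             skip_out=1, count_out=2, filter_expr=expr))
print([f["packets"] for f in result])  # [4, 6]
```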
flow_models.cut module
Cuts binary flow records with dd.
- cut(in_files, output, in_format='binary', out_format='binary', skip_in=0, count_in=None, skip_out=0, count_out=None, filter_expr=None)
Cut binary flow records with dd.
- Parameters:
in_files (list[pathlib.Path]) – input file paths
output (os.PathLike | io.TextIOWrapper) – output file or directory path or stream
in_format (str, default 'binary') – input format
out_format (str, default 'binary') – output format
skip_in (int, default 0) – number of flows to skip at the beginning of input
count_in (int, default None, meaning all flows) – number of flows to read from input
skip_out (int, default 0) – number of flows to skip after filtering
count_out (int, default None, meaning all flows) – number of flows to output after filtering
filter_expr (CodeType, optional) – not supported
flow_models.fit module
Creates General Mixture Models (GMM) fitted to flow records (requires scipy).
- fit(in_file, y_value, max_iter=100, initial=None, max_pareto_w=None, cb=None)
Fit distribution mixture to flow histogram.
- Parameters:
in_file (os.PathLike) – input histogram file
y_value (str) – y axis value
max_iter (int, default 100) – maximum number of iterations
initial (dict, optional) – initial mixture
max_pareto_w (float, optional) – maximum pareto weight
cb (function, optional) – callback function to call after each iteration
- Returns:
{'mix': result_mix, 'sum': np.sum(weights)}
- Return type:
dict
flow_models.generate module
Generates flow records from histograms or mixture models.
- generate(in_file, output, out_format='csv_flow', size=1, x_value='length', random_state=None)
Generate flows from mixture or histogram file to output.
- Parameters:
in_file (os.PathLike) – csv_hist file or mixture directory
output (os.PathLike | io.TextIOWrapper) – file or directory for output
out_format (str, default 'csv_flow') – output format
size (int, default 1) – number of flows to generate
x_value (str, default 'length') – x axis value
random_state (object, optional) – initial random state
- generate_flows(in_file, size=1, x_value='length', random_state=None)
Yield flow tuples generated from mixture or histogram file.
- Parameters:
in_file (os.PathLike) – csv_hist file or mixture directory
size (int, default 1) – number of flows to generate
x_value (str, default 'length') – x axis value
random_state (object, optional) – initial random state
- Yields:
(int, int, int, int, int, int, int, int, int, int, int, int, int, int, int, int, int, int, int, int, int) – af, prot, inif, outif, sa0, sa1, sa2, sa3, da0, da1, da2, da3, sp, dp, first, first_ms, last, last_ms, packets, octets, aggs
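A 21-element tuple is easy to misindex, so wrapping each yielded record in a named tuple can help. The sketch below uses the field names listed above. The `Flow` type, the `duration_seconds` helper and the sample record are hypothetical, and the reading of `first`/`last` as whole seconds with `first_ms`/`last_ms` as millisecond parts is an assumption.

```python
from collections import namedtuple

# Field names in the order listed for the yielded tuple.
FIELDS = ("af prot inif outif sa0 sa1 sa2 sa3 da0 da1 da2 da3 "
          "sp dp first first_ms last last_ms packets octets aggs")
Flow = namedtuple("Flow", FIELDS)


def duration_seconds(flow):
    # Assumption: first/last are seconds, first_ms/last_ms are milliseconds.
    return (flow.last - flow.first) + (flow.last_ms - flow.first_ms) / 1000


# Made-up record with the same shape as a yielded tuple.
record = (2, 6, 0, 0, 0, 0, 0, 1, 0, 0, 0, 2, 443, 51234,
          1700000000, 250, 1700000002, 750, 10, 4200, 1)
flow = Flow._make(record)

print(flow.packets, flow.octets)  # 10 4200
print(duration_seconds(flow))     # 2.5
```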
flow_models.hist module
Calculates histograms of flow length, size, duration or rate.
- hist(in_files, output, in_format='nfcapd', out_format='csv_hist', skip_in=0, count_in=None, skip_out=0, count_out=None, filter_expr=None, bin_exp=0, x_value='length', additional_columns=None)
Calculate histograms of flow length, size, duration or rate.
- Parameters:
in_files (list[os.PathLike]) – input file paths
output (os.PathLike | io.TextIOWrapper) – output file or directory path or stream
in_format (str, default 'nfcapd') – input format
out_format (str, default 'csv_hist') – output format
skip_in (int, default 0) – number of flows to skip at the beginning of input
count_in (int, default None, meaning all flows) – number of flows to read from input
skip_out (int, default 0) – number of flows to skip after filtering
count_out (int, default None, meaning all flows) – number of flows to output after filtering
filter_expr (CodeType, optional) – filter expression
bin_exp (int, default 0) – bin width exponent of 2
x_value (str, default 'length') – x axis value
additional_columns (list[str], optional) – additional columns to sum
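The interplay of `x_value`, `bin_exp` and `additional_columns` can be illustrated with a small stand-alone histogram. This sketch reads "bin width exponent of 2" as a fixed bin width of `2 ** bin_exp`, which is an assumption; the module may bin differently (for example logarithmically). The `histogram` helper, the dict-based flows and the output column names are all hypothetical.

```python
from collections import defaultdict


def histogram(flows, x_value="length", bin_exp=0, additional_columns=()):
    """Count flows per bin and sum the requested extra columns."""
    width = 2 ** bin_exp  # assumed fixed bin width
    bins = defaultdict(lambda: defaultdict(int))
    for flow in flows:
        bin_lo = (flow[x_value] // width) * width  # lower edge of the bin
        row = bins[bin_lo]
        row["flows"] += 1
        for column in additional_columns:
            row[column + "_sum"] += flow[column]
    return {lo: dict(row) for lo, row in sorted(bins.items())}


# Made-up flows: "length" and "octets" are illustrative field names.
flows = [
    {"length": 1, "octets": 100},
    {"length": 2, "octets": 200},
    {"length": 3, "octets": 300},
    {"length": 5, "octets": 500},
]

for lo, row in histogram(flows, bin_exp=1, additional_columns=["octets"]).items():
    print(lo, row)
```

With `bin_exp=0` the assumed width is 1, so every distinct integer value of `x_value` gets its own bin.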
flow_models.hist_np module
Calculates histograms using multiple threads (requires numpy, much faster, but uses more memory).
- hist(in_files, output, in_format='binary', out_format='csv_hist', skip_in=0, count_in=None, skip_out=0, count_out=None, filter_expr=None, bin_exp=0, x_value='length', additional_columns=None)
Calculate histograms of flow length, size, duration or rate.
- Parameters:
in_files (list[os.PathLike]) – input file paths
output (os.PathLike | io.TextIOWrapper) – output file or directory path or stream
in_format (str, default 'binary') – input format
out_format (str, default 'csv_hist') – output format
skip_in (int, default 0) – number of flows to skip at the beginning of input
count_in (int, default None, meaning all flows) – number of flows to read from input
skip_out (int, default 0) – not supported
count_out (int, default None, meaning all flows) – not supported
filter_expr (CodeType, optional) – filter expression
bin_exp (int, default 0) – bin width exponent of 2
x_value (str, default 'length') – x axis value
additional_columns (list[str], optional) – additional columns to sum
flow_models.merge module
Merges flows which were split across multiple records due to active timeout.
- merge(in_files, output, in_format='nfcapd', out_format='csv_flow', skip_in=0, count_in=None, skip_out=0, count_out=None, filter_expr=None, inactive_timeout=15.0, active_timeout=300.0)
Merge flows split due to timeout.
- Parameters:
in_files (list[os.PathLike]) – input file paths
output (os.PathLike | io.TextIOWrapper) – directory path
in_format (str, default 'nfcapd') – input format
out_format (str, default 'csv_flow') – output format
skip_in (int, default 0) – number of flows to skip at the beginning of input
count_in (int, default None, meaning all flows) – number of flows to read from input
skip_out (int, default 0) – number of flows to skip after filtering
count_out (int, default None, meaning all flows) – number of flows to output after filtering
filter_expr (CodeType, optional) – filter expression
inactive_timeout (float, default 15.0) – inactive timeout in seconds
active_timeout (float, default 300.0) – active timeout in seconds
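The general idea of stitching split records back together can be sketched as follows: records with the same flow key are joined when the next record starts less than `inactive_timeout` seconds after the previous one ended. This is a deliberate simplification and not the module's algorithm. It ignores `active_timeout`, models records as dicts with a hypothetical `key` field, and the helper name `merge_split_flows` is made up.

```python
def merge_split_flows(records, inactive_timeout=15.0):
    """Join consecutive records of one flow separated by a short gap."""
    merged = []
    latest = {}  # flow key -> position of its most recent entry in merged
    for rec in sorted(records, key=lambda r: r["first"]):
        idx = latest.get(rec["key"])
        if idx is not None and rec["first"] - merged[idx]["last"] < inactive_timeout:
            # Continuation of an earlier record: extend it and add counters.
            prev = merged[idx]
            prev["last"] = max(prev["last"], rec["last"])
            prev["packets"] += rec["packets"]
            prev["octets"] += rec["octets"]
        else:
            # Gap too long (or first sighting): start a new flow.
            latest[rec["key"]] = len(merged)
            merged.append(dict(rec))
    return merged


# Made-up records; times are in seconds.
records = [
    {"key": "A", "first": 0.0, "last": 300.0, "packets": 100, "octets": 1000},
    {"key": "A", "first": 301.0, "last": 400.0, "packets": 50, "octets": 500},
    {"key": "B", "first": 10.0, "last": 20.0, "packets": 3, "octets": 30},
    {"key": "A", "first": 1000.0, "last": 1010.0, "packets": 5, "octets": 50},
]

for flow in merge_split_flows(records):
    print(flow)
```

In this example the first two records of flow A are joined because the gap between them is 1 second, while the last record of A starts 600 seconds later and stays separate.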
flow_models.plot module
Generates plots from flow records and fitted models (requires pandas and scipy).
flow_models.series module
Generates packets and octets time series from flow records.
- series(in_files, output, in_format='nfcapd', out_format='csv_series', skip_in=0, count_in=None, skip_out=0, count_out=None, filter_expr=None)
Generate packets and octets time series from flow records.
- Parameters:
in_files (list[pathlib.Path]) – input file paths
output (os.PathLike) – directory path
in_format (str, default 'nfcapd') – input format
out_format (str, default 'csv_series') – output format
skip_in (int, default 0) – number of flows to skip at the beginning of input
count_in (int, default None, meaning all flows) – number of flows to read from input
skip_out (int, default 0) – number of flows to skip after filtering
count_out (int, default None, meaning all flows) – number of flows to output after filtering
filter_expr (CodeType, optional) – filter expression
flow_models.series_plot module
flow_models.sort module
Sorts flow records according to specified key fields (requires numpy).
- create_index(path, key_fields, index_file, counters=None, reverse=False)
Create index array, optionally saving it to file.
- Parameters:
path (os.PathLike) – path of a directory with key files
index_file (str, optional) – index file path
counters (dict[str, int], default None) – dictionary with the following keys:
- skip_in (int, default 0) – number of flows to skip at the beginning of input
- count_in (int, default None, meaning all flows) – number of flows to read from input
- skip_out (int, default 0) – not supported
- count_out (int, default None, meaning all flows) – not supported
reverse (bool, default False) – reverse order
- Returns:
index array
- Return type:
numpy.array
- sort(in_files, output, key_fields, in_format='binary', out_format='binary', index_file=None, skip_in=0, count_in=None, skip_out=0, count_out=None, filter_expr=None, reverse=False)
Sorts flow records according to specified key fields.
- Parameters:
output (os.PathLike | None) – output directory path
in_format (str, default 'binary') – input format
out_format (str, default 'binary') – output format
index_file (str, optional) – index file path
skip_in (int, default 0) – number of flows to skip at the beginning of input
count_in (int, default None, meaning all flows) – number of flows to read from input
skip_out (int, default 0) – not supported
count_out (int, default None, meaning all flows) – not supported
filter_expr (CodeType, optional) – not supported
reverse (bool, default False) – reverse order
- sort_array(input_file, output_dir, index_array, counters=None)
Sorts flow records according to an index array.
- Parameters:
input_file (os.PathLike) – input file path
output_dir (os.PathLike) – output directory path
index_array (object) – index array
counters (dict[str, int], default None) – dictionary with the following keys:
- skip_in (int, default 0) – number of flows to skip at the beginning of input
- count_in (int, default None, meaning all flows) – number of flows to read from input
- skip_out (int, default 0) – not supported
- count_out (int, default None, meaning all flows) – not supported
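Sorting through an index array separates two steps: compute one permutation from the key fields, then apply that same permutation to every column of the data. The sketch below shows the idea in pure Python so that it runs without numpy. It is not the module's implementation, and the helpers `build_index` and `apply_index`, as well as the column-per-field dict layout, are hypothetical.

```python
def build_index(columns, key_fields, reverse=False):
    """Return the row order that sorts the data by the given key fields."""
    n = len(columns[key_fields[0]])
    return sorted(range(n),
                  key=lambda i: tuple(columns[k][i] for k in key_fields),
                  reverse=reverse)


def apply_index(column, index):
    """Reorder a single column according to a precomputed index."""
    return [column[i] for i in index]


# Made-up data stored one list per field.
columns = {
    "packets": [3, 1, 2],
    "octets": [300, 100, 200],
}

index = build_index(columns, ["packets"])
print(index)                                   # [1, 2, 0]
print(apply_index(columns["octets"], index))   # [100, 200, 300]
```

Computing the index once and reusing it keeps all columns aligned, and it means the index could be saved and applied later, which is presumably why an index file is offered.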
flow_models.summary module
Produces TeX tables containing summary statistics of flow dataset (requires scipy).