Welcome to flow-models’s documentation!
flow-models is a software framework for creating precise and reproducible statistical flow models from NetFlow/IPFIX flow records. It offers features such as merging split records, calculating histograms of flow features, and creating General Mixture Models to fit the data. These models can be used for analytical calculations and simulations to generate realistic traffic.
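The histogram step can be illustrated with a minimal sketch. It assumes each flow is given as a hypothetical `(packets, octets)` tuple and bins flows by length into power-of-two bins. Both the record layout and the binning scheme are assumptions made here for illustration, not the format or binning used by the hist tool.

```python
from collections import defaultdict


def log2_histogram(flows):
    """Bin flows by length (packets) into power-of-two bins.

    Each flow is a hypothetical (packets, octets) tuple. Returns a dict that
    maps the lower edge of each bin to [flow count, packet sum, octet sum],
    ordered by bin.
    """
    bins = defaultdict(lambda: [0, 0, 0])
    for packets, octets in flows:
        if packets < 1:
            raise ValueError("a flow must contain at least one packet")
        # Largest power of two not exceeding the flow length.
        lower = 1 << (packets.bit_length() - 1)
        entry = bins[lower]
        entry[0] += 1
        entry[1] += packets
        entry[2] += octets
    return dict(sorted(bins.items()))


# Usage: four toy flows.
hist = log2_histogram([(1, 60), (3, 1500), (2, 120), (10, 9000)])
print(hist)  # {1: [1, 1, 60], 2: [2, 5, 1620], 8: [1, 10, 9000]}
```

Keeping packet and octet sums per bin, and not only flow counts, allows the same histogram to describe the distribution weighted by flows or by traffic volume.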
The first packets mirroring subpackage simulates the first N packets mirroring feature of switches. With this feature enabled, copies of the initial packets of each new flow are sent to the switch’s CPU or controller for inspection and flow identification.
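Because a flow of k packets contributes min(k, N) mirrored packets, the load placed on the CPU or controller can be estimated from the flow length distribution alone. The sketch below does this for a list of per-flow packet counts; the function name and input format are hypothetical and are not part of the subpackage's API.

```python
def mirroring_load(flow_lengths, n):
    """Estimate the packets sent to the CPU/controller when the first n
    packets of every flow are mirrored.

    flow_lengths: iterable of per-flow packet counts (hypothetical input).
    Returns (mirrored_packets, total_packets, fraction_mirrored).
    """
    if n < 0:
        raise ValueError("n must be non-negative")
    mirrored = 0
    total = 0
    for length in flow_lengths:
        # A flow shorter than n is mirrored in full; longer flows contribute n.
        mirrored += min(length, n)
        total += length
    fraction = mirrored / total if total else 0.0
    return mirrored, total, fraction


# Usage: mirror the first 3 packets of four flows (9 of 108 packets).
print(mirroring_load([1, 2, 5, 100], 3))
```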
The elephant flow classification and detection subpackage provides functionality for simulating and analyzing mechanisms related to elephant flows. Elephant flows (also called heavy hitters) are the relatively few flows responsible for the vast majority of traffic on the Internet. By focusing on these flows, advanced traffic engineering (TE) mechanisms can be applied in the network without maintaining an individual entry for every flow.
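A simple way to see why this works is a static size threshold: flows whose byte count reaches it are treated as elephants and receive individual entries, while the rest follow default forwarding. The sketch below, with a hypothetical function and an arbitrary threshold, reports how few flows cover most of the bytes. It is only an illustration and not the classification method implemented in the elephants subpackage.

```python
def elephant_coverage(flow_octets, threshold):
    """Treat flows with at least `threshold` octets as elephants.

    Returns (number of elephants, fraction of flows, fraction of octets).
    """
    sizes = list(flow_octets)
    if not sizes:
        return 0, 0.0, 0.0
    total = sum(sizes)
    elephants = [s for s in sizes if s >= threshold]
    flow_share = len(elephants) / len(sizes)
    traffic_share = sum(elephants) / total if total else 0.0
    return len(elephants), flow_share, traffic_share


# Usage: five toy flows, 10 kB threshold.
count, flow_share, traffic_share = elephant_coverage(
    [100, 200, 150, 50_000, 120_000], threshold=10_000
)
print(count, flow_share, round(traffic_share, 4))  # 2 0.4 0.9974
```

Here two entries out of five cover over 99% of the octets, which is the effect that elephant-based TE relies on.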
You can cite the following paper if you use flow-models in your research:
@article{flow-models,
title = {flow-models: A framework for analysis and modeling of IP network flows},
journal = {SoftwareX},
volume = {17},
pages = {100929},
year = {2022},
issn = {2352-7110},
doi = {10.1016/j.softx.2021.100929},
author = {Piotr Jurkiewicz}
}
Provided tools
The framework currently includes the following tools:
merge – merges flows which were split across multiple records due to active timeout
sort – sorts flow records according to specified fields (requires numpy)
hist – calculates histograms of flow length, size, duration or rate
hist_np – calculates histograms using multiple threads (requires numpy; much faster, but uses more memory)
fit – creates General Mixture Models (GMM) fitted to flow records (requires scipy)
plot – generates plots from flow records and fitted models (requires pandas and scipy)
generate – generates example flow records from histograms or mixture models
summary – produces TeX tables containing summary statistics of a flow dataset (requires scipy)
convert – converts flow records between different formats; can also cut and filter them
cut – cuts binary flow record files using dd
anonymize – anonymizes IPv4 addresses in flow records with the prefix-preserving Crypto-PAn algorithm
series – generates time series of a link’s bit or packet rate from flow records
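The idea behind merging can be sketched under a simplifying assumption: two records with the same flow key belong to one flow if the second starts no later than `inactive_timeout` seconds after the first ends. The `Record` class, its fields and `merge_records` are hypothetical names used only here; they do not reproduce the merge tool's record formats or options.

```python
from dataclasses import dataclass


@dataclass
class Record:
    key: tuple      # flow key, e.g. (src, dst, sport, dport, proto)
    first: float    # timestamp of the first packet, in seconds
    last: float     # timestamp of the last packet, in seconds
    packets: int
    octets: int


def merge_records(records, inactive_timeout):
    """Join records of the same flow whose gap is at most inactive_timeout."""
    open_flows = {}  # key -> flow currently being merged
    done = []
    for rec in sorted(records, key=lambda r: r.first):
        cur = open_flows.get(rec.key)
        if cur is not None and rec.first - cur.last <= inactive_timeout:
            # Continuation of a flow split by the active timeout.
            cur.last = max(cur.last, rec.last)
            cur.packets += rec.packets
            cur.octets += rec.octets
        else:
            if cur is not None:
                done.append(cur)  # gap too long: previous flow has ended
            # Copy so that the input records are not modified.
            open_flows[rec.key] = Record(rec.key, rec.first, rec.last,
                                         rec.packets, rec.octets)
    done.extend(open_flows.values())
    return sorted(done, key=lambda r: r.first)


# Usage: one long flow exported in three parts, plus an unrelated flow.
k1 = ("10.0.0.1", "10.0.0.2", 1234, 80, 6)
k2 = ("10.0.0.3", "10.0.0.4", 53, 53, 17)
flows = merge_records([
    Record(k1, 0.0, 60.0, 10, 1000),
    Record(k1, 60.5, 120.0, 10, 1000),
    Record(k1, 121.0, 150.0, 5, 500),
    Record(k2, 10.0, 20.0, 2, 200),
], inactive_timeout=15.0)
print(len(flows), flows[0].packets, flows[0].last)  # 2 25 150.0
```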
Following the Unix philosophy, each tool is a separate Python program aimed at a single purpose. Features provided by the tools are orthogonal and they are tailored to be used sequentially in data-processing pipelines.
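The same composition style can be mimicked inside a single Python process with generators, each stage doing one job and passing records on lazily. The three-column CSV layout and all function names below are hypothetical, chosen only to illustrate the chaining; they do not reflect the framework's file formats or tool interfaces.

```python
import csv
import io
from collections import Counter


def read_flows(lines):
    """Stage 1: parse hypothetical 'proto,packets,octets' CSV lines."""
    for proto, packets, octets in csv.reader(lines):
        yield int(proto), int(packets), int(octets)


def only_proto(flows, proto):
    """Stage 2: keep flows of one IP protocol number."""
    return (f for f in flows if f[0] == proto)


def octets_by_length(flows):
    """Stage 3: sum octets per flow length (in packets)."""
    totals = Counter()
    for _, packets, octets in flows:
        totals[packets] += octets
    return totals


# Usage: chain the stages; records stream through one at a time.
data = io.StringIO("6,1,60\n17,1,80\n6,3,1500\n6,1,40\n")
result = octets_by_length(only_proto(read_flows(data), proto=6))
print(dict(result))  # {1: 100, 3: 1500}
```

Since each stage only consumes an iterable and yields or returns results, stages can be reordered or swapped independently, mirroring how the separate tools are combined.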
Models library
The GitHub repository contains a library of flow models. They consist of histogram CSV files, fitted mixture JSON files and plots. Full flow records are also included in smaller models. Available models can be explored here: https://github.com/piotrjurkiewicz/flow-models/tree/master/data
Contents
Reference
- File formats
- API Reference
  - flow_models package
    - flow_models.anonymize module
    - flow_models.convert module
    - flow_models.cut module
    - flow_models.fit module
    - flow_models.generate module
    - flow_models.hist module
    - flow_models.hist_np module
    - flow_models.merge module
    - flow_models.plot module
    - flow_models.series module
    - flow_models.series_plot module
    - flow_models.sort module
    - flow_models.summary module
    - flow_models.first_mirror package
    - flow_models.elephants package
      - flow_models.elephants.skl package
    - flow_models.lib package