# Tables and Figures Jupyter Notebooks
The tables and figures are created via a set of Jupyter notebooks.
The notebooks are stored both in the tcbench GitHub repository and in the paper's figshare collection.
The pages linked below show the rendered version of the notebooks. If you want to run the notebooks yourself, make sure to:

- Have installed (or imported) `ucdavis-icdm19`, `mirage19`, `mirage22`, and `utmobilenet21`. Please check each dataset page for more details.
- Have installed the `ml_artifacts`.

To install the modeling artifacts, grab `ml_artifacts.tgz` and unpack it under the folder mentioned above. The tarball contains a `campaigns/` folder, so after unpacking that folder should sit directly under it.
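For reference, the unpacking step might look like the following shell sketch. The tarball name comes from the text above; the destination folder is an assumption (adjust it to your setup), and the stand-in tarball creation at the top merely simulates the real download from the figshare collection.

```shell
# Destination folder for tcbench artifacts (an assumption; adjust to your setup).
ROOT="${TCBENCH_ROOT:-$HOME/tcbench}"
mkdir -p "$ROOT"

# Stand-in for the real download: fabricate a tiny ml_artifacts.tgz that,
# like the real one, contains a campaigns/ folder.
mkdir -p stage/campaigns
echo "placeholder" > stage/campaigns/README.txt
tar -czf ml_artifacts.tgz -C stage campaigns
rm -rf stage

# The actual install step: unpack the tarball under the tcbench root so its
# campaigns/ folder lands at $ROOT/campaigns.
tar -xzf ml_artifacts.tgz -C "$ROOT"
ls "$ROOT/campaigns"
```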
To install/import datasets, refer to the install and import pages.
## Tables
- Table 2: Summary of Datasets Properties. (`table2_datasets_properties.ipynb`)
- Table 3: (G0) Baseline ML performance without augmentation in a supervised setting. (`table3_xgboost_baseline.ipynb`)
- Table 4: Comparing data augmentation functions applied in supervised training. (`table4_ucdavis-icdm19_comparing_data_augmentations_functions.ipynb`)
- Table 5: Impact of dropout and SimCLR projection layer dimension on fine-tuning. (`table5_simclr_dropout_and_projectionlayer.ipynb`)
- Table 6: Comparing the fine-tuning performance when using different pairs of augmentations for pretraining. (`table6_simclr_other_augmentation_pairs.ipynb`)
- Table 7: Accuracy on 32x32 flowpic when enlarging the training set (w/o Dropout). (`table7_larger_trainset.ipynb`)
- Table 8: (G3) Data augmentation in a supervised setting on other datasets. (`table8_augmentation-at-loading_on_other_datasets.ipynb`)
- Table 9 (appendix): Macro-average accuracy with different retraining datasets and different sampling methods for Rezaei et al., ICDM19. (`table9_icdm_finetuning_per_class_metrics_on_human.ipynb`)
- Table 10 (appendix): Performance comparison across augmentations for different flowpic sizes. (`table10_ucdavis-icdm19_tukey.ipynb`)
## Figures
- Figure 1: Example of a packet time series transformed into a flowpic representation for a randomly selected flow. (`figure1_flowpic_example.ipynb`)
- Figure 3: Average confusion matrices for the 32x32 resolution across all experiments in Table 4. (`figure3_confusion_matrix_supervised_setting.ipynb`)
- Figure 4: Average 32x32 flowpic for each class across multiple data splits. (`figure4_ucdavis_per_class_average_flowpic`)
- Figure 5: Critical distance plot of the accuracy obtained with each augmentation for the 32x32 and 64x64 cases. (`figure5_ucdavis_augmentations_comparison`)
- Figure 6: Critical distance plot of the accuracy obtained with each augmentation across the four tested datasets. (`figure6_augmentations_comparison_across_datasets_critical_distance`)
- Figure 7: Average rank obtained per augmentation and dataset. Ranks closer to 1 indicate better performance. (`figure7_augmentations_comparison_across_datasets_average_rank`)
- Figure 8 (appendix): Investigating the root cause of G1 discrepancies: kernel density estimation of the per-class packet size distributions. (`figure8_ucdavis_per_class_average_flowpic`)
- Figure 10(b) (appendix): Classwise evaluation on `human`. (`figure10b_icdm_finetuning_per_class_metrics_on_human`)
- Figure 11 (appendix): Accuracy difference w/ and w/o Dropout in supervised learning. (`figure11_dropout_impact_supervised_setting.ipynb`)
## Others
- Miscellaneous stats across the paper. (`miscellaneous_stats.ipynb`)
## References
Rezaei et al., "How to Achieve High Classification Accuracy with Just a Few Labels: A Semi-supervised Approach Using Sampled Packets", ICDM 2019.