# Tables and Figures Jupyter Notebooks
The tables and figures are created via a set of Jupyter notebooks.
The notebooks are stored both in the tcbench GitHub repository and in the paper's figshare collection.
The pages linked below show the rendered version of the notebooks. If you want to run the notebooks yourself, make sure to:

- Have installed (or imported) `ucdavis-icdm19`, `mirage19`, `mirage22`, and `utmobilenet21`. Please check each dataset page for more details.
- Have installed the `ml_artifacts`.

To install the modeling artifacts, grab `ml_artifacts.tgz` and unpack it under the folder mentioned above. The tarball contains a `campaigns/` folder, so after unpacking that folder should sit directly under it.
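For reference, the unpacking step might look like the following shell sketch. The tarball name comes from the text above; the destination folder is an assumption (adjust it to your setup), and the stand-in tarball creation at the top merely simulates the real download from the figshare collection.

```shell
# Destination folder for tcbench artifacts (an assumption; adjust to your setup).
ROOT="${TCBENCH_ROOT:-$HOME/tcbench}"
mkdir -p "$ROOT"

# Stand-in for the real download: fabricate a tiny ml_artifacts.tgz that,
# like the real one, contains a campaigns/ folder.
mkdir -p stage/campaigns
echo "placeholder" > stage/campaigns/README.txt
tar -czf ml_artifacts.tgz -C stage campaigns
rm -rf stage

# The actual install step: unpack the tarball under the tcbench root so its
# campaigns/ folder lands at $ROOT/campaigns.
tar -xzf ml_artifacts.tgz -C "$ROOT"
ls "$ROOT/campaigns"
```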
To install/import datasets, refer to the install and import pages.
## Tables
- Table 2: Summary of Datasets Properties. (`table2_datasets_properties.ipynb`)
- Table 3: (G0) Baseline ML performance without augmentation in a supervised setting. (`table3_xgboost_baseline.ipynb`)
- Table 4: Comparing data augmentation functions applied in supervised training. (`table4_ucdavis-icdm19_comparing_data_augmentations_functions.ipynb`)
- Table 5: Impact of dropout and SimCLR projection layer dimension on fine-tuning. (`table5_simclr_dropout_and_projectionlayer.ipynb`)
- Table 6: Comparing the fine-tuning performance when using different pairs of augmentations for pretraining. (`table6_simclr_other_augmentation_pairs.ipynb`)
- Table 7: Accuracy on 32x32 flowpic when enlarging the training set (w/o Dropout). (`table7_larger_trainset.ipynb`)
- Table 8: (G3) Data augmentation in a supervised setting on other datasets. (`table8_augmentation-at-loading_on_other_datasets.ipynb`)
- Table 9 (appendix): Macro-average accuracy with different retraining datasets and different sampling methods for Rezaei et al., ICDM19. (`table9_icdm_finetuning_per_class_metrics_on_human.ipynb`)
- Table 10 (appendix): Performance comparison across augmentations for different flowpic sizes. (`table10_ucdavis-icdm19_tukey.ipynb`)
## Figures
- Figure 1: Example of a packet time series transformed into a flowpic representation for a randomly selected flow. (`figure1_flowpic_example.ipynb`)
- Figure 3: Average confusion matrices for the 32x32 resolution across all experiments in Table 4. (`figure3_confusion_matrix_supervised_setting.ipynb`)
- Figure 4: Average 32x32 flowpic for each class across multiple data splits. (`figure4_ucdavis_per_class_average_flowpic`)
- Figure 5: Critical distance plot of the accuracy obtained with each augmentation for the 32x32 and 64x64 cases. (`figure5_ucdavis_augmentations_comparison`)
- Figure 6: Critical distance plot of the accuracy obtained with each augmentation across the four tested datasets. (`figure6_augmentations_comparison_across_datasets_critical_distance`)
- Figure 7: Average rank obtained per augmentation and dataset. Ranks closer to 1 indicate better performance. (`figure7_augmentations_comparison_across_datasets_average_rank`)
- Figure 8 (appendix): Investigating the root cause of G1 discrepancies: kernel density estimation of the per-class packet size distributions. (`figure8_ucdavis_per_class_average_flowpic`)
- Figure 10(b) (appendix): Classwise evaluation on `human`. (`figure10b_icdm_finetuning_per_class_metrics_on_human`)
- Figure 11 (appendix): Accuracy difference w/ and w/o Dropout in supervised learning. (`figure11_dropout_impact_supervised_setting.ipynb`)
## Others
- Miscellaneous stats across the paper. (`miscellaneous_stats.ipynb`)
## References
Rezaei et al., "How to Achieve High Classification Accuracy with Just a Few Labels: A Semi-supervised Approach Using Sampled Packets", ICDM 2019.