Samples count report¶
An important dataset property to keep an eye on when aiming for modeling is the number of samples for each class available in the datasets.
You can easily recover this using the datasets samples-count subcommand.
For instance,
the following command computes the samples count for the unfitered
version of the ucdavis-icdm19 dataset.
Output
unfiltered
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━┳━━━━━━━━━┓
┃ partition ┃ app ┃ samples ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━╇━━━━━━━━━┩
│ pretraining │ google-doc │ 1221 │
│ │ google-drive │ 1634 │
│ │ google-music │ 592 │
│ │ google-search │ 1915 │
│ │ youtube │ 1077 │
│ │ __total__ │ 6439 │
├─────────────────────────────┼───────────────┼─────────┤
│ retraining-human-triggered │ google-doc │ 15 │
│ │ google-drive │ 18 │
│ │ google-music │ 15 │
│ │ google-search │ 15 │
│ │ youtube │ 20 │
│ │ __total__ │ 83 │
├─────────────────────────────┼───────────────┼─────────┤
│ retraining-script-triggered │ google-doc │ 30 │
│ │ google-drive │ 30 │
│ │ google-music │ 30 │
│ │ google-search │ 30 │
│ │ youtube │ 30 │
│ │ __total__ │ 150 │
└─────────────────────────────┴───────────────┴─────────┘
While to obtain the breakdown of the first train split
Output
...or the human test split