Samples count report
An important dataset property to check before modeling is the number of samples available for each class in the dataset.
You can easily retrieve this using the datasets samples-count subcommand.
For instance, the following command computes the samples count for the unfiltered version of the ucdavis-icdm19 dataset.
Output
unfiltered
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━┳━━━━━━━━━┓
┃ partition                   ┃ app           ┃ samples ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━╇━━━━━━━━━┩
│ pretraining                 │ google-doc    │    1221 │
│                             │ google-drive  │    1634 │
│                             │ google-music  │     592 │
│                             │ google-search │    1915 │
│                             │ youtube       │    1077 │
│                             │ __total__     │    6439 │
├─────────────────────────────┼───────────────┼─────────┤
│ retraining-human-triggered  │ google-doc    │      15 │
│                             │ google-drive  │      18 │
│                             │ google-music  │      15 │
│                             │ google-search │      15 │
│                             │ youtube       │      20 │
│                             │ __total__     │      83 │
├─────────────────────────────┼───────────────┼─────────┤
│ retraining-script-triggered │ google-doc    │      30 │
│                             │ google-drive  │      30 │
│                             │ google-music  │      30 │
│                             │ google-search │      30 │
│                             │ youtube       │      30 │
│                             │ __total__     │     150 │
└─────────────────────────────┴───────────────┴─────────┘
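For readers who want to reproduce this kind of breakdown outside the command line, the following is a minimal sketch using pandas. It assumes the samples are already loaded into a DataFrame with one row per sample and with columns named partition and app; those column names, the samples_count helper, and the toy data are hypothetical and chosen only for illustration. This is not the subcommand's actual implementation.

```python
import pandas as pd


def samples_count(df, partition_col="partition", class_col="app"):
    """Count samples per (partition, class) and append a per-partition total.

    Hypothetical helper: the column names are assumptions, not a documented schema.
    """
    rows = []
    for partition, group in df.groupby(partition_col, sort=True):
        # Number of samples for each class within this partition, sorted by class name.
        per_class = group[class_col].value_counts().sort_index()
        for app, n in per_class.items():
            rows.append((partition, app, int(n)))
        # Summary row mirroring the __total__ entries of the report.
        rows.append((partition, "__total__", int(per_class.sum())))
    return pd.DataFrame(rows, columns=[partition_col, class_col, "samples"])


# Toy data for illustration only; these are not real dataset samples.
toy = pd.DataFrame(
    {
        "partition": ["pretraining"] * 3 + ["retraining-human-triggered"] * 2,
        "app": ["youtube", "youtube", "google-doc", "youtube", "google-music"],
    }
)
report = samples_count(toy)
print(report.to_string(index=False))
```

Comparing each class count against its partition total is a quick way to spot class imbalance, such as google-music having roughly a third of the pretraining samples of google-search in the table above.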
To obtain instead the breakdown of the first train split:
Output
...or of the human test split: