ucdavis-icdm19
Below we report the sample counts for each version of the dataset.
Semantics of the splits
The splits available for this dataset relate to our IMC23 paper.
unfiltered
The unfiltered version contains all data before curation.
Output
unfiltered
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━┳━━━━━━━━━┓
┃ partition                   ┃ app           ┃ samples ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━╇━━━━━━━━━┩
│ pretraining                 │ google-doc    │    1221 │
│                             │ google-drive  │    1634 │
│                             │ google-music  │     592 │
│                             │ google-search │    1915 │
│                             │ youtube       │    1077 │
│                             │ __total__     │    6439 │
├─────────────────────────────┼───────────────┼─────────┤
│ retraining-human-triggered  │ google-doc    │      15 │
│                             │ google-drive  │      18 │
│                             │ google-music  │      15 │
│                             │ google-search │      15 │
│                             │ youtube       │      20 │
│                             │ __total__     │      83 │
├─────────────────────────────┼───────────────┼─────────┤
│ retraining-script-triggered │ google-doc    │      30 │
│                             │ google-drive  │      30 │
│                             │ google-music  │      30 │
│                             │ google-search │      30 │
│                             │ youtube       │      30 │
│                             │ __total__     │     150 │
└─────────────────────────────┴───────────────┴─────────┘
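The `__total__` rows and the class balance of each partition can be re-derived from the per-app counts. The following is a minimal sketch, not part of the dataset tooling: the counts are copied by hand from the table above, and the names `COUNTS`, `partition_totals`, and `class_imbalance` are hypothetical helpers introduced only for illustration.

```python
# Per-app sample counts, copied by hand from the "unfiltered" table above.
# COUNTS and the helpers below are illustrative names, not part of any library.
COUNTS = {
    "pretraining": {
        "google-doc": 1221,
        "google-drive": 1634,
        "google-music": 592,
        "google-search": 1915,
        "youtube": 1077,
    },
    "retraining-human-triggered": {
        "google-doc": 15,
        "google-drive": 18,
        "google-music": 15,
        "google-search": 15,
        "youtube": 20,
    },
    "retraining-script-triggered": {
        "google-doc": 30,
        "google-drive": 30,
        "google-music": 30,
        "google-search": 30,
        "youtube": 30,
    },
}


def partition_totals(counts):
    """Return the total number of samples in each partition."""
    return {partition: sum(apps.values()) for partition, apps in counts.items()}


def class_imbalance(apps):
    """Return the ratio between the largest and the smallest class."""
    return max(apps.values()) / min(apps.values())


if __name__ == "__main__":
    totals = partition_totals(COUNTS)
    for partition, apps in COUNTS.items():
        ratio = class_imbalance(apps)
        print(f"{partition}: {totals[partition]} samples, imbalance {ratio:.2f}")
```

The imbalance ratio makes the difference between partitions explicit: the script-triggered partition is perfectly balanced, while the pretraining partition is not.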
First training split
Output
human test split
This is equivalent to the human partition of the unfiltered dataset.
Output
script test split
This is equivalent to the script partition of the unfiltered dataset.