Skip to content

ucdavis-icdm19

Below we report the samples count for each version of the dataset.

Semantic of the splits

The split available for this datasets relate to our IMC23 paper.

unfiltered

The unfitered version contains all data before curation.

tcbench datasets samples-count --name ucdavis-icdm19

Output

unfiltered
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━┳━━━━━━━━━┓
┃ partition                   ┃ app           ┃ samples ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━╇━━━━━━━━━┩
│ pretraining                 │ google-doc    │    1221 │
│                             │ google-drive  │    1634 │
│                             │ google-music  │     592 │
│                             │ google-search │    1915 │
│                             │ youtube       │    1077 │
│                             │ __total__     │    6439 │
├─────────────────────────────┼───────────────┼─────────┤
│ retraining-human-triggered  │ google-doc    │      15 │
│                             │ google-drive  │      18 │
│                             │ google-music  │      15 │
│                             │ google-search │      15 │
│                             │ youtube       │      20 │
│                             │ __total__     │      83 │
├─────────────────────────────┼───────────────┼─────────┤
│ retraining-script-triggered │ google-doc    │      30 │
│                             │ google-drive  │      30 │
│                             │ google-music  │      30 │
│                             │ google-search │      30 │
│                             │ youtube       │      30 │
│                             │ __total__     │     150 │
└─────────────────────────────┴───────────────┴─────────┘

First training split

tcbench datasets samples-count --name ucdavis-icdm19 --split 0

Output

filtered, split: 0
┏━━━━━━━━━━━━━━━┳━━━━━━━━━┓
┃ app           ┃ samples ┃
┡━━━━━━━━━━━━━━━╇━━━━━━━━━┩
│ google-doc    │     100 │
│ google-drive  │     100 │
│ google-music  │     100 │
│ google-search │     100 │
│ youtube       │     100 │
├───────────────┼─────────┤
│ __total__     │     500 │
└───────────────┴─────────┘

human test split

This is equivalent to the human partition of the unfiltered dataset.

tcbench datasets samples-count --name ucdavis-icdm19 --split human

Output

filtered, split: human
┏━━━━━━━━━━━━━━━┳━━━━━━━━━┓
┃ app           ┃ samples ┃
┡━━━━━━━━━━━━━━━╇━━━━━━━━━┩
│ youtube       │      20 │
│ google-drive  │      18 │
│ google-doc    │      15 │
│ google-music  │      15 │
│ google-search │      15 │
├───────────────┼─────────┤
│ __total__     │      83 │
└───────────────┴─────────┘

script test split

This is equivalent to the script partition of the unfiltered dataset.

tcbench datasets samples-count --name ucdavis-icdm19 --split script

Output

filtered, split: script
┏━━━━━━━━━━━━━━━┳━━━━━━━━━┓
┃ app           ┃ samples ┃
┡━━━━━━━━━━━━━━━╇━━━━━━━━━┩
│ google-doc    │      30 │
│ google-drive  │      30 │
│ google-music  │      30 │
│ google-search │      30 │
│ youtube       │      30 │
├───────────────┼─────────┤
│ __total__     │     150 │
└───────────────┴─────────┘