Concept Drift Datasets

Contents

  1. Referred Papers
  2. Datasets Used in [1] and [2]
  3. The PKLot for Concept Drift Scenarios (Used in [1] and [7])
  4. Datasets Used in [7]

Referred Papers

  • [1] Almeida, P. R. L., Oliveira, L. S., Britto Jr, A. S., Sabourin, R. Adapting Dynamic Classifier Selection for Concept Drift. Expert Systems with Applications, 2018. See the publication.
  • [2] Almeida, P., Oliveira, L. S., Britto Jr, A., Sabourin, R. Handling Concept Drifts Using Dynamic Selection of Classifiers. IEEE International Conference on Tools with Artificial Intelligence, San Jose, USA, 2016. See the publication.
  • [3] Almeida, P. R. L., Oliveira, L. S., Britto Jr, A. S., Silva Jr, E., Koerich, A. PKLot - A Robust dataset for parking lot classification. Expert Systems with Applications, 42(11):4937-4949, 2015. See the publication.
  • [4] Almeida, P. R. L., Oliveira, L. S., Britto Jr, A. S., Sabourin, R. Adapting Dynamic Classifier Selection for Concept Drift. Expert Systems with Applications, 2018.
  • [5] Ojala, T., Pietikainen, M., & Maenpaa, T. (2002, Jul). Multiresolution gray-scale and rotation invariant texture classification with local binary patterns. IEEE Transactions on Pattern Analysis and Machine Intelligence, 24(7), 971-987. doi: 10.1109/TPAMI.2002.1017623
  • [7] Almeida, P. R. L., Oliveira, L. S., Britto Jr, A. S., Barddal, J. P. Naïve Approaches to Deal With Concept Drifts.


Datasets Used in [1] and [2]

Nebraska-Norm - The normalized version of the Nebraska Weather Dataset (the original version can be found here).

Checkerboard - The arff converted files from the checkerboard datasets (the original version can be found here).

Gaussian - The arff converted files from the Gaussian datasets (the original version can be found here).

Forest Covertype - The normalized arff of the Forest Covertype dataset (note: this dataset is hosted in the MOA repository).

Digit-Norm - The normalized version of the Digit Dataset (the original version can be found here).

Digit - The non-normalized version of the Digit Dataset (in this version the data were just put in the arff format).

Letters-Norm - The normalized version of the Letters Dataset (the original version can be found here).

Letters - The non-normalized version of the Letters Dataset (in this version the data were just put in the arff format).



The PKLot for Concept Drift Scenarios (Used in [1] and [7])

About The Protocol

This protocol uses the real-world PKLot [3] problem as a concept drift benchmark, as discussed in [4].

License

The PKLot database is licensed under a Creative Commons Attribution 4.0 International License.

Protocol Definition

  • The problem is defined as classifying each individual parking space as vacant or occupied.
  • The LBP uniform [5] is defined as the feature set.
  • Days containing fewer than 50 samples for each class (vacant or occupied) in the original dataset are not considered. These days have already been removed from this version of the benchmark.
  • The parking lots are presented in the order UFPR04, UFPR05 and PUCPR. The images collected in each parking lot are in chronological order, and each day represents a time step. Thus, the time steps are: Day1_UFPR04, ... , Last_UFPR04, Day1_UFPR05, ... , Last_UFPR05, Day1_PUCPR, ... , Last_PUCPR. This configuration generates a camera position change (UFPR04 to UFPR05) and then a parking lot change (UFPR05 to PUCPR).
  • At each time step, all instances of the current day must be classified. Also, at each time step (day), 50 samples from each class from the previous day are randomly selected for training. This configuration simulates a scenario where a human supervisor may label a small batch to update the classification system.
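
The evaluation loop described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the code used in [1] or [7]: the names sample_per_class, run_protocol, update and predict are hypothetical, each day is assumed to be already loaded as a (features, labels) pair, and the first day is assumed to serve only as training data because it has no previous day.

```python
import random
from collections import defaultdict


def sample_per_class(labels, per_class=50, rng=None):
    """Return indices of `per_class` randomly chosen instances for each label."""
    if rng is None:
        rng = random.Random()
    by_label = defaultdict(list)
    for idx, label in enumerate(labels):
        by_label[label].append(idx)
    chosen = []
    for label in sorted(by_label):
        # Raises ValueError if a class has fewer than `per_class` samples;
        # the benchmark already excludes such days.
        chosen.extend(rng.sample(by_label[label], per_class))
    return chosen


def run_protocol(days, update, predict, per_class=50, seed=0):
    """Run the day-by-day protocol and return one accuracy per classified day.

    days    : list of (features, labels) pairs in protocol order
    update  : hypothetical callback update(X, y) that trains or adapts the model
    predict : hypothetical callback predict(X) returning one label per instance
    """
    rng = random.Random(seed)
    accuracies = []
    # Assumption: day 0 has no previous day, so classification starts at day 1.
    for t in range(1, len(days)):
        prev_X, prev_y = days[t - 1]
        idx = sample_per_class(prev_y, per_class, rng)
        # Small labeled batch from the previous day, as a supervisor would provide.
        update([prev_X[i] for i in idx], [prev_y[i] for i in idx])
        X, y = days[t]
        predicted = predict(X)
        correct = sum(p == a for p, a in zip(predicted, y))
        accuracies.append(correct / len(y))
    return accuracies
```

Any classifier can be plugged in through the two callbacks. Fixing the seed keeps the random selection of the labeled samples reproducible across runs.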

Download

You may download the files containing the LBP uniform features, already extracted and ordered according to the proposed protocol, here.

When you extract the tar.gz file, you will find several directories. The number in parentheses in each directory name gives the order in which the data should be fed: first the days in UFPR04, then UFPR05, and finally PUC. Each day is a directory named with its order number in parentheses and the date on which it was collected, in the format YYYY-MM-DD. Inside each day's directory you will find the LBP uniform features extracted from each individual parking space, in the Weka/MOA (arff) format.
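
Because the order is encoded as a number in parentheses, a plain alphabetical listing can put "(10)" before "(2)". The following Python sketch sorts the subdirectories of a given directory by that number. It assumes only that each directory name contains a marker of the form "(N)"; the exact naming (for example "(1) 2012-12-07") and the function names order_key and ordered_subdirs are hypothetical, so check them against the extracted archive.

```python
import re
from pathlib import Path

# Assumption: the order marker is the first "(digits)" group in the directory name.
ORDER_RE = re.compile(r"\((\d+)\)")


def order_key(name):
    """Extract the integer order from a directory name containing '(N)'."""
    match = ORDER_RE.search(name)
    if match is None:
        raise ValueError(f"no '(N)' order marker in {name!r}")
    return int(match.group(1))


def ordered_subdirs(parent):
    """Return the immediate subdirectories of `parent`, sorted by their '(N)' marker."""
    dirs = [p for p in Path(parent).iterdir() if p.is_dir()]
    return sorted(dirs, key=lambda p: order_key(p.name))
```

The same function can be applied at each level that uses numbered directories, and the arff files in a day directory can then be listed with sorted(day_dir.glob("*.arff")).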



Datasets Used in [7]

Electricity - The normalized arff of the Electricity dataset (note: this dataset is hosted in the MOA repository).

Nebraska - The Nebraska Weather Dataset in the arff format (the original version can be found here).

Forest Covertype - The normalized arff of the Forest Covertype dataset (note: this dataset is hosted in the MOA repository).

Poker-Hand - The Poker-Hand dataset (note: this dataset is hosted in the MOA repository).

Airlines - The airlines dataset (note: this dataset is hosted in the MOA repository).

PKLot - The PKLot for Concept Drift Scenarios (Described Above).