The following datasets of historical SCADA operations are provided:
- Training set 1: Six months of data preceding the installation of the smart devices (September 2015 to February 2016). These data are guaranteed to be attack-free and can be used to study normal system operation. (Data availability: September 9, 2016.)
- Training set 2: A few months of data following the installation of the smart devices (April to June 2016). This dataset contains the attacks that caused anomalously low levels in tank T5 (13-18 May) and anomalously high levels in tank T1 (21-22 May); CPU engineers discovered these attacks and labeled them accordingly. The last month includes the attack that caused tank T1 to overflow, which the engineers were able to label only once the overflow occurred (22-23 June). They also suspect that the remaining data in this dataset contain further, unlabeled attacks. (Data availability: September 9, 2016.) Additional data may be released after the abstract submission (October 2, 2016).
- Training set 3: This unlabeled dataset will be released on November 20, 2016. It contains no attacks and was generated with a one-year simulation. It is thus similar to Training set 1; the only difference lies in the water demand patterns, which have been modified slightly. Use this dataset while developing your attack identification algorithms.
- Training set 4: This partially labeled dataset was released on November 28, 2016. It contains several attacks, some of which are approximately labeled. It was generated with a one-year simulation using the same demand patterns as Training set 3.
- Test set: This unlabeled dataset will be released on February 20, 2017. It will be used to compare the performance of the algorithms quantitatively (see the rules document for details).
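The labeled attack windows of the second dataset can serve as partial ground truth when evaluating a detector. The following is a minimal sketch of how those windows might be encoded, assuming day-level, inclusive boundaries (the text gives only dates, not hours) and timestamped records. The names `LABELED_ATTACKS`, `attack_labels`, and `is_labeled_attack` are hypothetical helpers, not part of any provided tooling. Because the engineers suspect further attacks in the remaining data, and the T1 overflow attack was labeled only once the overflow occurred, a timestamp outside these windows should be read as "not labeled", not as "attack-free".

```python
from datetime import date, datetime
from typing import List

# Labeled attack windows in the second dataset, taken from the description.
# Assumption: boundaries are whole days and inclusive on both ends.
LABELED_ATTACKS = [
    ("T5 low level", date(2016, 5, 13), date(2016, 5, 18)),
    ("T1 high level", date(2016, 5, 21), date(2016, 5, 22)),
    ("T1 overflow", date(2016, 6, 22), date(2016, 6, 23)),
]


def attack_labels(ts: datetime) -> List[str]:
    """Return the names of all labeled windows whose date range contains ts."""
    day = ts.date()
    return [name for name, start, end in LABELED_ATTACKS if start <= day <= end]


def is_labeled_attack(ts: datetime) -> bool:
    """True if ts falls inside a labeled window.

    False only means "not labeled": unlabeled attacks may still be present.
    """
    return bool(attack_labels(ts))


if __name__ == "__main__":
    print(is_labeled_attack(datetime(2016, 5, 15, 10)))  # True
    print(attack_labels(datetime(2016, 6, 1)))  # []
```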