This dataset directory contains the 'infected' and 'not infected' samples and the models used for each T configuration, each in a separate folder.
Each folder (for example 10s/) contains 8 files:
- tree.py -> Python script with the Tree model.
- ensemble.json -> JSON file with the information about the Ensemble model.
- NN_XhiddenLayer.json -> JSON file with the information about the NN model with X hidden layers (one file each for X = 1, 2 and 3).
- train_data.csv -> All samples used for training each model in this folder. It is in CSV format for use with the BigML application.
- zeroDays.csv -> All zero-day samples used for testing each model in this folder. It is in CSV format for use with the BigML application.
- FPtest_data.csv -> All samples used for validating each model in this folder. It is in CSV format for use with the BigML application.
The models in each folder are the configurations obtained by training on the train_data.csv file. Validation was done with the FPtest_data.csv file, and the data loss was measured with the zeroDays.csv file.
The files containing samples (train_data.csv, zeroDays.csv and FPtest_data.csv) are structured as follows:
- Each line is one sample.
- Each sample has 3*T features followed by the label. For T = 10, for example, columns 1-10 contain the number of bytes sent from client to server in each of the T intervals (ordered by time), columns 11-20 contain the number of bytes sent from server to client in those intervals, columns 21-30 contain the number of short commands in those intervals, and column 31 is the label (1 for an 'infected' sample, 0 otherwise).
- The features are separated by commas (','), as in any CSV file.
- The last column is the label of the sample.
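The column layout described above can be sketched in Python as follows. This is only an illustrative reader, not part of the dataset: the function names split_sample and read_samples and the dictionary keys are hypothetical, and the sketch assumes that the files have no header row and that all feature values are numeric. If a file does start with a header line, skip it before parsing.

```python
import csv
import io


def split_sample(row, T):
    """Split one CSV row into the three feature groups and the label.

    Assumes the layout described above: 3*T features followed by the label.
    """
    if len(row) != 3 * T + 1:
        raise ValueError(f"expected {3 * T + 1} columns, got {len(row)}")
    values = [float(v) for v in row[:3 * T]]
    return {
        # Hypothetical key names, chosen only for this example.
        "bytes_client_to_server": values[0:T],
        "bytes_server_to_client": values[T:2 * T],
        "short_commands": values[2 * T:3 * T],
        "label": int(float(row[3 * T])),  # 1 = 'infected', 0 = not infected
    }


def read_samples(lines, T):
    """Parse an iterable of CSV lines (e.g. an open file) into samples."""
    return [split_sample(row, T) for row in csv.reader(lines) if row]


# Made-up row with T = 2, used only to show the call; not real dataset values.
example = io.StringIO("100,250,900,40,3,0,1\n")
samples = read_samples(example, T=2)
```

For a real file the same function can be applied to an open file object, for example read_samples(open("10s/train_data.csv", newline=""), T=10), assuming the folder 10s/ uses T = 10 as in the column example above.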
Download the compressed dataset (44MB)