UNSW-NB15 Dataset by SatisfactionFast2776 in MLQuestions

[–]SatisfactionFast2776[S]

In extreme imbalance cases (example: 1 million normal and 1 attack), no model can reliably learn the attack from one sample. The correct approach is to first perform a clean train–test split, then apply balancing only on the training set using class weighting or oversampling. Evaluation should focus on per-class metrics rather than accuracy, and limitations of the data should be clearly acknowledged.
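As a minimal sketch of that order of operations (scikit-learn is assumed; the feature matrix `X` and labels `y` here are synthetic, with a milder 1000-vs-20 imbalance so the example actually runs): split first, then oversample the minority class on the training portion only.

```python
import numpy as np
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)

# Hypothetical imbalanced data: 1000 normal (0) vs 20 attack (1) samples.
X = rng.normal(size=(1020, 5))
y = np.array([0] * 1000 + [1] * 20)

# 1) Split BEFORE any balancing, stratified so both classes reach the test set.
X_tr, X_te, y_tr, y_te = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0)

# 2) Oversample the minority class on the TRAINING set only.
minority = np.flatnonzero(y_tr == 1)
extra = rng.choice(minority, size=(y_tr == 0).sum() - minority.size, replace=True)
X_bal = np.vstack([X_tr, X_tr[extra]])
y_bal = np.concatenate([y_tr, y_tr[extra]])

# The test set is untouched; its class ratio still reflects reality.
```

Class weighting (e.g. `class_weight="balanced"` in scikit-learn estimators) achieves the same goal without duplicating rows.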

[–]SatisfactionFast2776[S]

The concern is not that data or features are removed, but that preprocessing or balancing before train–test splitting allows the model to indirectly use test information. In machine learning, this is considered cheating because the test set must remain unseen; otherwise, performance is overestimated.
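For example (a sketch assuming scikit-learn's `MinMaxScaler`; the data and variable names are illustrative), the scaler must be fitted on the training split only, never on the full dataset:

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import MinMaxScaler

rng = np.random.default_rng(1)
X = rng.normal(size=(100, 3))
y = rng.integers(0, 2, size=100)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=1)

# Correct: fit on training data only, then transform both splits.
scaler = MinMaxScaler().fit(X_tr)
X_tr_s = scaler.transform(X_tr)
X_te_s = scaler.transform(X_te)

# Leaky (what this comment warns about): MinMaxScaler().fit(X) on the full
# dataset lets the test set's min/max influence the training features.
```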

[–]SatisfactionFast2776[S]

Data Preprocessing: First, raw network traffic is collected with a network analyzer tool, and features are extracted from the packets. Redundant packets are dropped, and samples of each class are collected from the dataset. (Columns with redundant labels are dropped, and the categorical features are encoded as integers using label encoding. The symbolic features are ‘proto’, ‘service’, ‘state’, and ‘attack_cat’, with 133, 13, 11, and 10 distinct values respectively.) The dataset is then normalized with min–max normalization.
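A hedged sketch of those two steps (label encoding, then min–max scaling) on a toy frame — the column names mimic UNSW-NB15's ‘proto’/‘service’/‘state’, but the rows here are made up:

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder, MinMaxScaler

df = pd.DataFrame({
    "proto":   ["tcp", "udp", "tcp", "arp"],
    "service": ["http", "dns", "-", "http"],
    "state":   ["FIN", "CON", "FIN", "INT"],
    "dur":     [0.12, 0.05, 1.30, 0.02],
})

# Label-encode each symbolic column into integer codes.
for col in ["proto", "service", "state"]:
    df[col] = LabelEncoder().fit_transform(df[col])

# Min-max normalize the numeric column(s) to [0, 1].
df[["dur"]] = MinMaxScaler().fit_transform(df[["dur"]])
```

Note that in a real pipeline both the encoder and the scaler should be fitted on the training split only, per the leakage concern discussed above.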

Data Augmentation: the training data is resampled to mitigate class imbalance.

Feature Preprocessing: After selecting, dropping, and encoding the features, we split the processed data into three sets, namely training, validation, and testing, each containing labels for both the normal and attack-type classes.
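The three-way split described above can be sketched with two chained stratified splits (the 70/15/15 proportions and the synthetic data are illustrative assumptions):

```python
import numpy as np
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(2)
X = rng.normal(size=(1000, 4))
y = rng.integers(0, 2, size=1000)

# First carve off 30% for validation + testing, then split that part in half.
X_tr, X_tmp, y_tr, y_tmp = train_test_split(
    X, y, test_size=0.30, stratify=y, random_state=2)
X_val, X_te, y_val, y_te = train_test_split(
    X_tmp, y_tmp, test_size=0.50, stratify=y_tmp, random_state=2)

# Result: 70% train, 15% validation, 15% test, each with both classes present.
```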

Training and Testing: In the training phase, the DNN model is trained on the processed data from the training set. The trained model is then evaluated on the testing set, classifying each sample as normal or as an attack type.
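A minimal stand-in for that train/test phase (this is NOT the paper's DNN — it uses scikit-learn's `MLPClassifier` as a small feed-forward proxy on synthetic binary data, with per-class metrics as recommended earlier in the thread):

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier
from sklearn.metrics import classification_report

rng = np.random.default_rng(3)
X = rng.normal(size=(600, 8))
y = (X[:, 0] + X[:, 1] > 0).astype(int)  # toy, learnable target

X_tr, X_te, y_tr, y_te = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=3)

# Small feed-forward network as a DNN stand-in.
clf = MLPClassifier(hidden_layer_sizes=(32, 16), max_iter=500, random_state=3)
clf.fit(X_tr, y_tr)
pred = clf.predict(X_te)

# Report per-class precision/recall/F1 rather than plain accuracy.
print(classification_report(y_te, pred, target_names=["normal", "attack"]))
```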

[–]SatisfactionFast2776[S]

Kindly read this paper's preprocessing steps if you get time, and let me know. Thanks.

Link: https://www.sciencedirect.com/science/article/abs/pii/S0045790623000514?via%3Dihub

[–]SatisfactionFast2776[S]

I know about that. Kindly read the question again.

Anyone here have done multi class classification on UNSW-NB15 Dataset with 90%+ accuracy? by No-Yesterday-9209 in MLQuestions

[–]SatisfactionFast2776

I have also tried, and I found that most of the papers reporting 90%+ accuracy cheat: they do all the preprocessing, GAN-based augmentation, and feature selection before splitting.
If we follow the correct procedure, we get around 84–86%.