Big Data Mining and Analytics  2021, Vol. 4 Issue (1): 18-24    DOI: 10.26599/BDMA.2020.9020019
Special Issue on Intelligent Recommendation System and Big Data Analysis     
Mathematical Validation of Proposed Machine Learning Classifier for Heterogeneous Traffic and Anomaly Detection
Azidine Guezzaz*, Younes Asimi, Mourade Azrour, Ahmed Asimi
Department of Computer Science and Mathematics, High School of Technology, Cadi Ayyad University, Essaouira 44000, Morocco.
Department of Computer Science, High School of Technology, Ibn Zohr University, Guelmim 81000, Morocco.
IDMS Team, Department of Computer Science, Faculty of Science and Technology, Moulay Ismail University, Errachidia 52000, Morocco.
Department of Computer Science and Mathematics, Faculty of Sciences Agadir, Ibn Zohr University, Agadir 80000, Morocco.

Abstract  

The modeling of an efficient classifier is a fundamental issue in automatic training involving a large volume of representative data. Hence, automatic classification is a major task that entails the use of training methods capable of assigning classes to data objects by using the input activities presented to learn classes. The recognition of new elements is possible based on predefined classes. Intrusion detection systems suffer from numerous vulnerabilities during analysis and classification of data activities. To overcome this problem, new analysis methods should be derived so as to implement a relevant system to monitor circulated traffic. The main objective of this study is to model and validate a heterogeneous traffic classifier capable of categorizing collected events within networks. The new model is based on a proposed machine learning algorithm that comprises an input layer, a hidden layer, and an output layer. A reliable training algorithm is proposed to optimize the weights, and a recognition algorithm is used to validate the model. Preprocessing is applied to the collected traffic prior to the analysis step. This work aims to describe the mathematical validation of a new machine learning classifier for heterogeneous traffic and anomaly detection.



Key words: anomaly detection, heterogeneous traffic, preprocessing, machine learning, training, classification
Received: 09 June 2020      Published: 12 January 2021
Corresponding Authors: Azidine Guezzaz     E-mail: A.GUZZAZ@gmail.com
About author:
Azidine Guezzaz received the MS degree in computer science and distributed systems from the Department of Mathematics and Computer Science, Faculty of Science, University Ibn Zohr, Agadir, Morocco in 2013. He received the PhD degree from the Faculty of Science, University Ibn Zohr, Agadir, Morocco in 2018. He was a professor at the Technology High School and BTS from 2014 to 2018. He then joined Cadi Ayyad University in 2018 as an assistant professor. His main research interests are intrusion detection and prevention, computer and network security, and cryptography.
Younes Asimi received the PhD degree from Ibn Zohr University in 2015. He has been an assistant professor in computer science at Ibn Zohr University since 2018. His research interests include authentication protocols, computer and network security, and cryptography.
Mourade Azrour received the PhD degree from the Faculty of Sciences and Technologies, Moulay Ismail University, Errachidia, Morocco in 2019, and the MS degree in computer and distributed systems from the Faculty of Sciences, Ibn Zohr University, Agadir, Morocco in 2014. He currently works as a computer science professor at the Department of Computer Science, Faculty of Sciences and Technologies, Moulay Ismail University. His research interests include authentication protocols, computer security, Internet of Things, and smart systems. He is a scientific committee member of numerous international conferences. He is also a reviewer for various scientific journals, such as International Journal of Cloud Computing and International Journal of Cyber-Security and Digital Forensics (IJCSDF).
Ahmed Asimi received the PhD degree in number theory from the University Mohammed V Agdal in 2001. He has been a full professor at the Faculty of Science Agadir, Ibn Zohr University, Morocco since 2008. He is a reviewer for International Journal of Network Security (IJNS). His research interests include number theory, code theory, and computer cryptology and security.
Cite this article:

Azidine Guezzaz, Younes Asimi, Mourade Azrour, Ahmed Asimi. Mathematical Validation of Proposed Machine Learning Classifier for Heterogeneous Traffic and Anomaly Detection. Big Data Mining and Analytics, 2021, 4(1): 18-24.

URL:

http://bigdata.tsinghuajournals.com/10.26599/BDMA.2020.9020019     OR     http://bigdata.tsinghuajournals.com/Y2021/V4/I1/18

Notation | Meaning
$f$ | Sigmoid function
$(X_i)_{i=1,\dots,n}$ | Presented inputs
$X_i=(x_{i,j})_{j=1,\dots,m}$ | Occurrences presented for input $X_i$
$W^{(0)}=(w_{i,0})_{i=1,\dots,n}$ | Initial weights
$W_i=(w_{i,j})_{j=1,\dots,m}$ | Model weights, initialized randomly, associated with input $X_i$
$w_{0,i}$ | Bias, initialized to 1, associated with input $X_i$
$a_i$ | Weighted sum associated with input $X_i$
$y(a_i)=f(a_i)$ | Output associated with input $X_i$
$\varepsilon_i$ | Error associated with input $X_i$
$\varepsilon=\max\{\varepsilon_i,\ i=1,\dots,n\}$ | Maximum error over the inputs $(X_i)_{i=1,\dots,n}$
$W_i^{(op)}=(w_{i,j}^{(op)})_{j=1,\dots,m}$ | Optimal system solution (training algorithm) for input $X_i$
$w_j^{(max)}=\max\{w_{i,j}^{(op)},\ i=1,\dots,n\}$ | Maximum weight over the inputs $X_i$
$a_i^{(max)}=\sum_{j=1}^{m} w_j^{(max)} x_{i,j}+w_0^{(max)}$ | Maximum weighted sum associated with input $X_i$
$a_i^{(op)}=\sum_{j=1}^{m} w_{i,j}^{(op)} x_{i,j}+w_{0,i}^{(op)}$ | Optimal weighted sum associated with input $X_i$
$W^{(max)}=(w_j^{(max)})_{j=1,\dots,m}$ | Maximum weights
$w_0^{(max)}=\max\{w_{0,i}^{(op)},\ i=1,\dots,n\}$ | Maximum bias
$d=+1$ | Normal output
$d=-1$ | Abnormal output
Table 1 Notations used in the study.
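The quantities in Table 1 can be illustrated with a short Python sketch. It is only a sketch under stated assumptions: the input values, optimal weights, and biases below are hypothetical numbers chosen for illustration, and the helper names (`f`, `weighted_sum`) are not taken from the paper. It shows how the maximum weights, the maximum bias, and the two weighted sums relate to each other.

```python
import math


def f(a):
    """Sigmoid function f from Table 1."""
    return 1.0 / (1.0 + math.exp(-a))


def weighted_sum(w, w0, x):
    """Weighted sum: sum_j w_j * x_j + w0."""
    return sum(wj * xj for wj, xj in zip(w, x)) + w0


# Hypothetical example with n = 2 inputs and m = 3 occurrences each.
X = [[0.5, 1.0, 0.0],
     [1.0, 0.0, 2.0]]
# Assumed optimal weights W_i^(op) and biases w_{0,i}^(op), one per input.
W_op = [[0.2, -0.4, 0.1],
        [0.3, 0.1, -0.2]]
w0_op = [1.0, 0.5]

# w_j^(max) = max over i of w_{i,j}^(op); w_0^(max) = max over i of w_{0,i}^(op).
W_max = [max(column) for column in zip(*W_op)]
w0_max = max(w0_op)

# a_i^(op) uses the weights of input i; a_i^(max) uses the maximum weights.
a_op = [weighted_sum(W_op[i], w0_op[i], X[i]) for i in range(len(X))]
a_max = [weighted_sum(W_max, w0_max, x) for x in X]

# Outputs y(a) = f(a) for both weighted sums.
y_op = [f(a) for a in a_op]
y_max = [f(a) for a in a_max]
```

Because the example inputs are nonnegative and each maximum weight is at least as large as the corresponding optimal weight, every `a_max[i]` is at least `a_op[i]` here; that ordering is a property of this toy data, not a claim about the paper's results.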
Method | Training type | Classification type | Nature of data | Convergence | Accuracy | Algorithm goal
SVM | Supervised | Linear | Small size | Fast | Average | Find the best hyperplane separator.
MLP | Supervised, unsupervised | Linear, nonlinear | Large size, incomplete | Slow | High | Minimize the error between result and desired output.
KNN | Supervised | Linear | Small size | Fast | Average | Predict the values of new data points.
Table 2 Performance of studied classification methods.
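To make the KNN row of Table 2 concrete, the following minimal Python sketch predicts the label of a new data point by majority vote among its nearest training points. It is a generic k-nearest-neighbors illustration, not the classifier proposed in the paper; the two-feature training points, the value of `k`, and the function name `knn_predict` are assumptions. Labels follow the Table 1 convention (+1 normal, -1 abnormal).

```python
import math
from collections import Counter


def knn_predict(train, query, k=3):
    """Return the majority label among the k training points closest to query.

    train is a list of (features, label) pairs; label is +1 or -1.
    """
    nearest = sorted(train, key=lambda item: math.dist(item[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


# Hypothetical training data: three normal points and three abnormal points.
train = [
    ((0.0, 0.0), +1), ((1.0, 0.0), +1), ((0.0, 1.0), +1),
    ((5.0, 5.0), -1), ((6.0, 5.0), -1), ((5.0, 6.0), -1),
]

normal_guess = knn_predict(train, (0.5, 0.5))
abnormal_guess = knn_predict(train, (5.5, 5.5))
```

An odd `k` avoids tied votes when there are only two classes.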
Fig. 1 Proposed classifier model.
Fig. 2 Training database structure.