Big Data Mining and Analytics  2021, Vol. 4 Issue (1): 33-46    DOI: 10.26599/BDMA.2020.9020023
Special Issue on Intelligent Recommendation System and Big Data Analysis     
Improvement in Automated Diagnosis of Soft Tissues Tumors Using Machine Learning
El Arbi Abdellaoui Alaoui, Stéphane Cédric Koumetio Tekouabou*, Sri Hartini, Zuherman Rustam, Hassan Silkan, Said Agoujil
Department of Computer Sciences, Faculty of Sciences and Technologies, My Ismail University, Errachidia 52000, Morocco.
Department of Computer Science, Faculty of Sciences, Chouaib Doukkali University, El Jadida 24000, Morocco.
Department of Mathematics, Universitas Indonesia, Depok 16424, Indonesia.

Abstract  

Soft Tissue Tumors (STT) are a form of sarcoma found in tissues that connect, support, and surround body structures. Because of their low frequency in the body and their great diversity, they appear heterogeneous when observed through Magnetic Resonance Imaging (MRI). They are easily confused with other diseases such as fibroadenoma mammae, lymphadenopathy, and struma nodosa, and these diagnostic errors have a considerable detrimental effect on the medical treatment of patients. Researchers have proposed several machine learning models to classify tumors, but none have adequately addressed this misdiagnosis problem. Moreover, similar studies that have proposed models for evaluating such tumors mostly do not consider the heterogeneity and the size of the data. Therefore, we propose a machine learning-based approach that combines a new data preprocessing technique for feature transformation, resampling techniques to reduce bias and instability, and classifier tests based on the Support Vector Machine (SVM) and Decision Tree (DT) algorithms. The tests carried out on a dataset collected at Nur Hidayah Hospital of Yogyakarta in Indonesia show a great improvement compared to previous studies. These results confirm that machine learning methods could provide efficient and effective tools to reinforce the automatic decision-making processes of STT diagnostics.
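The abstract does not say which resampling technique is used. As an illustration only, the sketch below assumes simple random oversampling of the minority class, one common way to reduce class-imbalance bias before training a classifier. The function name `random_oversample` and the toy values are hypothetical and are not taken from the paper or the hospital dataset.

```python
import random
from collections import Counter


def random_oversample(rows, labels, seed=0):
    """Duplicate randomly chosen minority-class rows until every class
    has as many samples as the largest class (assumed technique)."""
    rng = random.Random(seed)
    counts = Counter(labels)
    target = max(counts.values())
    out_rows, out_labels = list(rows), list(labels)
    for cls, n in counts.items():
        # Candidate rows to duplicate for this class.
        pool = [r for r, y in zip(rows, labels) if y == cls]
        for _ in range(target - n):
            out_rows.append(rng.choice(pool))
            out_labels.append(cls)
    return out_rows, out_labels


# Toy data with hypothetical values (two blood-test features per patient).
rows = [[5.1, 13.2], [7.8, 11.0], [6.4, 12.5], [9.9, 10.1], [4.7, 14.0]]
labels = ["STT", "STT", "STT", "STT", "other"]

balanced_rows, balanced_labels = random_oversample(rows, labels)
class_counts = Counter(balanced_labels)
```

In practice, resampling of this kind is usually applied to the training portion only, so that duplicated samples do not leak into the data used for evaluation.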



Key words: classification; soft tissues tumours; preprocessing techniques; Support Vector Machine (SVM); Decision Tree (DT); machine learning; predictive diagnosis
Received: 01 August 2020      Published: 12 January 2021
Corresponding Authors: Stéphane Cédric Koumetio Tekouabou     E-mail: abdellaoui.e@gmail.com;ctekouaboukoumetio@gmail.com;sri.hartini@sci.ui.ac.id;rustam@sci.ui.ac.id;silkan.h@ucd.ac.ma;agoujil@gmail.com
About author:
El Arbi Abdellaoui Alaoui received the PhD degree in computer science from the Faculty of Sciences and Technology, Errachidia, University of Moulay Ismaïl, Meknès, Morocco in 2017. Prior to this, he received the master's degree in telecommunication from the National School of Applied Sciences, University of Sidi Mohamed Ben Abdallah, Fès, Morocco in 2013. He is currently a research professor at EIGSI, Casablanca and My Ismail University, Errachidia, Morocco. His research publications mainly cover wireless networking, ad hoc networking, DTN networks, game theory, Internet of Things (IoT), smart cities, and optimisation.
Stéphane Cédric Koumetio Tekouabou received the PhD degree from the Faculty of Sciences, Chouaib Doukkali University, El Jadida, Morocco in 2020. He obtained the MEng degree from the National School of Applied Sciences El Jadida (ENSAJ) of the same university in 2016. He has also participated in the scientific committees of many international conferences. His research interests include computer vision (recognition, detection, and classification problems), artificial intelligence, machine learning, deep learning, and optimization.
Sri Hartini received the bachelor's and master's degrees from Universitas Indonesia in 2019 and 2020, respectively. She is currently pursuing the PhD degree in intelligence computation at Universitas Indonesia. She conducts research on machine learning, computer vision, neural networks, and deep learning in various fields.
Zuherman Rustam is an associate professor and a lecturer in intelligence computation at the Department of Mathematics, Universitas Indonesia. He obtained the master's degree in informatics from the Paris Diderot University, France in 1989, and completed the PhD degree in computer science at Universitas Indonesia in 2006. His research interests are machine learning, pattern recognition, neural networks, and artificial intelligence.
Hassan Silkan received the PhD degree from Sidi Mohamed Ben Abdellah University, Fès, Morocco in 2009. He is a professor at the Department of Computer Science, Faculty of Sciences, Chouaib Doukkali University, El Jadida, Morocco. His research areas are shape representation and description, similarity search, content-based image retrieval, database indexing, 2D/3D shape indexing and retrieval, and multimedia databases.
Said Agoujil received the PhD and MS degrees in mathematics from the Faculty of Sciences and Technology of Marrakech (FSTM), Morocco in 2008 and 2004, respectively. He is currently a professor at the Department of Computer Science, Faculty of Sciences and Technology, My Ismail University, Errachidia, Morocco. His current research interests include numerical analysis, wireless networks, linear algebra, and speech coding.
Cite this article:

El Arbi Abdellaoui Alaoui,Stéphane Cédric Koumetio Tekouabou,Sri Hartini,Zuherman Rustam,Hassan Silkan,Said Agoujil. Improvement in Automated Diagnosis of Soft Tissues Tumors Using Machine Learning. Big Data Mining and Analytics, 2021, 4(1): 33-46.

URL:

http://bigdata.tsinghuajournals.com/10.26599/BDMA.2020.9020023     OR     http://bigdata.tsinghuajournals.com/Y2021/V4/I1/33

Fig. 1 (a) Peritoneal nodule composed of tumor masses of variable size, enclosed in a large desmoplastic stroma. The tumor population is characterized by small round monomorphic cells with a high nucleocytoplasmic ratio. Some masses are centered by necrosis[16]. (b) Nuclear labeling of tumor cells with the antibody directed against the COOH- (carboxylic acid) end of the Wilms' tumor 1 (WT-1) protein. It is an indirect reflection of the t(11; 22) (q24; q12) translocation involving the Ewing Sarcoma (EWS) gene on chromosome 22 and the WT-1 gene on chromosome 11[16,17,18]. (c) Tumor proliferation of diffuse architecture composed of small round monomorphic cells with a high nucleocytoplasmic ratio[16]. (d) Nuclear labeling of tumor cells by the antibody directed against the Friend leukemia integration 1 (FLI-1) transcription factor. It is an indirect reflection of a t(11; 22) (q24; q12) translocation involving the EWS gene on chromosome 22 and the FLI-1 gene on chromosome 11[16,17,18].
No. | Attribute | Description | Type
1 | ID | ID of the patient | Numerical
2 | Age | Age of the patient in years | Numerical
3 | Gender | Man or woman | Numerical
4 | Kind of disease | Whether the patient is diagnosed with STT or not | Categorical
5 | WBC | Number of white blood cells in thousands per microliter of blood | Numerical
6 | RBC | Number of red blood cells in millions per microliter of blood | Numerical
7 | HGB | Amount of hemoglobin in grams per deciliter of blood | Numerical
8 | HCT | Hematocrit (the volume percentage of red blood cells in blood) | Numerical
9 | MCV | Average volume of a red blood cell in femtoliters | Numerical
10 | MCH | Average mass of hemoglobin per red blood cell in picograms | Numerical
11 | MCHC | Concentration of hemoglobin in a red blood cell in grams per deciliter | Numerical
12 | PLT | Platelet count in thousands per microliter of blood | Numerical
13 | Lymphocytes (%) | Percentage of lymphocytes in blood | Numerical
14 | Monocytes (%) | Percentage of monocytes in blood | Numerical
15 | Neutrophils (%) | Percentage of neutrophils in blood | Numerical
16 | Blood type | Blood type of the patient | Categorical
17 | Clotting time | Amount of time required for a sample of blood to clot in minutes | Numerical
18 | Bleeding time | Amount of time needed for bleeding to stop in minutes | Numerical
19 | AGS-AG | Total protein and albumin/globulin | Categorical
20 | Blood glucose | Level of glucose in the blood | Numerical
Table 1 Details of the soft tissue tumor dataset.
Fig. 2 Flowchart of machine learning-based system for the automatic discrimination of soft tissue sarcomas.
Fig. 3 Principle of k-fold cross-validation.
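The principle named in the Fig. 3 caption can be sketched in a few lines: the samples are split into k folds, and each fold serves once as the test set while the remaining folds form the training set. This is a minimal, unshuffled and unstratified sketch; the helper name `k_fold_indices` is hypothetical, and the paper's own splitting procedure may differ.

```python
def k_fold_indices(n_samples, k):
    """Yield (train_idx, test_idx) pairs so that every sample index
    appears in exactly one test fold."""
    if not 2 <= k <= n_samples:
        raise ValueError("k must be between 2 and n_samples")
    base, extra = divmod(n_samples, k)
    start = 0
    for fold in range(k):
        # The first `extra` folds get one additional sample.
        size = base + (1 if fold < extra else 0)
        test_idx = list(range(start, start + size))
        train_idx = list(range(0, start)) + list(range(start + size, n_samples))
        yield train_idx, test_idx
        start += size


# Example: 10 samples split into 3 folds of sizes 4, 3, and 3.
folds = list(k_fold_indices(10, 3))
```

A model would be trained on `train_idx` and scored on `test_idx` for each pair, and the k scores averaged.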
Fig. 4 Variation of cross-validation according to the number of splits.
Fig. 5 Cross-validation before and after resampling of the soft tissues tumour data.
Fig. 6 DT model performances.
Fig. 7 SVM model performances.
Algorithm | Accuracy | AUC | f1-measure
DT | 99.0 | 96.4 | 99.3
SVM | 97.7 | 94.7 | 98.3
Table 2 Best performance results of the models based on the DT and SVM algorithms (%).
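For readers unfamiliar with two of the metrics reported in Table 2, the sketch below shows how accuracy and the f1-measure are conventionally computed from binary predictions. The labels are hypothetical toy values, not the paper's data, and AUC is omitted because it requires ranked classifier scores rather than hard predictions.

```python
def binary_metrics(y_true, y_pred, positive=1):
    """Compute accuracy, precision, recall, and F1 for binary labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)
    accuracy = (tp + tn) / len(pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    # F1 is the harmonic mean of precision and recall.
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


# Hypothetical toy labels: 1 = STT, 0 = other disease.
y_true = [1, 1, 1, 1, 0, 0, 0, 1]
y_pred = [1, 1, 1, 0, 0, 0, 1, 1]
metrics = binary_metrics(y_true, y_pred)
```

On imbalanced data the f1-measure can differ noticeably from accuracy, which is one reason tables of this kind report both.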
Fig. 8 Importance of features for DT-based model.
Fig. 9 Importance of features for SVM-based model.
Algorithm | Accuracy | AUC | f1-measure
DT | 99.0 | 96.4 | 99.3
SVM | 97.7 | 94.7 | 98.3
LR | 90.7 | 81.6 | 94.1
KNN (k=3) | 88.0 | 63.2 | 91.7
NB | 81.3 | 63.2 | 88.9
ANN (MLP) | 76.0 | 52.6 | 86.2
Table 3 Comparison with other algorithms and the corresponding performance of each of them (%). Here, MLP represents multi-layer perceptron.
Algorithm | Accuracy | AUC | f1-measure
DT | 99.00 | 96.40 | 99.30
SVM | 97.70 | 94.70 | 98.30
SVM[34] | 57.82 | - | 66.00
Stochastic-SVM[34] | 64.80 | - | 72.00
Fuzzy C-means[10] | 71.43 | - | 81.82
SVM[10] | 71.43 | - | 83.33
Table 4 Comparison of the performance of our automatic classification model with previous work (%).
[1]   Collin F., Gelly-Marty M., Binh M. B. N., and Coindre J. M., Sarcomes des tissus mous: Données anatomopathologiques actuelles, Cancer/Radiothérapie, vol. 10, nos. 1&2, pp. 7-14, 2006.
[2]   Juntu J., De Schepper A. M., Van Dyck P., Van Dyck D., Gielen J., Parizel P. M., and Sijbers J., Classification of soft tissue tumors by machine learning algorithms, in Soft Tissue Tumors, Derbel F., ed. London, UK: IntechOpen, 2011, pp. 53-69.
[3]   De Schepper A. M. and Bloem J. L., Soft tissue tumors: Grading, staging, and tissue-specific diagnosis, Top. Magn. Reson. Imaging, vol. 18, no. 6, pp. 431-444, 2007.
[4]   Hayashi T., Horiuchi A., Sano K., Kanai Y., Yaegashi N., Aburatani H., and Konishi I., Biological characterization of soft tissue sarcomas, Annals of Translational Medicine, vol. 22, no. 3, p. 368, 2015.
[5]   Castellano G., Bonilha L., Li L. M., and Cendes F., Texture analysis of medical images, Clin. Radiol., vol. 59, no. 12, pp. 1061-1069, 2004.
[6]   Huang Y. L., Wang K. L., and Chen D. R., Diagnosis of breast tumors with ultrasonic texture analysis using support vector machines, Neural Comput. Appl., vol. 15, no. 2, pp. 164-169, 2006.
[7]   Julesz B., Gilbert E. N., Shepp L. A., and Frisch H. L., Inability of humans to discriminate between visual textures that agree in second-order statistics—Revisited, Perception, vol. 2, no. 4, pp. 391-405, 1973.
[8]   Farhidzadeh H., Chaudhury B., Zhou M., Goldgof D. B., Hall L. O., Gatenby R. A., Gillies R. J., and Raghavan M., Prediction of treatment outcome in soft tissue sarcoma based on radiologically defined habitats, in Proc. SPIE 9414, Medical Imaging 2015: Computer-Aided Diagnosis, Orlando, FL, USA, 2015, p. 94141U.
[9]   Karanian M. and Coindre J. M., Quatrième édition de la classification OMS des tumeurs des tissus mous, Ann. Pathol., vol. 35, no. 1, pp. 71-85, 2015.
[10]   Rustam Z., Hartini S., Siswantining T., Utami D. A., and Putri N. K., Comparison between fuzzy kernel C-means, fuzzy kernel possibilistic C-means and support vector machines in soft tissue tumor classification, in Advanced Intelligent Systems for Sustainable Development (AI2SD’2019), Ezziyyani M., ed. Cham, Switzerland: Springer, 2020, pp. 92-105.
[11]   Xu H. S., Wang L., and Gan W. L., Application of improved decision tree method based on rough set in building smart medical analysis CRM system, Int. J. Smart Home, vol. 10, no. 1, pp. 251-266, 2016.
[12]   Afonso P. D. and Mascarenhas V. V., Imaging techniques for the diagnosis of soft tissue tumors, Rep. Med. Imaging, vol. 8, pp. 63-70, 2015.
[13]   Fletcher C. D. M., Unni K. K., and Mertens F., Pathology and Genetics of Tumours of Soft Tissue and Bone. Lyon, France: IARC Press, 2002.
[14]   Fletcher C. D. M., The evolving classification of soft tissue tumours: An update based on the new WHO classification, Histopathology, vol. 48, no. 1, pp. 3-12, 2006.
[15]   Fletcher C. D. M., The evolving classification of soft tissue tumours-An update based on the new 2013 WHO classification, Histopathology, vol. 64, no. 1, pp. 2-11, 2014.
[16]   Guillou C. G. L., Tumeurs des tissus mous: Rôle du pathologiste dans l’approche diagnostique, Rev. Med. Suisse, vol. 3, p. 32473, 2007.
[17]   Marec-Bérard P., Chotel F., and Claude L., PNET/Ewing tumours: Current treatments and future perspectives, Bull. Cancer, vol. 97, no. 6, pp. 707-713, 2010.
[18]   Scotlandi K., Remondini D., Castellani G., Manara M. C., Nardi F., Cantiani L., Francesconi M., Mercuri M., Caccuri A. M., Serra M., et al., Overcoming resistance to conventional drugs in Ewing sarcoma and identification of molecular predictors of outcome, Journal of Clinical Oncology, vol. 27, no. 13, pp. 2209-2216, 2009.
[19]   Komura D. and Ishikawa S., Machine learning methods for histopathological image analysis, Comput. Struct. Biotechnol. J., vol. 16, pp. 34-42, 2018.
[20]   Koumetio C. S. T., Cherif W., and Hassan S., Optimizing the prediction of telemarketing target calls by a classification technique, in Proc. 2018 6th Int. Conf. on Wireless Networks and Mobile Communications, Marrakesh, Morocco, 2018, pp. 1-6.
[21]   Tekouabou S. C. K., Cherif W., and Silkan H., A data modeling approach for classification problems: Application to bank telemarketing prediction, in Proc. 2nd Int. Conf. on Networking, Information Systems & Security, Rabat, Morocco, 2019, pp. 1-7.
[22]   Lakshminarayan K., Harp S. A., and Samad T., Imputation of missing data in industrial databases, Appl. Intell., vol. 11, no. 3, pp. 259-275, 1999.
[23]   Jindal A., Dua A., Kaur K., Singh M., Kumar N., and Mishra S., Decision tree and SVM-based data analytics for theft detection in smart grid, IEEE Trans. Ind. Inform., vol. 12, no. 3, pp. 1005-1016, 2016.
[24]   Chang Y. W., Hsieh C. J., Chang K. W., Ringgaard M., and Lin C. J., Training and testing low-degree polynomial data mappings via linear SVM, J. Mach. Learn. Res., vol. 11, pp. 1471-1490, 2010.
[25]   Shrivastava N. A., Khosravi A., and Panigrahi B. K., Prediction interval estimation of electricity prices using PSO-tuned support vector machines, IEEE Trans. Ind. Inform., vol. 11, no. 2, pp. 322-331, 2015.
[26]   Keerthi S. S. and Lin C. J., Asymptotic behaviors of support vector machines with Gaussian kernel, Neural Comput., vol. 15, no. 7, pp. 1667-1689, 2003.
[27]   Lippert R. A. and Rifkin R. M., Infinite-σ limits for Tikhonov regularization, J. Mach. Learn. Res., vol. 7, pp. 855-876, 2006.
[28]   Ruggieri S., Efficient C4.5 [classification algorithm], IEEE Trans. Knowl. Data Eng., vol. 14, no. 2, pp. 438-444, 2002.
[29]   Bashir S., Qamar U., Khan F. H., and Javed M. Y., An efficient rule-based classification of Diabetes using ID3, C4.5, & CART ensembles, in Proc. 2014 12th Int. Conf. on Frontiers of Information Technology, Islamabad, Pakistan, 2014, pp. 226-231.
[30]   Salzberg S. L., Book review: C4.5: Programs for machine learning by J. Ross Quinlan. Morgan Kaufmann Publishers, Inc., 1993, Mach. Learn., vol. 16, no. 3, pp. 235-240, 1994.
[31]   Raschka S. and Mirjalili V., Python Machine Learning: Machine Learning and Deep Learning with Python, Scikit-learn, and TensorFlow. Birmingham, UK: Packt Publishing, 2019.
[32]   Marinakos G. and Daskalaki S., Imbalanced customer classification for bank direct marketing, J. Mark. Anal., vol. 5, no. 1, pp. 14-30, 2017.
[33]   Young S., Huang C. H., and McDermott M., Internationalization and competitive catch-up processes: Case study evidence on Chinese multinational enterprises, Manage. Int. Rev., vol. 36, no. 4, pp. 295-314, 1996.
[34]   Zahras D., Rustam Z., and Sarwinda D., Soft tissue tumor classification using stochastic support vector machine, IOP Conf. Ser. Mater. Sci. Eng., vol. 546, no. 5, p. 052089, 2019.
[35]   Zhang Y., Zhu Y. F., Shi X. M., Tao J., Cui J. J., Dai Y., Zheng M. T., and Wang S. W., Soft tissue sarcomas: Preoperative predictive histopathological grading based on radiomics of MRI, Acad. Radiol., vol. 26, no. 9, pp. 1262-1268, 2019.
[36]   Lee Y., Seo J. B., Lee J. G., Kim S. S., Kim N., and Kang S. H., Performance testing of several classifiers for differentiating obstructive lung diseases based on texture analysis at high-resolution computerized tomography (HRCT), Comput. Methods Programs Biomed., vol. 93, no. 2, pp. 206-215, 2009.
[37]   Juntu J., Sijbers J., De Backer S., Rajan J., and van Dyck D., Machine learning study of several classifiers trained with texture analysis features to differentiate benign from malignant soft-tissue tumors in T1-MRI images, J. Magn. Reson. Imaging, vol. 31, no. 3, pp. 680-689, 2010.
[38]   Boone J. M., Lindfors K. K., Beatty C. S., and Seibert J. A., A breast density index for digital mammograms based on radiologists’ ranking, J. Digit. Imaging, vol. 11, no. 3, p. 101, 1998.