Top Read Articles
Analysis of Protein-Ligand Interactions of SARS-CoV-2 Against Selective Drug Using Deep Neural Networks
Natarajan Yuvaraj, Kannan Srihari, Selvaraj Chandragandhi, Rajan Arshath Raja, Gaurav Dhiman, Amandeep Kaur
Big Data Mining and Analytics 2021, 4 (2): 76-83. DOI: 10.26599/BDMA.2020.9020007
Accepted: 09 July 2020. Online available: 09 July 2020.
In recent times, data analysis using machine learning has accelerated optimized solutions in clinical healthcare systems. Machine learning methods offer efficient prediction in diagnosis systems as an alternative to clinicians. Most of these systems operate on features extracted from patients, and most of the predicted cases are accurate. However, the recent prevalence of COVID-19 has pushed the global healthcare industry to find a new drug that suppresses the pandemic outbreak. In this paper, we design a Deep Neural Network (DNN) model that accurately finds the protein-ligand interactions with the drug used. The DNN senses the response of protein-ligand interactions for a specific drug and identifies which drug produces the interaction that combats the virus effectively. Using the limited genome sequences of Indian patients submitted to the GISAID database, we find that the DNN system is effective in identifying the protein-ligand interactions for a specific drug.

Survey on Lie Group Machine Learning
Mei Lu, Fanzhang Li
Big Data Mining and Analytics 2020, 3 (4): 235-258. DOI: 10.26599/BDMA.2020.9020011
Lie group machine learning is recognized as the theoretical basis of brain intelligence, brain learning, higher machine learning, and higher artificial intelligence. Sample sets of Lie group matrices are widely available in practical applications. Lie group learning is a vibrant field of increasing importance and extraordinary potential and thus needs to be developed further. This study aims to provide a comprehensive survey on recent advances in Lie group machine learning. We introduce Lie group machine learning techniques in three major categories: supervised Lie group machine learning, semisupervised Lie group machine learning, and unsupervised Lie group machine learning. In addition, we introduce the special application of Lie group machine learning in image processing. This work covers the following techniques: Lie group machine learning model, Lie group subspace orbit generation learning, symplectic group learning, quantum group learning, Lie group fiber bundle learning, Lie group cover learning, Lie group deep structure learning, Lie group semisupervised learning, Lie group kernel learning, tensor learning, frame bundle connection learning, spectral estimation learning, Finsler geometric learning, homology boundary learning, category representation learning, and neuromorphic synergy learning. Overall, this survey aims to provide an insightful overview of state-of-the-art development in the field of Lie group machine learning. It will enable researchers to comprehensively understand the state of the field, identify the most appropriate tools for particular applications, and identify directions for future research.

Survey on Data Analysis in Social Media: A Practical Application Aspect
Qixuan Hou, Meng Han, Zhipeng Cai
Big Data Mining and Analytics 2020, 3 (4): 259-279. DOI: 10.26599/BDMA.2020.9020006
Social media has more than three billion users sharing events, comments, and feelings throughout the world. It serves as a critical information source with large volumes, high velocity, and a wide variety of data. Previous studies on information spreading, relationship analysis, individual modeling, etc., have extensively explored the tremendous social and commercial value of social media data. This survey studies the previous literature and the existing applications from a practical perspective. We outline a commonly used pipeline for building social media-based applications and focus on discussing available analysis techniques, such as topic analysis, time series analysis, sentiment analysis, and network analysis. After that, we present the impacts of such applications in three different areas: disaster management, healthcare, and business. Finally, we list existing challenges and suggest promising future research directions in terms of data privacy, 5G wireless networks, and multilingual support.

DFF-ResNet: An Insect Pest Recognition Model Based on Residual Networks
Wenjie Liu, Guoqing Wu, Fuji Ren, Xin Kang
Big Data Mining and Analytics 2020, 3 (4): 300-310. DOI: 10.26599/BDMA.2020.9020021
Insect pest control is considered a significant factor in the yield of commercial crops. Thus, to avoid economic losses, we need a valid method for insect pest recognition. In this paper, we propose a feature fusion residual block to perform the insect pest recognition task. Based on the original residual block, we fuse the feature from a previous layer between two $1 \times 1$ convolution layers in a residual signal branch to improve the capacity of the block. Furthermore, we explore the contribution of each residual group to the model performance. We find that adding the residual blocks of earlier residual groups improves the model performance significantly, which improves the generalization capacity of the model. By stacking the feature fusion residual block, we construct the Deep Feature Fusion Residual Network (DFF-ResNet). To prove the validity and adaptivity of our approach, we construct it with two common residual networks (Pre-ResNet and Wide Residual Network (WRN)) and validate these models on the Canadian Institute For Advanced Research (CIFAR) and Street View House Number (SVHN) benchmark datasets. The experimental results indicate that our models have a lower test error than the baseline models. Then, we apply our models to recognize insect pests and validate them on the IP102 benchmark dataset. The experimental results show that our models outperform the original ResNet and other state-of-the-art methods.

CircRNA-Disease Associations Prediction Based on Metapath2vec++ and Matrix Factorization
Yuchen Zhang, Xiujuan Lei, Zengqiang Fang, Yi Pan
Big Data Mining and Analytics 2020, 3 (4): 280-291. DOI: 10.26599/BDMA.2020.9020025
Circular RNA (circRNA) is a novel class of non-coding endogenous RNA. Evidence has shown that circRNAs are related to many biological processes and play essential roles in different biological functions. Although increasing numbers of circRNAs are discovered using high-throughput sequencing technologies, these techniques are still time-consuming and costly. In this study, we propose a computational method to predict circRNA-disease associations, which is based on metapath2vec++ and matrix factorization with integrated multiple data (called PCD_MVMF). To construct more reliable networks, various aspects are considered. Firstly, circRNA annotation, sequence, and functional similarity networks are established, and disease-related genes and semantics are adopted to construct disease functional and semantic similarity networks. Secondly, metapath2vec++ is applied on an integrated heterogeneous network to learn the embedded features and initial prediction score. Finally, we use matrix factorization, take similarity as a constraint, and optimize it to obtain the final prediction results. Leave-one-out cross-validation, five-fold cross-validation, and f-measure are adopted to evaluate the performance of PCD_MVMF. These evaluation metrics verify that PCD_MVMF has better prediction performance than other methods. To further illustrate the performance of PCD_MVMF, case studies of common diseases are conducted. Therefore, PCD_MVMF can be regarded as a reliable and useful circRNA-disease association prediction tool.

Intelligent Monitoring System for Biogas Detection Based on the Internet of Things: Mohammedia, Morocco City Landfill Case
Jamal Mabrouki, Mourade Azrour, Ghizlane Fattah, Driss Dhiba, Souad El Hajjaji
Big Data Mining and Analytics 2021, 4 (1): 10-17. DOI: 10.26599/BDMA.2020.9020017
Mechanization is a depollution activity, because it provides an energetic and ecological response to the problem of organic waste treatment. Through burning, biogas from mechanization reduces gas pollution from fermentation by a factor of 20. This study aims to better understand the influence of the seasons on the biogas emitted in the landfill of the city of Mohammedia. The composition of the biogas that naturally emanates from the landfill has been continuously analyzed by our intelligent system, from different wells drilled in recent and old waste repositories. During the rainy season, the average production of methane, carbon dioxide, and oxygen and nitrogen is currently 56%, 32%, and 1%, respectively, compared to 51%, 31%, and 0.8%, respectively, for old waste. Hazard levels and potential fire and explosion risks associated with biogas are lower than those of natural gases in most cases. For this reason, a system is proposed to measure and monitor the biogas production of the landfill site remotely. Measurements carried out by the system at various sites of the landfill in the city of Mohammedia show that the biogas contents present dangers and sanitary risks which are of another order.

Analysis and Predictions of Spread, Recovery, and Death Caused by COVID-19 in India
Rajani Kumari, Sandeep Kumar, Ramesh Chandra Poonia, Vijander Singh, Linesh Raja, Vaibhav Bhatnagar, Pankaj Agarwal
Big Data Mining and Analytics 2021, 4 (2): 65-75. DOI: 10.26599/BDMA.2020.9020013
The novel coronavirus outbreak was first reported in late December 2019; more than 7 million people have been infected with this disease and over 0.40 million people worldwide have lost their lives. The first case in India was diagnosed on 30 January 2020, and the figure crossed 0.24 million as of 6 June 2020. This paper presents a detailed study of recently developed forecasting models and predicts the number of confirmed, recovered, and death cases in India caused by COVID-19. Correlation coefficients and multiple linear regression are applied for prediction, and autocorrelation and autoregression are used to improve the accuracy. The predicted number of cases shows good agreement with the actual values, with an R-squared score of 0.9992. The finding suggests that lockdown and social distancing are two important factors that can help to suppress the increasing spread rate of COVID-19.

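As a reading aid for the R-squared score quoted in the abstract above, the following minimal sketch shows how the coefficient of determination is conventionally computed from actual and predicted values. The function name `r_squared` and the sample numbers are illustrative assumptions; they are not taken from the paper, and the paper's own regression pipeline is not reproduced here.

```python
def r_squared(actual, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_actual = sum(actual) / len(actual)
    # Residual sum of squares: squared error of the predictions.
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    # Total sum of squares: squared deviation of the actual values from their mean.
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)
    return 1.0 - ss_res / ss_tot


# Illustrative numbers only (hypothetical, not COVID-19 case counts).
actual_cases = [10, 20, 30, 40]
predicted_cases = [12, 18, 31, 39]
score = r_squared(actual_cases, predicted_cases)
print(round(score, 4))
```

A score close to 1 means the predictions explain almost all of the variance in the actual values, which is how a figure such as 0.9992 is usually read.
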
IoT-Based Data Logger for Weather Monitoring Using Arduino-Based Wireless Sensor Networks with Remote Graphical Application and Alerts
Jamal Mabrouki, Mourade Azrour, Driss Dhiba, Yousef Farhaoui, Souad El Hajjaji
Big Data Mining and Analytics 2021, 4 (1): 25-32. DOI: 10.26599/BDMA.2020.9020018
In recent years, monitoring systems have come to play significant roles in our lives. In this paper, we propose an automatic weather monitoring system that provides dynamic, real-time climate data for a given area. The proposed system is based on Internet of Things technology and embedded systems. The system also includes electronic devices, sensors, and wireless technology. The main objective of this system is to sense climate parameters, such as temperature, humidity, and the presence of some gases, using the sensors. The captured values can then be sent to remote applications or databases. Afterwards, the stored data can be visualized in graphical and tabular form.

Improvement in Automated Diagnosis of Soft Tissues Tumors Using Machine Learning
El Arbi Abdellaoui Alaoui, Stéphane Cédric Koumetio Tekouabou, Sri Hartini, Zuherman Rustam, Hassan Silkan, Said Agoujil
Big Data Mining and Analytics 2021, 4 (1): 33-46. DOI: 10.26599/BDMA.2020.9020023
Soft Tissue Tumors (STT) are a form of sarcoma found in tissues that connect, support, and surround body structures. Because of their low frequency in the body and their great diversity, they appear to be heterogeneous when observed through Magnetic Resonance Imaging (MRI). They are easily confused with other diseases such as fibroadenoma mammae, lymphadenopathy, and struma nodosa, and these diagnostic errors have a considerable detrimental effect on the medical treatment process of patients. Researchers have proposed several machine learning models to classify tumors, but none have adequately addressed this misdiagnosis problem. Also, similar studies that have proposed models for evaluation of such tumors mostly do not consider the heterogeneity and the size of the data. Therefore, we propose a machine learning-based approach which combines a new data preprocessing technique for feature transformation, resampling techniques to eliminate the bias and the deviation of instability, and classifier tests based on the Support Vector Machine (SVM) and Decision Tree (DT) algorithms. The tests carried out on a dataset collected at Nur Hidayah Hospital of Yogyakarta in Indonesia show a great improvement compared to previous studies. These results confirm that machine learning methods could provide efficient and effective tools to reinforce the automatic decision-making processes of STT diagnostics.

New Enhanced Authentication Protocol for Internet of Things
Mourade Azrour, Jamal Mabrouki, Azedine Guezzaz, Yousef Farhaoui
Big Data Mining and Analytics 2021, 4 (1): 1-9. DOI: 10.26599/BDMA.2020.9020010
Internet of Things (IoT) refers to a new extended network that enables any object to be linked to the Internet in order to exchange data and to be controlled remotely. Nowadays, due to its multiple advantages, the IoT is useful in many areas like environment, water monitoring, industry, public security, medicine, and so on. To cover all spaces and operate correctly, the IoT benefits from the advantages of other recent technologies, like radio frequency identification, wireless sensor networks, big data, and mobile networks. However, despite the integration of various things in one network and the exchange of data among heterogeneous sources, the security of users’ data is a central question. For this reason, the authentication of interconnected objects is regarded as a matter of considerable importance. In 2012, Ye et al. suggested a new authentication and key exchange protocol for Internet of Things devices. However, we have proved that their protocol cannot resist various attacks. In this paper, we propose an enhanced authentication protocol for IoT. Furthermore, we present comparative results between our proposed scheme and other related ones.

Multi-Attention Fusion Modeling for Sentiment Analysis of Educational Big Data
Guanlin Zhai, Yan Yang, Heng Wang, Shengdong Du
Big Data Mining and Analytics 2020, 3 (4): 311-319. DOI: 10.26599/BDMA.2020.9020024
As an important branch of natural language processing, sentiment analysis has received increasing attention. In teaching evaluation, sentiment analysis can help educators discover the true feelings of students about a course in a timely manner and adjust the teaching plan accurately and promptly to improve the quality of education and teaching. To address the inefficiency and heavy workload of college curriculum evaluation methods, a Multi-Attention Fusion Modeling (Multi-AFM) approach is proposed, which integrates global attention and local attention through gating unit control to generate a reasonable contextual representation and achieve improved classification results. Experimental results show that the Multi-AFM model performs better than the existing methods in the application of education and other fields.

Multivariate Deep Learning Approach for Electric Vehicle Speed Forecasting
Youssef Nait Malek, Mehdi Najib, Mohamed Bakhouya, Mohammed Essaaidi
Big Data Mining and Analytics 2021, 4 (1): 56-64. DOI: 10.26599/BDMA.2020.9020027
Speed forecasting has numerous applications in intelligent transport systems’ design and control, especially for safety and road efficiency applications. In the field of electromobility, it represents the most dynamic parameter for efficient online in-vehicle energy management. However, vehicles’ speed forecasting is a challenging task, because its estimation is closely related to various features, which can be classified into two categories: endogenous and exogenous features. Endogenous features represent electric vehicles’ characteristics, whereas exogenous ones represent their surrounding context, such as traffic, weather, and road conditions. In this paper, a speed forecasting method based on the Long Short-Term Memory (LSTM) is introduced. The LSTM model training is performed upon a dataset collected from a traffic simulator based on real-world data representing urban itineraries. The proposed models are generated for univariate and multivariate scenarios and are assessed in terms of accuracy for speed forecasting. Simulation results show that the multivariate model outperforms the univariate model for short- and long-term forecasting.

Prediction of COVID-19 Confirmed, Death, and Cured Cases in India Using Random Forest Model
Vishan Kumar Gupta, Avdhesh Gupta, Dinesh Kumar, Anjali Sardana
Big Data Mining and Analytics 2021, 4 (2): 116-123. DOI: 10.26599/BDMA.2020.9020016
The novel coronavirus (SARS-CoV-2) causes an unusual viral pneumonia in patients. It was first found in late December 2019 and was later declared a pandemic by the World Health Organization because of its fatal effects on public health. At present, cases of the COVID-19 pandemic are increasing exponentially day by day across the whole world. Here, we study the COVID-19 cases, i.e., confirmed, death, and cured cases, in India only. We perform this analysis based on the cases occurring in different states of India in chronological order. Our dataset contains multiple classes, so we perform multi-class classification. On this dataset, we first performed data cleansing and feature selection, and then performed forecasting of all classes using random forest, linear model, support vector machine, decision tree, and neural network. The random forest model outperformed the others; therefore, random forest is used for prediction and analysis of all the results. K-fold cross-validation is performed to measure the consistency of the model.

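The K-fold cross-validation mentioned in the abstract above can be illustrated with a small index-splitting sketch. This is a generic, unshuffled K-fold split written for illustration only; the helper name `k_fold_indices` is hypothetical, and the abstract does not state the value of K or any shuffling strategy used by the authors.

```python
def k_fold_indices(n_samples, k):
    """Yield (train_indices, test_indices) for k contiguous, unshuffled folds."""
    if not 2 <= k <= n_samples:
        raise ValueError("k must be between 2 and n_samples")
    # Spread the remainder over the first folds so sizes differ by at most one.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n_samples))
        yield train, test
        start += size


# Each sample appears in exactly one test fold; a model would be trained
# on `train` and scored on `test` once per fold.
folds = list(k_fold_indices(10, 3))
for train, test in folds:
    print(len(train), test)
```

Consistency is then typically judged by how little the per-fold scores vary.
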
Hybrid Recommender System for Tourism Based on Big Data and AI: A Conceptual Framework
Khalid AL Fararni, Fouad Nafis, Badraddine Aghoutane, Ali Yahyaouy, Jamal Riffi, Abdelouahed Sabri
Big Data Mining and Analytics 2021, 4 (1): 47-55. DOI: 10.26599/BDMA.2020.9020015
With the development of the Internet, technology, and means of communication, the production of tourist data has multiplied at all levels (hotels, restaurants, transport, heritage, tourist events, activities, etc.), especially with the development of Online Travel Agencies (OTAs). However, the list of possibilities offered to tourists by Web search engines (or even specialized tourist sites) can be overwhelming, and relevant results are usually drowned in informational "noise", which prevents, or at least slows down, the selection process. To assist tourists in trip planning and help them find the information they are looking for, many recommender systems have been developed. In this article, we present an overview of the various recommendation approaches used in the field of tourism. From this study, an architecture and a conceptual framework for a tourism recommender system are proposed, based on a hybrid recommendation approach. The proposed system goes beyond the recommendation of a list of tourist attractions tailored to tourist preferences. It can be seen as a trip planner that designs a detailed program, including heterogeneous tourism resources, for a specific visit duration. The ultimate goal is to develop a recommender system based on big data technologies, artificial intelligence, and operational research to promote tourism in Morocco, specifically in the Daraa-Tafilalet region.

Mathematical Validation of Proposed Machine Learning Classifier for Heterogeneous Traffic and Anomaly Detection
Azidine Guezzaz, Younes Asimi, Mourade Azrour, Ahmed Asimi
Big Data Mining and Analytics 2021, 4 (1): 18-24. DOI: 10.26599/BDMA.2020.9020019
The modeling of an efficient classifier is a fundamental issue in automatic training involving a large volume of representative data. Hence, automatic classification is a major task that entails the use of training methods capable of assigning classes to data objects by using the input activities presented to learn classes. The recognition of new elements is possible based on predefined classes. Intrusion detection systems suffer from numerous vulnerabilities during analysis and classification of data activities. To overcome this problem, new analysis methods should be derived so as to implement a relevant system to monitor circulated traffic. The main objective of this study is to model and validate a heterogeneous traffic classifier capable of categorizing collected events within networks. The new model is based on a proposed machine learning algorithm that comprises an input layer, a hidden layer, and an output layer. A reliable training algorithm is proposed to optimize the weights, and a recognition algorithm is used to validate the model. Preprocessing is applied to the collected traffic prior to the analysis step. This work aims to describe the mathematical validation of a new machine learning classifier for heterogeneous traffic and anomaly detection.

Effect of E-Learning on Public Health and Environment During COVID-19 Lockdown
Avani Agarwal, Sahil Sharma, Vijay Kumar, Manjit Kaur
Big Data Mining and Analytics 2021, 4 (2): 104-115. DOI: 10.26599/BDMA.2020.9020014
E-learning is the most promising venture in the entire world. During the COVID-19 lockdown, e-learning is successfully providing potential information to students and researchers. In developing nations like India, with limited resources, e-learning tools and platforms provide a chance to make education available to middle- and low-income households. This paper gives insights into three different online services, namely Google Classroom, Zoom, and Microsoft Teams, being used by three different educational institutions. We aim to analyze the efficiency and acceptability of e-learning tools among Indian students during the COVID-19 lockdown. The paper also aims to evaluate the impact of e-learning on the environment and public health during the COVID-19 lockdown. It is found that e-learning has the potential to reduce carbon emissions, which has a beneficial impact on the environment. However, mental health is affected, as e-learning may lead to self-isolation and a reduction in academic achievement, which may in turn lead to anxiety and depression. Due to the use of electronic devices for learning, the eyes and neck muscles may be put under strain, with deleterious effects on physical health.

Improvising Personalized Travel Recommendation System with Recency Effects
Paromita Nitu, Joseph Coelho, Praveen Madiraju
Big Data Mining and Analytics 2021, 4 (3): 139-154. DOI: 10.26599/BDMA.2020.9020026
A travel recommendation system based on social media activity provides a customized place of interest to accommodate user-specific needs and preferences. In general, a user’s inclination towards travel destinations is subject to change over time. In this project, we have analyzed users’ Twitter data, as well as that of their friends and followers, in a timely fashion to understand recent travel interest. A machine learning classifier identifies tweets relevant to travel. The travel tweets are then used to obtain personalized travel recommendations. Unlike most personalized recommendation systems, our proposed model takes into account a user’s most recent interest by incorporating a time-sensitive recency weight into the model. Our proposed model has outperformed the existing personalized place of interest recommendation model, and the overall accuracy is 75.23%.

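The idea of a time-sensitive recency weight can be sketched as follows. The abstract does not give the weighting formula, so this example assumes an exponential decay with a half-life; `recency_weight`, `rank_places`, the 30-day half-life, and the sample tweets are all hypothetical and should not be read as the authors' model.

```python
from collections import defaultdict


def recency_weight(age_days, half_life_days=30.0):
    """Assumed decay: a tweet's weight halves every `half_life_days`."""
    return 0.5 ** (age_days / half_life_days)


def rank_places(travel_tweets, half_life_days=30.0):
    """Rank places by the summed recency weight of the tweets mentioning them.

    `travel_tweets` is an iterable of (place, age_in_days) pairs.
    """
    scores = defaultdict(float)
    for place, age_days in travel_tweets:
        scores[place] += recency_weight(age_days, half_life_days)
    return sorted(scores, key=scores.get, reverse=True)


# Three old beach tweets are outweighed by two recent mountain tweets.
tweets = [("beach", 90), ("beach", 120), ("beach", 150),
          ("mountain", 0), ("mountain", 30)]
print(rank_places(tweets))
```

Under this assumption, raw mention counts would favor the beach, while recency weighting favors the more recent interest.
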
A Survey on Algorithms for Intelligent Computing and Smart City Applications
Zhao Tong, Feng Ye, Ming Yan, Hong Liu, Sunitha Basodi
Big Data Mining and Analytics 2021, 4 (3): 155-172. DOI: 10.26599/BDMA.2020.9020029
With the rapid development of human society, the urbanization of the world’s population is also progressing rapidly. Urbanization has brought many challenges and problems to the development of cities. For example, the urban population is under excessive pressure, various natural resources and energy are increasingly scarce, and environmental pollution is increasing. Therefore, the original urban model has to be changed to enable people to live in greener and more sustainable cities, thus providing them with a more convenient and comfortable living environment. The new urban framework, the smart city, provides excellent opportunities to meet these challenges while solving urban problems at the same time. At this stage, many countries are actively responding to calls for smart city development plans. This paper investigates the current stage of the smart city. First, it introduces the background of smart city development and gives a brief definition of the concept of the smart city. Second, it describes the framework of a smart city in accordance with the given definition. Finally, various intelligent algorithms to make cities smarter, along with specific examples, are discussed and analyzed.

An Advanced Uncertainty Measure Using Fuzzy Soft Sets: Application to Decision-Making Problems
Nitin Bhardwaj, Pallvi Sharma
Big Data Mining and Analytics 2021, 4 (2): 94-103. DOI: 10.26599/BDMA.2020.9020020
In this paper, uncertainty is measured in the form of fuzziness, which arises due to the imprecise boundaries of fuzzy sets. Uncertainty caused by human cognition can be decreased by the use of fuzzy soft sets. There are different approaches to the measurement of uncertainty. The proposed method uses fuzzified evidence theory to calculate the total degree of fuzziness of the parameters. It consists of four main parts. The first part is to measure the uncertainties of parameters using fuzzy soft sets and then to modulate the calculated uncertainties. Afterward, the appropriate basic probability assignments with respect to each parameter are produced. Finally, we use Dempster’s rule of combination to fuse independent parameters into an integrated one. To validate the proposed method, we perform an experiment and compare our outputs with the grey relational analysis method. Also, a medical diagnosis application related to COVID-19 is given to show the effectiveness of the advanced method by comparison with another method.

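The final step named in the abstract above, Dempster's rule of combination, can be sketched in a few lines. This is the standard rule from evidence theory applied to two basic probability assignments; the function name `dempster_combine`, the hypotheses "A" and "B", and the mass values are illustrative assumptions, and the paper's fuzzy soft set preprocessing is not reproduced.

```python
from itertools import product


def dempster_combine(m1, m2):
    """Combine two basic probability assignments (BPAs) with Dempster's rule.

    Each BPA maps a frozenset of hypotheses to a mass; the masses sum to 1.
    """
    combined = {}
    conflict = 0.0
    for (a, mass_a), (b, mass_b) in product(m1.items(), m2.items()):
        common = a & b
        if common:
            # Agreeing evidence: mass goes to the intersection of the two sets.
            combined[common] = combined.get(common, 0.0) + mass_a * mass_b
        else:
            # Disjoint sets: this product counts as conflict.
            conflict += mass_a * mass_b
    if conflict >= 1.0:
        raise ValueError("sources are in total conflict; rule is undefined")
    # Normalize by the non-conflicting mass.
    return {h: m / (1.0 - conflict) for h, m in combined.items()}


# Hypothetical masses for two independent parameters.
A, B, AB = frozenset({"A"}), frozenset({"B"}), frozenset({"A", "B"})
m1 = {A: 0.6, B: 0.3, AB: 0.1}
m2 = {A: 0.5, B: 0.2, AB: 0.3}
fused = dempster_combine(m1, m2)
print({tuple(sorted(h)): round(m, 4) for h, m in fused.items()})
```

In this example the conflicting mass is 0.27, so the remaining products are divided by 0.73 and the fused masses again sum to 1.
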
Diagnosis of COVID-19 from Chest X-Ray Images Using Wavelets-Based Depthwise Convolution Network
Krishna Kant Singh, Akansha Singh
Big Data Mining and Analytics 2021, 4 (2): 84-93. DOI: 10.26599/BDMA.2020.9020012
Coronavirus disease 2019, also known as COVID-19, has become a pandemic. The disease is caused by a beta coronavirus called Severe Acute Respiratory Syndrome Coronavirus 2 (SARS-CoV-2). The severity of the disease can be understood from the massive number of deaths and affected patients globally. If the diagnosis is fast-paced, the disease can be controlled in a better manner. Laboratory tests are available for diagnosis, but they are limited by available testing kits and time. Radiological examinations, including Computed Tomography (CT), can be used for the diagnosis of the disease. Specifically, chest X-Ray images can be analysed to identify the presence of COVID-19 in a patient. In this paper, an automated method for the diagnosis of COVID-19 from chest X-Ray images is proposed. The method presents an improved depthwise convolution neural network for analysing the chest X-Ray images. Wavelet decomposition is applied to integrate multiresolution analysis in the network. The frequency sub-bands obtained from the input images are fed into the network for identifying the disease. The network is designed to predict the class of the input image as normal, viral pneumonia, or COVID-19. The predicted output from the model is combined with Grad-CAM visualization for diagnosis. A comparative study with the existing methods is also performed. Metrics such as accuracy, sensitivity, and F1-measure are calculated for performance evaluation. The performance of the proposed method is better than that of the existing methodologies, and thus it can be used for the effective diagnosis of the disease.

Effective Density-Based Clustering Algorithms for Incomplete Data
Zhonghao Xue, Hongzhi Wang
Big Data Mining and Analytics 2021, 4 (3): 183-194. DOI: 10.26599/BDMA.2021.9020001
Density-based clustering is an important category among clustering algorithms. In real applications, many datasets suffer from incompleteness. Traditional imputation technologies or other techniques for handling missing values are not suitable for density-based clustering and decrease clustering result quality. To avoid these problems, we develop a novel density-based clustering approach for incomplete data based on Bayesian theory, which conducts imputation and clustering concurrently and makes use of intermediate clustering results. To avoid the impact of low-density areas inside non-convex clusters, we introduce a local imputation clustering algorithm, which aims to impute points to high-density local areas. The performances of the proposed algorithms are evaluated using ten synthetic datasets and five real-world datasets with induced missing values. The experimental results show the effectiveness of the proposed algorithms.

LotusSQL: SQL Engine for High-Performance Big Data Systems
Xiaohan Li, Bowen Yu, Guanyu Feng, Haojie Wang, Wenguang Chen
Big Data Mining and Analytics 2021, 4 (4): 252-265. DOI: 10.26599/BDMA.2021.9020009
In recent years, Apache Spark has become the de facto standard for big data processing. SparkSQL is a module offering support for relational analysis on Spark with Structured Query Language (SQL). SparkSQL provides convenient data processing interfaces. Despite its efficient optimizer, SparkSQL still suffers from the inefficiency of Spark resulting from the Java virtual machine and unnecessary data serialization and deserialization. Adopting native languages such as C++ could help to avoid such bottlenecks. Benefiting from a bare-metal runtime environment and template usage, systems with C++ interfaces usually achieve superior performance. However, the complexity of native languages also increases the required programming and debugging efforts. In this work, we present LotusSQL, an engine that provides SQL support for dataset abstraction on the native backend Lotus. We employ a convenient SQL processing framework to deal with frontend jobs. Advanced query optimization technologies are added to improve the quality of execution plans. Above the storage design and user interface of the compute engine, LotusSQL implements a set of structured dataset operations with high efficiency and integrates them with the frontend. Evaluation results show that LotusSQL achieves a speedup of up to $9\times$ in certain queries and outperforms SparkSQL in a standard query benchmark by more than $2\times$ on average.