Big Data Mining and Analytics  2021, Vol. 4 Issue (4): 242-251    DOI: 10.26599/BDMA.2021.9020010
    
Coronavirus Pandemic Analysis Through Tripartite Graph Clustering in Online Social Networks
Xueting Liao, Danyang Zheng*, Xiaojun Cao
Department of Computer Science, Georgia State University, Atlanta, GA 30302, USA
Suzhou Key Laboratory of Advanced Optical Communication Network Technology, School of Electronic and Information Engineering, Soochow University, Suzhou 215006, China

Abstract  

The COVID-19 pandemic has hit the world hard. Reactions to pandemic-related issues have been pouring into social platforms such as Twitter. Many public officials and governments use Twitter to make policy announcements, and people closely track this information and voice their concerns about the policies there. Deriving important information or knowledge from such Twitter data is beneficial yet challenging. In this paper, we propose a Tripartite Graph Clustering for Pandemic Data Analysis (TGC-PDA) framework built on three components: (1) tripartite graph representation, (2) non-negative matrix factorization with regularization, and (3) sentiment analysis. We collect tweets containing a set of keywords related to the coronavirus pandemic as the ground truth data. Our framework can detect communities of Twitter users and analyze the topics discussed in those communities. Extensive experiments show that the TGC-PDA framework can effectively and efficiently identify the topics and correlations within the Twitter data for monitoring and understanding public opinion, which would provide policy makers with useful information and statistics for decision making.



Key words: COVID-19; clustering; online social network; Twitter
Received: 25 February 2021      Published: 30 August 2021
Corresponding Authors: Danyang Zheng     E-mail: xliao3@student.gsu.edu;drdan940606@gmail.com;cao@gsu.edu
About authors:

Xueting Liao received the MEng degree from Rutgers University, USA in 2013. She is currently a PhD candidate in computer science at Georgia State University, USA. Her research interests include applications of data mining and graph mining algorithms, social network related analysis, and network related algorithms.

Danyang Zheng received the PhD degree in computer science from Georgia State University, USA in 2021. He is currently an assistant professor at Soochow University, China. His research interests include network function virtualization, software-defined networks, optical networks, networking performance optimization, and combinational optimization.

Xiaojun Cao received the BEng degree from Tsinghua University, China in 1996, the MEng degree from Chinese Academy of Sciences, China in 1999, and the PhD degree in computer science from the State University of New York at Buffalo, USA in 2004. He is currently a professor at the Department of Computer Science, Georgia State University, where he leads the Advanced Network Research Group (aNet). Prior to joining Georgia State University, he was an assistant professor at the College of Computing and Information Sciences, Rochester Institute of Technology. He and his group are working on modeling, analysis, protocols/algorithms design, as well as data processing for networks and cyber physical systems. He was a distinguished lecturer of the IEEE ComSoc (2019-2020) and served as the secretary/vice chair/chair for IEEE ComSoc Optical Networking Technical Committee (ONTC). His research has been sponsored by U.S. National Science Foundation (NSF), Centers for Disease Control and Prevention (CDC), IBM, and Cisco’s University Research Program. He is a recipient of NSF Career Award, 2006-2011.
Cite this article:

Xueting Liao,Danyang Zheng,Xiaojun Cao. Coronavirus Pandemic Analysis Through Tripartite Graph Clustering in Online Social Networks. Big Data Mining and Analytics, 2021, 4(4): 242-251.

URL:

http://bigdata.tsinghuajournals.com/10.26599/BDMA.2021.9020010     OR     http://bigdata.tsinghuajournals.com/Y2021/V4/I4/242

Fig. 1 Examples of multipartite graph with different k.
Fig. 2 An example of tripartite graph co-clustering problem.
Fig. 3 An example of tripartite graph in Twitter.
Symbol              Definition
$n, m, t$           Number of users, tweets, and topics
$G(V, E)$           Graph with node set $V$ and edge set $E$
$U, T, H$           Node sets of users, tweets, and topics
$B$                 Matrix representation of a bipartite graph
$P_{i,j}$           Number of paths between node $i$ and node $j$
$L$                 Normalized Laplacian matrix
$D$                 Degree matrix: diagonal with $[D]_{i,i} = \mathrm{degree}(v_i)$
$F, G$              Decomposed matrices: $F \in \Psi^{n \times d}$ and $G \in \Psi^{k \times n}$
$S$                 Association matrix: $S \in \mathbb{R}_{+}^{d \times k}$
$\Psi$              Set of all cluster indicator matrices
$\mathrm{Tr}(X)$    Trace of matrix $X$: $\mathrm{Tr}(X) = \sum_{i=1}^{n} x_{i,i}$
$\|X\|_F$           Frobenius norm of a matrix $X$
Table 1 Notation.
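
To make the matrix notation concrete, the following minimal sketch computes the degree matrix diagonal, a normalized Laplacian, the trace, and the Frobenius norm for a tiny graph. The adjacency matrix is a hypothetical toy example, and the symmetric form $L = I - D^{-1/2} A D^{-1/2}$ is an assumption here, since this page does not state which normalization the paper uses.

```python
from math import sqrt

# Hypothetical 3-node star graph: node 0 is linked to nodes 1 and 2.
adjacency = [
    [0, 1, 1],
    [1, 0, 0],
    [1, 0, 0],
]
n = len(adjacency)

# Diagonal of the degree matrix D: [D]_{i,i} = degree(v_i).
degrees = [sum(row) for row in adjacency]

# Symmetric normalized Laplacian (assumed form): L = I - D^{-1/2} A D^{-1/2}.
# This requires every node to have degree > 0.
laplacian = [
    [(1.0 if i == j else 0.0) - adjacency[i][j] / sqrt(degrees[i] * degrees[j])
     for j in range(n)]
    for i in range(n)
]

# Tr(L) is the sum of the diagonal entries.
trace = sum(laplacian[i][i] for i in range(n))

# ||L||_F is the square root of the sum of squared entries.
frobenius = sqrt(sum(v * v for row in laplacian for v in row))
```

For a graph without isolated nodes, every diagonal entry of this Laplacian is 1, so the trace equals the number of nodes.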
Fig. 4 An overview of the TGC-PDA framework.
Fig. 5 Building the user-topic bipartite graph by removing the tweet nodes of the tripartite graph and using them as the connections between user and topic nodes.
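
One plausible reading of the construction in Fig. 5 is that each user-topic edge is weighted by the number of user-tweet-topic paths, in line with the $P_{i,j}$ entry of Table 1. The sketch below illustrates that reading only. The edge lists, the node names, and the helper `user_topic_paths` are hypothetical and are not taken from the paper.

```python
from collections import Counter

# Hypothetical toy edges (not data from the paper).
# user -> tweets posted by that user
user_tweets = {"u1": ["t1", "t2"], "u2": ["t3"], "u3": ["t4"]}
# tweet -> topics (keywords) mentioned in that tweet
tweet_topics = {
    "t1": ["covid"],
    "t2": ["covid", "vaccine"],
    "t3": ["vaccine"],
    "t4": ["vaccine"],
}

def user_topic_paths(user_tweets, tweet_topics):
    """Collapse tweet nodes: count user -> tweet -> topic paths per pair."""
    paths = {}
    for user, tweets in user_tweets.items():
        counts = Counter()
        for tweet in tweets:
            for topic in tweet_topics.get(tweet, []):
                counts[topic] += 1
        paths[user] = counts
    return paths

paths = user_topic_paths(user_tweets, tweet_topics)
```

In matrix terms this is the product of the user-tweet and tweet-topic incidence matrices, so the tweet layer disappears while its connectivity is kept as edge weights.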
Method    Accuracy    Purity    NMI
Kmeans    0.613       0.549     0.513
NMF       0.583       0.536     0.493
SNMF      0.627       0.562     0.534
ONMTF     0.674       0.578     0.557
NMFRU     0.706       0.617     0.621
Table 2 Performance results of classifiers.
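
For readers unfamiliar with two of the metrics in Table 2, here is a minimal sketch of purity and normalized mutual information (NMI) for a clustering against ground-truth labels. The function names and toy labels are illustrative assumptions. NMI is normalized by the arithmetic mean of the two entropies, which is one common convention; this page does not say which one the paper uses.

```python
from collections import Counter
from math import log

def purity(labels_true, labels_pred):
    """Fraction of items that belong to the majority true class of their cluster."""
    clusters = {}
    for t, p in zip(labels_true, labels_pred):
        clusters.setdefault(p, Counter())[t] += 1
    return sum(max(c.values()) for c in clusters.values()) / len(labels_true)

def entropy(labels):
    """Shannon entropy (natural log) of a label assignment."""
    n = len(labels)
    return -sum((c / n) * log(c / n) for c in Counter(labels).values())

def nmi(labels_true, labels_pred):
    """Mutual information divided by the mean of the two entropies."""
    n = len(labels_true)
    joint = Counter(zip(labels_true, labels_pred))
    count_t = Counter(labels_true)
    count_p = Counter(labels_pred)
    mi = sum(
        (c / n) * log(c * n / (count_t[t] * count_p[p]))
        for (t, p), c in joint.items()
    )
    denom = (entropy(labels_true) + entropy(labels_pred)) / 2
    return mi / denom if denom > 0 else 1.0

# Hypothetical labels: the clustering matches the truth up to renaming.
truth = [0, 0, 1, 1]
perfect = [1, 1, 0, 0]
mixed = [0, 1, 0, 1]
```

Both metrics ignore how clusters are named, so `perfect` scores as an exact match even though its labels are swapped. Clustering accuracy additionally needs a best one-to-one matching between clusters and classes, which is omitted here.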
Fig. 6 Total loss with different numbers of iterations.
Fig. 7 Convergence time of methods.
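
As background for the loss curve in Fig. 6, the sketch below runs plain non-negative matrix factorization with the standard multiplicative updates and records the squared Frobenius loss after each iteration. It is not the paper's regularized method: the regularizer and the tri-factorization are left out. The input matrix, rank, and iteration count are hypothetical choices.

```python
import random

def matmul(a, b):
    """Product of two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def transpose(a):
    return [list(col) for col in zip(*a)]

def frob_loss(x, w, h):
    """Squared Frobenius norm of the residual X - WH."""
    wh = matmul(w, h)
    return sum((xv - av) ** 2 for xr, ar in zip(x, wh) for xv, av in zip(xr, ar))

def nmf(x, rank, iters=200, seed=0, eps=1e-9):
    """Plain NMF, X ~ WH with W, H >= 0, via multiplicative updates."""
    rng = random.Random(seed)
    n, m = len(x), len(x[0])
    w = [[rng.random() + 0.1 for _ in range(rank)] for _ in range(n)]
    h = [[rng.random() + 0.1 for _ in range(m)] for _ in range(rank)]
    losses = [frob_loss(x, w, h)]
    for _ in range(iters):
        # H <- H * (W^T X) / (W^T W H), element-wise
        wt = transpose(w)
        num, den = matmul(wt, x), matmul(matmul(wt, w), h)
        h = [[hv * nv / (dv + eps) for hv, nv, dv in zip(hr, nr, dr)]
             for hr, nr, dr in zip(h, num, den)]
        # W <- W * (X H^T) / (W H H^T), element-wise
        ht = transpose(h)
        num, den = matmul(x, ht), matmul(w, matmul(h, ht))
        w = [[wv * nv / (dv + eps) for wv, nv, dv in zip(wr, nr, dr)]
             for wr, nr, dr in zip(w, num, den)]
        losses.append(frob_loss(x, w, h))
    return w, h, losses

# Hypothetical non-negative user-topic matrix (3 users, 2 topics).
x = [[2.0, 1.0], [0.0, 1.0], [0.0, 1.0]]
w, h, losses = nmf(x, rank=2)
```

Plotting `losses` against the iteration index gives a curve of the same kind as Fig. 6. The multiplicative form keeps both factors non-negative as long as the initial values are positive.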
Keyword            Positive    Neutral    Negative
marketcrash2021    18.2        48.7       33.1
maskshortage       14.1        41.6       44.3
death              4.1         73.1       22.8
NYbreak            12.2        57.5       30.3
antibody           30.5        41.3       28.2
stimulus           32.6        41.7       25.7
testing            32.7        38.9       28.4
vaccine            20.4        61.2       18.4
symptoms           26.3        48.9       24.8
stayathome         23.6        51.8       24.6
Table 3 Ten largest communities with their polarity ratios (%).
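
Polarity ratios like those in Table 3 can be derived from per-tweet sentiment scores. The sketch below assumes each tweet already has a polarity score in [-1, 1], as produced by typical lexicon-based sentiment tools. The function `polarity_ratio`, its `neutral_band` threshold, and the sample scores are hypothetical; this page does not give the paper's exact thresholds.

```python
def polarity_ratio(scores, neutral_band=0.0):
    """Return (positive, neutral, negative) percentages for a list of scores.

    A score above neutral_band counts as positive, a score below
    -neutral_band counts as negative, and everything else is neutral.
    """
    if not scores:
        raise ValueError("scores must be non-empty")
    positive = sum(1 for s in scores if s > neutral_band)
    negative = sum(1 for s in scores if s < -neutral_band)
    neutral = len(scores) - positive - negative
    total = len(scores)
    return tuple(round(100.0 * c / total, 1) for c in (positive, neutral, negative))

# Hypothetical polarity scores for the tweets of one community.
community_scores = [0.5, 0.0, -0.2, 0.0]
ratios = polarity_ratio(community_scores)
```

Because each percentage is rounded separately, the three values can differ from 100 by a tenth in some cases.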