Big Data Mining and Analytics  2018, Vol. 1 Issue (1): 34-46    DOI: 10.26599/BDMA.2018.9020004
    
Event Detection and Identification of Influential Spreaders in Social Media Data Streams
Leilei Shi, Yan Wu, Lu Liu*, Xiang Sun, Liang Jiang
Leilei Shi, Yan Wu, Lu Liu, Xiang Sun, and Liang Jiang are with School of Computer Science and Telecommunication Engineering, Jiangsu University, Zhenjiang 212013, China.
Lu Liu is also with Department of Computing and Mathematics, University of Derby, UK.

Abstract  

Microblogging, a popular social media service platform, has become a new information channel for users to receive and exchange the most up-to-date information on current events. Consequently, it is a crucial platform for detecting newly emerging events and for identifying influential spreaders who have the potential to actively disseminate knowledge about events through microblogs. However, traditional event detection models require human intervention to specify the number of topics to be explored, which significantly reduces the efficiency and accuracy of event detection. In addition, most existing methods focus only on event detection and are unable to identify either influential spreaders or key event-related posts, thus making it challenging to track momentous events in a timely manner. To address these problems, we propose a Hypertext-Induced Topic Search (HITS) based Topic-Decision method (TD-HITS), and a Latent Dirichlet Allocation (LDA) based Three-Step model (TS-LDA). TD-HITS can automatically detect the number of topics as well as identify associated key posts in a large number of posts. TS-LDA can identify influential spreaders of hot event topics based on both post and user information. The experimental results, using a Twitter dataset, demonstrate the effectiveness of our proposed methods for both detecting events and identifying influential spreaders.



Key words: event detection; microblogging; Hypertext-Induced Topic Search (HITS); Latent Dirichlet Allocation (LDA); identification of influential spreaders
Received: 09 September 2017      Published: 08 January 2020
Corresponding Authors: Lu Liu   
Cite this article:

Leilei Shi, Yan Wu, Lu Liu, Xiang Sun, Liang Jiang. Event Detection and Identification of Influential Spreaders in Social Media Data Streams. Big Data Mining and Analytics, 2018, 1(1): 34-46.

URL:

http://bigdata.tsinghuajournals.com/10.26599/BDMA.2018.9020004     OR     http://bigdata.tsinghuajournals.com/Y2018/V1/I1/34

Fig. 1 TD-HITS procedure.
Fig. 2 Iterative model for determining the authority score of posts and the hub score of users.
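The mutual reinforcement named in the Fig. 2 caption can be illustrated with a minimal sketch of the classic HITS iteration on a user-post graph. This is not the paper's TD-HITS implementation: the edge semantics (a user linked to a post they published or interacted with), the L2 normalization, the fixed iteration count, and the function name `hits` are all assumptions made for illustration.

```python
import math

def hits(edges, iterations=50):
    """Classic HITS on a bipartite user-post graph.

    edges: list of (user, post) pairs; the meaning of a link is an
    assumption here (e.g. the user published or replied to the post).
    Returns (hub_scores_of_users, authority_scores_of_posts).
    """
    edges = list(edges)
    users = {u for u, _ in edges}
    posts = {p for _, p in edges}
    hub = {u: 1.0 for u in users}
    auth = {p: 1.0 for p in posts}

    def normalize(scores):
        # L2 normalization keeps the scores bounded across iterations.
        norm = math.sqrt(sum(v * v for v in scores.values()))
        return {k: (v / norm if norm else 0.0) for k, v in scores.items()}

    for _ in range(iterations):
        # Authority of a post: sum of hub scores of the users linked to it.
        new_auth = {p: 0.0 for p in posts}
        for u, p in edges:
            new_auth[p] += hub[u]
        auth = normalize(new_auth)

        # Hub of a user: sum of authority scores of the posts they link to.
        new_hub = {u: 0.0 for u in users}
        for u, p in edges:
            new_hub[u] += auth[p]
        hub = normalize(new_hub)

    return hub, auth

# Toy graph with hypothetical identifiers.
toy_edges = [("u1", "p1"), ("u2", "p1"), ("u1", "p2")]
hub_scores, auth_scores = hits(toy_edges)
```

In the toy graph, post `p1` is linked by two users and user `u1` links to two posts, so they receive the highest authority and hub scores respectively. The absolute values depend on the normalization and need not match the scale of the values reported in the tables.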
Fig. 3 Procedure of the TS-LDA model.
Fig. 4 LDA model[3].
Symbol | Meaning
α | Hyperparameter of θm
β | Hyperparameter of ϕk
d | Post
w | Word
z | Topic
θ | Topic distribution
ϕ | Word distribution
Z | Number of topics
K | Number of words
Table 1 Symbols used in the LDA model.
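For orientation, the symbols in Table 1 fit the standard LDA generative process, sketched below in LaTeX. This is the textbook formulation rather than a formula taken from this page; the post index m and the word-position index n are assumptions added for illustration.

```latex
\begin{align*}
\phi_k   &\sim \mathrm{Dirichlet}(\beta),  & k &= 1, \dots, Z
         && \text{(word distribution of topic } k\text{)} \\
\theta_m &\sim \mathrm{Dirichlet}(\alpha), & &
         && \text{(topic distribution of post } d_m\text{)} \\
z_{m,n}  &\sim \mathrm{Multinomial}(\theta_m), & &
         && \text{(topic of the } n\text{-th word in } d_m\text{)} \\
w_{m,n}  &\sim \mathrm{Multinomial}(\phi_{z_{m,n}}) & &
         && \text{(the observed word)}
\end{align*}
```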
Fig. 5 Three-step model.
Fig. 6 Number of topics from the TD-HITS method.
Key post ID | Minimum distance | Authority value
659051011055611904 | 6.855 654 600 | 2.609 579 007 761 16
659094561617133568 | 6.855 654 600 | 2.609 579 007 761 16
658903037663055872 | 6.082 762 530 | 2.241 254 882 530 98
658786650407964672 | 6.082 762 53 | 2.241 254 882 530 98
658750778572709889 | 5.916 079 783 | 2.241 254 882 530 98
658676220230545409 | 5.916 079 783 | 2.241 254 882 530 98
658675883633414145 | 5.916 079 783 | 2.241 254 882 530 98
658602777183154177 | 5.916 079 783 | 2.241 254 882 530 98
Table 2 Minimum distance and authority of posts.
Minimum distance | Real-life event | Key post | Created time
6.855 654 6 | The rise and controversy of classical economics | @namasteacup "classical economics"is specifically in the text:p | Tue Oct 27 16:56:20 2015
6.855 654 6 | Economic deficit in United States | @Shamsher1111 @johnefrancis If you have a master in economics and don’t understand uses of deficit spending, you are a very great fool. | Tue Oct 27 19:49:23 2015
6.082 762 53 | Economic crisis in Poland | @BeingAnkit_ My mind starts boggling at Economics. I better leave you to study. | Tue Oct 27 07:08:20 2015
6.082 762 53 | The rise of cultural economics | "each part has a size measuring its efficiency economics became more efficient than culture for organizing society @enleuk" | Mon Oct 26 23:25:51 2015
5.916 079 783 | The rise of cultural economics | @NYSELaxative @StartlinglyOkay What kind of input, output and filter? And economics is limited to property and only one part of culture. | Mon Oct 26 21:03:19 2015
5.916 079 783 | The rise of football economics | @ArsenalReport @JanuzajA11 @Firzaapras err I study economics so I I’d know about this subject especially, and its a fact that it doesn’t | Mon Oct 26 16:07:03 2015
5.916 079 783 | The rise of football economics | @mk_9873 @januzaja11 @firzaapras Maybe because you only hang out with mouth breathers? It’s how all economics work, not just football. | Mon Oct 26 16:05:42 2015
5.916 079 783 | Economic crisis in Ireland | @UB_Economics I am sorry to hear this @hazeyhall, have you managed to arrange an appointment now? PH | Mon Oct 26 11:15:12 2015
Table 3 Evaluation results for event detection.
Fig. 7 Trends in reply number changes of top four popular events over time.
Fig. 8 Trend of reply number changes for the hottest event over time.
Method | K=1 | K=5 | K=8 | K=10
PLSA | 1/1 | 5/5 | - | 6/10
LDA | 1/1 | 5/5 | - | 6/10
EVE | 1/1 | 5/5 | - | 6/10
TS-LDA | - | - | 6/8 | -
Table 4 Comparison of precision.
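To compare the entries of Table 4 on one scale, the fractions can be converted to decimals. The sketch below assumes each fraction means correctly detected events divided by the number of topics K, and that the denominators 10 and 8 place the values in the K=10 and K=8 columns; neither reading is stated explicitly on this page.

```python
from fractions import Fraction

# Fractions copied from Table 4. Interpreting them as
# (correctly detected events) / K is an assumption.
precision = {
    ("PLSA", 10): Fraction(6, 10),
    ("LDA", 10): Fraction(6, 10),
    ("EVE", 10): Fraction(6, 10),
    ("TS-LDA", 8): Fraction(6, 8),
}

for (method, k), value in precision.items():
    print(f"{method:7s} K={k:<2d} precision={float(value):.2f}")
```

Under that reading, TS-LDA finds the same six events as the baselines but with fewer topics, so its precision is higher.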
Time per stage (K=8)
Method | HITS | Topic decision method | Gibbs sampling | EM | Total
PLSA | N.A | N.A | N.A | 24.05 min | 24.05 min
LDA | N.A | N.A | 15.62 min | N.A | 15.62 min
EVE | 10922 ms | N.A | N.A | 7.32 min | 7.51 min
TS-LDA | 10922 ms | 3.69 min | 1.16 min | N.A | 5.03 min
Table 5 Comparison of time efficiency.
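Table 5 mixes milliseconds and minutes, so the totals are easier to check after converting the HITS stage to minutes. The following sketch recomputes the totals for the two methods that use HITS; the variable names are illustrative, and the small tolerance allows for rounding in the published stage times.

```python
MS_PER_MIN = 60_000

# Stage times copied from Table 5, with the HITS stage converted to minutes.
hits_min = 10922 / MS_PER_MIN
stage_minutes = {
    "EVE": [hits_min, 7.32],           # HITS + EM
    "TS-LDA": [hits_min, 3.69, 1.16],  # HITS + topic decision + Gibbs sampling
}
reported_total = {"EVE": 7.51, "TS-LDA": 5.03}

recomputed = {method: sum(stages) for method, stages in stage_minutes.items()}
for method, total in recomputed.items():
    print(f"{method}: recomputed {total:.2f} min, reported {reported_total[method]} min")
```

The recomputed sums agree with the reported totals to within about 0.01 min, which is consistent with the stage times having been rounded before publication.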
User ID | Hub value | Degree value
1410108115 | 0.003402463 | 0.0015
29442313 | 0.003078419 | 0.0019
80864710 | 0.003078419 | 0.0012
25073877 | 0.002754375 | 0.0125
25654421 | 0.002106286 | 0.0017
Table 6 Results of identification of influential spreaders.
User ID | Related event
1410108115 | The rise of cultural economics
29442313 | Economic deficit in United States
80864710 | Economic crisis in Poland
25073877 | Economic deficit in United States
25654421 | The rise of football economics
Table 7 Results of identification of influential spreaders in related event.
User ID | Retweet and comment count
1410108115 | 20
29442313 | 18
80864710 | 18
25073877 | 15
25654421 | 12
Table 8 Effectiveness of identification of influential spreaders in related event.
[1]   Zhou X. M. and Chen L., Event detection over twitter social media streams, VLDB J., vol. 23, no. 3, pp. 381-400, 2014.
[2]   Aldhaheri A. and Lee J., Event detection on large social media using temporal analysis, in Proc. 7th Annu. Computing and Communication Workshop and Conf., Las Vegas, NV, USA, 2017, pp. 1-6.
[3]   Yan P., MapReduce and semantics enabled event detection using social media, J. Artif. Intell. Soft Comput. Res., vol. 7, no. 3, pp. 201-213, 2017.
[4]   Zhou Y. D., Xu H., and Lei L., Event detection based on interactive communication streams in social network, in Proc. 9th EAI Int. Conf. Mobile Multimedia Communications, Xi’an, China, 2016, pp. 54-57.
[5]   Hofmann T., Probabilistic latent semantic indexing, in Proc. 22nd Annu. Int. ACM SIGIR Conf. Research and Development in Information Retrieval, Berkeley, CA, USA, 1999, pp. 50-57.
[6]   Hofmann T., Probabilistic latent semantic indexing, in Proc. 22nd Annu. Int. ACM SIGIR Conf. Research and Development in Information Retrieval, Berkeley, CA, USA, 1999, pp. 50-57.
[7]   Blei D. M., Ng A. Y., and Jordan M. I., Latent Dirichlet allocation, J. Mach. Learn. Res., vol. 3, pp. 993-1022, 2003.
[8]   Diao Q. M., Jiang J., Zhu F. D., and Lim E. P., Finding bursty topics from microblogs, in Proc. 50th Annu. Meeting of the Association for Computational Linguistics: Long Papers–Volume 1, Jeju Island, Korea, 2012, pp. 536-544.
[9]   Wang X. H., Zhai C. X., Hu X., and Sproat R., Mining correlated bursty topic patterns from coordinated text streams, in Proc. 13th ACM SIGKDD Int. Conf. Knowledge Discovery and Data Mining, San Jose, CA, USA, 2007, pp. 784-793.
[10]   AlSumait L., Barbara D., and Domeniconi C., On-Line LDA: Adaptive topic models for mining text streams with applications to topic detection and tracking, in Proc. 8th IEEE Int. Conf. Data Mining, Pisa, Italy, 2008, pp. 3-12.
[11]   Li J. X., Tai Z. Y., Zhang R. C., Yu W. R., and Liu L., Online bursty event detection from microblog, in Proc. 7th IEEE/ACM Int. Conf. Utility and Cloud Computing, London, UK, 2014, pp. 865-870.
[12]   Chakrabarti S., Dom B., Raghavan P., Rajagopalan S., Gibson D., and Kleinberg J., Automatic resource compilation by analyzing hyperlink structure and associated text, Comput. Netw. ISDN Syst., vol. 30, nos. 1–7, pp. 65-74, 1998.
[13]   Bao J., Zheng Y., and Mokbel M. F., Location-based and preference-aware recommendation using sparse geo-social networking data, in Proc. 20th Int. Conf. Advances in Geographic Information Systems, Redondo Beach, CA, USA, 2012, pp. 199-208.
[14]   Kleinberg J., Bursty and hierarchical structure in streams, in Proc. 8th ACM SIGKDD Int. Conf. Knowledge Discovery and Data Mining (KDD ’02), Edmonton, Canada, 2002, pp. 91-101.
[15]   Yang Y. M., Pierce T., and Carbonell J., A study of retrospective and on-line event detection, in Proc. 21st Annual Int. ACM SIGIR Conf. Research and Development in Information Retrieval (SIGIR ’98), Melbourne, Australia, 1998, pp. 28-36.
[16]   Mathioudakis M. and Koudas N., Twittermonitor: Trend detection over the twitter stream, in Proc. 2010 ACM SIGMOD Int. Conf. Management of Data, Indianapolis, IN, USA, 2010, pp. 1155-1158.
[17]   Allan J., Lavrenko V., Malin D., and Swan R., Detections, bounds, and timelines: UMass and TDT-3, in Proc. Topic Detection and Tracking Workshop, TDT-3, Vienna, Austria, 2000, pp. 167-174.
[18]   Atefeh F. and Khreich W., A survey of techniques for event detection in twitter, Comput. Intell., vol. 31, no. 1, pp. 132-164, 2015.
[19]   Twitter, REST API v1.1 resources, 2017.
[20]   Facebook, Quickstart for the Azure AD Graph API, 2017.
[21]   Weng J. S. and Lee B. S., Event detection in Twitter, in Proc. 5th Int. AAAI Conf. Weblogs and Social Media, Barcelona, Spain, 2011, pp. 401-408.
[22]   Li Y. F., Jia C. Y., and Yu J., A parameter-free community detection method based on centrality and dispersion of nodes in complex networks, Phys. A: Stat. Mech. Appl., vol. 438, pp. 321-334, 2015.
[23]   Lü L. Y. and Zhou T., Link prediction in complex networks: A survey, Phys. A: Stat. Mech. Appl., vol. 390, no. 6, pp. 1150-1170, 2011.
[24]   Jaccard P., Étude comparative de la distribution florale dans une portion des Alpes et du Jura, Bulletin de la Société Vaudoise des Sciences Naturelles, vol. 37, no. 142, pp. 547-579, 1901.
[25]   Hu Y. Q., Li M. H., Zhang P., Fan Y., and Di Z. R., Community detection by signaling on complex networks, Phys. Rev. E, vol. 78, no. 1, p. 016115, 2008.
[26]   Asuncion A., Welling M., Smyth P., and Teh Y. W., On smoothing and inference for topic models, in Proc. 25th Conf. Uncertainty in Artificial Intelligence, Montreal, Canada, 2009, pp. 27-34.
[27]   Alhamzawi R. and Yu K. M., Variable selection in quantile regression via Gibbs sampling, J. Appl. Stat., vol. 39, no. 4, pp. 799-813, 2012.
[28]   Sun P. G. and Yang Y., Methods to find community based on edge centrality, Phys. A: Stat. Mech. Appl., vol. 392, no. 9, pp. 1977-1988, 2013.
[29]   Campiteli M. G., Holanda A. J., Soares L. D. H., Soles P. R. C., and Kinouchi O., Lobby index as a network centrality measure, Phys. A: Stat. Mech. Appl., vol. 392, no. 21, pp. 5511-5515, 2013.
[30]   Sohn J., Kang D., Park H., Joo B. G., and Chung I. J., An improved social network analysis method for social networks, in Advanced Technologies, Embedded and Multimedia for Human-Centric Computing, Huang Y. M., Chao H. C., Deng D. J., and Park J. J., eds. Amsterdam, The Netherlands: Springer, 2014, pp. 115-123.
[31]   Bonacich P., Factoring and weighting approaches to status scores and clique identification, J. Math. Sociol., vol. 2, no. 1, pp. 113-120, 1972.
[32]   Green O. and Bader D. A., Faster betweenness centrality based on data structure experimentation, Procedia Comput. Sci., vol. 18, pp. 399-408, 2013.