Big Data Mining and Analytics  2019, Vol. 2 Issue (3): 205-216    DOI: 10.26599/BDMA.2019.9020004
    
A Semi-Supervised Deep Network Embedding Approach Based on the Neighborhood Structure
Wenmao Wu, Zhizhou Yu, Jieyue He*
Wenmao Wu, Zhizhou Yu, and Jieyue He are with the School of Computer Science and Engineering, and also with the MOE Key Laboratory of Computer Network and Information Integration, Southeast University, Nanjing 211100, China. E-mail: wwmjeff@163.com, zhizhou.yu@seu.edu.cn.

Abstract  

Network embedding represents a high-dimensional network in a low-dimensional vector space while capturing and preserving the network structure. Most existing network embedding methods are based on shallow models. However, real network structures are complicated, so shallow models cannot capture the high-dimensional nonlinear features of a network well. Recently proposed unsupervised deep learning models, in turn, ignore label information. To address these challenges, we propose Structural Labeled Locally Deep Nonlinear Embedding (SLLDNE), an effective network embedding method. SLLDNE uses a deep neural network to obtain highly nonlinear features, while a semi-supervised classifier component preserves the label information of the nodes and makes the learned embeddings more discriminative. Moreover, we exploit linear reconstruction of neighborhood nodes so that the model captures more structural information. Experimental results of vertex classification on two real-world network datasets demonstrate that SLLDNE outperforms the compared state-of-the-art methods.
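The phrase "linear reconstruction of neighborhood nodes" can be pictured with a small sketch. This is an illustration under stated assumptions, not the SLLDNE objective: it assumes each node is reconstructed as the uniform average of its neighbors' embeddings (in locally linear methods the weights would usually be learned), and the three-node graph and embedding values are hypothetical.

```python
def neighbor_reconstruction_loss(embeddings, neighbors):
    """Sum over nodes of the squared distance between a node's embedding
    and the uniform average of its neighbors' embeddings (assumed weights)."""
    loss = 0.0
    for node, vec in embeddings.items():
        nbrs = neighbors.get(node, [])
        if not nbrs:
            continue  # isolated nodes contribute nothing
        dim = len(vec)
        # Uniform average of the neighbors' embedding vectors.
        mean = [sum(embeddings[n][k] for n in nbrs) / len(nbrs) for k in range(dim)]
        loss += sum((vec[k] - mean[k]) ** 2 for k in range(dim))
    return loss

# Hypothetical 2-D embeddings on a 3-node path a - b - c.
PATH_NEIGHBORS = {"a": ["b"], "b": ["a", "c"], "c": ["b"]}
PATH_EMBEDDINGS = {"a": [0.0, 0.0], "b": [1.0, 0.0], "c": [2.0, 0.0]}
LOSS = neighbor_reconstruction_loss(PATH_EMBEDDINGS, PATH_NEIGHBORS)
```

Here node b sits exactly at the average of its neighbors and adds no loss, while the end nodes a and c each lie one unit from their single neighbor. A term of this kind rewards embeddings in which a node stays close to a linear combination of its neighborhood.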



Key words: network embedding, deep learning, network analysis
Received: 22 November 2018      Published: 06 January 2020
Corresponding Authors: Jieyue He   

Cite this article:

Wenmao Wu, Zhizhou Yu, Jieyue He. A Semi-Supervised Deep Network Embedding Approach Based on the Neighborhood Structure. Big Data Mining and Analytics, 2019, 2(3): 205-216.

URL:

http://bigdata.tsinghuajournals.com/10.26599/BDMA.2019.9020004     OR     http://bigdata.tsinghuajournals.com/Y2019/V2/I3/205

Fig. 1 A simple example of network. From the point of view of first-order proximity, vertex 2 and vertex 3 should be closer in low-dimensional space because they connect directly to each other through an edge. In consideration of second-order proximity, vertex 1 and vertex 2 should be more similar in low-dimensional space as they share similar neighbors.
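The two notions of proximity in Fig. 1 can be made concrete on a toy graph. The sketch below is illustrative only: the edge list is a hypothetical stand-in for the figure, which is not reproduced here, and measuring second-order proximity as the Jaccard overlap of neighbor sets is one common choice assumed for the example, not necessarily the paper's definition.

```python
from collections import defaultdict

# Hypothetical edge list (an assumption; not taken from Fig. 1 itself).
EDGES = [(1, 4), (1, 5), (1, 6), (2, 4), (2, 5), (2, 6), (2, 3), (3, 7)]

def build_neighbors(edges):
    """Map each vertex of an undirected graph to its set of neighbors."""
    neighbors = defaultdict(set)
    for u, v in edges:
        neighbors[u].add(v)
        neighbors[v].add(u)
    return dict(neighbors)

def first_order(neighbors, u, v):
    """1.0 if u and v share an edge, else 0.0 (unweighted graph)."""
    return 1.0 if v in neighbors.get(u, set()) else 0.0

def second_order(neighbors, u, v):
    """Jaccard overlap of the two neighbor sets (assumed similarity measure)."""
    nu, nv = neighbors.get(u, set()), neighbors.get(v, set())
    union = nu | nv
    return len(nu & nv) / len(union) if union else 0.0

NEIGHBORS = build_neighbors(EDGES)
```

On this toy graph, vertices 2 and 3 are adjacent (first-order proximity 1) but share no neighbors, whereas vertices 1 and 2 are not adjacent yet share three of their four distinct neighbors (overlap 0.75).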
Fig. 2 Framework of SLLDNE.
Software    PyCharm     Python    TensorFlow
Version     2016.4.3    3.6       1.4
Table 1 Software environment configuration.
Dataset     Nodes    Edges    Labels
CiteSeer    3312     4732     6
Cora        2708     5429     7
Table 2 Datasets used in our experiments.
Labeled nodes (%)    DeepWalk    LINE     SDNE     TLINE    MMDW     SLLDNE
10                   49.09       39.82    52.32    49.33    55.6     56.22
20                   55.96       46.83    58.08    55.91    61.54    61.75
30                   60.65       49.02    60.44    60.38    63.36    64.25
40                   63.97       50.65    64.92    63.66    65.18    68.41
50                   64.52       53.77    66.1     65.55    66.93    69.99
60                   67.29       54.2     68.23    67.71    69.52    71.77
70                   66.8        58.94    68.05    67.54    70.47    73.34
80                   66.82       59.77    68.34    67.06    70.87    74.96
90                   63.91       59.37    66.76    63.58    70.95    77.41
Table 3 Micro-F1 (%) of vertex classification on CiteSeer.
Labeled nodes (%)    DeepWalk    LINE     SDNE     TLINE    MMDW     SLLDNE
10                   67.2        65.13    69.4     67.45    74.94    77.28
20                   72.53       70.17    74.15    71.34    80.83    82.37
30                   75.87       72.2     74.67    73.37    82.83    83.68
40                   77.64       72.92    75.14    73.96    83.68    85.11
50                   80.35       73.45    76.1     76.31    84.71    85.82
60                   81.47       75.67    76.4     77.96    85.51    86.44
70                   83.31       75.25    78       77.37    87.01    87.45
80                   84.58       76.78    78.75    78.11    87.27    89.85
90                   85.61       79.34    81.19    81.82    88.19    90.03
Table 4 Micro-F1 (%) of vertex classification on Cora.
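Tables 3 and 4 report Micro-F1. As a reminder of what that metric measures, Micro-F1 pools true positives, false positives, and false negatives over all classes before computing F1. The sketch below assumes single-label predictions (one class per vertex); the label values are hypothetical and are not taken from the paper's experiments.

```python
from collections import Counter

def micro_f1(y_true, y_pred):
    """Micro-averaged F1: pool TP, FP, FN over all classes, then compute F1."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    tp, fp, fn = Counter(), Counter(), Counter()
    for t, p in zip(y_true, y_pred):
        if t == p:
            tp[t] += 1
        else:
            fp[p] += 1  # class p was predicted but is wrong
            fn[t] += 1  # true class t was missed
    total_tp = sum(tp.values())
    total_fp = sum(fp.values())
    total_fn = sum(fn.values())
    denom = 2 * total_tp + total_fp + total_fn
    return 2 * total_tp / denom if denom else 0.0

# Hypothetical labels for six vertices (illustrative only).
Y_TRUE = ["AI", "DB", "IR", "AI", "ML", "DB"]
Y_PRED = ["AI", "DB", "AI", "AI", "ML", "IR"]
SCORE = micro_f1(Y_TRUE, Y_PRED)
```

With four of six vertices classified correctly, the pooled counts are TP = 4, FP = 2, FN = 2, giving 8 / 12, or about 66.67%. In the single-label setting every error counts once as a false positive and once as a false negative, so Micro-F1 coincides with accuracy.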
Fig. 3 Performance w.r.t. the number of embedding dimensions on CiteSeer.
Fig. 4 Performance w.r.t. the number of embedding dimensions on Cora.
Fig. 5 Validity of the weights on CiteSeer.
Fig. 6 Validity of the weights on Cora.
Fig. 7 Convergence on CiteSeer.
Fig. 8 Convergence on Cora.