Big Data Mining and Analytics  2021, Vol. 4 Issue (4): 266-278    DOI: 10.26599/BDMA.2021.9020011
    
A Deep-Learning Prediction Model for Imbalanced Time Series Data Forecasting
Chenyu Hou, Jiawei Wu, Bin Cao, Jing Fan*
College of Computer Science and Technology, Zhejiang University of Technology, Hangzhou 310023, China

Abstract  

Time series forecasting has attracted wide attention in recent decades. However, some time series are imbalanced and show different patterns between special and normal periods, which degrades prediction accuracy for the special periods. In this paper, we aim to develop a unified model that alleviates this imbalance and thus improves prediction accuracy for special periods. The task is challenging for two reasons: (1) the temporal dependency of the series, and (2) the tradeoff between mining patterns shared across periods and distinguishing the different distributions of those periods. To tackle these issues, we propose a self-attention-based time-varying prediction model with a two-stage training strategy. First, we use an encoder-decoder module with a multi-head self-attention mechanism to extract common patterns of the time series. Then, we propose a time-varying optimization module that optimizes the results for special periods and eliminates the imbalance. Moreover, we propose reverse distance attention in place of traditional dot attention to highlight the importance of similar historical values to the forecast results. Finally, extensive experiments show that our model outperforms the baselines in terms of mean absolute error and mean absolute percentage error.
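The abstract contrasts the proposed reverse distance attention with traditional dot attention. As a rough illustration of the distinction (not the paper's exact formulation; the inverse-distance weighting below is an assumption for exposition), dot attention scores queries against keys by inner product, whereas a distance-based variant can upweight keys that lie close to the query, so similar historical points receive larger weights:

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def dot_attention(q, k, v):
    # Standard scaled dot-product attention: similarity = q . k.
    scores = q @ k.T / np.sqrt(q.shape[-1])
    return softmax(scores) @ v

def reverse_distance_attention(q, k, v, eps=1e-6):
    # Illustrative variant: score each key by the inverse of its
    # Euclidean distance to the query, so nearer (more similar)
    # points dominate the attention weights.
    dist = np.linalg.norm(q[:, None, :] - k[None, :, :], axis=-1)
    scores = 1.0 / (dist + eps)
    return softmax(scores) @ v

rng = np.random.default_rng(0)
q = rng.normal(size=(2, 4))
k = rng.normal(size=(5, 4))
v = rng.normal(size=(5, 4))
print(dot_attention(q, k, v).shape)               # (2, 4)
print(reverse_distance_attention(q, k, v).shape)  # (2, 4)
```

Both variants produce one output vector per query; only the scoring function differs.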



Key words: time series forecasting; imbalanced data; deep learning; prediction model
Received: 21 May 2021      Published: 30 August 2021
Fund: National Key R&D Program of China (2018YFB1402800); Fundamental Research Funds for the Provincial Universities of Zhejiang (RF-A2020007)
Corresponding Author: Jing Fan     E-mail: houcy@zjut.edu.cn; wujw@zjut.edu.cn; bincao@zjut.edu.cn; fanjing@zjut.edu.cn
About the authors:
Chenyu Hou received the BEng degree from Zhejiang University of Technology, Hangzhou, China in 2016. He is now a PhD student at Zhejiang University of Technology. He has published many papers in international journals and conferences, such as TKDE, CIKM, and WWWJ. His research interests are databases and data mining.
Jiawei Wu received the BEng degree from Zhejiang University of Technology, Hangzhou, China in 2017. He is now a PhD student at Zhejiang University of Technology. He has published several papers in international journals and conferences, such as TOIT and MONET. His research interest is natural language processing.
Bin Cao received the PhD degree in computer science from Zhejiang University, China in 2013. He then worked as a research associate at Hong Kong University of Science and Technology and at Noah's Ark Lab, Huawei. He joined Zhejiang University of Technology, Hangzhou, China in 2014, and is now an associate professor at the College of Computer Science and Technology, Zhejiang University of Technology. He has published more than 30 papers in authoritative international journals and conferences, including TKDE, TSC, and CIKM. His research interests include data mining and natural language processing.
Jing Fan received the BEng, MEng, and PhD degrees in computer science from Zhejiang University, China in 1990, 1993, and 2003, respectively. She is now a professor at the College of Computer Science and Technology, Zhejiang University of Technology, China. She is a director of the China Computer Federation (CCF). She has published more than 100 papers in international journals and conferences, including TSC, TKDE, and IEEE Virtual Reality. Her current research interests include service computing, virtual reality, and intelligent interaction.
Cite this article:

Chenyu Hou, Jiawei Wu, Bin Cao, Jing Fan. A Deep-Learning Prediction Model for Imbalanced Time Series Data Forecasting. Big Data Mining and Analytics, 2021, 4(4): 266-278.

URL:

http://bigdata.tsinghuajournals.com/10.26599/BDMA.2021.9020011     OR     http://bigdata.tsinghuajournals.com/Y2021/V4/I4/266

Fig. 1 An example of call arrival prediction.
Fig. 2 Distribution differences throughout the day.
Fig. 3 Framework of STV.
Fig. 4 Attention architecture.
Fig. 5 Two-stage training strategy.
City | Method  | H=14 MAE | H=14 MAPE | H=42 MAE | H=42 MAPE | H=98 MAE | H=98 MAPE
HZ   | ARIMA   | 125.21   | 0.11      | 145.60   | 0.13      | 193.45   | 0.19
HZ   | LSTM    | 309.47   | 0.35      | 322.59   | 0.36      | 346.80   | 0.39
HZ   | Seq2Seq | 110.69   | 0.10      | 153.42   | 0.14      | 206.46   | 0.19
HZ   | LSTNet  | 121.13   | 0.10      | 148.69   | 0.13      | 166.48   | 0.15
HZ   | NBeat   | 118.65   | 0.10      | 150.22   | 0.14      | 179.22   | 0.16
HZ   | STV     | 104.81   | 0.09      | 131.53   | 0.12      | 166.63   | 0.15
TZ   | ARIMA   | 121.66   | 0.11      | 145.99   | 0.13      | 187.55   | 0.18
TZ   | LSTM    | 284.06   | 0.30      | 292.83   | 0.31      | 321.43   | 0.34
TZ   | Seq2Seq | 105.89   | 0.09      | 132.02   | 0.11      | 148.46   | 0.13
TZ   | LSTNet  | 104.64   | 0.09      | 126.50   | 0.11      | 131.48   | 0.11
TZ   | NBeat   | 102.88   | 0.08      | 122.70   | 0.10      | 161.55   | 0.14
TZ   | STV     | 99.47    | 0.08      | 111.50   | 0.09      | 136.32   | 0.11
LS   | ARIMA   | 100.90   | 0.11      | 122.27   | 0.14      | 158.94   | 0.19
LS   | LSTM    | 229.29   | 0.31      | 242.09   | 0.33      | 261.26   | 0.36
LS   | Seq2Seq | 90.18    | 0.09      | 119.80   | 0.13      | 205.06   | 0.23
LS   | LSTNet  | 93.99    | 0.10      | 93.98    | 0.10      | 116.69   | 0.13
LS   | NBeat   | 83.35    | 0.09      | 98.39    | 0.11      | 123.09   | 0.14
LS   | STV     | 77.98    | 0.08      | 91.32    | 0.09      | 113.26   | 0.12
Table 1 Overall MAE and MAPE of different methods.
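Tables 1 and 2 report mean absolute error (MAE) and mean absolute percentage error (MAPE). For reference, a minimal implementation of both metrics (the standard definitions, not code from the paper):

```python
import numpy as np

def mae(y_true, y_pred):
    # Mean absolute error: average magnitude of the residuals.
    return np.mean(np.abs(np.asarray(y_true, float) - np.asarray(y_pred, float)))

def mape(y_true, y_pred):
    # Mean absolute percentage error: residuals scaled by the true values.
    y_true, y_pred = np.asarray(y_true, float), np.asarray(y_pred, float)
    return np.mean(np.abs((y_true - y_pred) / y_true))

y_true = [100.0, 200.0, 400.0]
y_pred = [110.0, 180.0, 400.0]
print(mae(y_true, y_pred))              # 10.0
print(round(mape(y_true, y_pred), 4))   # 0.0667
```

MAE is reported in the units of the series (here, call volume), while MAPE is scale-free, which is why both appear side by side in the tables.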
     |         | H=14                            | H=42                            | H=98
City | Method  | MAE_N  | MAPE_N | MAE_H  | MAPE_H | MAE_N  | MAPE_N | MAE_H  | MAPE_H | MAE_N  | MAPE_N | MAE_H  | MAPE_H
HZ   | ARIMA   | 116.29 | 0.10   | 243.74 | 0.26   | 139.24 | 0.12   | 270.90 | 0.29   | 175.30 | 0.17   | 432.92 | 0.51
HZ   | LSTM    | 297.25 | 0.33   | 471.99 | 0.59   | 307.60 | 0.34   | 521.27 | 0.65   | 329.97 | 0.37   | 568.85 | 0.71
HZ   | Seq2Seq | 103.62 | 0.09   | 204.65 | 0.21   | 138.13 | 0.12   | 356.15 | 0.38   | 186.98 | 0.17   | 463.31 | 0.50
HZ   | LSTNet  | 111.99 | 0.09   | 242.74 | 0.25   | 135.14 | 0.12   | 328.35 | 0.35   | 152.81 | 0.13   | 346.82 | 0.37
HZ   | NBeat   | 111.20 | 0.09   | 217.75 | 0.23   | 136.80 | 0.12   | 328.20 | 0.36   | 164.52 | 0.15   | 373.17 | 0.41
HZ   | STV     | 98.73  | 0.08   | 185.76 | 0.19   | 122.41 | 0.11   | 252.49 | 0.27   | 160.98 | 0.14   | 241.05 | 0.25
TZ   | ARIMA   | 115.24 | 0.10   | 207.05 | 0.19   | 142.63 | 0.13   | 190.56 | 0.19   | 176.59 | 0.16   | 332.09 | 0.35
TZ   | LSTM    | 275.62 | 0.29   | 396.20 | 0.44   | 282.74 | 0.30   | 426.68 | 0.48   | 310.11 | 0.33   | 470.81 | 0.53
TZ   | Seq2Seq | 101.19 | 0.08   | 168.38 | 0.15   | 126.10 | 0.10   | 210.63 | 0.19   | 138.56 | 0.12   | 279.04 | 0.27
TZ   | LSTNet  | 100.73 | 0.08   | 156.59 | 0.14   | 117.73 | 0.10   | 242.76 | 0.23   | 124.06 | 0.10   | 229.41 | 0.21
TZ   | NBeat   | 98.63  | 0.08   | 159.40 | 0.14   | 115.43 | 0.10   | 219.16 | 0.21   | 152.60 | 0.13   | 279.58 | 0.27
TZ   | STV     | 95.72  | 0.08   | 149.29 | 0.13   | 107.37 | 0.09   | 166.30 | 0.15   | 136.24 | 0.11   | 137.33 | 0.12
LS   | ARIMA   | 97.96  | 0.11   | 139.87 | 0.17   | 120.65 | 0.13   | 143.70 | 0.19   | 151.71 | 0.18   | 254.33 | 0.34
LS   | LSTM    | 224.61 | 0.30   | 291.57 | 0.42   | 235.78 | 0.32   | 325.88 | 0.46   | 254.05 | 0.35   | 356.33 | 0.50
LS   | Seq2Seq | 87.26  | 0.09   | 128.94 | 0.15   | 112.36 | 0.12   | 218.41 | 0.26   | 195.79 | 0.21   | 327.27 | 0.40
LS   | LSTNet  | 89.66  | 0.09   | 151.60 | 0.17   | 89.43  | 0.10   | 154.33 | 0.18   | 109.42 | 0.12   | 212.61 | 0.26
LS   | NBeat   | 80.15  | 0.08   | 125.90 | 0.15   | 93.55  | 0.10   | 162.55 | 0.20   | 118.99 | 0.13   | 177.16 | 0.21
LS   | STV     | 76.57  | 0.08   | 96.75  | 0.11   | 89.35  | 0.09   | 117.40 | 0.14   | 111.41 | 0.12   | 137.67 | 0.17
Table 2 MAE and MAPE on normal days (_N) and holidays (_H).
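Table 2 breaks each metric out by day type (the _N columns for normal days, _H for holidays). Given a boolean holiday mask, this split can be computed as follows (a sketch; the mask construction is dataset-specific and assumed here):

```python
import numpy as np

def split_mae(y_true, y_pred, is_holiday):
    # Compute MAE separately for normal days and holidays.
    y_true = np.asarray(y_true, float)
    y_pred = np.asarray(y_pred, float)
    is_holiday = np.asarray(is_holiday, bool)
    err = np.abs(y_true - y_pred)
    return {
        "MAE_N": err[~is_holiday].mean(),  # normal-day error
        "MAE_H": err[is_holiday].mean(),   # holiday error
    }

res = split_mae([100, 200, 300, 400],
                [90, 210, 330, 360],
                [False, False, True, True])
print(res)  # {'MAE_N': 10.0, 'MAE_H': 35.0}
```

The gap between MAE_N and MAE_H in Table 2 is exactly the imbalance the paper's time-varying module targets: every baseline's holiday error is markedly higher than its normal-day error.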
Fig. 6 Performance of different training mechanisms.
Removed module    | HZ MAE | HZ MAPE | TZ MAE | TZ MAPE | LS MAE | LS MAPE
Conv              | 253.76 | 0.21    | 113.15 | 0.09    | 223.07 | 0.23
Encoder           | 139.22 | 0.13    | 128.41 | 0.11    | 99.49  | 0.11
Decoder           | 133.72 | 0.12    | 138.05 | 0.12    | 105.22 | 0.11
No module removed | 131.53 | 0.12    | 111.50 | 0.09    | 91.32  | 0.09
Table 3 Inside the two-stage training.
Fig. 7 MAE comparison on the convolution layer.
Fig. 8 MAE comparison of STV and STV-NTV.
Fig. 9 MAE comparison of STV and STV-Dot.
Fig. 10 MAE comparison of different training data volumes.
Fig. 11 Training efficiency of different models.
Method  | MAE     | MAPE | MAE_N   | MAPE_N | MAE_H   | MAPE_H
ARIMA   | 570.29  | 0.05 | 564.95  | 0.05   | 675.91  | 0.07
LSTM    | 1281.17 | 0.12 | 1273.13 | 0.12   | 1614.56 | 0.13
Seq2Seq | 527.10  | 0.05 | 522.29  | 0.05   | 726.28  | 0.06
LSTNet  | 499.01  | 0.05 | 496.03  | 0.05   | 622.51  | 0.05
NBeat   | 515.10  | 0.05 | 513.14  | 0.05   | 596.65  | 0.05
STV     | 464.09  | 0.04 | 461.72  | 0.04   | 562.40  | 0.04
Table 4 Model comparison on the ElecCONS dataset.