Using the hybrid model STL-ADABOOST-ESN to forecast the monthly number of HIV patients in China
Abstract:
Objective To build a model of the monthly number of human immunodeficiency virus (HIV) cases in China and to forecast the monthly number for 2017, using a hybrid of seasonal-trend decomposition based on loess (STL) and an echo state network (ESN) trained within the adaptive boosting (AdaBoost) ensemble framework.
Methods Monthly HIV case counts in China from January 2013 to December 2016 were collected from the official website of the Chinese Center for Disease Control and Prevention. The series was first decomposed by STL into a seasonal component and a non-seasonal (remainder) component. The seasonal component was modeled with a simple seasonal estimation method and the remainder component with ADABOOST-ESN. The outputs for the two components were then summed to give the forecast of the monthly number of HIV cases.
Results Model performance was evaluated with the root mean squared error (RMSE) and the mean absolute percentage error (MAPE). STL-ADABOOST-ESN gave an RMSE of 164.083 and a MAPE of 1.842% on the training set, and an RMSE of 359.404 and a MAPE of 3.776% on the test set. Its forecasting accuracy was higher than that of the four comparison models: seasonal autoregressive integrated moving average (SARIMA), ESN, ADABOOST-ESN and STL-ESN. The proposed method predicted monthly numbers of HIV cases for January to December 2017 ranging from 5 654 to 8 497.
Conclusions The proposed STL-ADABOOST-ESN model has high forecasting accuracy. It predicts that the annual number of HIV cases in China will reach 92 040 in 2017, an increase of 4.87% over 2016. It is therefore necessary to implement stricter HIV prevention and control measures nationwide.
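The decompose, model separately, and recombine idea described in the Methods can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: a classical additive decomposition (linear trend plus monthly means) stands in for the loess-based STL, a single small ESN with a ridge-regression readout stands in for ADABOOST-ESN (the boosting step is omitted), and the input series is synthetic. All names (`decompose`, `ESN`, `hybrid_forecast`) and hyperparameters are hypothetical.

```python
import numpy as np

def decompose(y, period=12):
    """Additive split into seasonal and non-seasonal parts.
    Assumption: linear trend + per-month means, a stand-in for loess-based STL."""
    y = np.asarray(y, dtype=float)
    t = np.arange(len(y))
    trend = np.polyval(np.polyfit(t, y, 1), t)
    detrended = y - trend
    profile = np.array([detrended[m::period].mean() for m in range(period)])
    profile -= profile.mean()                  # seasonal effects average to zero
    seasonal = profile[t % period]
    return seasonal, y - seasonal, profile

class ESN:
    """Tiny echo state network: fixed random reservoir, ridge-regression readout."""
    def __init__(self, n_res=50, rho=0.8, ridge=1e-2, seed=0):
        rng = np.random.default_rng(seed)
        W = rng.uniform(-0.5, 0.5, (n_res, n_res))
        W *= rho / np.max(np.abs(np.linalg.eigvals(W)))   # rescale spectral radius to rho
        self.W, self.w_in, self.ridge = W, rng.uniform(-0.5, 0.5, n_res), ridge

    def _step(self, x, v):
        return np.tanh(self.w_in * v + self.W @ x)

    def fit(self, series):
        self.mu, self.sd = series.mean(), series.std()
        z = (series - self.mu) / self.sd       # standardize the input
        x, rows = np.zeros(len(self.w_in)), []
        for v in z[:-1]:                       # state after z[t] is used to predict z[t+1]
            x = self._step(x, v)
            rows.append(np.append(x, 1.0))     # reservoir state plus bias term
        X = np.array(rows)
        A = X.T @ X + self.ridge * np.eye(X.shape[1])
        self.w_out = np.linalg.solve(A, X.T @ z[1:])
        self.x_last, self.z_last = x, z[-1]
        return self

    def forecast(self, steps):
        x, v, out = self.x_last, self.z_last, []
        for _ in range(steps):                 # free-running: feed predictions back in
            x = self._step(x, v)
            v = np.append(x, 1.0) @ self.w_out
            out.append(v)
        return np.array(out) * self.sd + self.mu

def hybrid_forecast(y, steps, period=12):
    """Forecast seasonal and non-seasonal parts separately, then sum them."""
    _, rest, profile = decompose(y, period)
    rest_fc = ESN().fit(rest).forecast(steps)
    months = (len(y) + np.arange(steps)) % period
    return profile[months] + rest_fc           # simple seasonal estimate + ESN output

# Synthetic monthly series for illustration only (not the HIV data).
rng = np.random.default_rng(1)
t = np.arange(48)
y = 5000 + 40 * t + 600 * np.sin(2 * np.pi * t / 12) + rng.normal(0, 100, 48)
fc = hybrid_forecast(y[:36], steps=12)         # fit on 36 months, forecast the next 12
print(np.round(fc))
```

A boosted version would train several such ESNs on reweighted samples and combine their outputs; that step is left out here to keep the sketch short.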
Table 1. Comparison of forecasting accuracy of different models

Model               Training set             Test set
                    RMSE       MAPE (%)      RMSE       MAPE (%)
SARIMA              690.963    7.898         673.846    7.497
ESN                 285.309    3.887         552.133    6.638
ADABOOST-ESN        238.729    3.021         530.963    6.413
STL-ESN             206.697    2.424         391.330    3.995
STL-ADABOOST-ESN    164.083    1.842         359.404    3.776

Table 2. Forecasting results for the monthly number of HIV cases in 2016

Month    Actual    SARIMA    ESN      ADABOOST-ESN    STL-ESN    STL-ADABOOST-ESN
1        6 270     6 402     7 284    7 005           6 534      6 523
2        4 631     4 186     4 288    4 812           4 629      4 514
3        8 221     7 057     8 104    7 813           7 853      7 990
4        7 422     7 048     7 825    7 762           6 851      7 015
5        7 239     6 842     6 982    6 852           7 274      7 144
6        7 499     7 077     8 220    7 842           7 146      7 375
7        7 701     7 861     8 136    8 237           7 509      7 512
8        8 119     6 848     7 758    7 135           7 253      7 249
9        8 102     7 341     8 071    8 144           8 191      8 165
10       6 541     6 129     5 920    5 965           6 196      6 038
11       7 594     6 594     7 332    7 240           7 646      7 662
12       8 425     8 311     9 398    9 163           7 911      8 007
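The test-set metrics can be recomputed from the Table 2 values. The sketch below assumes the usual definitions, which the source does not spell out: RMSE as the square root of the mean squared error, and MAPE as the mean of |forecast - actual| / actual expressed in percent. The function and variable names are illustrative.

```python
import math

# Actual 2016 monthly counts and model forecasts, copied from Table 2.
actual = [6270, 4631, 8221, 7422, 7239, 7499, 7701, 8119, 8102, 6541, 7594, 8425]
forecasts = {
    "SARIMA": [6402, 4186, 7057, 7048, 6842, 7077, 7861, 6848, 7341, 6129, 6594, 8311],
    "ESN": [7284, 4288, 8104, 7825, 6982, 8220, 8136, 7758, 8071, 5920, 7332, 9398],
    "ADABOOST-ESN": [7005, 4812, 7813, 7762, 6852, 7842, 8237, 7135, 8144, 5965, 7240, 9163],
    "STL-ESN": [6534, 4629, 7853, 6851, 7274, 7146, 7509, 7253, 8191, 6196, 7646, 7911],
    "STL-ADABOOST-ESN": [6523, 4514, 7990, 7015, 7144, 7375, 7512, 7249, 8165, 6038, 7662, 8007],
}

def rmse(y_true, y_pred):
    """Root mean squared error."""
    return math.sqrt(sum((p - a) ** 2 for a, p in zip(y_true, y_pred)) / len(y_true))

def mape(y_true, y_pred):
    """Mean absolute percentage error, in percent."""
    return 100 * sum(abs(p - a) / a for a, p in zip(y_true, y_pred)) / len(y_true)

for name, pred in forecasts.items():
    print(f"{name:18s} RMSE={rmse(actual, pred):8.3f}  MAPE={mape(actual, pred):6.3f}%")

# Annual growth implied by the 2017 forecast of 92 040 cases quoted in the abstract.
total_2016 = sum(actual)
growth_pct = 100 * (92040 - total_2016) / total_2016
print(f"2016 total={total_2016}, implied 2017 growth={growth_pct:.2f}%")
```

Under these definitions the STL-ADABOOST-ESN column gives an RMSE of about 359.40 and a MAPE of about 3.78%, in line with the test-set values in Table 1. The 2016 actual values sum to 87 764, and 92 040 is about 4.87% above that total.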