Prediction for the outpatient amount of childhood common respiratory diseases based on multivariate LSTM model with lag effect
Abstract:
Objective To build a prediction model for the number of daily outpatient visits for common childhood respiratory diseases and to analyze the future trend in visits, providing data support for the scientific prevention and control of these diseases.
Methods Daily outpatient cases from one hospital, together with meteorological and air pollutant data for the same period (January 1, 2017 to December 31, 2019), were used. A distributed lag non-linear model (DLNM) was applied to analyze the effects and lag effects of daily mean temperature and pollutant concentrations on daily outpatient visits in the spring and autumn semesters separately. On this basis, a multivariate long short-term memory (LSTM) model was constructed to predict daily outpatient visits in each semester.
Results The median of the daily mean temperature in the spring and autumn semesters was selected for study. The effect of daily mean temperature on daily outpatient visits in the autumn semester appeared after a lag of 7 days and then lasted for about 10 days, whereas in the spring semester the effect was immediate and lasted for about 4 days. The multivariate LSTM model incorporating the lag effects predicted daily outpatient visits well in both semesters, with a mean absolute percentage error (MAPE) on the test set of 4.59% for spring and 4.77% for autumn.
Conclusion The multivariate LSTM model that accounts for lag effects can predict the number of daily outpatient visits for common childhood respiratory diseases fairly accurately, providing a scientific basis for disease prevention and control.
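The abstract does not say how the lag effects are fed into the LSTM. One common approach, sketched here as an assumption and not as the authors' method, is to pair each day's visit count with covariate values shifted back by their lag and then cut fixed-length windows in the (samples, timesteps, features) layout that LSTM layers expect. The function name `make_lagged_windows`, the window length, and the synthetic series are all hypothetical; only the 7-day autumn temperature lag comes from the abstract.

```python
from typing import Dict, List, Sequence, Tuple


def make_lagged_windows(
    target: Sequence[float],
    covariates: Dict[str, Sequence[float]],
    lags: Dict[str, int],
    window: int,
) -> Tuple[List[List[List[float]]], List[float]]:
    """Build (samples, timesteps, features) inputs and next-day targets.

    For day t, each covariate contributes its value from day t - lag, so a
    7-day lag pairs today's visits with the temperature of a week earlier.
    Covariates without an entry in `lags` are used unshifted (lag 0).
    """
    names = sorted(covariates)
    max_lag = max((lags.get(k, 0) for k in names), default=0)
    X: List[List[List[float]]] = []
    y: List[float] = []
    # Start at max_lag so that every lagged index t - lag is non-negative.
    for start in range(max_lag, len(target) - window):
        steps = []
        for t in range(start, start + window):
            row = [float(target[t])]
            row += [float(covariates[k][t - lags.get(k, 0)]) for k in names]
            steps.append(row)
        X.append(steps)
        # The label is the visit count on the day right after the window.
        y.append(float(target[start + window]))
    return X, y


# Synthetic illustration only: 30 days of made-up visits and temperatures.
visits = [2000.0 + 10 * d for d in range(30)]
temp = [float(d) for d in range(30)]
X, y = make_lagged_windows(visits, {"temp": temp}, {"temp": 7}, window=5)
print(len(X), len(X[0]), len(X[0][0]))  # samples, timesteps, features
```

With a 7-day lag and a 5-day window, the first sample starts on day 7 and pairs it with the temperature of day 0. The nested lists can be converted to an array and passed to any LSTM implementation.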
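Table 1. Descriptive statistics of the spring semester data

| Variable | Daily outpatient visits | Daily mean temperature (℃) | PM2.5 (μg/m³) | CO (μg/m³) | SO2 (μg/m³) | NO2 (μg/m³) |
| --- | --- | --- | --- | --- | --- | --- |
| Minimum | 1228 | 1.50 | 0.00 | 0.30 | 2.00 | 14.00 |
| Lower quartile | 1709 | 13.00 | 29.00 | 0.70 | 7.00 | 29.25 |
| Median | 1912 | 21.00 | 44.00 | 0.90 | 10.00 | 37.00 |
| Mean | 1950 | 19.34 | 50.17 | 0.95 | 11.98 | 41.75 |
| Upper quartile | 2156 | 25.88 | 65.00 | 1.10 | 14.75 | 50.00 |
| Maximum | 3061 | 33.50 | 272.00 | 3.00 | 53.00 | 121.00 |

Table 2. Descriptive statistics of the autumn semester data

| Variable | Daily outpatient visits | Daily mean temperature (℃) | PM2.5 (μg/m³) | CO (μg/m³) | SO2 (μg/m³) | NO2 (μg/m³) |
| --- | --- | --- | --- | --- | --- | --- |
| Minimum | 847 | -9.00 | 0.00 | 0.40 | 4.00 | 17.00 |
| Lower quartile | 2175 | 1.00 | 25.00 | 0.80 | 8.00 | 37.00 |
| Median | 2690 | 9.25 | 45.50 | 1.10 | 12.00 | 52.00 |
| Mean | 2869 | 9.97 | 55.76 | 1.16 | 13.26 | 52.31 |
| Upper quartile | 3347 | 17.62 | 72.25 | 1.40 | 17.00 | 66.00 |
| Maximum | 6762 | 29.00 | 255.00 | 3.30 | 38.00 | 110.00 |

Table 3. Absolute percentage error (APE) of the spring and autumn semester prediction models

| Spring semester date | APE (%) | Autumn semester date | APE (%) |
| --- | --- | --- | --- |
| June 30 | 8.15 | December 2 | 4.14 |
| July 1 | 0.38 | December 3 | 4.64 |
| July 2 | 7.04 | December 4 | 4.85 |
| July 3 | 1.33 | December 5 | 1.35 |
| July 4 | 3.96 | December 6 | 2.70 |
| July 5 | 2.16 | December 7 | 8.37 |
| July 6 | 9.16 | December 8 | 7.35 |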
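The MAPE is the mean of the daily APE values, where APE = |actual - predicted| / |actual| × 100. As a quick consistency check, the short sketch below averages the seven APE values listed in Table 3 for each semester. The helper names `ape` and `mape_from_ape` are illustrative, and the table values are already rounded to two decimals, so small differences from the reported MAPE are expected.

```python
from statistics import mean
from typing import Sequence


def ape(actual: float, predicted: float) -> float:
    """Absolute percentage error of a single prediction, in percent."""
    return abs(actual - predicted) / abs(actual) * 100.0


def mape_from_ape(ape_percent: Sequence[float]) -> float:
    """MAPE is the arithmetic mean of the per-day APE values."""
    return mean(ape_percent)


# Daily APE values (%) copied from Table 3.
spring_ape = [8.15, 0.38, 7.04, 1.33, 3.96, 2.16, 9.16]
autumn_ape = [4.14, 4.64, 4.85, 1.35, 2.70, 8.37, 7.35]

spring_mape = mape_from_ape(spring_ape)
autumn_mape = mape_from_ape(autumn_ape)
print(f"spring MAPE: {spring_mape:.3f}%  autumn MAPE: {autumn_mape:.3f}%")
```

The averages come out at about 4.60% for spring and 4.77% for autumn, which agrees with the reported 4.59% and 4.77% to within the rounding of the tabulated values.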