A neural network risk prediction model of coal workers' pneumoconiosis: a hospital-based case-control study
Abstract:
Objective This study aimed to construct a high-performance risk prediction model for coal workers' pneumoconiosis (CWP) in order to support early prevention of CWP.
Methods In a hospital-based case-control study, we collected the case records of coal miners with CWP and of coal miners without CWP treated over the same period at an occupational disease hospital in Shanxi Province from 2017 to 2022, and established a CWP database. Random forest was used to screen the feature variables. CWP prediction models were then built with a back propagation (BP) neural network and with logistic regression, and the predictive ability of the two models was evaluated with receiver operating characteristic (ROC) curves.
Results The BP neural network model had a sensitivity of 88.6%, a specificity of 87.6%, and an accuracy of 87.12%. According to the normalized variable importance, the most important factors for the occurrence of CWP in coal miners were the ratio of forced expiratory volume in 1 second to forced vital capacity (FEV1/FVC), length of service, and type of work. The logistic regression model had a sensitivity of 80.7%, a specificity of 84.1%, and an accuracy of 82.7%. The area under the ROC curve (AUC) of the BP neural network model (AUC=0.918, 95% CI: 0.903-0.964) was higher than that of the logistic regression model (AUC=0.802, 95% CI: 0.750-0.850), indicating better predictive performance.
Conclusions The BP neural network model predicted CWP more accurately than the logistic regression model. FEV1/FVC, length of service, and type of work are important factors influencing the occurrence of CWP in coal miners.
Keywords:
- Back propagation neural network
- Coal workers' pneumoconiosis
- Logistic regression model
- Prediction model
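As a rough illustration of the back propagation (BP) neural network named in the Methods, the sketch below trains a network with one sigmoid hidden layer by per-sample gradient descent on the log loss. It is not the study's model. The two inputs (a scaled FEV1/FVC ratio and a scaled length of service), the eight toy records, the three hidden units, the learning rate, the number of epochs, and all function names are assumptions made for illustration only.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def forward(x, w_hidden, w_out):
    """Return (hidden activations, predicted probability). Index 0 of each weight list is the bias."""
    hidden = [sigmoid(w[0] + sum(wi * xi for wi, xi in zip(w[1:], x))) for w in w_hidden]
    output = sigmoid(w_out[0] + sum(wi * hi for wi, hi in zip(w_out[1:], hidden)))
    return hidden, output


def log_loss(data, w_hidden, w_out):
    """Mean binary cross-entropy over (features, label) pairs."""
    total = 0.0
    for x, y in data:
        _, p = forward(x, w_hidden, w_out)
        p = min(max(p, 1e-12), 1.0 - 1e-12)  # guard against log(0)
        total -= y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return total / len(data)


def train(data, n_hidden=3, lr=0.5, epochs=2000, seed=0):
    """Train by back propagation; hyperparameters are illustrative assumptions."""
    rng = random.Random(seed)
    n_in = len(data[0][0])
    w_hidden = [[rng.uniform(-0.5, 0.5) for _ in range(n_in + 1)] for _ in range(n_hidden)]
    w_out = [rng.uniform(-0.5, 0.5) for _ in range(n_hidden + 1)]
    for _ in range(epochs):
        for x, y in data:
            hidden, p = forward(x, w_hidden, w_out)
            # Output error for sigmoid + log loss, then propagate it back to the hidden layer.
            delta_out = p - y
            delta_hidden = [delta_out * w_out[j + 1] * h * (1.0 - h) for j, h in enumerate(hidden)]
            w_out[0] -= lr * delta_out
            for j, h in enumerate(hidden):
                w_out[j + 1] -= lr * delta_out * h
            for j, d in enumerate(delta_hidden):
                w_hidden[j][0] -= lr * d
                for i, xi in enumerate(x):
                    w_hidden[j][i + 1] -= lr * d * xi
    return w_hidden, w_out


# Hypothetical records: (scaled FEV1/FVC, scaled length of service) -> 1 = CWP, 0 = non-CWP.
TOY_DATA = [
    ((0.20, 0.80), 1), ((0.30, 0.70), 1), ((0.25, 0.90), 1), ((0.40, 0.60), 1),
    ((0.80, 0.20), 0), ((0.70, 0.30), 0), ((0.90, 0.10), 0), ((0.60, 0.40), 0),
]

untrained = train(TOY_DATA, epochs=0)  # same seed, so these are the initial weights
trained = train(TOY_DATA)
print("log loss before:", log_loss(TOY_DATA, *untrained))
print("log loss after: ", log_loss(TOY_DATA, *trained))
```

Calling `forward((0.2, 0.8), *trained)[1]` returns the predicted probability of CWP for one hypothetical record. A real model would be fitted to the variables selected by the random forest step and validated on held-out data.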
Figure 1. Prediction results of the back propagation neural network model
FEV1/FVC: forced expiratory volume in 1 second/forced vital capacity; FEV1: forced expiratory volume in 1 second; FVC: forced vital capacity; TC: total cholesterol; HDL-C: high-density lipoprotein cholesterol; TG: triglycerides.
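The sensitivity, specificity, accuracy, and AUC used to summarize prediction results can be computed as in the minimal sketch below. The confusion-matrix counts and the predicted scores are hypothetical, not the study's data, and the function names are illustrative. The AUC is computed here as the proportion of case-control pairs in which the case receives the higher score, with ties counted as one half.

```python
def classification_metrics(tp, fn, tn, fp):
    """Return (sensitivity, specificity, accuracy) from confusion-matrix counts."""
    sensitivity = tp / (tp + fn)   # share of true cases predicted as cases
    specificity = tn / (tn + fp)   # share of true controls predicted as controls
    accuracy = (tp + tn) / (tp + fn + tn + fp)
    return sensitivity, specificity, accuracy


def auc(case_scores, control_scores):
    """Area under the ROC curve via pairwise comparison of case and control scores."""
    wins = 0.0
    for case in case_scores:
        for control in control_scores:
            if case > control:
                wins += 1.0
            elif case == control:
                wins += 0.5
    return wins / (len(case_scores) * len(control_scores))


# Hypothetical counts: 100 cases (90 detected) and 100 controls (80 correctly excluded).
print(classification_metrics(tp=90, fn=10, tn=80, fp=20))  # (0.9, 0.8, 0.85)

# Hypothetical predicted probabilities for four cases and four controls.
print(auc([0.9, 0.8, 0.6, 0.4], [0.7, 0.3, 0.2, 0.1]))  # 0.875
```

Because accuracy pools cases and controls, it is a weighted average of sensitivity and specificity, with weights given by the share of cases and controls in the evaluated sample.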
Table 1. Basic characteristics of the study subjects

| Variable | Case group ① (n=553) | Control group ① (n=430) | χ²/Z value | P value |
|---|---|---|---|---|
| Age group/years | 51.03±7.01 | 41.13±9.32 | 266.136 | <0.001 |
| <45 | 76(13.74) | 273(63.49) | | |
| 45~<60 | 437(79.03) | 153(35.58) | | |
| ≥60 | 40(7.23) | 4(0.93) | | |
| BMI/(kg·m⁻²) | | | 52.864 | <0.001 |
| Underweight (<18.5) | 9(1.63) | 4(0.93) | | |
| Normal (18.5~<24.0) | 229(41.41) | 130(30.23) | | |
| Overweight (24.0~<28.0) | 279(50.45) | 202(46.98) | | |
| Obese (≥28.0) | 36(6.51) | 94(21.86) | | |
| Smoking | | | 42.914 | <0.001 |
| Yes | 236(42.68) | 274(63.72) | | |
| No | 317(57.32) | 156(36.28) | | |
| Drinking | | | 179.934 | <0.001 |
| Yes | 100(18.08) | 256(59.53) | | |
| No | 453(81.92) | 174(40.47) | | |
| Hypertension | | | 7.581 | 0.007 |
| Yes | 154(27.85) | 87(20.23) | | |
| No | 399(72.15) | 343(79.77) | | |
| Four blood lipid items/(mmol·L⁻¹) | | | | |
| TC | 4.63(4.12, 4.96) | 4.63(4.09, 5.20) | 1.113 | 0.266 |
| TG | 1.77(1.28, 2.03) | 1.53(1.02, 2.25) | -2.658 | 0.086 |
| LDL-C | 2.89(2.55, 3.16) | 2.89(2.49, 3.28) | 0.040 | 0.968 |
| HDL-C | 1.28(1.12, 1.39) | 1.19(1.04, 1.33) | -4.274 | <0.05 |
| Pulmonary function test/% | | | | |
| FVC | 90.00(73.05, 106.20) | 85.64(80.26, 94.30) | -2.104 | 0.035 |
| FEV1 | 93.00(77.15, 112.00) | 90.97(83.68, 98.70) | -1.908 | 0.056 |
| FEV1/FVC | 87.00(78.80, 89.98) | 105.67(94.17, 111.82) | 16.958 | <0.001 |
| Occupational exposure | | | | |
| Type of work | | | 68.472 | <0.001 |
| Tunneling worker | 96(17.36) | 84(19.53) | | |
| Coal miner | 187(33.81) | 124(28.84) | | |
| Support worker | 70(12.66) | 34(7.91) | | |
| Mixed worker | 152(27.49) | 72(16.74) | | |
| Auxiliary worker | 48(8.68) | 116(26.98) | | |
| Length of service/years | 21.57±9.22 | 10.19±8.02 | 278.241 | <0.001 |
| <10 | 70(12.66) | 255(59.30) | | |
| 10~<20 | 178(32.19) | 118(27.44) | | |
| 20~<30 | 173(31.28) | 40(9.30) | | |
| ≥30 | 132(23.87) | 17(3.95) | | |

Note: TC, total cholesterol; TG, triglycerides; LDL-C, low-density lipoprotein cholesterol; HDL-C, high-density lipoprotein cholesterol; FVC, forced vital capacity; FEV1, forced expiratory volume in 1 second; FEV1/FVC, forced expiratory volume in 1 second/forced vital capacity.
① Values are $\overline x \pm s$, M(P25, P75), or number of people (proportion/%).
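The χ² values for the categorical variables in Table 1 can be checked with a plain Pearson chi-square test. The sketch below is a minimal implementation without continuity correction. That the study used the uncorrected statistic is an assumption, although the smoking counts from Table 1 do reproduce the reported value of 42.914 under it. The function name is illustrative.

```python
def pearson_chi_square(table):
    """Pearson chi-square statistic for a contingency table given as a list of rows."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (observed - expected) ** 2 / expected
    return chi2


# Smoking in Table 1: rows are Yes/No, columns are case group/control group.
smoking = [[236, 274],
           [317, 156]]
print(round(pearson_chi_square(smoking), 3))  # 42.914
```

The same function accepts larger tables, for example the 5 x 2 table of type of work by group.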
Table 2. Importance of top 10 characteristic variables

| Rank | Variable | Importance |
|---|---|---|
| 1 | FEV1/FVC/% | 0.257 |
| 2 | Length of service/years | 0.189 |
| 3 | Type of work | 0.069 |
| 4 | Age/years | 0.066 |
| 5 | FEV1/% | 0.065 |
| 6 | FVC/% | 0.050 |
| 7 | TC/(mmol·L⁻¹) | 0.043 |
| 8 | HDL-C/(mmol·L⁻¹) | 0.043 |
| 9 | TG/(mmol·L⁻¹) | 0.043 |
| 10 | Drinking | 0.041 |

Note: FEV1/FVC, forced expiratory volume in 1 second/forced vital capacity; FEV1, forced expiratory volume in 1 second; FVC, forced vital capacity; TC, total cholesterol; HDL-C, high-density lipoprotein cholesterol; TG, triglycerides.
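Random forest importance scores such as those in Table 2 are often based on how much a split on a variable reduces Gini impurity. The text here does not state which importance measure the study used, so this choice is an assumption. As a sketch of the idea, the code below computes the impurity decrease for a single hypothetical split on length of service (<10 years versus ≥10 years), using the case and control counts from Table 1. The function names and the split point are illustrative.

```python
def gini(cases, controls):
    """Gini impurity of a node containing the given numbers of cases and controls."""
    p = cases / (cases + controls)
    return 2.0 * p * (1.0 - p)


def gini_decrease(left, right):
    """Impurity decrease when a parent node is split into two child nodes.

    Each child is a (cases, controls) tuple.
    """
    n_left, n_right = sum(left), sum(right)
    parent = gini(left[0] + right[0], left[1] + right[1])
    children = (n_left * gini(*left) + n_right * gini(*right)) / (n_left + n_right)
    return parent - children


# Counts from Table 1: length of service <10 years vs. >=10 years (three upper bands pooled).
under_10_years = (70, 255)
ten_years_or_more = (178 + 173 + 132, 118 + 40 + 17)
print(round(gini_decrease(under_10_years, ten_years_or_more), 3))
```

A forest averages such decreases over many trees and many splits, so the single-split value is not comparable in scale with the importances in Table 2.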
Table 3. Importance of top 10 characteristic variables

| Variable | Coefficient | Standard error (${s_{\overline x }}$) | Standardized coefficient | P value | Collinearity statistics: tolerance | Collinearity statistics: VIF |
|---|---|---|---|---|---|---|
| Constant | 1.421 | 0.152 | | 0 | | |
| FVC/% | 0.000 | 0.000 | 0.048 | 0.292 | 0.249 | 4.019 |
| FEV1/% | -0.001 | 0.000 | -0.100 | 0.029 | 0.250 | 4.003 |
| FEV1/FVC/% | 0.010 | 0.001 | 0.303 | <0.001 | 0.742 | 1.347 |
| Drinking | -0.188 | 0.027 | -0.182 | <0.001 | 0.789 | 1.267 |
| TC/(mmol·L⁻¹) | 0.024 | 0.013 | 0.045 | 0.060 | 0.900 | 1.111 |
| TG/(mmol·L⁻¹) | -0.014 | 0.010 | -0.033 | 0.166 | 0.917 | 1.090 |
| HDL-C/(mmol·L⁻¹) | -0.072 | 0.045 | -0.038 | 0.108 | 0.942 | 1.062 |
| Auxiliary worker | 0.023 | 0.008 | 0.065 | 0.005 | 0.961 | 1.040 |
| Length of service/years | -0.013 | 0.001 | -0.270 | <0.001 | 0.606 | 1.651 |
| Age/years | -0.009 | 0.002 | -0.171 | <0.001 | 0.588 | 1.700 |

Note: FVC, forced vital capacity; FEV1, forced expiratory volume in 1 second; FEV1/FVC, forced expiratory volume in 1 second/forced vital capacity; TC, total cholesterol; TG, triglycerides; HDL-C, high-density lipoprotein cholesterol; VIF, variance inflation factor.
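The two collinearity columns in Table 3 are tied together by the definition VIF = 1/tolerance, where tolerance is 1 - R² from regressing one predictor on all the others. The short check below recomputes the VIF from each reported tolerance. Small differences are expected because the tolerances are rounded to three decimals. The dictionary and function names are illustrative.

```python
# (tolerance, VIF) pairs as reported in Table 3.
TABLE3_COLLINEARITY = {
    "FVC": (0.249, 4.019),
    "FEV1": (0.250, 4.003),
    "FEV1/FVC": (0.742, 1.347),
    "Drinking": (0.789, 1.267),
    "TC": (0.900, 1.111),
    "TG": (0.917, 1.090),
    "HDL-C": (0.942, 1.062),
    "Auxiliary worker": (0.961, 1.040),
    "Length of service": (0.606, 1.651),
    "Age": (0.588, 1.700),
}


def vif_from_tolerance(tolerance):
    """Variance inflation factor is the reciprocal of tolerance."""
    return 1.0 / tolerance


for name, (tolerance, reported_vif) in TABLE3_COLLINEARITY.items():
    print(f"{name}: recomputed {vif_from_tolerance(tolerance):.3f}, reported {reported_vif:.3f}")
```

All reported VIF values are below 5. Under the common rule of thumb that a VIF above 5 or 10 signals problematic multicollinearity, none of the predictors would be flagged; that threshold is a convention and is not stated in the text here.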