Risk prediction of human lung ventilation dysfunction in coal miners based on machine learning
Abstract:
Objective To explore the factors influencing pulmonary ventilation dysfunction in coal miners and to build a risk prediction model with machine learning algorithms, providing a scientific basis for the early identification of high-risk individuals and the protection of miners' health.
Methods A total of 679 miners who underwent occupational health examinations at a coal mine in northern Shaanxi between April 20 and May 3, 2021 were enrolled. Variables were selected using unconditional multivariate logistic regression analysis and the Spearman correlation test. Logistic regression (LR), random forest (RF), support vector machine (SVM), and extreme gradient boosting (XGBoost) models were then built, and their performance was evaluated by accuracy, sensitivity, specificity, positive predictive value, negative predictive value, F1 score, and area under the receiver operating characteristic (ROC) curve.
Results The accuracies of the LR, RF, SVM, and XGBoost models were 69.61%, 70.59%, 72.06%, and 75.49%, respectively. The sensitivities were 61.22%, 58.16%, 60.20%, and 64.29%, respectively. The specificities were 77.36%, 82.08%, 83.02%, and 85.85%, respectively. The positive predictive values were 71.42%, 75.00%, 76.62%, and 80.77%, respectively. The negative predictive values were 68.33%, 67.97%, 69.29%, and 72.22%, respectively. The F1 scores were 0.66, 0.66, 0.67, and 0.72, respectively. The areas under the ROC curve were 0.78, 0.78, 0.78, and 0.81, respectively. The XGBoost model outperformed the other three models on every reported metric.
Conclusions The XGBoost model can be used to predict the risk of pulmonary ventilation dysfunction in coal miners and provides a theoretical basis for their health management.
Key words:
- Coal miners
- Dysfunction of pulmonary ventilation
- XGBoost
- Predictive models
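Table 1. Model evaluation indicators

| Indicator | Meaning |
| --- | --- |
| Accuracy | Share of all samples that were predicted correctly |
| Sensitivity | Percentage of participants who actually have pulmonary ventilation dysfunction and were correctly identified |
| Specificity | Percentage of participants who actually do not have pulmonary ventilation dysfunction and were correctly identified |
| Positive predictive value | Percentage of participants predicted to have pulmonary ventilation dysfunction who actually have it |
| Negative predictive value | Percentage of participants predicted not to have pulmonary ventilation dysfunction who actually do not have it |
| F1 score | Harmonic mean of precision and recall; an overall measure of model performance |
| Area under curve (AUC) | Area under the receiver operating characteristic (ROC) curve |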
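The indicators in Table 1 can be written in terms of confusion-matrix counts. The following is a sketch of the standard definitions; the symbols TP, FP, TN, and FN (true/false positives and negatives) are notation assumed here and do not appear in the source.

```latex
\begin{align*}
\text{Accuracy}    &= \frac{TP + TN}{TP + FP + TN + FN} \\
\text{Sensitivity} &= \frac{TP}{TP + FN}, \qquad
\text{Specificity}  = \frac{TN}{TN + FP} \\
\text{PPV}         &= \frac{TP}{TP + FP}, \qquad
\text{NPV}          = \frac{TN}{TN + FN} \\
F_1                &= \frac{2 \cdot \text{PPV} \cdot \text{Sensitivity}}{\text{PPV} + \text{Sensitivity}}
                    = \frac{2\,TP}{2\,TP + FP + FN}
\end{align*}
```

Here PPV plays the role of precision and sensitivity the role of recall in the F1 score.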
Table 2. Distribution characteristics of influencing factors related to pulmonary ventilation dysfunction in coal miners
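Values are number of participants (%) for categorical variables and M (P25, P75) for age and BMI.

| Variable | Normal | Pulmonary ventilation dysfunction | P value |
| --- | --- | --- | --- |
| Age/years | 33 (30, 37) | 36 (32, 44) | <0.001 |
| BMI/(kg·m⁻²) | 25.26 (22.83, 27.05) | 24.22 (22.39, 26.26) | 0.001 |
| Education level | | | 0.001 |
| Junior high school or below | 4 (1.32) | 22 (6.03) | |
| Senior high school | 124 (40.25) | 176 (47.35) | |
| University or above | 180 (58.43) | 173 (46.62) | |
| Marital status | | | 0.001 |
| Unmarried | 60 (19.54) | 39 (10.51) | |
| Married | 248 (80.46) | 332 (89.49) | |
| Average annual household income/10 000 yuan | | | 0.482 |
| <10 | 77 (26.76) | 108 (27.48) | |
| 10~<15 | 156 (54.51) | 230 (58.71) | |
| ≥15 | 54 (18.73) | 54 (13.81) | |
| Smoking | | | <0.001 |
| Smoker | 148 (48.13) | 260 (70.12) | |
| Non-smoker | 160 (51.87) | 111 (29.88) | |
| Alcohol consumption | | | 0.457 |
| Drinker | 193 (62.69) | 221 (59.56) | |
| Non-drinker | 115 (37.31) | 150 (40.44) | |
| Exercise | | | 0.636 |
| Exercises | 194 (63.00) | 226 (60.86) | |
| Does not exercise | 114 (37.00) | 145 (39.14) | |
| Sleep duration/h | | | <0.001 |
| <8 | 179 (58.11) | 163 (43.87) | |
| ≥8 | 129 (41.89) | 208 (56.13) | |
| Shift work | | | 0.025 |
| No shift work | 134 (43.53) | 124 (33.37) | |
| Current shift work | 156 (50.66) | 224 (60.41) | |
| Former shift work | 18 (5.81) | 23 (6.22) | |
| Dust exposure duration/years | | | <0.001 |
| ≤5 | 115 (37.25) | 74 (20.02) | |
| >5~15 | 148 (48.14) | 180 (48.46) | |
| >15 | 45 (14.61) | 117 (31.52) | |
| Coal dust concentration/(mg·m⁻³) | | | <0.001 |
| <15 | 148 (48.13) | 72 (19.35) | |
| ≥15~30 | 99 (32.05) | 163 (43.91) | |
| >30 | 61 (19.82) | 136 (36.74) | |
| CO concentration/(mg·m⁻³) | | | <0.001 |
| ≤14.00 | 114 (37.01) | 99 (26.73) | |
| >14.00~27.50 | 136 (44.16) | 111 (29.91) | |
| ≥27.50 | 58 (18.83) | 161 (43.36) | |
| CO₂ concentration/(mg·m⁻³) | | | <0.001 |
| <9 750 | 115 (37.32) | 104 (28.01) | |
| 9 750~≤15 700 | 129 (41.93) | 99 (26.67) | |
| >15 700 | 64 (20.75) | 168 (45.32) | |
| NO concentration/(mg·m⁻³) | | | <0.001 |
| ≤0.10 | 116 (37.74) | 106 (28.63) | |
| >0.10~0.20 | 144 (46.65) | 134 (36.12) | |
| >0.20 | 48 (15.61) | 131 (35.25) | |
| NO₂ concentration/(mg·m⁻³) | | | <0.001 |
| <0.18 | 109 (35.37) | 100 (27.02) | |
| 0.18~<0.33 | 129 (41.91) | 108 (29.13) | |
| ≥0.33 | 70 (22.72) | 163 (43.85) | |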
Table 3. Non-conditional multivariate logistic regression analysis of factors influencing pulmonary ventilation dysfunction
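| Variable | β | SE | Wald χ² | OR | 95% CI lower | 95% CI upper | P value |
| --- | --- | --- | --- | --- | --- | --- | --- |
| Age/years | | | | | | | |
| ≤35 | | | | 1.00 | | | |
| >35 | 0.79 | 0.26 | 9.34 | 2.19 | 1.33 | 3.66 | 0.002 |
| BMI/(kg·m⁻²) | | | | | | | |
| <18.50 | | | | 1.00 | | | |
| ≥18.50 | -0.64 | 0.19 | 11.52 | 0.53 | 0.36 | 0.76 | 0.001 |
| Smoking | | | | | | | |
| No | | | | 1.00 | | | |
| Yes | 0.63 | 0.20 | 10.01 | 1.88 | 1.27 | 2.78 | 0.002 |
| Shift work | | | | | | | |
| No shift work | | | | 1.00 | | | |
| Current shift work | 0.64 | 0.20 | 10.01 | 1.90 | 1.28 | 2.83 | 0.002 |
| Former shift work | 0.57 | 0.43 | 1.76 | 1.76 | 0.77 | 4.13 | 0.184 |
| Coal dust concentration/(mg·m⁻³) | | | | | | | |
| <15 | | | | 1.00 | | | |
| >15~30 | 0.82 | 0.39 | 4.37 | 2.27 | 1.06 | 4.96 | 0.037 |
| >30 | 2.14 | 0.32 | 45.02 | 8.50 | 4.64 | 16.28 | <0.001 |
| NO concentration/(mg·m⁻³) | | | | | | | |
| ≤0.10 | | | | 1.00 | | | |
| >0.10~0.20 | -1.31 | 0.41 | 10.46 | 0.27 | 0.12 | 0.59 | 0.001 |
| >0.20 | -0.42 | 0.46 | 0.84 | 1.66 | 0.27 | 1.60 | 0.360 |
| Dust exposure duration/years | | | | | | | |
| ≤5 | | | | 1.00 | | | |
| >5~15 | 1.10 | 0.30 | 13.14 | 3.00 | 1.67 | 5.49 | <0.001 |
| >15 | 1.07 | 0.43 | 6.20 | 2.93 | 1.26 | 6.85 | 0.013 |
| Sleep duration/h | | | | | | | |
| <8 | | | | 1.00 | | | |
| ≥8 | 0.63 | 0.19 | 11.16 | 1.88 | 1.30 | 2.73 | 0.001 |

Rows with OR 1.00 and no other values are the reference categories.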
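The columns of Table 3 are linked by the usual logistic-regression relations: OR = exp(β), the 95% CI is exp(β ± 1.96·SE), and the Wald statistic is (β/SE)². The sketch below recomputes these from the rounded β and SE values in the table, so small differences from the published figures are expected. The function names are illustrative and not from the source. For the NO >0.20 row, exp(−0.42) is about 0.66, which lies inside the listed interval of 0.27 to 1.60, whereas the tabulated OR of 1.66 does not; this may be a typographical slip in the source.

```python
import math

def odds_ratio_ci(beta, se, z=1.96):
    """Return (OR, lower, upper) for a logistic-regression coefficient.

    Uses the normal approximation: OR = exp(beta), CI = exp(beta -/+ z * se).
    """
    return math.exp(beta), math.exp(beta - z * se), math.exp(beta + z * se)

def wald_chi2(beta, se):
    """Wald chi-square statistic for a single coefficient."""
    return (beta / se) ** 2

# (beta, SE) pairs copied from Table 3; the keys are descriptive labels only.
rows = {
    "age >35": (0.79, 0.26),
    "coal dust >30": (2.14, 0.32),
    "NO >0.20": (-0.42, 0.46),
}

for name, (beta, se) in rows.items():
    or_, lower, upper = odds_ratio_ci(beta, se)
    print(f"{name}: OR={or_:.2f}, 95% CI=({lower:.2f}, {upper:.2f}), "
          f"Wald={wald_chi2(beta, se):.2f}")
```

Because the inputs are rounded to two decimals, the recomputed values should be read as a consistency check rather than a replacement for the published estimates.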
Table 4. Sample classification results for four models
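Actual values are number of participants (%).

| Model | Predicted | Actual: normal | Actual: pulmonary ventilation dysfunction | Total |
| --- | --- | --- | --- | --- |
| LR | Normal | 60 (61.22) | 24 (22.64) | 84 |
| LR | Pulmonary ventilation dysfunction | 38 (38.78) | 82 (77.36) | 120 |
| RF | Normal | 57 (58.16) | 19 (17.92) | 76 |
| RF | Pulmonary ventilation dysfunction | 41 (41.84) | 87 (82.08) | 128 |
| SVM | Normal | 59 (60.20) | 18 (16.98) | 77 |
| SVM | Pulmonary ventilation dysfunction | 39 (39.80) | 88 (83.02) | 127 |
| XGBoost | Normal | 63 (64.29) | 15 (14.15) | 78 |
| XGBoost | Pulmonary ventilation dysfunction | 35 (35.71) | 91 (85.85) | 126 |

Note: 1. LR: logistic regression. 2. RF: random forest. 3. SVM: support vector machine. 4. XGBoost: extreme gradient boosting.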
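The reported performance metrics can be recomputed from the counts in Table 4. The sketch below assumes that the first row and first column of each confusion matrix are treated as the positive class. This is an inference from the numbers (the published sensitivities and specificities are reproduced under this reading), not something the source states. The names `TABLE4` and `metrics` are illustrative.

```python
# Counts from Table 4, in the order:
# (first row & first column, first row & second column,
#  second row & first column, second row & second column)
# mapped to (tp, fp, fn, tn) under the assumption described above.
TABLE4 = {
    "LR": (60, 24, 38, 82),
    "RF": (57, 19, 41, 87),
    "SVM": (59, 18, 39, 88),
    "XGBoost": (63, 15, 35, 91),
}

def metrics(tp, fp, fn, tn):
    """Compute the Table 1 indicators (percentages, F1 as a fraction)."""
    total = tp + fp + fn + tn
    return {
        "accuracy": 100 * (tp + tn) / total,
        "sensitivity": 100 * tp / (tp + fn),
        "specificity": 100 * tn / (tn + fp),
        "ppv": 100 * tp / (tp + fp),
        "npv": 100 * tn / (tn + fn),
        "f1": 2 * tp / (2 * tp + fp + fn),
    }

results = {name: metrics(*counts) for name, counts in TABLE4.items()}

for name, m in results.items():
    line = ", ".join(f"{key}={value:.2f}" for key, value in m.items())
    print(f"{name}: {line}")
```

Under this reading, the recomputed values agree with the published ones to two decimals, except that the LR positive predictive value comes out as 60/84 ≈ 71.43% against the reported 71.42%, a last-digit rounding difference.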
Table 5. Comparison of the predictive performance of the four models
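| Evaluation indicator | LR | RF | SVM | XGBoost |
| --- | --- | --- | --- | --- |
| Accuracy/% | 69.61 | 70.59 | 72.06 | 75.49 |
| Sensitivity/% | 61.22 | 58.16 | 60.20 | 64.29 |
| Specificity/% | 77.36 | 82.08 | 83.02 | 85.85 |
| Positive predictive value/% | 71.42 | 75.00 | 76.62 | 80.77 |
| Negative predictive value/% | 68.33 | 67.97 | 69.29 | 72.22 |
| F1 score | 0.66 | 0.66 | 0.67 | 0.72 |
| AUC (95% CI) | 0.78 (0.72~0.85) | 0.78 (0.71~0.84) | 0.78 (0.71~0.84) | 0.81 (0.75~0.87) |

Note: 1. LR: logistic regression. 2. RF: random forest. 3. SVM: support vector machine. 4. XGBoost: extreme gradient boosting. 5. AUC: area under the curve.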
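Unlike the other indicators, the AUC depends on the predicted scores rather than on a single confusion matrix, so it cannot be recomputed from Table 4. As a minimal sketch of what the quantity measures, the function below computes the AUC as the fraction of positive-negative pairs in which the positive sample receives the higher score (ties count one half). The labels and scores are hypothetical toy data, and the source does not say how its AUC values or confidence intervals were obtained.

```python
def auc_score(labels, scores):
    """AUC as the probability that a random positive outranks a random negative.

    labels: iterable of 0/1 class labels; scores: predicted scores, same length.
    Tied pairs contribute 0.5.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative sample")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Hypothetical toy data, not taken from the study.
labels = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.7, 0.4, 0.6, 0.3, 0.2]
print(f"AUC = {auc_score(labels, scores):.3f}")  # 8 of 9 pairs are ordered correctly
```

This pairwise form is quadratic in the sample size; it is meant to show the definition, not to be an efficient implementation.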