Prediction of 3-month glycemic control in type 2 diabetes mellitus based on machine learning algorithms
Abstract:
Objective To evaluate how well the logistic regression and random forest algorithms predict glycemic control in patients with type 2 diabetes mellitus (T2DM) after 3 months, and to explore the factors that influence glycemic control.
Methods Data were extracted from the baseline survey and follow-up records of patients with T2DM in Shunyi and Tongzhou Districts. Whether a patient's glycated hemoglobin (HbA1c) exceeded 6.5% at 3 months was used as the binary outcome variable. Prediction models were built with the random forest algorithm and the logistic regression algorithm, and their performance was compared using the area under the receiver operating characteristic curve (AUC), sensitivity, specificity, and accuracy.
Results Seven factors were associated with glycemic control: baseline fasting plasma glucose (P < 0.001), disease duration (P < 0.001), smoking (P = 0.026), sedentary time (P = 0.006), body mass index (overweight P = 0.002, obesity P = 0.011), wristband use (P = 0.028), and diabetic diet (P = 0.002). The logistic regression model had an AUC of 0.738, a sensitivity of 72.9%, a specificity of 68.1%, and an accuracy of 71.2%. The random forest model had an AUC of 0.756, a sensitivity of 74.5%, a specificity of 69.5%, and an accuracy of 72.8%.
Conclusions The random forest model performed better than the logistic regression model on every reported metric (for example, AUC 0.756 vs. 0.738). It can be applied to predicting glycemic control and may assist the management of patients with diabetes.
Keywords:
- Type 2 diabetes mellitus
- Classification prediction
- Random forest algorithm
- Logistic regression algorithm
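The AUC used to compare the two models can be read as the probability that a randomly chosen patient who missed the target receives a higher predicted risk than a randomly chosen patient who met it. The following is a minimal sketch in plain Python, assuming the HbA1c > 6.5% outcome coding described in the Methods. The function names, HbA1c values, and predicted risks are hypothetical illustrations, not study data or the authors' code.

```python
def label_outcome(hba1c_percent, threshold=6.5):
    """Return 1 (target not met) if 3-month HbA1c exceeds the threshold, else 0."""
    return 1 if hba1c_percent > threshold else 0


def auc(labels, scores):
    """AUC as the share of (positive, negative) pairs ranked correctly; ties count 0.5."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical patients: 3-month HbA1c (%) and a model's predicted risk.
hba1c = [6.1, 7.4, 6.5, 8.0, 6.9, 5.8]
risk = [0.20, 0.80, 0.55, 0.90, 0.40, 0.30]

labels = [label_outcome(v) for v in hba1c]
model_auc = auc(labels, risk)
print(labels, round(model_auc, 3))
```

The pairwise form is quadratic in the number of patients, which is acceptable at the scale of a test set of 191 but would normally be replaced by a rank-based computation for larger data.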
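Table 1. Basic characteristics of T2DM patients

| Variable | Glycemic target met [n (%)] | Glycemic target not met [n (%)] | χ² | P |
|---|---|---|---|---|
| Age (years) | | | 0.717 | 0.397 |
| ≤60 | 97 (42.4) | 187 (45.8) | | |
| >60 | 132 (57.6) | 221 (54.2) | | |
| Sex | | | 0.490 | 0.484 |
| Male | 111 (48.5) | 186 (45.6) | | |
| Female | 118 (51.5) | 222 (54.4) | | |
| Disease duration (years) | | | 16.986 | <0.001 |
| ≤5 | 145 (63.3) | 189 (46.3) | | |
| >5 | 84 (36.7) | 219 (53.7) | | |
| Education | | | 6.972 | 0.031 |
| Junior high school or below | 124 (54.1) | 231 (56.6) | | |
| Senior high or technical secondary school | 52 (22.7) | 115 (28.2) | | |
| College or above | 53 (23.1) | 62 (15.2) | | |
| Monthly household income per capita (yuan) | | | 1.534 | 0.216 |
| ≤3000 | 59 (25.8) | 124 (30.4) | | |
| >3000 | 170 (74.2) | 284 (69.6) | | |
| Smoking status | | | 48.904 | <0.001 |
| Current smoker | 37 (16.2) | 112 (27.5) | | |
| Former smoker | 47 (20.5) | 49 (12.0) | | |
| Never smoked | 145 (63.3) | 247 (60.5) | | |
| Alcohol use | | | 2.784 | 0.249 |
| Current drinker | 63 (27.5) | 117 (28.7) | | |
| Former drinker | 23 (10.0) | 26 (6.4) | | |
| Never drank | 143 (62.5) | 265 (64.9) | | |
| BMI (kg/m²) | | | 7.126 | 0.028 |
| ≤23.9 | 68 (29.7) | 84 (20.6) | | |
| 24.0-27.9 | 100 (43.7) | 191 (46.8) | | |
| ≥28.0 | 61 (26.6) | 133 (32.6) | | |
| Central obesity | | | 1.566 | 0.211 |
| No | 138 (60.3) | 225 (55.1) | | |
| Yes | 91 (39.7) | 183 (44.9) | | |
| Diabetic diet | | | 5.927 | 0.015 |
| No | 131 (57.2) | 191 (46.8) | | |
| Yes | 98 (42.8) | 217 (53.2) | | |
| Self-monitoring of blood glucose | | | 2.598 | 0.107 |
| No | 86 (37.6) | 180 (44.1) | | |
| Yes | 143 (62.4) | 228 (55.9) | | |
| Consistent wristband use | | | 5.582 | 0.018 |
| No | 161 (70.3) | 321 (78.7) | | |
| Yes | 68 (29.7) | 87 (21.3) | | |
| Daily sedentary time (h) | | | 5.210 | 0.022 |
| <8 | 185 (80.8) | 357 (87.5) | | |
| ≥8 | 44 (19.2) | 51 (12.5) | | |
| Family history of diabetes | | | 0.577 | 0.447 |
| No | 111 (48.5) | 185 (45.3) | | |
| Yes | 118 (51.5) | 223 (54.7) | | |
| Exercise | | | 0.152 | 0.697 |
| No | 156 (68.1) | 284 (69.6) | | |
| Yes | 73 (31.9) | 124 (30.4) | | |
| Number of complications | | | 1.884 | 0.597 |
| 0 | 84 (36.7) | 159 (39.0) | | |
| 1 | 74 (32.3) | 131 (32.1) | | |
| 2 | 37 (16.2) | 72 (17.6) | | |
| ≥3 | 34 (14.8) | 46 (11.3) | | |
| Hypertension | | | 7.363 | 0.007 |
| No | 39 (17.0) | 108 (26.5) | | |
| Yes | 190 (83.0) | 300 (73.5) | | |
| Drug therapy | | | 3.825 | 0.051 |
| No | 35 (15.3) | 41 (10.0) | | |
| Yes | 194 (84.7) | 367 (90.0) | | |
| Insulin therapy | | | 7.202 | 0.007 |
| No | 214 (93.4) | 353 (86.5) | | |
| Yes | 15 (6.6) | 55 (13.5) | | |
| Baseline fasting plasma glucose (mmol/L) | | | 57.180 | <0.001 |
| ≤6.1 | 60 (26.2) | 35 (8.6) | | |
| 6.2-6.9 | 117 (51.1) | 185 (45.3) | | |
| 7.0-8.4 | 41 (17.9) | 114 (27.9) | | |
| >8.4 | 11 (4.8) | 74 (18.1) | | |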
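The χ² values in Table 1 come from comparing category counts between the two outcome groups. As a minimal sketch, the disease-duration row (145 and 84 patients who met the target, 189 and 219 who did not) can be checked with a Pearson chi-square for a 2×2 table. The function names are illustrative, and the absence of a continuity correction is an assumption that happens to reproduce the reported value.

```python
import math


def chi_square_2x2(a, b, c, d):
    """Pearson chi-square (no continuity correction) for the table [[a, b], [c, d]]."""
    n = a + b + c + d
    numerator = n * (a * d - b * c) ** 2
    denominator = (a + b) * (c + d) * (a + c) * (b + d)
    return numerator / denominator


def p_value_df1(chi2):
    """Upper-tail probability of a chi-square statistic with 1 degree of freedom."""
    return math.erfc(math.sqrt(chi2 / 2.0))


# Rows: duration <=5 years, >5 years. Columns: target met, target not met.
chi2 = chi_square_2x2(145, 189, 84, 219)
p = p_value_df1(chi2)
print(round(chi2, 3), p < 0.001)  # about 16.986, in line with Table 1
```

The closed-form p-value only holds for one degree of freedom; rows with three or four categories would need the general chi-square distribution.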
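Table 2. Multivariable logistic regression analysis of factors associated with glycemic control in patients with diabetes

| Variable | β | Wald χ² | P | OR (95% CI) |
|---|---|---|---|---|
| Baseline fasting plasma glucose (mmol/L) | 0.691 | 9.940 | <0.001 | 2.00 (1.53-2.61) |
| Disease duration (years) | 0.113 | 93.731 | <0.001 | 1.12 (1.05-1.20) |
| Former smoker / current smoker | -0.794 | 6.273 | 0.026 | 0.45 (0.23-0.91) |
| Sedentary time (h) | 0.827 | 9.058 | 0.006 | 2.29 (1.26-4.13) |
| Diabetic diet | -0.685 | 13.777 | 0.002 | 0.50 (0.33-0.78) |
| Overweight (kg/m²) | 0.855 | 11.746 | 0.002 | 2.35 (1.39-3.99) |
| Obesity (kg/m²) | 0.752 | 8.646 | 0.011 | 2.12 (1.19-3.78) |
| Consistent wristband use | -0.546 | 8.818 | 0.028 | 0.58 (0.36-0.94) |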
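Each odds ratio in Table 2 is the exponential of its coefficient β, so values above 1 indicate higher odds of missing the glycemic target and values below 1 indicate lower odds. A minimal sketch of that relationship follows. The helper names are hypothetical, and the standard-error calculation assumes the reported interval is a symmetric Wald-type 95% CI on the log-odds scale, which the source does not state.

```python
import math


def odds_ratio(beta):
    """Odds ratio implied by a logistic regression coefficient."""
    return math.exp(beta)


def se_from_ci(lower, upper, z=1.96):
    """Approximate standard error of beta from a 95% CI given on the OR scale."""
    return (math.log(upper) - math.log(lower)) / (2.0 * z)


# Coefficients as reported in Table 2 (three rows shown for brevity).
betas = {
    "baseline FPG": 0.691,
    "diabetic diet": -0.685,
    "wristband use": -0.546,
}

ors = {name: round(odds_ratio(b), 2) for name, b in betas.items()}
print(ors)

# Standard error implied by the baseline FPG interval 1.53-2.61 (assumption above).
se_fpg = se_from_ci(1.53, 2.61)
print(round(se_fpg, 3))
```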
Table 3. Classification results of the logistic regression prediction model on the test set

| Predicted | Actual: target not met | Actual: target met | Total |
|---|---|---|---|
| Target not met | 89 | 22 | 111 |
| Target met | 33 | 47 | 80 |
| Total | 122 | 69 | 191 |
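The sensitivity, specificity, and accuracy quoted for the logistic regression model follow directly from the counts in Table 3. The sketch below assumes "target not met" is the positive class, which is the reading consistent with the reported 72.9% sensitivity; the function name is illustrative.

```python
def classification_metrics(tp, fn, fp, tn):
    """Sensitivity, specificity, and accuracy from the four confusion-matrix cells."""
    total = tp + fn + fp + tn
    return {
        "sensitivity": tp / (tp + fn),  # positives correctly flagged
        "specificity": tn / (tn + fp),  # negatives correctly cleared
        "accuracy": (tp + tn) / total,
    }


# Table 3 cells, with "target not met" as the positive class.
logit_metrics = classification_metrics(tp=89, fn=33, fp=22, tn=47)
for name, value in logit_metrics.items():
    print(f"{name}: {value:.3f}")
```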
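Table 4. Classification results of the random forest prediction model on the test set

| Predicted | Actual: target not met | Actual: target met | Total |
|---|---|---|---|
| Target not met | 91 | 21 | 112 |
| Target met | 31 | 48 | 79 |
| Total | 122 | 69 | 191 |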
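With only 191 test patients, the gap between the two models' accuracies (136 versus 139 correct predictions) is small relative to sampling uncertainty. As an added illustration that is not part of the reported analysis, the sketch below computes a Wilson 95% interval for each accuracy. The function name is hypothetical, and overlapping intervals are not a formal test for two classifiers evaluated on the same patients.

```python
import math


def wilson_interval(successes, n, z=1.96):
    """Wilson score interval for a binomial proportion."""
    p = successes / n
    denom = 1.0 + z * z / n
    centre = (p + z * z / (2.0 * n)) / denom
    half = z * math.sqrt(p * (1.0 - p) / n + z * z / (4.0 * n * n)) / denom
    return centre - half, centre + half


# Correct predictions are the diagonal cells of each confusion matrix.
logit_ci = wilson_interval(89 + 47, 191)  # logistic regression
rf_ci = wilson_interval(91 + 48, 191)     # random forest

print("logistic regression:", [round(x, 3) for x in logit_ci])
print("random forest:", [round(x, 3) for x in rf_ci])
print("intervals overlap:", rf_ci[0] < logit_ci[1])
```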