Comparing performance of C5.0 decision tree and radial basis function neural network for predicting hemorrhagic transformation in patients with acute ischemic stroke
Abstract:
Objective To compare the performance of a C5.0 decision tree model and a radial basis function (RBF) neural network in predicting the risk of hemorrhagic transformation (HT) in patients with acute ischemic stroke (AIS).
Methods Patients hospitalized with AIS were enrolled, and their case information was collected retrospectively. Patients were divided into an HT group and a non-HT group according to whether HT occurred within 2 weeks after admission. A C5.0 decision tree model and an RBF neural network model were established, and their prediction performance was compared.
Results A total of 460 case records were collected and split into a training set (314 samples) and a test set (146 samples) at a ratio of approximately 7:3. For the C5.0 decision tree model, accuracy in the training and test sets was 96.5% and 80.1%, sensitivity 98.1% and 82.6%, specificity 94.8% and 77.9%, Kappa 0.93 and 0.60, and AUC 0.97 and 0.80, respectively. For the RBF neural network model, the corresponding values were 72.6% and 74.7% (accuracy), 87.6% and 88.4% (sensitivity), 56.9% and 62.3% (specificity), 0.45 and 0.50 (Kappa), and 0.72 and 0.75 (AUC). In the training set, the C5.0 decision tree model outperformed the RBF neural network model; in the test set, the difference in prediction performance between the two models was not statistically significant.
Conclusion The C5.0 decision tree model performed better than the RBF neural network model in predicting HT risk.
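C5.0 belongs to the C4.5 family of decision trees, which are generally described as choosing the attribute to split on at each node by its information gain ratio: the information gain of the split divided by the entropy of the split itself, which penalises attributes with many values. The sketch below computes that criterion for one categorical predictor. It is an illustration only: `entropy` and `gain_ratio` are hypothetical helper names, and the sketch leaves out what C5.0 adds on top (thresholds for continuous attributes, missing-value handling, pruning, boosting), so it is not the software the authors used.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of a sequence of class labels."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())


def gain_ratio(feature, labels):
    """Information gain ratio of splitting `labels` on a categorical `feature`.

    gain       = H(labels) - sum_v |S_v|/|S| * H(labels in S_v)
    split_info = -sum_v |S_v|/|S| * log2(|S_v|/|S|)
    """
    n = len(labels)
    groups = {}
    for value, label in zip(feature, labels):
        groups.setdefault(value, []).append(label)
    weights = [len(g) / n for g in groups.values()]
    conditional = sum(w * entropy(g) for w, g in zip(weights, groups.values()))
    gain = entropy(labels) - conditional
    split_info = -sum(w * log2(w) for w in weights)
    # A split with a single branch carries no information
    return gain / split_info if split_info > 0 else 0.0


# Example rebuilt from the Table 1 hypertension counts (whole cohort;
# a real tree would be grown on the training set only)
labels = ["non-HT"] * 230 + ["HT"] * 230
hypertension = ["no"] * 140 + ["yes"] * 90 + ["no"] * 100 + ["yes"] * 130
print(round(gain_ratio(hypertension, labels), 4))
```

A single binary predictor such as hypertension yields only a small gain ratio here (roughly 0.02), which is why a tree combines several splits rather than relying on one variable.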
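Table 1. Univariate analysis of hemorrhagic transformation in patients with acute ischemic stroke (categorical variables)
Values are n (%). HT, hemorrhagic transformation.

| Variable | Non-HT group (n=230) | HT group (n=230) | χ² | P |
|---|---|---|---|---|
| Sex | | | 0.039 | 0.844 |
| Male | 152 (66.1) | 150 (65.2) | | |
| Female | 78 (33.9) | 80 (34.8) | | |
| Occupation | | | 1.887 | 0.389 |
| Farmer | 96 (41.7) | 82 (35.7) | | |
| Employee | 91 (39.6) | 98 (42.6) | | |
| Other | 43 (18.7) | 50 (21.7) | | |
| Education | | | 1.839 | 0.399 |
| Junior high school or below | 108 (47.0) | 95 (41.3) | | |
| Senior high school | 92 (40.0) | 106 (46.1) | | |
| Bachelor's degree or above | 30 (13.0) | 29 (12.6) | | |
| Smoking history | | | 1.589 | 0.208 |
| No | 140 (60.9) | 153 (66.5) | | |
| Yes | 90 (39.1) | 77 (33.5) | | |
| Drinking history | | | 1.821 | 0.177 |
| No | 159 (69.1) | 172 (74.8) | | |
| Yes | 71 (30.9) | 58 (25.2) | | |
| Hypertension | | | 13.939 | <0.001 |
| No | 140 (60.9) | 100 (43.5) | | |
| Yes | 90 (39.1) | 130 (56.5) | | |
| Diabetes mellitus | | | 11.337 | 0.001 |
| No | 194 (84.3) | 164 (71.3) | | |
| Yes | 36 (15.7) | 66 (28.7) | | |
| Atrial fibrillation | | | 7.457 | 0.006 |
| No | 209 (90.9) | 189 (82.2) | | |
| Yes | 21 (9.1) | 41 (17.8) | | |
| History of cerebral infarction | | | 7.578 | 0.006 |
| No | 175 (76.1) | 148 (64.3) | | |
| Yes | 55 (23.9) | 82 (35.7) | | |
| History of cerebral hemorrhage | | | 0.069 | 0.793 |
| No | 222 (96.5) | 223 (97.0) | | |
| Yes | 8 (3.5) | 7 (3.0) | | |
| History of anticoagulant use | | | 0.620 | 0.431 |
| No | 224 (97.4) | 221 (96.1) | | |
| Yes | 6 (2.6) | 9 (3.9) | | |
| History of antiplatelet drug use | | | 2.353 | 0.125 |
| No | 206 (89.6) | 195 (84.8) | | |
| Yes | 24 (10.4) | 35 (15.2) | | |
| Large-area cerebral infarction | | | 6.922 | 0.009 |
| No | 198 (86.1) | 176 (76.5) | | |
| Yes | 32 (13.9) | 54 (23.5) | | |
| Leukoaraiosis | | | 4.005 | 0.045 |
| No | 213 (92.6) | 200 (87.0) | | |
| Yes | 17 (7.4) | 30 (13.0) | | |
| Early hypodensity on CT | | | 8.175 | 0.004 |
| No | 183 (79.6) | 156 (67.8) | | |
| Yes | 47 (20.4) | 74 (32.2) | | |
| Thrombolytic therapy | | | 11.474 | 0.001 |
| No | 221 (96.1) | 201 (87.4) | | |
| Yes | 9 (3.9) | 29 (12.6) | | |
| Anticoagulant therapy | | | 9.098 | 0.003 |
| No | 173 (75.2) | 143 (62.2) | | |
| Yes | 57 (24.8) | 87 (37.8) | | |
| Antiplatelet therapy | | | 0.514 | 0.474 |
| No | 64 (27.8) | 71 (30.9) | | |
| Yes | 166 (72.2) | 159 (69.1) | | |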
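The χ² values in Table 1 appear consistent with Pearson's χ² test without continuity correction: recomputing from the counts gives 13.939 for hypertension and 1.887 for occupation, matching the table. A minimal sketch of that calculation follows; the function names are hypothetical, and the P-value helper only covers df = 1 and df = 2, the two cases that occur in Table 1 and that have closed forms.

```python
import math


def pearson_chi2(table):
    """Pearson chi-square statistic (no continuity correction) for an r x c table.

    `table` is a list of rows; each row holds the counts of one category
    in the non-HT and HT groups.
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (observed - expected) ** 2 / expected
    df = (len(row_totals) - 1) * (len(col_totals) - 1)
    return chi2, df


def chi2_upper_p(chi2, df):
    """Upper-tail P value; only df = 1 and df = 2 have simple closed forms."""
    if df == 1:
        return math.erfc(math.sqrt(chi2 / 2))  # chi2 = Z^2 for one df
    if df == 2:
        return math.exp(-chi2 / 2)  # chi-square with 2 df is exponential
    raise ValueError("this sketch only handles df = 1 or 2")


hypertension = [[140, 100], [90, 130]]       # rows: no / yes
occupation = [[96, 82], [91, 98], [43, 50]]  # rows: farmer / employee / other
for name, table in [("hypertension", hypertension), ("occupation", occupation)]:
    chi2, df = pearson_chi2(table)
    print(name, round(chi2, 3), df, round(chi2_upper_p(chi2, df), 4))
```

Applied to the antiplatelet-drug-history counts (206/195 and 24/35), the same functions give χ² ≈ 2.353 with P ≈ 0.125.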
Table 2. Univariate analysis of hemorrhagic transformation in patients with acute ischemic stroke (continuous variables)
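Values are median (P25, P75) or mean ± SD; t/Z is the test statistic.

| Variable | Non-HT group | HT group | t/Z | P |
|---|---|---|---|---|
| Age (years) | 63 (56, 69) | 62 (56, 70) | 0.102 | 0.918 |
| Body mass index (kg/m²) | 26 (24, 30) | 25 (23, 29) | 1.114 | 0.265 |
| Time from onset to admission (h) | 24 (9, 24) | 18 (9, 25) | 0.793 | 0.428 |
| NIHSS score | 6 (3, 14) | 16 (8, 20) | 7.876 | <0.001 |
| Systolic blood pressure (mmHg) | 158.17 ± 25.00 | 156.06 ± 24.37 | 0.918 | 0.359 |
| Diastolic blood pressure (mmHg) | 90.23 ± 14.00 | 88.37 ± 13.83 | 1.413 | 0.153 |
| White blood cells (×10⁹/L) | 7.10 ± 1.85 | 8.73 ± 3.22 | 6.653 | <0.001 |
| Platelets (×10⁹/L) | 216.82 ± 69.81 | 210.79 ± 68.47 | 0.936 | 0.350 |
| Monocytes (×10⁹/L) | 0.56 ± 0.30 | 0.57 ± 0.34 | 0.483 | 0.629 |
| PT-INR | 0.98 (0.93, 1.03) | 1.05 (0.97, 1.10) | 6.601 | <0.001 |
| Fibrinogen (g/L) | 3.67 ± 1.253 | 3.86 ± 1.305 | 1.607 | 0.109 |
| Albumin (g/L) | 42.24 ± 5.09 | 40.88 ± 4.33 | 3.091 | 0.002 |
| Total cholesterol (mmol/L) | 5.01 ± 1.15 | 4.88 ± 1.12 | 1.191 | 0.234 |
| Triglycerides (mmol/L) | 1.84 ± 1.21 | 1.50 ± 0.89 | 3.377 | 0.001 |
| High-density lipoprotein (mmol/L) | 1.20 ± 0.29 | 1.16 ± 0.31 | 1.511 | 0.132 |
| Low-density lipoprotein (mmol/L) | 2.95 ± 0.87 | 2.94 ± 0.97 | 0.184 | 0.854 |
| Fasting blood glucose (mmol/L) | 6.40 ± 1.33 | 6.58 ± 1.36 | 1.511 | 0.132 |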
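For the variables reported as mean ± SD, the t values can be checked from the summary statistics alone. The sketch below assumes Student's pooled-variance two-sample t test (with equal group sizes of 230, Welch's statistic is numerically identical); `pooled_t` is a hypothetical name, and its P value uses a normal approximation, which is reasonable at 458 degrees of freedom. The Z values for the median (P25, P75) variables presumably come from a rank-based test and need the raw data, so they are not reproduced here.

```python
import math


def pooled_t(mean1, sd1, n1, mean2, sd2, n2):
    """Student's two-sample t statistic (pooled variance) from summary statistics."""
    df = n1 + n2 - 2
    pooled_var = ((n1 - 1) * sd1**2 + (n2 - 1) * sd2**2) / df
    se = math.sqrt(pooled_var * (1 / n1 + 1 / n2))
    t = (mean1 - mean2) / se
    # With df = 458 the t distribution is close to normal; this P is an approximation
    p_approx = math.erfc(abs(t) / math.sqrt(2))
    return t, df, p_approx


# Non-HT group first, HT group second (n = 230 each), values from Table 2
for name, args in [
    ("systolic BP", (158.17, 25.00, 230, 156.06, 24.37, 230)),
    ("white blood cells", (7.10, 1.85, 230, 8.73, 3.22, 230)),
    ("albumin", (42.24, 5.09, 230, 40.88, 4.33, 230)),
]:
    t, df, p = pooled_t(*args)
    print(f"{name}: t = {t:.3f}, df = {df}, P ~ {p:.3f}")
```

From the rounded means and SDs this gives |t| ≈ 0.917, 6.657, and 3.086, close to the reported 0.918, 6.653, and 3.091.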
Table 2. Classification results of the C5.0 decision tree and RBF neural network models in the training and test sets [n (%)]
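| Model | Data set | Predicted HT | Observed HT: yes | Observed HT: no | Total |
|---|---|---|---|---|---|
| C5.0 decision tree | Training set | Yes | 158 (98.1) | 8 (5.2) | 166 |
| | | No | 3 (1.9) | 145 (94.8) | 148 |
| | | Total | 161 (100.0) | 153 (100.0) | 314 |
| | Test set | Yes | 57 (82.6) | 17 (22.1) | 74 |
| | | No | 12 (17.4) | 60 (77.9) | 72 |
| | | Total | 69 (100.0) | 77 (100.0) | 146 |
| RBF neural network | Training set | Yes | 141 (87.6) | 66 (43.1) | 207 |
| | | No | 20 (12.4) | 87 (56.9) | 107 |
| | | Total | 161 (100.0) | 153 (100.0) | 314 |
| | Test set | Yes | 61 (88.4) | 29 (37.7) | 90 |
| | | No | 8 (11.6) | 48 (62.3) | 56 |
| | | Total | 69 (100.0) | 77 (100.0) | 146 |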
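The RBF rows of the classification table are the thresholded outputs of a trained RBF neural network: a hidden layer of Gaussian units, each responding to the distance between a patient's feature vector and a centre, followed by a linear output layer. The authors' software and settings are not given in this section, so the numpy sketch below is only a generic illustration under stated assumptions: `RBFNetwork` is a hypothetical class, centres are drawn at random from the training data (k-means clustering is a common alternative), the width parameter `gamma` is fixed by hand, the output weights are fitted by ordinary least squares to the 0/1 labels, and the demonstration uses synthetic data, not the study data.

```python
import numpy as np


class RBFNetwork:
    """Minimal RBF network for binary classification (illustrative sketch)."""

    def __init__(self, n_centers=10, gamma=1.0, seed=0):
        self.n_centers = n_centers
        self.gamma = gamma
        self.seed = seed

    def _hidden(self, X):
        # Squared Euclidean distance between every sample and every centre
        d2 = ((X[:, None, :] - self.centers_[None, :, :]) ** 2).sum(axis=2)
        phi = np.exp(-self.gamma * d2)  # Gaussian activations
        # Append a constant column so the output layer has a bias term
        return np.hstack([phi, np.ones((X.shape[0], 1))])

    def fit(self, X, y):
        rng = np.random.default_rng(self.seed)
        k = min(self.n_centers, len(X))
        self.centers_ = X[rng.choice(len(X), size=k, replace=False)]
        H = self._hidden(X)
        # Linear output weights by least squares on the 0/1 labels
        self.weights_, *_ = np.linalg.lstsq(H, y.astype(float), rcond=None)
        return self

    def decision_function(self, X):
        return self._hidden(X) @ self.weights_

    def predict(self, X):
        return (self.decision_function(X) >= 0.5).astype(int)


# Toy demonstration on two synthetic, well-separated clusters (not study data)
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(-2.0, 0.5, size=(100, 2)), rng.normal(2.0, 0.5, size=(100, 2))])
y = np.array([0] * 100 + [1] * 100)
model = RBFNetwork(n_centers=10, gamma=0.5, seed=0).fit(X, y)
accuracy = (model.predict(X) == y).mean()
print(f"training accuracy: {accuracy:.3f}")
```

The continuous `decision_function` output, rather than the thresholded class, is the kind of score from which an ROC curve and AUC would be computed.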
Table 3. Predictive performance of the two risk models in the training and test sets
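| Metric | Training set: C5.0 | Training set: RBF | Test set: C5.0 | Test set: RBF |
|---|---|---|---|---|
| Accuracy (%) | 96.50 | 72.60 | 80.10 | 74.70 |
| Average correctness (%) | 82.80 | 45.20 | 80.20 | 47.80 |
| Sensitivity (%) | 98.10 | 87.60 | 82.60 | 88.40 |
| Specificity (%) | 94.80 | 56.90 | 77.90 | 62.30 |
| Youden index | 0.93 | 0.45 | 0.61 | 0.51 |
| Agreement rate (%) | 96.50 | 72.60 | 80.10 | 74.70 |
| Kappa | 0.93 | 0.45 | 0.60 | 0.50 |
| Positive likelihood ratio | 18.87 | 2.03 | 3.74 | 2.35 |
| Negative likelihood ratio | 0.02 | 0.22 | 0.22 | 0.19 |
| Positive predictive value (%) | 95.20 | 68.10 | 77.00 | 67.80 |
| Negative predictive value (%) | 98.00 | 81.30 | 83.30 | 85.70 |
| AUC | 0.97 | 0.72 | 0.80 | 0.75 |
| AUC 95% CI (lower, upper) | (0.94, 0.98) | (0.67, 0.77) | (0.73, 0.86) | (0.68, 0.82) |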
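Most rows of Table 3 are simple functions of the 2 × 2 classification counts, taking HT as the positive class. The sketch below (hypothetical function name `diagnostic_metrics`) computes them, with Cohen's kappa comparing observed agreement against the agreement expected from the predicted and observed margins. It is applied to the training-set C5.0 counts: 158 true positives, 8 false positives, 3 false negatives, and 145 true negatives.

```python
def diagnostic_metrics(tp, fp, fn, tn):
    """Classification indices from a 2 x 2 table (HT = positive class)."""
    n = tp + fp + fn + tn
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    accuracy = (tp + tn) / n
    # Agreement expected by chance, from the predicted and observed margins
    p_chance = ((tp + fp) * (tp + fn) + (fn + tn) * (fp + tn)) / n**2
    return {
        "accuracy": accuracy,
        "sensitivity": sensitivity,
        "specificity": specificity,
        "youden": sensitivity + specificity - 1,
        "kappa": (accuracy - p_chance) / (1 - p_chance),
        # Undefined when specificity == 1 (no false positives)
        "lr_positive": sensitivity / (1 - specificity),
        "lr_negative": (1 - sensitivity) / specificity,
        "ppv": tp / (tp + fp),
        "npv": tn / (tn + fn),
    }


# C5.0 training set: 158 TP, 8 FP, 3 FN, 145 TN
for key, value in diagnostic_metrics(158, 8, 3, 145).items():
    print(f"{key}: {value:.4f}")
```

Applied to all four count sets, this reproduces the count-based rows of Table 3 to within rounding, except that the training-set C5.0 counts give a positive likelihood ratio of about 18.77 rather than the listed 18.87. "Average correctness" is not defined in this section and is not computed here.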
Table 4. Comparison of AUCs between the two risk models in the training and test sets
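| Comparison | AUC difference | SE | 95% CI lower | 95% CI upper | Z | P |
|---|---|---|---|---|---|---|
| RBF neural network vs C5.0 decision tree (training set) | 0.2450 | 0.0233 | 0.2000 | 0.2910 | 10.540 | <0.001 |
| RBF neural network vs C5.0 decision tree (test set) | 0.0489 | 0.0252 | -0.0004 | 0.0983 | 1.944 | 0.0519 |

Figure: (a) C5.0 decision tree model for hemorrhagic transformation; (b) ROC curves of the two prediction models in the training set; (c) ROC curves of the two prediction models in the test set.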
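Given an AUC difference and its standard error, the Z statistic, 95% CI, and two-sided P value in Table 4 follow from a normal approximation. The sketch below assumes that approximation; the SE itself would, for two ROC curves measured on the same patients, often be estimated with a method such as DeLong's, which this section does not specify and which is not implemented here. `compare_auc` is a hypothetical name.

```python
import math


def compare_auc(diff, se, z_crit=1.959964):
    """Normal-approximation test for a difference between two AUCs.

    `se` is the standard error of the difference, taken as given.
    """
    z = diff / se
    ci = (diff - z_crit * se, diff + z_crit * se)
    p_two_sided = math.erfc(abs(z) / math.sqrt(2))
    return z, ci, p_two_sided


for label, diff, se in [("training", 0.2450, 0.0233), ("test", 0.0489, 0.0252)]:
    z, (lo, hi), p = compare_auc(diff, se)
    print(f"{label}: Z = {z:.3f}, 95% CI ({lo:.4f}, {hi:.4f}), P = {p:.4f}")
```

For the test set, the rounded inputs give Z ≈ 1.94, a CI of about (-0.0005, 0.0983), and P ≈ 0.052, in line with the reported 1.944, (-0.0004, 0.0983), and 0.0519; the CI just crossing zero is what makes the test-set difference non-significant.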