Resampling classification model for predicting blood glucose control in middle-aged and elderly diabetic patients in China

WANG Ping, ZHANG Le, HONG Xiaorui, ZHU Suling, ZHAO Xuejing

Citation: WANG Ping, ZHANG Le, HONG Xiaorui, ZHU Suling, ZHAO Xuejing. Resampling classification model for predicting blood glucose control in middle-aged and elderly diabetic patients in China[J]. Chinese Journal of Disease Control & Prevention, 2024, 28(9): 1005-1009. doi: 10.16462/j.cnki.zhjbkz.2024.09.003

WANG Ping and ZHANG Le contributed equally to this article
Funds: National Natural Science Foundation of China 11971214
Corresponding author: ZHU Suling, E-mail: zhusl@lzu.edu.cn

  • CLC number: R587.1
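  • Abstract:   Objective  To improve the predictive performance of classification models for blood glucose control in patients with diabetes by using resampling algorithms.  Methods  Imbalanced data on blood glucose control in patients with diabetes from the China Health and Retirement Longitudinal Study (CHARLS) database were resampled. The classification performance of logistic regression (LR), support vector machines (SVM), and random forest (RF) was compared before and after resampling. Stratified five-fold cross-validation and the area under the receiver operating characteristic (ROC) curve (AUC) were used to select the optimal model parameters. Accuracy, sensitivity, specificity, precision, geometric mean (G-mean), F1 score, and AUC served as the evaluation metrics.  Results  All of the resampling algorithms improved the sensitivity, G-mean, and F1 score of the three classification models. The over-sampling algorithm adaptive synthetic sampling (ADASYN) and the combined sampling algorithms [synthetic minority over-sampling technique with edited nearest neighbors (SMOTE-ENN) and synthetic minority over-sampling technique with Tomek links (SMOTE-Tomek)] raised the AUC of all three classification models to varying degrees: ADASYN increased the AUC of the LR model by 2.13%, SMOTE-ENN increased the AUC of the LR model by 3.05%, and SMOTE-Tomek increased the AUC of the RF model by 2.13%.  Conclusions  ADASYN, SMOTE-ENN, and SMOTE-Tomek handle the class imbalance in blood glucose control data from patients with diabetes well and improve the predictive performance of the classification models.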
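The resampling methods named in the abstract (ADASYN, SMOTE-ENN, SMOTE-Tomek) all rely on SMOTE-style interpolation: a synthetic minority sample is placed on the line segment between a minority sample and one of its nearest minority-class neighbours. The following is a minimal pure-Python sketch of that single step, not the authors' implementation. The function names `smote_sketch` and `n_needed`, the parameters `k`, `seed`, and `ratio`, and the toy data are all illustrative assumptions. In particular, reading a sampling ratio as "target minority/majority size" is an assumption; the source does not define the suffixes in SMOTE0.5, SMOTE0.7, and SMOTE1.

```python
import math
import random


def smote_sketch(minority, n_new, k=5, seed=0):
    """Generate n_new synthetic points by SMOTE-style interpolation.

    minority: list of equal-length numeric tuples (minority class only).
    Each synthetic point lies on the segment between a randomly chosen
    minority sample and one of its k nearest minority neighbours.
    """
    if len(minority) < 2:
        raise ValueError("need at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # Rank the other minority samples by Euclidean distance to the base.
        others = [p for p in minority if p is not base]
        others.sort(key=lambda p: math.dist(p, base))
        neighbour = rng.choice(others[:k])
        gap = rng.random()  # interpolation weight in [0, 1)
        synthetic.append(
            tuple(b + gap * (n - b) for b, n in zip(base, neighbour))
        )
    return synthetic


def n_needed(n_minority, n_majority, ratio):
    """Synthetic samples needed so that minority/majority reaches `ratio`
    (assumed interpretation of a sampling ratio; hypothetical helper)."""
    return max(0, math.ceil(ratio * n_majority) - n_minority)


# Toy usage: four minority points on the unit square, 40 majority samples.
toy_minority = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
new_points = smote_sketch(toy_minority, n_new=n_needed(4, 40, 0.5), k=2)
```

Because every synthetic point is a convex combination of two existing minority samples, it stays inside the region the minority class already occupies. The cleaning steps that SMOTE-ENN and SMOTE-Tomek apply afterwards, and the adaptive weighting in ADASYN, are not shown here.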
  • Table 1. Performance comparison of resampling classification models

    | Resampling algorithm | Classification model | Accuracy① | Sensitivity① | Specificity① | Precision① | Geometric mean | F1 score | AUC (95% CI) |
    |---|---|---|---|---|---|---|---|---|
    | Imbalanced data | LR | 83.67 | 12.50 | 97.56 | 50.00 | 0.349 | 0.200 | 0.692 (0.547~0.898) |
    | Imbalanced data | SVM | 83.67 | 0 | 100.00 | 0 | 0 | | 0.692 (0.056~0.866) |
    | Imbalanced data | RF | 83.67 | 0 | 100.00 | 0 | 0 | | 0.680 (0.494~0.838) |
    | RUS | LR | 65.31 | 25.00 | 73.17 | 15.83 | 0.428 | 0.191 | 0.671 (0.568~0.908) |
    | RUS | SVM | 59.18 | 87.50 | 53.66 | 26.92 | 0.685 | 0.412 | 0.689 (0.513~0.878) |
    | RUS | RF | 59.18 | 75.00 | 56.10 | 25.00 | 0.649 | 0.375 | 0.668 (0.264~0.800) |
    | SMOTE0.5 | LR | 69.39 | 62.50 | 70.73 | 29.41 | 0.665 | 0.400 | 0.732 (0.534~0.863) |
    | SMOTE0.5 | SVM | 81.63 | 37.50 | 90.24 | 42.86 | 0.582 | 0.143 | 0.729 (0.405~0.735) |
    | SMOTE0.5 | RF | 75.51 | 12.50 | 87.80 | 16.67 | 0.331 | 0.716 | 0.701 (0.460~0.808) |
    | SMOTE0.7 | LR | 79.59 | 75.00 | 80.49 | 42.86 | 0.777 | 0.546 | 0.710 (0.501~0.829) |
    | SMOTE0.7 | SVM | 69.39 | 37.50 | 75.61 | 23.08 | 0.533 | 0.286 | 0.717 (0.516~0.813) |
    | SMOTE0.7 | RF | 61.22 | 75.00 | 58.54 | 26.09 | 0.663 | 0.387 | 0.686 (0.453~0.804) |
    | SMOTE1 | LR | 55.10 | 87.50 | 48.78 | 25.00 | 0.653 | 0.389 | 0.695 (0.497~0.826) |
    | SMOTE1 | SVM | 67.34 | 50.00 | 70.73 | 25.00 | 0.595 | 0.333 | 0.698 (0.279~0.721) |
    | SMOTE1 | RF | 59.18 | 75.00 | 56.10 | 25.00 | 0.649 | 0.375 | 0.680 (0.427~0.817) |
    | ADASYN | LR | 71.43 | 75.00 | 70.73 | 33.33 | 0.728 | 0.462 | 0.713 (0.519~0.859) |
    | ADASYN | SVM | 67.35 | 87.50 | 63.41 | 31.82 | 0.745 | 0.467 | 0.717 (0.454~0.786) |
    | ADASYN | RF | 61.22 | 75.00 | 58.54 | 26.09 | 0.663 | 0.387 | 0.707 (0.431~0.807) |
    | SMOTE-ENN | LR | 66.75 | 74.14 | 66.47 | 7.78 | 0.702 | 0.141 | 0.737 (0.667~0.804) |
    | SMOTE-ENN | SVM | 69.16 | 72.41 | 69.00 | 8.19 | 0.707 | 0.147 | 0.746 (0.679~0.813) |
    | SMOTE-ENN | RF | 58.83 | 77.59 | 58.12 | 6.60 | 0.672 | 0.122 | 0.703 (0.644~0.763) |
    | SMOTE-Tomek | LR | 71.12 | 67.24 | 71.27 | 8.19 | 0.692 | 0.146 | 0.752 (0.683~0.821) |
    | SMOTE-Tomek | SVM | 77.07 | 65.52 | 77.51 | 10.00 | 0.713 | 0.174 | 0.748 (0.725~0.769) |
    | SMOTE-Tomek | RF | 69.85 | 68.96 | 69.88 | 8.03 | 0.694 | 0.144 | 0.748 (0.682~0.814) |

    Note: RUS, random under-sampling; SMOTE, synthetic minority over-sampling technique; ADASYN, adaptive synthetic sampling; SMOTE-ENN, synthetic minority over-sampling technique and edited nearest neighbors; SMOTE-Tomek, synthetic minority over-sampling technique and Tomek links; LR, logistic regression; SVM, support vector machines; RF, random forest.
    ① Expressed as a percentage (%); ② "—" indicates that the value could not be obtained.
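The threshold-based metrics in Table 1 can all be derived from a confusion matrix. The sketch below is illustrative only: the source does not report confusion-matrix counts, so the counts in the usage example (1 true positive, 1 false positive, 7 false negatives, 40 true negatives) are an assumption, chosen because they reproduce the LR row for the imbalanced data. The names `safe_div` and `classification_metrics` are hypothetical.

```python
import math


def safe_div(num, den):
    """Return num / den, or None when the denominator is zero."""
    return num / den if den else None


def classification_metrics(tp, fp, fn, tn):
    """Compute accuracy, sensitivity, specificity, precision, G-mean and F1
    from confusion-matrix counts (positive class = minority class)."""
    sensitivity = safe_div(tp, tp + fn)  # recall on the positive class
    specificity = safe_div(tn, tn + fp)
    precision = safe_div(tp, tp + fp)    # undefined if nothing is predicted positive
    accuracy = safe_div(tp + tn, tp + fp + fn + tn)
    if sensitivity is None or specificity is None:
        g_mean = None
    else:
        g_mean = math.sqrt(sensitivity * specificity)
    if precision is None or sensitivity is None or precision + sensitivity == 0:
        f1 = None  # harmonic mean is undefined in this case
    else:
        f1 = 2 * precision * sensitivity / (precision + sensitivity)
    return {
        "accuracy": accuracy,
        "sensitivity": sensitivity,
        "specificity": specificity,
        "precision": precision,
        "g_mean": g_mean,
        "f1": f1,
    }


# Assumed counts that are consistent with the LR / imbalanced-data row.
lr_row = classification_metrics(tp=1, fp=1, fn=7, tn=40)

# A classifier that predicts every case as negative (assumed counts).
all_negative = classification_metrics(tp=0, fp=0, fn=8, tn=41)
```

With the assumed counts, `lr_row` gives accuracy 83.67%, sensitivity 12.50%, specificity 97.56%, precision 50.00%, G-mean 0.349, and F1 0.200. For `all_negative`, sensitivity and G-mean are 0 while precision and F1 come back as `None`, because no case is predicted positive. Conventions differ on whether such cells are reported as 0 or as unavailable, so this sketch should not be read as the rule used in the table.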
Publication history
  • Received: 2023-11-07
  • Revised: 2024-05-15
  • Published online: 2024-10-24
  • Issue date: 2024-09-10
