

Risk prediction of small for gestational age birth based on machine learning algorithms

ZHANG Ruimin, WANG Keke, LI Jinbo, CHEN Zhuanzhuan, YANG Hailan, WU Weiwei, FENG Yongliang, WANG Suping, ZHANG Xinri

Citation: ZHANG Ruimin, WANG Keke, LI Jinbo, CHEN Zhuanzhuan, YANG Hailan, WU Weiwei, FENG Yongliang, WANG Suping, ZHANG Xinri. Risk prediction of small for gestational age birth based on machine learning algorithms[J]. Chinese Journal of Disease Control & Prevention, 2023, 27(8): 922-927. doi: 10.16462/j.cnki.zhjbkz.2023.08.009


doi: 10.16462/j.cnki.zhjbkz.2023.08.009
Funds: 

Youth Scientific Research Project of Fundamental Research Program in Shanxi Province 20210302124581

Details
    Corresponding authors:

    ZHANG Xinri, E-mail: ykdzxr61@163.com

    WANG Suping, E-mail: supingwang@sxmu.edu.cn

  • CLC number (Chinese Library Classification): R173

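  • Abstract:
    Objective  To evaluate how well six models predict small for gestational age (SGA) birth: machine learning models, including extreme gradient boosting (XGBoost), support vector machine (SVM) and Naive Bayes, compared with a traditional logistic regression model.
    Methods  A total of 9 972 pregnant women who were hospitalized and delivered in the Department of Obstetrics of the First Hospital of Shanxi Medical University from March 2012 to September 2016 were enrolled. Data were collected by questionnaire and from the hospital information system. Participants were divided by delivery outcome into an SGA group (n=1 124) and a non-SGA group (n=8 848), and the data were split into training and test sets at a ratio of 75:25. Multivariate logistic regression was used to screen risk factors. Prediction models were then built with XGBoost, SVM, Naive Bayes, gradient boosting decision tree (GBDT), k-nearest neighbor (KNN) and traditional logistic regression, and their performance was compared using the area under the receiver operating characteristic curve (AUC), accuracy, precision and other metrics.
    Results  The logistic regression model identified seven variables, including gestational hypertension and eclampsia, as influencing factors of SGA. With these factors as predictors, the SVM model performed best, with an AUC of 0.72 and a precision of 71% (the Precision column of Table 3). The traditional logistic regression model performed less well, with an AUC of 0.71 and a precision of 66%.
    Conclusions  SGA risk prediction models based on machine learning algorithms, especially SVM, perform well and can effectively predict SGA in Shanxi Province, providing a reference for the primary prevention of SGA.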
  • Figure  1.  ROC curve of risk prediction model of SGA

    1. SGA: small for gestational age; 2. ROC: receiver operating characteristic; 3. AUC: area under the receiver operating characteristic curve; 4. XGBoost: extreme gradient boosting; 5. SVM: support vector machine; 6. GBDT: gradient boosting decision tree; 7. KNN: k-nearest neighbor; 8. Naive Bayes: Naive Bayes.
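The AUC shown for each model in Figure 1 can be read as the probability that a randomly chosen SGA case receives a higher predicted risk than a randomly chosen non-SGA case. The following is a minimal sketch of that calculation using only the Python standard library. The `labels` and `scores` values are hypothetical toy data for illustration, not study data, and `auc_from_scores` is an illustrative helper rather than the authors' code.

```python
def auc_from_scores(labels, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties count as half a correct pair (Mann-Whitney U formulation).
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical toy data (1 = SGA, 0 = non-SGA); not taken from the study.
labels = [1, 1, 1, 0, 0, 0, 0]
scores = [0.80, 0.55, 0.30, 0.60, 0.25, 0.20, 0.10]
auc = auc_from_scores(labels, scores)
print(round(auc, 3))
```

The pairwise loop is quadratic in sample size, which is fine for illustration; a rank-based implementation would be the usual choice for a test set of a few thousand records.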

    Table  1.   Analysis of general characteristics and influencing factors of SGA

    | Variable | Total, n (%) (n=9 972) | SGA, n (%) (n=1 124) | Non-SGA, n (%) (n=8 848) | χ² value | P value |
    |---|---|---|---|---|---|
    | Age group/years | | | | 33.915 | <0.001 |
    |   <25 | 1 085(10.88) | 177(15.75) | 908(10.26) | | |
    |   25~<30 | 4 468(44.81) | 478(42.53) | 3 990(45.10) | | |
    |   30~<35 | 3 003(30.11) | 305(27.13) | 2 698(30.49) | | |
    |   ≥35 | 1 416(14.20) | 164(14.59) | 1 252(14.15) | | |
    | Ethnicity | | | | 2.664 | 0.103 |
    |   Han | 9 897(99.25) | 1 120(99.64) | 8 777(99.20) | | |
    |   Ethnic minority | 75(0.75) | 4(0.36) | 71(0.80) | | |
    | Education level | | | | 141.997 | <0.001 |
    |   Junior high school and below | 1 725(17.30) | 321(28.56) | 1 404(15.87) | | |
    |   High school | 1 199(12.02) | 165(14.68) | 1 034(11.69) | | |
    |   Junior college and above | 7 048(70.68) | 638(56.76) | 6 410(72.44) | | |
    | Household per capita monthly income/yuan | | | | 87.580 | <0.001 |
    |   <2 000 | 1 229(12.33) | 224(19.93) | 1 005(11.36) | | |
    |   2 000~<4 000 | 6 097(61.14) | 692(61.57) | 5 405(61.09) | | |
    |   ≥4 000 | 2 646(26.53) | 208(18.50) | 2 438(27.55) | | |
    | Pre-pregnancy BMI/(kg/m²) | | | | 3.014 | 0.389 |
    |   Underweight | 1 200(12.03) | 146(12.99) | 1 054(11.91) | | |
    |   Normal | 6 691(67.10) | 733(65.21) | 5 958(67.34) | | |
    |   Overweight | 1 521(15.25) | 185(16.46) | 1 336(15.10) | | |
    |   Obese | 560(5.62) | 60(5.34) | 500(5.65) | | |
    | Family history of gestational hypertensive syndrome | | | | 2.466 | 0.116 |
    |   Yes | 84(0.84) | 14(1.25) | 70(0.79) | | |
    |   No | 9 888(99.16) | 1 110(98.75) | 8 778(99.21) | | |
    | Number of prenatal examinations | | | | 88.808 | <0.001 |
    |   <7 | 2 641(26.48) | 429(38.17) | 2 212(25.00) | | |
    |   ≥7 | 7 331(73.52) | 695(61.83) | 6 636(75.00) | | |
    | Underwent amniocentesis | | | | 0.438 | 0.508 |
    |   Yes | 57(0.57) | 8(0.71) | 49(0.55) | | |
    |   No | 9 915(99.43) | 1 116(99.29) | 8 799(99.45) | | |
    | Smoking during pregnancy | | | | 0.350 | 0.554 |
    |   Yes | 12(0.12) | 2(0.18) | 10(0.11) | | |
    |   No | 9 960(99.88) | 1 122(99.82) | 8 838(99.89) | | |
    | Natural conception | | | | 0.376 | 0.540 |
    |   Yes | 8 149(81.72) | 926(82.38) | 7 223(81.63) | | |
    |   No | 1 823(18.28) | 198(17.62) | 1 625(18.37) | | |
    | Gestational hypertension | | | | 434.368 | <0.001 |
    |   Yes | 777(7.79) | 264(23.49) | 513(5.80) | | |
    |   No | 9 195(92.21) | 860(76.51) | 8 335(94.20) | | |
    | Eclampsia | | | | 75.359 | <0.001 |
    |   Yes | 82(0.82) | 34(3.02) | 48(0.54) | | |
    |   No | 9 890(99.18) | 1 090(96.98) | 8 800(99.46) | | |
    | Intrahepatic cholestasis of pregnancy | | | | 6.050 | 0.014 |
    |   Yes | 32(0.32) | 8(0.71) | 24(0.27) | | |
    |   No | 9 940(99.68) | 1 116(99.29) | 8 824(99.73) | | |
    | Passive smoking | | | | 7.249 | 0.007 |
    |   Yes | 1 006(10.09) | 139(12.37) | 867(9.80) | | |
    |   No | 8 966(89.91) | 985(87.63) | 7 981(90.20) | | |
    | Fever in the first trimester | | | | 9.140 | 0.003 |
    |   Yes | 180(1.81) | 33(2.94) | 147(1.66) | | |
    |   No | 9 792(98.19) | 1 091(97.06) | 8 701(98.34) | | |
    | Indigestion during pregnancy | | | | 0.526 | 0.468 |
    |   Yes | 11(0.11) | 2(0.18) | 9(0.10) | | |
    |   No | 9 961(99.89) | 1 122(99.82) | 8 839(99.90) | | |

    Note: SGA, small for gestational age.
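The χ² values in Table 1 compare the distribution of each variable between the SGA and non-SGA groups. As a rough check for a two-category variable, the sketch below recomputes the statistic from the four cell counts of the gestational hypertension row. It assumes an uncorrected Pearson χ² test with 1 degree of freedom, which the page does not state explicitly. Under that assumption the result should land close to the reported 434.368. The function names are illustrative.

```python
import math


def chi_square_2x2(a, b, c, d):
    """Pearson chi-square (no continuity correction) for the table [[a, b], [c, d]]."""
    n = a + b + c + d
    numerator = n * (a * d - b * c) ** 2
    denominator = (a + b) * (c + d) * (a + c) * (b + d)
    return numerator / denominator


def p_value_df1(chi2):
    """Upper-tail P value for a chi-square statistic with 1 degree of freedom."""
    return math.erfc(math.sqrt(chi2 / 2.0))


# Gestational hypertension row of Table 1:
# a = SGA & yes, b = non-SGA & yes, c = SGA & no, d = non-SGA & no.
chi2 = chi_square_2x2(264, 513, 860, 8335)
p = p_value_df1(chi2)
print(f"chi2 = {chi2:.3f}, P = {p:.3g}")
```

The same two functions can be applied to any other Yes/No row of the table, for example passive smoking with counts 139, 867, 985 and 7 981.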

    Table  2.   Results of multivariate logistic regression of SGA

    | Variable | β value | Wald value | OR value (95% CI) | P value |
    |---|---|---|---|---|
    | Age group/years | | | | |
    |   <25 | | | 1.000 | |
    |   25~<30 | -0.161 | 2.407 | 0.851(0.694~1.043) | 0.236 |
    |   30~<35 | -0.205 | 3.442 | 0.815(0.656~1.012) | 0.064 |
    |   ≥35 | -0.223 | 3.288 | 0.800(0.629~1.018) | 0.070 |
    | Education level | | | | |
    |   Junior high school and below | | | 1.000 | |
    |   High school | -0.245 | 5.010 | 0.783(0.631~0.970) | 0.025 |
    |   Junior college and above | -0.426 | 22.764 | 0.653(0.548~0.778) | <0.001 |
    | Household per capita monthly income/yuan | | | | |
    |   <2 000 | | | 1.000 | |
    |   2 000~<4 000 | -0.168 | 3.159 | 0.846(0.703~1.017) | 0.076 |
    |   ≥4 000 | -0.403 | 11.823 | 0.668(0.531~0.841) | 0.001 |
    | Number of prenatal examinations | | | | |
    |   <7 | | | 1.000 | |
    |   ≥7 | -0.261 | 12.722 | 0.771(0.668~0.889) | <0.001 |
    | Gestational hypertension | | | | |
    |   No | | | 1.000 | |
    |   Yes | 1.348 | 228.299 | 3.849(3.232~4.584) | <0.001 |
    | Eclampsia | | | | |
    |   No | | | 1.000 | |
    |   Yes | 0.908 | 13.540 | 2.479(1.528~4.020) | <0.001 |
    | Intrahepatic cholestasis of pregnancy | | | | |
    |   No | | | 1.000 | |
    |   Yes | 0.944 | 4.909 | 2.570(1.115~5.922) | 0.027 |
    | Passive smoking | | | | |
    |   No | | | 1.000 | |
    |   Yes | -0.069 | 0.433 | 0.933(0.760~1.147) | 0.511 |
    | Fever in the first trimester | | | | |
    |   No | | | 1.000 | |
    |   Yes | 0.536 | 6.871 | 1.710(1.145~2.553) | 0.009 |

    Note: SGA, small for gestational age. Rows with an OR of 1.000 and no other values are the reference categories.
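The OR and 95% CI columns of Table 2 follow from the β and Wald columns. The sketch below shows that relationship for the gestational hypertension row. It assumes the Wald value is the squared ratio of β to its standard error and that the interval uses the normal quantile 1.96; neither is stated on the page. `odds_ratio_with_ci` is an illustrative helper name.

```python
import math


def odds_ratio_with_ci(beta, wald, z=1.96):
    """Return (OR, lower, upper) from a coefficient and its Wald chi-square.

    Assumes Wald = (beta / SE)^2, so SE = |beta| / sqrt(Wald).
    """
    se = abs(beta) / math.sqrt(wald)
    return (math.exp(beta), math.exp(beta - z * se), math.exp(beta + z * se))


# Gestational hypertension row of Table 2: beta = 1.348, Wald = 228.299.
or_, lower, upper = odds_ratio_with_ci(1.348, 228.299)
print(f"OR = {or_:.3f} (95% CI: {lower:.3f} to {upper:.3f})")
```

Because β and Wald are themselves rounded to three decimals in the table, the recomputed values can differ from the published ones in the last digit.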

    Table  3.   Predictive performance indicators of the machine learning prediction model

    | Model | AUC | Accuracy | Precision | Sensitivity | f1-score |
    |---|---|---|---|---|---|
    | SVM | 0.72 | 0.67 | 0.71 | 0.40 | 0.52 |
    | GBDT | 0.70 | 0.67 | 0.65 | 0.56 | 0.60 |
    | XGBoost | 0.70 | 0.67 | 0.66 | 0.57 | 0.61 |
    | Naive Bayes | 0.69 | 0.66 | 0.76 | 0.36 | 0.49 |
    | KNN | 0.62 | 0.60 | 0.58 | 0.48 | 0.53 |
    | logistic | 0.71 | 0.64 | 0.66 | 0.34 | 0.44 |

    Note: SVM, support vector machine; GBDT, gradient boosting decision tree; XGBoost, extreme gradient boosting; KNN, k-nearest neighbor; AUC, area under the receiver operating characteristic curve.
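The f1-score column of Table 3 is the harmonic mean of the Precision and Sensitivity columns. The sketch below recomputes it from the two-decimal values in the table, using the standard F1 definition. Small differences from the published column are expected because the inputs are already rounded. The dictionary `table3` simply restates the table and `f1_score` is an illustrative helper.

```python
def f1_score(precision, sensitivity):
    """Harmonic mean of precision and sensitivity (recall)."""
    if precision + sensitivity == 0:
        return 0.0
    return 2 * precision * sensitivity / (precision + sensitivity)


# (precision, sensitivity, reported f1-score) as listed in Table 3.
table3 = {
    "SVM": (0.71, 0.40, 0.52),
    "GBDT": (0.65, 0.56, 0.60),
    "XGBoost": (0.66, 0.57, 0.61),
    "Naive Bayes": (0.76, 0.36, 0.49),
    "KNN": (0.58, 0.48, 0.53),
    "logistic": (0.66, 0.34, 0.44),
}

for model, (precision, sensitivity, reported) in table3.items():
    recomputed = f1_score(precision, sensitivity)
    print(f"{model}: recomputed {recomputed:.3f}, reported {reported:.2f}")
```

The comparison also makes the trade-off in the table visible: SVM and Naive Bayes have the highest precision but low sensitivity, so their f1-scores fall below those of GBDT and XGBoost.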
Publication history
  • Received:  2022-12-16
  • Revised:  2023-01-26
  • Published online:  2023-09-02
  • Issue date:  2023-08-10
