CN 34-1304/R  ISSN 1674-3679

Volume 29 Issue 6
Jun.  2025
CHENG Linjie, YUAN Qing, LI Wansong, LIU Ying, YANG Lei. Construction and validation of a prediction model for nonalcoholic fatty liver disease based on machine learning[J]. CHINESE JOURNAL OF DISEASE CONTROL & PREVENTION, 2025, 29(6): 682-687. doi: 10.16462/j.cnki.zhjbkz.2025.06.009
Construction and validation of a prediction model for nonalcoholic fatty liver disease based on machine learning

doi: 10.16462/j.cnki.zhjbkz.2025.06.009
Funds: A Cohort Study of Natural Population Health Trends in Hebei Province (226Z7705G)

  • Corresponding author: LIU Ying, E-mail: wayymbb@126.com; YANG Lei, E-mail: yanglei1127@hebmu.edu.cn
  • Received Date: 2025-01-06
  • Revision Received Date: 2025-04-12
  • Available Online: 2025-07-07
  • Publish Date: 2025-06-10
  • Objective: To construct and validate machine learning (ML) models for predicting nonalcoholic fatty liver disease (NAFLD), select the best-performing model, and interpret it with the SHapley Additive exPlanations (SHAP) framework.

    Methods: Data from the National Health and Nutrition Examination Survey (NHANES) database, covering January 2017 to March 2020, were randomly divided into a training set and a test set at a ratio of 7:3. Least absolute shrinkage and selection operator (LASSO) regression was used for feature selection, and six algorithms were used to construct the prediction models. The models were evaluated using the area under the curve (AUC), calibration curves, and decision curve analysis, and were interpreted with variable importance plots and SHAP plots.

    Results: Of the 6,918 participants, 3,974 (57.44%) were diagnosed with NAFLD. The eXtreme gradient boosting (XGBoost) model outperformed the other models overall, with an AUC of 0.851, an accuracy of 0.757, a sensitivity of 0.760, and a specificity of 0.754 on the test set. The main predictors were body roundness index, waist circumference, triglyceride glucose index, alanine aminotransferase, glycated hemoglobin, and high-density lipoprotein cholesterol. For practical application, a user interface was developed for use by medical staff.

    Conclusions: Six ML models for predicting NAFLD were constructed and validated. Among them, XGBoost performed best and could provide a reliable reference for early clinical screening of patients at high risk of NAFLD.
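    The evaluation steps named in the abstract (a random 7:3 split, then AUC, accuracy, sensitivity, and specificity on the test set) can be sketched in plain Python. This is only an illustration under stated assumptions, not the authors' code: the function names, the 0.5 classification threshold, the random seed, and the toy labels and scores are all hypothetical, and no NHANES data or fitted model is involved.

```python
import random


def split_7_3(rows, seed=0):
    """Randomly split rows into a 70% training set and a 30% test set.

    The seed is an arbitrary choice for reproducibility (assumption).
    """
    rng = random.Random(seed)
    shuffled = list(rows)
    rng.shuffle(shuffled)
    cut = round(0.7 * len(shuffled))
    return shuffled[:cut], shuffled[cut:]


def confusion_metrics(y_true, y_score, threshold=0.5):
    """Accuracy, sensitivity and specificity at a fixed threshold.

    The 0.5 threshold is an assumption; the abstract does not state one.
    """
    tp = fp = tn = fn = 0
    for y, s in zip(y_true, y_score):
        pred = 1 if s >= threshold else 0
        if y == 1 and pred == 1:
            tp += 1
        elif y == 1 and pred == 0:
            fn += 1
        elif y == 0 and pred == 1:
            fp += 1
        else:
            tn += 1
    return {
        "accuracy": (tp + tn) / (tp + tn + fp + fn),
        "sensitivity": tp / (tp + fn),  # true positive rate
        "specificity": tn / (tn + fp),  # true negative rate
    }


def auc(y_true, y_score):
    """AUC as the probability that a random positive case scores higher
    than a random negative case (ties count as one half)."""
    pos = [s for y, s in zip(y_true, y_score) if y == 1]
    neg = [s for y, s in zip(y_true, y_score) if y == 0]
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    # Toy labels (1 = NAFLD) and predicted probabilities, for illustration only.
    y_true = [1, 1, 1, 0, 0, 0]
    y_score = [0.9, 0.8, 0.4, 0.6, 0.3, 0.1]
    print(confusion_metrics(y_true, y_score))
    print(auc(y_true, y_score))
```

    Sensitivity and specificity depend on the chosen threshold, whereas AUC summarizes ranking quality across all thresholds, which is why the abstract reports both kinds of measure.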
    Figures(3)  / Tables(2)