Application of continuous Bayesian networks in the association study between uric acid and chronic metabolic diseases
Abstract:
Objective  To construct a continuous Bayesian network using the improved partial-correlation-based (IPCB) algorithm, to investigate the metabolic factors influencing blood uric acid (UA), and to compare the network with the traditional multiple linear regression model in order to assess the performance and advantages of continuous Bayesian networks in identifying disease influencing factors.
Methods  Chronic disease surveillance data on 4 846 people in Shanxi Province in 2015 were analyzed. A multiple linear regression model and a continuous Bayesian network model were each used to analyze the correlation between UA and nine other characteristic indicators of metabolic disease, and the results of the two methods were compared.
Results  Stepwise multiple linear regression showed that five variables, triglyceride (TG), systolic blood pressure (SBP), diastolic blood pressure (DBP), low density lipoprotein (LDL) and high density lipoprotein (HDL), were directly correlated with the UA level, explaining 9.5% of its variation. The continuous Bayesian network contained 24 directed edges; age, TG, LDL, HDL, SBP and DBP were directly related to the UA level. Increases in age, TG and LDL and a decrease in HDL each led to a higher UA level, and a higher UA level in turn led to higher SBP and DBP. Total cholesterol (TC) was indirectly related to UA.
Conclusions  The continuous Bayesian network model identifies more direct influencing factors of UA and can also find indirect influencing factors, giving a better overall explanation.
Figure 3. The structure of the continuous Bayesian network
TC: total cholesterol; TG: triglyceride; LDL: low density lipoprotein; HDL: high density lipoprotein; UA: uric acid; FPG: fasting plasma glucose; HbA1c: glycosylated hemoglobin type A1c.
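The part of the network that touches UA can be written down as a small directed graph. The sketch below encodes only the six UA-adjacent edges stated in the abstract; the other 18 of the 24 edges are not listed in this section and are left out. The names `UA_EDGES`, `parents` and `children` are illustrative and do not come from the study.

```python
# Directed edges adjacent to UA, as reported in the abstract.
# The remaining 18 edges of the 24-edge network are not given here and are omitted.
UA_EDGES = [
    ("Age", "UA"), ("TG", "UA"), ("LDL", "UA"), ("HDL", "UA"),
    ("UA", "SBP"), ("UA", "DBP"),
]


def parents(node, edges):
    """Return the set of nodes with an edge pointing into `node`."""
    return {src for src, dst in edges if dst == node}


def children(node, edges):
    """Return the set of nodes that `node` points to."""
    return {dst for src, dst in edges if src == node}


print(sorted(parents("UA", UA_EDGES)))   # influencing factors of UA
print(sorted(children("UA", UA_EDGES)))  # outcome factors of UA
```

Reading the graph this way separates the four influencing factors (parents of UA) from the two outcome factors (children of UA), a distinction a single regression equation does not make.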
Table 1. Calculation results of the Durbin-Watson statistic

| R | R² | Adjusted R² | s | Durbin-Watson |
| --- | --- | --- | --- | --- |
| 0.308 | 0.095 | 0.094 | 73.567 | 2.034 |

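As a rough check on the table, the adjusted R² can be recomputed from R², the sample size and the number of predictors, and the Durbin-Watson statistic can be computed from a sequence of residuals. The sketch below assumes n = 4 846 and p = 5 (both taken from the abstract); the function names are illustrative, and the study's residuals are not available here, so `durbin_watson` is shown on toy input only.

```python
def adjusted_r2(r2, n, p):
    """Adjusted R^2 for a model with p predictors fitted to n observations."""
    return 1.0 - (1.0 - r2) * (n - 1) / (n - p - 1)


def durbin_watson(residuals):
    """Sum of squared successive differences divided by the sum of squares."""
    num = sum((residuals[i] - residuals[i - 1]) ** 2
              for i in range(1, len(residuals)))
    den = sum(e * e for e in residuals)
    return num / den


# n and p are assumptions read off the abstract (4 846 people, 5 retained variables).
print(round(adjusted_r2(0.095, 4846, 5), 3))

# Toy residuals only; perfectly alternating signs push the statistic above 2.
print(durbin_watson([1.0, -1.0, 1.0, -1.0]))
```

A Durbin-Watson value close to 2, such as the 2.034 reported, indicates little first-order autocorrelation in the residuals.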
Table 2. Statistical description and normality test of metabolic indicators

| Variable | Assignment | x̄±s | M | Normality test P value |
| --- | --- | --- | --- | --- |
| UA/(μmol·L⁻¹) | Y | 268.29±77.28 | 257.35 | <0.001 |
| Age/years | X1 | 54.07±13.40 | 55.00 | <0.001 |
| TC/(mmol·L⁻¹) | X2 | 4.61±0.92 | 4.54 | <0.001 |
| TG/(mmol·L⁻¹) | X3 | 1.70±1.16 | 1.40 | <0.001 |
| LDL/(mmol·L⁻¹) | X4 | 2.91±0.80 | 2.84 | <0.001 |
| HDL/(mmol·L⁻¹) | X5 | 1.17±0.28 | 1.13 | <0.001 |
| SBP/mmHg | X6 | 138.69±21.58 | 135.00 | <0.001 |
| DBP/mmHg | X7 | 80.82±11.54 | 80.00 | <0.001 |
| FPG/(mmol·L⁻¹) | X8 | 5.64±1.52 | 5.31 | <0.001 |
| HbA1c/% | X9 | 5.03±0.91 | 4.90 | <0.001 |

Note: UA, uric acid; TC, total cholesterol; TG, triglyceride; LDL, low density lipoprotein; HDL, high density lipoprotein; FPG, fasting plasma glucose; HbA1c, glycosylated hemoglobin type A1c.

Table 3. Estimation and testing of partial regression coefficients

| Variable | Partial regression coefficient | s | Standardized partial regression coefficient | t value | P value |
| --- | --- | --- | --- | --- | --- |
| Constant | 221.767 | 9.989 | ― | 22.201 | <0.001 |
| TG/(mmol·L⁻¹) | 7.968 | 1.119 | 0.119 | 7.122 | <0.001 |
| HDL/(mmol·L⁻¹) | -35.051 | 4.318 | -0.128 | -8.118 | <0.001 |
| LDL/(mmol·L⁻¹) | 11.235 | 1.495 | 0.117 | 7.517 | <0.001 |
| DBP/mmHg | 1.295 | 0.119 | 0.193 | 10.904 | <0.001 |
| SBP/mmHg | -0.458 | 0.064 | -0.128 | -7.192 | <0.001 |

Note: 1. TG, triglyceride; HDL, high density lipoprotein; LDL, low density lipoprotein.
2. "―" No data.

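The partial regression coefficients define a fitted equation for the UA level. As a minimal sketch, the function below applies the Table 3 coefficients to one set of inputs; plugging in the sample means from Table 2 is only a sanity check (a least-squares fit with an intercept passes through the variable means), and the name `predict_ua` is illustrative.

```python
# Partial regression coefficients from Table 3 (UA in μmol/L).
INTERCEPT = 221.767
COEF = {"TG": 7.968, "HDL": -35.051, "LDL": 11.235, "DBP": 1.295, "SBP": -0.458}


def predict_ua(values):
    """Fitted UA level for a dict of predictor values keyed like COEF."""
    return INTERCEPT + sum(COEF[name] * values[name] for name in COEF)


# Sample means from Table 2, used here only as a plausibility check.
means = {"TG": 1.70, "HDL": 1.17, "LDL": 2.91, "DBP": 80.82, "SBP": 138.69}
print(round(predict_ua(means), 2))
```

With these inputs the fitted value comes out at about 268.1 μmol/L, close to the observed mean UA of 268.29 μmol/L; the small gap is consistent with rounding in the published coefficients and means.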
Table 4. Partial correlation coefficients between UA and its directly related variables

| Variable | Partial correlation coefficient with UA | P value |
| --- | --- | --- |
| Age/years | 0.026 | 0.079 |
| TG/(mmol·L⁻¹) | 0.083 | <0.001 |
| LDL/(mmol·L⁻¹) | 0.028 | 0.057 |
| HDL/(mmol·L⁻¹) | -0.048 | 0.001 |
| SBP/mmHg | 0.084 | <0.001 |
| DBP/mmHg | 0.147 | <0.001 |

Note: TG, triglyceride; LDL, low density lipoprotein; HDL, high density lipoprotein; UA, uric acid.

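A partial correlation measures the association between two variables after the linear effect of other variables has been removed. The sketch below shows the first-order case (one conditioning variable) and a Fisher z test for significance. Both are textbook formulas offered as an illustration: this section does not state which conditioning sets or which test the IPCB algorithm used, so the code is not expected to reproduce the P values in Table 4, and the function names are hypothetical.

```python
import math


def partial_corr(r_xy, r_xz, r_yz):
    """First-order partial correlation of x and y given a single variable z."""
    return (r_xy - r_xz * r_yz) / math.sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))


def fisher_z_pvalue(r, n, k):
    """Two-sided P value for a partial correlation r computed from n
    observations with k conditioning variables (normal approximation)."""
    z = math.atanh(r) * math.sqrt(n - k - 3)
    return math.erfc(abs(z) / math.sqrt(2))


# Toy correlations, not values from the study.
print(partial_corr(0.5, 0.5, 0.5))
print(fisher_z_pvalue(0.0, 100, 1))
```

In a partial-correlation-based structure search, a pair of variables whose partial correlation is not significantly different from zero is a candidate for having no direct edge.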
Table 5. Comparison between multiple linear regression and the continuous Bayesian network

| Item | Multiple linear regression analysis | Continuous Bayesian network |
| --- | --- | --- |
| Modeling method | A multiple linear regression model built by stepwise regression | A partial-correlation-based structure learning algorithm; the MDL score determines the edges between nodes and their directions |
| Model complexity | Five variables directly related to the UA level were found to be statistically significant; relatively simple | The network of relationships related to the UA level has 24 edges in total; relatively complex |
| Direct related factors | TG, SBP, DBP, LDL and HDL are directly related to the UA level and explain 9.5% of its variation. The coefficient of determination is small, which is still insufficient from the perspective of domain theory | Continuous variables provide more information: age, TG, LDL and HDL directly affect the UA level, while the UA level directly affects SBP and DBP. This is more plausible from the perspective of domain theory |
| Strength of association with directly related factors | The partial regression coefficient of an independent variable reflects the degree of its influence on the dependent variable; the standardized regression coefficients reflect the relative importance of the independent variables in the model | The partial correlation coefficients between child and parent nodes describe the degree and direction of correlation between UA and the six factors directly related to it, distinguishing four influencing factors from two outcome factors |
| Indirect related factors | The multiple linear regression model cannot screen for influencing factors indirectly related to the UA level | TC is indirectly related to UA: age affects TC, TG, LDL and HDL in different ways, and the blood lipid indicators are related to one another in complex ways, which indirectly affects the UA level |

Note: UA, uric acid; TG, triglyceride; LDL, low density lipoprotein; HDL, high density lipoprotein; MDL, minimum description length; TC, total cholesterol.
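The MDL score mentioned in Table 5 trades goodness of fit against model complexity. The sketch below uses a common Gaussian form, a fit term based on the residual sum of squares plus a penalty of half the parameter count times log n; this is an assumption, since the exact scoring formula used in the study is not given in this section. The RSS values are hypothetical and only n = 4 846 comes from the study.

```python
import math


def gaussian_mdl(rss, n, n_params):
    """MDL-style score for one Gaussian node (lower is better).

    rss: residual sum of squares of the node regressed on its parents
    n: number of observations
    n_params: number of free parameters (coefficients plus variance)
    """
    fit = 0.5 * n * math.log(rss / n)
    penalty = 0.5 * n_params * math.log(n)
    return fit + penalty


n = 4846                                  # sample size reported in the abstract
no_parent = gaussian_mdl(1000.0, n, 2)    # hypothetical RSS, mean + variance
useful = gaussian_mdl(990.0, n, 3)        # hypothetical parent that cuts RSS by 1%
useless = gaussian_mdl(999.9, n, 3)       # hypothetical parent that barely helps

print(useful < no_parent)    # the edge pays for its extra parameter
print(useless > no_parent)   # the edge is rejected by the penalty
```

Under a score of this kind, an edge is kept only when the improvement in fit outweighs the cost of the extra parameter, which is how a scoring step can prune or orient the candidate edges proposed by the partial correlation tests.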