The effect of mistakenly adjusting for instrumental variables on causal effect estimation in the logistic regression model
Abstract:
Objective To explore, through statistical simulation and real data analysis, how adjusting for an instrumental variable (IV) in a logistic regression model affects causal effect estimation when unmeasured confounders are present.
Methods In the simulations, all variables were set to follow binomial distributions and the parameters of the logistic regression model were varied in turn; the bias and standard error of the causal effect estimate served as the evaluation criteria. For the real data analysis, a longitudinal cohort with hypertension as the target outcome was constructed from health check-up follow-up data collected at the health examination centers of several hospitals in Shandong Province (the multi-center health management cohort of Shandong Province). The single nucleotide polymorphism (SNP) rs12149832 was selected as the IV, and logistic regression models with and without rs12149832 as a covariate were used to estimate the effect of body mass index (BMI) on the risk of hypertension.
Results The simulations showed that, when a logistic regression model is used to estimate the effect of an exposure on an outcome, including the IV in the covariate set increases both the bias and the standard error of the effect estimate, although the increases were generally small. In the real data analysis, the hypertension cohort comprised 1,240 women, with a baseline age of (37.7±10.5) years and a BMI of (22.1±3.1) kg/m². The effect estimate from the model that adjusted for the IV (0.225, P < 0.001) was slightly smaller than the estimate from the model without the IV (0.228, P < 0.001), which broadly agrees with the simulation results on adjusting for IVs.
Conclusion In observational epidemiological studies, mistakenly including an IV in a logistic regression model affects both the bias and the standard error of the effect estimate.
Key words:
- Instrumental variable /
- Causal inference /
- Confounding factor /
- Logistic regression model
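The simulation design described in the abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' code: the data-generating model (a binary IV `z`, a binary unmeasured confounder `u`, a binary exposure `x` and a binary outcome `y`), all parameter values, and the function names `fit_logistic` and `simulate` are hypothetical. Bias is measured here against the conditional log-odds ratio `b_x`, which is itself an assumption; because odds ratios are non-collapsible, part of the gap reflects omitting `u` rather than confounding alone.

```python
import math
import random

def expit(t):
    return 1.0 / (1.0 + math.exp(-t))

def solve(a, b):
    """Solve the linear system a x = b by Gauss-Jordan elimination."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(m[r][c]))  # partial pivoting
        m[c], m[p] = m[p], m[c]
        for r in range(n):
            if r != c:
                f = m[r][c] / m[c][c]
                m[r] = [v - f * w for v, w in zip(m[r], m[c])]
    return [m[i][n] / m[i][i] for i in range(n)]

def fit_logistic(rows, y, iters=25):
    """Newton-Raphson logistic fit; returns (coefficients, standard errors)."""
    k = len(rows[0])
    beta = [0.0] * k
    for _ in range(iters):
        grad = [0.0] * k
        info = [[0.0] * k for _ in range(k)]
        for x, yi in zip(rows, y):
            p = expit(sum(b * v for b, v in zip(beta, x)))
            w = p * (1.0 - p)
            for i in range(k):
                grad[i] += (yi - p) * x[i]
                for j in range(k):
                    info[i][j] += w * x[i] * x[j]
        step = solve(info, grad)
        beta = [b + s for b, s in zip(beta, step)]
        if max(abs(s) for s in step) < 1e-8:
            break
    # Standard errors: square roots of the diagonal of the inverse information.
    se = []
    for i in range(k):
        unit = [1.0 if j == i else 0.0 for j in range(k)]
        se.append(math.sqrt(solve(info, unit)[i]))
    return beta, se

def simulate(n=1000, reps=20, b_x=0.5, a_z=1.0, a_u=1.0, b_u=1.0, seed=1):
    """Return {model: (mean bias of the x coefficient, mean standard error)}."""
    rng = random.Random(seed)
    fits = {"unadjusted": [], "iv_adjusted": []}
    for _ in range(reps):
        rows_x, rows_xz, y = [], [], []
        for _ in range(n):
            z = int(rng.random() < 0.5)  # instrument: affects x only
            u = int(rng.random() < 0.5)  # unmeasured confounder
            x = int(rng.random() < expit(-1.0 + a_z * z + a_u * u))
            yi = int(rng.random() < expit(-1.0 + b_x * x + b_u * u))
            rows_x.append([1.0, x])
            rows_xz.append([1.0, x, z])
            y.append(yi)
        fits["unadjusted"].append(fit_logistic(rows_x, y))
        fits["iv_adjusted"].append(fit_logistic(rows_xz, y))
    summary = {}
    for name, results in fits.items():
        est = [beta[1] for beta, _ in results]
        ses = [se[1] for _, se in results]
        summary[name] = (sum(est) / reps - b_x, sum(ses) / reps)
    return summary
```

Calling `simulate()` returns the mean bias and mean standard error of the exposure coefficient for the model without the IV and the model that adjusts for it, so the two strategies can be compared on the same simulated data sets. With so few replications the comparison is noisy; a real study would use far more replications and a grid of parameter values.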
Table 1. Association between the SNP and BMI

| SNP | β | SE | t | P |
|---|---|---|---|---|
| rs12149832 | 0.433 | 0.196 | 2.208 | 0.027 |

Table 2. Estimates of the effect of BMI on hypertension under three strategies

| Model | Method / covariates | Estimate | SE | OR (95% CI) | Z | P |
|---|---|---|---|---|---|---|
| MR | TSLS | 1.066 | 0.433 | 2.904 (1.212-6.656) | 2.462 | 0.013 |
| Logistic model 1 | BMI | 0.228 | 0.029 | 1.256 (1.186-1.331) | 7.764 | <0.001 |
| Logistic model 2 | BMI + rs12149832 | 0.225 | 0.029 | 1.252 (1.183-1.327) | 7.653 | <0.001 |
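The odds ratios in Table 2 are the exponentiated log-odds estimates, and a Wald-type 95% confidence interval can be obtained as exp(estimate ± 1.96 × SE). The sketch below shows that arithmetic for the Logistic model 1 row; the function name `odds_ratio_ci` is hypothetical, and the Wald interval is an assumption about how the published intervals were computed. Because the table reports rounded estimates and standard errors, the recomputed limits can differ from the published ones in the third decimal place.

```python
import math

def odds_ratio_ci(estimate, se, z=1.96):
    """Convert a log-odds estimate and its SE into (OR, lower, upper)."""
    return (
        math.exp(estimate),
        math.exp(estimate - z * se),
        math.exp(estimate + z * se),
    )

# Logistic model 1 in Table 2: estimate 0.228, SE 0.029
or_, lower, upper = odds_ratio_ci(0.228, 0.029)
print(f"OR = {or_:.3f}, 95% CI = ({lower:.3f}, {upper:.3f})")
```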