The application of the sum of single effects regression model for colocalization analysis in multi-omics data

HUANG Ting¹, LIU Jincheng¹, LI Huilin¹, WU Yiwen¹, YU Er¹, JI Kai¹, TANG Shaowen², ZHAO Yang¹,², DAI Juncheng², YI Honggang³

doi: 10.16462/j.cnki.zhjbkz.2024.01.019

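1. Department of Biostatistics, School of Public Health, Nanjing Medical University, Nanjing 211166, China
2. Department of Epidemiology, School of Public Health, Nanjing Medical University, Nanjing 211166, China
3. Key Laboratory of Biomedical Big Data, Cancer Individualized Medicine Collaborative Innovation Center, Nanjing Medical University, Nanjing 211166, China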

Funds:

National Natural Science Foundation of China 81941020

College Student Innovation and Entrepreneurship Training Program 202210312151

Corresponding author: YI Honggang, E-mail: honggangyi@njmu.edu.cn

CLC number: R181.3
Publication history
- Received: 2022-12-15
- Revised: 2023-03-23
- Published online: 2024-02-05
- Issue date: 2024-01-10


Abstract

Abstract: Objective To explore the application of the sum of single effects (SuSiE) regression model to colocalization analysis of multi-omics data. Methods Using simulated multi-omics data as an example, we introduce the basic principles of the SuSiE regression model and its implementation in R. Results By exploiting the correlation among single nucleotide polymorphisms (SNPs) induced by linkage disequilibrium (LD), the SuSiE regression model correctly identified the colocalized loci associated with the phenotype in two omics datasets, even in the presence of multiple causal variants. Conclusions Compared with traditional colocalization methods, the SuSiE regression model relaxes the single causal variant assumption and is computationally efficient, which helps to detect multiple potential disease-associated loci using multi-omics data.
- Multi-omics /
- Colocalization /
- Approximate Bayes factor /
- Causal variants

Full text (HTML)

Figure 1. Schematic diagram of the posterior probabilities and the five hypotheses in colocalization analysis

A is a ternary plot: the blue area corresponds to a high probability of colocalization (PP₄ > 50%), the orange area corresponds to a high probability that the two phenotypes have different causal variants (PP₃ > 50%), and the gray area corresponds to cases in which colocalization can be neither confirmed nor rejected. B, C, D and E are schematic diagrams of hypotheses H0 to H4.
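The posterior probabilities PP₀ to PP₄ shown in the ternary plot can be illustrated with a small calculation. The following is a minimal Python sketch, not the authors' R analysis: it assumes per-SNP log approximate Bayes factors for two traits are already available and combines them in the way the standard approximate Bayes factor colocalization test is usually described. The function names, the toy log Bayes factors, and the priors `p1`, `p2`, `p12` (set here to commonly used values) are all illustrative assumptions. In the SuSiE extension, as we understand it, the same calculation is applied to each pair of fine-mapped signals rather than once per region.

```python
import numpy as np

def log_sum_exp(x):
    """Numerically stable log(sum(exp(x)))."""
    x = np.asarray(x, dtype=float)
    m = np.max(x)
    return m + np.log(np.sum(np.exp(x - m)))

def coloc_posteriors(lbf1, lbf2, p1=1e-4, p2=1e-4, p12=1e-5):
    """Posterior probabilities of H0..H4 from per-SNP log Bayes factors.

    lbf1, lbf2: log approximate Bayes factors for trait 1 and trait 2
    over the same SNPs. p1, p2, p12 are assumed prior probabilities
    that a SNP is causal for trait 1 only, trait 2 only, or both.
    """
    lbf1 = np.asarray(lbf1, dtype=float)
    lbf2 = np.asarray(lbf2, dtype=float)
    l1 = log_sum_exp(lbf1)           # log sum_i BF1_i
    l2 = log_sum_exp(lbf2)           # log sum_j BF2_j
    l12 = log_sum_exp(lbf1 + lbf2)   # log sum_i BF1_i * BF2_i
    # H3 sums BF1_i * BF2_j over i != j: (sum BF1)(sum BF2) - sum BF1_i BF2_i
    ratio = min(np.exp(l12 - (l1 + l2)), 1.0)
    with np.errstate(divide="ignore"):
        l3 = l1 + l2 + np.log1p(-ratio)
    log_h = np.array([
        0.0,                           # H0: no causal variant
        np.log(p1) + l1,               # H1: trait 1 only
        np.log(p2) + l2,               # H2: trait 2 only
        np.log(p1) + np.log(p2) + l3,  # H3: two distinct causal variants
        np.log(p12) + l12,             # H4: one shared causal variant
    ])
    return np.exp(log_h - log_sum_exp(log_h))

# Toy data (hypothetical): five SNPs, one strong signal per trait.
pp_shared = coloc_posteriors([0, 0, 12, 0, 0], [0, 0, 12, 0, 0])
pp_distinct = coloc_posteriors([12, 0, 0, 0, 0], [0, 0, 0, 0, 12])
print("same SNP:      ", np.round(pp_shared, 4))
print("different SNPs:", np.round(pp_distinct, 4))
```

With these toy inputs, the case in which both traits peak at the same SNP falls in the PP₄ > 50% region of the plot, while the case with peaks at different SNPs falls in the PP₃ > 50% region.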


Figure 2. Schematic diagram of the principle of SuSiE colocalization analysis

Figure 3. Schematic diagram of the data processing and SuSiE colocalization analysis

Figure 4. Sensitivity analysis of the prior probability p₁₂ in SuSiE colocalization analysis

Left: local Manhattan plots for the genomic and transcriptomic data. Right: prior and posterior probabilities of hypotheses H0 to H4 at different values of p₁₂. The green box marks the region where PP₄ > 0.9 and the dashed line marks the current value of p₁₂. Because this value lies inside the green box, the conclusion that PP₄ > 0.9 is stable at the current p₁₂.
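The idea behind this sensitivity analysis can be sketched in a few lines of Python. This is an illustration under stated assumptions rather than the authors' R code: the per-SNP log Bayes factors are made-up toy values, `p1` and `p2` are fixed at an assumed 1e-4, and the helper names are hypothetical. The sketch recomputes PP₄ over a grid of p₁₂ values and reports where PP₄ exceeds the 0.9 threshold used in the figure.

```python
import numpy as np

def log_sum_exp(x):
    """Numerically stable log(sum(exp(x)))."""
    x = np.asarray(x, dtype=float)
    m = np.max(x)
    return m + np.log(np.sum(np.exp(x - m)))

def pp4(lbf1, lbf2, p12, p1=1e-4, p2=1e-4):
    """Posterior probability of a shared causal variant (H4)."""
    lbf1 = np.asarray(lbf1, dtype=float)
    lbf2 = np.asarray(lbf2, dtype=float)
    l1, l2 = log_sum_exp(lbf1), log_sum_exp(lbf2)
    l12 = log_sum_exp(lbf1 + lbf2)
    # Distinct-variant term: all SNP pairs minus the same-SNP pairs.
    ratio = min(np.exp(l12 - (l1 + l2)), 1.0)
    with np.errstate(divide="ignore"):
        l3 = l1 + l2 + np.log1p(-ratio)
    log_h = np.array([
        0.0,
        np.log(p1) + l1,
        np.log(p2) + l2,
        np.log(p1) + np.log(p2) + l3,
        np.log(p12) + l12,
    ])
    return float(np.exp(log_h[4] - log_sum_exp(log_h)))

# Toy log Bayes factors (hypothetical): both traits peak at the third SNP.
lbf_trait1 = [0, 0, 12, 0, 0]
lbf_trait2 = [0, 0, 12, 0, 0]

# Grid of candidate priors for a shared causal variant.
p12_grid = np.logspace(-8, -4, 9)
pp4_grid = np.array([pp4(lbf_trait1, lbf_trait2, p12) for p12 in p12_grid])

# Values of p12 for which the colocalization call (PP4 > 0.9) holds.
stable = p12_grid[pp4_grid > 0.9]
for p12, value in zip(p12_grid, pp4_grid):
    print(f"p12 = {p12:.2e}  PP4 = {value:.4f}")
print(f"PP4 > 0.9 for p12 >= {stable.min():.2e}")
```

If the p₁₂ value used in the main analysis lies well inside the range reported on the last line, the colocalization conclusion does not hinge on that particular prior, which is what the green box in the figure conveys.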

