Regression with Summary Statistics (RSS) Fine-Mapping and TWAS with SuSiE#
Description#
Last, we include an option to conduct fine-mapping with SuSiE Regression using Summary Statistics (RSS) model and TWAS.
Input#
--ld-meta-data
: file with chrom, start, end and path columns. For example:
#chrom start end path
chr1 101384274 104443097 chr1/chr1_101384274_104443097.cor.xz,chr1/chr1_101384274_104443097.cor.xz.bim
chr1 104443097 106225286 chr1/chr1_104443097_106225286.cor.xz,chr1/chr1_104443097_106225286.cor.xz.bim
chr1 106225286 109761915 chr1/chr1_106225286_109761915.cor.xz,chr1/chr1_106225286_109761915.cor.xz.bim
chr1 109761915 111483530 chr1/chr1_109761915_111483530.cor.xz,chr1/chr1_109761915_111483530.cor.xz.bim
--gwas-meta-data
: file with information on GWAS. For example:
study_id chrom file_path column_mapping_file n_sample n_case n_control
AD_Bellenguez_2022 0 /data/GWAS/AD_GWAS/GCST90027158_buildGRCh38.tsv.gz /data/GWAS/column_mapping_file/Bellenguez.yml 0 111326 677663
AD_Jansen_2021 0 /data/GWAS/AD_GWAS/AD_sumstats_Jansenetal_2019sept.hg38.txt /data/GWAS/column_mapping_file/Jansen.yml 0 71880 383378
AD_Kunkle_Stage1_2019 0 /data/GWAS/AD_GWAS//Kunkle_etal_Stage1_results.txt_file_1_hg38.txt /data/GWAS/column_mapping_file/Kunkle_stage_1.yml 0 21982 41944
AD_Wightman_Full_2021 0 /data/GWAS/AD_GWAS/PGCALZ2full.hg38.txt /data/GWAS/column_mapping_file/AD_Wightman_Full_2021.yml0 90338 1036225
--qc_method
: set to rss_qc, dentist, or slalom.
--finemapping_method
: set to single_effect, susie_rss, or bayesian_conditional_regression.
--cwd
: output path
--skip_analysis_pip_cutoff
: defaults to 0.025
--skip_regions
: format as chr:start-end. For example: 6:25000000-35000000
--region_name
: format as chr:start-end. For example: 22:49355984-50799822
Output#
*.sumstats.tsv.gz
> less RSS_QC_RAISS_imputed.chr11_84267999_86714492.AD_Bellenguez_2022.sumstats.tsv.gz | head
chrom pos variant_id A1 A2 Var.x raiss_ld_score.x raiss_R2.x Var.y raiss_ld_score.y raiss_R2.y pvalue effect_allele_frequency odds_ratio ci_lower ci_upper n_case n_control het_isq het_pvalue beta se variants_id_original Var raiss_ld_score z
11 84267999 11:84267999:T:C C T 0.131232630771865 35.033674731767 0.868767369228135 0.462675463930821
11 84268181 11:84268181:CTG:C C CTG -1 Inf 0.6264 0.9965 1.039 0.891 1.211 83196 386481 0 0.4727 11:84268181:CTG:C -1 Inf -0.486555697823303
11 84268207 11:84268207:T:C C T -1 Inf 0.6347 0.9641 0.99 0.948 1.033 85934 401577 52.6 0.0164 11:84268207:T:C -1 Inf 0.475113122171946
11 84268259 11:84268259:C:T T C -1 Inf 0.5809 6e-04 0.875 0.544 1.407 69576 360279 69.8 0.06861 11:84268259:C:T -1 Inf -0.551980198019802
11 84268265 11:84268265:C:A A C -1 Inf 0.1207 0.035 0.966 0.924 1.009 85934 401577 0 0.6535 11:84268265:C:A -1 Inf -1.54910714285714
11 84268309 11:84268309:T:TC TC T -1 Inf 0.425 0.9995 0.833 0.532 1.305 69576 360279 0 0.5788 11:84268309:T:TC -1 Inf 0.79763986013986
11 84268394 11:84268394:A:T T A -1 Inf 0.8437 0.3875 1.002 0.985 1.018 85934 401577 2.1 0.4233 11:84268394:A:T -1 Inf -0.202380952380952
11 84268618 11:84268618:C:T T C -1 Inf 0.629 0.0104 1.02 0.941 1.106 85502 400497 0 0.497 11:84268618:C:T -1 Inf 0.483091787439614
11 84268622 11:84268622:C:T T C -1 Inf 0.6138 0.0356 1.011 0.968 1.056 85934 401577 51.5 0.01969 11:84268622:C:T -1 Inf 0.504504504504505
*.univariate_susie_rss.rds
> str(readRDS("/restricted/projectnb/xqtl/xqtl_protocol/toy_xqtl_protocol/output/rss_analysis/univariate_rss/RSS_QC_RAISS_imputed.chr11_84267999_86714492.univariate_susie_rss.rds"))
List of 1
$ chr11_84267999_86714492:List of 1
..$ AD_Bellenguez_2022:List of 2
.. ..$ susie_rss_RSS_QC_RAISS_imputed:List of 6
.. .. ..$ variant_names : chr [1:10779] "11:84267999:T:C" "11:84268181:CTG:C" "11:84268207:T:C" "11:84268259:C:T" ...
.. .. ..$ analysis_script : chr "library(pecotmr)\nlibrary(dplyr)\nlibrary(data.table)\nskip_region = c(\"6:25000000-35000000\")\nstudies = c(\""| __truncated__
.. .. ..$ sumstats :List of 1
.. .. .. ..$ z: num [1:10779] 0.463 -0.487 0.475 -0.552 -1.549 ...
.. .. ..$ top_loci :'data.frame': 10 obs. of 6 variables:
.. .. .. ..$ variant_id : chr [1:10] "11:86139201:C:T" "11:86141937:G:A" "11:86144030:ATCT:A" "11:86144034:AC:A" ...
.. .. .. ..$ z : num [1:10] 12 12 11.6 11.6 11.6 ...
.. .. .. ..$ pip : num [1:10] 0.2735 0.356 0.0339 0.0336 0.0339 ...
.. .. .. ..$ cs_coverage_0.95: num [1:10] 2 2 2 0 2 2 2 2 1 1
.. .. .. ..$ cs_coverage_0.7 : num [1:10] 2 2 0 0 0 0 0 2 1 1
.. .. .. ..$ cs_coverage_0.5 : num [1:10] 2 2 0 0 0 0 0 0 1 0
.. .. ..$ susie_result_trimmed:List of 9
.. .. .. ..$ pip : num [1:10779] 1.63e-10 1.72e-10 1.64e-10 1.65e-10 2.45e-10 ...
.. .. .. ..$ sets :List of 5
.. .. .. .. ..$ cs :List of 2
.. .. .. .. .. ..$ L1: int [1:2] 7797 7798
.. .. .. .. .. ..$ L2: int [1:7] 7725 7735 7742 7744 7754 7760 7770
.. .. .. .. ..$ purity :'data.frame': 2 obs. of 3 variables:
.. .. .. .. .. ..$ min.abs.corr : num [1:2] 0.994 0.954
.. .. .. .. .. ..$ mean.abs.corr : num [1:2] 0.994 0.982
.. .. .. .. .. ..$ median.abs.corr: num [1:2] 0.994 0.99
.. .. .. .. ..$ cs_index : int [1:2] 1 2
.. .. .. .. ..$ coverage : num [1:2] 0.992 0.958
.. .. .. .. ..$ requested_coverage: num 0.95
.. .. .. ..$ cs_corr : num [1:2, 1:2] 1 0.64 0.64 1
.. .. .. .. ..- attr(*, "dimnames")=List of 2
.. .. .. .. .. ..$ : chr [1:2] "L1" "L2"
.. .. .. .. .. ..$ : chr [1:2] "L1" "L2"
.. .. .. ..$ sets_secondary:List of 2
.. .. .. .. ..$ coverage_0.7:List of 2
.. .. .. .. .. ..$ sets :List of 5
.. .. .. .. .. .. ..$ cs :List of 2
.. .. .. .. .. .. .. ..$ L1: int [1:2] 7797 7798
.. .. .. .. .. .. .. ..$ L2: int [1:3] 7725 7735 7770
.. .. .. .. .. .. ..$ purity :'data.frame': 2 obs. of 3 variables:
.. .. .. .. .. .. .. ..$ min.abs.corr : num [1:2] 0.994 0.954
.. .. .. .. .. .. .. ..$ mean.abs.corr : num [1:2] 0.994 0.968
.. .. .. .. .. .. .. ..$ median.abs.corr: num [1:2] 0.994 0.96
.. .. .. .. .. .. ..$ cs_index : int [1:2] 1 2
.. .. .. .. .. .. ..$ coverage : num [1:2] 0.992 0.73
.. .. .. .. .. .. ..$ requested_coverage: num 0.7
.. .. .. .. .. ..$ cs_corr: num [1:2, 1:2] 1 0.64 0.64 1
.. .. .. .. .. .. ..- attr(*, "dimnames")=List of 2
.. .. .. .. .. .. .. ..$ : chr [1:2] "L1" "L2"
.. .. .. .. .. .. .. ..$ : chr [1:2] "L1" "L2"
.. .. .. .. ..$ coverage_0.5:List of 2
.. .. .. .. .. ..$ sets :List of 5
.. .. .. .. .. .. ..$ cs :List of 2
.. .. .. .. .. .. .. ..$ L1: int 7797
.. .. .. .. .. .. .. ..$ L2: int [1:2] 7725 7735
.. .. .. .. .. .. ..$ purity :'data.frame': 2 obs. of 3 variables:
.. .. .. .. .. .. .. ..$ min.abs.corr : num [1:2] 1 0.96
.. .. .. .. .. .. .. ..$ mean.abs.corr : num [1:2] 1 0.96
.. .. .. .. .. .. .. ..$ median.abs.corr: num [1:2] 1 0.96
.. .. .. .. .. .. ..$ cs_index : int [1:2] 1 2
.. .. .. .. .. .. ..$ coverage : num [1:2] 0.558 0.629
.. .. .. .. .. .. ..$ requested_coverage: num 0.5
.. .. .. .. .. ..$ cs_corr: num [1:2, 1:2] 1 0.64 0.64 1
.. .. .. .. .. .. ..- attr(*, "dimnames")=List of 2
.. .. .. .. .. .. .. ..$ : chr [1:2] "L1" "L2"
.. .. .. .. .. .. .. ..$ : chr [1:2] "L1" "L2"
.. .. .. ..$ alpha : num [1:2, 1:10779] 2.29e-16 1.63e-10 2.09e-16 1.72e-10 2.07e-16 ...
.. .. .. .. ..- attr(*, "dimnames")=List of 2
.. .. .. .. .. ..$ : NULL
.. .. .. .. .. ..$ : chr [1:10779] "11:84267999:T:C" "11:84268181:CTG:C" "11:84268207:T:C" "11:84268259:C:T" ...
.. .. .. ..$ lbf_variable : num [1:2, 1:10779] -1.97 -1.77 -2.06 -1.72 -2.07 ...
.. .. .. .. ..- attr(*, "dimnames")=List of 2
.. .. .. .. .. ..$ : NULL
.. .. .. .. .. ..$ : chr [1:10779] "11:84267999:T:C" "11:84268181:CTG:C" "11:84268207:T:C" "11:84268259:C:T" ...
.. .. .. ..$ V : num [1:2] 8.98e-05 5.29e-05
.. .. .. ..$ niter : int 13
.. .. .. ..$ max_L : int 5
.. .. .. ..- attr(*, "class")= chr "susie"
.. .. ..$ outlier_number : int 0
.. ..$ rss_data_analyzed :'data.frame': 10779 obs. of 26 variables:
.. .. ..$ chrom : chr [1:10779] "11" "11" "11" "11" ...
.. .. ..$ pos : int [1:10779] 84267999 84268181 84268207 84268259 84268265 84268309 84268394 84268618 84268622 84268650 ...
.. .. ..$ variant_id : chr [1:10779] "11:84267999:T:C" "11:84268181:CTG:C" "11:84268207:T:C" "11:84268259:C:T" ...
.. .. ..$ A1 : chr [1:10779] "C" "C" "C" "T" ...
.. .. ..$ A2 : chr [1:10779] "T" "CTG" "T" "C" ...
.. .. ..$ Var.x : num [1:10779] 0.131 NA NA NA NA ...
.. .. ..$ raiss_ld_score.x : num [1:10779] 35 NA NA NA NA ...
.. .. ..$ raiss_R2.x : num [1:10779] 0.869 NA NA NA NA ...
.. .. ..$ Var.y : num [1:10779] NA -1 -1 -1 -1 -1 -1 -1 -1 -1 ...
.. .. ..$ raiss_ld_score.y : num [1:10779] NA Inf Inf Inf Inf ...
.. .. ..$ raiss_R2.y : num [1:10779] NA NA NA NA NA NA NA NA NA NA ...
.. .. ..$ pvalue : num [1:10779] NA 0.626 0.635 0.581 0.121 ...
.. .. ..$ effect_allele_frequency: num [1:10779] NA 0.9965 0.9641 0.0006 0.035 ...
.. .. ..$ odds_ratio : num [1:10779] NA 1.039 0.99 0.875 0.966 ...
.. .. ..$ ci_lower : num [1:10779] NA 0.891 0.948 0.544 0.924 0.532 0.985 0.941 0.968 0.982 ...
.. .. ..$ ci_upper : num [1:10779] NA 1.21 1.03 1.41 1.01 ...
.. .. ..$ n_case : num [1:10779] NA 83196 85934 69576 85934 ...
.. .. ..$ n_control : num [1:10779] NA 386481 401577 360279 401577 ...
.. .. ..$ het_isq : num [1:10779] NA 0 52.6 69.8 0 0 2.1 0 51.5 0 ...
.. .. ..$ het_pvalue : num [1:10779] NA 0.4727 0.0164 0.0686 0.6535 ...
.. .. ..$ beta : num [1:10779] NA NA NA NA NA NA NA NA NA NA ...
.. .. ..$ se : num [1:10779] NA NA NA NA NA NA NA NA NA NA ...
.. .. ..$ variants_id_original : chr [1:10779] NA "11:84268181:CTG:C" "11:84268207:T:C" "11:84268259:C:T" ...
.. .. ..$ Var : num [1:10779] NA -1 -1 -1 -1 -1 -1 -1 -1 -1 ...
.. .. ..$ raiss_ld_score : num [1:10779] NA Inf Inf Inf Inf ...
.. .. ..$ z : num [1:10779] 0.463 -0.487 0.475 -0.552 -1.549 ..
Minimal Working Example Steps#
v. Run the Summary Statistics Fine-Mapping#
sos run pipeline/rss_analysis.ipynb univariate_rss \
--ld-meta-data data/ld_meta_file_with_bim.tsv \
--gwas-meta-data data/mnm_regression/gwas_meta_data.txt \
--qc_method "rss_qc" --impute \
--finemapping_method "susie_rss" \
--cwd output/rss_analysis \
--skip_analysis_pip_cutoff 0 \
--skip_regions 6:25000000-35000000 \
--region_name 22:49355984-50799822
Anticipated Results#
Summary statistics fine-mapping produces a results file for each region and gwas of interest.
RSS_QC_RAISS_imputed.chr22_49355984_50799822.univariate_susie_rss.rds
:
For each region in
region_name
and gwas in thegwas-meta-data
file:
RSS_QC_RAISS_imputed
:
a.variant_names
b.analysis_script
c.sumstats
d.susie_result_trimmed
e.outlier_number
a. chrom
b. pos
c. variant_id
d. A1
e. A2
f. var
g. raiss_ld_score
h. raiss_R2
i. pvalue
j. effect_allele_frequency
k. odds_ratio
l. ci_lower
m. ci_upper
n. beta
o. se
p. n_case
q. n_control
r. het_isq
s. het_pvalue
t. variant_alternate_id
u. z
A file for each gwas in gwas-meta-data
like:
RSS_QC_RAISS_imputed.chr22_49355984_50799822.AD_Bellenguez_EADB_2022.sumstats.tsv.gz
.
The contents of these are included in the .rds file above