Regression with Summary Statistics (RSS) Fine-Mapping and TWAS with SuSiE#

Description#

Last, we include an option to conduct fine-mapping with SuSiE Regression using Summary Statistics (RSS) model and TWAS.

Input#

  • --gwas-meta — TSV with columns study_id, path (GWAS sumstats, relative or absolute), and optional column_mapping (per-study YAML). One row per study:

study_id            path                                                           column_mapping
AD_Bellenguez_2022  input/rss_analysis/protocol_example.gwas_sumstats.chr22.tsv.gz  input/rss_analysis/protocol_example.gwas_column_mapping.fixed.yml
  • --ld-meta — LD-reference meta TSV with columns #chr, start, end, path (genotype prefix pointing to PLINK2/PLINK1 files; produced by rss_ld_sketch.ipynb):

#chr   start     end       path
chr22  49355984  50799822  input/ld_reference/chr22_49355984_50799822
  • --regions — one or more regions in chr:start-end format (e.g. chr22:49355984-50799822)

  • --cwd — output root directory

  • --modular-script-dir — path to code/script/ directory

Optional QC parameters (forwarded to summaryStatsQc()):

Parameter

Default

Description

--qc-method

none

QC method: none, slalom, or dentist

--impute

off

Enable RAISS sumstat imputation for missing variants

--qc-args

''

JSON object of extra summaryStatsQc() arguments, e.g. '{"mafCutoff":0.01}'

--min-abs-corr

0.8

Minimum absolute pairwise LD r² for credible set purity filter

--L

20

Maximum number of causal signals SuSiE-RSS searches for

--method-args

''

Nested per-method JSON, e.g. '{"susie":{"L":10}}'

Output#

Two output subdirectories are created under --cwd:

sumstats/ — one GwasSumStats S4 RDS per (study × LD block):

  • {study_id}.{chrom}_{start}_{end}.gwas_sumstats.rds

fine_mapping/ — one GwasFineMappingResult S4 RDS per (study × LD block):

  • {study_id}.{chrom}_{start}_{end}.gwas_finemap.rds

plots/ — PIP Manhattan plots:

  • {study_id}.{chrom}_{start}_{end}.pip_plot.png

Minimal Working Example Steps#

v. Run the Summary Statistics Fine-Mapping#

sos run pipeline/rss_analysis.ipynb \
    generate_manifest+generate_gwas_sumstats+gwas_fine_mapping+gwas_rss_plot \
    --cwd output/rss_analysis --modular-script-dir code/script \
    --gwas-meta input/rss_analysis/protocol_example.rss_mwe.gwas_meta.tsv \
    --regions chr22:49355984-50799822 \
    --ld-meta input/ld_reference/protocol_example.ld_meta_file.tsv

Anticipated Results#

Summary statistics fine-mapping produces a results file for each region and gwas of interest.

GwasFineMappingResult S4 object#

suppressPackageStartupMessages(library(pecotmr))
rds <- readRDS("output/rss_analysis/fine_mapping/AD_Bellenguez_2022.chr22_49355984_50799822.gwas_finemap.rds")
class(rds)    # "GwasFineMappingResult"

ld  <- attr(rds, "listData")
names(ld)     # "study"  "method"  "region_id"  "entry"
ld$study      # "AD_Bellenguez_2022"
ld$method     # "susie"

e1 <- ld$entry[[1]]
names(attributes(e1))  # "variantIds"  "susieFit"  "topLoci"  "cvResult"  "class"

topLoci table (one row per variant in the region):

tl <- attr(e1, "topLoci")
dim(tl)          # 24 × 23
names(tl)
# [1] "variant_id"     "chrom"          "pos"            "A1"             "A2"
# [6] "N"              "af"             "marginal_beta"  "marginal_se"    "marginal_z"
# [11] "marginal_p"     "pip"            "posterior_mean" "posterior_sd"   "cs_95"
# [16] "cs_70"          "cs_50"          "cs_95_purity"   "method"         "gene"
# [21] "event"          "grange_start"   "grange_end"

# Top 3 by PIP
tl_s <- tl[order(tl$pip, decreasing = TRUE), ]
tl_s[1:3, c("variant_id", "af", "marginal_z", "pip", "cs_95", "cs_95_purity")]
#            variant_id     af marginal_z       pip       cs_95 cs_95_purity
# chr22:49363017:G:A   0.317      2.753    0.382   susie_rss_0            0
# chr22:49784794:G:A   0.073     -2.280    0.292   susie_rss_0            0
# chr22:49514095:A:G   0.937      2.053    0.268   susie_rss_0            0

Toy data note: All variants fall in one large credible set (cs_95_purity = 0), indicating insufficient LD contrast to localize the signal. In real large-sample GWAS the top CS typically contains 1–10 variants with purity ≥ 0.5.

GwasSumStats S4 object#

ss  <- readRDS("output/rss_analysis/sumstats/AD_Bellenguez_2022.chr22_49355984_50799822.gwas_sumstats.rds")
class(ss)   # "GwasSumStats"

# main data — GRanges with GWAS columns
ld_ss  <- attr(ss, "listData")
ld_ss$study   # "AD_Bellenguez_2022"
e_ss   <- ld_ss$entry[[1]]   # GRanges
names(mcols(e_ss))   # "SNP"  "A1"  "A2"  "Z"  "N"  "MAF"  "BETA"  "SE"  "P"
length(e_ss)         # 24  (variants passing QC and matched to LD panel)

# QC audit
attr(ss, "qcInfo")$entryAudit[[1]]
# $variantsIn:  7609   (raw input variants)
# $contentFilters$nDropped:  2563
# $betaSeFromZ$nDerived:  5046   (beta/se imputed from z + N)
# $matchedAgainstSketch:  24   (variants matched to LD reference)
# $variantsOut:  24

qcInfo$entryAudit key fields:

Field

Description

variantsIn

Raw variants in the GWAS sumstats for this region

contentFilters$nDropped

Dropped by QC filters (strand-ambiguous, missing data, etc.)

betaSeFromZ$nDerived

Variants where β/SE were imputed from Z-score and N

matchedAgainstSketch

Variants successfully matched to LD reference panel

variantsOut

Final variant count used for fine-mapping