Univariate Fine-Mapping of Functional (Epigenomic) Data with fSuSiE

Description

Univariate fine-mapping for functional (epigenomic) data is conducted with fSuSiE. The workflow is similar to standard univariate fine-mapping; the main difference is the use of epigenomic data.

Input

--genoFile: path to a text file containing information on the genotype files. For example:

#id     #path
21      $PATH/protocol_example.genotype.chr21_22.21.bed
22      $PATH/protocol_example.genotype.chr21_22.22.bed
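
A list like the one above can be generated with a short loop. This is only a minimal sketch: it assumes the two columns are tab-separated, and the directory `/path/to/genotypes` and the output name `genotype_list.txt` are hypothetical placeholders rather than names used by the protocol.

```shell
# Hypothetical locations; adjust to your own data layout.
geno_dir="/path/to/genotypes"
out="genotype_list.txt"

# Header line, then one tab-separated row per chromosome.
printf '#id\t#path\n' > "$out"
for chr in 21 22; do
    printf '%s\t%s\n' "$chr" "$geno_dir/protocol_example.genotype.chr21_22.$chr.bed" >> "$out"
done

# Print the finished list for a visual check.
cat "$out"
```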

--phenoFile: path to a tab-delimited file containing the chr, start, end, ID and path of each region. For example:

#chr    start   end     ID      path
chr21   0       14120807        TADB_1297       $PATH/protocol_example.ha.bed.gz
chr21   10840000        16880069        TADB_1298       $PATH/protocol_example.ha.bed.gz
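
Before running the pipeline, it may help to check that every row of the region list has the five expected columns and that start is smaller than end. The sketch below recreates the two example rows in a hypothetical file, `pheno_region_list.tsv`, and counts malformed rows with awk; the check itself is an illustration, not part of the protocol.

```shell
# Recreate the example region list in a hypothetical file.
pheno="pheno_region_list.tsv"
printf '#chr\tstart\tend\tID\tpath\n' > "$pheno"
printf '%s\t%s\t%s\t%s\t%s\n' chr21 0 14120807 TADB_1297 '$PATH/protocol_example.ha.bed.gz' >> "$pheno"
printf '%s\t%s\t%s\t%s\t%s\n' chr21 10840000 16880069 TADB_1298 '$PATH/protocol_example.ha.bed.gz' >> "$pheno"

# Skip the header; flag rows without exactly 5 fields or with start >= end.
bad=$(awk -F'\t' 'NR > 1 && (NF != 5 || $2 + 0 >= $3 + 0) { n++ } END { print n + 0 }' "$pheno")
echo "malformed rows: $bad"
```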

--covFile: path to a gzipped file containing covariates in the rows and sample IDs in the columns.
--customized-association-windows: path to a tab-delimited file containing the chr, start, end and ID of each region. For example:

#chr    start   end     ID
chr21   0       14120807        TADB_1297
chr21   10840000        16880069        TADB_1298

--region-name: to analyze a single region only, pass the ID of a region listed in the customized-association-windows file.
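
Because the region ID must match an entry in the windows file, a quick lookup can catch typos before a long run. The following sketch assumes a tab-delimited windows file; the file name `association_windows.tsv` is a hypothetical placeholder filled with the two example rows.

```shell
# Recreate the example windows file under a hypothetical name.
windows="association_windows.tsv"
printf '#chr\tstart\tend\tID\n' > "$windows"
printf '%s\t%s\t%s\t%s\n' chr21 0 14120807 TADB_1297 >> "$windows"
printf '%s\t%s\t%s\t%s\n' chr21 10840000 16880069 TADB_1298 >> "$windows"

# Look up the ID intended for --region-name and print its coordinates.
region="TADB_1298"
match=$(awk -F'\t' -v id="$region" 'NR > 1 && $4 == id { print $1 ":" $2 "-" $3 }' "$windows")
if [ -n "$match" ]; then
    echo "$region -> $match"
else
    echo "$region not found in $windows" >&2
fi
```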

Output

  • *_marks.dataset.rds

> str(readRDS("/restricted/projectnb/xqtl/xqtl_protocol/toy_xqtl_protocol/output/fsusie/fsus/Mic.chr7_139293693_145380632.16_marks.dataset.rds"))
List of 1
 $ chr7:139293693-145380632:List of 13
  ..$ residual_Y       :List of 1
  .. ..$ ROSMAP_Mic_snATACQTL: num [1:65, 1:166] -0.0444 -0.3137 -0.0634 0.0658 0.8817 ...
  .. .. ..- attr(*, "dimnames")=List of 2
  .. .. .. ..$ : chr [1:65] "sample_10" "sample_11" "sample_12" "sample_13" ...
  .. .. .. ..$ : NULL
  ..$ residual_X       :List of 1
  .. ..$ : num [1:65, 1:15857] -0.4025 -0.8767 -0.2054 -0.0908 0.6848 ...
  .. .. ..- attr(*, "dimnames")=List of 2
  .. .. .. ..$ : chr [1:65] "sample_10" "sample_11" "sample_12" "sample_13" ...
  .. .. .. ..$ : chr [1:15857] "chr7:139302775:AACACACACAC:AACACACACACAC" "chr7:139302775:AACACACACAC:AACACACACACACAC" "chr7:139304706:G:GGT" "chr7:139305695:G:T" ...
  ..$ residual_Y_scalar: num 1
  ..$ residual_X_scalar: num 1
  ..$ covar            :List of 1
  .. ..$ : num [1:65, 1:48] 1 1 1 0 0 0 0 0 1 0 ...
  .. .. ..- attr(*, "dimnames")=List of 2
  .. .. .. ..$ : chr [1:65] "sample_10" "sample_11" "sample_12" "sample_13" ...
  .. .. .. ..$ : NULL
  ..$ Y                :List of 1
  .. ..$ : num [1:65, 1:166] 1.99 2.19 2.02 2.48 3.42 ...
  .. .. ..- attr(*, "dimnames")=List of 2
  .. .. .. ..$ : chr [1:65] "sample_10" "sample_11" "sample_12" "sample_13" ...
  .. .. .. ..$ : NULL
  ..$ X_data           :List of 1
  .. ..$ : num [1:65, 1:15857] 0 0 1 1 1 0 2 0 1 1 ...
  .. .. ..- attr(*, "dimnames")=List of 2
  .. .. .. ..$ : chr [1:65] "sample_10" "sample_11" "sample_12" "sample_13" ...
  .. .. .. ..$ : chr [1:15857] "chr7:139302775:AACACACACAC:AACACACACACAC" "chr7:139302775:AACACACACAC:AACACACACACACAC" "chr7:139304706:G:GGT" "chr7:139305695:G:T" ...
  ..$ maf              :List of 1
  .. ..$ : Named num [1:15857] 0.2692 0.1538 0.0692 0.0923 0.4615 ...
  .. .. ..- attr(*, "names")= chr [1:15857] "chr7:139302775:AACACACACAC:AACACACACACAC" "chr7:139302775:AACACACACAC:AACACACACACACAC" "chr7:139304706:G:GGT" "chr7:139305695:G:T" ...
  ..$ grange           : chr [1:2] "139293693" "145380632"
  ..$ Y_coordinates    :List of 1
  .. ..$ :Classes ‘tbl_df’, ‘tbl’ and 'data.frame':	166 obs. of  3 variables:
  .. .. ..$ #chr : chr [1:166] "chr7" "chr7" "chr7" "chr7" ...
  .. .. ..$ start: num [1:166] 1.39e+08 1.39e+08 1.39e+08 1.39e+08 1.39e+08 ...
  .. .. ..$ end  : num [1:166] 1.39e+08 1.39e+08 1.39e+08 1.39e+08 1.39e+08 ...
  ..$ dropped_sample   :List of 3
  .. ..$ X    :List of 1
  .. .. ..$ : chr [1:3] "sample_71" "sample_6" "sample_46"
  .. ..$ Y    :List of 1
  .. .. ..$ : chr [1:5] "sample_1" "sample_6" "sample_7" "sample_46" ...
  .. ..$ covar:List of 1
  .. .. ..$ : chr [1:3] "sample_1" "sample_47" "sample_7"
  ..$ X                : num [1:65, 1:15857] 0 0 1 1 1 0 2 0 1 1 ...
  .. ..- attr(*, "dimnames")=List of 2
  .. .. ..$ : chr [1:65] "sample_10" "sample_11" "sample_12" "sample_13" ...
  .. .. ..$ : chr [1:15857] "chr7:139302775:AACACACACAC:AACACACACACAC" "chr7:139302775:AACACACACAC:AACACACACACACAC" "chr7:139304706:G:GGT" "chr7:139305695:G:T" ...
  ..$ chrom            : chr "chr7"
  • *_top_pc_weights.rds

> str(readRDS("/restricted/projectnb/xqtl/xqtl_protocol/toy_xqtl_protocol/output/fsusie/fsus/Mic.chr7_139293693_145380632.fsusie_mixture_normal_none__top_pc_weights.rds"),max.level = 3)
List of 1
 $ chr7:139293693-145380632:List of 1
  ..$ ROSMAP_Mic_snATACQTL:List of 10
  .. ..$ susie_on_top_pc           :List of 1
  .. ..$ susie_weights_intermediate:List of 6
  .. ..$ twas_weights              :List of 6
  .. ..$ twas_predictions          :List of 6
  .. ..$ twas_cv_result            :List of 4
  .. ..$ total_time_elapsed        : 'proc_time' Named num [1:5] 31992.49 40.47 32206.65 304.39 4.65
  .. .. ..- attr(*, "names")= chr [1:5] "user.self" "sys.self" "elapsed" "user.child" ...
  .. ..$ fsusie_result             :List of 34
  .. .. ..- attr(*, "class")= chr "susiF"
  .. ..$ Y_coordinates             :Classes ‘tbl_df’, ‘tbl’ and 'data.frame':	166 obs. of  3 variables:
  .. ..$ fsusie_summary            :List of 5
  .. ..$ region_info               :List of 3
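
Judging from the two example file names above, output files appear to be named after the `--name` value followed by the region coordinates with `:` and `-` replaced by `_`. That naming pattern is inferred from the examples rather than documented, so treat the sketch below as an assumption; it builds the expected prefix and searches the output directory for matching files.

```shell
# Values taken from the example output above.
name="Mic"
region="chr7:139293693-145380632"

# Assumed pattern: <name>.<chr>_<start>_<end>
prefix="$name.$(printf '%s' "$region" | sed 's/[:-]/_/g')"
echo "$prefix"

# List matching .rds files; prints nothing if the directory or files are absent.
find output/fsusie -name "$prefix.*.rds" 2>/dev/null || true
```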

Minimal Working Example Steps

iii. Run the Fine-Mapping with fSuSiE

sos run pipeline/mnm_regression.ipynb fsusie \
    --cwd output/fsusie/ \
    --name Mic \
    --genoFile data/fsusie/mwe.genotype_by_chrom_files.txt \
    --phenoFile data/fsusie/mwe.pheno.region_list \
    --covFile data/fsusie/mwe.chr7_139293693_145380632.Marchenko_PC.anon.gz \
    --cis-window 0 --max-cv-variants 5000 \
    --susie_top_pc 0 --phenotype-names ROSMAP_Mic_snATACQTL --maf 0.01 \
    --save-data \
    --numThreads 8 \
    --post_processing "none" --small-sample-correction
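
Since the run can take a long time, it may be worth confirming that the three input files named in the command above exist and are non-empty before launching it. The helper `count_missing` below is a hypothetical convenience function, not part of the pipeline.

```shell
# Print the number of given paths that are missing or empty.
count_missing() {
    n=0
    for f in "$@"; do
        if [ ! -s "$f" ]; then
            echo "missing or empty: $f" >&2
            n=$((n + 1))
        fi
    done
    echo "$n"
}

# Check the inputs used by the minimal working example.
missing=$(count_missing \
    data/fsusie/mwe.genotype_by_chrom_files.txt \
    data/fsusie/mwe.pheno.region_list \
    data/fsusie/mwe.chr7_139293693_145380632.Marchenko_PC.anon.gz)
echo "inputs missing: $missing"
```

A result of 0 means all three inputs were found; any other value is accompanied by the offending paths on standard error.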

Anticipated Results

Univariate fine-mapping for functional data will produce a file containing the results for the top hits and a file containing the residuals from SuSiE.

Mic.chr7_139293693_145380632.fsusie_mixture_normal_none__top_pc_weights.rds:

  • For each region of interest, this file contains:

    1. susie_on_top_pc

    2. twas_weights - weights for each variant (for the enet, lasso and mrash methods).

    3. twas_predictions - predictions for each sample (for the enet, lasso and mrash methods).

    4. twas_cv_result - cross-validation results, including information on the best method. The data are split into five parts for cross-validation.

    5. fsusie_result - the fSuSiE results.

    6. Y_coordinates

    7. fsusie_summary

    8. total_time_elapsed

    9. region_info - information on the region specified.

Mic.chr7_139293693_145380632.16_marks.dataset.rds:

  • For each region of interest, this file contains the residuals for each sample and phenotype.

  • See the pecotmr code for a description of this file. fSuSiE uses the load_regional_functional_data function; an explanation of its arguments can be found under the similar load_regional_association_data function.