Univariate Fine-Mapping of Functional (Epigenomic) Data with fSuSiE
Description
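Univariate fine-mapping of functional (epigenomic) data is conducted with fSuSiE. It follows the same workflow as standard univariate fine-mapping. The main difference is that the phenotype is epigenomic data measured at many positions across each region, so each sample contributes a profile of values rather than a single value (166 positions in the example region below).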
Input
--genoFile
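: path to a text file listing the genotype files, one per chromosome, with the chromosome ID in the first column and the path to that chromosome's PLINK .bed file in the second. For example: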
#id #path
21 $PATH/protocol_example.genotype.chr21_22.21.bed
22 $PATH/protocol_example.genotype.chr21_22.22.bed
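A minimal sketch of writing this two-column manifest from the shell, assuming per-chromosome files that follow the naming pattern in the example. The directory `data/genotype` and the output file name are hypothetical placeholders, and separating the columns with a tab is an assumption, since the example does not state which separator is required.

```shell
# Sketch: build a genoFile manifest (chromosome ID + path to its .bed file).
# geno_dir, the manifest name, and the tab separator are assumptions.
geno_dir="data/genotype"
manifest="genotype_by_chrom_files.txt"

# Header row matching the example's "#id #path" columns
printf '#id\t#path\n' > "$manifest"

# One row per chromosome
for chrom in 21 22; do
  printf '%s\t%s\n' "$chrom" "$geno_dir/protocol_example.genotype.chr21_22.$chrom.bed" >> "$manifest"
done

cat "$manifest"
```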
--phenoFile
: a tab-delimited file with one row per region, giving chr, start, end, ID, and the path to the phenotype file for that region. For example:
#chr start end ID path
chr21 0 14120807 TADB_1297 $PATH/protocol_example.ha.bed.gz
chr21 10840000 16880069 TADB_1298 $PATH/protocol_example.ha.bed.gz
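A quick sanity check before running the pipeline can catch malformed rows. The sketch below uses awk to confirm that every non-header row has five tab-separated fields and that start is less than end. The function name `check_pheno` and the file name are hypothetical, and these checks are a suggestion rather than rules documented for the pipeline.

```shell
# Sketch: sanity-check a phenoFile (chr, start, end, ID, path; tab-separated).
# check_pheno is a hypothetical helper; it exits nonzero if any row looks wrong.
check_pheno() {
  awk -F'\t' '
    /^#/ { next }                                   # skip the header/comment lines
    NF != 5 { printf "line %d: expected 5 fields, found %d\n", NR, NF; bad = 1; next }
    $2 + 0 >= $3 + 0 { printf "line %d: start (%s) is not less than end (%s)\n", NR, $2, $3; bad = 1 }
    END { if (!bad) print "phenoFile looks well-formed"; exit (bad ? 1 : 0) }
  ' "$1"
}

# Small example file in the format shown above
pheno="pheno.region_list"
printf '#chr\tstart\tend\tID\tpath\n' > "$pheno"
printf 'chr21\t0\t14120807\tTADB_1297\tprotocol_example.ha.bed.gz\n' >> "$pheno"
printf 'chr21\t10840000\t16880069\tTADB_1298\tprotocol_example.ha.bed.gz\n' >> "$pheno"

check_pheno "$pheno"
```

The check only inspects the list itself; it does not open the phenotype files named in the path column.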
--covFile
: path to a gzipped file with covariates in rows and sample IDs in columns.
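To check which samples a covariate file covers, you can read its header row. This sketch assumes the first column of each row holds the covariate name and the remaining header fields are tab-separated sample IDs; that layout is inferred from the description above, and the file name, covariate names, and helper functions are hypothetical.

```shell
# Sketch: inspect a gzipped covariate file.
# Assumed layout: header "#id<TAB>sample_1<TAB>sample_2...", then one covariate per row.
cov_samples() { gzip -dc "$1" | head -n 1 | cut -f2- | tr '\t' '\n'; }  # sample IDs, one per line
cov_count()   { gzip -dc "$1" | tail -n +2 | wc -l; }                   # number of covariate rows

# Build a tiny example file (values and covariate names are made up)
printf '#id\tsample_10\tsample_11\tsample_12\n' > example.cov
printf 'PC1\t0.12\t-0.40\t0.05\n' >> example.cov
printf 'sex\t1\t0\t1\n' >> example.cov
gzip -f example.cov

cov_samples example.cov.gz
cov_count example.cov.gz
```

Comparing this sample list with the genotype and phenotype samples can explain why some samples are dropped during loading.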
--customized-association-windows
: a tab-delimited file defining the association windows, with columns chr, start, end, and ID. For example:
#chr start end ID
chr21 0 14120807 TADB_1297
chr21 10840000 16880069 TADB_1298
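In the examples above, the windows file carries the same first four columns as the phenoFile. If you want exactly one association window per phenotype region (an assumption; windows can also be defined independently), a sketch that derives the file with `cut`, using hypothetical file names:

```shell
# Sketch: derive a customized-association-windows file from a phenoFile
# by keeping only the chr, start, end, and ID columns (fields 1-4).
printf '#chr\tstart\tend\tID\tpath\n' > pheno.region_list
printf 'chr21\t0\t14120807\tTADB_1297\tprotocol_example.ha.bed.gz\n' >> pheno.region_list
printf 'chr21\t10840000\t16880069\tTADB_1298\tprotocol_example.ha.bed.gz\n' >> pheno.region_list

# cut splits on tabs by default
cut -f1-4 pheno.region_list > association_windows.txt
cat association_windows.txt
```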
--region-name
: (optional) to analyze only one region, give the ID of a region listed in the --customized-association-windows file.
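To see which IDs are valid values for --region-name, list the ID column of the windows file. A sketch assuming the tab-separated layout shown above; the helpers `list_regions` and `has_region` and the file name are hypothetical.

```shell
# Sketch: list region IDs and confirm a chosen ID before passing it to --region-name.
list_regions() { awk -F'\t' '!/^#/ { print $4 }' "$1"; }
has_region() {
  # exit 0 if the ID in $1 appears in column 4 of file $2, else exit 1
  awk -F'\t' -v id="$1" '!/^#/ && $4 == id { found = 1 } END { exit !found }' "$2"
}

windows="association_windows.txt"
printf '#chr\tstart\tend\tID\n' > "$windows"
printf 'chr21\t0\t14120807\tTADB_1297\n' >> "$windows"
printf 'chr21\t10840000\t16880069\tTADB_1298\n' >> "$windows"

list_regions "$windows"

region="TADB_1298"
if has_region "$region" "$windows"; then
  echo "$region is a valid --region-name"
else
  echo "$region is not listed in $windows" >&2
fi
```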
Output
*_marks.dataset.rds
> str(readRDS("/restricted/projectnb/xqtl/xqtl_protocol/toy_xqtl_protocol/output/fsusie/fsus/Mic.chr7_139293693_145380632.16_marks.dataset.rds"))
List of 1
$ chr7:139293693-145380632:List of 13
..$ residual_Y :List of 1
.. ..$ ROSMAP_Mic_snATACQTL: num [1:65, 1:166] -0.0444 -0.3137 -0.0634 0.0658 0.8817 ...
.. .. ..- attr(*, "dimnames")=List of 2
.. .. .. ..$ : chr [1:65] "sample_10" "sample_11" "sample_12" "sample_13" ...
.. .. .. ..$ : NULL
..$ residual_X :List of 1
.. ..$ : num [1:65, 1:15857] -0.4025 -0.8767 -0.2054 -0.0908 0.6848 ...
.. .. ..- attr(*, "dimnames")=List of 2
.. .. .. ..$ : chr [1:65] "sample_10" "sample_11" "sample_12" "sample_13" ...
.. .. .. ..$ : chr [1:15857] "chr7:139302775:AACACACACAC:AACACACACACAC" "chr7:139302775:AACACACACAC:AACACACACACACAC" "chr7:139304706:G:GGT" "chr7:139305695:G:T" ...
..$ residual_Y_scalar: num 1
..$ residual_X_scalar: num 1
..$ covar :List of 1
.. ..$ : num [1:65, 1:48] 1 1 1 0 0 0 0 0 1 0 ...
.. .. ..- attr(*, "dimnames")=List of 2
.. .. .. ..$ : chr [1:65] "sample_10" "sample_11" "sample_12" "sample_13" ...
.. .. .. ..$ : NULL
..$ Y :List of 1
.. ..$ : num [1:65, 1:166] 1.99 2.19 2.02 2.48 3.42 ...
.. .. ..- attr(*, "dimnames")=List of 2
.. .. .. ..$ : chr [1:65] "sample_10" "sample_11" "sample_12" "sample_13" ...
.. .. .. ..$ : NULL
..$ X_data :List of 1
.. ..$ : num [1:65, 1:15857] 0 0 1 1 1 0 2 0 1 1 ...
.. .. ..- attr(*, "dimnames")=List of 2
.. .. .. ..$ : chr [1:65] "sample_10" "sample_11" "sample_12" "sample_13" ...
.. .. .. ..$ : chr [1:15857] "chr7:139302775:AACACACACAC:AACACACACACAC" "chr7:139302775:AACACACACAC:AACACACACACACAC" "chr7:139304706:G:GGT" "chr7:139305695:G:T" ...
..$ maf :List of 1
.. ..$ : Named num [1:15857] 0.2692 0.1538 0.0692 0.0923 0.4615 ...
.. .. ..- attr(*, "names")= chr [1:15857] "chr7:139302775:AACACACACAC:AACACACACACAC" "chr7:139302775:AACACACACAC:AACACACACACACAC" "chr7:139304706:G:GGT" "chr7:139305695:G:T" ...
..$ grange : chr [1:2] "139293693" "145380632"
..$ Y_coordinates :List of 1
.. ..$ :Classes ‘tbl_df’, ‘tbl’ and 'data.frame': 166 obs. of 3 variables:
.. .. ..$ #chr : chr [1:166] "chr7" "chr7" "chr7" "chr7" ...
.. .. ..$ start: num [1:166] 1.39e+08 1.39e+08 1.39e+08 1.39e+08 1.39e+08 ...
.. .. ..$ end : num [1:166] 1.39e+08 1.39e+08 1.39e+08 1.39e+08 1.39e+08 ...
..$ dropped_sample :List of 3
.. ..$ X :List of 1
.. .. ..$ : chr [1:3] "sample_71" "sample_6" "sample_46"
.. ..$ Y :List of 1
.. .. ..$ : chr [1:5] "sample_1" "sample_6" "sample_7" "sample_46" ...
.. ..$ covar:List of 1
.. .. ..$ : chr [1:3] "sample_1" "sample_47" "sample_7"
..$ X : num [1:65, 1:15857] 0 0 1 1 1 0 2 0 1 1 ...
.. ..- attr(*, "dimnames")=List of 2
.. .. ..$ : chr [1:65] "sample_10" "sample_11" "sample_12" "sample_13" ...
.. .. ..$ : chr [1:15857] "chr7:139302775:AACACACACAC:AACACACACACAC" "chr7:139302775:AACACACACAC:AACACACACACACAC" "chr7:139304706:G:GGT" "chr7:139305695:G:T" ...
..$ chrom : chr "chr7"
*_top_pc_weights.rds
> str(readRDS("/restricted/projectnb/xqtl/xqtl_protocol/toy_xqtl_protocol/output/fsusie/fsus/Mic.chr7_139293693_145380632.fsusie_mixture_normal_none__top_pc_weights.rds"),max.level = 3)
List of 1
$ chr7:139293693-145380632:List of 1
..$ ROSMAP_Mic_snATACQTL:List of 10
.. ..$ susie_on_top_pc :List of 1
.. ..$ susie_weights_intermediate:List of 6
.. ..$ twas_weights :List of 6
.. ..$ twas_predictions :List of 6
.. ..$ twas_cv_result :List of 4
.. ..$ total_time_elapsed : 'proc_time' Named num [1:5] 31992.49 40.47 32206.65 304.39 4.65
.. .. ..- attr(*, "names")= chr [1:5] "user.self" "sys.self" "elapsed" "user.child" ...
.. ..$ fsusie_result :List of 34
.. .. ..- attr(*, "class")= chr "susiF"
.. ..$ Y_coordinates :Classes ‘tbl_df’, ‘tbl’ and 'data.frame': 166 obs. of 3 variables:
.. ..$ fsusie_summary :List of 5
.. ..$ region_info :List of 3
Minimal Working Example Steps
iii. Run the Fine-Mapping with fSuSiE
sos run pipeline/mnm_regression.ipynb fsusie \
--cwd output/fsusie/ \
--name Mic \
--genoFile data/fsusie/mwe.genotype_by_chrom_files.txt \
--phenoFile data/fsusie/mwe.pheno.region_list \
--covFile data/fsusie/mwe.chr7_139293693_145380632.Marchenko_PC.anon.gz \
--cis-window 0 --max-cv-variants 5000 \
--susie_top_pc 0 --phenotype-names ROSMAP_Mic_snATACQTL --maf 0.01 \
--save-data \
--numThreads 8 \
--post_processing "none" --small-sample-correction
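A long run is easier to debug if missing inputs are caught first. A minimal sketch of a pre-flight check over the top-level input files named in the command above; the helper `check_inputs` is hypothetical and not part of the pipeline, and it does not verify the files listed inside the genoFile or phenoFile.

```shell
# Sketch: confirm that input files exist before launching sos run.
# check_inputs is a hypothetical helper; it succeeds only if every file exists.
check_inputs() {
  local missing=0 f
  for f in "$@"; do
    if [ ! -e "$f" ]; then
      echo "missing input: $f" >&2
      missing=$((missing + 1))
    fi
  done
  [ "$missing" -eq 0 ]
}

# Paths taken from the example command, relative to the directory it is run from
check_inputs \
  data/fsusie/mwe.genotype_by_chrom_files.txt \
  data/fsusie/mwe.pheno.region_list \
  data/fsusie/mwe.chr7_139293693_145380632.Marchenko_PC.anon.gz \
  || echo "fix the missing inputs before running sos run"
```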
Anticipated Results
Univariate fine-mapping of functional data produces two files per region: one containing the fSuSiE fine-mapping results and TWAS weights, and one containing the regional input data together with covariate-adjusted residuals.
Mic.chr7_139293693_145380632.fsusie_mixture_normal_none__top_pc_weights.rds
: For each region of interest, this file contains:
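susie_on_top_pc
susie_weights_intermediate
twas_weights - weights for each variant from the enet, lasso, and mrash methods
twas_predictions - predicted values for each sample from the enet, lasso, and mrash methods
twas_cv_result - cross-validation results used to identify the best-performing method; the data are split into five folds
fsusie_result - the fSuSiE fit, an object of class susiF
Y_coordinates - chr, start, and end of each phenotype position (166 in this example)
fsusie_summary - a summary of the fSuSiE results
total_time_elapsed - run time, stored as a proc_time object
region_info - information on the analyzed region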
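In this example the region appears inside the .rds lists as chr7:139293693-145380632, while the file names use chr7_139293693_145380632. Assuming that pattern (replacing ':' and '-' with '_') holds generally, which has only been observed in this one example, a sketch for building the expected weights file name from a region and the --name prefix. The helper names are hypothetical, and the fsusie_mixture_normal_none_ suffix is copied verbatim from this run and likely depends on its settings.

```shell
# Sketch: map a region name to the file stem seen in the example output names.
# The ':'/'-' -> '_' rule is inferred from a single example.
region_to_stem() {
  printf '%s\n' "$1" | sed 's/[:-]/_/g'
}

# $1 = value of --name, $2 = region name; suffix copied from the example output
weights_file() {
  printf '%s.%s.fsusie_mixture_normal_none__top_pc_weights.rds\n' "$1" "$(region_to_stem "$2")"
}

weights_file Mic "chr7:139293693-145380632"
```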
Mic.chr7_139293693_145380632.16_marks.dataset.rds
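: For each region of interest, this file contains the input data (genotypes X and X_data, phenotypes Y, covariates covar, minor allele frequencies maf, and Y_coordinates), the covariate-adjusted residuals residual_X and residual_Y for each sample, and the samples dropped from each input (dropped_sample).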
fSuSiE loads this data with the load_regional_functional_data function from pecotmr; see the pecotmr code for a description. Its arguments are explained in the documentation of the similar load_regional_association_data function.