Univariate Fine-Mapping and TWAS with SuSiE

Univariate Fine-Mapping and TWAS with SuSiE#

Description#

Our pipeline is capable of performing univariate fine-mapping with SuSiE with TWAS weights. The TWAS portion makes use of TWAS weights, linkage disequilibrium data and GWAS summary statistics. Preset variants used are taken from the linkage disequilibrium data and used only for TWAS. TWAS cross validation tell us which of the four methods (enet, lasso, mrash, SuSiE) are best to use. By default, we limit to under 5000 variants for cross validation. In cross validation, the data is split into five parts. Training is done on four parts, and prediction is done on the fifth. Linear regression is used to assess the results and get r squared and pvalues.

Fine mapping with SuSiE follows the formulat y=xb+e where x has many highly correlated variables due to linkage disequilibrium. Therefore, true effects (b), are very sparse. The SuSiE wrapper looks for five independent signals in each region to increase convergence speed. However, if five signals are found, then the the upper limit is increased. SuSiE does not allow for the inclusion of covariates. Therefore, covariates are regressed in.

Input:#

--genoFile: path to a plink bed file containing genotypes.
--phenoFile: at least one tab delimited file containing chr, start, end, ID and path for the regions. For example:

#chr    start   end     ID      path
chr12   389272  389273  ENSG00000120647 $PATH/Ast.log2cpm.bed.gz
chr12   389319  389320  ENSG00000073614 $PATH/Ast.log2cpm.bed.gz

--covFile: at least one tab delimited file containing covariates in the rows and samples in the columns.
--customized-association-windows: a tab delimited file containing chr, start, end, and gene_id. For example:

#chr    start   end     gene_id
chr1    0       6480000 ENSG00000008128
chr1    0       6480000 ENSG00000008130

----phenotype-names: the names of the phenotypes if multiple are included. There should be one for each phenotype file you include.
--max-cv-variants: maximum number of variants for cross-validation.
--ld_reference_meta_file: path to file containing chrom, start, end and path for linkage disequilibrium region information. For example:

#chrom  start   end     path
chr1    101384274       104443097       $PATH/chr1_101384274_104443097.cor.xz.bim
chr1    104443097       106225286       $PATH/chr1_104443097_106225286.cor.xz.bim

--region-name: if you only wish to analyze one region, then include the gene_id of a region found in the customized-association-windows file

Output#

*.univariate_bvrs.rds

> str(readRDS("/restricted/projectnb/xqtl/xqtl_protocol/toy_xqtl_protocol/output/mnm_regression/susie_twas/fine_mapping/test_susie_twas.chr11_ENSG00000073921.univariate_bvsr.rds"))
List of 1
 $ ENSG00000073921:List of 1
  ..$ test_pheno_ENSG00000073921:List of 10
  .. ..$ variant_names         : chr [1:9970] "chr11:84957209:G:C" "chr11:84957879:A:G" "chr11:84958236:C:A" "chr11:84958954:T:C" ...
  .. ..$ analysis_script       : chr "options(warn=1)\nlibrary(pecotmr)\nphenotype_files = c(\"/restricted/projectnb/xqtl/xqtl_protocol/toy_xqtl_prot"| __truncated__
  .. ..$ other_quantities      :List of 1
  .. .. ..$ dropped_samples:List of 3
  .. .. .. ..$ X    : NULL
  .. .. .. ..$ y    : NULL
  .. .. .. ..$ covar: NULL
  .. ..$ sumstats              :List of 5
  .. .. ..$ betahat  : Named num [1:9970] 0.00479 0.00573 0.09947 0.02035 0.02035 ...
  .. .. .. ..- attr(*, "names")= chr [1:9970] "chr11:84957209:G:C" "chr11:84957879:A:G" "chr11:84958236:C:A" "chr11:84958954:T:C" ...
  .. .. ..$ sebetahat: Named num [1:9970] 0.0563 0.0568 0.0791 0.0551 0.0551 ...
  .. .. .. ..- attr(*, "names")= chr [1:9970] "chr11:84957209:G:C" "chr11:84957879:A:G" "chr11:84958236:C:A" "chr11:84958954:T:C" ...
  .. .. ..$ z_scores : Named num [1:9970] 0.085 0.101 1.258 0.369 0.369 ...
  .. .. .. ..- attr(*, "names")= chr [1:9970] "chr11:84957209:G:C" "chr11:84957879:A:G" "chr11:84958236:C:A" "chr11:84958954:T:C" ...
  .. .. ..$ p_values : Named num [1:9970] 0.932 0.92 0.208 0.712 0.712 ...
  .. .. .. ..- attr(*, "names")= chr [1:9970] "chr11:84957209:G:C" "chr11:84957879:A:G" "chr11:84958236:C:A" "chr11:84958954:T:C" ...
  .. .. ..$ q_values : Named num [1:9970] 0.922 0.92 0.633 0.859 0.859 ...
  .. .. .. ..- attr(*, "names")= chr [1:9970] "chr11:84957209:G:C" "chr11:84957879:A:G" "chr11:84958236:C:A" "chr11:84958954:T:C" ...
  .. ..$ sample_names          : chr [1:150] "sample0" "sample1" "sample10" "sample100" ...
  .. ..$ top_loci              :'data.frame':	3 obs. of  9 variables:
  .. .. ..$ variant_id      : chr [1:3] "chr11:85179671:G:A" "chr11:85266101:G:C" "chr11:85314573:G:A"
  .. .. ..$ betahat         : num [1:3] -0.743 -0.747 -0.747
  .. .. ..$ sebetahat       : num [1:3] 0.147 0.147 0.147
  .. .. ..$ z               : num [1:3] -5.06 -5.1 -5.1
  .. .. ..$ maf             : num [1:3] 0.0235 0.0233 0.0233
  .. .. ..$ pip             : num [1:3] 0.292 0.349 0.349
  .. .. ..$ cs_coverage_0.95: num [1:3] 1 1 1
  .. .. ..$ cs_coverage_0.7 : num [1:3] 1 1 1
  .. .. ..$ cs_coverage_0.5 : num [1:3] 0 1 1
  .. ..$ susie_result_trimmed  :List of 12
  .. .. ..$ pip                   : num [1:9970] 0.000431 0.00043 0.000673 0.000431 0.000431 ...
  .. .. ..$ sets                  :List of 5
  .. .. .. ..$ cs                :List of 1
  .. .. .. .. ..$ L1: int [1:3] 653 910 1069
  .. .. .. ..$ purity            :'data.frame':	1 obs. of  3 variables:
  .. .. .. .. ..$ min.abs.corr   : num 1
  .. .. .. .. ..$ mean.abs.corr  : num 1
  .. .. .. .. ..$ median.abs.corr: num 1
  .. .. .. ..$ cs_index          : int 1
  .. .. .. ..$ coverage          : num 0.989
  .. .. .. ..$ requested_coverage: num 0.95
  .. .. ..$ cs_corr               : logi NA
  .. .. ..$ sets_secondary        :List of 2
  .. .. .. ..$ coverage_0.7:List of 2
  .. .. .. .. ..$ sets   :List of 5
  .. .. .. .. .. ..$ cs                :List of 1
  .. .. .. .. .. .. ..$ L1: int [1:3] 653 910 1069
  .. .. .. .. .. ..$ purity            :'data.frame':	1 obs. of  3 variables:
  .. .. .. .. .. .. ..$ min.abs.corr   : num 1
  .. .. .. .. .. .. ..$ mean.abs.corr  : num 1
  .. .. .. .. .. .. ..$ median.abs.corr: num 1
  .. .. .. .. .. ..$ cs_index          : int 1
  .. .. .. .. .. ..$ coverage          : num 0.989
  .. .. .. .. .. ..$ requested_coverage: num 0.7
  .. .. .. .. ..$ cs_corr: logi NA
  .. .. .. ..$ coverage_0.5:List of 2
  .. .. .. .. ..$ sets   :List of 5
  .. .. .. .. .. ..$ cs                :List of 1
  .. .. .. .. .. .. ..$ L1: int [1:2] 910 1069
  .. .. .. .. .. ..$ purity            :'data.frame':	1 obs. of  3 variables:
  .. .. .. .. .. .. ..$ min.abs.corr   : num 1
  .. .. .. .. .. .. ..$ mean.abs.corr  : num 1
  .. .. .. .. .. .. ..$ median.abs.corr: num 1
  .. .. .. .. .. ..$ cs_index          : int 1
  .. .. .. .. .. ..$ coverage          : num 0.697
  .. .. .. .. .. ..$ requested_coverage: num 0.5
  .. .. .. .. ..$ cs_corr: logi NA
  .. .. ..$ alpha                 : num [1:8, 1:9970] 1.52e-07 6.35e-05 6.05e-05 5.93e-05 5.99e-05 ...
  .. .. .. ..- attr(*, "dimnames")=List of 2
  .. .. .. .. ..$ : NULL
  .. .. .. .. ..$ : chr [1:9970] "chr11:84957209:G:C" "chr11:84957879:A:G" "chr11:84958236:C:A" "chr11:84958954:T:C" ...
  .. .. ..$ lbf_variable          : num [1:8, 1:9970] -1.697 -0.402 -0.437 -0.451 -0.444 ...
  .. .. .. ..- attr(*, "dimnames")=List of 2
  .. .. .. .. ..$ : NULL
  .. .. .. .. ..$ : chr [1:9970] "chr11:84957209:G:C" "chr11:84957879:A:G" "chr11:84958236:C:A" "chr11:84958954:T:C" ...
  .. .. ..$ V                     : num [1:8] 0.019 0.00097 0.0011 0.00115 0.00112 ...
  .. .. ..$ niter                 : int 10
  .. .. ..$ max_L                 : int 8
  .. .. ..$ X_column_scale_factors: Named num [1:9970] 0.517 0.512 0.366 0.528 0.528 ...
  .. .. .. ..- attr(*, "names")= chr [1:9970] "chr11:84957209:G:C" "chr11:84957879:A:G" "chr11:84958236:C:A" "chr11:84958954:T:C" ...
  .. .. ..$ mu                    : num [1:8, 1:9970] 0.00227 0.00643 0.00672 0.00683 0.00677 ...
  .. .. .. ..- attr(*, "dimnames")=List of 2
  .. .. .. .. ..$ : NULL
  .. .. .. .. ..$ : chr [1:9970] "chr11:84957209:G:C" "chr11:84957879:A:G" "chr11:84958236:C:A" "chr11:84958954:T:C" ...
  .. .. ..$ mu2                   : num [1:8, 1:9970] 0.000638 0.000432 0.000455 0.000464 0.00046 ...
  .. .. .. ..- attr(*, "dimnames")=List of 2
  .. .. .. .. ..$ : NULL
  .. .. .. .. ..$ : chr [1:9970] "chr11:84957209:G:C" "chr11:84957879:A:G" "chr11:84958236:C:A" "chr11:84958954:T:C" ...
  .. .. ..- attr(*, "class")= chr "susie"
  .. ..$ total_time_elapsed    : 'proc_time' Named num [1:5] 32.189 0.339 43.517 0 0
  .. .. ..- attr(*, "names")= chr [1:5] "user.self" "sys.self" "elapsed" "user.child" ...
  .. ..$ region_info           :List of 3
  .. .. ..$ region_coord:'data.frame':	1 obs. of  3 variables:
  .. .. .. ..$ chrom: chr "11"
  .. .. .. ..$ start: int 85957174
  .. .. .. ..$ end  : int 86069881
  .. .. ..$ grange      :'data.frame':	1 obs. of  3 variables:
  .. .. .. ..$ chrom: chr "11"
  .. .. .. ..$ start: int 84957175
  .. .. .. ..$ end  : int 87360000
  .. .. ..$ region_name : chr [1:2] "ENSG00000073921" "ENSG00000073921"
  .. ..$ preset_variants_result:List of 9
  .. .. ..$ susie_fitted        :List of 22
  .. .. .. ..$ alpha                 : num [1:8, 1:6454] 0.00015 0.000148 0.000148 0.000148 0.000149 ...
  .. .. .. .. ..- attr(*, "dimnames")=List of 2
  .. .. .. .. .. ..$ : NULL
  .. .. .. .. .. ..$ : chr [1:6454] "chr11:84957209:G:C" "chr11:84958236:C:A" "chr11:84958954:T:C" "chr11:84959602:T:C" ...
  .. .. .. ..$ mu                    : num [1:8, 1:6454] 0.000134 0.000169 0.000177 0.000169 0.000152 ...
  .. .. .. .. ..- attr(*, "dimnames")=List of 2
  .. .. .. .. .. ..$ : NULL
  .. .. .. .. .. ..$ : chr [1:6454] "chr11:84957209:G:C" "chr11:84958236:C:A" "chr11:84958954:T:C" "chr11:84959602:T:C" ...
  .. .. .. ..$ mu2                   : num [1:8, 1:6454] 5.39e-05 6.83e-05 7.19e-05 6.85e-05 6.13e-05 ...
  .. .. .. .. ..- attr(*, "dimnames")=List of 2
  .. .. .. .. .. ..$ : NULL
  .. .. .. .. .. ..$ : chr [1:6454] "chr11:84957209:G:C" "chr11:84958236:C:A" "chr11:84958954:T:C" "chr11:84959602:T:C" ...
  .. .. .. ..$ Xr                    : num [1:150] 0.000696 -0.000919 -0.002423 -0.00093 0.000305 ...
  .. .. .. ..$ KL                    : num [1:8] 0.0345 0.0445 0.047 0.0446 0.0395 ...
  .. .. .. ..$ lbf                   : num [1:8] 0.000194 0.000317 0.000353 0.000319 0.000253 ...
  .. .. .. ..$ lbf_variable          : num [1:8, 1:6454] -0.0334 -0.0427 -0.045 -0.0428 -0.0381 ...
  .. .. .. .. ..- attr(*, "dimnames")=List of 2
  .. .. .. .. .. ..$ : NULL
  .. .. .. .. .. ..$ : chr [1:6454] "chr11:84957209:G:C" "chr11:84958236:C:A" "chr11:84958954:T:C" "chr11:84959602:T:C" ...
  .. .. .. ..$ sigma2                : num 0.124
  .. .. .. ..$ V                     : num [1:8] 5.76e-05 7.44e-05 7.87e-05 7.46e-05 6.61e-05 ...
  .. .. .. ..$ pi                    : num [1:6454] 0.000155 0.000155 0.000155 0.000155 0.000155 ...
  .. .. .. ..$ null_index            : num 0
  .. .. .. ..$ correct_zR_discrepancy:List of 5
  .. .. .. .. ..$ to_correct          : logi FALSE
  .. .. .. .. ..$ outlier_index       : logi(0) 
  .. .. .. .. ..$ is_init             : logi TRUE
  .. .. .. .. ..$ outlier_stabilize   : num 5
  .. .. .. .. ..$ outlier_stable_count: num 0
  .. .. .. ..$ force_iterate         : logi FALSE
  .. .. .. ..$ converged             : logi TRUE
  .. .. .. ..$ elbo                  : num [1:3] -56.5 -56.5 -56.5
  .. .. .. ..$ niter                 : int 3
  .. .. .. ..$ intercept             : num -0.00273
  .. .. .. ..$ fitted                : Named num [1:150] 0.000696 -0.000919 -0.002423 -0.00093 0.000305 ...
  .. .. .. .. ..- attr(*, "names")= chr [1:150] "sample0" "sample1" "sample10" "sample100" ...
  .. .. .. ..$ sets                  :List of 3
  .. .. .. .. ..$ cs                : NULL
  .. .. .. .. ..$ coverage          : NULL
  .. .. .. .. ..$ requested_coverage: num 0.95
  .. .. .. ..$ pip                   : Named num [1:6454] 0.0012 0.00124 0.0012 0.0012 0.0012 ...
  .. .. .. .. ..- attr(*, "names")= chr [1:6454] "chr11:84957209:G:C" "chr11:84958236:C:A" "chr11:84958954:T:C" "chr11:84959602:T:C" ...
  .. .. .. ..$ X_column_scale_factors: Named num [1:6454] 0.664 0.438 0.677 0.702 0.677 ...
  .. .. .. .. ..- attr(*, "names")= chr [1:6454] "chr11:84957209:G:C" "chr11:84958236:C:A" "chr11:84958954:T:C" "chr11:84959602:T:C" ...
  .. .. .. ..$ time_elapsed          : 'proc_time' Named num [1:5] 0.851 0.008 0.861 0 0
  .. .. .. .. ..- attr(*, "names")= chr [1:5] "user.self" "sys.self" "elapsed" "user.child" ...
  .. .. .. ..- attr(*, "class")= chr "susie"
  .. .. ..$ variant_names       : chr [1:6454] "chr11:84957209:G:C" "chr11:84958236:C:A" "chr11:84958954:T:C" "chr11:84959602:T:C" ...
  .. .. ..$ analysis_script     : chr "options(warn=1)\nlibrary(pecotmr)\nphenotype_files = c(\"/restricted/projectnb/xqtl/xqtl_protocol/toy_xqtl_prot"| __truncated__
  .. .. ..$ other_quantities    :List of 1
  .. .. .. ..$ dropped_samples:List of 3
  .. .. .. .. ..$ X    : NULL
  .. .. .. .. ..$ y    : NULL
  .. .. .. .. ..$ covar: NULL
  .. .. ..$ sumstats            :List of 5
  .. .. .. ..$ betahat  : Named num [1:6454] 0.0029 0.06935 0.01237 -0.00911 0.01237 ...
  .. .. .. .. ..- attr(*, "names")= chr [1:6454] "chr11:84957209:G:C" "chr11:84958236:C:A" "chr11:84958954:T:C" "chr11:84959602:T:C" ...
  .. .. .. ..$ sebetahat: Named num [1:6454] 0.0438 0.0661 0.0429 0.0414 0.0429 ...
  .. .. .. .. ..- attr(*, "names")= chr [1:6454] "chr11:84957209:G:C" "chr11:84958236:C:A" "chr11:84958954:T:C" "chr11:84959602:T:C" ...
  .. .. .. ..$ z_scores : Named num [1:6454] 0.0662 1.0485 0.288 -0.22 0.288 ...
  .. .. .. .. ..- attr(*, "names")= chr [1:6454] "chr11:84957209:G:C" "chr11:84958236:C:A" "chr11:84958954:T:C" "chr11:84959602:T:C" ...
  .. .. .. ..$ p_values : Named num [1:6454] 0.947 0.294 0.773 0.826 0.773 ...
  .. .. .. .. ..- attr(*, "names")= chr [1:6454] "chr11:84957209:G:C" "chr11:84958236:C:A" "chr11:84958954:T:C" "chr11:84959602:T:C" ...
  .. .. .. ..$ q_values : Named num [1:6454] 0.998 0.892 0.991 0.998 0.991 ...
  .. .. .. .. ..- attr(*, "names")= chr [1:6454] "chr11:84957209:G:C" "chr11:84958236:C:A" "chr11:84958954:T:C" "chr11:84959602:T:C" ...
  .. .. ..$ sample_names        : chr [1:150] "sample0" "sample1" "sample10" "sample100" ...
  .. .. ..$ susie_result_trimmed:List of 12
  .. .. .. ..$ pip                   : num [1:6454] 0.0012 0.00124 0.0012 0.0012 0.0012 ...
  .. .. .. ..$ sets                  :List of 3
  .. .. .. .. ..$ cs                : NULL
  .. .. .. .. ..$ coverage          : NULL
  .. .. .. .. ..$ requested_coverage: num 0.95
  .. .. .. ..$ cs_corr               : logi NA
  .. .. .. ..$ sets_secondary        :List of 2
  .. .. .. .. ..$ coverage_0.7:List of 2
  .. .. .. .. .. ..$ sets   :List of 3
  .. .. .. .. .. .. ..$ cs                : NULL
  .. .. .. .. .. .. ..$ coverage          : NULL
  .. .. .. .. .. .. ..$ requested_coverage: num 0.7
  .. .. .. .. .. ..$ cs_corr: logi NA
  .. .. .. .. ..$ coverage_0.5:List of 2
  .. .. .. .. .. ..$ sets   :List of 3
  .. .. .. .. .. .. ..$ cs                : NULL
  .. .. .. .. .. .. ..$ coverage          : NULL
  .. .. .. .. .. .. ..$ requested_coverage: num 0.5
  .. .. .. .. .. ..$ cs_corr: logi NA
  .. .. .. ..$ alpha                 : num [1:8, 1:6454] 0.00015 0.000148 0.000148 0.000148 0.000149 ...
  .. .. .. .. ..- attr(*, "dimnames")=List of 2
  .. .. .. .. .. ..$ : NULL
  .. .. .. .. .. ..$ : chr [1:6454] "chr11:84957209:G:C" "chr11:84958236:C:A" "chr11:84958954:T:C" "chr11:84959602:T:C" ...
  .. .. .. ..$ lbf_variable          : num [1:8, 1:6454] -0.0334 -0.0427 -0.045 -0.0428 -0.0381 ...
  .. .. .. .. ..- attr(*, "dimnames")=List of 2
  .. .. .. .. .. ..$ : NULL
  .. .. .. .. .. ..$ : chr [1:6454] "chr11:84957209:G:C" "chr11:84958236:C:A" "chr11:84958954:T:C" "chr11:84959602:T:C" ...
  .. .. .. ..$ V                     : num [1:8] 5.76e-05 7.44e-05 7.87e-05 7.46e-05 6.61e-05 ...
  .. .. .. ..$ niter                 : int 3
  .. .. .. ..$ max_L                 : int 8
  .. .. .. ..$ X_column_scale_factors: Named num [1:6454] 0.664 0.438 0.677 0.702 0.677 ...
  .. .. .. .. ..- attr(*, "names")= chr [1:6454] "chr11:84957209:G:C" "chr11:84958236:C:A" "chr11:84958954:T:C" "chr11:84959602:T:C" ...
  .. .. .. ..$ mu                    : num [1:8, 1:6454] 0.000134 0.000169 0.000177 0.000169 0.000152 ...
  .. .. .. .. ..- attr(*, "dimnames")=List of 2
  .. .. .. .. .. ..$ : NULL
  .. .. .. .. .. ..$ : chr [1:6454] "chr11:84957209:G:C" "chr11:84958236:C:A" "chr11:84958954:T:C" "chr11:84959602:T:C" ...
  .. .. .. ..$ mu2                   : num [1:8, 1:6454] 5.39e-05 6.83e-05 7.19e-05 6.85e-05 6.13e-05 ...
  .. .. .. .. ..- attr(*, "dimnames")=List of 2
  .. .. .. .. .. ..$ : NULL
  .. .. .. .. .. ..$ : chr [1:6454] "chr11:84957209:G:C" "chr11:84958236:C:A" "chr11:84958954:T:C" "chr11:84959602:T:C" ...
  .. .. .. ..- attr(*, "class")= chr "susie"
  .. .. ..$ total_time_elapsed  : 'proc_time' Named num [1:5] 10.991 0.012 11.046 0 0
  .. .. .. ..- attr(*, "names")= chr [1:5] "user.self" "sys.self" "elapsed" "user.child" ...
  .. .. ..$ region_info         :List of 3
  .. .. .. ..$ region_coord:'data.frame':	1 obs. of  3 variables:
  .. .. .. .. ..$ chrom: chr "11"
  .. .. .. .. ..$ start: int 85957174
  .. .. .. .. ..$ end  : int 86069881
  .. .. .. ..$ grange      :'data.frame':	1 obs. of  3 variables:
  .. .. .. .. ..$ chrom: chr "11"
  .. .. .. .. ..$ start: int 84957175
  .. .. .. .. ..$ end  : int 87360000
  .. .. .. ..$ region_name : chr [1:2] "ENSG00000073921" "ENSG00000073921"

*.univariate_data.rds

> str(readRDS("output/mnm_regression/susie_twas/data/test_susie_twas.chr11_ENSG00000073921.univariate_data.rds"))
List of 1
 $ ENSG00000073921:List of 10
  ..$ residual_Y       :List of 1
  .. ..$ test_pheno: num [1:150, 1] 0.1241 0.2981 -0.6054 0.0435 0.0388 ...
  .. .. ..- attr(*, "dimnames")=List of 2
  .. .. .. ..$ : chr [1:150] "sample0" "sample1" "sample10" "sample100" ...
  .. .. .. ..$ : chr "ENSG00000073921"
  ..$ residual_X       :List of 1
  .. ..$ : num [1:150, 1:9970] -0.2386 -0.8293 0.4173 0.7408 -0.0231 ...
  .. .. ..- attr(*, "dimnames")=List of 2
  .. .. .. ..$ : chr [1:150] "sample0" "sample1" "sample10" "sample100" ...
  .. .. .. ..$ : chr [1:9970] "chr11:84957209:G:C" "chr11:84957879:A:G" "chr11:84958236:C:A" "chr11:84958954:T:C" ...
  ..$ residual_Y_scalar: num 1
  ..$ residual_X_scalar: num 1
  ..$ dropped_sample   :List of 3
  .. ..$ X    :List of 1
  .. .. ..$ : chr(0) 
  .. ..$ Y    :List of 1
  .. .. ..$ : chr "strand"
  .. ..$ covar:List of 1
  .. .. ..$ : chr(0) 
  ..$ maf              :List of 1
  .. ..$ : Named num [1:9970] 0.33 0.327 0.103 0.343 0.343 ...
  .. .. ..- attr(*, "names")= chr [1:9970] "chr11:84957209:G:C" "chr11:84957879:A:G" "chr11:84958236:C:A" "chr11:84958954:T:C" ...
  ..$ X                : num [1:150, 1:9970] 0 0 1 1 1 0 0 1 1 1 ...
  .. ..- attr(*, "dimnames")=List of 2
  .. .. ..$ : chr [1:150] "sample0" "sample1" "sample10" "sample100" ...
  .. .. ..$ : chr [1:9970] "chr11:84957209:G:C" "chr11:84957879:A:G" "chr11:84958236:C:A" "chr11:84958954:T:C" ...
  ..$ chrom            : chr "chr11"
  ..$ grange           : chr [1:2] "85957174" "86069881"
  ..$ X_variance       :List of 1
  .. ..$ : Named num [1:9970] 0.267 0.262 0.134 0.279 0.279 ...
  .. .. ..- attr(*, "names")= chr [1:9970] "chr11:84957209:G:C" "chr11:84957879:A:G" "chr11:84958236:C:A" "chr11:84958954:T:C" ...

*.univariate_twas_weights.rds

> str(readRDS("/restricted/projectnb/xqtl/xqtl_protocol/toy_xqtl_protocol/output/mnm_regression/susie_twas/twas_weights/test_susie_twas.chr11_ENSG00000073921.univariate_twas_weights.rds"))
List of 1
 $ ENSG00000073921:List of 1
  ..$ test_pheno_ENSG00000073921:List of 7
  .. ..$ susie_weights_intermediate:List of 4
  .. .. ..$ mu                    : num [1:8, 1:6454] 0.000134 0.000169 0.000177 0.000169 0.000152 ...
  .. .. .. ..- attr(*, "dimnames")=List of 2
  .. .. .. .. ..$ : NULL
  .. .. .. .. ..$ : chr [1:6454] "chr11:84957209:G:C" "chr11:84958236:C:A" "chr11:84958954:T:C" "chr11:84959602:T:C" ...
  .. .. ..$ lbf_variable          : num [1:8, 1:6454] -0.0334 -0.0427 -0.045 -0.0428 -0.0381 ...
  .. .. .. ..- attr(*, "dimnames")=List of 2
  .. .. .. .. ..$ : NULL
  .. .. .. .. ..$ : chr [1:6454] "chr11:84957209:G:C" "chr11:84958236:C:A" "chr11:84958954:T:C" "chr11:84959602:T:C" ...
  .. .. ..$ X_column_scale_factors: Named num [1:6454] 0.664 0.438 0.677 0.702 0.677 ...
  .. .. .. ..- attr(*, "names")= chr [1:6454] "chr11:84957209:G:C" "chr11:84958236:C:A" "chr11:84958954:T:C" "chr11:84959602:T:C" ...
  .. .. ..$ pip                   : Named num [1:6454] 0.0012 0.00124 0.0012 0.0012 0.0012 ...
  .. .. .. ..- attr(*, "names")= chr [1:6454] "chr11:84957209:G:C" "chr11:84958236:C:A" "chr11:84958954:T:C" "chr11:84959602:T:C" ...
  .. ..$ twas_weights              :List of 6
  .. .. ..$ enet_weights   : num [1:6454, 1] 0 0 0 0 0 0 0 0 0 0 ...
  .. .. .. ..- attr(*, "dimnames")=List of 2
  .. .. .. .. ..$ : chr [1:6454] "chr11:84957209:G:C" "chr11:84958236:C:A" "chr11:84958954:T:C" "chr11:84959602:T:C" ...
  .. .. .. .. ..$ : chr "ENSG00000073921"
  .. .. ..$ lasso_weights  : num [1:6454, 1] 0 0 0 0 0 0 0 0 0 0 ...
  .. .. .. ..- attr(*, "dimnames")=List of 2
  .. .. .. .. ..$ : chr [1:6454] "chr11:84957209:G:C" "chr11:84958236:C:A" "chr11:84958954:T:C" "chr11:84959602:T:C" ...
  .. .. .. .. ..$ : chr "ENSG00000073921"
  .. .. ..$ bayes_r_weights: num [1:6454, 1] 4.31e-05 4.73e-04 2.44e-04 -8.48e-05 2.30e-04 ...
  .. .. .. ..- attr(*, "dimnames")=List of 2
  .. .. .. .. ..$ : chr [1:6454] "chr11:84957209:G:C" "chr11:84958236:C:A" "chr11:84958954:T:C" "chr11:84959602:T:C" ...
  .. .. .. .. ..$ : chr "ENSG00000073921"
  .. .. ..$ bayes_l_weights: num [1:6454, 1] 2.04e-05 3.66e-04 1.03e-04 -1.54e-04 3.47e-04 ...
  .. .. .. ..- attr(*, "dimnames")=List of 2
  .. .. .. .. ..$ : chr [1:6454] "chr11:84957209:G:C" "chr11:84958236:C:A" "chr11:84958954:T:C" "chr11:84959602:T:C" ...
  .. .. .. .. ..$ : chr "ENSG00000073921"
  .. .. ..$ mrash_weights  : num [1:6454, 1] 6.01e-05 2.05e-04 7.02e-05 -4.71e-05 7.02e-05 ...
  .. .. .. ..- attr(*, "dimnames")=List of 2
  .. .. .. .. ..$ : chr [1:6454] "chr11:84957209:G:C" "chr11:84958236:C:A" "chr11:84958954:T:C" "chr11:84959602:T:C" ...
  .. .. .. .. ..$ : chr "ENSG00000073921"
  .. .. ..$ susie_weights  : num [1:6454, 1] 2.55e-07 5.89e-06 1.00e-06 -7.53e-07 1.00e-06 ...
  .. .. .. ..- attr(*, "dimnames")=List of 2
  .. .. .. .. ..$ : chr [1:6454] "chr11:84957209:G:C" "chr11:84958236:C:A" "chr11:84958954:T:C" "chr11:84959602:T:C" ...
  .. .. .. .. ..$ : chr "ENSG00000073921"
  .. ..$ twas_predictions          :List of 6
  .. .. ..$ enet_predicted   : num [1:150, 1] 0 0 0 0 0 0 0 0 0 0 ...
  .. .. .. ..- attr(*, "dimnames")=List of 2
  .. .. .. .. ..$ : chr [1:150] "sample0" "sample1" "sample10" "sample100" ...
  .. .. .. .. ..$ : chr "ENSG00000073921"
  .. .. ..$ lasso_predicted  : num [1:150, 1] 0 0 0 0 0 0 0 0 0 0 ...
  .. .. .. ..- attr(*, "dimnames")=List of 2
  .. .. .. .. ..$ : chr [1:150] "sample0" "sample1" "sample10" "sample100" ...
  .. .. .. .. ..$ : chr "ENSG00000073921"
  .. .. ..$ bayes_r_predicted: num [1:150, 1] 0.1971 0.1333 -0.0176 0.1234 0.2194 ...
  .. .. .. ..- attr(*, "dimnames")=List of 2
  .. .. .. .. ..$ : chr [1:150] "sample0" "sample1" "sample10" "sample100" ...
  .. .. .. .. ..$ : chr "ENSG00000073921"
  .. .. ..$ bayes_l_predicted: num [1:150, 1] 0.20151 0.13565 -0.00533 0.13823 0.23318 ...
  .. .. .. ..- attr(*, "dimnames")=List of 2
  .. .. .. .. ..$ : chr [1:150] "sample0" "sample1" "sample10" "sample100" ...
  .. .. .. .. ..$ : chr "ENSG00000073921"
  .. .. ..$ mrash_predicted  : num [1:150, 1] 0.08183 0.05465 -0.00379 0.05346 0.09283 ...
  .. .. .. ..- attr(*, "dimnames")=List of 2
  .. .. .. .. ..$ : chr [1:150] "sample0" "sample1" "sample10" "sample100" ...
  .. .. .. .. ..$ : chr "ENSG00000073921"
  .. .. ..$ susie_predicted  : num [1:150, 1] 0.003423 0.001808 0.000304 0.001797 0.003032 ...
  .. .. .. ..- attr(*, "dimnames")=List of 2
  .. .. .. .. ..$ : chr [1:150] "sample0" "sample1" "sample10" "sample100" ...
  .. .. .. .. ..$ : chr "ENSG00000073921"
  .. ..$ twas_cv_result            :List of 4
  .. .. ..$ sample_partition:'data.frame':	150 obs. of  2 variables:
  .. .. .. ..$ Sample: chr [1:150] "sample108" "sample52" "sample103" "sample29" ...
  .. .. .. ..$ Fold  : int [1:150] 1 1 1 1 1 1 1 1 1 1 ...
  .. .. ..$ prediction      :List of 4
  .. .. .. ..$ bayes_r_predicted: num [1:150, 1] 0.3606 -0.0263 0.0687 0.0989 0.1158 ...
  .. .. .. .. ..- attr(*, "dimnames")=List of 2
  .. .. .. .. .. ..$ : chr [1:150] "sample0" "sample1" "sample10" "sample100" ...
  .. .. .. .. .. ..$ : chr "ENSG00000073921"
  .. .. .. ..$ bayes_l_predicted: num [1:150, 1] 0.2274 -0.0389 0.0379 0.0513 0.0706 ...
  .. .. .. .. ..- attr(*, "dimnames")=List of 2
  .. .. .. .. .. ..$ : chr [1:150] "sample0" "sample1" "sample10" "sample100" ...
  .. .. .. .. .. ..$ : chr "ENSG00000073921"
  .. .. .. ..$ mrash_predicted  : num [1:150, 1] 0.0692 -0.0021 0.0183 0.0231 0.0353 ...
  .. .. .. .. ..- attr(*, "dimnames")=List of 2
  .. .. .. .. .. ..$ : chr [1:150] "sample0" "sample1" "sample10" "sample100" ...
  .. .. .. .. .. ..$ : chr "ENSG00000073921"
  .. .. .. ..$ susie_predicted  : num [1:150, 1] 0.05794 -0.00105 -0.01131 0 0 ...
  .. .. .. .. ..- attr(*, "dimnames")=List of 2
  .. .. .. .. .. ..$ : chr [1:150] "sample0" "sample1" "sample10" "sample100" ...
  .. .. .. .. .. ..$ : chr "ENSG00000073921"
  .. .. ..$ performance     :List of 4
  .. .. .. ..$ bayes_r_performance: num [1, 1:6] 0.022483 0.000505 -0.006248 0.784784 0.40514 ...
  .. .. .. .. ..- attr(*, "dimnames")=List of 2
  .. .. .. .. .. ..$ : chr "ENSG00000073921"
  .. .. .. .. .. ..$ : chr [1:6] "corr" "rsq" "adj_rsq" "pval" ...
  .. .. .. ..$ bayes_l_performance: num [1, 1:6] -8.04e-03 6.47e-05 -6.69e-03 9.22e-01 3.91e-01 ...
  .. .. .. .. ..- attr(*, "dimnames")=List of 2
  .. .. .. .. .. ..$ : chr "ENSG00000073921"
  .. .. .. .. .. ..$ : chr [1:6] "corr" "rsq" "adj_rsq" "pval" ...
  .. .. .. ..$ mrash_performance  : num [1, 1:6] 0.05206 0.00271 -0.00403 0.52697 0.35394 ...
  .. .. .. .. ..- attr(*, "dimnames")=List of 2
  .. .. .. .. .. ..$ : chr "ENSG00000073921"
  .. .. .. .. .. ..$ : chr [1:6] "corr" "rsq" "adj_rsq" "pval" ...
  .. .. .. ..$ susie_performance  : num [1, 1:6] -0.03878 0.0015 -0.00524 0.6375 0.35463 ...
  .. .. .. .. ..- attr(*, "dimnames")=List of 2
  .. .. .. .. .. ..$ : chr "ENSG00000073921"
  .. .. .. .. .. ..$ : chr [1:6] "corr" "rsq" "adj_rsq" "pval" ...
  .. .. ..$ time_elapsed    : 'proc_time' Named num [1:5] 188.023 0.581 189.175 0 0
  .. .. .. ..- attr(*, "names")= chr [1:5] "user.self" "sys.self" "elapsed" "user.child" ...
  .. ..$ total_time_elapsed        : 'proc_time' Named num [1:5] 242.761 0.972 249.471 0.013 0.024
  .. .. ..- attr(*, "names")= chr [1:5] "user.self" "sys.self" "elapsed" "user.child" ...
  .. ..$ variant_names             : chr [1:6454] "chr11:84957209:G:C" "chr11:84958236:C:A" "chr11:84958954:T:C" "chr11:84959602:T:C" ...
  .. ..$ region_info               :List of 3
  .. .. ..$ region_coord:'data.frame':	1 obs. of  3 variables:
  .. .. .. ..$ chrom: chr "11"
  .. .. .. ..$ start: int 85957174
  .. .. .. ..$ end  : int 86069881
  .. .. ..$ grange      :'data.frame':	1 obs. of  3 variables:
  .. .. .. ..$ chrom: chr "11"
  .. .. .. ..$ start: int 84957175
  .. .. .. ..$ end  : int 87360000
  .. .. ..$ region_name : chr [1:2] "ENSG00000073921" "ENSG00000073921"

Minimal Working Example Steps#

i. Run the Fine-Mapping and TWAS with SuSiE #

sos run pipeline/mnm_regression.ipynb susie_twas \
    --name test_susie_twas \
    --genoFile output/genotype_by_chrom/wgs.merged.plink_qc.1.bed \
    --phenoFile output/phenotype/phenotype_by_chrom_for_cis/bulk_rnaseq.phenotype_by_chrom_files.region_list.txt \
    --covFile output/covariate/bulk_rnaseq_tmp_matrix.low_expression_filtered.outlier_removed.tmm.expression.covariates.wgs.merged.plink_qc.plink_qc.prune.pca.Marchenko_PC.gz \
    --customized-association-windows reference_data/TAD/TADB_enhanced_cis.bed \
    --phenotype-names test_pheno \
    --max-cv-variants 5000 --ld_reference_meta_file data/ld_meta_file_with_bim.tsv \
    --region-name ENSG00000049246 ENSG00000054116 ENSG00000116678 \
    --save-data \
    --cwd output/mnm_regression/susie_twas

INFO: Running get_analysis_regions: 
Loading customized association analysis window from reference_data/TAD/TADB_enhanced_cis.bed
INFO: get_analysis_regions is completed.
INFO: get_analysis_regions output:   regional_data
INFO: Running susie_twas: 
INFO: susie_twas (index=0) is completed.
INFO: susie_twas (index=1) is completed.
INFO: susie_twas (index=2) is completed.
INFO: susie_twas output:   /restricted/projectnb/xqtl/xqtl_protocol/toy_xqtl_protocol/output/mnm_regression/susie_twas/fine_mapping/test_susie_twas.chr1_ENSG00000049246.univariate_bvsr.rds /restricted/projectnb/xqtl/xqtl_protocol/toy_xqtl_protocol/output/mnm_regression/susie_twas/twas_weights/test_susie_twas.chr1_ENSG00000049246.univariate_twas_weights.rds... (6 items in 3 groups)
INFO: Workflow susie_twas (ID=w487c342397d09ccb) is executed successfully with 2 completed steps and 4 completed substeps.

Anticipated Results#

Univariate finemapping will produce a file containing results for the top hits and a file containing twas weights produced by susie.

ROSMAP_mega_eQTL.chr11_ENSG00000073921.univariate_bvrs.rds:

For each gene of interest and phenotype, this file contains:
1. susie_fitted
2. variant_names
3. analysis_script
4. other quantities - information on dropped samples, if any
5. summary statistics (beta and se) for each variant
6. sample names
7. summary statistics for top loci
8. susie results trimmed - includes pip, sets, cs_corr, sets_secondary, alpha, lbf_variable, V, niter, X_column_scale_factors, mu, mu2
9. total time elapsed
10. region info
11. preset_variants_results

ROSMAP_mega_eQTL.chr11_ENSG00000073921.univariate_data.rds (from the –save-data argument):

For each gene of interest, contains residuals for each sample and phenotype
see pecotmr code for description at load_regional_association_data function

ROSMAP_mega_eQTL.chr11_ENSG00000073921.univariate_twas_weights.rds:

For each gene of interest and phenotype, this file contains:
1. twas_weights - weights for enet, lasso, bayes r, mrash and susie methods
2. twas_predictions - twas predictions for enet, lasso, bayes r, mrash and susie methods
3. twas_cv_result
4. total_time_elapsed
5. variant_names
6. region_info