Multivariate Fine-Mapping for multiple genes#
Description#
Multi gene fine-mapping and TWAS may also be conducted with our pipeline. This considers multiple genes jointly within specific TAD windows.
This step is similar to the multivariate fine-mapping with two main differences. 1) TAD windows with multiple genes need to be defined. The --pheno_id_map_file parameter is used for this. 2) To speed things up, the genes are filtered out if they don’t have a univariate fine mapped region. Genes may also be filtered out if they do have a univariate fine-mapped signal, but the signal is nowhere close to that of other genes. The --skip-analysis-pip-cutoff parameter is used for this.
Input#
--genoFile: path to a plink bed file containin genotypes. Include the .bed
--phenoFile: a tab delimited file containing chr, start, end, ID and path for the regions. For example:
#chr start end ID path
chr12 389319 389320 ENSG00000073614 $PATH/snuc_pseudo_bulk.Mic.mega.normalized.log2cpm.bed.gz
chr12 752578 752579 ENSG00000060237 $PATH/snuc_pseudo_bulk.Mic.mega.normalized.log2cpm.bed.gz
--covFile: path to a gzipped file containing covariates in the rows, and sample ids in the columns.
--customized-association-windows: a tab delimited file containing chr, start, end, and ID regions. For example:
#chr start end TAD_id
chr1 0 10985501 chr1_0_10985501
chr1 5101787 11630758 chr1_5101787_11630758
--phenotype-names: the names of the phenotypes if multiple are included. There should be one for each phenotype file you include.
--max-cv-variants: maximum number of variants for cross-validation.
--ld_reference_meta_file: path to file containing chrom, start, end and path for linkage disequilibrium region information. For example:
#chrom start end path
chr1 101384274 104443097 chr1/chr1_101384274_104443097.cor.xz,chr1/chr1_101384274_104443097.cor.xz.bim
chr1 104443097 106225286 chr1/chr1_104443097_106225286.cor.xz,chr1/chr1_104443097_106225286.cor.xz.bim
--independent_variant_list: a gzipped file containing variant information. These should be independent from one another in terms of linkage disequilibrium.
For example:
chrom pos alt ref variant_id
chr1 16206 T A chr1:16206:T:A
chr1 16433 C G chr1:16433:C:G
--fine_mapping_meta: A file containg a list of gene and region information and other conditions.
For example:
#chr start end region_id TSS original_data combined_data combined_data_sumstats conditions conditions_top_loci
chr1 0 6480000 ENSG00000008128 1724356 KNIGHT_pQTL.ENSG00000008128.univariate_susie_twas_weights.rds,MiGA_eQTL.ENSG00000008128.univariate_susie_twas_weights.rds,MSBB_eQTL.ENSG00000008128.univariate_susie_twas_weights.rds,ROSMAP_Bennett_Klein_pQTL.ENSG00000008128.univariate_susie_twas_weights.rds,ROSMAP_DeJager_eQTL.ENSG00000008128.univariate_susie_twas_weights.rds,ROSMAP_Kellis_eQTL.ENSG00000008128.univariate_susie_twas_weights.rds,ROSMAP_mega_eQTL.ENSG00000008128.univariate_susie_twas_weights.rds,STARNET_eQTL.ENSG00000008128.univariate_susie_twas_weights.rds $PATH/Fungen_xQTL.ENSG00000008128.cis_results_db.export.rds $PATH/Fungen_xQTL.ENSG00000008128.cis_results_db.export_sumstats.rds Knight_eQTL_brain,MiGA_GFM_eQTL,MiGA_GTS_eQTL,MiGA_SVZ_eQTL,MiGA_THA_eQTL,BM_10_MSBB_eQTL,BM_22_MSBB_eQTL,BM_36_MSBB_eQTL,BM_44_MSBB_eQTL,monocyte_ROSMAP_eQTL,Mic_DeJager_eQTL,Ast_DeJager_eQTL,Oli_DeJager_eQTL,Exc_DeJager_eQTL,Inh_DeJager_eQTL,DLPFC_DeJager_eQTL,PCC_DeJager_eQTL,AC_DeJager_eQTL,Mic_Kellis_eQTL,Ast_Kellis_eQTL,Oli_Kellis_eQTL,OPC_Kellis_eQTL,Exc_Kellis_eQTL,Inh_Kellis_eQTL,Ast_mega_eQTL,Exc_mega_eQTL,Inh_mega_eQTL,Oli_mega_eQTL,STARNET_eQTL_Mac Knight_eQTL_brain,MiGA_GFM_eQTL,MiGA_GTS_eQTL,MiGA_SVZ_eQTL,MiGA_THA_eQTL,BM_10_MSBB_eQTL,BM_22_MSBB_eQTL,BM_36_MSBB_eQTL,BM_44_MSBB_eQTL,monocyte_ROSMAP_eQTL,Mic_DeJager_eQTL,Ast_DeJager_eQTL,Oli_DeJager_eQTL,Exc_DeJager_eQTL,Inh_DeJager_eQTL,DLPFC_DeJager_eQTL,PCC_DeJager_eQTL,AC_DeJager_eQTL,Mic_Kellis_eQTL,Ast_Kellis_eQTL,Oli_Kellis_eQTL,OPC_Kellis_eQTL,Exc_Kellis_eQTL,Inh_Kellis_eQTL,Ast_mega_eQTL,Exc_mega_eQTL,Inh_mega_eQTL,Oli_mega_eQTL,STARNET_eQTL_Mac
--phenoIDFile: A bed file containing a list of genes and their LD region.
For example:
TAD_id ID
chr19_0_13957223 ENSG00000172270
chr19_0_13957223 ENSG00000099864
chr19_0_13957223 ENSG00000011304
--skip-analysis-pip-cutoff: A number of the pip cutoff.
--coverage
--maf
--pheno_id_map_file: A file containing IDs and genes.
For example:
ID gene
chr20:50940933:50941105:clu_44490_-:ENSG00000000419 ENSG00000000419
chr20:50940933:50941129:clu_44490_-:ENSG00000000419 ENSG00000000419
chr20:50936262:50942031:clu_44490_-:ENSG00000000419 ENSG00000000419
--prior-canonical-matrices
--save-data: whether to save intermediate data or not
--twas-cv-folds: Perform K folds valiation CV for TWAS. Set this to zero to skip
--trans-analysis: Include this if doing trans-analysis (not using phenotypic coordinate information)
--region-name: if you only wish to analyze one region, then include the ID of a region found in the customized-association-windows file
--cwd: output file path
--genoFile: a PLINK.bedgenotype file (include the.bedextension); the.bim/.famfiles must sit alongside it.--phenoFile: a tab-delimited region file with columnschr,start,end,ID, andpath, one row per region, wherepathpoints to that region’s molecular-phenotype data.--region-name: the region/gene group to fine-map jointly (e.g.ROSMAP_Ast_mega_eQTL).
{region_name}.{chr}_{region_id}.multigene_bvsr.rds— one RDS per region, the fitted multivariate (mvSuSiE / mr.mash) Bayesian variable-selection model jointly fine-mapping all genes in that region.
Inspecting the object shows the per-region fit (posterior effect estimates, PVE, credible sets and convergence trace):
> str(readRDS("ROSMAP_Ast_mega_eQTL.chr11_chr11_77324757_82556425.multigene_bvsr.rds"))
List of 1
$ chr11_77324757_82556425:List of 12
..$ mrmash_fitted :List of 14
.. ..$ mu1 : num [1:14830, 1:5] 3.02e-06 3.81e-06 -1.59e-05 -1.59e-05 -1.36e-05 ...
.. .. ..- attr(*, "dimnames")=List of 2
.. .. .. ..$ : chr [1:14830] "chr11:77325990:C:T" "chr11:77326116:G:A" "chr11:77326354:A:G" "chr11:77326475:G:C" ...
.. .. .. ..$ : chr [1:5] "ENSG00000074201" "ENSG00000048649" "ENSG00000159063" "ENSG00000188997" ...
.. ..$ S1 : num [1:5, 1:5, 1:14830] 6.07e-07 1.55e-09 1.78e-09 1.62e-09 1.44e-09 ...
.. .. ..- attr(*, "dimnames")=List of 3
.. .. .. ..$ : chr [1:5] "ENSG00000074201" "ENSG00000048649" "ENSG00000159063" "ENSG00000188997" ...
.. .. .. ..$ : chr [1:5] "ENSG00000074201" "ENSG00000048649" "ENSG00000159063" "ENSG00000188997" ...
.. .. .. ..$ : chr [1:14830] "chr11:77325990:C:T" "chr11:77326116:G:A" "chr11:77326354:A:G" "chr11:77326475:G:C" ...
.. ..$ w1 : num [1:14830, 1:50] 0.999 0.998 0.997 0.997 0.999 ...
.. .. ..- attr(*, "dimnames")=List of 2
.. .. .. ..$ : chr [1:14830] "chr11:77325990:C:T" "chr11:77326116:G:A" "chr11:77326354:A:G" "chr11:77326475:G:C" ...
.. .. .. ..$ : chr [1:50] "null" "singleton1_grid1" "singleton1_grid2" "singleton1_grid3" ...
.. ..$ V : num [1:5, 1:5] 0.40797 0.00349 0.016 -0.01287 0.00417 ...
.. .. ..- attr(*, "dimnames")=List of 2
.. .. .. ..$ : chr [1:5] "ENSG00000074201" "ENSG00000048649" "ENSG00000159063" "ENSG00000188997" ...
.. .. .. ..$ : chr [1:5] "ENSG00000074201" "ENSG00000048649" "ENSG00000159063" "ENSG00000188997" ...
.. ..$ w0 : Named num [1:50] 9.98e-01 7.08e-05 1.12e-04 2.20e-04 4.38e-04 ...
.. .. ..- attr(*, "names")= chr [1:50] "null" "singleton1_grid1" "singleton1_grid2" "singleton1_grid3" ...
.. ..$ S0 : num [1:5, 1:5, 1:50] 1e-08 0e+00 0e+00 0e+00 0e+00 0e+00 1e-08 0e+00 0e+00 0e+00 ...
.. .. ..- attr(*, "dimnames")=List of 3
.. .. .. ..$ : chr [1:5] "ENSG00000074201" "ENSG00000048649" "ENSG00000159063" "ENSG00000188997" ...
.. .. .. ..$ : chr [1:5] "ENSG00000074201" "ENSG00000048649" "ENSG00000159063" "ENSG00000188997" ...
.. .. .. ..$ : chr [1:50] "null" "singleton1_grid1" "singleton1_grid2" "singleton1_grid3" ...
.. ..$ intercept : Named num [1:5] 0.0913 -0.0112 -0.3494 -0.1176 -0.1019
... (structure truncated; inspect the full object in R) ...
Step 1: Run the fine-mapping with mvSuSiE#
Run the multi-gene fine-mapping with mvSuSiE. See the mnm_regression documentation for method details.
# Step 1: build the QtlDataset RDS for the joint-gene region.
sos run pipeline/qtl_dataset.ipynb qtl_dataset_construct \
--cwd output/qtl_dataset \
--study ROSMAP_Ast_mega_eQTL \
--genotype-prefix data/mnm_genes/ROSMAP_NIA_WGS.leftnorm.bcftools_qc.plink_qc.11 \
--phenotype-manifest data/mnm_genes/snuc_pseudo_bulk.Ast.mega.pheno_manifest.tsv
# Step 2: per-region mvSuSiE joint fine-mapping. Use a region span that
# covers multiple genes (multi-trait joint mvSuSiE).
sos run pipeline/multivariate_fine_mapping.ipynb multivariate_fine_mapping \
--cwd output/mnm_genes \
--study ROSMAP_Ast_mega_eQTL \
--qtl-dataset output/qtl_dataset/ROSMAP_Ast_mega_eQTL.qtl_dataset.rds \
--regions chr11:65000000-67000000
For each gene and region, multivariate multigene finemapping will produce a file containing results for the top hits and a file containing twas weights produced by susie.
Output#
ROSMAP_Ast_DeJager_eQTL.chr11_77324757_86627922.multigene_bvrs.rds:
for each region name, includes:
mrmash_fitted
reweighted_mixture_prior
reweighted_mixture_prior_cv
mvsusie_fitted
variant_names
analysis_script
other_quantitites
analysis_script
top_loci
susie_result_trimmed
total_time_elapsed
region_info
ROSMAP_Ast_DeJager_eQTL.chr11_77324757_86627922.multigene_data.rds:(from the –save-data argument)
see pecotmr code for description
ROSMAP_Ast_DeJager_eQTL.chr11_77324757_86627922.multigene_twas_weights.rds:
for each region name and for each gene within that region, includes:
twas_weights - from mrmash and mvsusie
twas_predictions - from mrmash and mvsusie
variant_names
total_time_elapsed
region_info