Integrative Analysis with High-Dimensional Regression

This notebook demonstrates the fine-mapping, prediction, multivariate analysis, and colocalization methods available in our pipeline.

Miniprotocol Timing

This is the total running time across all miniprotocol steps. Timings for individual modules are reported separately on their respective pages and are included in this overall estimate.

Timing < X minutes

Overview

Each of these parts is independent of the others.

  1. mnm_regression.ipynb susie_twas: Univariate Fine-Mapping and TWAS with SuSiE

  2. mnm_regression.ipynb mnm_genes: Multivariate Fine-Mapping for multiple genes

  3. mnm_regression.ipynb fsusie: Univariate Fine-Mapping of Functional (Epigenomic) Data with fSuSiE

  4. mnm_regression.ipynb mnm: Multivariate Fine-Mapping with mvSuSiE and mr.mash

  5. rss_analysis.ipynb univariate_rss: Regression with Summary Statistics (RSS) Fine-Mapping and TWAS with SuSiE

Steps

i. Univariate Fine-Mapping and TWAS with SuSiE

sos run pipeline/mnm_regression.ipynb susie_twas \
    --name test_susie_twas \
    --genoFile output/genotype_by_chrom/wgs.merged.plink_qc.1.bed \
    --phenoFile output/phenotype/phenotype_by_chrom_for_cis/bulk_rnaseq.phenotype_by_chrom_files.region_list.txt \
    --covFile output/covariate/bulk_rnaseq_tpm_matrix.low_expression_filtered.outlier_removed.tmm.expression.covariates.wgs.merged.plink_qc.plink_qc.prune.pca.Marchenko_PC.gz \
    --customized-association-windows reference_data/TAD/TADB_enhanced_cis.bed \
    --phenotype-names test_pheno \
    --max-cv-variants 5000 --ld_reference_meta_file data/ld_meta_file_with_bim.tsv \
    --region-name ENSG00000049246 ENSG00000054116 ENSG00000116678 \
    --save-data \
    --cwd output/mnm_regression/susie_twas
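
Because the command above depends on several input files, it can help to confirm they exist before launching a run. The following is a minimal sketch, not part of the pipeline: check_inputs is a hypothetical helper, and the paths are copied from the susie_twas command, so adjust them if your layout differs.

```shell
# Hypothetical helper (not part of the pipeline): report missing input paths.
# Returns 0 when every path exists, 1 otherwise.
check_inputs() {
    rc=0
    for f in "$@"; do
        if [ ! -e "$f" ]; then
            echo "missing: $f" >&2
            rc=1
        fi
    done
    return "$rc"
}

# Paths copied from the susie_twas command above.
if check_inputs \
    output/genotype_by_chrom/wgs.merged.plink_qc.1.bed \
    output/phenotype/phenotype_by_chrom_for_cis/bulk_rnaseq.phenotype_by_chrom_files.region_list.txt \
    reference_data/TAD/TADB_enhanced_cis.bed \
    data/ld_meta_file_with_bim.tsv
then
    echo "all inputs found"
else
    echo "fix the paths listed above before running sos"
fi
```

The same helper can be reused for the other steps by passing the paths from their commands.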

ii. Multivariate Fine-Mapping for multiple genes

sos run pipeline/mnm_regression.ipynb mnm_genes \
    --name ROSMAP_mega_eQTL --cwd output/mnm_regression/mnm_genes \
    --genoFile output/genotype_by_chrom/wgs.merged.plink_qc.genotype_by_chrom_files.txt \
    --phenoFile output/phenotype/phenotype_by_chrom_for_cis/bulk_rnaseq.phenotype_by_chrom_files.region_list.txt \
    --covFile output/covariate/bulk_rnaseq_tpm_matrix.low_expression_filtered.outlier_removed.tmm.expression.covariates.wgs.merged.plink_qc.plink_qc.prune.pca.Marchenko_PC.gz \
    --customized-association-windows reference_data/TAD/TADB_enhanced_cis.bed \
    --region-name ENSG00000049246 ENSG00000054116 ENSG00000116678 ENSG00000073921 \
    --phenotype-names test_pheno \
    --max_cv_variants 5000 --skip-analysis-pip-cutoff 0.025 \
    --ld_reference_meta_file data/ld_meta_file_with_bim.tsv \
    --pheno-id-map-file data/mnm_regression/pheno_id_map_file.tsv \
    --fine-mapping-meta data/mnm_regression/fine_mapping_meta.tsv

iii. Univariate Fine-Mapping of Functional (Epigenomic) Data with fSuSiE

sos run pipeline/mnm_regression.ipynb fsusie \
    --cwd output/fsusie/ \
    --name test_fsusie \
    --genoFile output/genotype_by_chrom/wgs.merged.plink_qc.genotype_by_chrom_files.txt \
    --phenoFile output/phenotype/phenotype_by_chrom_for_cis/bulk_rnaseq.phenotype_by_chrom_files.region_list.txt \
    --covFile output/covariate/bulk_rnaseq_tpm_matrix.low_expression_filtered.outlier_removed.tmm.expression.covariates.wgs.merged.plink_qc.plink_qc.prune.pca.Marchenko_PC.gz \
    --numThreads 8 \
    --customized-association-windows reference_data/TAD/TADB_enhanced_cis.bed \
    --save-data \
    --region-name ENSG00000049246 ENSG00000054116 ENSG00000116678 ENSG00000073921 ENSG00000186891

iv. Multivariate Fine-Mapping with mvSuSiE and mr.mash

sos run pipeline/mnm_regression.ipynb mnm \
    --name test_mnm --cwd output/mnm \
    --genoFile output/genotype_by_chrom/wgs.merged.plink_qc.genotype_by_chrom_files.txt \
    --phenoFile output/phenotype/phenotype_by_chrom_for_cis/bulk_rnaseq.phenotype_by_chrom_files.region_list.txt \
    --covFile output/covariate/bulk_rnaseq_tpm_matrix.low_expression_filtered.outlier_removed.tmm.expression.covariates.wgs.merged.plink_qc.plink_qc.prune.pca.Marchenko_PC.gz \
    --customized-association-windows reference_data/TAD/TADB_enhanced_cis.bed \
    --region-name ENSG00000073921 --save-data --no-skip-twas-weights \
    --phenotype-names test_pheno \
    --mixture_prior output/multivariate_mixture/MWE_ed_bovy.EE.prior.rds \
    --max_cv_variants 5000 \
    --ld_reference_meta_file data/ld_meta_file.tsv

v. Regression with Summary Statistics (RSS) Fine-Mapping and TWAS with SuSiE

sos run pipeline/rss_analysis.ipynb univariate_rss \
    --ld-meta-data data/ld_meta_file_with_bim.tsv \
    --gwas-meta-data data/mnm_regression/gwas_meta_data.txt \
    --qc_method "rss_qc" --impute \
    --finemapping_method "susie_rss" \
    --cwd output/rss_analysis \
    --skip_analysis_pip_cutoff 0 \
    --skip_regions 6:25000000-35000000 \
    --region_name 22:49355984-50799822
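
The region arguments in the command above use a chr:start-end format. As a quick sanity check before a run, the sketch below tests whether a requested region overlaps a skipped one. regions_overlap is a hypothetical helper, and treating any overlap as the skip criterion is an assumption, since the pipeline's own rule is not stated here.

```shell
# Hypothetical helper: succeed (status 0) when two chr:start-end regions overlap.
regions_overlap() {
    a_chr=${1%%:*}; a_rng=${1#*:}
    b_chr=${2%%:*}; b_rng=${2#*:}
    a_start=${a_rng%-*}; a_end=${a_rng#*-}
    b_start=${b_rng%-*}; b_end=${b_rng#*-}
    # Same chromosome, and each interval starts before the other one ends.
    [ "$a_chr" = "$b_chr" ] && [ "$a_start" -le "$b_end" ] && [ "$b_start" -le "$a_end" ]
}

# Values taken from --region_name and --skip_regions above.
if regions_overlap 22:49355984-50799822 6:25000000-35000000; then
    echo "region overlaps the skipped interval"
else
    echo "region does not overlap the skipped interval"
fi
```

With the values from the command, the two regions lie on different chromosomes, so the second message is printed.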

Anticipated Results

The steps above produce the following outputs:

  1. Univariate fine-mapping produces a file containing results for the top hits and a file containing TWAS weights produced by SuSiE.

  2. Multigene fine-mapping with mvSuSiE produces a file for each gene and region containing results for the top hits, and a file containing TWAS weights produced by SuSiE.

  3. Univariate fine-mapping of functional data with fSuSiE produces a file containing results for the top hits and a file containing residuals from SuSiE.

  4. Multivariate fine-mapping produces a file containing results for the top hits for each gene and a file containing TWAS weights produced by SuSiE.

  5. Summary statistics fine-mapping produces a results file for each region and GWAS of interest.
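
To confirm that each step wrote something, one option is to count the files under each working directory. This is a minimal sketch that assumes results land under the --cwd directories used in the commands above. count_outputs is a hypothetical helper, and no output file names are assumed.

```shell
# Hypothetical helper: print the number of regular files under each directory.
count_outputs() {
    for dir in "$@"; do
        if [ -d "$dir" ]; then
            n=$(find "$dir" -type f | wc -l | tr -d ' ')
            echo "$dir: $n file(s)"
        else
            echo "$dir: not created yet"
        fi
    done
}

# Directories taken from the --cwd arguments of the five commands.
count_outputs \
    output/mnm_regression/susie_twas \
    output/mnm_regression/mnm_genes \
    output/fsusie \
    output/mnm \
    output/rss_analysis
```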