Integrative Analysis with High-Dimensional Regression#
This notebook shows the various fine-mapping, prediction, multivariate analysis and colocalization methods available in our pipeline.
Miniprotocol Timing#
This is the total duration of all miniprotocol phases. Module-specific timings are listed on each module's own page and are already included in this overall estimate.
Timing < X minutes
Overview#
Each of these parts is independent of the others.
mnm_regression.ipynb susie_twas
: Univariate Fine-Mapping and TWAS with SuSiE
mnm_regression.ipynb mnm_genes
: Multivariate Fine-Mapping for multiple genes
mnm_regression.ipynb fsusie
: Univariate Fine-Mapping of Functional (Epigenomic) Data with fSuSiE
mnm_regression.ipynb mnm
: Multivariate Fine-Mapping with mvSuSiE and mr.mash
rss_analysis.ipynb univariate_rss
: Regression with Summary Statistics (RSS) Fine-Mapping and TWAS with SuSiE
Steps#
i. Univariate Fine-Mapping and TWAS with SuSiE#
sos run pipeline/mnm_regression.ipynb susie_twas \
--name test_susie_twas \
--genoFile output/genotype_by_chrom/wgs.merged.plink_qc.1.bed \
--phenoFile output/phenotype/phenotype_by_chrom_for_cis/bulk_rnaseq.phenotype_by_chrom_files.region_list.txt \
--covFile output/covariate/bulk_rnaseq_tmp_matrix.low_expression_filtered.outlier_removed.tmm.expression.covariates.wgs.merged.plink_qc.plink_qc.prune.pca.Marchenko_PC.gz \
--customized-association-windows reference_data/TAD/TADB_enhanced_cis.bed \
--phenotype-names test_pheno \
--max-cv-variants 5000 --ld_reference_meta_file data/ld_meta_file_with_bim.tsv \
--region-name ENSG00000049246 ENSG00000054116 ENSG00000116678 \
--save-data \
--cwd output/mnm_regression/susie_twas
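The command above reads several input files, and a mistyped path is easier to catch before the workflow starts. The following is a minimal sketch of such a pre-flight check; `missing_inputs` is a hypothetical helper that is not part of the pipeline, and the paths are simply those passed to the command above.

```shell
# Hypothetical helper, not part of the pipeline:
# print every path that does not exist and return non-zero if any is missing.
missing_inputs() {
    _status=0
    for _f in "$@"; do
        if [ ! -e "$_f" ]; then
            echo "missing: $_f"
            _status=1
        fi
    done
    return "$_status"
}

# Paths copied from the susie_twas command above.
missing_inputs \
    output/genotype_by_chrom/wgs.merged.plink_qc.1.bed \
    output/phenotype/phenotype_by_chrom_for_cis/bulk_rnaseq.phenotype_by_chrom_files.region_list.txt \
    output/covariate/bulk_rnaseq_tmp_matrix.low_expression_filtered.outlier_removed.tmm.expression.covariates.wgs.merged.plink_qc.plink_qc.prune.pca.Marchenko_PC.gz \
    reference_data/TAD/TADB_enhanced_cis.bed \
    data/ld_meta_file_with_bim.tsv \
    || echo "resolve the missing paths before calling sos run"
```

The same function can be reused for the other steps by passing the paths from their commands.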
ii. Multivariate Fine-Mapping for multiple genes#
sos run $PATH/protocol/pipeline/mnm_regression.ipynb mnm_genes \
--name ROSMAP_Ast_DeJager_eQTL \
--genoFile $PATH/ROSMAP_NIA_WGS.leftnorm.bcftools_qc.plink_qc.11.bed \
--phenoFile $PATH/snuc_pseudo_bulk.Ast.mega.normalized.log2cpm.region_list.txt \
--covFile $PATH/snuc_pseudo_bulk.Ast.mega.normalized.log2cpm.rosmap_cov.ROSMAP_NIA_WGS.leftnorm.bcftools_qc.plink_qc.snuc_pseudo_bulk_mega.related.plink_qc.extracted.pca.projected.Marchenko_PC.gz \
--customized-association-windows $PATH/windows/TADB_sliding_window.bed \
--phenotype-names Ast_DeJager_eQTL \
--max-cv-variants 5000 --ld_reference_meta_file $PATH/ldref/ld_meta_file.tsv \
--independent_variant_list $PATH/ld_pruned_variants.txt.gz \
--fine_mapping_meta $PATH/Fungen_xQTL.cis_results_db.new.tsv \
--phenoIDFile $PATH/phenoIDFile_cis_extended_region.bed \
--skip-analysis-pip-cutoff 0 \
--coverage 0.95 \
--maf 0.01 \
--pheno_id_map_file $PATH/pheno_id_map_file.txt \
--prior-canonical-matrices \
--save-data \
--twas-cv-folds 0 \
--trans-analysis \
--region-name chr11_77324757_86627922 \
--cwd $PATH/output/ -s force
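Every location in the command above is written relative to `$PATH`, which appears to stand in for a project root directory. Since `PATH` is also the variable the shell uses to locate executables, assigning a directory to it would prevent `sos` itself from being found. One option, sketched below, is to rewrite a saved copy of the command so that it refers to a differently named variable. Both `XQTL_ROOT` and `rename_placeholder` are hypothetical names chosen for illustration.

```shell
# Hypothetical helper: read command text on stdin and replace the literal
# string "$PATH" with "$XQTL_ROOT" (an assumed variable name), leaving
# everything else untouched.
rename_placeholder() {
    sed 's/\$PATH/$XQTL_ROOT/g'
}

# Example on the last line of the command above; the single quotes keep
# the shell from expanding $PATH before sed sees it.
printf '%s\n' '--cwd $PATH/output/ -s force' | rename_placeholder
```

After the rewrite, `XQTL_ROOT` can be set to the actual project root before running the command, and the shell's own `PATH` stays intact.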
iii. Univariate Fine-Mapping of Functional (Epigenomic) Data with fSuSiE#
sos run pipeline/mnm_regression.ipynb fsusie \
--cwd output/fsusie/ \
--name test_fsusie \
--genoFile output/genotype_by_chrom/wgs.merged.plink_qc.genotype_by_chrom_files.txt \
--phenoFile output/phenotype/phenotype_by_chrom_for_cis/bulk_rnaseq.phenotype_by_chrom_files.region_list.txt \
--covFile output/covariate/bulk_rnaseq_tmp_matrix.low_expression_filtered.outlier_removed.tmm.expression.covariates.wgs.merged.plink_qc.plink_qc.prune.pca.Marchenko_PC.gz \
--numThreads 8 \
--customized-association-windows reference_data/TAD/TADB_enhanced_cis.bed \
--save-data \
--region-name ENSG00000186891
iv. Multivariate Fine-Mapping with mvSuSiE and mr.mash#
sos run pipeline/mnm_regression.ipynb mnm \
--name test_mnm --cwd output/mnm \
--genoFile output/genotype_by_chrom/wgs.merged.plink_qc.genotype_by_chrom_files.txt \
--phenoFile output/phenotype/phenotype_by_chrom_for_cis/bulk_rnaseq.phenotype_by_chrom_files.region_list.txt \
--covFile output/covariate/bulk_rnaseq_tmp_matrix.low_expression_filtered.outlier_removed.tmm.expression.covariates.wgs.merged.plink_qc.plink_qc.prune.pca.Marchenko_PC.gz \
--customized-association-windows reference_data/TADB_enhanced_cis.coding.bed \
--region-name ENSG00000073921 --save_data --no-skip-twas-weights \
--phenotype-names test_pheno \
--mixture_prior output/multivariate_mixture/MWE_ed_bovy.EE.prior.rds \
--max_cv_variants 5000 \
--ld_reference_meta_file data/ld_meta_file.tsv
v. Regression with Summary Statistics (RSS) Fine-Mapping and TWAS with SuSiE#
sos run pipeline/rss_analysis.ipynb univariate_rss \
--ld-meta-data data/ld_meta_file_with_bim.tsv \
--gwas-meta-data data/mnm_regression/gwas_meta_data.txt \
--qc_method "rss_qc" --impute \
--finemapping_method "susie_rss" \
--cwd output/rss_analysis \
--skip_analysis_pip_cutoff 0 \
--skip_regions 6:25000000-35000000 \
--region_name 22:49355984-50799822
Anticipated Results#
Univariate fine-mapping produces a file containing results for the top hits and a file containing the TWAS weights produced by SuSiE. Multigene fine-mapping with mvSuSiE produces, for each gene and region, a file containing results for the top hits and a file containing the TWAS weights produced by SuSiE. Univariate fine-mapping of functional data with fSuSiE produces a file containing results for the top hits and a file containing the residuals from SuSiE. Multivariate fine-mapping produces, for each gene, a file containing results for the top hits and a file containing the TWAS weights produced by SuSiE. Summary-statistics fine-mapping produces a results file for each region and GWAS of interest.
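A quick way to confirm that a step wrote anything is to count the files under the directory passed as `--cwd`. The sketch below assumes outputs land under those directories; `count_outputs` is a hypothetical helper, and no specific output file names are assumed.

```shell
# Hypothetical helper: report how many regular files exist under each
# directory, or say so when a directory has not been created.
count_outputs() {
    for _d in "$@"; do
        if [ -d "$_d" ]; then
            # tr strips the padding some wc implementations add
            _n=$(find "$_d" -type f | wc -l | tr -d ' ')
            echo "$_d: $_n files"
        else
            echo "$_d: not created yet"
        fi
    done
}

# Working directories taken from the --cwd arguments of the steps above.
count_outputs \
    output/mnm_regression/susie_twas \
    output/fsusie \
    output/mnm \
    output/rss_analysis
```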