Hierarchical Multiple Testing#
Description [FIXME]#
This protocol implements a three-step procedure:
Local adjustment: p-values of all cis-variants adjusted within each gene
Global adjustment: minimum adjusted p-values from Step 1 further adjusted across all genes
Global informed identification of significant xQTL: xQTL with locally adjusted p-value below the threshold
It also reorganizes intermediate files produced in tensorQTL analysis to specified archive folder for book-keeping or deletion, to save space.
Input#
Output from TensorQTL in the form of
*.cis_qtl.regional.tsv.gz
filesA gene coordinates tsv file with
chr
,start
,end
andgene_id
columns
Output#
*significant_qtl.filtered_bonferroni_BH_adjusted.tsv.gz
*significant_qtl.original_bonferroni_BH_adjusted.tsv.gz
*significant_qtl.permutation_adjusted.tsv.gz
*significant_qtl.q_beta_adjusted_events_qvalue.tsv.gz
*significant_events.filtered_bonferroni_BH_adjusted.tsv.gz
*significant_events.original_bonferroni_BH_adjusted.tsv.gz
*significant_events.permutation_adjusted.tsv.gz
*summary.txt
*multiple_testing_consolidated.rds
Minimal Working Example Steps#
i. Genetic Effect#
With
permutation testing
recounted number of variants limited to MAF and cis-windows for Bonferroni correction
We recommend reassessing significance of Bonferroni correction if previous analysis use too large of window or contain many low frequency variants. Below we reassess with MAF > 0.01 and cis-window size 1000000 up and downstreams of TSS/TES.
When maf_cutoff
and cis_window
are not zero, the program will first compute the number of variants after filtering by these metric and write files with n_variants_suffix
in the same folder as QTL data, then use those numbers to create filtered list of variants for Bonferroni adjusted p-value.
sos run pipeline/qtl_association_postprocessing.ipynb default \
--cwd output/tensorqtl_cis \
--gene-coordinates data/look_up_gene_id.tsv \
--sub-dir . \
--tss_dist_col tss_distance \
--tes_dist_col tes_distance \
--pecotmr-path pecotmr \
--maf-cutoff 0.01 --cis-window 1000000 \
--regional-pattern "*.cis_qtl.regional.tsv.gz$" \
--output-dir output/heirarchical_multi_test/output \
--archive-dir output/heirarchical_multi_test/archive \
-s force
ii. Interaction Effect#
With EigenMT method to get us the effective number of tests, it is recommended setting maf_cutoff
and cis_window
both to zero to tell the program to not recompute counts for Bonferroni correction,
The above approach seems recommended by tensorqtl
package because it is their default option for interaction QTL analysis. We will also adopt it as a recommended practice.
To keep using variant count based Bonferroni correct, simply remove --regional-pattern
,
sos run ~/GIT/xqtl-protocol/code/association_scan/qtl_association_postprocessing.ipynb default \
--gene-coordinates look_up_gene_id.tsv \
--cwd ~/Downloads/snuc_DeJager_Ast_tensorQTL_MWE --sub-dir "interaction/msex" \
--maf-cutoff 0 --cis-window 0 \
--pecotmr-path /path/to/pecotmr \
--regional-pattern "*.cis_qtl_top_assoc.txt.gz$" \
--pvalue-pattern "pvalue_.*_interaction" \
--qvalue-pattern "qvalue_interaction" \
--output-dir ~/output --archive-dir ~/archive -s force \
--fdr-threshold 0.3 # set it large for illustration purpose
iii. Quantile QTL#
Here we summarize the significant QTL using both original variants as well as additionally considering MAF>0.05 and cis-window 1Mb around a gene, for Bonferroni adjusted p-value,
sos run ~/GIT/xqtl-protocol/code/association_scan/qtl_association_postprocessing.ipynb default \
--gene-coordinates look_up_gene_id.tsv \
--cwd ~/Downloads/QR_ROSMAP_Ast_mega_quantile_tensorQTL_MWE --sub-dir "." \
--maf-cutoff 0.05 --cis-window 1000000 \
--pecotmr-path /path/to/pecotmr \
--af-col maf --tss-dist-col feature_tss --tes-dist-col feature_tes \
--qvalue-pattern "^qvalue$" \
--output-dir ~/output --archive-dir ~/archive -s force
Command interface#
sos run qtl_association_postprocessing.ipynb -h
usage: sos run /restricted/projectnb/xqtl/xqtl_protocol/xqtl-protocol/code/association_scan/qtl_association_postprocessing.ipynb
[workflow_name | -t targets] [options] [workflow_options]
workflow_name: Single or combined workflows defined in this script
targets: One or more targets to generate
options: Single-hyphen sos parameters (see "sos run -h" for details)
workflow_options: Double-hyphen workflow-specific parameters
Workflows:
default
Global Workflow Options:
--cwd . (as path)
--pecotmr-path VAL (as path, required)
path to pecotmr repo
--gene-coordinates VAL (as path, required)
--output-dir VAL (as path, required)
--archive-dir VAL (as path, required)
--sub-dir VAL (as path, required)
--maf-cutoff 0.01 (as float)
--cis-window 1000000 (as int)
--tss-dist-col 'start_distance'
--tes-dist-col 'end_distance'
--af-col af
--molecular-id-col 'molecular_trait_object_id'
--pvalue-cutoff 0.05 (as float)
This is for selecting the subset of data to process on protential signals assuming we drop those above this threshold This might lead to underestimates in
qvalue method since qvalue < 0.05 may contain pvalue > 0.05
--fdr-threshold 0.05 (as float)
This is used for both event and variant level significance filter
--pvalue-pattern '^pvalue$'
eg "pvalue" for pvalue, "pvalue_.*_interaction" for interaction
--qvalue-pattern '^qvalue$'
eg "qvalue" for pvalue, "qvalue_interaction" for interaction
--regional-pattern NULL
eg "*.cis_qtl.regional.tsv.gz$" for genetic effect via our pipeline TensorQTL.ipynb "*.cis_qtl_top_assoc.txt.gz$" for interaction genetic effect
--qtl-pattern '*.cis_qtl.pairs.tsv.gz$'
--n-variants-suffix 'cis_n_variants_stats.tsv.gz'
Sections
default:
Workflow implementation#
[global]
parameter: cwd = path(".")
#path to pecotmr repo
parameter: pecotmr_path = path
parameter: gene_coordinates = path
parameter: output_dir = path
parameter: archive_dir = path
parameter: sub_dir = path
parameter: maf_cutoff = 0.01
parameter: cis_window = 1000000
parameter: tss_dist_col = "start_distance"
parameter: tes_dist_col = "end_distance"
parameter: af_col = "af"
parameter: molecular_id_col = "molecular_trait_object_id"
# This is for selecting the subset of data to process on protential signals
# assuming we drop those above this threshold
# This might lead to underestimates in qvalue method since qvalue < 0.05 may contain pvalue > 0.05
parameter: pvalue_cutoff = 0.05
# This is used for both event and variant level significance filter
parameter: fdr_threshold = 0.05
# eg "pvalue" for pvalue, "pvalue_.*_interaction" for interaction
parameter: pvalue_pattern = "^pvalue$"
# eg "qvalue" for pvalue, "qvalue_interaction" for interaction
parameter: qvalue_pattern = "^qvalue$"
# eg "*.cis_qtl.regional.tsv.gz$" for genetic effect via our pipeline TensorQTL.ipynb
# "*.cis_qtl_top_assoc.txt.gz$" for interaction genetic effect
parameter: regional_pattern = "NULL"
parameter: qtl_pattern = "*.cis_qtl.pairs.tsv.gz$"
parameter: n_variants_suffix = "cis_n_variants_stats.tsv.gz"
work_dir = f"{cwd:a}/{sub_dir}"
if sub_dir == path("."):
output_dir = f"{output_dir:a}/{cwd:b}"
archive_dir = f"{archive_dir:a}/{cwd:b}"
work_dir = f"{cwd:a}"
else:
output_dir = f"{output_dir:a}/{cwd:b}/{sub_dir}"
archive_dir = f"{archive_dir:a}/{cwd:b}/{sub_dir}"
[default]
output: f"{output_dir}/{cwd:b}_multiple_testing_consolidated.rds"
task: trunk_workers = 1, tags = f'tensorqtl_postprocessing_{_output:n}'
R: expand = "${ }"
params <- list()
params$workdir <- "${work_dir}"
params$maf_cutoff <- ${maf_cutoff}
params$cis_window <- ${cis_window}
params$pvalue_cutoff <- ${pvalue_cutoff}
params$fdr_threshold <- ${fdr_threshold}
params$gene_coordinates <- "${gene_coordinates:a}"
params$output_dir <- "${output_dir}"
params$archive_dir <- "${archive_dir}"
params$regional_pattern <- "${regional_pattern}"
params$n_variants_suffix <- "${n_variants_suffix}"
params$qtl_pattern <- "${qtl_pattern}"
params$pvalue_pattern <- "${pvalue_pattern}"
params$qvalue_pattern <- "${qvalue_pattern}"
params$start_distance_col <- "${tss_dist_col}"
params$end_distance_col <- "${tes_dist_col}"
params$af_col <- "${af_col}"
params$molecular_id_col <- "${molecular_id_col}"
convert_null_strings <- function(params) {
if (is.list(params)) {
# Apply the function to each element in the list
result <- lapply(params, convert_null_strings)
return(result)
} else {
# For non-list elements, check if it's the string "NULL"
if (is.character(params) && params == "NULL") {
return(NULL)
} else {
return(params)
}
}
}
params <- convert_null_strings(params)
source("${pecotmr_path}/inst/code/tensorqtl_postprocessor.R")
results <- hierarchical_multiple_testing_correction(params)
write_results(results, params$output_dir, params$workdir, to_cwd = "regional")
archive_files(params)
saveRDS(results, ${_output:r})