Hierarchical Multiple Testing

Hierarchical Multiple Testing#

This protocol implements a three-step procedure:

  1. Local adjustment: p-values of all cis-variants adjusted within each gene

  2. Global adjustment: minimum adjusted p-values from Step 1 further adjusted across all genes

  3. Global informed identification of significant xQTL: xQTL with locally adjusted p-value below the threshold

It also reorganizes intermediate files produced in tensorQTL analysis to specified archive folder for book-keeping or deletion, to save space.

Example commands#

Genetic effect#

With

  1. permutation testing

  2. recounted number of variants limited to MAF and cis-windows for Bonferroni correction

We recommend reassessing significance of Bonferroni correction if previous analysis use too large of window or contain many low frequency variants. Below we reassess with MAF > 0.01 and cis-window size 1000000 up and downstreams of TSS/TES.

sos run ~/GIT/xqtl-protocol/code/association_scan/qtl_association_postprocessing.ipynb default \
    --gene-coordinates look_up_gene_id.tsv \
    --cwd ~/Downloads/snuc_DeJager_Ast_tensorQTL_MWE --sub-dir LR \
    --maf-cutoff 0.01 --cis-window 1000000 \
    --regional-pattern "*.cis_qtl.regional.tsv.gz$" \
    --output-dir ~/output --archive-dir ~/archive -s force

When maf_cutoff and cis_window are not zero, the program will first compute the number of variants after filtering by these metric and write files with n_variants_suffix in the same folder as QTL data, then use those numbers to create filtered list of variants for Bonferroni adjusted p-value.

Interaction effect#

With EigenMT method to get us the effective number of tests, it is recommended setting maf_cutoff and cis_window both to zero to tell the program to not recompute counts for Bonferroni correction,

sos run ~/GIT/xqtl-protocol/code/association_scan/qtl_association_postprocessing.ipynb default \
    --gene-coordinates look_up_gene_id.tsv \
    --cwd ~/Downloads/snuc_DeJager_Ast_tensorQTL_MWE --sub-dir "interaction/msex" \
    --maf-cutoff 0 --cis-window 0 \
    --regional-pattern "*.cis_qtl_top_assoc.txt.gz$" \
    --pvalue-pattern "pvalue_.*_interaction" \
    --qvalue-pattern "qvalue_interaction" \
    --output-dir ~/output --archive-dir ~/archive -s force \
    --fdr-threshold 0.3 # set it large for illustration purpose

The above approach seems recommended by tensorqtl package because it is their default option for interaction QTL analysis. We will also adopt it as a recommended practice.

To keep using variant count based Bonferroni correct, simply remove --regional-pattern,

sos run ~/GIT/xqtl-protocol/code/association_scan/qtl_association_postprocessing.ipynb default \
    --gene-coordinates look_up_gene_id.tsv \
    --cwd ~/Downloads/snuc_DeJager_Ast_tensorQTL_MWE --sub-dir "interaction/msex" \
    --pvalue-pattern "pvalue_.*_interaction" \
    --qvalue-pattern "qvalue_interaction" \
    --output-dir ~/output --archive-dir ~/archive -s force \
    --fdr-threshold 0.3 # set it large for illustration purpose

Quantile QTL#

Here we summarize the significant QTL using both original variants as well as additionally considering MAF>0.05 and cis-window 1Mb around a gene, for Bonferroni adjusted p-value,

sos run ~/GIT/xqtl-protocol/code/association_scan/qtl_association_postprocessing.ipynb default \
    --gene-coordinates look_up_gene_id.tsv \
    --cwd ~/Downloads/QR_ROSMAP_Ast_mega_quantile_tensorQTL_MWE --sub-dir "." \
    --maf-cutoff 0.05 --cis-window 1000000 \
    --af-col maf --tss-dist-col feature_tss --tes-dist-col feature_tes \
    --qvalue-pattern "^qvalue$" \
    --output-dir ~/output --archive-dir ~/archive -s force
[global]
parameter: cwd = path(".")
parameter: gene_coordinates = path
parameter: output_dir = path
parameter: archive_dir = path
parameter: sub_dir = path
parameter: maf_cutoff = 0.01
parameter: cis_window = 1000000
parameter: tss_dist_col = "start_distance"
parameter: tes_dist_col = "end_distance"
parameter: af_col = "af"
# This is for selecting the subset of data to process on protential signals 
# assuming we drop those above this threshold
# This might lead to underestimates in qvalue method since qvalue < 0.05 may contain pvalue > 0.05
parameter: pvalue_cutoff = 0.05
# This is used for both event and variant level significance filter
parameter: fdr_threshold = 0.05
# eg "pvalue" for pvalue, "pvalue_.*_interaction" for interaction
parameter: pvalue_pattern = "^pvalue$"
# eg "qvalue" for pvalue, "qvalue_interaction" for interaction
parameter: qvalue_pattern = "^qvalue$"
# eg "*.cis_qtl.regional.tsv.gz$" for genetic effect via our pipeline TensorQTL.ipynb
# "*.cis_qtl_top_assoc.txt.gz$" for interaction genetic effect
parameter: regional_pattern = "NULL"
parameter: qtl_pattern = "*.cis_qtl.pairs.tsv.gz$"
parameter: n_variants_suffix = "cis_n_variants_stats.tsv.gz"

work_dir = f"{cwd:a}/{sub_dir}"
if sub_dir == path("."):
    output_dir = f"{output_dir:a}/{cwd:b}"
    archive_dir = f"{archive_dir:a}/{cwd:b}"
    work_dir = f"{cwd:a}"
else:
    output_dir = f"{output_dir:a}/{cwd:b}/{sub_dir}"
    archive_dir = f"{archive_dir:a}/{cwd:b}/{sub_dir}"
[default]
output: f"{output_dir}/{cwd:b}_multiple_testing_consolidated.rds"
task: trunk_workers = 1, tags = f'tensorqtl_postprocessing_{_output:n}'
R: expand = "${ }"

    params <- list()
    params$workdir           <- "${work_dir}"
    params$maf_cutoff        <- ${maf_cutoff}
    params$cis_window        <- ${cis_window}
    params$pvalue_cutoff     <- ${pvalue_cutoff}
    params$fdr_threshold     <- ${fdr_threshold}
    params$gene_coordinates  <- "${gene_coordinates:a}"
    params$output_dir        <- "${output_dir}"
    params$archive_dir       <- "${archive_dir}"
    params$regional_pattern  <- "${regional_pattern}"
    params$n_variants_suffix <- "${n_variants_suffix}"
    params$qtl_pattern       <- "${qtl_pattern}"
    params$pvalue_pattern    <- "${pvalue_pattern}"
    params$qvalue_pattern    <- "${qvalue_pattern}"
    params$start_distance_col <- "${tss_dist_col}"
    params$end_distance_col   <- "${tes_dist_col}"
    params$af_col <- "${af_col}"

    convert_null_strings <- function(params) {
      if (is.list(params)) {
        # Apply the function to each element in the list
        result <- lapply(params, convert_null_strings)
        return(result)
      } else {
        # For non-list elements, check if it's the string "NULL"
        if (is.character(params) && params == "NULL") {
          return(NULL)
        } else {
          return(params)
        }
      }
    }

    params <- convert_null_strings(params)
    source("~/GIT/pecotmr/inst/code/tensorqtl_postprocessor.R")
    results <- hierarchical_multiple_testing_correction(params)
    write_results(results, params$output_dir, params$workdir, to_cwd = "regional")
    archive_files(params)
    saveRDS(results, ${_output:r})