xQTL-GWAS pairwise enrichment and colocalization#
Run pairwise enrichment and colocalization between xQTL and GWAS SuSiE fine-mapping results on the toy dataset. The workflow first estimates a global enrichment between the two sets of signals, then performs colocalization (coloc) on the overlapping regions.
Description#
This workflow processes fine-mapping results for xQTL, generated by susie_twas in the mnm_regression.ipynb notebook for cis xQTL, and GWAS fine-mapping results produced by susie_rss in the rss_analysis.ipynb notebook. It is designed to perform enrichment and colocalization analysis, particularly when fine-mapping results originate from different regions in the case of cis-xQTL and GWAS. The pipeline is capable to integrate and analyze data across these distinct regions. Originally tailored for cis-xQTL and GWAS integration, this pipeline can be applied to other pairwise integrations. An example of such application is in trans analysis, where the fine-mapped regions might be identical between trans-xQTL and GWAS, representing a special case of this broader implementation.
Input#
Lists of SuSiE fine-mapping output objects, in RDS format, of class(susie) in R.
For xQTL the list is meta-data of format:
chr,start,end,original_data,conditions_top_loci,block_top_lociwhereoriginal_datais an RDS file,conditions_top_lociis showing which contexts have top loci table (potential signals)block_top_lociis the blocks have overlapped top loci variant with xQTL data. That file could be output fromfine_mapping_post_processing/overlap_qtl_gwas.For GWAS the list is meta-data of format:
chr,start,end,original_data,block_top_loci...whereoriginal_datais an RDS file. That file could be output fromfine_mapping_post_processing/gwas_results_export.Context meta is a metafile that shows the analysis_names and the contained contexts. It can be used to identify the corresponding raw data for each context.
--gwas_meta_data
output fromfine_mapping_post_processing/gwas_results_export
#chr start end region_id TSS original_data combined_data combined_data_sumstats conditions conditions_top_loci
chr1 101384274 104443097 1_101384274-104443097 NA 1_101384274-104443097.susie_rss.rds gwas.1_101384274-104443097.cis_results_db.export.rds gwas.1_101384274-104443097.cis_results_db.export_sumstats.rds NA NA
chr19 44935906 46842901 19_44935906-46842901 NA 19_44935906-46842901.susie_rss.rds gwas.19_44935906-46842901.cis_results_db.export.rds gwas.19_44935906-46842901.cis_results_db.export_sumstats.rds AD_Bellenguez_2022,AD_Jansen_2021,AD_Kunkle_Stage1_2019,AD_Wightman_Full_2021,AD_Wightman_Excluding23andMe_2021,AD_Wightman_ExcludingUKBand23andME_2021 AD_Wightman_ExcludingUKBand23andME_2021.qc_impute,AD_Wightman_ExcludingUKBand23andME_2021.qc_only AD_Wightman_ExcludingUKBand23andME_2021.qc_impute
--xqtl_meta_data
for enrichment: output from fine_mapping_post_processing/cis_results_export; for coloc: output from fine_mapping_post_processing/overlap_qtl_gwas, the difference between them is without or with last 2 columns: block_top_loci, final_combined_data to document overlapped block info at variant level
#chr start end region_id TSS original_data combined_data combined_data_sumstats conditions conditions_top_loci block_top_loci final_combined_data
chr1 167600000 171480000 ENSG00000000457 169894266 MSBB_eQTL.ENSG00000000457.univariate_susie_twas_weights.rds, ROSMAP_Kellis_eQTL.ENSG00000000457.univariate_susie_twas_weights.rds Fungen_xQTL.ENSG00000000457.cis_results_db.export.rds Fungen_xQTL.ENSG00000000457.cis_results_db.export_sumstats.rds BM_10_MSBB_eQTL,BM_22_MSBB_eQTL,BM_36_MSBB_eQTL,BM_44_MSBB_eQTL,Ast_Kellis_eQTL BM_22_MSBB_eQTL,BM_44_MSBB_eQTL
chr19 41840000 47960000 ENSG00000130203 44905790 ROSMAP_Kellis_eQTL.ENSG00000130203.univariate_susie_twas_weights.rds, ROSMAP_Bennett_Klein_pQTL.ENSG00000130203.univariate_susie_twas_weights.rds Fungen_xQTL.ENSG00000130203.cis_results_db.export.rds Fungen_xQTL.ENSG00000130203.cis_results_db.export_sumstats.rds Mic_Kellis_eQTL,DLPFC_Bennett_pQTL Mic_Kellis_eQTL 19_44935906-46842901 Fungen_xQTL.ENSG00000130203.cis_results_db.export.overlapped.gwas.rds
chr20 49934867 53560000 ENSG00000000419 50958554 MSBB_eQTL.ENSG00000000419.univariate_susie_twas_weights.rds Fungen_xQTL.ENSG00000000419.cis_results_db.export.rds Fungen_xQTL.ENSG00000000419.cis_results_db.export_sumstats.rds BM_10_MSBB_eQTL,BM_22_MSBB_eQTL BM_22_MSBB_eQTL Fungen_xQTL.ENSG00000000419.cis_results_db.export.overlapped.gwas.rds
--context_meta
should be updated as more analysis is done, we used analysis_name and context in this analysis
analysis_name cohort context
KNIGHT_pQTL KNIGHT Knight_pQTL_brain,Knight_eQTL_brain
MSBB_eQTL MSBB BM_10_MSBB_eQTL,BM_22_MSBB_eQTL,BM_36_MSBB_eQTL,BM_44_MSBB_eQTL
MIGA_eQTL MIGA MiGA_GTS_eQTL,MiGA_SVZ_eQTL,MiGA_THA_eQTL
Analytical Logic#
Analytical Logic#
Enrichment
Identify GWAS Blocks: Select GWAS blocks with top loci from Bellenguez data and using a single variant regression method. Locate the original filenames (
original_data) for these blocks and map the analysis regions to identify overlapping gene analysis regions and corresponding QTL files with top loci table (condition_top_loci).Contexts in xQTL Meta-data: Find contexts within the xQTL meta-data that include top loci results for each gene. Retrieve the corresponding original xQTL files (
original_data) by referencing the context meta-data.Execute xQTL GWAS Enrichment: Perform
xqtl_gwas_enrichmentanalysis to generate enrichment parameters (a0, a1) and calculate priors (p1, p2, p12).
Coloc
xQTL Meta-data Contexts: Within the xQTL meta-data, identify contexts that include top loci results for each gene, indicated by
condition_top_loci. Retrieve the corresponding original xQTL files (original_data) by referring back to the context meta-data. (Optional) Subset the QTL table with specifies gene list.Original GWAS Files: Identify GWAS blocks that contain overlapping top loci variants for each gene (
block_top_loci) in above file. Utilize the GWAS list to locate the original filenames (original_data) for the identified GWAS block candidates.Enrichment and Coloc Analysis: Automatically execute
xqtl_gwas_enrichmentto enable enrichment analysis, generating priors for subsequent coloc analysis through logic defined inenrichment. Then, applysusie_colocfor coloc analysis on the selected xQTL and GWAS original files, analyzing each gene under each identified condition.
Steps#
Step 1. Enrichment. Estimate the global xQTL-GWAS enrichment. get_analysis_regions first collects the overlapped regions per gene/block, then xqtl_gwas_enrichment produces the enrichment object used as the prior for coloc.
Timing: ~varies on typical compute infrastructure.
sos run pipeline/SuSiE_enloc.ipynb xqtl_gwas_enrichment \
--gwas-meta-data input/susie_enloc_data/protocol_example.enloc.gwas_meta.tsv \
--xqtl-meta-data input/susie_enloc_data/protocol_example.enloc.xqtl_meta.tsv \
--xqtl-finemapping-obj preset_variants_result susie_result_trimmed \
--xqtl-varname-obj preset_variants_result variant_names \
--gwas-finemapping-obj AD_Bellenguez_2022 RSS_QC_RAISS_imputed susie_result_trimmed \
--gwas-varname-obj AD_Bellenguez_2022 RSS_QC_RAISS_imputed variant_names \
--xqtl-region-obj region_info grange \
--qtl-path input/susie_enloc_data \
--gwas-path input/susie_enloc_data \
--context-meta input/susie_enloc_data/protocol_example.enloc.context_meta.tsv \
--cwd output/xqtl_gwas_enrichment
Step 2. Coloc. Run colocalization on the overlapping regions. --skip-enrich reuses the enrichment object produced in Step 1, so the *enrichment.rds file from that step must be present in --cwd. Otherwise enrichment is triggered automatically. get_overlapped_analysis_regions selects the regions with overlapping variants before susie_coloc runs.
sos run pipeline/SuSiE_enloc.ipynb susie_coloc \
--gwas-meta-data input/susie_enloc_data/protocol_example.enloc.gwas_meta.tsv \
--xqtl-meta-data input/susie_enloc_data/protocol_example.enloc.xqtl_meta.tsv \
--xqtl-finemapping-obj preset_variants_result susie_result_trimmed \
--xqtl-varname-obj preset_variants_result variant_names \
--gwas-finemapping-obj AD_Bellenguez_2022 RSS_QC_RAISS_imputed susie_result_trimmed \
--gwas-varname-obj AD_Bellenguez_2022 RSS_QC_RAISS_imputed variant_names \
--xqtl-region-obj region_info grange \
--qtl-path input/susie_enloc_data \
--gwas-path input/susie_enloc_data \
--context-meta input/susie_enloc_data/protocol_example.enloc.context_meta.tsv \
--ld-meta-file-path input/ld_reference/protocol_example.ld_meta_file.tsv \
--skip-enrich \
--cwd output/susie_coloc
Command interface#
sos run pipeline/SuSiE_enloc.ipynb -h
Workflow implementation#
[global]
# Workdir
parameter: cwd = path("output")
# A list of file paths for fine-mapped GWAS results.
parameter: gwas_meta_data = path
# A list of file paths for fine-mapped xQTL results.
parameter: xqtl_meta_data = path
# Optional: if a region list is provide the enrichment analysis will be focused on provided region.
# The LAST column of this list will contain the ID of regions to focus on
parameter: region_list = path()
# Optional: if a region name is provided
# the analysis would be focused on the union of provides region list and region names
parameter: region_name = []
# It is required to input the name of the analysis
parameter: name = f"{xqtl_meta_data:bnn}.{gwas_meta_data:bnn}"
parameter: container = ""
import re
# For cluster jobs, number commands to run per job
parameter: job_size = 200
# Wall clock time expected
parameter: walltime = "5m"
# Memory expected: quite large for enrichment analysis but small for xQTL colocalization
parameter: mem = "16G"
# Number of threads
parameter: numThreads = 1
#Optional table name in xQTL RDS files to get fine mapping results (eg 'susie_result_trimmed ').
parameter: xqtl_finemapping_obj = []
#Optional table name in xQTL RDS files to get variable names (eg ' variant_names ').
parameter: xqtl_varname_obj = []
#Optional table name in GWAS RDS files to get fine mapping results (eg 'AD_Bellenguez_2022 single_effect_regression susie_result_trimmed ').
parameter: gwas_finemapping_obj = []
#Optional table name in GWAS RDS files to get variable names(eg 'AD_Bellenguez_2022 single_effect_regression variant_names ').
parameter: gwas_varname_obj = []
#Optional table name in xQTL RDS files to get region info (eg 'susie_result_trimmed region_info grange ').
parameter: xqtl_region_obj = []
#Optional table name in GWAS RDS files to get region info (eg 'AD_Bellenguez_2022 single_effect_regression region_info grange ').
parameter: gwas_region_obj = []
#Directory path for GWAS orignal finemapping results
parameter: gwas_path = ''
#Directory path for xQTL orignal finemapping results
parameter: qtl_path = ''
# a meta file showing the context and corresponding analysis_name
parameter: context_meta = path()
# Optional: if a region list is provide the analysis will be focused on provided region.
# The LAST column of this list will contain the ID of regions to focus on
# Otherwise, all regions with both genotype and phenotype files will be analyzed
parameter: region_list = path()
# Optional: if a region name is provided
# the analysis would be focused on the union of provides region list and region names
parameter: region_name = []
# use conditions_top_loci_minp column or not, which can reduce a lot computing resource by pick top 1 isoform
parameter: minp = False
import os
import pandas as pd
import re
def group_by_region(lst, partition):
# from itertools import accumulate
# partition = [len(x) for x in partition]
# Compute the cumulative sums once
# cumsum_vector = list(accumulate(partition))
# Use slicing based on the cumulative sums
# return [lst[(cumsum_vector[i-1] if i > 0 else 0):cumsum_vector[i]] for i in range(len(partition))]
return partition
def make_unique_data(data):
return ','.join(set(data.split(',')))
def generate_meta_dataframe(meta_data_path):
"""Generate a new long metadata by converting to context:analysis_name (1:1)."""
meta = pd.read_csv(meta_data_path, sep='\t')
new_meta = pd.DataFrame()
contexts = pd.unique(meta['context'].str.split(',', expand=True).stack().str.strip())
for context in contexts:
if not isinstance(context, str):
continue
mask = meta['context'].str.contains(context)
tmp = pd.DataFrame({
'context': [context],
'analysis_name': [','.join(meta.loc[mask, 'analysis_name'])]
})
new_meta = pd.concat([new_meta, tmp], ignore_index=True)
return new_meta
def generate_condition_based_dataframe(xqtl_df, minp):
"""Generate a new long table by converting to condition and region:original_data (1:1)."""
# only consider the QTL contexts that have top loci data
condition_column = 'conditions_top_loci_minp' if minp else 'conditions_top_loci'
conditions = pd.unique(xqtl_df[condition_column].str.split(',', expand=True).stack().str.strip())
new_df = pd.DataFrame()
for condition in conditions:
if not isinstance(condition, str):
continue
escaped_condition = re.escape(condition)
mask = xqtl_df[condition_column].str.contains(escaped_condition, case=False, regex=True)
# Iterate through each unique region in the filtered data
unique_regions = pd.unique(xqtl_df.loc[mask, 'region_id'])
for region_id in unique_regions:
region_mask = (xqtl_df['region_id'] == region_id) & mask
tmp = pd.DataFrame({
'condition': [condition],
'region_id': [region_id],
'QTL_original_data': [','.join(xqtl_df.loc[region_mask, 'original_data'])],
'GWAS_original_data': [','.join(xqtl_df.loc[region_mask, 'block_data'])]
})
new_df = pd.concat([new_df, tmp], ignore_index=True)
return new_df
def merge_and_filter_dfs(new_df, new_meta):
"""Merge and filter the dataframes based on condition/context where context is a substring of condition, and analysis_name to pick the corresponding original files."""
# Explode the QTL data into separate rows
new_df = new_df.set_index(['condition', 'region_id', 'GWAS_original_data'])['QTL_original_data'].str.split(',', expand=True).stack().reset_index(name='QTL_original_data').drop('level_3', axis=1)
new_df['analysis_name_prefix'] = new_df['QTL_original_data'].apply(lambda x: x.split('.')[0])
# Create a custom merge logic to match context as a substring of condition
def custom_merge(row, df_meta):
# Filter meta dataframe to find any context that is part of the condition string
sub_df = df_meta[df_meta['context'].apply(lambda x: x in row['condition'])]
if not sub_df.empty:
return sub_df
else:
return pd.DataFrame(columns=df_meta.columns) # Return an empty DataFrame with the same columns if no match found
# Apply custom merge function row-wise and concatenate results
results = [custom_merge(row, new_meta) for index, row in new_df.iterrows()]
merged_df = pd.concat(results, keys=new_df.index).reset_index(level=1, drop=True).join(new_df, how='outer')
# Filter rows where analysis_name_prefix matches analysis_name exactly
filtered_df = merged_df[merged_df['analysis_name_prefix'].str.strip() == merged_df['analysis_name'].str.strip()]
# Drop unnecessary columns and duplicates
filtered_df = filtered_df.drop(columns=['analysis_name_prefix', 'context']).drop_duplicates()
filtered_df['GWAS_original_data'] = filtered_df['GWAS_original_data'].apply(make_unique_data)
return filtered_df
def prepare_final_paths(filtered_df, qtl_path, gwas_path):
"""Prepare final paths for QTL and GWAS original data."""
filtered_df['QTL_original_data'] = filtered_df['QTL_original_data'].apply(
lambda x: ','.join(f"{qtl_path}/{file_name.strip()}" for file_name in x.split(','))
)
filtered_df['GWAS_original_data'] = filtered_df['GWAS_original_data'].apply(
lambda x: ','.join(f"{gwas_path}/{file_name.strip()}" for file_name in x.split(','))
)
return filtered_df
# get overlapped regions by gene and block region for enrichment analysis
[get_analysis_regions: shared = "regional_data"]
import pandas as pd
def process_dataframes_with_toploci(xqtl_meta_data, gwas_meta_data):
"""Process the XQTL and GWAS dataframes."""
xqtl_df = pd.read_csv(xqtl_meta_data, sep="\t")
# filter xQTL data with the ones have overlapped top loci variants with GWAS data
xqtl_df = xqtl_df[xqtl_df['conditions_top_loci_minp' if minp else 'conditions_top_loci'].notna()]
gwas_df = pd.read_csv(gwas_meta_data, sep="\t")
gwas_df = gwas_df[gwas_df['conditions_top_loci'].notna()]
# e.g. gwas_finemapping_obj = ['AD_Bellenguez_2022', 'single_effect_regression', 'susie_result_trimmed']
# filter gwas data with find variants in specified cohort 'AD_Bellenguez_2022' and method 'single_effect_regression'
gwas_finemapping_obj_filtered = [obj for obj in gwas_finemapping_obj if obj != 'susie_result_trimmed']
if len(gwas_finemapping_obj_filtered) > 0:
pattern = '|'.join(gwas_finemapping_obj_filtered)
gwas_df = gwas_df[gwas_df['conditions_top_loci'].str.contains(pattern)]
xqtl_df = overlapped_analysis_region(xqtl_df, gwas_df)
return xqtl_df, gwas_df
def overlapped_analysis_region(xqtl_df, gwas_df):
# Create an empty dataframe to store overlapping xQTL records
overlapped_xqtl_df = pd.DataFrame()
# Iterate over each row in the GWAS dataframe
for index, gwas_row in gwas_df.iterrows():
gwas_chr = gwas_row['#chr']
gwas_start = gwas_row['start']
gwas_end = gwas_row['end']
gwas_block_id = gwas_row['original_data'] # Assuming each GWAS row has a unique block ID
# Filter xQTL dataframe for the same chromosome
same_chr_xqtl_df = xqtl_df[xqtl_df['#chr'] == gwas_chr]
# Find overlapping xQTL regions with the current GWAS region
overlapping_xqtl = same_chr_xqtl_df[((same_chr_xqtl_df['start'] <= gwas_end) &
(same_chr_xqtl_df['end'] >= gwas_start))].copy() # Add .copy() to avoid SettingWithCopyWarning
# Check if there are any overlapping xQTLs
if not overlapping_xqtl.empty:
# Use .loc to safely assign 'block_data' to avoid the SettingWithCopyWarning
overlapping_xqtl.loc[:, 'block_data'] = gwas_block_id
# Append the overlapping xQTLs to the accumulated dataframe
overlapped_xqtl_df = pd.concat([overlapped_xqtl_df, overlapping_xqtl], ignore_index=True)
# Group by xQTL region and concatenate 'block_data'
overlapped_xqtl_df['block_data'] = overlapped_xqtl_df.groupby(['#chr', 'start', 'end'])['block_data'].transform(lambda x: ','.join(x))
overlapped_xqtl_df = overlapped_xqtl_df.drop_duplicates().reset_index(drop=True)
return overlapped_xqtl_df
xqtl_df, gwas_df = process_dataframes_with_toploci(xqtl_meta_data, gwas_meta_data)
region_ids=[]
# If region_list is provided, read the file and extract IDs
if not region_list.is_dir():
if region_list.is_file():
region_list_df = pd.read_csv(region_list, sep='\t', header=None, comment = "#")
region_ids = region_list_df.iloc[:, -1].unique() # Extracting the last column for IDs
else:
raise ValueError("The region_list path provided is not a file.")
if len(region_name) > 0:
region_ids = list(set(region_ids).union(set(region_name)))
if len(region_ids) > 0:
xqtl_df = xqtl_df[xqtl_df['region_id'].isin(region_ids)]
new_df = generate_condition_based_dataframe(xqtl_df, minp)
# if there is only one original data file, we don't have to map to context meta
if (new_df['QTL_original_data'].str.split(',').apply(lambda x: len(x) in [0, 1]).all() and not context_meta.is_file()):
filtered_df = new_df.copy()
filtered_df['GWAS_original_data'] = filtered_df['GWAS_original_data'].apply(make_unique_data)
else:
filtered_df = merge_and_filter_dfs(new_df, generate_meta_dataframe(context_meta))
filtered_df = prepare_final_paths(filtered_df, qtl_path, gwas_path)
filtered_df = filtered_df.groupby('condition').agg({
'GWAS_original_data': lambda x: ','.join(x),
'QTL_original_data': lambda x: ','.join(x)
}).reset_index()
regional_data = {
'data': [row['QTL_original_data'].split(',') for _, row in filtered_df.iterrows()],
'conditions': [(f"{row['condition']}", *row['GWAS_original_data'].split(',')) for _, row in filtered_df.iterrows()]
}
# get overlapped regions from flatten table, to get the regions with overlapped variants and select those regions for coloc analysis
[get_overlapped_analysis_regions: shared = "overlapped_regional_data"]
import pandas as pd
def map_blocks_to_data(blocks, mapping_dict):
"""Map blocks to data using a provided mapping dictionary."""
mapped_data = ','.join(mapping_dict.get(block.strip(), 'NA') for block in blocks.split(','))
return mapped_data
def process_dataframes(xqtl_meta_data, gwas_meta_data):
"""Process the XQTL and GWAS dataframes."""
xqtl_df = pd.read_csv(xqtl_meta_data, sep="\t")
# filter xQTL data with the ones have overlapped top loci variants with GWAS data
xqtl_df = xqtl_df[xqtl_df['block_top_loci'].notna()]
gwas_df = pd.read_csv(gwas_meta_data, sep="\t")
gwas_df = gwas_df[gwas_df['conditions_top_loci'].notna()]
# e.g. gwas_finemapping_obj = ['AD_Bellenguez_2022', 'single_effect_regression', 'RSS_QC_RAISS_imputed']
# filter gwas data with find variants in specified cohort 'AD_Bellenguez_2022' and method 'RSS_QC_RAISS_imputed'
gwas_finemapping_obj_filtered = [obj for obj in gwas_finemapping_obj if obj != 'susie_result_trimmed' and obj != 'RSS_QC_RAISS_imputed']
if len(gwas_finemapping_obj_filtered) > 0:
pattern = '|'.join(gwas_finemapping_obj_filtered)
gwas_df = gwas_df[gwas_df['conditions_top_loci'].str.contains(pattern)]
gwas_df = gwas_df[gwas_df['region_id'].isin(xqtl_df['block_top_loci'].str.split(',').explode().unique())]
region_to_combined_data = dict(zip(gwas_df['region_id'], gwas_df['original_data']))
# and only consider the overlapped GWAS blocks
xqtl_df['block_data'] = xqtl_df['block_top_loci'].apply(map_blocks_to_data, args=(region_to_combined_data,))
xqtl_df['block_data'] = xqtl_df['block_data'].str.replace(',NA', '').str.replace('NA,', '')
return xqtl_df, gwas_df
xqtl_df, gwas_df = process_dataframes(xqtl_meta_data, gwas_meta_data)
region_ids=[]
# If region_list is provided, read the file and extract IDs
if not region_list.is_dir():
if region_list.is_file():
region_list_df = pd.read_csv(region_list, sep='\t', header=None, comment = "#")
region_ids = region_list_df.iloc[:, -1].unique() # Extracting the last column for IDs
else:
raise ValueError("The region_list path provided is not a file.")
if len(region_name) > 0:
region_ids = list(set(region_ids).union(set(region_name)))
if len(region_ids) > 0:
xqtl_df = xqtl_df[xqtl_df['region_id'].isin(region_ids)]
new_df = generate_condition_based_dataframe(xqtl_df, minp)
# if there is only one original data file, we don't have to map to context meta
if (new_df['QTL_original_data'].str.split(',').apply(lambda x: len(x) in [0, 1]).all() and not context_meta.is_file()):
filtered_df = new_df.copy()
filtered_df['GWAS_original_data'] = filtered_df['GWAS_original_data'].apply(make_unique_data)
else:
filtered_df = merge_and_filter_dfs(new_df, generate_meta_dataframe(context_meta))
filtered_df = prepare_final_paths(filtered_df, qtl_path, gwas_path)
overlapped_regional_data = {
'data': [row['QTL_original_data'].split(',') for _, row in filtered_df.iterrows()],
'conditions': [(f"{row['condition']}@{row['region_id']}", *row['GWAS_original_data'].split(',')) for _, row in filtered_df.iterrows()]
}
[xqtl_gwas_enrichment]
depends: sos_variable("regional_data")
stop_if(len(regional_data['data']) == 0, f'No files left for analysis')
meta = regional_data['conditions']
input: regional_data["data"], group_by = lambda x: group_by_region(x, regional_data["data"]), group_with = "meta"
output: f'{cwd:a}/{name}.{_meta[0]}.enrichment.rds'
task: trunk_workers = 1, trunk_size = job_size, walltime = walltime, mem = mem, cores = numThreads, tags = f'{step_name}_{_output:bn}'
bash: expand = '${ }', stdout = f"{_output:n}.stdout", stderr = f"{_output:n}.stderr", container = container
set -e
# Per-region xQTL files for this group, and the GWAS rds entries in _meta[1:].
# Convert each side of this region to a pecotmr S4 collection, then run
# qtlEnrichmentPipeline on the pair.
Rscript code/script/pecotmr_integration/legacy_enloc_finemap_convert.R \
--mode qtl \
--rds-files ${",".join([str(x) for x in _input])} \
--finemapping-obj '${" ".join(xqtl_finemapping_obj)}' \
--varname-obj '${" ".join(xqtl_varname_obj)}' \
--study ${name} \
--output ${_output:n}.qtl.rds
Rscript code/script/pecotmr_integration/legacy_enloc_finemap_convert.R \
--mode gwas \
--rds-files ${",".join([str(x) for x in _meta[1:] if str(x).endswith(".rds")])} \
--finemapping-obj '${" ".join(gwas_finemapping_obj)}' \
--varname-obj '${" ".join(gwas_varname_obj)}' \
--output ${_output:n}.gwas.rds
Rscript code/script/pecotmr_integration/qtl_enrichment.R \
--qtl-fine-mapping ${_output:n}.qtl.rds \
--gwas-fine-mapping ${_output:n}.gwas.rds \
--ncore ${numThreads} \
--output ${_output}
[susie_coloc]
depends: sos_variable("overlapped_regional_data")
# depends: sos_step("xqtl_gwas_enrichment") #changed
stop_if(len(overlapped_regional_data['data']) == 0, f'No files left for analysis')
# skip enrichment or not
parameter: skip_enrich=False
# filter lbf by cs as default in susie coloc. default is False means filtering by V > prior_tol
parameter: filter_lbf_cs = False
# ld reference path for postprocessing
parameter: ld_meta_file_path = path()
meta = overlapped_regional_data['conditions']
input: overlapped_regional_data["data"], group_by = lambda x: group_by_region(x, overlapped_regional_data["data"]), group_with = "meta"
# output: f'{cwd:a}/{step_name[:-2]}/{name}.{_meta[0]}.coloc.rds'
output: f'{cwd:a}/{step_name}/{name}.{_meta[0]}.coloc.rds'
task: trunk_workers = 1, trunk_size = job_size, walltime = walltime, mem = mem, cores = numThreads, tags = f'{step_name}_{_output:bn}'
bash: expand = '${ }', stdout = f"{_output:n}.stdout", stderr = f"{_output:n}.stderr", container = container
set -e
# context (before '@' in _meta[0]) selects the per-region enrichment file the
# enrichment step wrote: {cwd}/{name}.{context}.enrichment.rds (legacy naming).
# Convert this region's xQTL + GWAS sides, then run colocPipeline (enloc
# variant when --enrichment is supplied).
Rscript code/script/pecotmr_integration/legacy_enloc_finemap_convert.R \
--mode qtl \
--rds-files ${",".join([str(x) for x in _input])} \
--finemapping-obj '${" ".join(xqtl_finemapping_obj)}' \
--varname-obj '${" ".join(xqtl_varname_obj)}' \
--study ${name} \
--output ${_output:n}.qtl.rds
Rscript code/script/pecotmr_integration/legacy_enloc_finemap_convert.R \
--mode gwas \
--rds-files ${",".join([str(x) for x in _meta[1:] if str(x).endswith(".rds")])} \
--finemapping-obj '${" ".join(gwas_finemapping_obj)}' \
--varname-obj '${" ".join(gwas_varname_obj)}' \
--output ${_output:n}.gwas.rds
Rscript code/script/pecotmr_integration/coloc.R \
--qtl-fine-mapping ${_output:n}.qtl.rds \
--gwas-input ${_output:n}.gwas.rds \
${('--enrichment '+cwd.absolute().joinpath(name + "." + _meta[0].split("@")[0] + ".enrichment.rds").as_posix()) if not skip_enrich else ''} \
--output ${_output}
Anticipated Results#
Enrichment analysis results — this is a global enrichment estimate that combines all input data for each context.
Colocalization results for regions of interest — if
ld_meta_file_pathis provided, would also report coloc cs by coloc postprocessing.