This function formats the input summary statistics data frame with uniform column names so that it fits the SuSiE pipeline. Column names are mapped using the specified column file. The function also extracts the sample size, number of cases, number of controls, and variance of Y. Missing values in n_sample, n_case, and n_control are backfilled with median values.
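The median backfill mentioned above can be illustrated with a minimal base R sketch. It is not the function's actual implementation: the helper `backfill_median` and the toy data frame are hypothetical, and the sketch assumes that missing entries are coded as NA.

```r
# Hypothetical helper (not part of the package): replace NA entries
# in a numeric vector with the median of the observed values.
backfill_median <- function(x) {
  x[is.na(x)] <- median(x, na.rm = TRUE)
  x
}

# Toy summary statistics with gaps in the three count columns.
sumstats <- data.frame(
  n_sample  = c(1000, NA, 1200, 1100),
  n_case    = c(400, 420, NA, 410),
  n_control = c(600, NA, 780, 690)
)

# Backfill each count column independently.
for (col in c("n_sample", "n_case", "n_control")) {
  sumstats[[col]] <- backfill_median(sumstats[[col]])
}

print(sumstats)
```

Each column is filled from its own median, so one variant's missing count does not affect the other columns.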
Usage
load_rss_data(
sumstat_path,
column_file_path,
n_sample = 0,
n_case = 0,
n_control = 0,
region = NULL,
extract_region_name = NULL,
region_name_col = NULL,
comment_string = "#"
)
Arguments
- sumstat_path
File path to the summary statistics.
- column_file_path
File path to the column file for mapping.
- n_sample
User-specified sample size. If unknown, set to 0 to retrieve it from the summary statistics file.
- n_case
User-specified number of cases.
- n_control
User-specified number of controls.
- region
The region that tabix uses to subset the input dataset.
- extract_region_name
User-specified gene/phenotype name used to further subset the phenotype data.
- region_name_col
Name of the column to filter for extract_region_name.
- comment_string
Comment character used in the column mapping file. Default is "#".
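A call using the arguments above might look like the following sketch. The file paths and the region string are hypothetical placeholders, and the sketch assumes the package providing load_rss_data is already attached. The call is guarded so that the snippet does nothing when the function or the files are unavailable.

```r
# Hypothetical inputs: replace with your own files and region.
sumstat_path     <- "sumstats.tsv.gz"      # tabix-indexed summary statistics (assumed)
column_file_path <- "column_mapping.txt"   # column mapping file (assumed)
region           <- "chr1:1000000-2000000" # region in tabix "chrom:start-end" form

rss_input <- NULL

# Only run when load_rss_data is available and both files exist.
if (exists("load_rss_data") &&
    file.exists(sumstat_path) &&
    file.exists(column_file_path)) {
  rss_input <- load_rss_data(
    sumstat_path     = sumstat_path,
    column_file_path = column_file_path,
    n_sample         = 0,   # 0: retrieve the sample size from the file
    n_case           = 0,
    n_control        = 0,
    region           = region,
    comment_string   = "#"
  )
  str(rss_input)  # inspect whatever structure is returned
}
```

Leaving n_sample at 0 follows the argument description above, which asks the function to take the sample size from the summary statistics file.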