This function formats the input summary statistics data frame with uniform column names so that it fits into the SuSiE pipeline. Column names are standardized via MungeSumstats::standardise_header(), and an optional custom column mapping file can supply additional non-standard names. The function also extracts the sample size, number of cases, number of controls, and variance of Y. Missing values in n_sample, n_case, and n_control are backfilled with median values.
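The median backfill mentioned above can be illustrated with a small base R sketch. It assumes that "median values" means the median of the non-missing entries in the same column; `backfill_median` and the toy `sumstats` data frame are hypothetical names used only for illustration, not the function's actual implementation.

```r
# Hypothetical helper (not part of the documented function):
# replace NA entries of a numeric vector with the median of the observed values.
backfill_median <- function(x) {
  x[is.na(x)] <- median(x, na.rm = TRUE)
  x
}

# Toy summary statistics with one missing per-variant sample size.
sumstats <- data.frame(
  variant_id = c("v1", "v2", "v3", "v4"),
  n_sample = c(1000, NA, 1200, 1100)
)

# The missing entry is filled with the median of 1000, 1200, and 1100.
sumstats$n_sample <- backfill_median(sumstats$n_sample)
print(sumstats)
```

The same helper could be applied to n_case and n_control columns when they are present.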
Usage
load_rss_data(
sumstat_path,
column_file_path = NULL,
n_sample = 0,
n_case = 0,
n_control = 0,
region = NULL,
extract_region_name = NULL,
region_name_col = NULL,
comment_string = "#"
)
Arguments
- sumstat_path
File path to the summary statistics.
- column_file_path
Optional file path to a custom column mapping file for non-standard column names not recognized by MungeSumstats.
- n_sample
User-specified sample size. If unknown, set to 0 to retrieve it from the summary statistics file.
- n_case
User-specified number of cases.
- n_control
User-specified number of controls.
- region
The region that tabix uses to subset the input dataset.
- extract_region_name
User-specified gene/phenotype name used to further subset the phenotype data.
- region_name_col
The column to filter on when matching extract_region_name.
- comment_string
Comment character in the column mapping file. Default is "#".
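A call might look like the following sketch. The file name and the region string are hypothetical placeholders, and the `chr:start-end` region format is an assumption based on the usual tabix convention rather than something stated here. The call is guarded so the snippet does nothing unless `load_rss_data` is available in the session and the file exists.

```r
# All values below are hypothetical placeholders for illustration.
args <- list(
  sumstat_path = "gwas_sumstats.tsv.gz",  # hypothetical tabix-indexed file
  column_file_path = NULL,                # rely on the built-in header standardization
  n_sample = 0,                           # 0: retrieve the sample size from the file
  n_case = 0,
  n_control = 0,
  region = "chr1:1000000-2000000",        # assumed tabix-style region string
  comment_string = "#"
)

# Only run when the function is loaded and the input file is present.
if (exists("load_rss_data") && file.exists(args$sumstat_path)) {
  rss_input <- do.call(load_rss_data, args)
  str(rss_input)
}
```

Leaving n_sample at 0 asks the function to take the sample size from the summary statistics file, as described for that argument.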