Load summary statistic data — load_rss_data

This function formats the input summary statistics data frame with uniform column names so that it fits into the SuSiE pipeline. Column names are standardized via MungeSumstats::standardise_header(); an optional custom column mapping file can supply additional non-standard names. The function also extracts the sample size, number of cases, number of controls, and the variance of Y. Missing values in n_sample, n_case, and n_control are backfilled with median values.
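The median backfill described above can be illustrated with a minimal sketch. This is not the package's own code: the helper backfill_median() and the toy data frame are hypothetical, and the sketch assumes the median is taken over the non-missing values of the same column.

```r
# Hypothetical helper (not part of the package): replace NA entries
# with the median of the non-missing values in the same vector.
backfill_median <- function(x) {
  x[is.na(x)] <- median(x, na.rm = TRUE)
  x
}

# Toy summary statistics with one missing per-variant sample size.
sumstats <- data.frame(
  variant_id = c("v1", "v2", "v3", "v4"),
  n_sample = c(1000, NA, 1200, 1100)
)

# The missing entry is filled with the median of 1000, 1200, and 1100.
sumstats$n_sample <- backfill_median(sumstats$n_sample)
```

The same helper could be applied to n_case and n_control columns if they are present.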

Usage

load_rss_data(
  sumstat_path,
  column_file_path = NULL,
  n_sample = 0,
  n_case = 0,
  n_control = 0,
  region = NULL,
  extract_region_name = NULL,
  region_name_col = NULL,
  comment_string = "#"
)

Arguments

sumstat_path: File path to the summary statistics.
column_file_path: Optional file path to a custom column mapping file for non-standard column names not recognized by MungeSumstats.
n_sample: User-specified sample size. If unknown, set to 0 to retrieve it from the summary statistics file.
n_case: User-specified number of cases.
n_control: User-specified number of controls.
region: The region that tabix uses to subset the input dataset.
extract_region_name: User-specified gene/phenotype name used to further subset the phenotype data.
region_name_col: Name of the column to filter for extract_region_name.
comment_string: Comment character in the column mapping file. Default is "#".
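As a rough illustration of what comment_string controls, the sketch below reads a small mapping file with base R and skips lines that begin with the comment character. The two-column, whitespace-separated layout and the column names "standard" and "original" are assumptions made for this example only; the layout actually expected by the function is not specified here.

```r
# Hypothetical mapping content; the real file layout may differ.
mapping_lines <- c(
  "# custom column mapping",
  "beta effect_size",
  "# another comment line",
  "se std_err"
)

# read.table() drops everything after comment.char, so comment-only
# lines are treated as blank and skipped.
mapping <- read.table(
  text = mapping_lines,
  comment.char = "#",
  col.names = c("standard", "original"),
  stringsAsFactors = FALSE
)
```

Only the two non-comment lines remain in mapping after the read.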

Value

A list (rss_input) containing the summary statistics with standardized column names, the sample size (n), and var_y.