Formats an input summary statistics data frame with uniform column names so that it fits into the SuSiE pipeline. Column names are standardized with MungeSumstats::standardise_header(); an optional custom column mapping file handles additional non-standard names. The function also extracts the sample size, number of cases, number of controls, and variance of Y. Missing values in n_sample, n_case, and n_control are backfilled with median values.
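
The median backfill described above can be illustrated with a minimal base R sketch. This is not the function's actual implementation: the helper `backfill_median` and the toy data frame are hypothetical, and the sketch assumes the median is taken over the non-missing values of each column.

```r
# Illustrative only: backfill missing values with the column median.
# `backfill_median` and `sumstats` are hypothetical, not part of the package.
backfill_median <- function(x) {
  med <- median(x, na.rm = TRUE)  # median of the observed (non-missing) values
  x[is.na(x)] <- med              # replace each NA with that median
  x
}

# Toy summary statistics with one missing value per column
sumstats <- data.frame(
  n_sample  = c(1000, NA, 1200, 1100),
  n_case    = c(400, 450, NA, 500),
  n_control = c(600, NA, 700, 600)
)

# Apply the backfill to each of the three count columns
for (col in c("n_sample", "n_case", "n_control")) {
  sumstats[[col]] <- backfill_median(sumstats[[col]])
}

print(sumstats)
```

In this toy data, each NA is replaced by the median of the three observed values in its column, and no other entries change.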

Usage

load_rss_data(
  sumstat_path,
  column_file_path = NULL,
  n_sample = 0,
  n_case = 0,
  n_control = 0,
  region = NULL,
  extract_region_name = NULL,
  region_name_col = NULL,
  comment_string = "#"
)

Arguments

sumstat_path

File path to the summary statistics.

column_file_path

Optional file path to a custom column mapping file for non-standard column names not recognized by MungeSumstats.

n_sample

User-specified sample size. If unknown, set to 0 (the default) to retrieve it from the summary statistics file.

n_case

User-specified number of cases.

n_control

User-specified number of controls.

region

The region that tabix uses to subset the input dataset.

extract_region_name

User-specified gene/phenotype name used to further subset the phenotype data.

region_name_col

The column to filter on when matching extract_region_name.

comment_string

Comment character used in the column mapping file. Default is "#".

Value

A list, rss_input, containing the summary statistics with standardized column names, the sample size (n), and var_y.