This function reformats the input summary statistics data frame so that its columns use the uniform names expected by the SuSiE pipeline. Column names are mapped according to the specified column file. The function also extracts the sample size, number of cases, number of controls, and variance of Y. Missing values in n_sample, n_case, and n_control are backfilled with median values.
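The median backfill described above can be illustrated with a minimal base R sketch. The data frame, its values, and the helper name backfill_median are hypothetical and are not taken from the function's implementation; the sketch only shows what replacing missing values with a median looks like.

```r
# Hypothetical summary statistics with a gap in the per-variant sample size
sumstats <- data.frame(
  variant_id = c("v1", "v2", "v3", "v4"),
  n_sample = c(1000, NA, 1200, 1100)
)

# Illustrative helper (not part of the documented API):
# replace missing entries with the median of the observed values
backfill_median <- function(x) {
  x[is.na(x)] <- median(x, na.rm = TRUE)
  x
}

sumstats$n_sample <- backfill_median(sumstats$n_sample)
print(sumstats)
```

The same helper could be applied to n_case and n_control columns in the same way.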

Usage

load_rss_data(
  sumstat_path,
  column_file_path,
  n_sample = 0,
  n_case = 0,
  n_control = 0,
  region = NULL,
  extract_region_name = NULL,
  region_name_col = NULL,
  comment_string = "#"
)

Arguments

sumstat_path

File path to the summary statistics.

column_file_path

File path to the column file for mapping.

n_sample

User-specified sample size. If unknown, set to 0 so that the sample size is retrieved from the summary statistics file.

n_case

User-specified number of cases.

n_control

User-specified number of controls.

region

The region that tabix uses to subset the input dataset.

extract_region_name

User-specified gene/phenotype name used to further subset the phenotype data.

region_name_col

The column to filter on when matching extract_region_name.

comment_string

Comment character in the column mapping file. Default is "#".

Value

A list, rss_input, containing the summary statistics with standardized column names, the sample size (n), and var_y.
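
A call might look like the following sketch. The file paths are placeholders, and the region string assumes the usual tabix "chromosome:start-end" form, which this page does not confirm. The call is guarded so that the snippet does nothing unless load_rss_data is available in the session and both files exist.

```r
# Hypothetical input paths; replace with real files
sumstat_path <- "path/to/sumstats.tsv.gz"
column_file_path <- "path/to/column_mapping.txt"

# Run only when the function is available and the inputs exist
if (exists("load_rss_data") &&
    file.exists(sumstat_path) &&
    file.exists(column_file_path)) {
  rss_input <- load_rss_data(
    sumstat_path = sumstat_path,
    column_file_path = column_file_path,
    n_sample = 0,                      # 0: retrieve the sample size from the file
    region = "chr1:1000000-2000000",   # assumed tabix-style region string
    comment_string = "#"
  )
  # Inspect the returned list without assuming its element names
  str(rss_input)
}
```

Leaving n_case and n_control at their default of 0 matches the signature shown under Usage.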