
This function loads genotype, phenotype, and covariate data for a specific region and preprocesses them: it filters variants and samples, then residualizes the phenotypes and genotypes on the covariates.

Usage

load_regional_association_data(
  genotype,
  phenotype,
  covariate,
  region,
  conditions,
  maf_cutoff = 0,
  mac_cutoff = 0,
  xvar_cutoff = 0,
  imiss_cutoff = 0,
  association_window = NULL,
  extract_region_name = NULL,
  region_name_col = NULL,
  keep_indel = TRUE,
  keep_samples = NULL,
  keep_variants = NULL,
  phenotype_header = 4,
  scale_residuals = FALSE,
  tabix_header = TRUE
)

Arguments

genotype

Path to the PLINK bed file containing the genotype data.

phenotype

A vector of phenotype file names.

covariate

A vector of covariate file names corresponding to the phenotype file vector.

region

A string of the form "chr:start-end" specifying the phenotype region.

conditions

A vector of strings representing different conditions or groups.

maf_cutoff

Minimum minor allele frequency (MAF) cutoff. Default is 0.

mac_cutoff

Minimum minor allele count (MAC) cutoff. Default is 0.

xvar_cutoff

Minimum genotype variance cutoff. Default is 0.

imiss_cutoff

Maximum individual missingness cutoff. Default is 0.
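The four cutoffs above can be illustrated with a small base R sketch. `qc_genotypes` is a hypothetical helper, not part of the package; it assumes a dosage matrix (0/1/2, `NA` for missing) with samples in rows, keeps variants whose statistic is at least its minimum cutoff, and, as a labeled assumption, treats an `imiss_cutoff` of 0 as "no sample filter". The package's exact comparisons and order of operations may differ.

```r
# Hypothetical helper, not part of the package. X holds dosages (0/1/2, NA
# for missing) with samples in rows and variants in columns.
qc_genotypes <- function(X, maf_cutoff = 0, mac_cutoff = 0,
                         xvar_cutoff = 0, imiss_cutoff = 0) {
  # Sample filter on per-individual missingness.
  # Assumption: a cutoff of 0 disables this filter.
  if (imiss_cutoff > 0) {
    X <- X[rowMeans(is.na(X)) <= imiss_cutoff, , drop = FALSE]
  }
  n_obs <- colSums(!is.na(X))              # non-missing genotypes per variant
  ac    <- colSums(X, na.rm = TRUE)        # count of the coded allele
  mac   <- pmin(ac, 2 * n_obs - ac)        # minor allele count
  maf   <- mac / (2 * n_obs)               # minor allele frequency
  xvar  <- apply(X, 2, var, na.rm = TRUE)  # dosage variance per variant
  # Variant filter: each statistic must reach its minimum cutoff (assumed >=).
  keep <- maf >= maf_cutoff & mac >= mac_cutoff & xvar >= xvar_cutoff
  keep[is.na(keep)] <- FALSE               # e.g. variants with no observed calls
  list(X = X[, keep, drop = FALSE], maf = maf[keep])
}

X <- matrix(c(0, 1, 2, NA,   # v1
              0, 0, 0, 0,    # v2 (monomorphic)
              1, 1, 0, 2),   # v3
            nrow = 4,
            dimnames = list(paste0("s", 1:4), c("v1", "v2", "v3")))
res <- qc_genotypes(X, maf_cutoff = 0.1, imiss_cutoff = 0.2)
colnames(res$X)  # "v1" "v3"
rownames(res$X)  # "s1" "s2" "s3"
```

Here sample s4 is dropped because one third of its genotypes are missing, and the monomorphic variant v2 fails the MAF cutoff.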

association_window

A string of the form "chr:start-end" specifying the association analysis window (cis or trans). If NULL (the default), all genotype data are loaded.
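As an illustration of how a "chr:start-end" string can restrict variants, the sketch below parses the window with a regular expression and flags variants inside it. `in_window` is a hypothetical helper; inclusive bounds and exact matching of chromosome labels (e.g. "chr22" versus "22") are assumptions, not documented package behavior.

```r
# Hypothetical helper: flag variants whose position falls inside a
# "chr:start-end" window (inclusive bounds are an assumption).
in_window <- function(variant_chrom, variant_pos, window) {
  m <- regmatches(window, regexec("^([^:]+):([0-9]+)-([0-9]+)$", window))[[1]]
  if (length(m) != 4) stop("window must look like chr:start-end")
  start <- as.numeric(m[3])
  end   <- as.numeric(m[4])
  variant_chrom == m[2] & variant_pos >= start & variant_pos <= end
}

variants <- data.frame(chrom = c("chr22", "chr22", "chr21"),
                       pos   = c(16500000, 17200000, 17200000),
                       stringsAsFactors = FALSE)
sel <- in_window(variants$chrom, variants$pos, "chr22:17000000-18000000")
variants[sel, ]  # only the second variant lies in the window
```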

extract_region_name

A list of string vectors (e.g., gene IDs such as ENSG00000269699) used to subset the data when multiple regions are available. Default is NULL.

region_name_col

Column name containing the region name. Default is NULL.

keep_indel

Logical indicating whether to keep insertions/deletions (INDELs). Default is TRUE.
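A minimal sketch of what `keep_indel = FALSE` implies, assuming alleles are stored as sequences (as in a PLINK .bim file) and that any allele longer than one base marks an INDEL. `is_indel` and the toy variant table are hypothetical; the package's own classification rule is not documented here.

```r
# Hypothetical helper: treat a variant as an INDEL when either allele is not
# a single base. Alleles are assumed to be stored as sequences (e.g. "AT").
is_indel <- function(ref, alt) nchar(ref) != 1 | nchar(alt) != 1

variants <- data.frame(id  = c("var1", "var2", "var3"),
                       ref = c("A", "AT", "G"),
                       alt = c("G", "A", "C"),
                       stringsAsFactors = FALSE)
keep_indel <- FALSE
kept <- if (keep_indel) variants else variants[!is_indel(variants$ref, variants$alt), ]
kept$id  # "var1" "var3"
```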

keep_samples

A vector of sample names to keep. Default is NULL.
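Sample selection interacts with the `dropped_sample` component of the return value. The sketch below shows one plausible way to align samples: keep those present in every input (and in `keep_samples`, if given) and record what each input lost. `align_samples` and its element names are hypothetical, not the package's internal structure.

```r
# Hypothetical sketch of sample alignment: retain samples present in every
# input (and in keep_samples, if given) and record what each input lost.
align_samples <- function(geno_ids, pheno_ids, covar_ids, keep_samples = NULL) {
  common <- Reduce(intersect, list(geno_ids, pheno_ids, covar_ids))
  if (!is.null(keep_samples)) common <- intersect(common, keep_samples)
  list(
    samples = common,
    dropped_sample = list(
      X     = setdiff(geno_ids, common),   # dropped from the genotypes
      Y     = setdiff(pheno_ids, common),  # dropped from the phenotypes
      covar = setdiff(covar_ids, common)   # dropped from the covariates
    )
  )
}

res <- align_samples(geno_ids  = c("s1", "s2", "s3", "s4"),
                     pheno_ids = c("s1", "s2", "s3"),
                     covar_ids = c("s2", "s3", "s4"),
                     keep_samples = c("s2", "s3", "s4"))
res$samples  # "s2" "s3"
```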

phenotype_header

Number of rows to skip at the beginning of the transposed phenotype file. Default is 4, one row each for chr, start, end, and ID.
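To see why the default is 4, consider a toy BED-style phenotype table whose first four columns are chr, start, end, and ID, followed by one column per sample (this layout is an assumption based on the description above). After transposition those four fields become the first four rows, and skipping them leaves one row per sample.

```r
# Toy BED-style phenotype table (assumed layout): chr, start, end, ID,
# then one column per sample.
pheno_bed <- data.frame(chr = "chr22", start = 17000000, end = 17000001,
                        ID = "ENSG00000269699",
                        s1 = 0.3, s2 = -1.2, s3 = 0.8,
                        stringsAsFactors = FALSE)
tpheno <- t(pheno_bed)   # character matrix; rows chr, start, end, ID, s1, s2, s3
phenotype_header <- 4    # the documented default
sample_rows <- -seq_len(phenotype_header)
y <- setNames(as.numeric(tpheno[sample_rows, 1]), rownames(tpheno)[sample_rows])
y  # s1 = 0.3, s2 = -1.2, s3 = 0.8
```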

scale_residuals

Logical indicating whether to scale residuals. Default is FALSE.

tabix_header

Logical indicating whether the tabix file has a header. Default is TRUE.

Value

A list containing the following components:

  • residual_Y: A list of residualized phenotype values; each element is either a vector or a matrix.

  • residual_X: A list of residualized genotype matrices for each condition.

  • residual_Y_scalar: Scaling factor for residualized phenotype values.

  • residual_X_scalar: Scaling factor for residualized genotype values.

  • dropped_sample: A list of dropped samples for X, Y, and covariates.

  • covar: Covariate data.

  • Y: Original phenotype data.

  • X_data: Original genotype data.

  • X: Filtered genotype matrix.

  • maf: Minor allele frequency (MAF) for each variant.

  • chrom: Chromosome of the region.

  • grange: Genomic range of the region (start and end positions).

  • Y_coordinates: Phenotype coordinates if a region is specified.
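The residual_* and *_scalar components can be illustrated with ordinary least squares in base R. `residualize` below is a hypothetical helper for a single condition, not the package's implementation: it assumes an intercept is added to the covariates, that the scaling factor is the residual standard deviation when `scale_residuals = TRUE`, and that it is 1 otherwise.

```r
# Hypothetical helper, not the package's implementation: regress each column
# of M on the covariates (plus an intercept) and keep the residuals.
residualize <- function(M, covar, scale_residuals = FALSE) {
  Z <- cbind(1, covar)                 # design matrix with intercept (assumed)
  R <- qr.resid(qr(Z), as.matrix(M))   # M minus its least-squares fit on Z
  scalar <- rep(1, ncol(R))            # assumed: factor of 1 when unscaled
  if (scale_residuals) {
    scalar <- apply(R, 2, sd)          # assumed: scale by residual SD
    R <- sweep(R, 2, scalar, "/")
  }
  list(residual = R, scalar = scalar)
}

set.seed(1)
covar <- matrix(rnorm(20), ncol = 2)   # 10 samples, 2 covariates
y <- 2 * covar[, 1] + rnorm(10)        # phenotype driven by covariate 1
fit <- residualize(y, covar, scale_residuals = TRUE)
```

The same helper could be applied column-wise to a genotype matrix to obtain something analogous to residual_X; because the residuals are orthogonal to the covariates, covariate effects no longer drive the downstream association.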