Load regional association data (load_regional_association_data)

This function loads genotype, phenotype, and covariate data for a specified region, filters samples and variants, and residualizes the phenotype and genotype data on the covariates.

Usage

load_regional_association_data(
  genotype,
  phenotype,
  covariate,
  region,
  conditions,
  maf_cutoff = 0,
  mac_cutoff = 0,
  xvar_cutoff = 0,
  imiss_cutoff = 0,
  association_window = NULL,
  extract_region_name = NULL,
  region_name_col = NULL,
  keep_indel = TRUE,
  keep_samples = NULL,
  keep_variants = NULL,
  phenotype_header = 4,
  scale_residuals = FALSE,
  tabix_header = TRUE
)

Arguments

genotype: PLINK bed file containing genotype data.
phenotype: A vector of phenotype file names.
covariate: A vector of covariate file names corresponding to the phenotype file vector.
region: A string of the form chr:start-end specifying the phenotype region.
conditions: A vector of strings representing different conditions or groups.
maf_cutoff: Minimum minor allele frequency (MAF) cutoff. Default is 0.
mac_cutoff: Minimum minor allele count (MAC) cutoff. Default is 0.
xvar_cutoff: Minimum genotype variance cutoff for variants. Default is 0.
imiss_cutoff: Maximum individual missingness cutoff. Default is 0.
association_window: A string of the form chr:start-end specifying the association analysis window (cis or trans). If not provided, all genotype data are loaded.
extract_region_name: A list of character vectors (e.g., gene IDs such as ENSG00000269699) used to subset the data when multiple regions are available. Default is NULL.
region_name_col: Column name containing the region name. Default is NULL.
keep_indel: Logical indicating whether to keep insertions/deletions (INDELs). Default is TRUE.
keep_samples: A vector of sample names to keep. Default is NULL.
keep_variants: A vector of variant IDs to keep. Default is NULL.
phenotype_header: Number of rows to skip at the beginning of the transposed phenotype file. Default is 4 (chr, start, end, and ID).
scale_residuals: Logical indicating whether to scale residuals. Default is FALSE.
tabix_header: Logical indicating whether the tabix file has a header. Default is TRUE.
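The region and association_window arguments share the chr:start-end format. As a minimal sketch, assuming a string such as "chr1:1000-2000", the hypothetical helper parse_region() below (not part of the package API) splits it into its components with base R:

```r
# Hypothetical helper (not part of the package): split "chr:start-end" into parts.
parse_region <- function(region) {
  parts <- strsplit(region, "[:-]")[[1]]   # split on ':' and '-'
  if (length(parts) != 3) stop("region must have the form chr:start-end")
  start <- as.integer(parts[2])
  end <- as.integer(parts[3])
  if (is.na(start) || is.na(end) || start > end) stop("invalid start/end positions")
  list(chrom = parts[1], start = start, end = end)
}

r <- parse_region("chr1:1000-2000")
str(r)
```

How the package itself parses and validates these strings may differ; the sketch only illustrates the expected format.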
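The maf_cutoff, mac_cutoff, and xvar_cutoff arguments filter variants by minor allele frequency, minor allele count, and genotype variance. The sketch below shows one way such filters can be computed on a dosage matrix (rows are samples, columns are variants, values 0/1/2, NA for missing). The helper filter_variants() is hypothetical, and the use of inclusive comparisons (>=) is an assumption; the package's exact rules, and its sample-level imiss_cutoff filter, are not reproduced here.

```r
# Hypothetical sketch of variant-level QC (not the package's implementation).
filter_variants <- function(X, maf_cutoff = 0, mac_cutoff = 0, xvar_cutoff = 0) {
  n_obs <- colSums(!is.na(X))              # non-missing samples per variant
  alt   <- colSums(X, na.rm = TRUE)        # alternate-allele count
  mac   <- pmin(alt, 2 * n_obs - alt)      # minor allele count
  maf   <- mac / (2 * n_obs)               # minor allele frequency
  xvar  <- apply(X, 2, var, na.rm = TRUE)  # genotype variance
  # Assumption: a variant is kept when it meets or exceeds every cutoff.
  keep <- maf >= maf_cutoff & mac >= mac_cutoff & xvar >= xvar_cutoff
  keep[is.na(keep)] <- FALSE               # drop variants with no usable data
  list(X = X[, keep, drop = FALSE], maf = maf[keep])
}

X <- cbind(
  common = c(0, 1, 2, 1, 0, 1, 2, 0, 1, 1),
  rare   = c(0, 0, 0, 0, 0, 0, 0, 0, 0, 1),
  mono   = rep(0, 10),
  miss   = c(NA, 1, 1, 0, 2, 1, 0, 1, NA, 1)
)
res <- filter_variants(X, maf_cutoff = 0.05, mac_cutoff = 2)
colnames(res$X)
```

Here the rare variant passes the MAF cutoff (MAF = 0.05) but fails the MAC cutoff (MAC = 1), and the monomorphic variant fails both, so only common and miss remain.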

Value

A list containing the following components:

residual_Y: A list of residualized phenotype values (either a vector or a matrix).
residual_X: A list of residualized genotype matrices for each condition.
residual_Y_scalar: Scaling factor for residualized phenotype values.
residual_X_scalar: Scaling factor for residualized genotype values.
dropped_sample: A list of dropped samples for X, Y, and covariates.
covar: Covariate data.
Y: Original phenotype data.
X_data: Original genotype data.
X: Filtered genotype matrix.
maf: Minor allele frequency (MAF) for each variant.
chrom: Chromosome of the region.
grange: Genomic range of the region (start and end positions).
Y_coordinates: Phenotype coordinates if a region is specified.
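The residual_Y, residual_X, and scalar components come from regressing out covariates. As a minimal sketch for a single condition, assuming ordinary least-squares residualization with an intercept, and assuming the scalars are the residual standard deviations when scale_residuals = TRUE (and 1 otherwise), the hypothetical helper residualize() below uses base R's qr() and qr.resid(). The package's actual procedure may differ.

```r
# Hypothetical sketch of covariate residualization (not the package's code).
# Y: phenotype vector (length n); X: genotype matrix (n x p); covar: matrix (n x k).
residualize <- function(Y, X, covar, scale_residuals = FALSE) {
  qrC <- qr(cbind(1, covar))               # QR of covariates plus intercept
  rY <- qr.resid(qrC, as.matrix(Y))        # least-squares residuals of Y
  rX <- qr.resid(qrC, X)                   # residuals of each genotype column
  y_scalar <- 1
  x_scalar <- rep(1, ncol(X))
  if (scale_residuals) {
    # Assumption: scale each residual column to unit standard deviation.
    y_scalar <- apply(rY, 2, sd)
    x_scalar <- apply(rX, 2, sd)
    rY <- sweep(rY, 2, y_scalar, "/")
    rX <- sweep(rX, 2, x_scalar, "/")
  }
  list(residual_Y = drop(rY), residual_X = rX,
       residual_Y_scalar = y_scalar, residual_X_scalar = x_scalar)
}

set.seed(42)
n <- 100
covar <- cbind(age = rnorm(n), sex = rbinom(n, 1, 0.5))
X <- matrix(rbinom(n * 3, 2, 0.3), nrow = n, ncol = 3)
Y <- 0.5 * X[, 1] + covar[, "age"] + rnorm(n)
res <- residualize(Y, X, covar, scale_residuals = TRUE)
```

Because the residuals are orthogonal to the intercept and covariates, any later association test on residual_Y and residual_X is adjusted for those covariates; a variant whose residual variance is zero would produce NaN when scaled, which is one reason to apply the variance filter first.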