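This function loads genotype, phenotype, and covariate data for a specific region and preprocesses them: it filters variants and samples (by MAF, MAC, variance, and missingness) and residualizes the phenotypes and genotypes on the covariates.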
Usage
load_regional_association_data(
genotype,
phenotype,
covariate,
region,
conditions,
maf_cutoff = 0,
mac_cutoff = 0,
xvar_cutoff = 0,
imiss_cutoff = 0,
association_window = NULL,
extract_region_name = NULL,
region_name_col = NULL,
keep_indel = TRUE,
keep_samples = NULL,
keep_variants = NULL,
phenotype_header = 4,
scale_residuals = FALSE,
tabix_header = TRUE
)
Arguments
- genotype
PLINK bed file containing genotype data.
- phenotype
A vector of phenotype file names.
- covariate
A vector of covariate file names corresponding to the phenotype file vector.
- region
A string of the form chr:start-end giving the phenotype region.
- conditions
A vector of strings representing different conditions or groups.
- maf_cutoff
Minimum minor allele frequency (MAF) cutoff. Default is 0.
- mac_cutoff
Minimum minor allele count (MAC) cutoff. Default is 0.
- xvar_cutoff
Minimum variance cutoff. Default is 0.
- imiss_cutoff
Maximum individual missingness cutoff. Default is 0.
- association_window
A string of the form chr:start-end giving the association analysis window (cis or trans). If not provided, all genotype data will be loaded.
- extract_region_name
A list of vectors of strings (e.g., a gene ID such as ENSG00000269699) used to select the regions of interest when multiple regions are available. Default is NULL.
- region_name_col
Column name containing the region name. Default is NULL.
- keep_indel
Logical indicating whether to keep insertions/deletions (INDELs). Default is TRUE.
- keep_samples
A vector of sample names to keep. Default is NULL.
- keep_variants
A vector of variant IDs to keep. Default is NULL.
- phenotype_header
Number of header rows to skip at the beginning of the transposed phenotype file. Default is 4 (for chr, start, end, and ID).
- scale_residuals
Logical indicating whether to scale residuals. Default is FALSE.
- tabix_header
Logical indicating whether the tabix file has a header. Default is TRUE.
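The four QC cutoffs (maf_cutoff, mac_cutoff, xvar_cutoff, imiss_cutoff) describe per-variant and per-sample filters. The sketch below shows one plausible way to apply them to a matrix of 0/1/2 dosages with samples in rows. It is an illustration, not the package's code: `filter_genotypes_sketch` is a hypothetical helper, and it assumes that an imiss_cutoff of 0 disables the missingness filter and that samples are filtered before variant statistics are computed.

```r
# Hypothetical helper (not part of the package API) illustrating the
# maf_cutoff, mac_cutoff, xvar_cutoff and imiss_cutoff filters.
# X: numeric matrix of 0/1/2 dosages, samples in rows, variants in columns.
filter_genotypes_sketch <- function(X, maf_cutoff = 0, mac_cutoff = 0,
                                    xvar_cutoff = 0, imiss_cutoff = 0) {
  # Per-sample (individual) missingness: fraction of NA calls in each row.
  # Assumption: imiss_cutoff = 0 disables this filter.
  if (imiss_cutoff > 0) {
    imiss <- rowMeans(is.na(X))
    X <- X[imiss <= imiss_cutoff, , drop = FALSE]
  }

  # Alternate allele frequency from dosages, ignoring missing calls.
  af <- colMeans(X, na.rm = TRUE) / 2
  maf <- pmin(af, 1 - af)
  # Minor allele count = MAF times the number of called alleles.
  mac <- maf * 2 * colSums(!is.na(X))
  # Per-variant genotype variance.
  xvar <- apply(X, 2, var, na.rm = TRUE)

  keep <- maf >= maf_cutoff & mac >= mac_cutoff & xvar >= xvar_cutoff
  keep[is.na(keep)] <- FALSE  # e.g. variants with no called genotypes

  list(X = X[, keep, drop = FALSE], maf = maf[keep])
}

# Toy data: 4 samples, 3 variants (the second is monomorphic).
X <- matrix(c(0, 1, 2, 0,
              0, 0, 0, 0,
              1, 1, 0, NA), nrow = 4)
res <- filter_genotypes_sketch(X, maf_cutoff = 0.01)
dim(res$X)  # 4 2
```

With all cutoffs at their default of 0, every variant with at least one called genotype passes, which matches the documented defaults acting as "no filtering".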
Value
A list containing the following components:
residual_Y: A list of residualized phenotype values for each condition; each element is either a vector or a matrix.
residual_X: A list of residualized genotype matrices for each condition.
residual_Y_scalar: Scaling factor for residualized phenotype values.
residual_X_scalar: Scaling factor for residualized genotype values.
dropped_sample: A list of dropped samples for X, Y, and covariates.
covar: Covariate data.
Y: Original phenotype data.
X_data: Original genotype data.
X: Filtered genotype matrix.
maf: Minor allele frequency (MAF) for each variant.
chrom: Chromosome of the region.
grange: Genomic range of the region (start and end positions).
Y_coordinates: Phenotype coordinates if a region is specified.
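The residual_* components are phenotypes and genotypes with the covariates regressed out. A minimal sketch of that step for a single condition is shown below, assuming ordinary least squares with an intercept and complete data. `residualize_sketch` is hypothetical, and the meaning of the scaling factors (the residual standard deviation when scale_residuals is TRUE, and 1 otherwise) is an assumption, not something this page confirms.

```r
# Hypothetical sketch (not the package's implementation): regress covariates
# out of one condition's phenotype Y and genotype matrix X by least squares.
# Assumes complete cases and that covar is numeric (matrix or data frame).
residualize_sketch <- function(Y, X, covar, scale_residuals = FALSE) {
  qr_c <- qr(cbind(1, as.matrix(covar)))   # intercept + covariates
  residual_Y <- qr.resid(qr_c, as.matrix(Y))
  residual_X <- qr.resid(qr_c, as.matrix(X))

  if (scale_residuals) {
    # Assumption: the "scalar" is the residual standard deviation.
    residual_Y_scalar <- apply(residual_Y, 2, sd)
    residual_X_scalar <- apply(residual_X, 2, sd)
    residual_Y <- sweep(residual_Y, 2, residual_Y_scalar, "/")
    residual_X <- sweep(residual_X, 2, residual_X_scalar, "/")
  } else {
    residual_Y_scalar <- rep(1, ncol(residual_Y))
    residual_X_scalar <- rep(1, ncol(residual_X))
  }
  list(residual_Y = residual_Y, residual_X = residual_X,
       residual_Y_scalar = residual_Y_scalar,
       residual_X_scalar = residual_X_scalar)
}

set.seed(1)
n <- 50
covar <- matrix(rnorm(n * 2), n)
X <- matrix(rbinom(n * 3, 2, 0.3), n)
Y <- rnorm(n)
res <- residualize_sketch(Y, X, covar, scale_residuals = TRUE)
```

The QR decomposition of the covariate matrix is computed once and reused for the phenotype and every genotype column, which is cheaper than fitting lm() per column. Running the sketch once per condition would give the per-condition lists described above.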