Covariate Data Preprocessing#
This notebook contains the workflow for processing covariate files and computing PCA-derived covariates from phenotype data.
Miniprotocol Timing#
The timing below is the total duration of all miniprotocol phases. Module-specific timings are given separately on their respective pages and are also included in this overall estimate.
Timing < 3 minutes
Overview#
This workflow applies the covariate-related sections of the xQTL project pipeline.
covariate_formatting.ipynb (step i): Merge covariates and genotype PCA
covariate_hidden_factor.ipynb (step ii): Compute residual on merged covariates and perform hidden factors analysis
Steps#
i. Merge Covariates and Genotype PCs#
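The --k parameter sets how many genotype PCs are merged with the covariates, so choose it according to the total amount of variation you want the PCs to explain.
In this example we target 80%: the command takes the last row of the scree file whose third column is below 0.8 and passes that row's first column as --k.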
sos run pipeline/covariate_formatting.ipynb merge_genotype_pc \
--cwd output/covariate/ \
--pcaFile output/genotype/genotype_pca/wgs.merged.plink_qc.plink_qc.prune.pca.rds \
--covFile data/covariate/covariates.tsv \
--tol-cov 0.4 \
--k `awk '$3 < 0.8' output/genotype/genotype_pca/wgs.merged.plink_qc.plink_qc.prune.pca.scree.txt | tail -1 | cut -f 1 `
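To see how the --k value is derived, the following minimal sketch runs the same awk, tail and cut pipeline on a small made-up scree table. The column layout (PC index, proportion of variance, cumulative proportion, tab-separated) and all of the numbers are assumptions for illustration only; check the layout of your own scree.txt before relying on it.

```shell
# Hypothetical scree table (assumed columns: PC index, proportion of
# variance explained, cumulative proportion), tab-separated.
scree=$(mktemp)
printf '1\t0.40\t0.40\n2\t0.25\t0.65\n3\t0.10\t0.75\n4\t0.08\t0.83\n5\t0.05\t0.88\n' > "$scree"

# Keep rows whose third column is below 0.8, take the last such row,
# and extract its first column (the PC index).
k=$(awk '$3 < 0.8' "$scree" | tail -1 | cut -f 1)
echo "k = $k"

# Clean up the temporary file.
rm -f "$scree"
```

With this toy table the first three PCs stay below the 0.8 threshold, so the pipeline would pass 3 as --k. Changing 0.8 in the awk condition changes the target share of variation.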
Anticipated Results#
The processed covariate data include a file containing the covariates and hidden factors, ready for use in TensorQTL.