Compute LD (Linkage Disequilibrium) Correlation Matrix from Genotypes
Source:R/misc.R
compute_LD.RdComputes a pairwise Pearson correlation matrix from a genotype matrix. Supports two variance conventions:
"sample"Standard sample variance with N-1 denominator (default). Uses mean imputation for missing genotypes, then
Rfast::cora(if available) or basecor()."population"Population variance with N denominator, matching GCTA-style tools (e.g. DENTIST, GCTA –make-grm). Per-SNP means are computed from non-missing values; missing entries are set to zero after centering so they do not contribute to cross-products. Cross-products are normalized by the total sample count N, not by pairwise non-missing counts.
Usage
compute_LD(X, method = c("sample", "population"), trim_samples = FALSE)Arguments
- X
Numeric genotype matrix (samples x SNPs). May contain
NAfor missing genotypes.- method
Character, either
"sample"(default, N-1 denominator) or"population"(N denominator, GCTA-style). Partial matching is supported.- trim_samples
Logical. If
TRUEandmethod = "population", drops trailing samples so thatnrow(X)is a multiple of 4, matching PLINK .bed file chunk processing. Ignored whenmethod = "sample". Default isFALSE.
Details
Missing data handling.
With method = "sample", missing values are mean-imputed per SNP
before computing the full Pearson correlation matrix.
With method = "population", per-SNP means are computed from
non-missing values, the matrix is centered, then NAs are set to 0
so that missing pairs contribute nothing to the cross-product.
The denominator is always the total sample count N
(after optional trimming), matching the original GCTA formula:
$$\text{Var}(X_i) = E[X_i^2] - E[X_i]^2$$
$$\text{Cor}(X_i, X_j) = \frac{\text{Cov}(X_i, X_j)}{\sqrt{\text{Var}(X_i)\,\text{Var}(X_j)}}$$
Zero-variance SNPs.
Any monomorphic SNP will have zero variance, producing NaN
correlations. These are set to 0 in the returned matrix; the diagonal
is forced to 1.