Identifies outlier samples in a numeric matrix (e.g., PCA scores) using Mahalanobis distance with chi-squared-based p-values. Useful for QC in genotype PCA or expression PCA workflows.

Usage

detect_outliers_mahalanobis(x, prob = 0.99, pval_threshold = 0.05)

Arguments

x

Numeric matrix (samples x features). Row names are used as sample IDs in the output.

prob

Numeric in (0, 1); quantile level used to set the Mahalanobis distance cutoff (default 0.99).

pval_threshold

P-value threshold for outlier classification (default 0.05). A sample is flagged only if both conditions hold: its distance exceeds the quantile cutoff set by prob, and its p-value is below this threshold.

Value

A data.frame with columns:

sample_id

Row names from x, or row indices if unnamed.

mahal

Mahalanobis distance.

pvalue

Chi-squared p-value (df = number of features).

is_outlier

Logical; TRUE if distance > quantile cutoff and p-value < pval_threshold.
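
The columns described above can be approximated with base R. The following is a minimal sketch, not the package's source: the helper name sketch_outliers is hypothetical, and it assumes that the cutoff is the chi-squared quantile at prob (the documentation does not say whether the quantile is theoretical or empirical) and that the center and covariance are the plain sample mean and covariance of x. Note that stats::mahalanobis() returns the squared distance, which is the quantity the chi-squared p-value applies to.

```r
# Hypothetical re-implementation for illustration only.
sketch_outliers <- function(x, prob = 0.99, pval_threshold = 0.05) {
  x <- as.matrix(x)
  p <- ncol(x)

  # Squared Mahalanobis distance of each row from the sample mean
  d2 <- stats::mahalanobis(x, center = colMeans(x), cov = stats::cov(x))

  # Upper-tail chi-squared p-value, df = number of features
  pval <- stats::pchisq(d2, df = p, lower.tail = FALSE)

  # ASSUMPTION: the cutoff is the theoretical chi-squared quantile at `prob`
  cutoff <- stats::qchisq(prob, df = p)

  # Row names as sample IDs, falling back to row indices
  ids <- if (is.null(rownames(x))) seq_len(nrow(x)) else rownames(x)

  data.frame(
    sample_id  = ids,
    mahal      = d2,
    pvalue     = pval,
    is_outlier = d2 > cutoff & pval < pval_threshold,
    row.names  = NULL,
    stringsAsFactors = FALSE
  )
}

# Simulated two-component scores with one planted outlier
set.seed(1)
scores <- matrix(rnorm(200), ncol = 2,
                 dimnames = list(paste0("S", 1:100), c("PC1", "PC2")))
scores["S1", ] <- c(8, 8)

res <- sketch_outliers(scores)
res[res$is_outlier, ]
```

Under the chi-squared cutoff assumed here, a distance above the 0.99 quantile already implies a p-value below 0.01, so with the default arguments the distance condition is the stricter of the two. The p-value condition only changes the result when pval_threshold is set below 1 - prob.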