Iterative greedy algorithm that removes individuals until no remaining pair exceeds a relatedness threshold. It first shrinks large connected components of the relatedness graph by removing their highest-degree nodes, then applies plinkQC::relatednessFilter repeatedly until no related pairs remain or max_iterations is reached.
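The pre-pruning step can be illustrated with a small base-R sketch. This is not the package's implementation: component_labels() and prune_large_components() are hypothetical helpers, and the rounding rule (at least one node removed per oversized component per pass) is an assumption made for the example. It covers only the graph-based reduction, not the later plinkQC::relatednessFilter passes.

```r
# Illustrative sketch only; both helpers below are hypothetical.

# Label connected components of an undirected edge list by breadth-first search.
component_labels <- function(id1, id2) {
  nodes <- unique(c(id1, id2))
  labels <- setNames(rep(NA_integer_, length(nodes)), nodes)
  current <- 0L
  for (start in nodes) {
    if (!is.na(labels[[start]])) next
    current <- current + 1L
    labels[[start]] <- current
    queue <- start
    while (length(queue) > 0) {
      node <- queue[1]
      queue <- queue[-1]
      neighbours <- unique(c(id2[id1 == node], id1[id2 == node]))
      unseen <- neighbours[is.na(labels[neighbours])]
      labels[unseen] <- current
      queue <- c(queue, unseen)
    }
  }
  labels
}

# Repeatedly drop the highest-degree members of every component that is
# larger than max_component_size (assumed rule: at least one node per pass).
prune_large_components <- function(pairs, max_component_size = 20L,
                                   reduce_fraction = 0.05) {
  removed <- character(0)
  repeat {
    if (nrow(pairs) == 0) break
    labels <- component_labels(pairs$IID1, pairs$IID2)
    sizes <- table(labels)
    large <- names(sizes)[sizes > max_component_size]
    if (length(large) == 0) break
    for (comp in large) {
      members <- names(labels)[labels == as.integer(comp)]
      # Degree = number of related pairs in which a member appears.
      degree <- vapply(
        members,
        function(m) sum(pairs$IID1 == m) + sum(pairs$IID2 == m),
        numeric(1)
      )
      n_drop <- max(1, ceiling(reduce_fraction * length(members)))
      drop <- names(sort(degree, decreasing = TRUE))[seq_len(n_drop)]
      removed <- c(removed, drop)
    }
    pairs <- pairs[!(pairs$IID1 %in% removed | pairs$IID2 %in% removed), ]
  }
  removed
}

# Toy data: a hub "H" related to five others, plus one separate pair.
pairs <- data.frame(
  IID1 = c(rep("H", 5), "X"),
  IID2 = c("A", "B", "C", "D", "E", "Y"),
  stringsAsFactors = FALSE
)
removed <- prune_large_components(pairs, max_component_size = 3L)
print(removed)
```

In this toy graph the hub is the only node with degree above one, so removing it dissolves the oversized component while the separate pair is left for the later filtering step.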

Usage

filter_relatedness(
  relatedness,
  relatedness_threshold = 0.0625,
  analysis_type = c("maximize_unrelated", "maximize_cases"),
  relatedness_iid1 = "IID1",
  relatedness_iid2 = "IID2",
  relatedness_fid1 = NULL,
  relatedness_fid2 = NULL,
  relatedness_value = "PI_HAT",
  pheno_data = NULL,
  pheno_col = "pheno",
  other_criterion = NULL,
  other_criterion_threshold = NULL,
  other_criterion_direction = "ge",
  other_criterion_iid = "IID",
  other_criterion_measure = NULL,
  max_component_size = 20L,
  reduce_fraction = 0.05,
  max_iterations = 20L,
  verbose = FALSE
)

Arguments

relatedness

A data.frame of pairwise relatedness estimates (e.g. KING .kin0 output). Must contain the columns named by relatedness_iid1, relatedness_iid2, and relatedness_value.

relatedness_threshold

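Relatedness threshold above which a pair of individuals is considered related (default 0.0625, the expected kinship coefficient for third-degree relatives such as first cousins). The threshold must be on the same scale as the column named by relatedness_value.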

analysis_type

One of "maximize_unrelated" (default) or "maximize_cases". "maximize_cases" preferentially retains cases in case-control studies and requires pheno_data.

relatedness_iid1

Column name for first individual ID (default "IID1").

relatedness_iid2

Column name for second individual ID (default "IID2").

relatedness_fid1

Column name for first family ID (default NULL).

relatedness_fid2

Column name for second family ID (default NULL).

relatedness_value

Column name for the relatedness measure (default "PI_HAT").

pheno_data

A data.frame with an IID column and the column named by pheno_col (default NULL). Required when analysis_type = "maximize_cases".

pheno_col

Column name for the phenotype (default "pheno"). Expected to be binary (1 = case, 0 = control).

other_criterion

Optional data.frame with an additional filtering criterion (default NULL; passed to plinkQC::relatednessFilter).

other_criterion_threshold

Threshold applied to the measure in other_criterion (default NULL).

other_criterion_direction

Direction of the comparison against other_criterion_threshold (default "ge", i.e. greater than or equal to).

other_criterion_iid

Column name for individual ID in criterion data (default "IID").

other_criterion_measure

Column name for the criterion measure in other_criterion (default NULL).

max_component_size

Connected components with more individuals than this are reduced by graph-based pre-pruning (default 20).

reduce_fraction

Fraction of highest-degree nodes to remove per iteration during pre-pruning (default 0.05).

max_iterations

Maximum plinkQC iterations for resolving remaining related pairs (default 20).

verbose

Logical; if TRUE, print progress messages (default FALSE).

Value

A character vector of individual IDs to exclude.
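
Examples

A minimal usage sketch with made-up data, assuming the package that provides filter_relatedness() is loaded and plinkQC is installed. The sample IDs and values are invented for illustration, and the call is guarded so the snippet also runs where the function is not available. No particular result is implied for the toy input.

```r
# Toy pairwise relatedness table using the default column names.
relatedness <- data.frame(
  IID1 = c("s1", "s1", "s4"),
  IID2 = c("s2", "s3", "s5"),
  PI_HAT = c(0.50, 0.26, 0.03),
  stringsAsFactors = FALSE
)

# Toy phenotype table: 1 = case, 0 = control.
pheno_data <- data.frame(
  IID = paste0("s", 1:5),
  pheno = c(0, 1, 1, 0, 1),
  stringsAsFactors = FALSE
)

# Pairs above the default threshold; these are the ones the filter has to resolve.
related_pairs <- relatedness[relatedness$PI_HAT > 0.0625, ]
print(related_pairs)

# Only call the function if it is available in the current session.
if (exists("filter_relatedness", mode = "function")) {
  exclude <- filter_relatedness(
    relatedness,
    relatedness_threshold = 0.0625,
    analysis_type = "maximize_cases",
    pheno_data = pheno_data,
    pheno_col = "pheno"
  )
  # The return value lists IDs to drop; everything else is kept.
  keep <- setdiff(pheno_data$IID, exclude)
  print(keep)
}
```

Because the function returns the IDs to exclude rather than the IDs to keep, setdiff() against the full sample list gives the retained set.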