Iterative greedy algorithm that removes individuals until no remaining pair
exceeds a kinship threshold. It first reduces large connected components by
graph-based pruning (removing the highest-degree nodes), then applies
plinkQC::relatednessFilter iteratively until no related pairs remain or
max_iterations is reached.
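The graph-based pre-pruning step can be illustrated with a minimal base R sketch. This is not the function's actual implementation: the helpers component_ids and prune_large_components are hypothetical names, the input is assumed to be already restricted to pairs above the threshold, and applying reduce_fraction per component (with at least one node dropped per pass) is an assumption made for illustration.

```r
# Hypothetical helper: label connected components by propagating the
# smallest label along every edge until nothing changes.
component_ids <- function(ids, a, b) {
  comp <- setNames(seq_along(ids), ids)
  repeat {
    before <- comp
    for (k in seq_along(a)) {
      m <- min(comp[[a[k]]], comp[[b[k]]])
      comp[[a[k]]] <- m
      comp[[b[k]]] <- m
    }
    if (identical(comp, before)) break
  }
  comp
}

# Hypothetical helper: repeatedly drop the highest-degree nodes from every
# component larger than max_component_size. Returns the removed IDs.
prune_large_components <- function(pairs, max_component_size = 20L,
                                   reduce_fraction = 0.05) {
  a <- as.character(pairs$IID1)
  b <- as.character(pairs$IID2)
  removed <- character(0)
  repeat {
    # Keep only edges whose endpoints are both still present.
    keep <- !(a %in% removed | b %in% removed)
    ids <- unique(c(a[keep], b[keep]))
    if (length(ids) == 0L) break
    comp <- component_ids(ids, a[keep], b[keep])
    sizes <- table(comp)
    big <- names(sizes)[sizes > max_component_size]
    if (length(big) == 0L) break
    for (label in big) {
      members <- names(comp)[comp == as.integer(label)]
      # Degree = number of remaining related pairs an individual is part of.
      degree <- vapply(
        members,
        function(m) sum(a[keep] == m) + sum(b[keep] == m),
        numeric(1)
      )
      n_drop <- max(1, ceiling(reduce_fraction * length(members)))
      ranked <- names(sort(degree, decreasing = TRUE))
      removed <- c(removed, ranked[seq_len(n_drop)])
    }
  }
  removed
}

# A hub related to five others forms one component of size 6; removing the
# hub dissolves it, while the separate pair x-y is left untouched.
pairs <- data.frame(
  IID1 = c(rep("hub", 5), "x"),
  IID2 = c(paste0("leaf", 1:5), "y"),
  stringsAsFactors = FALSE
)
pruned <- prune_large_components(pairs, max_component_size = 3L)
```

Removing high-degree nodes first breaks up large clusters with few removals, leaving fewer and smaller groups of related pairs for the later plinkQC::relatednessFilter passes.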
Usage
filter_relatedness(
  relatedness,
  relatedness_threshold = 0.0625,
  analysis_type = c("maximize_unrelated", "maximize_cases"),
  relatedness_iid1 = "IID1",
  relatedness_iid2 = "IID2",
  relatedness_fid1 = NULL,
  relatedness_fid2 = NULL,
  relatedness_value = "PI_HAT",
  pheno_data = NULL,
  pheno_col = "pheno",
  other_criterion = NULL,
  other_criterion_threshold = NULL,
  other_criterion_direction = "ge",
  other_criterion_iid = "IID",
  other_criterion_measure = NULL,
  max_component_size = 20L,
  reduce_fraction = 0.05,
  max_iterations = 20L,
  verbose = FALSE
)
Arguments
- relatedness
A data.frame of pairwise relatedness estimates (e.g. KING .kin0 output). Must contain the columns named by relatedness_iid1, relatedness_iid2, and relatedness_value.
- relatedness_threshold
Kinship threshold above which a pair of individuals is considered related (default 0.0625, the expected kinship coefficient for 3rd-degree relatives such as first cousins).
- analysis_type
One of "maximize_unrelated" (default) or "maximize_cases". The latter preserves cases in case-control studies.
- relatedness_iid1
Column name for first individual ID (default "IID1").
- relatedness_iid2
Column name for second individual ID (default "IID2").
- relatedness_fid1
Column name for first family ID (default NULL).
- relatedness_fid2
Column name for second family ID (default NULL).
- relatedness_value
Column name for the relatedness measure (default "PI_HAT").
- pheno_data
A data.frame with columns IID and the column named by pheno_col. Required when analysis_type = "maximize_cases".
- pheno_col
Column name for the phenotype (default "pheno"). Expected to be binary (1 = case, 0 = control).
- other_criterion
Optional data.frame with additional filtering criteria (passed to plinkQC::relatednessFilter).
- other_criterion_threshold
Threshold for additional criterion.
- other_criterion_direction
Direction for threshold comparison (default "ge").
- other_criterion_iid
Column name for individual ID in criterion data (default "IID").
- other_criterion_measure
Column name for the criterion measure.
- max_component_size
Largest connected-component size tolerated before graph-based pre-pruning is applied (default 20).
- reduce_fraction
Fraction of highest-degree nodes to remove per iteration during pre-pruning (default 0.05).
- max_iterations
Maximum plinkQC iterations for resolving remaining related pairs (default 20).
- verbose
Logical, print progress messages (default FALSE).
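The arguments above can be illustrated with a small toy call. This is a sketch under stated assumptions: the data are invented for illustration, the column names follow the documented defaults, and because the package providing filter_relatedness and its return value are not described in this section, the call is guarded and the result is only inspected with str().

```r
# Toy pairwise relatedness table using the default column names
# (IID1, IID2, PI_HAT). Values are made up for illustration.
relatedness <- data.frame(
  IID1 = c("s1", "s1", "s4"),
  IID2 = c("s2", "s3", "s5"),
  PI_HAT = c(0.50, 0.26, 0.03),
  stringsAsFactors = FALSE
)

# Phenotype table required for analysis_type = "maximize_cases":
# an IID column plus the column named by pheno_col (1 = case, 0 = control).
pheno_data <- data.frame(
  IID = paste0("s", 1:5),
  pheno = c(1, 0, 0, 1, 0),
  stringsAsFactors = FALSE
)

# Pairs above the default threshold are the ones treated as related.
related_pairs <- relatedness[relatedness$PI_HAT > 0.0625, ]

# Guarded call: runs only when filter_relatedness is available in the session.
if (exists("filter_relatedness")) {
  result <- filter_relatedness(
    relatedness,
    relatedness_threshold = 0.0625,
    analysis_type = "maximize_cases",
    pheno_data = pheno_data,
    pheno_col = "pheno",
    verbose = TRUE
  )
  str(result)
}
```

In this toy input, s1 is a case related to two controls (s2 and s3), which is the situation where "maximize_cases" and "maximize_unrelated" would be expected to make different choices.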