
This function aligns variant names between two character vectors whose elements are in the format "chr:pos:A1:A2" or "chr:pos_A1_A2". The first vector is the "source" and the second is the "reference".
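To illustrate the alignment idea, here is a minimal base-R sketch. It is not the package's implementation. `align_sketch()` is a hypothetical name, and the sketch assumes both vectors already use the colon form "chr:pos:A1:A2". A source name is kept if it appears in the reference as-is. It is rewritten if its allele-swapped form appears there, and otherwise it is reported as unmatched. Strand flips (complementary alleles) are deliberately not handled here.

```r
# Hypothetical sketch of allele-order alignment; NOT the package's code.
# Assumes well-formed "chr:pos:A1:A2" names in both vectors.
align_sketch <- function(source, reference) {
  # Swap the two alleles: "chr:pos:A1:A2" -> "chr:pos:A2:A1"
  flip <- function(x) {
    parts <- strsplit(x, ":", fixed = TRUE)
    vapply(parts, function(p) paste(p[1], p[2], p[4], p[3], sep = ":"),
           character(1))
  }
  aligned <- source
  exact <- source %in% reference
  flipped <- flip(source)
  # Only consider the swapped form when there is no exact match
  swapped <- !exact & flipped %in% reference
  aligned[swapped] <- flipped[swapped]
  list(
    aligned_variants = aligned,                 # unmatched names left unchanged
    unmatched_indices = which(!exact & !swapped)
  )
}

align_sketch(c("1:123:A:C", "2:456:G:T", "3:789:C:A"),
             c("1:123:A:C", "2:456:T:G", "4:101:G:C"))
```

On the example inputs used further down this page, the sketch reproduces the documented output. "2:456:G:T" is rewritten to "2:456:T:G", and index 3 is reported as unmatched.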

Usage

align_variant_names(
  source,
  reference,
  remove_indels = FALSE,
  remove_build_suffix = TRUE
)

Arguments

source

A character vector of variant names to be aligned, in the format "chr:pos:A2:A1" or "chr:pos_A2_A1".

reference

A character vector of reference variant names to align against, in the format "chr:pos:A2:A1" or "chr:pos_A2_A1".

remove_indels

Whether to remove insertion/deletion (indel) variants. Default FALSE.

remove_build_suffix

Whether to strip trailing genome build suffixes like ":b38" or "_b38" before alignment. Default TRUE.
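The normalization that `remove_build_suffix` describes can be sketched in base R as follows. `normalize_variant_names()` is a hypothetical helper, not part of the package API. The sketch assumes that a build suffix looks like "b" followed by digits at the very end of the name. It also assumes that underscores only ever separate alleles, so chromosome names containing "_" are out of scope.

```r
# Hypothetical helper; NOT part of the package API.
normalize_variant_names <- function(x, remove_build_suffix = TRUE) {
  if (remove_build_suffix) {
    # Drop a trailing ":b38" / "_b38"-style suffix (assumed form: b + digits)
    x <- sub("[:_]b[0-9]+$", "", x)
  }
  # Convert "chr:pos_A1_A2" to "chr:pos:A1:A2" (assumes "_" only separates alleles)
  gsub("_", ":", x, fixed = TRUE)
}

normalize_variant_names(c("1:123_A_C", "2:456:G:T:b38", "3:789_C_A_b38"))
```

The suffix is stripped before the separators are converted, so "_b38" and ":b38" are both caught by one pattern.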

Value

A list with two elements:

- aligned_variants: A character vector of aligned variant names.
- unmatched_indices: A vector of indices of the source variants that could not be matched.
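In the example output on this page, unmatched source names are kept unchanged in `aligned_variants`. A caller may therefore want to drop them using `unmatched_indices`. The sketch below builds the result list by hand to mirror that example output. In practice it would come from `align_variant_names(source, reference)`.

```r
# Result built by hand to mirror the example output on this page.
res <- list(
  aligned_variants = c("1:123:A:C", "2:456:T:G", "3:789:C:A"),
  unmatched_indices = 3L
)
# Guard the empty case: x[-integer(0)] would return an empty vector.
matched <- if (length(res$unmatched_indices) > 0) {
  res$aligned_variants[-res$unmatched_indices]
} else {
  res$aligned_variants
}
matched
```

The length check matters because negative indexing with an empty index vector selects nothing, not everything.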

Examples

source <- c("1:123:A:C", "2:456:G:T", "3:789:C:A")
reference <- c("1:123:A:C", "2:456:T:G", "4:101:G:C")
align_variant_names(source, reference)
#> $aligned_variants
#> [1] "1:123:A:C" "2:456:T:G" "3:789:C:A"
#> 
#> $unmatched_indices
#> [1] 3
#>