Reads a structured Excel workbook of SARS-CoV-2 Variants of Concern (VOC) and Variants of Interest (VOI) and returns a named list containing mutation profiles, nomenclature, temporal detection metadata, defining SNPs, and a fully citable reference table with interactive display support.
Arguments
- tibble
  A tibble or data.frame produced by read_excel from SARS_CoV_2_VOC_VOI.xlsx. Column 1 must be the Variant field column; columns 2 onward are one column per variant, in the order listed under Details.
- check
  Logical. If TRUE, validates that all internal vectors have equal length and stops with an informative message on mismatch. Default FALSE.
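The kind of validation described for check can be sketched as below. This is only an illustration of an equal-length check: the helper name check_equal_lengths and its error message are hypothetical and are not taken from the package source.

```r
# Hypothetical helper (not part of ViralEntropR): verifies that every element
# of a named list has the same length, and stops with a message that names
# the elements whose length differs from the most common one.
check_equal_lengths <- function(fields) {
  lens <- lengths(fields)  # named integer vector, one length per element
  if (length(unique(lens)) > 1L) {
    # Treat the most frequent length as the expected one.
    expected <- as.integer(names(which.max(table(lens))))
    bad <- names(lens)[lens != expected]
    stop("Length mismatch (expected ", expected, "): ",
         paste0(bad, " = ", lens[bad], collapse = ", "),
         call. = FALSE)
  }
  invisible(TRUE)
}

# Usage on toy data: passes silently when all lengths agree.
check_equal_lengths(list(WHO_Label = c("Alpha", "Beta"),
                         Pango_Lineage = c("B.1.1.7", "B.1.351")))
```

Naming the offending elements in the message makes a malformed workbook column easier to locate than a bare "lengths differ" error would.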
Details
Intended to be called once during data preparation via data-raw/sarscov2_variants.R:

    variants_dat <- readxl::read_excel("SARS_CoV_2_VOC_VOI.xlsx")
    variants_list <- get_variants(variants_dat)
    saveRDS(variants_list,
            file = "inst/extdata/sarscov2_variants.rds")

The saved object can then be loaded anywhere in the package or vignettes:

    voc_data <- readRDS(system.file("extdata", "sarscov2_variants.rds",
                                    package = "ViralEntropR"))

Column order of variants in the Excel workbook: Alpha, Beta, Epsilon, Eta, Iota, Kappa, Delta, Lambda, Gamma, Zeta, Theta, Omicron.
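Because the function depends on this column order, it may be worth checking the workbook layout before calling it. The sketch below assumes that the first column header is literally Variant and that the remaining headers are the WHO labels; neither is confirmed by this page, and check_variant_columns is a hypothetical helper.

```r
# Column order documented above.
expected_order <- c("Alpha", "Beta", "Epsilon", "Eta", "Iota", "Kappa",
                    "Delta", "Lambda", "Gamma", "Zeta", "Theta", "Omicron")

# Hypothetical helper (not part of ViralEntropR). Assumes the column headers
# are "Variant" followed by the WHO labels; adjust if the workbook differs.
check_variant_columns <- function(dat, expected = expected_order) {
  identical(names(dat)[1], "Variant") &&
    identical(names(dat)[-1], expected)
}

# Toy data frame standing in for the result of readxl::read_excel().
toy <- as.data.frame(matrix(NA_character_, nrow = 2, ncol = 13))
names(toy) <- c("Variant", expected_order)
check_variant_columns(toy)
```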
Returned list elements:
- WHO_Label
  List. WHO variant label strings (e.g. "Alpha").
- Pango_Lineage
  List. Pango lineage designations.
- GISAID_Clade
  Character vector. GISAID clade strings.
- Nextstrain_Clade
  Character vector. Nextstrain clade strings.
- Country_First_Detected
  List. Country of first documented detection per variant.
- Date_Earliest_Sample
  Character vector. Month-Year of the earliest documented sample per variant.
- Date_First_Detected
  Character vector. Month-Year of world-level first detection.
- Date_First_Detected_US
  Character vector. Month-Year of first US detection.
- Spike_Mutations
  List. Spike protein mutation strings read from the Excel workbook.
- Mutation_Sites
  List. Integer site positions extracted from Spike_Mutations.
- Defining_SNPs
  List. Canonical defining SNP strings per variant (NA where not characterised).
- Defining_SNP_Sites
  List. Integer positions extracted from Defining_SNPs.
- References
  Named list with three elements:
  $data: data frame of 21 verified references;
  $display(variant = NULL): interactive datatable, optionally filtered by WHO label;
  $cite(variant): character vector of formatted citation strings suitable for manuscript use.
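The relationship between Spike_Mutations and Mutation_Sites (and likewise between Defining_SNPs and Defining_SNP_Sites) can be illustrated with a small base-R sketch. It assumes mutation strings in the common amino-acid substitution form such as "N501Y"; the actual string format in the workbook and the package's own extraction logic may differ, and extract_sites is a hypothetical name.

```r
# Hypothetical helper (not part of ViralEntropR): pulls every run of digits
# out of each mutation string and returns them as integer positions.
# Elements that are empty or entirely NA yield NA_integer_.
extract_sites <- function(mutations) {
  lapply(mutations, function(x) {
    if (length(x) == 0L || all(is.na(x))) return(NA_integer_)
    as.integer(unlist(regmatches(x, gregexpr("[0-9]+", x))))
  })
}

# Toy input; the mutation strings are illustrative, not read from the workbook.
spike <- list(Alpha = c("N501Y", "D614G", "P681H"),
              Unknown = NA_character_)
sites <- extract_sites(spike)
sites$Alpha
```

A string that contains more than one number would contribute each of those numbers as a separate position under this approach.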