Parse and classify scientific plant names into taxonomic components: genus, specific epithet, infraspecific rank, infraspecific epithet, and author.
Output is aligned to a backbone convention:
- Orig.Genus in Title Case (first letter uppercase, rest lowercase).
- Orig.Species and Orig.Infraspecies epithets in lowercase.
- Infra.Rank in lowercase (subsp., var., subvar., f., subf.).
- Author is recovered from the input and preserved in its original casing/punctuation (no forced uppercasing).
- Orig.Name is reconstructed as: genus + species + (rank + infra) + author.
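The casing convention above can be sketched in base R. This is an illustration only, not the package's implementation: to_title(), lower_or_na() and rebuild_name() are hypothetical helpers that assume the name has already been split into its parts.

```r
# Hypothetical helpers (not part of wcvpmatch) illustrating the casing rules.

# Title Case: first letter uppercase, rest lowercase.
to_title <- function(x) {
  paste0(toupper(substr(x, 1, 1)), tolower(substring(x, 2)))
}

# Lowercase a single value, passing NA through unchanged.
lower_or_na <- function(x) if (is.na(x)) NA_character_ else tolower(x)

# Rebuild a name as genus + species + (rank + infra) + author.
rebuild_name <- function(genus, species = NA, rank = NA, infra = NA,
                         author = "") {
  parts <- c(
    to_title(genus),
    lower_or_na(species),
    lower_or_na(rank),
    lower_or_na(infra),
    author  # author is kept exactly as supplied
  )
  # Drop missing or empty parts before joining.
  parts <- parts[!is.na(parts) & nzchar(parts)]
  paste(parts, collapse = " ")
}

rebuild_name("ROSA", "Canina", "SUBSP.", "Coriifolia", "(Fr.) Leffler")
rebuild_name("opuntia")
```

The first call yields "Rosa canina subsp. coriifolia (Fr.) Leffler" and the second yields "Opuntia", which mirrors the genus-only case where the species is NA and the author is empty.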
Robustness rules:
- cf./aff. are removed from parsing but preserved as flags (has_cf, has_aff).
- Hybrid markers (x / \u00D7, i.e. ×) as standalone tokens are removed, with had_hybrid = TRUE.
- sp./spp. triggers genus-only classification (Rank = 1, Orig.Species = NA) and sets is_sp/is_spp.
- If an infraspecific rank is present but the infraspecific epithet is missing, sets rank_missing_infra = TRUE and keeps Infra.Rank while Orig.Infraspecies = NA.
- If rank appears "late" (after author-like tokens), parsing is best-effort and rank_late = TRUE.
- If there is no explicit rank and a third token exists, the function can infer an unranked infraspecific epithet when the third token looks epithet-like (all lowercase) and does not look like the start of an author. In that case implied_infra = TRUE, Orig.Infraspecies is filled, Infra.Rank = NA, and Rank = 3.
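Several of these rules work at the token level, which the following base R sketch tries to make concrete. It is not the wcvpmatch source: sketch_flags() is a hypothetical function, the author_particles list is an assumption about what "looks like the start of an author" might mean, and only the dotted forms cf., aff., sp. and spp. are matched.

```r
# Hypothetical sketch of the token-level rules; not the wcvpmatch source.
ranks <- c("subsp.", "var.", "subvar.", "f.", "subf.")
# Assumed list of lowercase particles that can start an author string.
author_particles <- c("de", "del", "van", "von", "ex")

sketch_flags <- function(name) {
  tokens <- strsplit(trimws(name), "\\s+")[[1]]
  low <- tolower(tokens)
  hybrid <- tokens %in% c("x", "\u00D7")  # standalone hybrid markers only

  # Drop qualifiers and hybrid markers before positional parsing.
  core <- tokens[!(low %in% c("cf.", "aff.")) & !hybrid]

  # Implied infraspecific epithet: no explicit rank, and a third token that
  # is all lowercase letters and not an assumed author particle.
  third <- if (length(core) >= 3) core[3] else NA_character_
  implied_infra <- !is.na(third) &&
    !any(tolower(core) %in% ranks) &&
    third == tolower(third) &&
    grepl("^[[:alpha:]-]+$", third) &&
    !(third %in% author_particles)

  list(
    core = core,
    has_cf = any(low == "cf."),
    has_aff = any(low == "aff."),
    had_hybrid = any(hybrid),
    is_sp = any(low == "sp."),
    is_spp = any(low == "spp."),
    implied_infra = implied_infra
  )
}

sketch_flags("Quercus cf. robur")$core
sketch_flags("Cydonia japonica tricolor")$implied_infra
```

Here "Quercus cf. robur" keeps only the tokens Quercus and robur while has_cf is TRUE, and "Cydonia japonica tricolor" is treated as carrying an implied infraspecific epithet. The late-rank and missing-epithet rules are left out of the sketch because the text does not spell out how author-like tokens are recognised.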
Value
A tibble with one row per input name and standardized columns/flags:
- sorter
Numeric index of original order.
- Input.Name
Original input string as provided by the user.
- Orig.Name
Reconstructed standardized name following the backbone convention, with the author in its original casing.
- Orig.Genus
Genus in Title Case.
- Orig.Species
Specific epithet in lowercase, or NA for genus-only (sp./spp.).
- Author
Recovered author string (original casing/punctuation) or "".
- Orig.Infraspecies
Infraspecific epithet in lowercase (ranked or implied), or NA.
- Infra.Rank
Infraspecific rank in lowercase (subsp., var., subvar., f., subf.), or NA.
- Rank
Numeric level: 1 genus-only, 2 genus+species, 3 includes infraspecific epithet.
- has_cf, has_aff, is_sp, is_spp, had_hybrid, rank_late, rank_missing_infra, had_na_author, implied_infra
Logical flags.
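Because the flags are ordinary logical columns, they can be used to triage names that may need manual review. The sketch below uses a hand-built data frame as a stand-in for the returned tibble. It holds only a few of the documented columns, with values filled in from the rules described here rather than taken from real output. The Rank.Label column and the review criterion are illustrative assumptions.

```r
# Hand-built stand-in for the result (a subset of the documented columns).
# Values follow the documented rules; they are not captured package output.
res <- data.frame(
  Input.Name = c(
    "Opuntia sp.",
    "Rosa canina subsp. coriifolia (Fr.) Leffler",
    "Cydonia japonica tricolor"
  ),
  Rank = c(1, 3, 3),
  is_sp = c(TRUE, FALSE, FALSE),
  implied_infra = c(FALSE, FALSE, TRUE),
  stringsAsFactors = FALSE
)

# Decode the numeric Rank into a readable label (hypothetical extra column).
rank_label <- c("genus-only", "genus+species", "infraspecific")
res$Rank.Label <- rank_label[res$Rank]

# Assumed review criterion: genus-only names and guessed infraspecific epithets.
needs_review <- res[res$is_sp | res$implied_infra, ]
needs_review$Input.Name
```

With these stand-in values, the review subset contains "Opuntia sp." and "Cydonia japonica tricolor", while the explicitly ranked Rosa name passes through.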
Examples
library(wcvpmatch)
classify_spnames(c("Opuntia sp.", "Rosa canina subsp. coriifolia (Fr.) Leffler"))
#> Warning: Undetermined species indicator detected ('sp.'/'spp.'). Classified at genus
#> level only; Orig.Species set to NA for
#> • Opuntia sp.
#> # A tibble: 2 × 18
#> sorter Input.Name Orig.Name Orig.Genus Orig.Species Author Orig.Infraspecies
#> <dbl> <chr> <chr> <chr> <chr> <chr> <chr>
#> 1 1 Opuntia sp. Opuntia Opuntia NA "" NA
#> 2 2 Rosa canina… Rosa can… Rosa canina "(Fr.… coriifolia
#> # ℹ 11 more variables: Infra.Rank <chr>, Rank <dbl>, has_cf <lgl>,
#> # has_aff <lgl>, is_sp <lgl>, is_spp <lgl>, had_hybrid <lgl>,
#> # rank_late <lgl>, rank_missing_infra <lgl>, had_na_author <lgl>,
#> # implied_infra <lgl>
classify_spnames(c("Cydonia japonica tricolor")) # implied unranked infra epithet
#> # A tibble: 1 × 18
#> sorter Input.Name Orig.Name Orig.Genus Orig.Species Author Orig.Infraspecies
#> <dbl> <chr> <chr> <chr> <chr> <chr> <chr>
#> 1 1 Cydonia jap… Cydonia … Cydonia japonica "" tricolor
#> # ℹ 11 more variables: Infra.Rank <chr>, Rank <dbl>, has_cf <lgl>,
#> # has_aff <lgl>, is_sp <lgl>, is_spp <lgl>, had_hybrid <lgl>,
#> # rank_late <lgl>, rank_missing_infra <lgl>, had_na_author <lgl>,
#> # implied_infra <lgl>