Taxa: An R package implementing data standards and methods for taxonomic data
  • Zachary S.L. Foster,
  • Scott Chamberlain,
  • Niklaus J Grünwald
Zachary S.L. Foster
Department of Botany and Plant Pathology, Oregon State University, Corvallis, OR 97331, USA

Corresponding Author: [email protected]

Scott Chamberlain
rOpenSci, University of California, Berkeley, CA 94720, USA
Niklaus J Grünwald
Horticultural Crops Research Laboratory, USDA Agricultural Research Service, Corvallis, OR 97330, USA

Abstract

The taxa R package provides a set of tools for defining and manipulating taxonomic data. The recent and widespread application of DNA sequencing to community composition studies is making large data sets with taxonomic information commonplace. However, compared to typical tabular data, taxonomic information is encoded in many different ways, and the hierarchical nature of taxonomic classifications makes it difficult to work with. Many R packages use taxonomic data to varying degrees, but there is currently no cross-package standard for how this information is encoded and manipulated. We developed the R package taxa to provide a robust and flexible solution for storing and manipulating taxonomic data in R, along with any application-specific information associated with it. Taxa provides parsers that can read common sources of taxonomic information (taxon IDs, sequence IDs, taxon names, and classifications) from nearly any format while preserving associated data. Once parsed, the taxonomic data and any associated data can be manipulated using a cohesive set of functions modeled after the popular R package dplyr. These functions take into account the hierarchical nature of taxa and can modify the taxonomy or the associated data such that both are kept in sync. Taxa is currently used by the metacoder and taxize packages, which provide broadly useful functionality that we hope will speed adoption by users and developers.