2.4 | Taxonomic and functional annotation
The genes of the RGC were translated into amino acid sequences using
NCBI genetic code 11 (Kozak, 1983). Taxonomic annotation of the amino acid
sequences was performed with Kaiju v1.8.0 (Menzel, Ng, & Krogh, 2016)
against the NCBI NR database (released on Feb 24th, 2021) with the
parameters ‘-a greedy -e 5 -E 0.01 -v -z 5’, providing a detailed
overview of the taxonomic composition of the gut microbiome of SNMs.
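
As an illustration of the translation step at the start of this section, the following minimal Python sketch translates a coding sequence with NCBI translation table 11. It is not the tool used in this study: the function name `translate_table11` and the `complete_start` flag are hypothetical, and the sketch assumes a forward-strand sequence in frame 0. Table 11 uses the standard codon assignments but permits alternative initiation codons, which are read as methionine at the first position of a complete gene.

```python
from itertools import product

# NCBI translation table 11 (bacterial, archaeal and plant plastid code).
# Codons are enumerated with bases in the order T, C, A, G at each position,
# first base varying slowest, which matches the NCBI table layout.
BASES = "TCAG"
AMINO_ACIDS = (
    "FFLLSSSSYY**CC*W"
    "LLLLPPPPHHQQRRRR"
    "IIIMTTTTNNKKSSRR"
    "VVVVAAAADDEEGGGG"
)
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}
# Initiation codons permitted by table 11.
START_CODONS = {"TTG", "CTG", "ATT", "ATC", "ATA", "ATG", "GTG"}


def translate_table11(cds: str, complete_start: bool = True) -> str:
    """Translate a forward-strand, frame-0 coding sequence (hypothetical helper).

    Translation stops at the first stop codon. Codons containing ambiguous
    bases are reported as 'X'. If complete_start is True and the first codon
    is a table 11 initiation codon, the first residue is set to 'M'.
    """
    seq = cds.upper().replace("U", "T")
    protein = []
    for i in range(0, len(seq) - len(seq) % 3, 3):
        aa = CODON_TABLE.get(seq[i:i + 3], "X")
        if aa == "*":
            break
        protein.append(aa)
    if complete_start and protein and seq[:3] in START_CODONS:
        protein[0] = "M"
    return "".join(protein)


print(translate_table11("GTGAAATAA"))  # GTG is read as an initiator here
```

For genes truncated at the 5' end, as is common in metagenomic assemblies, `complete_start=False` keeps the literal translation of the first codon.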
Functional annotation was performed against the Kyoto Encyclopedia of
Genes and Genomes database (KEGG, Release 100.0; genes from animals and
plants were excluded) and the evolutionary genealogy of genes:
Non-supervised Orthologous Groups database (eggNOG, v5.0) using
DIAMOND (v0.9.24; Buchfink, Xie, & Huson, 2015) with the parameters
‘--evalue 1e-5 -k 1’. KEGG and eggNOG annotations were performed using an
in-house pipeline, where each protein was assigned to a KEGG orthologous
group (KO) or eggNOG orthologous group (OG) by its highest-scoring
annotated hit, provided that the hit contained at least one alignment
scoring over 60 bits (Qin et al., 2012). Carbohydrate-active enzyme
(CAZyme) annotation was carried out using the Carbohydrate-Active
enZYmes (CAZy) database. We mapped protein
sequences to entries in the hidden Markov model (HMM) libraries of
CAZyme families downloaded from the CAZy database (CAZyDB.07312019) with
the hmmscan program in HMMER (v3.1b2) (Pattabiraman & Warnow, 2021).
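
The KO/OG assignment rule described above can be sketched in Python as follows. This is an illustrative sketch only, not the in-house pipeline itself. It assumes that the alignments are in the 12-column BLAST tabular format (query ID in column 1, subject ID in column 2, bit score in column 12) and that a mapping from database gene IDs to orthologous groups is available. The function name `assign_orthologs`, the mapping `gene_to_group`, and the example identifiers are hypothetical.

```python
import csv
import io

MIN_BITSCORE = 60.0  # hits must score strictly above this value


def assign_orthologs(tabular_hits, gene_to_group, min_bitscore=MIN_BITSCORE):
    """Assign each query protein to the group of its best annotated hit.

    tabular_hits: iterable of tab-separated lines in 12-column BLAST
                  tabular format (assumed input format).
    gene_to_group: dict mapping database gene IDs to a KO or OG identifier
                   (hypothetical lookup table).
    Returns a dict {query_id: group_id}.
    """
    best = {}  # query_id -> (group_id, bitscore)
    for row in csv.reader(tabular_hits, delimiter="\t"):
        if not row or row[0].startswith("#"):
            continue
        query, subject, bitscore = row[0], row[1], float(row[11])
        group = gene_to_group.get(subject)
        # Skip hits without an annotation or at/below the bit-score cutoff.
        if group is None or bitscore <= min_bitscore:
            continue
        if query not in best or bitscore > best[query][1]:
            best[query] = (group, bitscore)
    return {query: group for query, (group, _) in best.items()}


# Toy input with made-up identifiers and scores.
example = io.StringIO(
    "p1\tgeneA\t90.0\t100\t5\t0\t1\t100\t1\t100\t1e-30\t120.5\n"
    "p2\tgeneB\t40.0\t50\t30\t0\t1\t50\t1\t50\t1e-6\t45.0\n"
)
print(assign_orthologs(example, {"geneA": "K00001", "geneB": "K00002"}))
```

In this toy input only `p1` receives an assignment, because the hit for `p2` scores below 60 bits. With ‘-k 1’ each query has at most one reported hit, so the best-hit comparison matters only if more target sequences are retained.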