Phylogenetic analysis
Near-identical sequences were clustered into Operational
Taxonomic Units (OTUs) using the CLUSTER command of VSEARCH (Rognes et
al., 2016) with an identity cut-off of 97%. Then, OTUs with only a single
read (singletons) were removed. Taxonomy was assigned to the OTU list using
the R-Syst::diatom v7.1 reference database (Rimet et al., 2016) and the
BLASTn algorithm (Altschul et al., 1997) with a minimum identity value
of 85%. Next, DNA sequences that did not belong to the diatom phylum
(Bacillariophyta) were filtered out. All remaining OTU sequences were
scanned with EMBOSS Getorf for Open Reading Frames (ORFs) longer than
251 bp (Rice, Longden, & Bleasby, 2000). Then, the read counts of the resulting
OTUs were normalized among samples to the most abundant value (82,446
reads). Finally, OTUs with a normalized read value below 0.005%
were filtered out, following Bokulich et al. (2013).
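
The normalization and abundance filter described above can be sketched in Python as follows. This is a minimal illustration, not the code used in the study, and it rests on two assumptions the text does not confirm: that "normalized to the most abundant value" means rescaling each sample to a common depth of 82,446 reads, and that the 0.005% threshold is applied to each OTU's share of the total normalized reads across all samples. The function names and the nested-dictionary layout (sample -> OTU -> reads) are hypothetical.

```python
from typing import Dict

# Nested mapping: sample name -> OTU identifier -> read count
Counts = Dict[str, Dict[str, float]]

TARGET_DEPTH = 82_446        # depth of the most abundant sample (from the text)
MIN_FRACTION = 0.005 / 100   # 0.005% expressed as a fraction


def scale_to_depth(counts: Counts, target: float = TARGET_DEPTH) -> Counts:
    """Rescale every sample so that its reads sum to `target` (assumed scheme)."""
    scaled: Counts = {}
    for sample, otus in counts.items():
        depth = sum(otus.values())
        factor = target / depth if depth else 0.0
        scaled[sample] = {otu: n * factor for otu, n in otus.items()}
    return scaled


def filter_rare_otus(scaled: Counts, min_fraction: float = MIN_FRACTION) -> Counts:
    """Drop OTUs whose share of all normalized reads is below `min_fraction`."""
    totals: Dict[str, float] = {}
    for otus in scaled.values():
        for otu, n in otus.items():
            totals[otu] = totals.get(otu, 0.0) + n
    grand_total = sum(totals.values())
    keep = {
        otu for otu, n in totals.items()
        if grand_total and n / grand_total >= min_fraction
    }
    return {
        sample: {otu: n for otu, n in otus.items() if otu in keep}
        for sample, otus in scaled.items()
    }
```

Calling `filter_rare_otus(scale_to_depth(counts))` on a raw count table applies both steps in the order given in the text.
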
To obtain an overview of the taxonomically assigned OTUs, we
computed two phylogenetic trees. To save computational time, we
built a reference phylogenetic tree using the 708 reference sequences
that matched our OTU inventory. In addition, we computed another
phylogenetic tree using the same reference sequences and the 3138
taxonomy-assigned OTU sequences. We used the MAFFT v7 program (Katoh &
Standley, 2013) with the default settings to align all DNA sequences.
The Maximum Likelihood (ML) phylogenies were constructed with
RAxML (Randomized Axelerated Maximum Likelihood), as implemented on the
CIPRES Portal (Miller, Pfeiffer, & Schwartz, 2010), using GTRCATI
(Generalized Time Reversible model + optimization of substitution rates
+ optimization of site-specific evolutionary rates) as the model of
evolution and 1000 replicates for the bootstrap analysis. Phylogenetic
trees were visualized with FigTree v1.4.3 (Rambaut, 2016).
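
For readers who wish to reproduce the alignment and tree-building steps on a local machine rather than through the CIPRES web interface, the sketch below assembles roughly equivalent command lines in Python. It is an approximation under stated assumptions: the executable name `raxmlHPC`, the rapid-bootstrap mode (`-f a`), the random seeds, and all file names are illustrative choices that the text does not specify. The helper functions only build the argument lists and do not run the programs.

```python
import shlex
from typing import List


def mafft_command(fasta_in: str) -> List[str]:
    """MAFFT with default settings; the alignment is written to standard output."""
    return ["mafft", fasta_in]


def raxml_command(
    alignment: str,
    run_name: str,
    bootstraps: int = 1000,    # number of bootstrap replicates (from the text)
    seed: int = 12345,         # assumed seed; not given in the text
    binary: str = "raxmlHPC",  # assumed executable name
) -> List[str]:
    """RAxML v8-style call: ML search plus bootstrapping under GTRCATI."""
    return [
        binary,
        "-f", "a",              # assumed: rapid bootstrap + best-tree search
        "-m", "GTRCATI",        # substitution model named in the text
        "-p", str(seed),        # seed for the parsimony starting trees
        "-x", str(seed),        # seed for rapid bootstrapping
        "-N", str(bootstraps),  # number of replicates
        "-s", alignment,        # input alignment
        "-n", run_name,         # suffix for the output files
    ]


if __name__ == "__main__":
    # Hypothetical file names, for illustration only
    print(shlex.join(mafft_command("references_and_otus.fasta")))
    print(shlex.join(raxml_command("aligned.fasta", "diatom_otus")))
```

The printed commands can be inspected before being passed to `subprocess.run`, with the MAFFT output redirected to the alignment file that RAxML then reads.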