2.4 Bioinformatics
Quality control of the reads was performed within the Torrent Server of
the sequencer, using the default settings. Reads were aligned using the
bwa-mem v. 0.7.17-r1188 aligner (Li & Durbin, 2009). The complete
genome sequence of the “Wuhan-Hu-1” SARS-CoV-2 isolate, GenBank Acc.
No. NC_045512 (F. Wu et al., 2020) was used as a reference. The aligned
reads were subjected to both reference-guided assembly and variant
calling. The percentage of the reference sequence covered and the mean
sequencing depth were calculated with SAMtools (Li et al., 2009). To
further identify differences that might indicate the direction of the
transmission chain, quasispecies analysis was performed. LoFreq v.2.1.4
(Wilm et al., 2012) was used to
identify low-frequency variants (quasispecies) present in the sequenced
samples, since it can detect variants supported by only a few aligned
reads while evaluating them against quality metrics (cut-off p = 0.01).
Because the SARS-CoV-2 genome contains long
stretches of homopolymers, quasispecies analysis was focused on single
nucleotide polymorphisms (SNPs). SnpEff v.4.5covid19 (Cingolani et al.,
2012) was used to annotate the variants, so that each could be assigned
to a specific viral protein. SnpEff utilizes a genomic feature file (GFF)
that contains all the information on the viral protein structure and
intervals on the reference sequence. The positions of non-synonymous
(missense) variants were plotted onto the viral genome. The multiple
sequence alignment was computed with Clustal Omega (Sievers &
Higgins, 2014), using the default parameters.
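
As an illustration of the coverage and depth calculation described above, the following minimal Python sketch derives both values from per-position depths. It is not the authors' actual command or script, which the text does not specify. It assumes tab-separated input of the form contig, position, depth, as produced for example by `samtools depth -a`. The function name `coverage_stats` and the `min_depth` threshold are hypothetical and were chosen only for this example.

```python
from typing import Iterable, Tuple


def coverage_stats(
    depth_lines: Iterable[str], ref_length: int, min_depth: int = 1
) -> Tuple[float, float]:
    """Return (percent of reference covered, mean depth).

    Each line is expected to be tab-separated: contig, 1-based position, depth.
    Positions absent from the input contribute a depth of 0, because both
    values are divided by the full reference length.
    """
    covered = 0
    total_depth = 0
    for line in depth_lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        _contig, _pos, depth = line.split("\t")
        d = int(depth)
        total_depth += d
        # min_depth is an assumed threshold; the source does not state one.
        if d >= min_depth:
            covered += 1
    return 100.0 * covered / ref_length, total_depth / ref_length


# Toy example: a hypothetical 5-base reference with one uncovered position.
example = ["ref\t1\t10", "ref\t2\t20", "ref\t3\t0", "ref\t4\t30", "ref\t5\t40"]
pct, mean = coverage_stats(example, ref_length=5)
print(f"{pct:.1f}% covered, mean depth {mean:.1f}")
# prints "80.0% covered, mean depth 20.0"
```

For a real sample, `ref_length` would be the length of the reference sequence used for alignment, and the lines would be read from the depth output file rather than from an in-memory list.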
The phylogenetic tree encompassing European and European-related
international isolates was constructed with the NextStrain platform
(Hadfield et al., 2018). Tree calculations were based on the
maximum-likelihood method, with branch lengths taking temporal data
into account (Price, Dehal, & Arkin, 2010). Branch colors represent
GISAID clade annotation, while branch names on the tree are the
NextStrain clades (Bogner, Capua, Cox, & Lipman, 2006; Hadfield et al.,
2018). Lineages were assigned with the Phylogenetic Assignment
of Named Global Outbreak LINeages (PANGOLIN) SARS-CoV-2 lineage assigner
interface (Rambaut et al., 2020).