2.4 Bioinformatics
Quality control of the reads was performed on the Torrent Server of the sequencer, using the default settings. Reads were aligned with the bwa-mem v. 0.7.17-r1188 aligner (Li & Durbin, 2009). The complete genome sequence of the “Wuhan-Hu-1” SARS-CoV-2 isolate, GenBank Acc. No. NC_045512 (F. Wu et al., 2020), was used as the reference. The aligned reads were subjected to both reference-guided assembly and variant calling. The percentage of the reference sequence covered and the mean sequencing depth were calculated with SAMtools (Li et al., 2009). To identify further differences that might indicate the direction of the transmission chain, quasispecies analysis was performed. LoFreq v.2.1.4 (Wilm et al., 2012) was used to identify low-frequency variants (quasispecies) in the sequenced samples, since it can detect variants supported by only a few aligned reads while evaluating them against quality metrics (cut-off p = 0.01). Because the SARS-CoV-2 genome contains long homopolymer stretches, where insertion and deletion calls are less reliable, quasispecies analysis focused on single nucleotide polymorphisms (SNPs). SnpEff v.4.5covid19 (Cingolani et al., 2012) was used to annotate the variants and assign each one to a specific viral protein. SnpEff relies on a General Feature Format (GFF) file that describes the viral protein structure and the corresponding intervals on the reference sequence. The positions of non-synonymous (missense) variants were plotted onto the viral genome. Multiple sequence alignment was computed with Clustal Omega (Sievers & Higgins, 2014), using the default parameters.
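As an illustration only, the two summary statistics and the restriction to SNPs described above can be sketched in Python. This is not the pipeline used in this study. It assumes per-position depths in the three-column, tab-separated layout produced by `samtools depth` (reference name, position, depth) and variant records in standard VCF column order. The function names `coverage_stats` and `is_snp`, and the `min_depth` threshold, are hypothetical and introduced here for the example.

```python
from typing import Iterable, Tuple


def coverage_stats(
    depth_lines: Iterable[str], ref_length: int, min_depth: int = 1
) -> Tuple[float, float]:
    """Return (% of reference covered, mean depth over the whole reference).

    depth_lines: tab-separated lines of the form "ref<TAB>pos<TAB>depth".
    ref_length:  length of the reference in nucleotides. Using it as the
                 denominator means positions missing from the input are
                 counted as depth 0.
    min_depth:   hypothetical threshold; a position counts as covered when
                 its depth is at least this value.
    """
    covered = 0
    total_depth = 0
    for line in depth_lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        _ref, _pos, depth = line.split("\t")[:3]
        d = int(depth)
        total_depth += d
        if d >= min_depth:
            covered += 1
    pct_covered = 100.0 * covered / ref_length
    mean_depth = total_depth / ref_length
    return pct_covered, mean_depth


def is_snp(vcf_record: str) -> bool:
    """True when a VCF data line describes only single-nucleotide substitutions.

    Columns 4 and 5 of a VCF record hold REF and ALT; ALT may list several
    comma-separated alleles. Insertions and deletions have a REF or ALT
    allele longer than one base and are rejected.
    """
    fields = vcf_record.rstrip("\n").split("\t")
    ref, alt = fields[3], fields[4]
    bases = {"A", "C", "G", "T"}
    return ref in bases and all(allele in bases for allele in alt.split(","))
```

In practice the depth lines would be read from a file and `ref_length` set to the length of the reference in use, while `is_snp` would be applied to each non-header line of the variant file before annotation.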
The phylogenetic tree encompassing European and European-related international isolates was constructed with the NextStrain platform (Hadfield et al., 2018). The tree was calculated by the maximum-likelihood method, with branch lengths taking temporal data into account (Price, Dehal, & Arkin, 2010). Branch colors represent the GISAID clade annotation, while branch labels on the tree are the NextStrain clades (Bogner, Capua, Cox, & Lipman, 2006; Hadfield et al., 2018). Lineages were assigned with the Phylogenetic Assignment of Named Global Outbreak LINeages (PANGOLIN) SARS-CoV-2 lineage assigner interface (Rambaut et al., 2020).