3.3. Whole-genome and ORF1a/b sequence analyses indicate that the Peruvian PDCoV strain originated from a US PDCoV strain
The nucleotide sequence of the Peruvian PDCoV strain, designated PDCoV/Peru/isolate/2019, was submitted to GenBank under accession number MT227371. Its genome organization matches that of other PDCoV genome sequences deposited in GenBank. Excluding the poly(A) tail, the genome is 25,501 nt long and comprises the 5’-UTR (nt 1-480), ORF1a/b (nt 481-11368 and 11368-19283), S (nt 19265-22747), E (nt 22741-22992), M (nt 22985-23638), NS6 (nt 23638-23922), N (nt 23943-24971), NS7 (nt 24037-24639) and the 3’-UTR (nt 24972-25501). A graphical representation of the characterized PDCoV strain is shown in Figure 2.
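As a reading aid, the coordinates listed above can be tabulated to obtain the length of each feature and to see which neighbouring features share nucleotides. The following is a minimal sketch, not part of the study's pipeline. It assumes the coordinates are 1-based and inclusive, and the labels for the two ORF1a/b ranges are ours, since the text gives the two ranges without naming them separately.

```python
from itertools import combinations

# Coordinates copied from the text; assumed to be 1-based and inclusive.
# The two ORF1a/b labels are illustrative names for the two ranges given.
FEATURES = [
    ("5'-UTR", 1, 480),
    ("ORF1a/b (first range)", 481, 11368),
    ("ORF1a/b (second range)", 11368, 19283),
    ("S", 19265, 22747),
    ("E", 22741, 22992),
    ("M", 22985, 23638),
    ("NS6", 23638, 23922),
    ("N", 23943, 24971),
    ("NS7", 24037, 24639),
    ("3'-UTR", 24972, 25501),
]


def feature_length(start: int, end: int) -> int:
    """Length in nucleotides of a 1-based, inclusive interval."""
    return end - start + 1


def overlapping_pairs(features):
    """Return the name pairs of features that share at least one position."""
    pairs = []
    for (name_a, s_a, e_a), (name_b, s_b, e_b) in combinations(features, 2):
        if s_a <= e_b and s_b <= e_a:
            pairs.append((name_a, name_b))
    return pairs


if __name__ == "__main__":
    for name, start, end in FEATURES:
        print(f"{name}: {start}-{end} ({feature_length(start, end)} nt)")
    for name_a, name_b in overlapping_pairs(FEATURES):
        print(f"overlap: {name_a} / {name_b}")
```

With these coordinates, NS7 lies entirely within the N gene, and several adjacent features overlap by a few nucleotides.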
Phylogenetic analysis has typically been performed on a few major genes of the organism of interest, which restricts the inference to a single gene or a small group of genes. Whole-genome sequencing, by contrast, offers a more complete genetic characterization than such partial approaches. In this study, we used next-generation sequencing of our PDCoV strain to trace its evolutionary origin. The Peruvian strain belongs to the North American phylogroup and is closely related to a PDCoV strain isolated in the US in 2015 (99.5% nucleotide identity). Across the PDCoV strains analysed, nucleotide identity with the Peruvian strain ranged from 97.1% to 99.5%. Identity was 99.45-99.51% with the US strains, 98.6-98.74% with the Chinese strains, and 97% and 97.5% with the Thai and Vietnamese strains, respectively. Nucleotide identities are summarized in Table 2. A phylogenetic tree based on ORF1a/b had the same topology as the whole-genome tree. Altogether, these results indicate that the virus detected in Peru emerged from a North American ancestor (see Figure 3A and 3B). Similarly, the tree based on PDCoV protein sequences resembled the topology of the nucleotide-based trees.
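To make the identity percentages concrete, one common way to compute pairwise nucleotide identity from two rows of an alignment is sketched below. This is an illustration under stated assumptions: the text does not specify the software or the exact definition used, so the choice to skip gap-containing columns, the function name `percent_identity`, and the toy sequences are all hypothetical.

```python
def percent_identity(seq_a: str, seq_b: str, gap: str = "-") -> float:
    """Percent identity between two rows of the same alignment.

    Assumption: columns where either sequence has a gap are excluded
    from both the numerator and the denominator.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("aligned sequences must have equal length")

    compared = 0
    matches = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a == gap or b == gap:
            continue  # skip gap-containing columns
        compared += 1
        if a == b:
            matches += 1

    if compared == 0:
        raise ValueError("no gap-free columns to compare")
    return 100.0 * matches / compared


if __name__ == "__main__":
    # Toy aligned sequences, not real PDCoV data.
    reference = "ACGTACGTAC"
    query = "ACGTACGTAT"
    print(f"{percent_identity(reference, query):.2f}% identity")
```

Other definitions, for example counting gaps as mismatches, give slightly different percentages, so values are only comparable when computed the same way.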