2 Materials and methods
2.1 Sample collection and virus isolation
A total of 314 rectal swabs were collected from eight intensive pig farm, located in Guangdong Province, China between 2017 and 2019. PDCoV isolation was performed using LLC porcine kidney (LLC-PK) cell line. The optimal cell culture conditions to isolate and propagate PDCoV in LLC-PK required Dulbecco’s modified eagle medium (DMEM, Gibco, USA), supplemented with 10% (v/v) fetal bovine serum (FBS, Gibco), and 10 μg/mL of trypsin (Chen et al., 2016; Hu et al., 2015; Jung, Miyazaki, Hu, & Saif, 2018). The result of isolation was confirmed by an indirect immunofluorescence assay (ELISA) and reverse transcription-polymerase chain reaction (RT-PCR) (Supplementary Tables S1).
2.2 Virus extraction and complete gene sequencing
Viral RNA was extracted from 200 μL of sample using TRIzol reagent (TaKaRa Biotechnology, Dalian, China) according to the manufacturer’s instructions, and nucleic acids were finally eluted in 30 μL of the RNase free H2O. Viral genomic RNA was immediately used for PCR or stored at −80 °C until use. In total, 15 pairs were designed for PCR reactions to sequence the PDCoV genomes (Supplementary Tables S2). Positive plasmids of PDCoV were extracted using a Plasmid Mini Kit I (200) (Omega Bio-tek, Norcross, GA, USA). Then the positive plasmids of PDCoV were sequenced by Sangon Biotech (Shanghai, China) Co., Ltd.
2.3 Sequence analysis and model test
We accessed all available PDCoV genomes sequences up to June 2021 from NCBI GenBank (https://www.ncbi.nlm.nih.gov/). A total of 130 PDCoV genomes (Supplementary Tables S3)—two of which were sequenced by our laboratory—and 130 spike genes sequences, were retrieved by PhyloSuite (version 1.2.2) to attain no identical sequences (Chen et al., 2016; Hu et al., 2015; Jung, Miyazaki, Hu, & Saif, 2018). These PDCoV genomes sequences were input to MAFFT (multiple sequence alignment based on fast Fourier transform) to align (Katoh, Kuma, Toh, & Miyata, 2005; Katoh, Misawa, Kuma, & Miyata, 2002). Then aligned sequences were input to Gblocks to determine the conserved domains and edited manually (Talavera & Castresana, 2007). The best fit nucleotide substitution models were tested by ModelFinder (version 1.6.8) (Kalyaanamoorthy, Minh, Wong, von Haeseler, & Jermiin, 2017).
2.4 Maximum-likelihood and Bayesian Inference analysis of the PDCoV
The best fit general time reversible substitution model with an empirical base frequencies and relaxing gamma distributed rate heterogeneity (GTR+F+R5) with 1000 bootstraps for Maximum-likelihood (ML) trees of PDCoV genomes was constructed by IQ-RTEE (version 1.6.8) according to corrected Akaike information criterion (AICc) (Minh et al., 2020).
The best fit general time reversible substitution model with a proportion of invariant sites, empirical base frequencies and gamma distributed rate heterogeneity (GTR+F+I+G4) with 2000000 generations for Bayesian Inference (BI) trees of PDCoV genomes was input to MrBayes (version 3.2.6) according to corrected Akaike Information Criterion (AICc) (Huelsenbeck & Ronquist, 2001; Ronquist et al., 2012; Ronquist & Huelsenbeck, 2003).
The final results were illustrated in the Interactive Tree of Life (iTOL, https://itol.embl.de/) (Letunic & Bork, 2007; 2021).
2.5 Phylogenetic analysis of spike gene
Bayesian evolutionary analysis by sampling trees (BEAST) package was used to estimate the time to the most recent common ancestor (tMRCA) and evolutionary rates. A total of 130 spike genes sequences were selected from 130 PDCoV genomes and input to IQ-TREE to reconstruct the ML tree. Then we used TempEst (version 1.5.3) to ensure that these sequences had sufficient temporal symbol (Rambaut, Lam, Max Carvalho, & Pybus, 2016). According to the ModelFinder (version 1.6.8) result for BEAST (version 1.10.4), the Bayesian Markov chain Monte Carlo (MCMC) method implemented in the BEAST (BEAST, version 1.10.4) package and BEAGLE was used (Ayres et al., 2012; Suchard & Rambaut, 2009), with the GTR+G nucleotide substitution model, a strict molecular clock model and the Coalescent Bayesian Skyline tree prior (Drummond, Rambaut, Shapiro, & Pybus, 2005; Ferreira & Suchard, 2008). The chain length was 500000000 with a sampling frequency every 50000 steps. Tracer software (version 1.7) was used to test that all parameter estimates yielded effective sampling size >200 and burning up the period of 10% of the total chain length (Rambaut, Drummond, Xie, Baele, & Suchard, 2018). The final Bayesian maximum clade credibility (MCC) tree was generated by TreeAnnotator (version 2.6.3) (http://beast.community/treeannotator) and illustrated in Figtree (version 1.4.3) (http://tree.bio.ed.ac.uk/software/figtree/). Then, to estimate the population dynamics of PDCoV, Bayesian Skyline Plot was used to reconstruct the population history. The results were shown by analysis of Bayesian Skyline Reconstruction in Tracer (version 1.7) (Barido-Sottani et al., 2018).
2.6 Phylogeographic inference
To estimate phylogenies and divergence times and explore phylogeography, the same preprocessing as 2.5 on a total of 130 spike genes sequences were input to BEAST (version 1.10.4) package, with the GTR+G nucleotide substitution model, an uncorrelated relaxed molecular clock model and the BSSVS (Bayesian Stochastic Search Variable Selection) traits model (Barido-Sottani et al., 2018)(Li & Drummond, 2012; Su et al., 2015), according to the results from path sampling and stepping-stone sampling (Supplementary Tables S4)(Lartillot & Philippe, 2006). The chain length was 500000000 with a sampling frequency every 50000 steps. Tracer software (version 1.7) was used to test that all parameter estimates yielded effective sampling size >200 and burning the period of 10% of the total chain length. The trees were generated by TreeAnnotator (version 2.6.3) (http://beast.community/treeannotator) and illustrated in Figtree (version 1.4.3) (http://tree.bio.ed.ac.uk/software/figtree/). The phylogeography results were analyzed by Spatial Phylogenetic Reconstruction of Evolutionary Dynamics (SPREAD3, version 0.9.7) (Bielejec, Rambaut, Suchard, & Lemey, 2011).
2.7 Analysis of spike protein
According to the features of PDCoV from GenBank, Illustrator for Biological Sequences (IBS, version 1.0.3) was used to visualize protein‐encoding segments in PDCoV (Liu et al., 2015). Amino acid sequences of spike protein were input to SWISS-MODEL to find proper models and illustrated by Visual Molecular Dynamics (VMD, version 1.9.4) (http://www.ks.uiuc.edu/) (Waterhouse et al., 2018). BioAider (version 1.334) was used to analyze mutation of these protein sequences (Zhou, Qiu, Pu, Huang, & Ge, 2020).