3.5 Gene family cluster identification and phylogenetic analysis
We compared the genomes of Chinese walnut and eight other plants based on 523 single-copy orthologs (Figure 4a). The number of single-copy orthologs in the Chinese walnut genome was similar to that in Arabidopsis, and the percentage of the Chinese walnut genome occupied by single-copy orthologs was higher than in all other species in the comparison except Q. robur and C. sinensis (Figure 4a). We identified 125,530 orthologous gene families comprising 310,273 genes, with 9,906 orthogroups containing proteins from all species (Figure 4b, Table S10). We further compared Chinese walnut with three cultivated woody species: Persian walnut (J. regia), apple (M. domestica) and olive (O. europaea). Of all gene families, 17.4 % (10,321/59,377) were present in all four species, while 5 % (2,972) were specific to walnuts (Juglans) (Figure 4a; Figure S5). When the two Juglans species were compared, 457 genes were specific to Chinese walnut, and 1,704 gene families were shared by both walnut genomes. We found that 399 gene families were expanded in Chinese walnut compared to all other species in the phylogenetic tree, and 1,528 were contracted (Figure 4b). By comparison, Chinese walnut’s close relative, Juglans regia, showed 2,025 expanded gene families (about 5-fold more than J. cathayensis) and 243 contracted gene families (roughly one-sixth of the number in the Chinese walnut genome) (Figure 4b).
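The proportions and fold differences quoted in this paragraph follow directly from the reported counts. The short Python sketch below recomputes them as a consistency check; the counts are taken from the text, while the variable and function names are illustrative and not part of the original analysis.

```python
# Counts as reported in the text; names are illustrative only.
total_families = 59_377      # gene families in the four-species comparison
shared_all_four = 10_321     # families present in all four species
juglans_specific = 2_972     # families specific to Juglans

expanded = {"J. cathayensis": 399, "J. regia": 2_025}
contracted = {"J. cathayensis": 1_528, "J. regia": 243}


def share(part, whole):
    """Return part/whole as a percentage rounded to one decimal place."""
    return round(100 * part / whole, 1)


shared_pct = share(shared_all_four, total_families)
juglans_pct = share(juglans_specific, total_families)

# Ratio of J. regia to J. cathayensis for expanded and contracted families.
expansion_fold = expanded["J. regia"] / expanded["J. cathayensis"]
contraction_ratio = contracted["J. regia"] / contracted["J. cathayensis"]

print(f"shared by all four species: {shared_pct} %")
print(f"Juglans-specific: {juglans_pct} %")
print(f"expanded, J. regia / J. cathayensis: {expansion_fold:.2f}")
print(f"contracted, J. regia / J. cathayensis: {contraction_ratio:.2f}")
```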
We constructed a phylogenetic tree of these nine plant species with the monocot rice (Oryza sativa) as the outgroup. The phylogeny was based on 552 single-copy orthologous genes (Figure 4b). As expected, the two closely related walnut species clustered on a single branch with 100 % bootstrap support (Figure 4b). The divergence between Chinese walnut and Persian walnut was estimated to have occurred ~28 Mya (Figure 4b).
We investigated whether any whole-genome duplication (WGD) events occurred during Chinese walnut evolution. We identified a total of 86 syntenic blocks containing 5,614 genes, which together covered 20.1 % of the Chinese walnut genome (Figure 3; Table S11). We calculated the density distribution of Ks values for the paired genes within each syntenic block, based on the collinear blocks between the genomes of Chinese walnut and Persian walnut (Figure 5). The Ks distribution peaked at ~0.25 and ~1.5 for orthologous gene pairs between the two walnuts, indicating that the ancestors of these two walnuts underwent two ancient WGD events (Figure 5). Because the Ks peak was ~0 for orthologous gene pairs between Chinese walnut and Persian walnut, their genomes reflect recent species divergence (Figure 5).
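To illustrate how peaks in a Ks density distribution can be located, the following Python sketch applies a simple Gaussian kernel density estimate to a list of Ks values and reports the local maxima. It is a sketch under stated assumptions, not the procedure used in this study: the text does not specify the density estimator, so the kernel, the bandwidth and the peak-height cutoff are assumptions. The Ks values are synthetic placeholders centred on the peak positions quoted above, not the study's data.

```python
import math
from statistics import NormalDist


def gaussian_kde(values, grid, bandwidth):
    """Evaluate a Gaussian kernel density estimate of `values` on `grid`."""
    norm = 1.0 / (len(values) * bandwidth * math.sqrt(2.0 * math.pi))
    return [
        norm * sum(math.exp(-0.5 * ((x - v) / bandwidth) ** 2) for v in values)
        for x in grid
    ]


def find_peaks(grid, density, min_rel_height=0.1):
    """Return grid positions that are local maxima of `density`.

    Maxima lower than `min_rel_height` times the global maximum are
    ignored so that isolated tail values are not reported as peaks.
    """
    floor = min_rel_height * max(density)
    return [
        grid[i]
        for i in range(1, len(grid) - 1)
        if density[i] > density[i - 1]
        and density[i] > density[i + 1]
        and density[i] >= floor
    ]


def synthetic_component(mean, sd, n):
    """Deterministic placeholder values: evenly spaced normal quantiles."""
    dist = NormalDist(mean, sd)
    return [dist.inv_cdf((i + 0.5) / n) for i in range(n)]


# Synthetic Ks values (assumption): two components near Ks 0.25 and 1.5.
ks_values = synthetic_component(0.25, 0.08, 200) + synthetic_component(1.5, 0.3, 200)

grid = [i * 0.01 for i in range(301)]  # Ks from 0 to 3 in steps of 0.01
density = gaussian_kde(ks_values, grid, bandwidth=0.05)  # bandwidth is an assumption
peaks = find_peaks(grid, density)

print([round(p, 2) for p in peaks])
```

With real data, `ks_values` would hold the Ks estimates for the gene pairs in the syntenic blocks. The bandwidth would need tuning, because too small a value produces spurious peaks and too large a value merges distinct ones.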