Distribution of UCEs and CADD scores
The 4976 UCEs are not evenly distributed across the 34 chromosomes of the chicken reference genome (Fig. 2A): 15 chromosomes were significantly depleted for UCEs, whilst 9 were significantly enriched (Supplementary Table 1). Figure 2B shows the distribution of all chCADD scores along a single UCE (UCE-2729) and its 2000 bp flanking region on chromosome 1. The chCADD scores in the flanking region are lower than those within the UCE, except in a potential coding region (positions 116230300–116230450 in Figure 2B). Protein-coding genes are typified by a combination of high chCADD scores (substitutions at the first and second codon positions) and low chCADD scores (substitutions at the third codon position).
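The test behind the depletion and enrichment calls in Supplementary Table 1 is not stated in this section. As an illustration only, the sketch below assumes a two-sided exact binomial test of each chromosome's UCE count against the count expected if UCEs were placed in proportion to chromosome length, with a Bonferroni correction across chromosomes. The null model, the correction, the function names (`binom_two_sided_p`, `uce_enrichment`) and the toy input are all hypothetical.

```python
import math


def binom_log_pmf(k: int, n: int, p: float) -> float:
    """Log of P(X = k) for X ~ Binomial(n, p), computed via lgamma to avoid overflow."""
    return (math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
            + k * math.log(p) + (n - k) * math.log1p(-p))


def binom_two_sided_p(k: int, n: int, p: float) -> float:
    """Exact two-sided p-value: total probability of outcomes no more likely than k."""
    log_obs = binom_log_pmf(k, n, p)
    total = 0.0
    for i in range(n + 1):
        log_i = binom_log_pmf(i, n, p)
        # Small tolerance so outcomes exactly as likely as k are not lost to rounding.
        if log_i <= log_obs + 1e-7:
            total += math.exp(log_i)
    return min(1.0, total)


def uce_enrichment(uce_counts: dict, chrom_lengths: dict, alpha: float = 0.05) -> dict:
    """Classify each chromosome as 'enriched', 'depleted' or 'n.s.' for UCEs.

    Hypothetical null model: UCEs fall on chromosomes in proportion to length.
    Significance uses a Bonferroni correction over all chromosomes tested.
    """
    n_total = sum(uce_counts.values())
    genome_len = sum(chrom_lengths.values())
    n_tests = len(chrom_lengths)
    results = {}
    for chrom, length in chrom_lengths.items():
        k = uce_counts.get(chrom, 0)
        p = length / genome_len          # expected share of UCEs under the null
        expected = n_total * p
        p_value = binom_two_sided_p(k, n_total, p)
        if p_value * n_tests < alpha:
            status = "enriched" if k > expected else "depleted"
        else:
            status = "n.s."
        results[chrom] = (k, expected, p_value, status)
    return results


if __name__ == "__main__":
    # Toy input (made-up numbers, not the chicken data): lengths in Mb.
    lengths = {"chrA": 200, "chrB": 100, "chrC": 100}
    counts = {"chrA": 50, "chrB": 40, "chrC": 10}
    for chrom, (k, exp, pv, status) in uce_enrichment(counts, lengths).items():
        print(f"{chrom}: observed={k}, expected={exp:.1f}, p={pv:.2e}, {status}")
```

With real data, `chrom_lengths` would hold the lengths of the 34 chromosomes and `uce_counts` the per-chromosome tallies of the 4976 UCEs. Scaling the expectation by length alone is a modelling choice; a null based on, for example, gene density would give different calls.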
Figure 2C shows the distribution of chCADD scores along chromosome 1 of the chicken genome. Most chCADD scores fall below 10; by definition, scores below 10 account for 90% of all scores. The right-hand tail contains the few high chCADD scores that correspond to highly deleterious mutations. In contrast, chCADD scores in the UCEs and their flanking regions on chromosome 1 have a bimodal distribution, with a second peak between 17 and 18 (Figure 2D). Scores in this range correspond to the most deleterious ∼2% of all possible substitutions in the genome. The median chCADD score of UCEs is significantly higher than that of the flanking regions (Mann-Whitney test, W = 4541885925, p < 0.0001). Furthermore, the frequency of derived mutations is significantly lower in UCEs than in the flanking regions (Mann-Whitney test, W = 13010970, p < 0.0001), consistent with the effect of purifying selection.
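The percentages quoted above follow from a PHRED-like scaling. The sketch below assumes chCADD uses the same scaling as the original CADD, where a score s marks the top 10^(−s/10) fraction of all possible substitutions. This matches the statement that 90% of scores fall below 10, and it puts a score of 17 at the top ≈2%. The second function computes the rank-sum statistic in the form R's `wilcox.test` reports as W: the rank sum of the first sample minus n(n+1)/2, with average ranks for ties. W therefore depends on which sample is given first, and the order used above is not stated. The function names and toy values are hypothetical. For p-values on real data, a library routine such as `scipy.stats.mannwhitneyu` would be the practical choice.

```python
def phred_to_fraction(score: float) -> float:
    """Fraction of all possible substitutions expected to score >= `score`
    under a PHRED-like scaling, score = -10 * log10(fraction)."""
    return 10 ** (-score / 10)


def mann_whitney_w(x: list, y: list) -> float:
    """Mann-Whitney statistic of x versus y, in the form reported by R's
    wilcox.test: rank sum of x minus n_x * (n_x + 1) / 2, average ranks for ties."""
    # Pool both samples, remembering which one each value came from (0 = x, 1 = y).
    pooled = sorted([(v, 0) for v in x] + [(v, 1) for v in y])
    ranks = [0.0] * len(pooled)
    i = 0
    while i < len(pooled):
        # Find the run of tied values pooled[i..j] and give each the mean rank.
        j = i
        while j + 1 < len(pooled) and pooled[j + 1][0] == pooled[i][0]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # ranks are 1-based
        for t in range(i, j + 1):
            ranks[t] = mean_rank
        i = j + 1
    rank_sum_x = sum(r for r, (_, group) in zip(ranks, pooled) if group == 0)
    n_x = len(x)
    return rank_sum_x - n_x * (n_x + 1) / 2


if __name__ == "__main__":
    for s in (10, 17, 18):
        print(f"chCADD {s}: top {100 * phred_to_fraction(s):.1f}% of substitutions")
    # Toy scores (made up): every "UCE" score exceeds every "flank" score,
    # so W reaches its maximum n_x * n_y = 6.
    print(mann_whitney_w([17.5, 18.0, 17.2], [3.1, 9.8]))
```

Scores of 17 and 18 map to roughly 2.0% and 1.6% under this scaling. That spread is why the second peak is described as the worst ∼2% of substitutions rather than an exact figure.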