Distribution of UCEs and CADD scores
The 4976 UCEs are not evenly distributed along the 34 chromosomes of the
chicken reference genome (Figure 2A): 15 chromosomes were significantly
depleted for UCEs, whilst 9 chromosomes were significantly enriched for
UCEs (Supplementary Table 1). Figure 2B shows the distribution of all
chCADD scores along a single UCE (UCE-2729) and its 2000 bp flanking
region on chromosome 1. The chCADD scores in the flanking region are
lower than those within the UCE, except for a potential coding region
(e.g., position 116230300 – 116230450 in Figure 2B). Protein-coding
genes are typified by a combination of high chCADD scores (representing
the first and second codon position substitutions), and low chCADD
scores (third codon position substitutions).
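The three-base periodicity described above can be illustrated with a minimal sketch that averages scores by codon position. It assumes one summary score per site (for example, the maximum over the three possible substitutions), a known reading frame, and a forward-strand gene; the function name and the toy values are hypothetical and are not taken from the chCADD data.

```python
from statistics import mean


def mean_score_by_codon_position(scores, frame_start=0):
    """Average per-site scores grouped by codon position (1, 2 or 3).

    `scores` holds one score per consecutive site; `frame_start` is the
    index of the first base of the first complete codon (an assumption,
    since the reading frame must be known in advance).
    """
    groups = {1: [], 2: [], 3: []}
    for offset, score in enumerate(scores[frame_start:]):
        groups[offset % 3 + 1].append(score)
    return {pos: mean(vals) for pos, vals in groups.items() if vals}


# Illustrative values only (three codons, not real chCADD scores).
toy_scores = [18.0, 17.5, 2.1, 19.2, 16.8, 1.4, 17.9, 18.3, 3.0]
by_position = mean_score_by_codon_position(toy_scores)
for position, avg in sorted(by_position.items()):
    print(f"codon position {position}: mean score {avg:.2f}")
```

In a coding region the third-position mean would be expected to sit well below the first- and second-position means, whereas a non-coding stretch should show no such pattern.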
Figure 2C shows the distribution of chCADD scores along chromosome 1 of
the chicken genome. Most chCADD scores fall below 10; by definition,
scores below 10 account for 90% of all scores. The right-hand tail
comprises the few high chCADD scores of highly deleterious mutations. In
contrast, the
UCEs and their flanking regions in chromosome 1 have a bimodal
distribution of chCADD scores, with a second peak of chCADD scores
ranging between 17 and 18 (Figure 2D). These chCADD scores represent the
worst ∼2% of all possible substitutions in the genome. The median
chCADD score of UCEs is significantly higher than that of the flanking
regions (Mann-Whitney test W = 4541885925, p-value < 0.0001).
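The percentages quoted above (90% of scores below 10, and the worst ∼2% at scores of 17 to 18) follow from a PHRED-like scaling. The sketch below assumes that chCADD uses the same rank-based scaling as CADD, where a score s marks the top 10^(-s/10) fraction of all possible substitutions; the function names are hypothetical.

```python
import math


def phred_to_top_fraction(score):
    """Fraction of all substitutions ranked at or above a PHRED-like score."""
    return 10 ** (-score / 10)


def top_fraction_to_phred(fraction):
    """Inverse mapping: PHRED-like score marking the top `fraction`."""
    return -10 * math.log10(fraction)


for score in (10, 17, 18):
    pct = 100 * phred_to_top_fraction(score)
    print(f"score {score}: top {pct:.2f}% of substitutions")
```

Under this assumption a score of 10 marks the top 10%, and scores of 17 and 18 mark roughly the top 2.0% and 1.6%, which matches the figures given in the text.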
Conversely, the frequency of derived mutations is significantly lower at
UCEs than at the flanking regions (Mann-Whitney test W = 13010970,
p-value < 0.0001), consistent with the effect of purifying selection.
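A comparison of this kind can be sketched in a few lines. The example below computes the W statistic as the Mann-Whitney U of the first sample (the convention of R's wilcox.test, which is an assumption about how the values above were obtained) with a two-sided p-value from the tie-corrected normal approximation. The input values are illustrative only and are not the study data.

```python
import math
from collections import Counter


def mann_whitney_w(x, y):
    """Return (W, two-sided p) for samples x and y.

    W is the rank sum of x minus n1*(n1+1)/2. The p-value uses the
    normal approximation with tie and continuity corrections, which is
    appropriate for large samples.
    """
    n1, n2 = len(x), len(y)
    pooled = sorted(list(x) + list(y))
    # Assign midranks so that tied values share the average rank.
    rank_of = {}
    i = 0
    while i < len(pooled):
        j = i
        while j < len(pooled) and pooled[j] == pooled[i]:
            j += 1
        rank_of[pooled[i]] = (i + 1 + j) / 2  # mean of ranks i+1 .. j
        i = j
    w = sum(rank_of[v] for v in x) - n1 * (n1 + 1) / 2
    n = n1 + n2
    tie_term = sum(t ** 3 - t for t in Counter(pooled).values())
    variance = n1 * n2 / 12 * ((n + 1) - tie_term / (n * (n - 1)))
    if variance == 0:
        return w, 1.0
    z = max(abs(w - n1 * n2 / 2) - 0.5, 0.0) / math.sqrt(variance)
    return w, math.erfc(z / math.sqrt(2))


# Illustrative values only (not real chCADD scores).
uce_scores = [17.2, 17.8, 16.9, 18.1, 12.4]
flank_scores = [3.1, 5.6, 8.2, 2.4, 9.9, 17.5]
w, p = mann_whitney_w(uce_scores, flank_scores)
print(f"W = {w:.0f}, p = {p:.4f}")
```

Because W counts the pairs in which a value from the first sample exceeds one from the second, its magnitude grows with the product of the two sample sizes, which is why the W values reported above are so large.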