Validation of the phasing approach using WES data
The parent-of-origin of the DNMs was initially assessed from the
short-read WES data. Short reads (100 bp) spanning both the DNM and
iSNPs were available in the WES data for 14 of the 109 DNMs, but only 9
(8%) (Figure 1.b1) had acceptable coverage (>10x per
allele, Table 1). All 14 DNMs were also target-amplified and sequenced
using ONT long-read sequencing to provide a control cohort for our
long-read phasing method. For all 9 DNMs with acceptable coverage in the
WES data, the parent-of-origin assignment agreed between the two
approaches and the DNM allele frequencies obtained were comparable
(Table 1). Interestingly, this was also true for the 5 WES-phased
samples with limited sequencing coverage, suggesting that the stringent
requirement of >10x coverage for WES phasing could be
reduced to as little as 4x. For both short- and long-read sequencing, 11
out of 14 DNMs showed allele frequencies around 50% (+/- 10%), with no significant
third allelic form observed (supplementary Table 11), indicating that
these were germline DNM events. Of the three DNMs deviating from
prezygotic allele frequencies, two were determined to be postzygotic
(SORCS2 and C10orf71), and one (SIGLEC10) may result from allelic
sequencing bias (supplementary Table 12). ONT long-read coverage for
SIGLEC10 was significantly lower than the average ONT sequencing
coverage, and while both the WES and ONT allele data showed DNM allele
frequencies deviating from 50%, no significant third allelic form was
observed. In addition, the DNM base frequencies in both the WES and
ONT data were within the prezygotic range of 50% +/- 10%.
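The decision logic described above can be summarised in a short Python sketch. It is illustrative only and is not the pipeline used in this study: the function name `classify_dnm`, the labels it returns, and in particular the `third_form_cutoff` value are assumptions, since the text does not state what fraction counts as a significant third allelic form. Only the prezygotic range of 50% +/- 10% is taken from the text.

```python
def classify_dnm(dnm_allele_fraction, third_form_fraction=0.0,
                 expected=0.50, tolerance=0.10, third_form_cutoff=0.05):
    """Coarsely label a phased DNM from its allele fractions.

    dnm_allele_fraction: fraction of phased reads carrying the DNM allele.
    third_form_fraction: fraction of reads carrying a third allelic form.
    third_form_cutoff:   ASSUMED threshold for a "significant" third form;
                         this value is hypothetical, not from the study.
    """
    within_range = abs(dnm_allele_fraction - expected) <= tolerance
    has_third_form = third_form_fraction >= third_form_cutoff

    if has_third_form:
        # A third allelic form is the evidence used for a postzygotic call.
        return "postzygotic candidate"
    if within_range:
        # Around 50% with no third form: consistent with a germline event.
        return "prezygotic-consistent"
    # Deviation without a third form, e.g. possible allelic sequencing bias.
    return "deviates, no third form"
```

Under these assumptions, a DNM at 48% with no third allelic form would be labelled prezygotic-consistent, one at 30% with no third form would be reported as deviating without a third form, and one accompanied by a third allelic form above the assumed cutoff would be flagged as a postzygotic candidate. The input values in this note are illustrative and not measurements from Table 1.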
For the DNM affecting the C10orf71 gene in patient 01247, the de
novo mutated allele (T-A) had a much lower allele frequency, 9% in the
WES data and 17% in the ONT data, with similar percentages observed in
the DNM base frequencies. In addition to the wild-type (wt) allele and
the DNM allele, we clearly observed a third allelic form that
represented the wt version of the DNM allele, i.e. the haplotype on
which the DNM arose but without the mutation. The presence of this third
allelic form suggests that the DNM likely occurred as a postzygotic
event. In this case, the postzygotic mutation is observed in both the
WES and ONT data. Postzygotic mutations can, however, be missed when WES has
low coverage, as seen for the DNM in SORCS2 in patient 01209. The
postzygotic DNM in SORCS2 shows an average deviation of 15% from
prezygotic norms in the WES base frequencies, but no third allelic form
is visible in the WES allele data because coverage was only 1x. The
ONT base frequencies deviate more strongly, with an average deviation
from prezygotic norms of 38%, and with several thousand times more
coverage in the ONT data, the wt version of the DNM allele is
observed at 18%.
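To make the notion of a third allelic form concrete, the following Python sketch tallies haplotypes from reads that span both an iSNP and the DNM position. It is a minimal illustration under stated assumptions, not the method used here: the read representation as `(isnp_base, dnm_site_base)` tuples, the function name `third_form_fraction`, and the toy read counts are all hypothetical. The haplotype carrying the DNM is assumed to be the one whose iSNP allele most often co-occurs with the DNM base.

```python
from collections import Counter


def third_form_fraction(reads, dnm_base, wt_base):
    """Fraction of spanning reads that carry the DNM haplotype's iSNP
    allele together with the wild-type base at the DNM position.

    reads: iterable of (isnp_base, dnm_site_base) tuples, one per read
           covering both positions (hypothetical input format).
    Returns None when no read carries the DNM base.
    """
    reads = list(reads)
    # iSNP alleles seen on reads with the DNM base identify the
    # haplotype on which the mutation arose.
    isnp_on_dnm_reads = Counter(isnp for isnp, base in reads if base == dnm_base)
    if not isnp_on_dnm_reads:
        return None
    phase_allele = isnp_on_dnm_reads.most_common(1)[0][0]

    # Third form: same iSNP allele as the DNM haplotype, but wild-type base.
    third = sum(1 for isnp, base in reads
                if isnp == phase_allele and base == wt_base)
    return third / len(reads)


# Toy example with invented counts: 50 reads from the other parental
# haplotype, 32 reads with the DNM, 18 reads of the third form.
toy_reads = [("G", "T")] * 50 + [("C", "A")] * 32 + [("C", "T")] * 18
toy_fraction = third_form_fraction(toy_reads, dnm_base="A", wt_base="T")
```

With a single spanning read, as in the 1x WES case described above, such a tally cannot separate a third allelic form from the other two, which is consistent with the postzygotic event only becoming visible at high ONT coverage.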