Target selection and capture
Our screening illustrates the robustness of identifying likely
single-copy, orthologous ORFs from existing databases and subsequently
detecting paralogs based on ingroup transcriptomes. Our enrichment
strategy performed well and was largely unaffected by intron-exon
boundaries being unknown at the time of probe design. This result opens
opportunities to use ORF predictions from transcriptome assemblies in
other Metazoa as direct targets for probe design, provided orthology is
verified. In this way, our strategy considerably simplifies the
development of genomic datasets, especially for non-model organisms. If
many short exons are expected, using shorter probes (e.g. 80 nt) or
tiling targets more densely (e.g. at 3× or 4×) may further improve
capture efficiency.
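Why shorter or denser probes should help with short exons can be illustrated geometrically. A probe designed on a spliced ORF can pair contiguously with only one exon in genomic DNA, because an intron interrupts the sequence at each boundary. The sketch below tiles probes along an ORF and reports the largest fraction of any single probe that falls inside a given exon. It is a minimal illustration, not the design procedure used in this study: the 600-nt ORF, the exon coordinates and the 120-nt baseline probe length are hypothetical, and the functions `tile_probes` and `best_exon_fraction` are invented for this example.

```python
from typing import List, Tuple

Interval = Tuple[int, int]  # half-open (start, end) coordinates on the spliced ORF


def tile_probes(target_len: int, probe_len: int, density: float) -> List[Interval]:
    """Tile probes along a target; at density d, starts are ~probe_len / d nt apart."""
    if target_len <= probe_len:
        return [(0, target_len)]  # one (truncated) probe covers the whole target
    step = max(1, round(probe_len / density))
    starts = list(range(0, target_len - probe_len + 1, step))
    if starts[-1] != target_len - probe_len:
        starts.append(target_len - probe_len)  # make sure the 3' end is covered
    return [(s, s + probe_len) for s in starts]


def best_exon_fraction(exon: Interval, probes: List[Interval]) -> float:
    """Largest fraction of any single probe that lies inside the exon.

    Assumes the probe can only hybridize contiguously to one exon, so the
    part overlapping an intron in genomic DNA is treated as unpaired.
    """
    e_start, e_end = exon
    best = 0.0
    for p_start, p_end in probes:
        overlap = max(0, min(p_end, e_end) - max(p_start, e_start))
        best = max(best, overlap / (p_end - p_start))
    return best


if __name__ == "__main__":
    orf_len = 600             # hypothetical ORF length (nt)
    short_exon = (150, 210)   # hypothetical 60-nt exon within that ORF
    for probe_len, density in [(120, 2), (80, 2), (80, 4)]:
        probes = tile_probes(orf_len, probe_len, density)
        frac = best_exon_fraction(short_exon, probes)
        print(f"{probe_len} nt at {density}x: {len(probes)} probes, best match {frac:.3f}")
```

With these hypothetical coordinates, the best single-probe match to the 60-nt exon rises from 0.5 (120 nt at 2×) to 0.625 (80 nt at 2×) and 0.75 (80 nt at 4×). Shorter probes reduce the unpaired intronic overhang, and denser tiling increases the chance that one probe is well centred on the exon, at the cost of more probes per target.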
Selecting UCEs for unionids with the PHYLUCE pipeline was somewhat
hampered because, compared with other taxa for which UCE sets have been
developed, few genomes were available for mollusks, and these were only
distantly related to our ingroup (Sigwart et al., 2021; Sun et al.,
2019). Nevertheless, we recovered hits for 1,895 (46%) of our target
UCEs, which is comparable to values obtained in some previous UCE
studies (e.g. Kulkarni et al., 2020; Starrett et al., 2017; Streicher et
al., 2018), indicating that our design was successful. As is common in
UCE studies (Buenaventura et al., 2021; Faircloth et al., 2012; Kulkarni
et al., 2020; Quattrini et al., 2018; Starrett et al., 2017), the number
of UCEs that could ultimately be included in the alignment for
phylogenetic inference was restricted to a subset with high recovery
across all ingroup taxa (but see Branstetter et al., 2017). Phylogenetic
analysis of 276 UCEs unambiguously resolved the backbone phylogeny of
Coelaturini, and estimates of population genetic diversity from 309 UCEs
were comparable to those obtained from ORFs, although closer to the
diversity at non-synonymous than at synonymous sites.
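The gap between recovered UCEs and those usable for phylogenetic inference comes from a taxon-completeness filter: a locus is kept only if it was recovered in enough taxa. The sketch below shows that kind of filter in its simplest form. It is a hypothetical illustration, not PHYLUCE's implementation or the exact criterion applied here (which this section does not specify). The function name `loci_with_min_taxa`, the locus names and the taxon names are all invented for the example.

```python
from typing import Dict, List, Set


def loci_with_min_taxa(recovered: Dict[str, Set[str]], taxa: List[str], min_taxa: int) -> List[str]:
    """Return loci (sorted) recovered in at least `min_taxa` of the listed taxa.

    `recovered` maps each locus to the set of taxa in which it was recovered;
    taxa not in `taxa` (e.g. outgroups excluded from the count) are ignored.
    """
    ingroup = set(taxa)
    return [
        locus
        for locus, present in sorted(recovered.items())
        if len(present & ingroup) >= min_taxa
    ]


if __name__ == "__main__":
    # Hypothetical recovery table for four taxa and four loci.
    taxa = ["taxon_A", "taxon_B", "taxon_C", "taxon_D"]
    recovered = {
        "uce-0001": {"taxon_A", "taxon_B", "taxon_C", "taxon_D"},
        "uce-0002": {"taxon_A", "taxon_B", "taxon_C"},
        "uce-0003": {"taxon_A", "taxon_D"},
        "uce-0004": {"taxon_B"},
    }
    # Raising the completeness threshold shrinks the usable locus set.
    for min_taxa in range(1, len(taxa) + 1):
        print(min_taxa, loci_with_min_taxa(recovered, taxa, min_taxa))
```

An integer minimum taxon count is used instead of a percentage to avoid floating-point rounding at the threshold; a percentage can be converted to a count by the caller.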
Combining ORFs and UCEs in the same probe set resulted in competition:
although UCEs account for over 25% of the probes, only ~1% of our reads
cover UCE targets. One potential contributing factor is the phylogenetic
distance between the genomes used to identify UCEs and our ingroup,
whereas ORFs were selected from ingroup transcriptomes. However, the
recovery of 1,895 UCEs across our samples despite only ~1% of reads
mapping to UCE targets indicates that the issue stems from hybridization
efficiency rather than probe design. This result was unexpected given a
previous study integrating multiple marker types (Hutter et al., 2019),
in which no such competition was observed; however, the average length
of UCE targets in that study was >700 bp, compared with ~145 bp in ours.
As we found no relationship between UCE length and recovery (Fig. S2),
the most likely explanation is that differences in inherent properties
of UCE and ORF targets (e.g. mismatches to genomic libraries or
differences in melting temperature) cause variation in sensitivity and
specificity during hybridization. This hypothesis is supported by the
more restricted recovery of UCEs compared with ORFs when reads were
mapped in silico onto the Venustaconcha genome. UCE recovery might be
improved by adjusting hybridization and washing temperatures, but
further work is required to better understand the balance of enrichment
across UCEs and ORFs.
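The scale of this competition can be expressed as a simple ratio of a marker class's share of reads to its share of probes, where 1.0 would mean reads were allocated in proportion to probes. The sketch below applies it to the approximate figures in the text, using 25% as a floor for the ">25%" UCE probe share and ~1% for the UCE read share. The function names are hypothetical, and the calculation ignores off-target reads and per-locus variation.

```python
def per_probe_yield(read_fraction: float, probe_fraction: float) -> float:
    """Share of reads divided by share of probes for one marker class.

    1.0 means reads were allocated in proportion to probes; values below
    1.0 mean the class was under-represented among the reads.
    """
    if not 0.0 < probe_fraction <= 1.0:
        raise ValueError("probe_fraction must be in (0, 1]")
    if not 0.0 <= read_fraction <= 1.0:
        raise ValueError("read_fraction must be in [0, 1]")
    return read_fraction / probe_fraction


def fold_underrepresentation(read_fraction: float, probe_fraction: float) -> float:
    """How many times fewer reads a class received than proportional allocation predicts."""
    return 1.0 / per_probe_yield(read_fraction, probe_fraction)


if __name__ == "__main__":
    # Approximate values from the text: UCEs are >25% of probes (25% used as a
    # floor) and attract ~1% of reads.
    uce_yield = per_probe_yield(read_fraction=0.01, probe_fraction=0.25)
    print(f"UCE per-probe yield: {uce_yield:.2f}")
    print(f"UCE under-representation: ~{fold_underrepresentation(0.01, 0.25):.0f}x")
```

With these inputs, UCEs received about 4% of the reads expected from their share of probes, i.e. roughly 25-fold fewer. Because the true UCE probe share is above 25%, this is a lower bound on the disparity for a ~1% read share.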