Pedigrees complement genomic study design and inference
The relationship information and metadata captured by pedigrees are an
invaluable tool to help design and implement genomic research. For
example, without pedigree data, there is no way to know how
representative reference genomes actually are. Consequently, pedigrees
provide biologically relevant data to inform an unbiased selection of
individuals for building representative and high-quality reference
genomes. When selecting an individual for a reference genome for species
with genetic sex determination, some researchers have preferred
selecting either the homogametic sex to ensure adequate coverage of the
homogametic sex chromosome (i.e., X or Z), or the heterogametic sex to
capture the alternative and often highly repetitive sex chromosome
(i.e., Y or W; Tomaszkiewicz, Medvedev, & Makova, 2017; Rhie et al.,
2021). In addition to helping select the candidates for sampling based
on sex, pedigrees can also identify individuals that are likely to be
highly inbred, which assists with genome assembly by reducing error
associated with ambiguity between heterozygosity and genetic paralogues
(Hahn, Zhang, & Moyle, 2014; Rhie et al., 2021). Further, detailed
pedigrees can enable selection of parent-offspring trios to generate
phased de novo genome assemblies (Korbel & Lee, 2013; Koren et
al., 2018; Leitwein, Duranton, Rougemont, Gagnaire, & Bernatchez,
2020). High-quality de novo reference genomes are a powerful
resource for the conservation and evolutionary genomics community,
facilitating read mapping (Card et al., 2014), mining for genes of
interest (e.g., Greenhalgh et al., 2021), and SNP discovery and
genotyping (e.g., Galla et al., 2019; Brandies, Peel, Hogg, & Belov,
2019; Gooley et al., 2020). To further characterise variants across the
genome, including structural variants (SVs), pedigrees may be used to
inform the curation of a pangenome, which is an assembly of genomes
from multiple individuals that aims to capture all standing genomic
diversity in a population or species of interest (Tettelin et al., 2005;
Brockhurst et
al., 2019). In this instance, a pedigree can be leveraged to identify
distantly related individuals to ensure the pangenome is representative
(Wold et al., 2021, preprint).
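As an illustration of how a pedigree yields the quantities used in these selections, the following minimal Python sketch computes kinship and inbreeding coefficients recursively from parent records. The toy pedigree, the individual names, and the assumption that founders are unrelated and non-inbred are hypothetical choices made for illustration and are not drawn from any study cited here.

```python
from functools import lru_cache

# Hypothetical toy pedigree: individual -> (sire, dam).
# None marks an unknown parent, i.e. the individual is treated as a founder.
PEDIGREE = {
    "A": (None, None),
    "B": (None, None),
    "C": ("A", "B"),
    "D": ("A", "B"),
    "E": ("C", "D"),  # offspring of full siblings
}


@lru_cache(maxsize=None)
def depth(ind):
    """Generation depth: founders are 0, offspring are 1 + deepest parent."""
    sire, dam = PEDIGREE[ind]
    parent_depths = [depth(p) for p in (sire, dam) if p is not None]
    return 1 + max(parent_depths) if parent_depths else 0


@lru_cache(maxsize=None)
def kinship(i, j):
    """Probability that alleles drawn at random from i and j are identical by descent."""
    if i is None or j is None:
        return 0.0  # assumption: unknown parents are unrelated to everyone
    if i == j:
        sire, dam = PEDIGREE[i]
        return 0.5 * (1.0 + kinship(sire, dam))
    # Recurse through the parents of the deeper individual, which cannot be
    # an ancestor of the other one.
    if depth(i) < depth(j):
        i, j = j, i
    sire, dam = PEDIGREE[i]
    return 0.5 * (kinship(sire, j) + kinship(dam, j))


def inbreeding(ind):
    """Inbreeding coefficient of ind equals the kinship between its parents."""
    sire, dam = PEDIGREE[ind]
    return kinship(sire, dam)


if __name__ == "__main__":
    for ind in PEDIGREE:
        print(ind, inbreeding(ind))
```

In this toy example, individual E (the offspring of full siblings) has an inbreeding coefficient of 0.25, whereas founders A and B have a kinship of zero and would be candidates for a panel of distantly related individuals. For large pedigrees, dedicated pedigree software would likely be preferable to this simple recursion.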
Pedigree data are also an invaluable resource for selecting individuals
for resequencing (i.e., whole genome resequencing, or WGS). For example,
a pedigree can inform the choice of individuals when studying closely
related family groups (e.g., Galla et al., 2020), investigating
characterized phenotypes of interest (Nersisyan, Nikoghosyan, &
Arakelyan, 2019), or maximizing representative genomic diversity across
a species (Robinson et al., 2021). In the case of sable antelope
(Hippotragus niger; Gooley et al., 2020), the software program
PedSam (https://sites.uwm.edu/latch/software-2/) was used to streamline
the selection of individuals representative of founder diversity across
many managed populations for downstream diversity comparisons. In a
recent study in California condors, individuals with low inbreeding and
kinship coefficients were selected using the pedigree, and were compared
in terms of runs of homozygosity using WGS (Robinson et al., 2021). When
familial relationships are known via pedigrees, this information can
also be used to validate whether molecular genetic and genomic
approaches (e.g., extraction, amplification, library preparation, or
sequencing) produce data that are consistent with biologically relevant
expectations or whether errors were introduced along the way (see Galla et al., 2020
for details).
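One simple way to picture pedigree-informed selection of minimally related individuals is a greedy search over pairwise kinship coefficients, sketched below in Python. This is a hypothetical illustration only: it is not the algorithm implemented in PedSam or used in the studies cited above, and the individual names and kinship values are invented.

```python
# Hypothetical pairwise kinship coefficients, as might be derived from a pedigree.
# Keys are alphabetically sorted pairs of individual names.
KINSHIP = {
    ("ind1", "ind2"): 0.25,
    ("ind1", "ind3"): 0.0,
    ("ind1", "ind4"): 0.0625,
    ("ind2", "ind3"): 0.0,
    ("ind2", "ind4"): 0.0625,
    ("ind3", "ind4"): 0.125,
}


def pair_kinship(a, b):
    """Look up the kinship of an unordered pair."""
    return KINSHIP[tuple(sorted((a, b)))]


def mean_kinship(ind, others):
    """Mean kinship of ind to every other individual in others."""
    others = [o for o in others if o != ind]
    if not others:
        return 0.0
    return sum(pair_kinship(ind, o) for o in others) / len(others)


def select_low_kinship(individuals, n):
    """Greedily pick n individuals with low mean kinship to those already chosen."""
    pool = sorted(individuals)  # sorted so that ties are broken by name
    if not 1 <= n <= len(pool):
        raise ValueError("n must be between 1 and the number of individuals")
    # Seed with the individual least related to the whole pool.
    selected = [min(pool, key=lambda ind: mean_kinship(ind, pool))]
    while len(selected) < n:
        remaining = [ind for ind in pool if ind not in selected]
        selected.append(min(remaining, key=lambda ind: mean_kinship(ind, selected)))
    return selected


if __name__ == "__main__":
    print(select_low_kinship(["ind1", "ind2", "ind3", "ind4"], 3))
```

A greedy search is not guaranteed to find the globally least-related subset, but it is easy to audit, which may matter when sampling decisions need to be justified to management programs.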
Beyond informing the individuals sampled for molecular studies,
pedigrees can be pivotal to successful genetic variant discovery. For
many conservation genomic research projects, variants (e.g., SNPs, SVs)
are used as markers to identify and measure diversity (Hohenlohe, Funk,
& Rajora, 2020; Wold et al., preprint). Artefacts from library
preparation, sequencing, and bioinformatic processing can lead to false
variants in datasets, which can bias downstream analyses (O’Leary,
Puritz, Willis, Hollenbeck, & Portnoy, 2019). In addition to adequate
filtering for sequencing depth and Hardy-Weinberg equilibrium, validated
pedigrees can be used as one tool for filtering false variants from
datasets using Mendelian inheritance, that is, by flagging offspring
genotypes that cannot be derived from those of their parents. This
approach has long been used
in the field of human genetics for marker validation, and in one study,
was able to reduce marker error rates by 50% (Chen et al., 2013). A
study in the pedigreed population of Florida scrub jays shows great
promise for this approach, identifying sex-linked and false SNPs from a
reduced representation data set (Chen, Van Hout, Gottipati, & Clark,
2014). Further, variant discovery for the critically endangered kākāpō
is being informed by Mendelian inheritance, creating a high-quality
variant data set for all individuals of this species (Joseph Guhlin,
personal communication). Because genomic research for species of
conservation concern is often budget-constrained, datasets are often
hampered by low sequencing depth and subsequent missing data. In the
fields of human and crop genetics, imputation (i.e., algorithmically
filling in missing genotypes with likely alleles) is one option for
addressing large amounts of missing data (Hickey, Kinghorn, Tier, van
der Werf, & Cleveland, 2012; Sargolzaei, Chesnais, & Schenkel, 2015).
When coupled with genotypic information from family groups, this
approach can increase the likelihood of accurate imputation, even of
rare alleles (Ullah et al., 2019). While imputation is not currently
practiced in conservation or ecological genomics, we anticipate it is
only a matter of time before it is explored, especially for species
with large genomes that are costly to sequence at high depths (e.g.,
some fish, insects, and plants; Mao et al., 2020) or as a cost-effective
option for conservation programs that can only sequence at low depths.
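The Mendelian inheritance filter discussed earlier in this section can be sketched in a few lines of Python. The example below is a minimal illustration under stated assumptions: SNPs are biallelic and autosomal, genotypes are coded as counts of the alternate allele (0, 1, or 2, with None for missing calls), and the trio names, SNP names, and error-rate threshold are hypothetical. It does not reproduce the pipeline of any study cited here.

```python
# Alleles (0 = reference, 1 = alternate) that a parent with a given
# genotype can transmit to its offspring at a biallelic autosomal SNP.
TRANSMITTABLE = {0: {0}, 1: {0, 1}, 2: {1}}

# Hypothetical pedigree trios: (offspring, sire, dam).
TRIOS = [("kid1", "sire1", "dam1"), ("kid2", "sire1", "dam1")]

# Hypothetical genotype calls per SNP; None marks a missing call.
GENOTYPES = {
    "snp_ok": {"sire1": 0, "dam1": 1, "kid1": 1, "kid2": 0},
    "snp_bad": {"sire1": 0, "dam1": 0, "kid1": 1, "kid2": 2},
    "snp_missing": {"sire1": 2, "dam1": None, "kid1": 2, "kid2": 1},
}


def is_consistent(child, sire, dam):
    """True if the child genotype can be formed from one allele per parent."""
    possible = {a + b for a in TRANSMITTABLE[sire] for b in TRANSMITTABLE[dam]}
    return child in possible


def mendelian_error_rate(calls, trios):
    """Fraction of fully genotyped trios that violate Mendelian inheritance."""
    errors = informative = 0
    for child, sire, dam in trios:
        trio_calls = (calls.get(child), calls.get(sire), calls.get(dam))
        if None in trio_calls:
            continue  # skip trios with any missing genotype
        informative += 1
        if not is_consistent(*trio_calls):
            errors += 1
    return errors / informative if informative else None


def flag_snps(genotypes_by_snp, trios, max_rate=0.0):
    """Return SNPs whose Mendelian error rate exceeds max_rate."""
    flagged = []
    for snp, calls in genotypes_by_snp.items():
        rate = mendelian_error_rate(calls, trios)
        if rate is not None and rate > max_rate:
            flagged.append(snp)
    return flagged


if __name__ == "__main__":
    print(flag_snps(GENOTYPES, TRIOS))
```

Because the check assumes autosomal inheritance, sex-linked markers genotyped in the heterogametic sex can also appear as Mendelian errors, so flagged SNPs may warrant inspection rather than automatic removal. SNPs with no fully genotyped trio return no error rate and are left unflagged in this sketch.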