2.3 Impact of filtering sex-linked loci on genetic diversity,
individual heterozygosity, genetic structure and parentage analyses
Population genetic diversity. Six measures of population
genetic diversity were calculated for ‘before’ and ‘after’ datasets:
observed (Ho) and expected heterozygosity (He), Wright’s fixation index
(FIS), polymorphism (P), number of private
alleles not present in any other population (PA), and allelic richness
(AR). Ho, He, FIS and PA were calculated with the dartR package v2.0.4 (function gl.report.heterozygosity, method =
‘pop’; function gl.report.pa, method = ‘one2rest’). AR was calculated
using the hierfstat package v0.5-11 (function allelic.richness;
Goudet 2004). P was calculated as the proportion of loci that were
polymorphic in a given population.
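As an illustration of the definition of P, the following minimal Python sketch computes the proportion of polymorphic loci in one population. It assumes genotypes coded as alternate-allele counts (0, 1, 2, or None for missing) and that loci with no called genotypes in the population are left out of the denominator; the function names and both of these choices are assumptions for illustration, not the code used in the study.

```python
def is_polymorphic(locus_genotypes):
    """Return True/False for a scored locus, or None if no genotype was called."""
    called = [g for g in locus_genotypes if g is not None]
    if not called:
        return None
    alt_alleles = sum(called)        # number of alternate allele copies
    total_alleles = 2 * len(called)  # diploid: two alleles per individual
    return 0 < alt_alleles < total_alleles


def polymorphism(pop_matrix):
    """pop_matrix: one row per individual, one column per locus."""
    n_loci = len(pop_matrix[0])
    flags = [is_polymorphic([ind[j] for ind in pop_matrix]) for j in range(n_loci)]
    scored = [f for f in flags if f is not None]  # assumption: skip unscored loci
    return sum(scored) / len(scored)


# Three individuals, three loci: only the second locus carries both alleles.
example_pop = [[0, 1, 2],
               [0, 1, 2],
               [0, 0, 2]]
example_P = polymorphism(example_pop)
```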
Individual observed heterozygosity (Ho). Individual Ho was
calculated with dartR function gl.report.heterozygosity (method =
‘ind’). To test whether individual Ho changed when
sex-linked loci were removed, we compared ‘before’ and ‘after’
individual Ho with a paired t-test (α = 0.05) for each sex. We also tested
for significant differences in individual Ho between males and females
with an independent-samples t-test in each of the ‘before’ and ‘after’ datasets. Cohen’s
d was used to measure effect sizes.
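The paired comparison can be sketched as follows. This is a minimal illustration, assuming one ‘before’ and one ‘after’ Ho value per individual in matching order. The text does not state which variant of Cohen’s d was used; the version here (mean difference divided by the standard deviation of the differences) is one common choice for paired designs and is an assumption, as are the function name and the example values. The p-value is not computed; in practice it would come from a t distribution with n − 1 degrees of freedom (for example via scipy.stats.ttest_rel).

```python
import math
import statistics


def paired_t_and_d(before, after):
    """Return (t statistic, degrees of freedom, paired Cohen's d)."""
    diffs = [a - b for b, a in zip(before, after)]
    n = len(diffs)
    mean_diff = statistics.mean(diffs)
    sd_diff = statistics.stdev(diffs)         # sample SD (n - 1 denominator)
    t_stat = mean_diff / (sd_diff / math.sqrt(n))
    cohens_d = mean_diff / sd_diff            # assumption: d for paired data
    return t_stat, n - 1, cohens_d


# Hypothetical individual Ho values for four individuals of one sex.
ho_before = [0.10, 0.12, 0.11, 0.13]
ho_after = [0.11, 0.14, 0.12, 0.16]
t_stat, df, d = paired_t_and_d(ho_before, ho_after)
```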
Genetic structure. Genetic structure between populations was
qualitatively assessed with Pearson Principal Component Analysis (PCA; dartR function gl.pcoa). To reduce computation time,
loci with a Minor Allele Count (MAC) below 3 were removed from all
datasets (dartR function gl.filter.maf, threshold = 3). We report
results for the first two PCs, but explored the first six.
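The MAC filter described above can be illustrated with a short Python sketch. It assumes genotypes coded as alternate-allele counts (0, 1, 2, or None for missing) and counts the rarer allele across all called genotypes at each locus; it is a stand-alone illustration of the idea rather than a reproduction of the dartR implementation, and the function names are hypothetical.

```python
def minor_allele_count(locus_genotypes):
    """Count copies of the rarer allele among called genotypes at one locus."""
    called = [g for g in locus_genotypes if g is not None]
    alt = sum(called)
    ref = 2 * len(called) - alt
    return min(alt, ref)


def filter_by_mac(matrix, min_mac=3):
    """Keep loci whose MAC is at least min_mac; return kept indices and data."""
    n_loci = len(matrix[0])
    keep = [j for j in range(n_loci)
            if minor_allele_count([ind[j] for ind in matrix]) >= min_mac]
    filtered = [[ind[j] for j in keep] for ind in matrix]
    return keep, filtered


# Four individuals, three loci: MACs are 1, 3 and 0, so only locus 1 is kept.
genotypes = [[0, 1, 2],
             [0, 1, 2],
             [1, 1, None],
             [0, 0, 2]]
kept_loci, filtered = filter_by_mac(genotypes, min_mac=3)
```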
Parentage analyses. Given the potential for sex-linked
chromosomes to affect the inference of parentage relationships, we
performed separate parentage analyses using ‘before’ and ‘after’
datasets. We analysed 677 EYR individuals and 527 YTH individuals
(cassidix only). In both cases, a MAC threshold of 3 was applied to keep only
loci shared between at least two individuals and so reduce
computation time. The genetic datasets for EYR consisted of 13,685 and
12,618 SNPs for the ‘before’ and ‘after’ datasets, respectively. For cassidix, the ‘before’ dataset comprised 11,477 SNPs and the
‘after’ dataset 10,848 SNPs (Table 2).
Parentage analyses were run in COLONY v2.0.6.8 (Jones & Wang 2010). The
function gl2colony was used to transform the genetic datasets to
a COLONY input file. We assigned all individuals as candidate offspring,
all females as candidate mothers (EYR: n = 308, cassidix: n =
255), and all males as candidate fathers (EYR: n = 369, cassidix: n =
272). In the case of EYR, candidate parents for 203 offspring were
excluded based on year of birth, year of death (when known) and
excessive geographical distance (Austin et al. unpublished
manuscript). For both species, we used a full-likelihood approach
(‘likelihood = 1’) with medium runs (‘length_run = 2’) at medium
precision (‘precision_fl = 1’). We assumed polygamy (‘polygamy_male =
0’, ‘polygamy_female = 0’) and a prior probability that the true parent
is present in the sample of 0.5 (‘probability_mother’,
‘probability_father’). Allele frequencies were not updated in order to
minimize computational time (‘update_allele_freq = 0’). For cassidix, we indicated the presence of inbreeding (‘inbreed = 1’)
and set genotyping error to 0.05 (‘other_typ_err = 0.05@’) after
Robledo-Ruiz et al. (2022). Genotyping error for EYR was set to an
empirically determined value of 0.03, following Austin et al. (unpublished
manuscript). Due to the stochasticity of the method implemented in
COLONY (Jones & Wang 2010), we performed five independent runs per
dataset (each with a different seed) to better explore the space of
potential pedigree configurations.
Parentage assignments per run were compared to a set of known parentage
relationships: 119 social EYR mothers observed consistently attending
the nest and incubating (Austin et al. unpublished manuscript),
and 45 YTH known parent-offspring relationships from cassidix captive breeding (Robledo-Ruiz et al. 2022). The accuracy of parentage
assignments was measured in two ways: (i) by counting how many runs out
of five correctly identified a parent per known parentage relationship,
and comparing before and after averages using a paired t-test, and (ii)
by assigning as final parents those that were identified in at least
three out of five runs (following Robledo-Ruiz et al. 2022) and using a
χ²-test to assess whether the number of correct final assignments was
positively associated with the removal of sex-linked loci.
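The two accuracy measures can be sketched in Python as follows. The sketch assumes that, for each offspring, the parent assigned in each of the five runs is available as a list of identifiers (None when no parent was assigned); the function names and example identifiers are hypothetical. With five runs and a threshold of three, at most one candidate can reach the threshold, so ties cannot occur.

```python
from collections import Counter


def runs_correct(assignments, known_parent):
    """Measure (i): number of runs that recovered the known parent."""
    return sum(1 for a in assignments if a == known_parent)


def consensus_parent(assignments, min_runs=3):
    """Measure (ii): final parent if assigned in at least min_runs runs."""
    counts = Counter(a for a in assignments if a is not None)
    if not counts:
        return None
    parent, n_runs = counts.most_common(1)[0]
    return parent if n_runs >= min_runs else None


# Hypothetical assignments for one offspring across five runs.
runs = ["F1", "F1", "F2", "F1", None]
n_correct = runs_correct(runs, known_parent="F1")
final_parent = consensus_parent(runs)
```

The per-relationship counts from measure (i) would then feed the paired t-test, and the tally of correct final parents from measure (ii) would feed the χ²-test.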
Minimum number of known-sex individuals for the filter.sex.linked function
We used both EYR and YTH datasets to estimate the number of sex-linked
loci that are identified with subsets of known-sex individuals of
variable size. We created eight subsets: 20, 24, 30, 40, 50, 100, 200
and 400 individuals chosen at random, all with 1:1 sex ratio, and
applied function filter.sex.linked to each. We then identified
the smallest subset of known-sex individuals with which it was still
possible to identify sex-linked loci, and tested whether those loci could be
used to sex the remaining individuals and whether the new sex
assignments could in turn be used to identify all sex-linked loci. For this, we created five
random subsets of known-sex individuals of the smallest size (24 and 30
known-sex individuals for EYR and YTH, respectively; see Results 3),
applied function filter.sex.linked followed by function infer.sex, and used the new sex assignments to re-run filter.sex.linked.
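Drawing random subsets of known-sex individuals with a 1:1 sex ratio can be sketched as below. This is a minimal illustration, assuming sexes are stored in a dictionary keyed by individual identifier with values "F" and "M"; the function name, the coding of sex and the seeding scheme are assumptions, not the procedure used in the study.

```python
import random


def balanced_subset(sex_by_id, size, seed=None):
    """Draw `size` individuals at random, half females and half males."""
    if size % 2 != 0:
        raise ValueError("size must be even for a 1:1 sex ratio")
    half = size // 2
    females = sorted(i for i, s in sex_by_id.items() if s == "F")
    males = sorted(i for i, s in sex_by_id.items() if s == "M")
    if len(females) < half or len(males) < half:
        raise ValueError("not enough known-sex individuals of one sex")
    rng = random.Random(seed)  # a fixed seed makes a subset reproducible
    return rng.sample(females, half) + rng.sample(males, half)


# Hypothetical sample: 30 females and 30 males with made-up identifiers.
sexes = {f"ind{k:03d}": ("F" if k % 2 == 0 else "M") for k in range(60)}

# Five independent subsets of 24 individuals, one seed per replicate.
subsets = [balanced_subset(sexes, 24, seed=rep) for rep in range(5)]
```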