1.1 Function filter.sex.linked
Purpose: Detecting and filtering out sex-linked loci.
Input: One genlight object with at least 30 individuals of
known sex (15 of each sex; see Results section 3), and a user-specified
parameter declaring the sex-determination system of the species (‘zw’ or
‘xy’). Known sex is provided in ‘ind.metrics’ with a column named ‘sex’
and individuals assigned ‘F’ (females) or ‘M’ (males). Individuals with
unknown sex (i.e., assigned anything other than ‘F’ or ‘M’) are ignored
by the function.
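The input rule above can be illustrated with a short Python sketch (not the function's own R code). The helper names `known_sex_counts` and `enough_known_sex` are hypothetical, and the text does not say whether the function itself enforces the 15-per-sex minimum, so the check below is only an assumed pre-check a user might run.

```python
from collections import Counter


def known_sex_counts(sex_labels):
    """Count individuals labelled exactly 'F' or 'M'; any other label is ignored."""
    counts = Counter(label for label in sex_labels if label in ("F", "M"))
    return counts["F"], counts["M"]


def enough_known_sex(sex_labels, min_per_sex=15):
    """Hypothetical pre-check: at least `min_per_sex` known females and males."""
    n_females, n_males = known_sex_counts(sex_labels)
    return n_females >= min_per_sex and n_males >= min_per_sex


# Hypothetical 'sex' column: 16 females, 15 males, 3 individuals of unknown sex.
sex_column = ["F"] * 16 + ["M"] * 15 + ["U", "", "f"]
n_females, n_males = known_sex_counts(sex_column)
ok = enough_known_sex(sex_column)
```

Note that a lowercase 'f' counts as unknown here, following the statement that anything other than 'F' or 'M' is ignored.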
How it works: The rationale behind this function is that the
scoring rate and heterozygosity of autosomal loci should not differ
between the sexes, but they do differ for sex-linked loci. Based on
this, the function works in two phases:
Phase I. Use locus call rate to identify W-linked/Y-linked loci
and other loci with sex-biased call rates. The function counts, for each
locus, the number of known females and the number of known males with NA
(i.e., missing data) and with a called genotype (i.e., ‘0’, ‘1’ or ‘2’).
These four counts are used to build a 2 × 2 contingency table per locus
on which a Fisher’s exact test is performed in order to test for the
independence of call rate and sex (α = 0.05). The logic is that
autosomal loci should present roughly the same call rate for males and
females (Figure 2a, diagonal cloud in gray), and therefore, a locus in
which one sex has significantly more missing data than the other is
likely to be sex-linked. The p-values of all loci are adjusted for False
Discovery Rate with the R function p.adjust (Benjamini & Hochberg,
1995). Of the loci with adjusted p < 0.05, those with a male
call rate ≤ 0.1 are assigned as W-linked in a ZW system (because males lack a W
chromosome; Figure 2a, in yellow), and those with a female call rate
≤ 0.1 are assigned as Y-linked in an XY system (because females lack a Y
chromosome). The remaining loci with
adjusted p < 0.05 are identified as ‘sex-biased’ (Figure 2a,
in blue).
Phase II. Use locus heterozygosity to identify Z-linked/X-linked
loci and gametologs. The function counts, for each locus, the number of
known females and the number of known males that are heterozygous (i.e.,
‘1’), and homozygous (i.e., ‘0’ or ‘2’). In the same way as for
Phase I, these four counts are used to build a 2 × 2 contingency
table per locus and to perform a Fisher’s exact test to test for the
independence of heterozygosity and sex (α = 0.05). Under the logic that
autosomal loci should present no difference in proportion of
heterozygous individuals between sexes (Figure 2b, diagonal cloud in
dark gray), a locus in which one sex has significantly more heterozygous
individuals than the other is likely to be sex-linked. P-values are
adjusted for False Discovery Rate with R function p.adjust
(Benjamini & Hochberg, 1995). Of the loci with adjusted p <
0.05, those whose proportion of heterozygous males is greater than the
proportion of heterozygous females are identified as Z-linked (because
females have only one Z chromosome, and should be mainly scored as
homozygous; Figure 2b, in orange). Conversely, loci whose
proportion of heterozygous females is greater than the proportion of
heterozygous males are identified as gametologs (because females carry
both the Z and the W copy of the locus and tend to be scored as
heterozygous, whereas males have two Z chromosomes, present only the
Z-associated allele, and are scored as homozygous; Figure 2b, in green).
The same logic, with the expectations for the sexes reversed, is applied
to the XY sex-determination system (X-linked: proportion of heterozygous
females > proportion of heterozygous males; gametologs: proportion of
heterozygous males > proportion of heterozygous females).
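Both phases apply the same two statistical steps to per-locus counts: a Fisher's exact test on a 2 × 2 table, followed by a Benjamini-Hochberg adjustment across loci. The Python sketch below (standard library only) illustrates these steps; it is not the function's R code. The names `fisher_exact_two_sided` and `bh_adjust` and the example counts are hypothetical, and the two-sided alternative is an assumption, since the text does not state which alternative is tested.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test p-value for the 2x2 table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    denom = comb(row1 + row2, col1)

    def prob(x):
        # Hypergeometric probability of x in the top-left cell, margins fixed.
        return comb(row1, x) * comb(row2, col1 - x) / denom

    lowest, highest = max(0, col1 - row2), min(row1, col1)
    p_observed = prob(a)
    # Sum over every table that is no more probable than the observed one.
    total = sum(
        prob(x)
        for x in range(lowest, highest + 1)
        if prob(x) <= p_observed * (1 + 1e-7)
    )
    return min(1.0, total)


def bh_adjust(pvalues):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i], reverse=True)
    adjusted = [0.0] * m
    running_min = 1.0
    for position, i in enumerate(order):
        rank = m - position  # rank of this p-value in ascending order
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical Phase I counts per locus, in the order
# (females missing, females called, males missing, males called).
counts = {
    "locus_a": (1, 14, 1, 14),  # similar call rate in both sexes
    "locus_b": (0, 15, 15, 0),  # never called in males
    "locus_c": (2, 13, 9, 6),   # more missing data in males
}
raw = [fisher_exact_two_sided(*table) for table in counts.values()]
adjusted = dict(zip(counts, bh_adjust(raw)))
flagged = [name for name, p in adjusted.items() if p < 0.05]
```

For Phase II the same two helpers would be fed counts of heterozygous and homozygous females and males instead of missing and called genotypes.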
Loci that are not assigned to any category of sex-linkage are inferred
to be autosomal. The function finishes by splitting each category of
loci into its own genlight object.
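The decision rules of the two phases and the final split can be summarised in a short Python sketch. It assumes the adjusted p-values and the per-sex call rates and heterozygosity proportions have already been computed for each locus. The names `classify_locus` and `split_by_category`, the dictionary keys, and the example values are hypothetical. Giving Phase I precedence over Phase II when both tests are significant is also an assumption, as the text does not say how such loci are handled.

```python
from collections import defaultdict

ALPHA = 0.05           # significance level applied to adjusted p-values
MAX_ABSENT_RATE = 0.1  # call rate at or below which a sex is taken to lack the locus


def classify_locus(stats, system="zw"):
    """Assign one locus to a category from its per-sex summary statistics."""
    # Sex with two Z (or X) chromosomes: males under ZW, females under XY.
    two_copy, one_copy = ("m", "f") if system == "zw" else ("f", "m")

    # Phase I: sex-biased call rate.
    if stats["p_call"] < ALPHA:
        if stats["call_" + two_copy] <= MAX_ABSENT_RATE:
            return "w-linked" if system == "zw" else "y-linked"
        return "sex-biased"

    # Phase II: sex-biased heterozygosity.
    if stats["p_het"] < ALPHA:
        if stats["het_" + two_copy] > stats["het_" + one_copy]:
            return "z-linked" if system == "zw" else "x-linked"
        if stats["het_" + one_copy] > stats["het_" + two_copy]:
            return "gametolog"

    return "autosomal"


def split_by_category(loci, system="zw"):
    """Group locus names by category, mirroring the split into separate objects."""
    groups = defaultdict(list)
    for name, stats in loci.items():
        groups[classify_locus(stats, system)].append(name)
    return dict(groups)


# Hypothetical per-locus summaries for a ZW species.
loci = {
    "loc1": dict(p_call=0.90, p_het=0.80, call_f=0.97, call_m=0.95, het_f=0.30, het_m=0.32),
    "loc2": dict(p_call=1e-6, p_het=1.00, call_f=0.93, call_m=0.00, het_f=0.00, het_m=0.00),
    "loc3": dict(p_call=0.01, p_het=0.70, call_f=0.60, call_m=0.95, het_f=0.20, het_m=0.25),
    "loc4": dict(p_call=0.80, p_het=1e-3, call_f=0.95, call_m=0.96, het_f=0.00, het_m=0.40),
    "loc5": dict(p_call=0.80, p_het=1e-4, call_f=0.95, call_m=0.96, het_f=0.90, het_m=0.05),
}
groups = split_by_category(loci, system="zw")
```

Treating the sexes as "two-copy" and "one-copy" lets a single set of rules serve both systems, which matches the statement that the XY case follows the same logic with the sexes reversed.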