Phylogenetic supermatrix assembly
We compiled a species-level supermatrix of genetic data for the
Columbidae plus selected outgroups using sequence data downloaded from
NCBI (https://www.ncbi.nlm.nih.gov/). This used specified exemplar
reference gene sequences drawn from the key Columbiformes phylogenetic
references, with BLASTn+organism NCBI database searches. The emphasis
was on widely sampled loci used in published phylogenetic studies with a
reasonable proportion of taxa and major lineages. The key references are
provided in the Supplementary File, and genes and GenBank accessions
listed in Table S1. Species-level taxonomy followed the International
Ornithology Council World Bird List v. 9.2
(https://www.worldbirdnames.org/updates/). Recent phylogenomic analyses of
the Neoaves (Jarvis et al., 2014; Prum et al., 2015; Reddy et al., 2017) indicated the Pteroclidae and Mesitornithidae as
outgroups to the Columbidae. As gene sequence coverage is patchy for these
taxa, we pooled data for each lineage to create composite family
representatives, along with a composite Cuculiformes taxon as a further
outgroup. Sequence data were aligned with MAFFT v. 7.245 (Katoh &
Standley, 2013) using the local-pair (L-INS-i) algorithm, the alignments
were assembled into a custom Microsoft Excel database, and nomenclature
was rationalized to IOC9.2 (with the help of cross-referencing via Wikipedia
using common names). Gene trees were inferred with IQ-TREE (Nguyen et al., 2015) as ultrafast bootstrap consensus trees (Hoang et al.,
2018), using models of sequence evolution identified by ModelFinder as
implemented in IQ-TREE (Kalyaanamoorthy et al., 2017). These trees
were then scanned for non-monophyletic genera and species (using the
custom script GTREER5), and the database was updated by excluding aberrant
accessions or, in some cases, revising nomenclature. Where necessary,
sequence sets were then realigned (as above).
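The scan for non-monophyletic genera can be illustrated with a minimal Python sketch. This is not the GTREER5 script, whose internals are not described here. It assumes a rooted tree supplied as nested tuples, tip labels of the hypothetical form Genus_species, and a toy topology with placeholder names rather than any real result. A genus is flagged when no single clade contains exactly its sampled tips.

```python
from collections import defaultdict


def tips(node):
    """Return the set of tip labels under a node (a str tip or a tuple of children)."""
    if isinstance(node, str):
        return {node}
    found = set()
    for child in node:
        found |= tips(child)
    return found


def clades(node):
    """Yield the tip set of every clade in a rooted tree, single tips included."""
    yield frozenset(tips(node))
    if not isinstance(node, str):
        for child in node:
            yield from clades(child)


def non_monophyletic_genera(tree):
    """List genera whose sampled tips do not form exactly one clade."""
    all_clades = set(clades(tree))
    by_genus = defaultdict(set)
    for tip in tips(tree):
        # Assumption: the genus is the part of the label before the first underscore.
        by_genus[tip.split("_")[0]].add(tip)
    return sorted(
        genus for genus, members in by_genus.items()
        if frozenset(members) not in all_clades
    )


# Toy rooted topology with placeholder names (hypothetical, for illustration only).
toy_tree = (
    (("GenusA_sp1", "GenusA_sp2"), ("GenusB_sp1", "GenusA_sp3")),
    ("GenusC_sp1", "GenusC_sp2"),
)
flagged = non_monophyletic_genera(toy_tree)
print(flagged)
```

In this toy tree GenusA is flagged because GenusA_sp3 groups with GenusB_sp1. In practice a flagged genus still needs manual inspection to decide whether the cause is a misidentified accession, outdated nomenclature, or genuine paraphyly.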
Some long genes are routinely sequenced in fragments (e.g. RAG-1, COI),
so to maximize data for the COI gene we also used a consensus
method in which the alignment was reduced to a single consensus
sequence per species, based on the most common base per site (with ties
scored as ambiguous). This in effect picks the most commonly sequenced
sub-lineages, and is a simple way to combine data and discount aberrant
sequences (wrong loci, taxa, etc.). These consensus alignments were then
subjected to the same procedure of gene tree inference and genus monophyly scans
as above. We also used mitogenome data as follows. As gene order is
preserved across the relevant taxa, we first aligned the entire
mitogenomes, then excised the set of commonly used genes and added the
sequences to their respective alignments. The remainder (referred to as
mtg-block) was kept as a separate alignment, after deleting the
non-coding D-loop region. The concatenated supermatrix then
used the best (longest accepted) single exemplar sequence per gene per
species. These gene alignments were then compacted by removing regions
with little or no data (<10% of taxa per gene) and ambiguous
alignment regions (via GTREER5 and ALISCORE v2.2; Misof & Misof, 2009).
Two versions of the supermatrix, with and without the mtg-block, were
analysed, the latter to avoid distorting the result through biased
mitogenome sampling (missing from several key groups) and nucleotide
saturation effects on relative divergence (especially for deeper
outgroup lineages). Final analyses used the supermatrix without the
mtg-block (as the six mitochondrial genes provide enough well-sampled
mtDNA, and empirically the results were very similar). This final supermatrix comprised
247 out of 344 recognised pigeon species (72% complete), including
sections of four nuclear and six mitochondrial gene loci, amounting to
11,100 sites (39% data-complete) and 1,125,420 defined bases in 1,262
sequences (including 63 COI consensus sequences) from 1,527 accessions, out of a
total database of 3,639 accessions (with 37 rejected). Of 49 IOC9.2
Columbidae genera, only three (all monotypic) were missing
(Starnoenas, Microgoura and Cryptophaps).
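Two of the steps described above, the per-species consensus (most common base per site, ties scored as ambiguous) and the removal of columns with data for fewer than 10% of taxa, can be sketched in Python as follows. This is an illustrative sketch only, not the procedure implemented in GTREER5 or ALISCORE. The function names, the use of N for tied sites, and the treatment of "-", "?" and N as missing data are assumptions made for the example, and all input sequences are assumed to be aligned to equal length.

```python
from collections import Counter


def consensus(seqs, missing="-?N"):
    """Majority-rule consensus of aligned sequences from one species.

    Ties for the most common base are scored as ambiguous (N, an assumption);
    columns with no data in any sequence are kept as a gap.
    """
    out = []
    for column in zip(*seqs):
        counts = Counter(b.upper() for b in column if b.upper() not in missing)
        if not counts:
            out.append("-")
            continue
        ranked = counts.most_common()
        top_count = ranked[0][1]
        tied = sum(1 for _, n in ranked if n == top_count) > 1
        out.append("N" if tied else ranked[0][0])
    return "".join(out)


def drop_sparse_columns(alignment, min_fraction=0.10, missing="-?"):
    """Keep only columns with data in at least min_fraction of the taxa.

    alignment maps taxon name to aligned sequence.
    """
    names = list(alignment)
    n_taxa = len(names)
    columns = zip(*(alignment[name] for name in names))
    keep = [
        i for i, col in enumerate(columns)
        if sum(b not in missing for b in col) / n_taxa >= min_fraction
    ]
    return {name: "".join(alignment[name][i] for i in keep) for name in names}


# Hypothetical fragments of one gene from a single species.
species_consensus = consensus(["ACGT-", "ACGA-", "ACTA-"])

# Hypothetical three-taxon alignment; a high threshold is used so the toy
# example actually drops columns.
compacted = drop_sparse_columns(
    {"taxon1": "AC-", "taxon2": "A--", "taxon3": "A--"}, min_fraction=0.5
)
```

The default threshold of 0.10 mirrors the <10% criterion in the text. Because the consensus takes the majority base at each site independently, a single aberrant accession is outvoted wherever at least two concordant sequences cover the same site.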