Using data to enable conservation planning
Most forms of biodiversity data can be used for environmental management and conservation planning, but tailoring the analysis to the available data is critical. Target development can follow two major approaches: mapping species distributions or, if data of sufficient resolution, quality and quantity are not available, mapping diversity patterns and reconciling richness patterns in the face of bias. If sufficient data are available, either models or filtered convex polygons can be used to map species ranges. When data are insufficient and richness is mapped instead (Liu et al., 2022; Orr et al., 2021; Potapov et al., 2023), inventories of species richness are used and richness itself is projected across space using models. These approaches can reproduce richness patterns to a reasonable degree, provided that sufficient inventories have been carried out across all major environmental conditions, and assuming that biogeographic differences do not influence overall richness patterns. They are useful for groups where insufficient data are available for higher-resolution analysis, and can also be used to identify areas for further research where there is potential for high hidden diversity (Orr et al., 2021; Kass et al., 2022).
For large spatial or taxonomic scopes, where species-level analysis is impossible (as in the majority of global analyses), interpolation-based methods are likely to be the most appropriate. Such studies should employ either a subsampling approach (Qiao et al., 2023) or interpolation based on community-level inventories, which models overall richness rather than individual species ranges (Orr et al., 2021; Liu et al., 2022; Potapov et al., 2023). Subsampling still has a minimum data requirement, and most areas lack the data to meet it. Thus, for well-sampled taxa such as birds it can be applied fairly widely (e.g. in almost all urban areas), but sampling is so sparse for most taxa that an index-based approach may not be possible without subsequent interpolation. Approaches based on biodiversity indices (Hill numbers, Shannon, Simpson, etc.) all require both a minimum number of samples and a minimum coverage (Qiao et al., 2023). Species-area curves are a common way to estimate completeness for a given region, yet they assume representative coverage throughout that region: a localised inventory covering a small proportion of the region may reach an asymptote even when it is not representative of the whole area to which the assessment is applied. Thus, for such curves to be useful, the percentage of the area with data must first be assessed.
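As an illustration of the dual requirement described above, the following minimal Python sketch computes Hill numbers from abundance counts and returns an index only when both a minimum sample size and a minimum coverage are met. Coverage is approximated here with Good's estimator (one minus the proportion of individuals that are singletons), which is an assumption of this sketch rather than the method of Qiao et al. (2023). The function names and the thresholds `min_individuals` and `min_coverage` are hypothetical placeholders.

```python
import math


def hill_number(counts, q):
    """Hill number (effective number of species) of order q from abundance counts."""
    total = sum(counts)
    props = [c / total for c in counts if c > 0]
    if q == 1:
        # Limit as q approaches 1: the exponential of Shannon entropy.
        return math.exp(-sum(p * math.log(p) for p in props))
    # q = 0 gives species richness, q = 2 the inverse Simpson index.
    return sum(p ** q for p in props) ** (1.0 / (1.0 - q))


def good_coverage(counts):
    """Good's coverage estimate: 1 - (number of singletons / total individuals)."""
    total = sum(counts)
    singletons = sum(1 for c in counts if c == 1)
    return 1.0 - singletons / total


def index_if_adequate(counts, q=1, min_individuals=50, min_coverage=0.8):
    """Return the Hill number only if both thresholds are met, otherwise None.

    The default thresholds are illustrative assumptions, not published values.
    """
    counts = [c for c in counts if c > 0]
    if sum(counts) < min_individuals or good_coverage(counts) < min_coverage:
        return None
    return hill_number(counts, q)
```

For example, `index_if_adequate([20, 20, 20, 20])` yields an effective diversity of about four species, whereas `index_if_adequate([1, 1, 1, 1])` returns `None` because the sample is too small to support an index.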
For many taxa, including most invertebrates, interpolation based on modelling is needed. Such methods interpolate richness from community-level samples, using species modelling techniques to relate richness data to the environmental conditions present. Inevitably, this method also involves assumptions about the representativeness of the data. For a community projection approach, a minimum sample size and species number should be set (to guard against selective sampling, or the overrepresentation of generalists to the neglect of specialists), and all biome types should be represented so that the richness (or richness index) of these varying biomes and conditions can be assayed. However, such an approach assumes that the drivers of richness do not vary biogeographically between regions, and consequently it cannot be applied to oceanic islands, as such models cannot inherently incorporate biogeographic processes or dispersal. For interpolation approaches, the number of records per species, and even to a degree the accuracy of identification within sites, is less important (provided identification is consistent within a site). Provided there is coverage across environmental conditions, these approaches offer a powerful mechanism for global analysis, even in poorly known regions.
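The screening step implied here can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions: each inventory is a dictionary with the hypothetical keys `n_individuals`, `n_species` and `biome`, and the default thresholds are placeholders rather than recommended values.

```python
def screen_inventories(inventories, min_individuals=100, min_species=5):
    """Keep only inventories that meet both minimum thresholds.

    Thresholds are illustrative assumptions and should be set per taxon.
    """
    return [
        inv for inv in inventories
        if inv["n_individuals"] >= min_individuals
        and inv["n_species"] >= min_species
    ]


def missing_biomes(inventories, all_biomes):
    """List biome types that have no retained inventory."""
    sampled = {inv["biome"] for inv in inventories}
    return sorted(set(all_biomes) - sampled)
```

Running `missing_biomes` on the screened inventories before fitting any richness model gives a quick check of whether every biome type in the study region is represented.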
For species-specific approaches, both the volume and accuracy of the data must be substantially higher, as these approaches are much more vulnerable to spatial bias and sensitive to data errors, with even greater consequences for poorly known species. Firstly, data must be clean and accurate for any species-level assessment, so checking records and filtering out bad ones is a critical first step (see Box 1). The next question is whether the data are sufficiently representative for species-level analysis, in terms of both the taxa and the region under analysis; any form of species-level assessment also requires sufficient data to assess range. When examined critically, public data sources alone are insufficient for modelling most species (Garcia-Rosello et al., 2023), even across vertebrates, so some of the most diverse regions might be underestimated.
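A minimal Python sketch of such a cleaning pass is given below. It applies only generic checks that are widely used for occurrence records (missing coordinates, coordinates outside valid bounds, the 0,0 placeholder, and duplicate records per species); it is not a reproduction of the checks listed in Box 1. The record keys `species`, `lat` and `lon` and the rounding precision used for duplicate detection are assumptions of this sketch.

```python
def clean_records(records, precision=4):
    """Drop records with missing, invalid, placeholder or duplicated coordinates.

    Each record is assumed to be a dict with 'species', 'lat' and 'lon' keys.
    """
    seen = set()
    cleaned = []
    for rec in records:
        lat, lon = rec.get("lat"), rec.get("lon")
        if lat is None or lon is None:
            continue  # missing coordinates
        if not (-90 <= lat <= 90 and -180 <= lon <= 180):
            continue  # outside valid latitude/longitude bounds
        if lat == 0 and lon == 0:
            continue  # 0,0 is a common placeholder for unknown locations
        key = (rec["species"], round(lat, precision), round(lon, precision))
        if key in seen:
            continue  # duplicate record for the same species and location
        seen.add(key)
        cleaned.append(rec)
    return cleaned
```

Keeping such checks in a single function makes the cleaning step repeatable across species, and further checks (for example, country centroids or institution locations) could be added in the same pattern.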
Sophisticated models can be developed for well-sampled individual species using approaches such as Maxent or other species niche modelling methods (though many of these will map all relevant habitat, lacking any geospatial reference point to differentiate fundamental and realised niches). However, such models have very high data requirements, as sufficient and evenly distributed data must exist from across a species’ range to model its distribution effectively and pair it with environmental characteristics. This means that, unless considerable effort is devoted to collating representative global data with many partners, or taxa are already well sampled, sophisticated models may not be representative or appropriate. Assessing these models using statistical approaches (AUC, Boyce index, AIC) alone is also not sufficient: work with experts is likely needed to assess whether modelled ranges reliably capture species ranges and recognise biogeographic boundaries (which may be missed in models, especially in complex areas or where there are major differences between the fundamental and realised niche). Minimum convex polygons (MCPs) may also be used when data are scarce or analysis is regional, but understanding how to curate data is an essential first step before mapping species ranges (Zizka et al., 2019; Ribeiro et al., 2022; Dorey et al., in review).
Filtering for success
For basic analysis of large numbers of species, automated and repeatable pipelines are critical. Creating an MCP is one method to delimit the majority of a species’ known range. However, for vertebrates it has been known for centuries that species have finer-grained habitat requirements, and even in IUCN maps the need to refine habitat within the range polygon is becoming a basic standard (Lack, 1953; Brooks et al., 2019). Occurrence points may completely surround cities or other unsuitable regions in which the species is no longer present. Failure to remove clearly unsuitable habitat would dramatically increase range size and could reduce the proportion of range protected (as much of a city is developed). Coastal filters are also needed, as failure to trim MCPs realistically may render oceans suitable for land animals.
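To make the starting point concrete, the sketch below builds an unfiltered MCP from occurrence points as a convex hull (Andrew's monotone chain algorithm) and measures its area with the shoelace formula. It is a minimal illustration that treats longitude and latitude as planar coordinates, so the area is in squared degrees; a real analysis would first project points to an equal-area coordinate system. The function names are hypothetical.

```python
def _cross(o, a, b):
    """Cross product of vectors OA and OB; positive for a counter-clockwise turn."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])


def minimum_convex_polygon(points):
    """Return the convex hull vertices (counter-clockwise) of (x, y) points."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts
    lower = []
    for p in pts:
        while len(lower) >= 2 and _cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    upper = []
    for p in reversed(pts):
        while len(upper) >= 2 and _cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    # The last point of each chain is the first point of the other chain.
    return lower[:-1] + upper[:-1]


def polygon_area(vertices):
    """Planar polygon area from ordered vertices (shoelace formula)."""
    n = len(vertices)
    twice_area = 0.0
    for i in range(n):
        x1, y1 = vertices[i]
        x2, y2 = vertices[(i + 1) % n]
        twice_area += x1 * y2 - x2 * y1
    return abs(twice_area) / 2.0
```

The hull spans everything between the outermost records, including any cities or ocean that fall inside it, which is why the habitat and coastal filters discussed above are applied afterwards.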
Sensible filters can transform species ranges and entirely rearrange diversity patterns. To demonstrate how decisions on data refinement and cleaning affect range sizes and the degree of protection, we selected a range of species and imposed different levels of filtering on the data, all of which can be applied to small datasets or when only small volumes of data are available for some species. The filters include adding spatial filters, adding a habitat filter and trimming by coastline, and the resulting ranges are compared with known IUCN ranges for the species. It should be noted that most IUCN ranges are also inaccurate and overinflate species ranges (Li et al., 2019; Hughes et al., 2021c), yet uncritically drawn MCPs are many times larger still (whilst still missing parts of the range, as they will not capture species range limits, where abundance is typically lower). For example, an IUCN range is only 7-8% of the size of the range recovered using basic MCPs for the species shown here (as in Chowdhary et al., 2023a). If these ranges are being mapped to assess hotspots for protection, or the degree of protection, then the area covered and its location will entirely determine the outcomes of the assessment; if care is not taken to filter data appropriately, analysis of such data may bear little relationship to the real patterns of distribution or degree of protection of species.
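The effect of successive filters on range size and on the percentage of range protected can be sketched on a simple grid, with each range represented as a set of cells. The example below uses entirely synthetic data (a 10 by 10 cell MCP, a land mask, a habitat mask and a block of protected cells) chosen only to show the mechanics; the numbers bear no relation to the species analysed here or to Table 1, and all names are hypothetical.

```python
def cells(x_range, y_range):
    """Set of (x, y) grid cells covering the given ranges."""
    return {(x, y) for x in x_range for y in y_range}


def apply_filters(range_cells, *masks):
    """Intersect a range with each mask in turn (e.g. land, suitable habitat)."""
    filtered = set(range_cells)
    for mask in masks:
        filtered &= mask
    return filtered


def percent_protected(range_cells, protected_cells):
    """Percentage of range cells that fall inside protected cells."""
    if not range_cells:
        return 0.0
    return 100.0 * len(range_cells & protected_cells) / len(range_cells)


# Synthetic example data (assumptions for illustration only).
mcp = cells(range(10), range(10))       # unfiltered MCP: 100 cells
land = cells(range(8), range(10))       # coastal filter keeps 80 cells
habitat = cells(range(8), range(5))     # habitat filter keeps 40 cells
protected = cells(range(2), range(5))   # 10 protected cells within habitat

summary = {}
for label, masks in [
    ("MCP", ()),
    ("MCP + Coast", (land,)),
    ("MCP + Coast + Habitat", (land, habitat)),
]:
    filtered = apply_filters(mcp, *masks)
    summary[label] = (len(filtered), percent_protected(filtered, protected))
```

In this toy case the range shrinks from 100 to 40 cells as filters are added, while the percentage protected rises from 10% to 25%, because the same protected cells are divided by a smaller and more realistic range.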
Even though more carefully delineated ranges (IUCN, BirdLife, GARD: http://www.gardinitiative.org/) are likely to overestimate the degree of protection, their area is still smaller than that of an MCP, especially if a habitat filter is not applied (Table 1). We used a general habitat filter, so more specialist filters and the other steps outlined throughout could greatly improve range estimates and make them more similar to those in expert range maps (de Barros et al., 2021; Huang et al., 2020; Xu et al., 2022). In all cases, the lack of filtering means ranges are projected as many times larger than they are likely to be. Thus, as we show here, the cleaning of data can transform where species are mapped, richness patterns, and the efficacy of protection. We selected a range of species for which sufficient data exist to map ranges, and for which the IUCN has mapped ranges for comparison (thus most of our examples are mammals, though one bee, Bombus dahlbomii, is also present). Our previous work has also examined the prevalence of biases in these types of data, and how they persist across taxa (Hughes et al., 2021b, 2021c; Li et al., 2019).
Table 1. Percentage of species range protected with different filters applied to species minimum convex polygons (MCPs), as well as for International Union for Conservation of Nature (IUCN) ranges. The filters applied are noted in the column headers: Hem, hemisphere filter; Coast, removal of ocean areas within the polygon; Habitat, a simple habitat filter based on basic classifications of land-use types. We used the species Ailuropoda melanoleuca (Carnivora: Ursidae), Bombus dahlbomii (Hymenoptera: Apidae), Panthera onca and Panthera tigris (Carnivora: Felidae), Priodontes maximus (Cingulata: Chlamyphoridae), Tapirus pinchaque and Tapirus terrestris (Perissodactyla: Tapiridae).