ABSTRACT
Calculating spatial ranges of species and individuals is a crucial problem throughout ecology. However, sample size biases can be strong, and defining range boundaries can be difficult. These hurdles can be overcome by calculating areas without calculating boundaries. The first step is to algorithmically define a graph that connects the spatial points where observations have been made. The routine generates a small number of short edges that form a pattern resembling a mosaic. The edge lengths are summed, and the sum is squared, divided by the edge count, and multiplied by a known constant to obtain a total area estimate for the shape. This non-parametric mosaic area method can work with irregular outlines and clumped point distributions. According to simulation analyses, it is more accurate than convex hull, kernel density, and hypervolume estimation. Mosaic area calculations can be used in fields ranging from conservation biology to morphometrics.
INTRODUCTION
One of the most fundamental problems in theoretical ecology is estimating the extent of a shape in two-dimensional space from point data. Two categories of data are relevant: occurrences of species and of individuals.
Species ranges are important at large scales because geographic range patterns are a bedrock of biogeography and macroecology, telling us about such things as provincialism (Kreft & Jetz 2010) and latitudinal diversity gradients (Lawrence & Fraser 2020). Estimating ranges based on expert opinion, species distribution modelling, or otherwise is of great importance in conservation biology (Maréchaux et al. 2017).
At the scale of individuals, home ranges have been studied intensively by wildlife biologists for decades (Burt 1943). The availability of large data sets derived from GPS technology calls the value of the concept into question (Kie et al. 2010), but interspecific comparisons of home range data are of such broad interest that this information remains relevant. For example, the allometry of home range size is a classical topic in macroecology (Kelt 2001).
Shape areas also come up in the field of niche modelling, which addresses high-dimensional spaces in addition to two-dimensional spaces (Blonder et al. 2014; Junker et al. 2016; Qiao et al. 2016). Additionally, the field of multivariate morphometrics is relevant: estimating the area of occupancy of a morphospace by points representing species or individuals is fundamentally the same problem. It has often been tackled in the past by computing statistics that are not explicitly spatial, such as mean pairwise distances (Foote 1991), because high-dimensional spaces are often considered. However, the connection is clear.
The full list of subjects that rely on area estimation is presumably much larger. Given the breadth and depth of interest in the topic, it comes as no surprise that a plethora of methods has been proposed. The simplest is to grid observations and count occupied squares. Gridded data have been used extensively and for many years in macroecology (Simpson 1964). Under the name “area of occupancy”, they are still used for threat status evaluation by the International Union for Conservation of Nature (IUCN; IUCN Standards and Petitions Committee 2019). This approach is not without merit, because occupancy can be used to estimate population size (He 2012). However, the values are scale-dependent, and gridding will underestimate area if sampling is sparse relative to the scale of interest (Hartley & Kunin 2003).
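The grid-counting approach and its scale dependence can be illustrated with a minimal sketch. It assumes planar (projected) coordinates; the function name `occupied_cell_area` and the example points are hypothetical and are not taken from the IUCN guidelines or from any package.

```python
import math


def occupied_cell_area(points, cell_size):
    """Total area of the square grid cells that hold at least one point.

    points    -- iterable of (x, y) pairs in planar coordinates (assumption)
    cell_size -- side length of a grid cell, in the same units as the points
    """
    if cell_size <= 0:
        raise ValueError("cell_size must be positive")
    # Map every point to the integer index of the cell that contains it;
    # the set keeps each occupied cell only once.
    cells = {
        (math.floor(x / cell_size), math.floor(y / cell_size))
        for x, y in points
    }
    return len(cells) * cell_size ** 2


# Hypothetical observations, used only for illustration.
points = [(0.2, 0.3), (0.8, 0.1), (1.5, 0.4), (3.7, 2.2)]
print(occupied_cell_area(points, 1.0))  # three occupied cells of area 1
print(occupied_cell_area(points, 2.0))  # two occupied cells of area 4
```

The same four points give different areas at the two cell sizes, which is the scale dependence noted above.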
Another simple alternative is to compute a convex hull around the observations, i.e., to create a minimum convex polygon, which was a popular approach in wildlife biology for many years (Hayne 1949). Convex hulls also tend to underestimate, although they will overestimate if there are holes in distributions or if there are large outliers. Despite these biases, the IUCN continues to use this method for determining the “extent of occurrence” of a species, a second major criterion for threat status evaluation (IUCN Standards and Petitions Committee 2019). Indeed, both approaches are still considered to be central by conservation biology researchers, not just the IUCN (Smith et al. 2020).
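A minimum convex polygon area can be sketched as follows, using the standard monotone chain hull construction and the shoelace formula. The helper names and the L-shaped example are hypothetical, and planar coordinates are assumed.

```python
def cross(o, a, b):
    """Z-component of the cross product of vectors o->a and o->b."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])


def convex_hull(points):
    """Hull vertices in counter-clockwise order (monotone chain algorithm)."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts
    lower = []
    for p in pts:
        # Drop the last vertex while it would create a non-left turn.
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    upper = []
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    # The last point of each chain is the first point of the other chain.
    return lower[:-1] + upper[:-1]


def polygon_area(vertices):
    """Area of a simple polygon by the shoelace formula."""
    n = len(vertices)
    twice_area = sum(
        vertices[i][0] * vertices[(i + 1) % n][1]
        - vertices[(i + 1) % n][0] * vertices[i][1]
        for i in range(n)
    )
    return abs(twice_area) / 2.0


# Hypothetical points on the corners of an L-shaped region of area 3.
l_shape = [(0, 0), (2, 0), (2, 1), (1, 1), (1, 2), (0, 2)]
hull = convex_hull(l_shape)
print(hull)
print(polygon_area(hull))  # larger than 3: the hull covers the notch
```

Because the hull must be convex, it fills in the concave notch of the L and returns an area larger than that of the region itself, which illustrates the overestimation case mentioned above.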
Nonetheless, field-based ecologists are strongly cognizant of bias in convex hull areas, so alternatives such as kernel density estimation have long been commonplace in that area (Worton 1989). The IUCN guidelines mention this approach only in passing (IUCN Standards and Petitions Committee 2019). A hybrid method called local convex hull nonparametric kernel estimation is also used by wildlife biologists, but its performance has been questioned (Lichti & Swihart 2011).
There are many methods other than kernel density estimation, some quite sophisticated. Recently, for example, computation of hypervolumes (Blonder et al. 2014) has become popular with niche modellers. This method assumes the data are bivariate normal or elliptical in their distribution, which is problematic and has been critiqued (Qiao et al. 2016), and which some researchers have tried to address (Jarvis et al. 2019). However, the method’s popularity earns it serious attention. Meanwhile, palaeobiologists have used other methods such as computing maximum great circle distances. This approach makes sense when the data follow a linear trend (Foote et al. 2008), but it has the drawback of putting aside most of the data points.
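The maximum great circle distance can be sketched with the haversine formula. This is an illustration only: the function names, the spherical Earth with a mean radius of 6371.0088 km, and the example coordinates are assumptions, not details taken from the studies cited above.

```python
import math
from itertools import combinations

EARTH_RADIUS_KM = 6371.0088  # mean Earth radius; spherical Earth assumed


def great_circle_km(p, q):
    """Haversine distance in km between two (latitude, longitude) pairs
    given in degrees."""
    lat1, lon1 = map(math.radians, p)
    lat2, lon2 = map(math.radians, q)
    a = (
        math.sin((lat2 - lat1) / 2) ** 2
        + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2
    )
    # Guard against rounding pushing the square root slightly above 1.
    return 2 * EARTH_RADIUS_KM * math.asin(min(1.0, math.sqrt(a)))


def max_great_circle_km(points):
    """Largest pairwise great circle distance; 0.0 for fewer than 2 points."""
    return max(
        (great_circle_km(p, q) for p, q in combinations(points, 2)),
        default=0.0,
    )


# Hypothetical occurrences as (latitude, longitude) in degrees.
occurrences = [(0.0, 0.0), (0.0, 30.0), (0.0, 90.0), (5.0, 45.0)]
print(max_great_circle_km(occurrences))
```

Only the two most distant points determine the result, so moving any of the other occurrences within that span leaves the statistic unchanged. This is the sense in which the method puts aside most of the data.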
In any event, many existing approaches have three major flaws that the new method proposed here addresses. First, they can systematically underestimate or overestimate, depending on their properties. Consistent accuracy is a rare property. Second, they may not be particularly accurate when the data points form clumped or irregular patterns. Finally, methods that depend on a series of flexible options and parameters yield results that are indecisive and therefore not very interpretable.
As I will explain, all three problems can be solved by creating a network of points that resembles a mosaic and using the edge lengths to obtain an area estimate. This method, which has been implemented in an R package called mosaic, has a variety of additional applications. For example, areas of overlap between ranges are directly computable, areas of multi-dimensional shapes can be approximated, and the method allows for identifying outliers by breaking long edges.
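The arithmetic that turns edge lengths into an area, as summarised in the abstract, can be sketched as follows. This is not the mosaic package's implementation, and it is written in Python rather than R. It assumes that an edge list is already available, because the graph-building routine is not described at this point, and it leaves the scaling constant as a required parameter because its value is not stated here. All names are hypothetical.

```python
import math


def edge_length(p, q):
    """Euclidean distance between two planar points."""
    return math.hypot(q[0] - p[0], q[1] - p[1])


def area_from_edges(points, edges, constant):
    """Area statistic: constant * (sum of edge lengths)**2 / edge count.

    points   -- list of (x, y) pairs
    edges    -- list of (i, j) index pairs into points (assumed given)
    constant -- scaling constant; its value is NOT given in this section,
                so the caller must supply it (placeholder parameter)
    """
    if not edges:
        raise ValueError("at least one edge is required")
    total = sum(edge_length(points[i], points[j]) for i, j in edges)
    return constant * total ** 2 / len(edges)


# Hypothetical input: a 3 x 3 unit lattice joined to its nearest neighbours.
side = 3
points = [(x, y) for y in range(side) for x in range(side)]
edges = []
for y in range(side):
    for x in range(side):
        i = y * side + x
        if x + 1 < side:
            edges.append((i, i + 1))      # horizontal neighbour
        if y + 1 < side:
            edges.append((i, i + side))   # vertical neighbour

# constant=1.0 returns the unscaled statistic, not an area estimate.
print(area_from_edges(points, edges, constant=1.0))
```

Passing `constant=1.0` returns only the unscaled quantity. A real area estimate requires the constant used by the method, which this sketch deliberately does not guess.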
Ecologists have used graph theory in the past, but only when working on selected topics such as landscape analysis (Foltête et al. 2020). The method outlined here is unrelated to any of this work. For example, existing methods that concern area estimation are founded on entirely different theory (Keith, Spring & Kompas 2019).
Before detailing the new approach, it is important to mention what this paper is and is not about. First, the goal is to estimate range area, not range shape. However, mosaic patterns are more intuitive approximations of range shapes than are convex hulls because they need not be convex. More importantly, range area per se is of central concern to biogeographers, macroecologists, allometricians, niche modellers, and even the IUCN. Second, this is not a comparative benchmarking analysis. The only goal is to show that the method performs well, not to definitively prove that it outperforms every proposed alternative. Thus, comparisons will be limited to three things that are of general interest: convex hulls, kernel density estimators, and hypervolumes. Finally, many readers will have come to expect that every paper on range estimation methods will be graced with many equations and framed in terms of complex and most often parametric process models. This is not one of them. Instead, I will argue that a simple method should be taken seriously because it makes sense and it works.
MATERIAL AND METHODS