Abstract
Calculating spatial ranges of species and individuals is a crucial
problem throughout ecology. However, sample size biases can be strong,
and defining range boundaries can be difficult. These hurdles can be
overcome by calculating areas without calculating boundaries. The first
step is to algorithmically define a graph that connects the spatial
points where observations have been made. The routine generates a small
number of short edges that form a pattern resembling a mosaic. The edge
lengths are summed, squared, divided by the edge count, and multiplied
by a known constant to obtain a total area estimate for the shape. This
non-parametric mosaic area method can work with irregular outlines and
clumped point distributions. It is more accurate than convex hull,
kernel density, and hypervolume estimation according to simulation
analyses. Mosaic area calculations can be used in fields ranging from
conservation biology to morphometrics.
INTRODUCTION
One of the most fundamental problems in theoretical ecology is
estimating the extent of a shape in two-dimensional space from point
data. Two categories of data are relevant: occurrences of species and of
individuals.
Species ranges are important at large scales because geographic range
patterns are a bedrock of biogeography and macroecology, telling us
about such things as provincialism (Kreft & Jetz 2010) and latitudinal
diversity gradients (Lawrence & Fraser 2020). Estimating ranges based
on expert opinion, species distribution modelling, or otherwise is of
great importance in conservation biology (Maréchaux et al. 2017).
At the scale of individuals, home ranges have been studied intensively
by wildlife biologists for decades (Burt 1943). The availability of
large data sets derived from GPS technology calls the value of the
concept into question (Kie et al. 2010), but interspecific
comparisons of home range data are of such broad interest that this
information remains relevant. For example, the allometry of home range
size is a classical topic in macroecology (Kelt 2001).
Shape areas also come up in the field of niche modelling, which
addresses high-dimensional spaces in addition to two-dimensional spaces
(Blonder et al. 2014; Junker et al. 2016; Qiao et
al. 2016). Additionally, the field of multivariate morphometrics is
relevant: estimating the area of occupancy of a morphospace by points
representing species or individuals is fundamentally the same problem.
It has often been tackled in the past by computing statistics that are
not explicitly spatial, such as mean pairwise distances (Foote 1991),
because high-dimensional spaces are often considered. However, the
connection is clear.
The full list of subjects that rely on area estimation is presumably
much larger. Given the breadth and depth of interest in the topic, it
comes as no surprise that a plethora of methods has been proposed. The
simplest is to grid observations and count occupied squares. Gridded
data have been used extensively and for many years in macroecology
(Simpson 1964). Under the name “area of occupancy”, they are still used
for threat status evaluation (IUCN Standards and Petitions Committee
2019) by the International Union for Conservation of Nature (IUCN).
This approach is not without merits, because occupancy can be used to
estimate population size (He 2012). However, the values are
scale-dependent, and gridding will underestimate area if sampling is sparse
relative to the scale of interest (Hartley & Kunin 2003).
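The scale dependence of gridded counts can be illustrated with a minimal sketch. The function name, the coordinates, and the cell sizes below are hypothetical and chosen only for illustration; they are not taken from any of the cited studies.

```python
from math import floor


def occupied_cell_area(points, cell_size):
    """Return (occupied cell count, occupied area) for a square grid.

    `points` is an iterable of (x, y) pairs and `cell_size` is the side
    length of a grid cell in the same units. Both are illustrative inputs.
    """
    if cell_size <= 0:
        raise ValueError("cell_size must be positive")
    # Map each observation to the integer index of the cell containing it.
    cells = {(floor(x / cell_size), floor(y / cell_size)) for x, y in points}
    return len(cells), len(cells) * cell_size ** 2


# Made-up observations: the same points give different areas at each scale.
observations = [(0.1, 0.1), (0.2, 0.3), (1.5, 0.2), (3.7, 3.9)]
for size in (1.0, 2.0, 4.0):
    n_cells, area = occupied_cell_area(observations, size)
    print(f"cell size {size}: {n_cells} occupied cells, area {area}")
```

With these four points the occupied area grows from 3 to 8 to 16 square units as the cell size increases from 1 to 2 to 4, even though the observations are unchanged.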
Another simple alternative is to compute a convex hull around the
observations, i.e., to create a minimum convex polygon, which was a
popular approach in wildlife biology for many years (Hayne 1949). Convex
hulls also tend to underestimate, although they will overestimate if
there are holes in distributions or if there are large outliers.
Nevertheless, the IUCN continues to use this method for determining the
“extent of occurrence” of a species, a second major criterion for threat
status evaluation (IUCN Standards and Petitions Committee 2019). Indeed,
both approaches are still considered to be central by conservation
biology researchers, not just the IUCN (Smith et al. 2020).
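A minimum convex polygon area can be sketched in a few lines. The version below uses Andrew's monotone chain algorithm and the shoelace formula, which are standard choices rather than the procedure of any cited study, and the L-shaped example outline is invented to show how a non-convex shape leads to overestimation.

```python
def convex_hull(points):
    """Andrew's monotone chain; returns hull vertices counter-clockwise."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts

    def cross(o, a, b):
        # Positive when the turn o -> a -> b is counter-clockwise.
        return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

    lower, upper = [], []
    for p in pts:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    # The last point of each chain is the first point of the other.
    return lower[:-1] + upper[:-1]


def polygon_area(vertices):
    """Shoelace formula for the area of a simple polygon."""
    n = len(vertices)
    if n < 3:
        return 0.0
    twice_area = sum(
        vertices[i][0] * vertices[(i + 1) % n][1]
        - vertices[(i + 1) % n][0] * vertices[i][1]
        for i in range(n)
    )
    return abs(twice_area) / 2.0


# Corners of an invented L-shaped outline whose true area is 3 square units.
l_shape = [(0, 0), (2, 0), (2, 1), (1, 1), (1, 2), (0, 2)]
print(polygon_area(l_shape))               # area of the outline itself
print(polygon_area(convex_hull(l_shape)))  # area of its convex hull
```

The hull cuts across the concave corner, so its area (3.5) exceeds the area of the outline (3).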
Nonetheless, field-based ecologists are strongly cognizant of bias in
convex hull areas, so alternatives such as kernel density estimation
have long been commonplace in that field (Worton 1989). The IUCN
guidelines mention this approach only in passing (IUCN Standards and
Petitions Committee 2019). A hybrid method called local convex hull
nonparametric kernel estimation is also used by wildlife biologists, but
its performance has been questioned (Lichti & Swihart 2011).
There are many methods other than kernel density estimation, some quite
sophisticated. Recently, for example, computation of hypervolumes
(Blonder et al. 2014) has become popular with niche modellers.
This method assumes the data are bivariate normal or elliptical in their
distribution, which is problematic and has been critiqued (Qiao et
al. 2016), and which some researchers have tried to address (Jarvis
et al. 2019). However, the method’s popularity earns it serious
attention. Meanwhile, palaeobiologists have used other methods such as
computing maximum great circle distances. This approach makes sense when
the data follow a linear trend (Foote et al. 2008), but it has
the drawback of putting aside most of the data points.
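As a rough illustration of a maximum great circle distance, the sketch below uses the haversine formula with an assumed mean Earth radius of 6371 km. The function names and the coordinates are hypothetical, and the cited palaeobiological studies may have computed the statistic differently.

```python
from itertools import combinations
from math import asin, cos, radians, sin, sqrt

EARTH_RADIUS_KM = 6371.0  # assumed mean Earth radius


def great_circle_km(p, q):
    """Haversine distance in km between two (latitude, longitude) pairs in degrees."""
    lat1, lon1 = map(radians, p)
    lat2, lon2 = map(radians, q)
    h = (
        sin((lat2 - lat1) / 2) ** 2
        + cos(lat1) * cos(lat2) * sin((lon2 - lon1) / 2) ** 2
    )
    # min() guards against rounding pushing the square root above 1.
    return 2 * EARTH_RADIUS_KM * asin(min(1.0, sqrt(h)))


def max_great_circle_km(points):
    """Largest pairwise great circle distance; 0.0 for fewer than two points."""
    return max(
        (great_circle_km(p, q) for p, q in combinations(points, 2)),
        default=0.0,
    )


# Invented occurrences along the equator; only the two end points matter.
occurrences = [(0.0, 0.0), (0.0, 45.0), (0.0, 90.0)]
print(max_great_circle_km(occurrences))
```

Removing the middle occurrence leaves the result unchanged, which is the sense in which the statistic sets aside most of the data.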
In any event, many existing approaches have three major flaws that the
new method proposed here addresses. First, they can systematically
underestimate or overestimate, depending on their properties. Consistent
accuracy is a rare property. Second, they may not be particularly
accurate when the data points form clumped or irregular patterns.
Finally, methods that depend on a series of flexible options and
parameters yield results that are indecisive and therefore not very
interpretable.
As I will explain, all three problems can be solved by creating a
network of points that resembles a mosaic and using the edge lengths to
obtain an area estimate. This method, which has been implemented in an R
package called mosaic, has a variety of additional applications.
For example, areas of overlap between ranges are directly computable,
areas of multi-dimensional shapes can be approximated, and the method
allows for identifying outliers by breaking long edges.
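The arithmetic summarised in the abstract (sum the edge lengths, square the sum, divide by the edge count, and multiply by a constant) can be sketched as follows. This section does not state the value of the constant or the rule used to build the graph, so the sketch takes the edge lengths as given and treats `constant` as a placeholder argument. The function name is hypothetical and is not the interface of the mosaic package.

```python
def area_from_edge_lengths(edge_lengths, constant):
    """Area estimate = constant * (sum of edge lengths)^2 / number of edges.

    `constant` is a placeholder: its value is not given in this section,
    so the caller must supply it.
    """
    lengths = list(edge_lengths)
    if not lengths:
        raise ValueError("at least one edge is required")
    total_length = sum(lengths)
    return constant * total_length ** 2 / len(lengths)


# Four edges of unit length with an arbitrary illustrative constant of 0.5.
print(area_from_edge_lengths([1.0, 1.0, 1.0, 1.0], constant=0.5))
```

When all n edges share a length l, the expression reduces to the constant times n times l squared, so doubling every edge length quadruples the estimate, as expected for an area.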
Ecologists have used graph theory in the past, but only when working on
selected topics such as landscape analysis (Foltête et al. 2020).
The method outlined here is unrelated to any of this work. For example,
existing methods that concern area estimation are founded on entirely
different theory (Keith, Spring & Kompas 2019).
Before detailing the new approach, it is important to mention what this
paper is and is not about. First, the goal is to estimate range area, not range
shape. However, mosaic patterns are more intuitive approximations of
range shapes than are convex hulls because they need not be convex. More
importantly, range area per se is of central concern to
biogeographers, macroecologists, allometricians, niche modellers, and
even the IUCN. Second, this is not a comparative benchmarking analysis. The
only goal is to show that the method performs well, not to definitively
prove that it outperforms every proposed alternative. Thus, comparisons
will be limited to three things that are of general interest: convex
hulls, kernel density estimators, and hypervolumes. Finally, many
readers will have come to expect that every paper on range estimation
methods will be graced with many equations and framed in terms of
complex and most often parametric process models. This is not one of
them. Instead, I will argue that a simple method should be taken
seriously because it makes sense and it works.
MATERIAL AND METHODS