Statistical analyses
All the statistical analyses were performed by using R version 4.0.4 (R
Core Team, 2021). First, we estimated the effect sizes and their
variances of each collected dataset using the metacor function in
the package ‘meta’ (Balduzzi et al., 2019), where R2values were Fisher’s z-transformed to meet normality (Fisher, 1921). We
ran the random-effect model taking multiple effect sizes derived from
the same study into account by the rma.mv function in the package
‘metafor’ (Viechtbauer, 2010). This assumed that the variances of the
effect sizes in all the collected studies included the heterogeneity
across the studies and datasets, as our collected dataset is not
functionally identical (i.e., some R2 values could be
derived from the same study). More strictly, the weight (inverse
variance) for averaging R2 values comprised both
intra- and inter-study variances of effect size (Borenstein et al.,
2010).
We then performed a meta-regression analysis to assess the effects of
each factor on eDNA-based estimation accuracy of species abundance
accounting for the variances of the effect sizes. We used a generalized
linear mixed model (GLMM) assuming Gaussian distribution with the
Bayesian Markov chain Monte Carlo (MCMC) algorithm in the package
‘MCMCglmm’ (Hadfield, 2010). R2 values (Fisher’s
z-transformed) were included as the objective variable, assay strategy,
target taxa, filter pore size, amplicon size, study environment, and
abundance metrics were included as the fixed effects, and study groups
(i.e., each literature) were included as the random effects. We did not
include the volume of water samples because of its significant
correlation with filter pore size (Pearson’s coefficient = 0.360;P < 0.001). The variances of Fisher’s z-transformed
R2 values were included in our model as the
measurement error variances for each dataset. The prior distribution of
all the fixed and random effects were set as default settings. 13,000
iterations of the MCMC algorithm were run and burn-in was set at 3,000
to discard the initial transient region of the chain to obtain precise
parameter estimates. Prior to model fitting, we excluded all the dataset
from Salter et al. (2019) (N = 2) because they used a commercially
available assay kit whose amplicon size was unknown.
We further assessed the heterogeneity across studies/datasets by
performing Cochran’s Q test and calculating I2 values
(Nakagawa & Santos, 2012), which were performed using themetacor function. I2 values have recently been
used to quantify heterogeneity in the data; I2 = 25,
50, and 75% are considered as low, moderate, and high heterogeneity,
respectively (Higgins et al., 2003). In addition, we assessed the
publication bias, occasionally termed as the ‘file drawer problem’
(Haddaw ay et al., 2020), by visualizing a funnel plot (Fisher’s z
values plotted against their variances) and conducting a modified
Egger’s regression (Egger et al., 1998) using the regtestfunction. Moreover, to test the effects of outliers in our modeling, we
excluded the dataset whose filter pore size was 10 µm (N = 1) and
amplicon size was approximately 900 bp (N = 2) (Table S1) and performed
the similar meta-regression analysis as described above.