Statistical analyses
All the statistical analyses were performed by using R version 4.0.4 (R Core Team, 2021). First, we estimated the effect sizes and their variances of each collected dataset using the metacor function in the package ‘meta’ (Balduzzi et al., 2019), where R2values were Fisher’s z-transformed to meet normality (Fisher, 1921). We ran the random-effect model taking multiple effect sizes derived from the same study into account by the rma.mv function in the package ‘metafor’ (Viechtbauer, 2010). This assumed that the variances of the effect sizes in all the collected studies included the heterogeneity across the studies and datasets, as our collected dataset is not functionally identical (i.e., some R2 values could be derived from the same study). More strictly, the weight (inverse variance) for averaging R2 values comprised both intra- and inter-study variances of effect size (Borenstein et al., 2010).
We then performed a meta-regression analysis to assess the effects of each factor on eDNA-based estimation accuracy of species abundance accounting for the variances of the effect sizes. We used a generalized linear mixed model (GLMM) assuming Gaussian distribution with the Bayesian Markov chain Monte Carlo (MCMC) algorithm in the package ‘MCMCglmm’ (Hadfield, 2010). R2 values (Fisher’s z-transformed) were included as the objective variable, assay strategy, target taxa, filter pore size, amplicon size, study environment, and abundance metrics were included as the fixed effects, and study groups (i.e., each literature) were included as the random effects. We did not include the volume of water samples because of its significant correlation with filter pore size (Pearson’s coefficient = 0.360;P < 0.001). The variances of Fisher’s z-transformed R2 values were included in our model as the measurement error variances for each dataset. The prior distribution of all the fixed and random effects were set as default settings. 13,000 iterations of the MCMC algorithm were run and burn-in was set at 3,000 to discard the initial transient region of the chain to obtain precise parameter estimates. Prior to model fitting, we excluded all the dataset from Salter et al. (2019) (N = 2) because they used a commercially available assay kit whose amplicon size was unknown.
We further assessed the heterogeneity across studies/datasets by performing Cochran’s Q test and calculating I2 values (Nakagawa & Santos, 2012), which were performed using themetacor function. I2 values have recently been used to quantify heterogeneity in the data; I2 = 25, 50, and 75% are considered as low, moderate, and high heterogeneity, respectively (Higgins et al., 2003). In addition, we assessed the publication bias, occasionally termed as the ‘file drawer problem’ (Haddaw ay et al., 2020), by visualizing a funnel plot (Fisher’s z values plotted against their variances) and conducting a modified Egger’s regression (Egger et al., 1998) using the regtestfunction. Moreover, to test the effects of outliers in our modeling, we excluded the dataset whose filter pore size was 10 µm (N = 1) and amplicon size was approximately 900 bp (N = 2) (Table S1) and performed the similar meta-regression analysis as described above.