Strengths and limitations
This SR was based on an ambitious, open and inclusive protocol, which
aimed to include studies using any test to support the diagnosis of any
food allergy. This way we captured all available evidence beyond the
commonly used tests and the most common food allergies. However, we were
limited by the number of studies available to do meta-analyses and by
the quality of the available evidence. For instance, randomized
controlled trials (RCTs) are considered the highest level of evidence
for evaluating the effectiveness of diagnostic strategies; however, none
of the studies found by our SR followed this methodology. It is
important to note that RCTs may not always be feasible or practical for
evaluating diagnostic strategies, especially if the strategy is already
in widespread use, as is the case for SPT, sIgE and CRD. In such
instances, observational studies may be used to evaluate diagnostic
tests. Evidence from our SR met these criteria and included
cross-sectional and cohort study designs. Although we included 8
case-control studies, these were judged as having high risk of bias and
did not contribute to the certainty of evidence.
The heterogeneity of studies was a major obstacle for our SR,
complicating meaningful comparisons across studies. We found variability
in the definition of the target condition, in the interpretation of test
results and in the characteristics of the study populations. The
different diagnostic thresholds implemented across the studies as well
as the composition of the food extracts and commercial brands could
affect the sensitivity and specificity of the tests used in the
meta-analyses. Most studies on FA diagnosis have been conducted in
children. Of the studies included, 60.4% were undertaken in a
population ≤12 years of age. While these studies have provided important
insights, they may not be fully generalizable to adults.
Our data highlight the importance of having age-validated cut-offs for
allergy diagnostic tests. Previous research has examined diagnostic test
accuracy in specific age groups or ethnicities as one single population,
and pooled analyses of these data have thus far not been performed. While
the individual raw data were not available, we were able to draw
inferences of interest. For example, we found that peanut-sIgE had
greater diagnostic accuracy in children under 2 years of age while Ara h
2-sIgE exhibited high specificity among adults.
Data included in the SR came mainly from Europe. Multiple geographical
locations had only limited or no studies, such as Southeast Asia, the
Middle East, Africa and Central and South America. Only 13.4% of eligible data
were derived from multicentre studies, highlighting a need for future
collaboration to understand cross-population differences. The lack of
representation from certain regions or populations can limit the
generalizability of the findings and may not accurately reflect the
diversity of the global population.
While studies from Europe may provide valuable insights into the
diagnosis of food allergy in that region, it is important to recognize that test
accuracy may vary in other parts of the world. We analyzed the data for
different geographical regions and saw that Ara h 2-sIgE presented
higher specificity in Northern Europe and Australia than in North
America or Asia [184]. Furthermore, various ethnicities within a
geographical region could have different diagnostic test accuracies.
Most studies included in this SR made no reference to ethnic
variation within the populations studied. Only 12 studies mentioned the
ethnicity of the subjects enrolled and 3 studies [80-82] analyzed
the accuracy of diagnostic tests across different ethnicities within the
same population. Better descriptions of the study populations in future
diagnostic test accuracy studies may help to establish more personalized
approaches.
Another limitation of diagnostic studies is that the results are often
dichotomous, meaning that a specific cut-off value is used to classify
participants as allergic or tolerant, and this affects the reported
diagnostic performance. For example, if a high cut-off value of 8 mm is
used, sensitivity (proportion of participants with true food allergy
with SPT ≥8 mm) would be relatively low while the specificity (proportion
of true tolerant participants with SPT <8 mm) would be relatively high.
This gives a misleading impression that the test has a
low sensitivity when it may be good at ruling out food allergy when the
SPT result is much smaller (e.g. <3 mm). Ideally, a continuous
model would be used linking actual results to probability of food
allergy to accurately evaluate the results of allergy tests, but this
approach requires additional raw data that were not available at this
stage. Furthermore, because we pooled estimates across the different
cut-offs employed in the various studies, the pooled estimates may not
accurately represent any specific cut-off point studied. Consequently,
there is a need to exercise caution and rate the certainty of the
findings lower due to the indirect nature of the evidence.
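The dependence of reported accuracy on the chosen cut-off can be illustrated with a minimal sketch. The wheal diameters below are hypothetical values invented for illustration, not data from the included studies, and the function name sens_spec is likewise an assumption of this example.

```python
# Hypothetical SPT wheal diameters in mm (illustrative only, not review data).
allergic = [2, 4, 5, 6, 7, 9, 10, 12]   # participants with confirmed food allergy
tolerant = [0, 0, 1, 2, 2, 3, 5, 8]     # participants confirmed tolerant


def sens_spec(allergic, tolerant, cutoff):
    """Return (sensitivity, specificity) when SPT >= cutoff counts as positive."""
    true_pos = sum(1 for w in allergic if w >= cutoff)
    true_neg = sum(1 for w in tolerant if w < cutoff)
    return true_pos / len(allergic), true_neg / len(tolerant)


for cutoff in (3, 8):
    sens, spec = sens_spec(allergic, tolerant, cutoff)
    print(f"cut-off {cutoff} mm: sensitivity {sens:.3f}, specificity {spec:.3f}")
```

With these invented values, the same test shows a sensitivity of 0.875 at a 3 mm cut-off but only 0.375 at 8 mm, while specificity rises from 0.625 to 0.875. The test itself has not changed, only the point at which its continuous result was dichotomized.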
The sensitivity and specificity of the tests rely on the chosen
threshold. Tables S5 and S6 demonstrate that when the threshold is set
sufficiently high, almost every test for every food exhibits high
specificity. Similarly, by setting the threshold low enough, most tests
can achieve high sensitivity. Instead of solely concentrating on pooled
results to determine optimal thresholds, it is important to consider that
different studies may have been designed to optimize different factors.
Consequently, pooling them together may not yield meaningful results.
Using Youden’s index to maximize the sum of sensitivity and specificity
can lead to a threshold that does not perform well for either metric.
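As a hedged illustration of this point, the sketch below computes Youden's index (J = sensitivity + specificity - 1) at every candidate cut-off for a small set of hypothetical wheal diameters. The data and the helper name youden_table are assumptions made for this example and do not come from the included studies.

```python
# Hypothetical SPT wheal diameters in mm (illustrative only, not review data).
allergic = [2, 4, 5, 6, 7, 9, 10, 12]
tolerant = [0, 0, 1, 2, 2, 3, 5, 8]


def youden_table(allergic, tolerant):
    """List (cutoff, sensitivity, specificity, J) for each observed value.

    A result >= cutoff is treated as a positive test.
    """
    rows = []
    for cutoff in sorted(set(allergic + tolerant)):
        sens = sum(w >= cutoff for w in allergic) / len(allergic)
        spec = sum(w < cutoff for w in tolerant) / len(tolerant)
        rows.append((cutoff, sens, spec, sens + spec - 1))
    return rows


rows = youden_table(allergic, tolerant)
best = max(rows, key=lambda r: r[3])   # cut-off with the largest J
print("Youden-optimal:", best)

# Cut-offs chosen for a single purpose instead of for J.
rule_out = max(r for r in rows if r[1] == 1.0)   # highest cut-off with sensitivity 1
rule_in = min(r for r in rows if r[2] == 1.0)    # lowest cut-off with specificity 1
print("rule-out:", rule_out)
print("rule-in:", rule_in)
```

In this invented data set the Youden-optimal cut-off is 4 mm, with sensitivity 0.875 and specificity 0.75, so it neither rules out nor rules in reliably. A 2 mm cut-off reaches full sensitivity and a 9 mm cut-off reaches full specificity, each at the expense of the other metric.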
We performed meta-analyses for maximum sensitivity and specificity,
aiming to provide insights into the specific cut-offs that could
help rule specific food allergies in or out. A highly sensitive test,
when negative, rules out allergic disease, while a highly specific test,
when positive, rules it in. The values obtained for the maximum
specificity and sensitivity analyses were those provided by the authors
as their maximum cut-offs; thus, they depend on the way the data
are reported in the different studies.