Challenges
In Table 4 we summarise the strengths and weaknesses of the different
approaches. In systematic reviews of complex interventions there is
likely to be a large amount of heterogeneity due to differences in
setting, population, intervention and study design. When combining
different types of outcomes, measured and reported in a variety of
ways, heterogeneity due to outcome measurement becomes
a serious additional consideration. Our estimates of heterogeneity,
I² for the SMD analyses, ranged from 95.3% to 98.5%,
suggesting substantial heterogeneity. Exploration of heterogeneity is
not the focus of this paper and has been discussed by a number of authors24,25,26. Sources of
heterogeneity can be explored using methods such as subgroup analysis27 and meta-regression28 although these
common approaches are subject to the ecological fallacy, and superior
approaches exist where sufficient data are available29. In contrast to a
meta-analysis of a well-defined pharmaceutical intervention, where
heterogeneity is generally seen as a nuisance, identifying the sources
of the heterogeneity is often a key research question when synthesising
data from complex interventions.
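For context, the I² values quoted above follow from Cochran's Q statistic; the standard Higgins–Thompson expression is given below, purely to make the reported percentages concrete.

```latex
% I^2 from Cochran's Q across k studies (df = k - 1);
% values near 100% indicate that nearly all observed variation
% reflects between-study heterogeneity rather than sampling error.
I^2 = \max\!\left(0,\; \frac{Q - (k - 1)}{Q}\right) \times 100\%
```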
Some authors have expressed concerns about the use of SMDs in
meta-analysis. The SMD estimates the average improvement in outcome, in
SD units, on whatever scale the outcome is measured; as Greenland30 points out, the SD
measured within a trial is likely to be different to the population SD
and will vary according to the design features of the trial (e.g.
inclusion/exclusion criteria). Trials are often designed to minimise
variability and therefore SDs reported are likely to be smaller than the
SD in the target population, leading to an overestimate of the treatment
effect of interest. Another problem, discussed by Senn31 and especially
pertinent here, is that the SD depends on the measurement error; since
we combine many different measurement scales, we also combine many
different measurement errors. As a result, studies could yield quite
different SMDs even if the underlying treatment effect were the same in each study.
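A simple variance decomposition makes Senn's point explicit. Under the (assumed) classical measurement model, where the observed score is the true score plus independent error, the SD in the SMD denominator inflates with the error variance of the scale:

```latex
% Assumed classical measurement model: observed variance is the sum of
% true between-subject variance \sigma^2 and error variance \sigma_e^2.
\mathrm{SMD} = \frac{\mu_T - \mu_C}{\sqrt{\sigma^2 + \sigma_e^2}}
% For a fixed true effect \mu_T - \mu_C, a noisier scale
% (larger \sigma_e^2) yields a smaller SMD.
```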
In an attempt to combine all available information we have converted
odds ratios into SMDs using the methods described by Chinn32. This method
provides an estimate of the SMD from an odds ratio using the assumption
that the odds ratio has come from a dichotomy of a normally distributed
continuous variable; the resulting estimate may be poor when this
assumption does not hold. Sanchez-Meca33 compares alternative
indices for combining continuous measures with dichotomies and shows that
this method slightly underestimates the SMD. Our conclusions were
unchanged when binary and continuous data were analysed separately, but
the SMDs estimated from continuous data alone were considerably higher
than those obtained when binary data were included, so it is possible that by
converting odds ratios to SMDs we were underestimating the true
treatment effect in this context.
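Chinn's conversion rests on the underlying continuous variable following a logistic distribution, whose standard deviation is π/√3. A minimal Python sketch of the conversion, including the matching variance transformation, might look as follows; the function name is illustrative, not from any particular package.

```python
import math

def or_to_smd(log_or: float, var_log_or: float) -> tuple[float, float]:
    """Convert a log odds ratio to a standardised mean difference (SMD)
    via Chinn's method, which assumes the odds ratio arises from
    dichotomising an underlying logistic-distributed continuous
    variable (SD = pi / sqrt(3)).
    """
    scale = math.sqrt(3) / math.pi        # ~= 0.5513, i.e. divide by 1.814
    smd = log_or * scale                  # ln(OR) * sqrt(3) / pi
    var_smd = var_log_or * scale ** 2     # variances scale by the square
    return smd, var_smd

# Example: OR = 2.0 with SE(ln OR) = 0.25
smd, var_smd = or_to_smd(math.log(2.0), 0.25 ** 2)
print(f"SMD = {smd:.3f}, SE = {math.sqrt(var_smd):.3f}")  # SMD ~= 0.382
```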
Varying units of randomisation and analysis lead to difficulties in
terms of both synthesis methods and interpretation. One of our reported
methods (Method 3) aims to apply consistent weighting based on the
number of health care professionals to allow inference about a
consistent population; however, this leads to other problems. Weights
based on sample size do not take into account the variability of the
data, essentially assuming a constant standard deviation across all
trials. In their simulation study, Marin-Martinez and Sanchez-Meca34 show that weighting
by the inverse variance yields less biased results than weighting by
sample size. Complexity is added when a review wishes to combine
evidence from different types of trial design35,36.
Individually randomised trials, cluster randomised trials and stepped
wedge trials are all useful in answering questions about behaviour
change interventions targeted at health care professionals, but one
would not necessarily expect the SMD (effect size) to be consistent
across each type of trial due to the different units of analysis (and
therefore different underlying SDs)37,38.
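To make the weighting issue above concrete, the toy sketch below contrasts pooling a set of SMDs by sample size with pooling by inverse variance; it is an illustration only, not the analysis code used in this review.

```python
import numpy as np

# Toy data: per-trial SMDs, within-trial variances, and sample sizes.
smd = np.array([0.30, 0.45, 0.10, 0.60])
var = np.array([0.02, 0.15, 0.01, 0.20])  # trials 2 and 4 are noisier
n = np.array([120, 40, 200, 30])

def pooled(estimates, weights):
    """Weighted average of trial-level effect estimates."""
    return np.sum(weights * estimates) / np.sum(weights)

# Inverse-variance weights down-weight noisy trials directly;
# sample-size weights only do so if SDs are constant across trials.
print("inverse variance:", pooled(smd, 1 / var))
print("sample size:     ", pooled(smd, n))
```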
Some consensus among trialists of health professional behaviour change
interventions, in the form of a core outcome set39 would be useful for
future systematic reviews. Consistency in terms of outcomes used, unit
of analysis and format of outcome reporting is desirable. In addition,
we may want to separate out the effect on the health care provider from
the effect on the individual patient; this would require individual
participant data and multilevel modelling24.
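Were individual participant data available, a two-level model with patients nested within providers would allow the provider-level and patient-level effects to be separated. A minimal sketch using statsmodels is given below; the data and column names are entirely hypothetical.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(0)

# Hypothetical IPD: 20 providers with 10 patients each, and the
# intervention arm assigned at the provider (cluster) level.
n_prov, n_pat = 20, 10
provider = np.repeat(np.arange(n_prov), n_pat)
arm = np.repeat(rng.integers(0, 2, n_prov), n_pat)
provider_effect = rng.normal(0, 0.5, n_prov)[provider]
compliance = 0.3 * arm + provider_effect + rng.normal(0, 1, n_prov * n_pat)
ipd = pd.DataFrame({"provider_id": provider, "arm": arm,
                    "compliance": compliance})

# A random intercept per provider separates provider-level variation
# from patient-level residual variation.
result = smf.mixedlm("compliance ~ arm", data=ipd,
                     groups=ipd["provider_id"]).fit()
print(result.summary())
```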
Some trials used in this analysis have reported ‘mean percentage
compliance’ or similar – e.g. the percentage of occasions a test was
ordered, averaged over a group of GPs. This measurement is bounded
between 0% and 100% and therefore cannot be considered truly
continuous. The inference methods used here (meta-analysis of SMDs) assume
continuity and normality and are likely to perform poorly where results
are close to the boundaries (0% and 100%). We performed additional
sensitivity analyses removing trials where the mean compliance was
between 0% and 20% or between 80% and 100%, and the results appeared
robust. Alternative methods for analysing proportions include those
suggested by Miller40 and Stijnen et al.41, and these may be preferable when meta-analysing proportions alone.
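As one example from this family, the Freeman–Tukey double arcsine transformation (whose back-transformation Miller discusses) stabilises the variance of a proportion near the 0% and 100% boundaries; a minimal Python sketch:

```python
import math

def freeman_tukey(events: int, n: int) -> tuple[float, float]:
    """Freeman-Tukey double arcsine transform of a proportion.

    Returns the transformed value and its approximate variance,
    which is roughly constant (1 / (n + 0.5)) regardless of how
    close the proportion is to 0 or 1.
    """
    t = (math.asin(math.sqrt(events / (n + 1)))
         + math.asin(math.sqrt((events + 1) / (n + 1))))
    var = 1.0 / (n + 0.5)
    return t, var

# Example: compliant on 2 of 40 occasions (5%), near the lower boundary.
t, var = freeman_tukey(2, 40)
print(f"transformed = {t:.3f}, variance = {var:.4f}")
```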
We acknowledge all of these challenges and feel that conclusions based
on any of the methods presented here must be drawn very cautiously.
However, we feel that there are occasions where the combination of mixed
outcomes is still warranted, provided it is accompanied by appropriate
sensitivity analyses and caveats.