Challenges
In Table 4 we summarise the strengths and weaknesses of the different approaches. In systematic reviews of complex interventions there is likely to be a large amount of heterogeneity due to differences in setting, population, intervention and study design. When combining different types of outcomes, measured and reported in a variety of different ways, heterogeneity due to outcome measurement is a serious additional consideration. Our estimates of heterogeneity (I²) for the SMD analyses ranged from 95.3% to 98.5%, suggesting substantial heterogeneity. Exploration of heterogeneity is not the focus of this paper and has been discussed by a number of authors24,25,26. Sources of heterogeneity can be explored using methods such as subgroup analysis27 and meta-regression28, although these common approaches are subject to the ecological fallacy, and superior approaches exist where sufficient data are available29. In contrast to a meta-analysis of a well-defined pharmaceutical intervention, where heterogeneity is generally seen as a nuisance, identifying the sources of heterogeneity is often a key research question when synthesising data from complex interventions.
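For reference, I² is derived from Cochran's Q statistic; under its standard definition (stated here in general terms, not specific to our analyses),

\[
Q = \sum_{i=1}^{k} w_i \left(y_i - \hat{\mu}\right)^2, \qquad I^2 = \max\left(0,\; \frac{Q - (k - 1)}{Q}\right) \times 100\%,
\]

where \(y_i\) are the study-level effect estimates (here SMDs), \(w_i\) the inverse-variance weights, \(\hat{\mu}\) the fixed-effect pooled estimate and \(k\) the number of studies. Values above 75% are often described as considerable heterogeneity.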
Some authors have expressed concerns about the use of SMDs in meta-analysis. The SMD estimates the average improvement in outcome per SD on whatever scale the outcome is measured; as Greenland30 points out, the SD measured within a trial is likely to be different to the population SD and will vary according to the design features of the trial (e.g. inclusion/exclusion criteria). Trials are often designed to minimise variability, and the SDs reported are therefore likely to be smaller than the SD in the target population, leading to an overestimate of the treatment effect of interest. Another problem, discussed by Senn31 and especially pertinent here, is that the SD depends on measurement error; since many different measurement scales are used, measurement error will also differ across studies, so studies could yield quite different SMDs even if the underlying treatment effect were the same in each.
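Senn's point can be made explicit with a simple sketch, assuming a classical additive measurement-error model (an illustration only, not a model fitted in our analyses). If observed scores have variance \(\sigma^2_{\text{true}} + \sigma^2_{\text{error}}\), then

\[
\mathrm{SMD}_{\text{obs}} = \frac{\mu_1 - \mu_2}{\sqrt{\sigma^2_{\text{true}} + \sigma^2_{\text{error}}}} = \mathrm{SMD}_{\text{true}} \sqrt{\frac{\sigma^2_{\text{true}}}{\sigma^2_{\text{true}} + \sigma^2_{\text{error}}}},
\]

so two trials with identical mean differences and identical true SDs, but instruments with different error variances, will report systematically different SMDs.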
In an attempt to combine all available information we converted odds ratios into SMDs using the method described by Chinn32. This method provides an estimate of the SMD from an odds ratio under the assumption that the odds ratio arises from a dichotomy of a normally distributed continuous variable; the estimate may be poor when this assumption does not hold. Sanchez-Meca33 compares alternative indices for combining continuous measures with dichotomies and shows that this method slightly underestimates the SMD. Our conclusions were unchanged when binary and continuous data were analysed separately, but the SMDs estimated from continuous data alone were considerably higher than those obtained when binary data were included, so it is possible that by converting odds ratios to SMDs we underestimated the true treatment effect in this context.
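Chinn's conversion exploits the fact that the standard logistic distribution has standard deviation \(\pi / \sqrt{3}\), giving

\[
\mathrm{SMD} = \frac{\sqrt{3}}{\pi} \ln(\mathrm{OR}) \approx \frac{\ln(\mathrm{OR})}{1.81}, \qquad \mathrm{Var}(\mathrm{SMD}) = \frac{3}{\pi^2}\,\mathrm{Var}\!\left(\ln \mathrm{OR}\right),
\]

which performs well only when the logistic distribution is a reasonable approximation to the assumed underlying normal.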
Varying units of randomisation and analysis lead to difficulties both in terms of synthesis methods and interpretation. One of our reported methods (Method 3) aims to apply consistent weighting based on the number of health care professionals, to allow inference about a consistent population; however, this leads to other problems. Weights based on sample size do not take into account the variability of the data, essentially assuming a constant standard deviation across all trials. In their simulation study, Marin-Martinez and Sanchez-Meca34 show that weighting by the inverse variance yields less biased results than weighting by sample size (illustrated in the sketch following this paragraph). Further complexity is added when a review wishes to combine evidence from different types of trial design35,36. Individually randomised trials, cluster randomised trials and stepped wedge trials are all useful in answering questions about behaviour change interventions targeted at health care professionals, but one would not necessarily expect the SMD (effect size) to be consistent across each type of trial, owing to the different units of analysis (and therefore different underlying SDs)37,38. Some consensus among trialists of health professional behaviour change interventions, in the form of a core outcome set39, would be useful for future systematic reviews. Consistency in terms of outcomes used, unit of analysis and format of outcome reporting is desirable. In addition, we may want to separate the effect on the health care provider from the effect on the individual patient; this would require individual participant data and multilevel modelling24.
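To make the weighting point concrete, the following sketch pools the same set of SMDs under the two weighting schemes; the effect sizes, variances and sample sizes are hypothetical and are not taken from our review.

```python
import numpy as np

# Hypothetical trial-level data (illustrative only, not from the review):
# SMDs, their within-trial variances, and numbers of health care professionals.
smd = np.array([0.30, 0.55, 0.10, 0.80])
var = np.array([0.02, 0.15, 0.01, 0.25])
n = np.array([120.0, 40.0, 300.0, 25.0])

def pooled_estimate(effects, weights):
    """Weighted mean of the effect sizes."""
    return np.sum(weights * effects) / np.sum(weights)

iv = pooled_estimate(smd, 1.0 / var)  # inverse-variance weights reflect precision
ss = pooled_estimate(smd, n)          # sample-size weights ignore the variances

print(f"Inverse-variance pooled SMD: {iv:.3f}")
print(f"Sample-size pooled SMD:      {ss:.3f}")
```

The two pooled values differ because sample-size weighting implicitly assumes a common standard deviation across trials, which is rarely true in practice.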
Some trials used in this analysis reported 'mean percentage compliance' or similar (e.g. the percentage of occasions a test was ordered, averaged over a group of GPs). This measurement is bounded between 0% and 100% and therefore cannot be considered truly continuous. The inference methods used here (meta-analysis of SMDs) assume continuity and normality, and are likely to perform poorly where results are close to the boundaries (0% and 100%). We performed additional sensitivity analyses removing trials where the mean compliance was between 0% and 20% or between 80% and 100%, and the results appeared robust. Alternative methods for analysing proportions include those suggested by Miller40 and Stijnen et al.41, and these may be preferable when meta-analysing proportions alone.
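As one illustration of such an approach, the sketch below applies the Freeman-Tukey double arcsine transformation (the transformation underlying Miller's back-conversion method40) to hypothetical compliance counts before inverse-variance pooling; the counts and the approximate back-transformation are ours, for illustration only.

```python
import numpy as np

def double_arcsine(events, n):
    """Freeman-Tukey double arcsine transform of a proportion.
    Its variance is approximately 1 / (n + 0.5), roughly independent of p."""
    t = (np.arcsin(np.sqrt(events / (n + 1)))
         + np.arcsin(np.sqrt((events + 1) / (n + 1))))
    return t, 1.0 / (n + 0.5)

# Hypothetical compliance counts per trial (illustrative only).
events = np.array([18.0, 5.0, 95.0])
n = np.array([20.0, 50.0, 100.0])

t, var = double_arcsine(events, n)
weights = 1.0 / var
t_pooled = np.sum(weights * t) / np.sum(weights)  # fixed-effect pooled transform

# Crude approximate back-transformation to the proportion scale;
# Miller40 gives a more exact inverse that also accounts for n.
p_pooled = np.sin(t_pooled / 2) ** 2
print(f"Pooled compliance: {p_pooled:.1%}")
```

The transformation stabilises the variance of each proportion, avoiding the boundary problems described above, before the pooled value is converted back to the percentage scale.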
We acknowledge all of these challenges and feel that conclusions based on any of the methods presented here must be drawn very cautiously. However, we feel that there are occasions where the combination of mixed outcomes is still warranted, provided it is accompanied by appropriate sensitivity analyses and caveats.