Introduction
The cardiotocograph (CTG) is a screening tool which aims to detect fetal
heart responses to ongoing intrapartum hypoxic or mechanical stress
during labour. The ability of CTG-monitoring to accurately detect
intrapartum hypoxic stress has been questioned because of its high
false-positive rate and the lack of a valid gold-standard for
intrapartum fetal hypoxia against which to compare it. The recent
Cochrane Systematic Review on intrapartum fetal monitoring [1] concluded
that the use of CTG-monitoring increased the rates of C-sections and
instrumental deliveries without a significant reduction in the rates of
perinatal death or cerebral palsy.
Several confidential enquiries into poor perinatal outcomes have
highlighted that CTG-misinterpretation is still one of the key avoidable
issues. Between 2000 and 2010, the National Health Service (NHS)
Litigation Authority [2] in the UK identified 300 claims involving
CTG-misinterpretation, with an estimated value of £466 million. It is
estimated that in the UK between 500 and 800 babies die or are left with
severe brain injuries every year, and CTG-misinterpretation has been
found to be a contributing factor in 49% of all the cases reported [3].
Misinterpretation of CTGs has two main components: one is the clinical
interpretation by the practitioner, and the other is the historical lack
of clear consensus among international and national guidelines.
Recently, the use of confusing guidelines on CTG-interpretation based on
‘pattern-recognition’ has been identified as a source of variability
affecting intra- and inter-observer agreement [4]. The human element is
another strong source of variability, as even ‘CTG-Experts’ have been
shown to change their opinion once they are made aware of the neonatal
outcomes [5].
Furthermore, there is still no reliable technology able to alleviate
this issue [6]. The only factor that seems to have had a positive impact
on inter- and intra-observer agreement in CTG-interpretation is intense
training and education [7–9]. Nonetheless, there are still controversies
about the standardisation and efficacy of the training schemes currently
offered worldwide [10,11]. Therefore, improving training in
CTG-interpretation seems to be crucial to improving perinatal outcomes.
Our hypothesis is that intense fetal-physiology-based training
contributes positively to inter- and intra-observer agreement as well as
to levels of self-confidence and knowledge. Although some authors [12],
‘appealing to the stone’, venture to refute the fetal-physiology
approach in favour of the pattern-recognition approach, it is evident
that guidelines based on pattern recognition are contributing to poor
perinatal outcomes and increased intrapartum operative interventions
[4]. The latest Each Baby Counts Report, published by the Royal College
of Obstetricians and Gynaecologists [13], highlights that 33% of cases
were due to CTG-misinterpretation, and that in 72% different care might
have resulted in a different outcome. In contrast, intense training on
the use of fetal-physiology to interpret CTG-traces has been reported to
be associated with improved perinatal outcomes [9].
The objective of this study was to assess the level of agreement, the
sources of discrepancy and the associated human factors in
CTG-interpretation among staff trained in the fetal-physiology approach
at the maternity unit of St George’s University Hospitals NHS Foundation
Trust in London, in order to gain a deeper insight into this method,
which is currently in vogue.
This hospital, which is one of the largest teaching hospitals in London
with approximately 5000 births/year, was the first centre in the UK to
introduce, in 2010, mandatory competency testing on CTG-interpretation
for all staff providing intrapartum care, after implementing intense
training in CTG-monitoring based on the fetal-physiology approach. This
training is provided by a team of highly experienced obstetricians and
midwives (hereafter the CTG-Team). This dedicated CTG-Team has received
national awards for its outstanding performance in ensuring a low
intrapartum C-section rate and a low rate of hypoxic-ischaemic
encephalopathy compared with other tertiary teaching hospitals in
London.
Methods
A total of 25 midwives and 7 doctors, approximately 10% of the total
clinical staff, were asked to interpret anonymised colour-printed copies
of five different CTGs [Fig.1]. Traces were accompanied by the relevant
clinical history. Three traces corresponded to ultrasound-transducer
recordings, and the other two were CTG-STAN recordings. Along with each
copy, a questionnaire with closed and open questions was also provided
(supplemental material). The five CTGs were deliberately selected
because they contained features that give rise to differences in
interpretation. The same questionnaire had previously been completed by
the Hospital CTG-Team, and their answers were used as the theoretical
gold-standard for analytical purposes.
The questionnaire responses included the categorisation of the
CTG-traces, as well as the identification of any ongoing type of
hypoxia. The traces were classified using local CTG-guidelines (NICE or
STAN). Detection of the types of fetal hypoxia on the questionnaire was
based on the criteria described in the scientific literature [14,15]:
gradually evolving hypoxia, subacute hypoxia, acute hypoxia, and chronic
hypoxia. The questionnaire also allowed the quantification of several
aspects: (1) the proportion of concordance between the CTG-Team and the
clinical staff, both in CTG-classification by ‘CTG-guidelines’ and in
identification of ‘types-of-hypoxia’, (2) the inter-rater
(inter-observer) reliability within the staff, (3) the background
knowledge in CTG-interpretation and (4) the level of self-reported
confidence.
Statistical analysis
Agreement between the CTG categorisations provided by the CTG-Team and
those given by the staff was assessed by the proportion of concordance
(PC) with 95% confidence interval (CI). The staff inter-observer
reliability was assessed by the Fleiss Kappa value (K).
K-values were interpreted according to the Landis and Koch [16]
recommendations: a K<0.00 was considered poor, 0.00-0.20 slight,
0.21-0.40 fair, 0.41-0.60 moderate, 0.61-0.80 substantial, and 0.81-1.00
almost perfect. The remaining proportions, which were mainly
descriptive, were expressed as raw percentages without CI. Differences
between PCs were assessed by the chi-square test with the significance
level set at P<0.001. K-values were compared following Cumming and Finch
[17], whereby K-values were considered not significantly different if
their 95% CIs overlapped. The statistical analysis was performed using
the Real Statistics Resource Pack software (Release 4.3) for Excel
(Microsoft Office 2015) and IBM SPSS Statistics for Windows, Version
25.0 (Armonk, NY: IBM Corp., 2017).
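As an illustration of the agreement measures described above, the following Python sketch computes a proportion of concordance with a 95% CI and Fleiss' kappa from a table of rating counts, and labels the result with the bands published by Landis and Koch. It is a minimal sketch and not the authors' analysis, which was run in the Real Statistics Resource Pack and SPSS. The CI method is not stated in the text, so the normal-approximation (Wald) interval is an assumption, and the function names and example counts are hypothetical.

```python
import math


def proportion_of_concordance(n_agree, n_total, z=1.96):
    """Return (PC, lower, upper) using a Wald interval (assumed method)."""
    if n_total <= 0 or not 0 <= n_agree <= n_total:
        raise ValueError("need n_total > 0 and 0 <= n_agree <= n_total")
    p = n_agree / n_total
    half_width = z * math.sqrt(p * (1 - p) / n_total)
    return p, max(0.0, p - half_width), min(1.0, p + half_width)


def fleiss_kappa(counts):
    """Fleiss' kappa for a subjects x categories table of rater counts.

    counts[i][j] is the number of raters who put subject i (here, a CTG
    trace) in category j; every row must sum to the same number of raters.
    """
    n_subjects = len(counts)
    n_raters = sum(counts[0])
    if n_raters < 2 or any(sum(row) != n_raters for row in counts):
        raise ValueError("each row must sum to the same number (>= 2) of raters")
    n_categories = len(counts[0])
    # Share of all ratings that fall in each category.
    p_cat = [sum(row[j] for row in counts) / (n_subjects * n_raters)
             for j in range(n_categories)]
    # Observed agreement: mean over subjects of the pairwise agreement.
    p_obs = sum((sum(c * c for c in row) - n_raters)
                / (n_raters * (n_raters - 1)) for row in counts) / n_subjects
    # Agreement expected by chance.
    p_exp = sum(p * p for p in p_cat)
    if p_exp == 1.0:
        raise ValueError("kappa is undefined when all ratings share one category")
    return (p_obs - p_exp) / (1 - p_exp)


def landis_koch_label(kappa):
    """Map a kappa value to the bands published by Landis and Koch."""
    if kappa < 0:
        return "poor"
    for upper, label in ((0.20, "slight"), (0.40, "fair"),
                         (0.60, "moderate"), (0.80, "substantial")):
        if kappa <= upper:
            return label
    return "almost perfect"


# Hypothetical counts, NOT study data: 5 traces, 4 raters, 3 categories.
example = [[4, 0, 0], [2, 2, 0], [0, 3, 1], [1, 1, 2], [0, 0, 4]]
kappa = fleiss_kappa(example)
print(round(kappa, 3), landis_koch_label(kappa))
print(proportion_of_concordance(14, 20))
```

In practice a statistics package would normally be preferred, since it also provides a standard error and CI for K, which this sketch does not attempt.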
Ethical approval
Data were obtained as part of a university MSc programme and therefore
followed the ethical guidelines of UK universities, in addition to the
permission of the Hospital Local Ethics Committee and the voluntary
participation of the staff. No patient-identifiable data were used in
the study.
Results
CTG interpretation: Categorisation and types of hypoxia
In total, 160 full CTG interpretations, five for each participant, were
examined. The differences between the CTG-Team and the clinical staff in
CTG-interpretation using local CTG-guidelines are displayed in Table-1.
Overall, the categorisation of CTGs using the corresponding local
guideline presented a PC (95% CI) = 61.2% (53.6%–68.8%), representing
moderate agreement with the CTG-Team, and a K (95% CI) = 0.33
(0.316–0.362), representing fair reliability. However, when the CTGs
were interpreted by types of hypoxia, PC = 76.1% (69.4%–82.8%) and
K = 0.37 (0.35–0.39). Consequently, identification of the type of
hypoxia, compared with local guidelines as the method of CTG
interpretation, presented a better PC (76.1% vs 61.2%, P=0.006) and
slightly better reliability (K 0.37 (0.35–0.39) vs 0.33 (0.32–0.36)).
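The overlap rule used to compare K-values can be checked directly against the intervals reported in this section. The sketch below is illustrative only: `intervals_overlap` is a hypothetical helper name, and it implements the simple rule stated in the Statistical analysis section (overlapping 95% CIs are treated as not significantly different) rather than a formal test.

```python
def intervals_overlap(ci_a, ci_b):
    """True if two (lower, upper) confidence intervals share any value."""
    return ci_a[0] <= ci_b[1] and ci_b[0] <= ci_a[1]


# 95% CIs for Fleiss' kappa as reported in the Results.
k_guidelines = (0.316, 0.362)  # categorisation by local guidelines, K = 0.33
k_hypoxia = (0.35, 0.39)       # classification by type of hypoxia, K = 0.37

if intervals_overlap(k_guidelines, k_hypoxia):
    verdict = "not significantly different"
else:
    verdict = "significantly different"
print(verdict)
```

Under this rule the two intervals overlap (the lower bound 0.35 lies below the upper bound 0.362), which is consistent with the reliability being described as only slightly better.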
In comparison with other methods of interpretation based on
pattern-recognition and published in peer-reviewed studies,
interpretation by types-of-hypoxia presents the highest proportion of
agreement and also better reliability than studies with a similar sample
of observers [Table-2].
Background knowledge
The staff were asked to rank from 1 to 5 which source of knowledge
helped them most in analysing each CTG. The options given were: 1) use
of current guidelines, 2) own knowledge of fetal-physiology, 3) previous
experience, 4) opinion of someone more senior and 5) similar case(s)
previously discussed during a CTG meeting/training. Midwives reported
that they relied most on guidelines (25.8%), followed by
fetal-physiology (22.2%), experience (20.4%), discussion in previous
CTG-meetings (16.3%) and senior opinion (15.2%). Doctors relied mostly
on fetal-physiology (28.9%); experience (20.5%) and meetings (20.5%)
were ranked joint second, followed by guidelines (19.3%) and senior
opinion (10.7%) [Table-3; Fig.2-4].
Self-reported level of confidence
The staff were asked to rate their level of confidence on a 7-point
scale from ‘not confident at all’ to ‘very confident’. Overall, 68% of
them felt confident or very confident with CTG interpretation. Within
the midwifery group, the proportion who felt confident or very confident
was highest among Band-7 midwives (94.7%), followed by Band-6 (64.5%)
and Band-5 (41.7%). Doctors followed a similar pattern: the proportion
was highest among Consultants (100%), followed by senior doctors (90%)
and junior doctors (57.1%) [Table-4; Fig.5].
Discussion
Since the purported rationale for having different categories of
CTG-trace is to identify the risk of potential hypoxia, our study shows
that it is more practical to state directly whether a fetus is exposed
to hypoxic stress and, if so, the type of ongoing fetal hypoxia. This
may help avoid the use of confusing terminology such as ‘intermediate’,
‘suspicious’ or ‘pathological’ CTG-traces, which have no correlation
with neonatal outcomes [18]. It is also worth mentioning that our method
of calculating PC implies a double agreement: first among the staff and
second against the gold-standard. We therefore suggest that this method
enhances the validity of our agreement results.
Sources of discrepancy: Pattern-recognition vs. Fetal-Physiology
The staff who did not agree with the diagnosis of CTG-1 and described it
as suspicious or pathological were guided by the number of uterine
contractions shown on the tocograph and not by non-reassuring features
on the cardiograph. This suggests that features that are not formally
part of the CTG-guideline table may interfere with the overall
interpretation. Intense fetal-physiology training ensures that trained
staff are also able to consider any ongoing excessive uterine activity
contributing to abnormal features on the CTG-trace. Although being
vigilant for any deviation from normality is crucial in maternity
services, clinicians should also bear in mind that over-diagnosis may be
equally harmful, as it may lead to expediting the delivery of a healthy
fetus.
An interesting source of discrepancy was noted in CTG-2, where up to 10
different terms were used to describe the decelerations. Although none
of those categories or terms would have led to management other than
imminent delivery, the terminology stipulated by the guidelines was not
followed. This reflects the inherent flaw in any guideline based on
‘pattern-recognition’ that relies on the morphological classification of
decelerations, as this leads to significant inter- and intra-observer
variability.
According to the CTG-Team, the CTG-3 baseline of 108bpm was a
non-reassuring feature as stipulated by the guidelines, and thus the CTG
had to be categorised as suspicious. However, the staff who categorised
the CTG as normal did so because they considered that the baseline was
≥110bpm. The problem that arises from this 2bpm difference is that a
baseline of 108bpm in a term baby can be perfectly normal, yet it can be
(strictly speaking) categorised as a non-reassuring feature.
Consequently, if any other non-reassuring feature appears while the
baseline is defined as suspicious, the CTG would be categorised as
pathological. A similar scenario was seen in CTG-4, where the main
discrepancy was the categorisation of the trace as intermediary (under
STAN-guidelines) due to a baseline of >150bpm.
Understanding the importance of accurately (and physiologically)
interpreting the baseline is crucial to avoid over-diagnosis leading to
potentially unnecessary interventions, because incorrect assessment
leads to incorrect management.
CTG-5 presented only one complicated deceleration with a reassuring
baseline. However, the STAN-guidelines, which differentiate between
types of decelerations but do not specify the number of decelerations
required per given period of time, promote confusion in the
CTG-categorisation. As with CTG-2, the confusion arises from naming the
decelerations or mixing the guidelines, which can reduce the rate of
agreement on the basis of terminology alone. This highlights the role
played by some guidelines based on ‘pattern-recognition’ in promoting
confusion amongst clinicians.
Source of knowledge
When midwives progress from Band-5 to Band-6 they, logically, start
relying more on their own experience and less on the opinion of someone
more senior. Most importantly, the data show that the more senior the
midwife, the greater the reliance on fetal-physiology to interpret
CTG-traces and the lower the reliance on the CTG-guideline, until they
become Band-7. This last group reported experience as the least valued
option, and their decisions are mostly based on the use of guidelines
followed by knowledge of fetal-physiology. A possible explanation for
this phenomenon amongst Band-7 midwives (labour co-ordinators) may be
their crucial role in having ‘overall’ responsibility, which could
create a conflict between taking defensive decisions following a closed
written guideline and trusting the fetal-physiology. A similar scenario
to Band-7 is seen in senior doctors, but not in Consultants. However,
since doctors also increase their reliance on the understanding of
fetal-physiology with seniority, it is likely that the intense CTG
teaching is promoting a switch from pattern-recognition to a
physiological approach amongst staff. This can be easily visualised in
the radial graphs provided [Fig. 2-4].
Confidence in CTG-interpretation
The level of confidence varies according to professional grade. Both
midwives and doctors gain self-confidence as they progress in their
respective careers. Band-7 midwives (i.e. labour ward co-ordinators)
reported a higher level of confidence than junior and senior doctors.
This is likely due to the intense ‘cascade training’ on
CTG-interpretation provided to Band-7 midwives by the CTG-Team to ensure
that the unit is always staffed by a co-ordinator with an excellent
knowledge of fetal-physiology. The lowest proportion of staff who felt
confident or very confident was among Band-5 midwives (i.e. newly
appointed or junior midwives). This is understandable, considering that
they are the professionals most likely to seek a senior opinion. In
contrast, 100% of consultants felt confident or very confident. However,
it is also interesting to highlight the considerable disagreement in
CTG-interpretation between the consultants who took part in this study.
This is likely due to some consultants incorporating individual
‘experience’ and disregarding the guidelines or fetal-physiology. It is
therefore important to appreciate that some degree of overconfidence
and/or non-concordance may exist amongst senior clinicians in any
maternity team because of their experience. A multidisciplinary-team
approach to CTG-interpretation, built on improving the knowledge of
fetal-physiology, may help improve concordance and reliability in
CTG-interpretation.
Importance of Fetal-Physiology training and multi-professional approach
Our study highlights the challenges that arise when pattern-recognition
is in place. On the one hand, relying mostly on CTG-guidelines,
especially among junior staff, could act as a “horse-blinder”, producing
an inability to see and understand the wider clinical picture, such as
an appropriate fetal heart rate baseline, ongoing chorioamnionitis,
maternal pyrexia or meconium-stained liquor. This is usually manifested
by a lower level of self-confidence in CTG-interpretation. On the other
hand, among more senior staff there is a chance of taking a more
defensive and interventionist approach by relying more on CTG-guidelines
and ‘personal experience’ than on the actual physiology and clinical
picture. This could be manifested as overconfidence. Therefore, it is
vital to ensure that all staff receive intense training on
fetal-physiology and the types of intrapartum hypoxia, so that the
pattern-recognition approach does not trump the physiological and
scientific principles underpinning intrapartum fetal heart rate
monitoring [18].
To support the above, our study demonstrates that intense training on
fetal-physiology not only improves K and PC but also increases knowledge
and self-confidence in CTG-interpretation. This will contribute to
reducing the variation in the management of labour and, hopefully, to
improving intrapartum maternal and perinatal outcomes. Additionally,
instead of using multiple CTG-guidelines based on pattern-recognition
with confusing terminologies and different ‘features’, we suggest the
use of ‘types of intrapartum-hypoxia’ as the default method to classify
CTG-traces. This will help to better delineate the fetal ability to
respond to and compensate for a hypoxic insult, which is the cornerstone
of intrapartum CTG. Similar findings were reported in a recent study
[19], which analysed 52,187 births over an 11-year period and reported
81% agreement between clinicians when ‘types of hypoxia’ were used to
classify the CTG-trace instead of guidelines based on
‘pattern-recognition’.
Strengths and limitations
To the best of our knowledge, this is the first study to analyse
inter-observer variability amongst 32 midwives and obstetricians of
different grades and experience who have undergone intense training on
fetal-physiology. Secondly, the ‘gold-standard’ was provided by a
dedicated CTG-Team with expertise in CTG-interpretation, whose members
have published extensively in this area and conduct CTG-Masterclasses in
approximately 14 countries every year. Thirdly, in addition to
inter-observer variability, we also analysed subjective levels of
confidence in CTG-interpretation.
The main limitation was the restriction to a single centre. However, the
authors felt that it was best to conduct this study in a centre where
staff had received intense training on fetal-physiology and mandatory
competency testing on CTG-interpretation. Secondly, it may be argued
that clinicians were provided with only 20 minutes of the CTG-trace
instead of the whole trace; we accept that ‘Cycling’ [8,20], one of the
most important CTG-features, could not be evaluated properly. However,
it was felt that a 20-minute trace was sufficient to determine the type
of hypoxia and that it reflected the real-life situation, where
clinicians are expected to make crucial decisions based on short
segments of the CTG-trace. Thirdly, the authors accept that some may
argue that the number of observers was small. However, this was a
complex study assessing inter-observer variability, and although only 32
clinicians took part, a total of 160 CTG interpretations were analysed.
Many studies on inter-observer variability in CTG traces have used fewer
than 10 clinicians [21–29].
Conclusions
This paper demonstrates that continuous education and intense ‘fetal
physiology-based’ CTG-training by a specialised CTG-Team increase
knowledge of fetal-physiology and produce higher levels of staff
confidence, reflected in better levels of agreement and reliability.
Classification of CTG-traces by ‘type of intrapartum-hypoxia’ is
preferable to CTG-guidelines. However, if CTG-guidelines are a sine qua
non of the maternity unit, they should be simple and easy to use, and
they should be backed up by the immediate availability of senior input
with appropriate knowledge of fetal physiology, able to recognise any
ongoing hypoxic process. This approach may help reduce the pitfalls of
pattern recognition amongst more junior members of staff. Development of
a specialised ‘CTG-Team’ formed by consultants and midwives to educate
staff, and to review and discuss CTG-traces and outcomes, may help to
create a multidisciplinary approach resulting in reduced inter-observer
variability and increased staff confidence in CTG interpretation.
Conflict of interest:
No conflict of interest has been declared by the authors.
Contribution to authorship:
JG conceived the study and performed data collection. JG, DD and EC
undertook data analysis and interpreted results. All authors contributed
to the writing of the document and approved the final manuscript.
Funding:
JG: The main data were collected as part of a self-funded university MSc
programme. A secondary analysis of the data was performed to prepare
this manuscript. The remaining authors have no source of funding to
declare.
Details of ethical approval:
Data were obtained as part of a St George’s University of London MSc
programme (LA: HP7203/4/XY approved and signed on 22.06.2017), in
addition to the permission of the local NHS Trust and the voluntary
participation of the staff. No patient-identifiable data were used in
the study.
Acknowledgments
We would like to thank Susan Heatley, Senior Lecturer at St George’s
University of London, Lindsay Gillman, Senior Lecturer at St George’s
University of London, Margaret Flynn, Deputy Head of Midwifery at St
George’s University Hospitals NHS, the members of the ‘CTG team’: Mrs
Virginia Whelehan, Miss Abigail Archer, Mrs Roise Heffernan, Miss Rachel
Tree, Miss Isabelle Cornet, Mr Austin Ugwumadu, as well as the
multidisciplinary Maternity Team at St George’s University Hospitals NHS
Foundation Trust, London.