Strengths and limitations
Our study has several limitations. First, although the brain is traditionally examined in the axial plane and this plane is widely used for screening, a more comprehensive anatomical examination also requires the coronal and sagittal planes [22]. Our AI system was built only on axial-view images and therefore cannot provide a full assessment of lesions; we will continue to train the current model with images from other planes to improve its performance. Second, although transfer learning allows an accurate model to be developed from a relatively small training dataset, our sample size may be small for identifying multiple types of anomalies; we will continue to optimize the system with a larger amount of data [13]. Finally, our AI system was trained and validated on datasets from southern China, and its efficacy in other populations remains to be investigated.
A strength of our study is its multicenter design. The AI system was trained on data from two different hospitals, and its high performance was validated on data from a third, external hospital. The doctors who took part in the test also came from different hospitals across the country. Together, these features support the generalizability of the AI system and an objective assessment of its performance.
Interpretation
To the best of our knowledge, this is the first attempt to develop an AI system to detect specific CNS malformations. Previous studies showed that CNN-based deep learning algorithms can recognize images of normal transventricular (TV) and transcerebellar (TC) planes and perform biometric measurements on them [14, 18]. For example, the AI system established by Yaqub et al. [14] can identify normal TV planes by detecting the fetal head and the visibility of the cavum septi pellucidi. Baumgartner et al. [18] reported a method for real-time detection and localization of 13 fetal standard planes, including the TV and TC planes. Nevertheless, few studies have involved cases with congenital malformations or trained models to classify images as normal or abnormal, let alone to diagnose specific structural anomalies. Our previous study [21] used 15372 normal and 14047 abnormal fetal CNS ultrasound images to build a binary classification AI system, which achieved a sensitivity of 96.9%, a specificity of 95.9%, and an AUC of 0.989 (95% CI: 0.986–0.991) in identifying images as normal or abnormal. That study verified the feasibility of CNN-based deep learning algorithms for binary classification. Building on that work, we established the present multi-class model to diagnose specific malformations. The new AI system achieved an accuracy of 0.798 (95% CI: 0.770–0.826) and an AUC of 0.86 (0.83–0.89) in identifying 12 types of CNS malformations based on ultrasound images. These results demonstrate that artificial intelligence is capable of detecting specific congenital malformations.
In clinical testing, our AI system helped doctors at all levels of expertise improve their detection of fetal CNS malformations. The effect was most prominent for trainee doctors, whose performance with AI assistance rose to a level comparable with that of expert doctors. This may be attributable to the lesion localization function of the AI model, which helps doctors recognize lesions and then make a diagnosis. This advantage would be especially useful in clinical practice. Prenatal diagnosis of CNS anomalies is one of the most difficult and challenging tasks and requires a special technique, namely neurosonography, a targeted ultrasound examination of the fetal brain performed by an expert [27]. However, such expertise takes years of experience to acquire and is not equally available in all centers, especially in underdeveloped countries and remote areas [28]. Hence, with AI assistance, the detection rate of fetal CNS anomalies is expected to improve even in clinical units that lack experts. Additionally, the ultrasound images used for training and validation of the current AI system were acquired with a variety of ultrasound machines from different manufacturers, which suggests that the system can be used across equipment types.