AI testing and comparison with human doctors
An external test set of 812 images from 449 patients was used to evaluate the performance of the AI networks. The diagnostic accuracy, specificity, and sensitivity of the AI in identifying CNS malformations were calculated, and receiver operating characteristic (ROC) curves were generated to assess the performance of the established AI algorithm. The performance of the AI was then compared with that of doctors who reviewed the same images in a separate reader study. In this study, the images were shown one by one on a personal computer screen in random order, and each image was accompanied by 13 diagnostic choices (12 types of CNS abnormality and normal). Ultrasound doctors from different hospitals with varying levels of expertise, namely >10 years (expert), 5-10 years (competent), and 1 year (trainee) of experience, selected the single best diagnosis for each image and then moved on to the next image without the option of returning to a previous one. The time taken to read each image was recorded, and all doctors were blinded to the reference diagnoses of the images.
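For readers who wish to reproduce the evaluation, the accuracy, sensitivity, specificity, and ROC curves described above correspond to standard one-vs-rest calculations over the 13 classes. The following Python sketch illustrates this under the assumption that the network emits a 13-class probability vector per image; all variable and function names are illustrative rather than taken from the study code.

```python
import numpy as np
from sklearn.metrics import roc_curve, auc

def per_class_metrics(y_true, y_score):
    """One-vs-rest accuracy, sensitivity, specificity, and ROC AUC.

    y_true  -- array of reference label indices (0 = normal, 1-12 = the
               twelve CNS abnormality types), one per test image
    y_score -- array of shape (n_images, 13) holding the network's
               class probabilities
    """
    y_pred = y_score.argmax(axis=1)          # predicted class per image
    results = {}
    for c in range(y_score.shape[1]):
        truth = y_true == c                  # reference positives for class c
        pred = y_pred == c                   # predicted positives for class c
        tp = np.sum(truth & pred)
        tn = np.sum(~truth & ~pred)
        fp = np.sum(~truth & pred)
        fn = np.sum(truth & ~pred)
        fpr, tpr, _ = roc_curve(truth, y_score[:, c])
        results[c] = {
            "accuracy": (tp + tn) / len(y_true),
            "sensitivity": tp / (tp + fn) if tp + fn else float("nan"),
            "specificity": tn / (tn + fp) if tn + fp else float("nan"),
            "auc": auc(fpr, tpr),
        }
    return results
```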
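The reader study itself follows a simple forced-choice protocol: randomized presentation order, one diagnosis per image, per-image timing, and no backtracking. A minimal sketch of that presentation logic is given below; show_image and ask_choice are hypothetical placeholders for the display and response-capture routines of the actual testing software, and the choice labels are likewise placeholders.

```python
import random
import time

# Placeholder labels; the real list comprises the 12 CNS abnormality
# types plus "normal".
CHOICES = ["normal"] + [f"abnormality_{i}" for i in range(1, 13)]

def run_reader_session(image_paths, show_image, ask_choice):
    """Show each image once, in random order, and record the reader's
    single forced-choice diagnosis and reading time per image."""
    order = random.sample(image_paths, k=len(image_paths))  # random order
    log = []
    for path in order:
        show_image(path)                     # display on the PC screen
        t0 = time.perf_counter()
        diagnosis = ask_choice(CHOICES)      # reader picks one of 13 options
        log.append({
            "image": path,
            "diagnosis": diagnosis,
            "seconds": time.perf_counter() - t0,  # processing time per image
        })
        # No mechanism to revisit earlier images: the loop only advances.
    return log
```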