Comparison of performance between AI network and doctors
The AI outperformed the average performance of the 13 doctors in detecting all types of malformations, as shown in Table 3 and
Figure 4a: the doctors' diagnostic accuracy [65.4% (95% CI
57.3-73.7%), p = 0.002], sensitivity [88.2% (95% CI 82.3-94.1%), p =
0.003], specificity [63.3% (95% CI 54.6-72.0%), p = 0.041], and AUC
[0.758 (95% CI 0.694-0.821), p = 0.004] were all lower than those of the AI system.
When comparing the AI's performance with that of the three groups of
doctors separately, we found that the AI model performed similarly to
the expert doctors in terms of accuracy [78.9% (95% CI
75.2-82.5%), p = 0.528], sensitivity [77.5% (95% CI 73.7-81.4%),
p = 0.521], and AUC [0.853 (95% CI 0.800-0.905), p = 0.681],
whereas the AI outperformed both the competent doctors
{accuracy: 69.6% (95% CI 75.2-85.2%), p = 0.016;
sensitivity: 67.5% (95% CI 59.7-75.3%), p = 0.021; AUC:
0.793 (95% CI 0.777-0.809), p = 0.001} and the trainees
{accuracy: 51.5% (95% CI 39.4-63.6%), p = 0.001;
sensitivity: 48.6% (95% CI 36.0-61.2%), p = 0.003; AUC:
0.654 (95% CI 0.538-0.770), p = 0.008}. However, the specificity of the
AI did not differ significantly from that of any of the three categories of doctors. The
comparison of performance between the AI system and the various doctor groups is
shown in Table 3 and Figure 4b.
The developed AI algorithm could analyze 7-8 images per second and
took only 113 s to complete the diagnosis of 812 ultrasound images. This was
significantly less than the average time taken by the 13
doctors (113 s vs. 11,571 s, p = 0.001). When compared with the subgroups,
the AI's diagnosis time was also shorter than that of each of the three groups of
doctors [113 s vs. 8,864 s (expert), p = 0.02; 12,801 s
(competent), p = 0.003; 12,663 s (trainee), p = 0.001].
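The reported throughput and speedup can be checked from the figures above; the following sketch uses only the numbers stated in the text (variable names are illustrative, not taken from the study).

```python
# Sanity-check of the timing figures reported above.
n_images = 812             # ultrasound images in the test set
ai_time_s = 113            # total AI diagnosis time, seconds
doctor_avg_time_s = 11571  # average total reading time of the 13 doctors

throughput = n_images / ai_time_s        # images processed per second
speedup = doctor_avg_time_s / ai_time_s  # AI time advantage vs. average doctor

print(f"throughput: {throughput:.1f} images/s")  # ~7.2, consistent with 7-8/s
print(f"speedup: {speedup:.0f}x faster than the doctors' average")
```

The computed throughput of about 7.2 images per second agrees with the "7-8 images per second" stated above, corresponding to a roughly hundredfold reduction in total diagnosis time.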