Table 4 presents the classification results (i.e. csPCa vs. non-csPCa). Seven sequence combinations are compared. In each pair of domains listed in the table, the former and the latter are the source and target domains, respectively; we refer to each such pair of domains/cohorts as a DA setting. First, we compare CMD²A-Net with the separate and joint models (Table 2) in terms of AUC. Take the first DA setting (P-x → LC-A) as an example. With the T2 sequence, CMD²A-Net achieves an AUC of 0.87 in the target domain (i.e. LC-A), outperforming both the separate model (AUC: 0.61) and the joint model (AUC: 0.67). Consistent findings are observed for ADC and hDWI. In the other three DA settings, CMD²A-Net likewise shows its advantage in resolving the domain shift between the source and target domains. This supports our hypothesis that incorporating prostate lesion information prior to the DA process can facilitate PCa classification.
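The AUC values compared above measure how well a model's predicted csPCa probabilities rank csPCa cases above non-csPCa cases. As a minimal sketch, and not the authors' evaluation code, AUC can be computed from its pairwise (Mann-Whitney) definition; the function name `auc` and the toy labels and scores are hypothetical.

```python
from typing import Sequence


def auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the probability that a random positive outranks a random negative.

    labels: 1 = csPCa (positive), 0 = non-csPCa (negative).
    scores: predicted csPCa probability for each case.
    A tie between a positive and a negative score counts as half a win.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical target-domain predictions for four cases:
# 3 of the 4 positive/negative pairs are ranked correctly.
print(auc([1, 1, 0, 0], [0.9, 0.4, 0.6, 0.1]))
```

The quadratic pairwise loop keeps the definition explicit; for large cohorts a rank-based or library implementation would be preferable.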
Second, we analyze our model’s PCa classification performance using a single sequence, i.e. T2, ADC, or hDWI. In most DA settings, T2 is the most effective sequence, while ADC yields the lowest AUC. The hDWI sequence performs inconsistently across the four DA settings. For example, it achieves the best performance (w.r.t. AUC, SEN, and SPE) in “P-x → LC-B”, but underperforms T2 and ADC in “LC-A → LC-B”. This could be caused by the heterogeneous b-values among the domains. As shown in Table 1, b-values of 50, 400, and 800 s/mm² were used for P-x, 0 and 1,400 s/mm² for LC-A, and 1,000 and 1,400 s/mm² for LC-B. We therefore infer that the substantial discrepancies in acquisition parameters likely account for the inconsistent performance of hDWI. Note that there were no widely accepted guidelines regarding b-values until the release of PI-RADS in 2019, which recommended a minimum value of 1,200 s/mm².
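To make the b-value argument concrete, the sketch below uses the standard mono-exponential diffusion model S(b) = S₀·exp(−b·ADC). It assumes that hDWI denotes the image at the highest acquired b-value; the tissue values (S₀ = 1000, ADC = 1.0 × 10⁻³ mm²/s) and the function names are hypothetical, and real prostate signal deviates from this idealized model.

```python
import math


def dwi_signal(s0: float, b: float, adc: float) -> float:
    """Mono-exponential DWI signal S(b) = S0 * exp(-b * ADC).

    b is in s/mm^2 and adc in mm^2/s, so b * adc is dimensionless.
    """
    return s0 * math.exp(-b * adc)


def adc_two_point(s_low: float, s_high: float, b_low: float, b_high: float) -> float:
    """Two-point ADC estimate: ln(S_low / S_high) / (b_high - b_low)."""
    if b_high <= b_low:
        raise ValueError("b_high must be larger than b_low")
    return math.log(s_low / s_high) / (b_high - b_low)


# Hypothetical voxel with identical tissue properties in every cohort.
S0, TRUE_ADC = 1000.0, 1.0e-3

# b-value sets per cohort as listed in Table 1 (s/mm^2).
B_VALUES = {"P-x": [50, 400, 800], "LC-A": [0, 1400], "LC-B": [1000, 1400]}

for cohort, bs in B_VALUES.items():
    b_low, b_high = min(bs), max(bs)
    s_high = dwi_signal(S0, b_high, TRUE_ADC)  # intensity of the highest-b image
    # On noise-free synthetic data the two-point ADC recovers TRUE_ADC up to rounding.
    adc = adc_two_point(dwi_signal(S0, b_low, TRUE_ADC), s_high, b_low, b_high)
    print(f"{cohort}: highest b = {b_high}, S = {s_high:.1f}, ADC = {adc:.2e}")
```

In this idealized model the ADC agrees across protocols, but the highest-b-value intensity does not: about 449.3 at 800 s/mm² (P-x) versus about 246.6 at 1,400 s/mm² (LC-A, LC-B), roughly a 1.8-fold shift for the same tissue. In practice, perfusion effects at low b-values and non-Gaussian diffusion at high b-values also make the ADC itself depend on the chosen b-values.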
We also investigate the effect of ensemble learning with multiple sequences, which could guide the choice of appropriate sequences for PLDC. In each DA setting, models using multiple sequences are always more effective than those using any single sequence alone. Moreover, although the worst single-sequence result always comes from ADC or hDWI, ensembling T2 with either or both of them clearly enhances the model’s performance. This finding is consistent with the clinical practice of using mpMRI for PCa diagnosis, where ADC and hDWI are usually considered secondary references by radiologists. It should be noted that the all-sequence-ensembled (i.e. ensemble of T2, ADC, and hDWI) models achieve the best performance in most DA settings. Although the three-sequence ensemble does not achieve the best performance in the P-x → LC-A setting, it still attains an AUC of 0.91, only 0.01 below the highest AUC (0.92). This suggests that using more sequences helps harmonize multi-cohort MRI [14], thereby boosting the final classification performance. Moreover, for the same target domain (i.e. either LC-A or LC-B), CMD²A-Net transferred from P-x attains a higher AUC than when transferred from the other local cohort, for every sequence combination. This implies that more source samples may enhance the model’s cross-domain knowledge transferability and thus improve its generalization in the target domain. This superior performance also demonstrates CMD²A-Net’s capability to transfer knowledge from a public dataset to our local cohorts.
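Three sequences give 2³ − 1 = 7 non-empty combinations, matching the seven sequence combinations in Table 4. The text does not specify the fusion rule, so the sketch below assumes simple late fusion (averaging the per-sequence csPCa probabilities) and scores each combination by AUC. The functions `late_fusion` and `evaluate_combinations` and the toy probabilities are hypothetical.

```python
from itertools import combinations
from typing import Dict, List, Sequence


def auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Pairwise (Mann-Whitney) AUC; ties count as half a win."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum(1.0 if p > n else (0.5 if p == n else 0.0) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def late_fusion(per_seq: Dict[str, List[float]], seqs: Sequence[str]) -> List[float]:
    """Assumed fusion rule: average the probabilities of the selected sequence models."""
    columns = [per_seq[s] for s in seqs]
    return [sum(vals) / len(vals) for vals in zip(*columns)]


def evaluate_combinations(labels: Sequence[int], per_seq: Dict[str, List[float]]) -> Dict[str, float]:
    """AUC for every non-empty subset of sequences (2^n - 1 subsets)."""
    names = list(per_seq)  # keep the given order, e.g. T2, ADC, hDWI
    results = {}
    for k in range(1, len(names) + 1):
        for combo in combinations(names, k):
            results["+".join(combo)] = auc(labels, late_fusion(per_seq, combo))
    return results


# Hypothetical target-domain labels and per-sequence csPCa probabilities.
labels = [1, 1, 1, 0, 0, 0]
per_seq = {
    "T2": [0.8, 0.7, 0.4, 0.5, 0.3, 0.2],
    "ADC": [0.6, 0.4, 0.7, 0.5, 0.6, 0.1],
    "hDWI": [0.7, 0.3, 0.6, 0.2, 0.5, 0.4],
}
results = evaluate_combinations(labels, per_seq)
for combo, value in results.items():
    print(f"{combo:>12}: AUC = {value:.3f}")
```

Unweighted averaging is the simplest choice; weighted averaging or a learned stacking model are common alternatives.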