Table 4 reports the classification results (i.e., csPCa vs.
non-csPCa) for seven sequence combinations. In each pair of domains
in the table, the first is the source domain and the second is the
target domain; we refer to each such source-target pair of
domains/cohorts as a DA setting. First, we compare CMD²A-Net with the
separate and joint models (Table 2) in terms of AUC. Take
the P-x → LC-A setting as an example. With the T2 sequence,
CMD²A-Net achieves an AUC of 0.87 in the target domain (i.e., LC-A),
outperforming both the separate model (AUC: 0.61) and the joint model
(AUC: 0.67). Consistent findings are observed for ADC and hDWI. In
the other three DA settings, CMD²A-Net likewise shows its advantage
in mitigating the domain shift between the source and target datasets.
This supports our hypothesis that incorporating prostate lesion
information prior to the DA process facilitates PCa classification.
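The AUC used in these comparisons, as well as the SEN and SPE reported alongside it, can be computed from per-case predicted csPCa probabilities on the target domain. Below is a minimal Python sketch of the rank-based (Mann-Whitney) AUC together with sensitivity and specificity. The function names, the 0.5 decision threshold, and the toy labels and scores are hypothetical; they are not taken from Table 4 or from the CMD²A-Net implementation.

```python
from typing import Sequence, Tuple


def auc_score(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Rank-based AUC (Mann-Whitney U); tied scores receive their average rank."""
    pairs = sorted(zip(scores, labels))
    ranks = [0.0] * len(pairs)
    i = 0
    while i < len(pairs):
        j = i
        # Extend j to cover a run of tied scores starting at i.
        while j + 1 < len(pairs) and pairs[j + 1][0] == pairs[i][0]:
            j += 1
        for k in range(i, j + 1):
            ranks[k] = (i + j) / 2 + 1  # average of the 1-based ranks i+1..j+1
        i = j + 1
    n_pos = sum(1 for _, y in pairs if y == 1)
    n_neg = len(pairs) - n_pos
    rank_sum_pos = sum(r for r, (_, y) in zip(ranks, pairs) if y == 1)
    u_stat = rank_sum_pos - n_pos * (n_pos + 1) / 2
    return u_stat / (n_pos * n_neg)


def sen_spe(labels: Sequence[int], scores: Sequence[float],
            threshold: float = 0.5) -> Tuple[float, float]:
    """Sensitivity and specificity at a fixed (hypothetical) decision threshold."""
    tp = sum(1 for y, s in zip(labels, scores) if y == 1 and s >= threshold)
    tn = sum(1 for y, s in zip(labels, scores) if y == 0 and s < threshold)
    n_pos = sum(1 for y in labels if y == 1)
    return tp / n_pos, tn / (len(labels) - n_pos)


# Hypothetical toy target-domain cases: 1 = csPCa, 0 = non-csPCa.
labels = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.3, 0.2, 0.6]
print(auc_score(labels, scores), sen_spe(labels, scores))  # AUC = 8/9, SEN = SPE = 2/3
```

In practice, `sklearn.metrics.roc_auc_score` computes the same AUC; the hand-written version only makes its interpretation explicit: AUC is the probability that a randomly chosen csPCa case is scored higher than a randomly chosen non-csPCa case.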
Second, we analyze the PCa classification performance of our model
with a single sequence, i.e., T2, ADC, or hDWI. In most DA settings,
T2 is the most effective sequence, while ADC yields the lowest AUC.
The performance of hDWI is unstable across the four DA settings. For
example, it achieves the best performance (w.r.t. AUC, SEN, and SPE)
in “P-x → LC-B”, but underperforms T2 and ADC in “LC-A → LC-B”.
This could be caused by heterogeneous b-values among the domains. As
shown in Table 1, b-values of 50, 400, and 800 s/mm² were used for
P-x, 0 and 1,400 s/mm² for LC-A, and 1,000 and 1,400 s/mm² for LC-B.
These substantial discrepancies in acquisition parameters are thus a
likely cause of the inconsistent performance of hDWI. Note that there
were no widely accepted guidelines on b-values until the release of
PI-RADS in 2019, which recommended a minimum value of
1,200 s/mm².
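One way to see why heterogeneous b-values matter is the standard mono-exponential diffusion model, S(b) = S(0)·exp(−b·ADC), under which the lesion-to-tissue contrast of a diffusion-weighted image grows with b. The sketch below assumes that hDWI corresponds to each cohort's highest acquired b-value and uses hypothetical ADC values for lesion and normal tissue; neither assumption is stated in this section, so the numbers only illustrate the direction of the effect.

```python
import math

# Hypothetical, illustrative ADC values (mm^2/s); NOT measured in this study.
ADC_LESION = 0.8e-3   # restricted diffusion in the lesion
ADC_NORMAL = 1.6e-3   # surrounding normal tissue

# Highest acquired b-value (s/mm^2) per cohort, from the values listed in the text.
MAX_B = {"P-x": 800, "LC-A": 1400, "LC-B": 1400}
MIN_HIGH_B = 1200  # minimum high b-value cited in the text


def relative_signal(b: float, adc: float) -> float:
    """Mono-exponential DWI model: S(b) / S(0) = exp(-b * ADC)."""
    return math.exp(-b * adc)


def lesion_contrast(b: float) -> float:
    """Lesion-to-normal signal ratio at b; equals exp(b * (ADC_NORMAL - ADC_LESION))."""
    return relative_signal(b, ADC_LESION) / relative_signal(b, ADC_NORMAL)


for cohort, b in MAX_B.items():
    print(f"{cohort}: b={b}, contrast={lesion_contrast(b):.2f}, "
          f"meets {MIN_HIGH_B}: {b >= MIN_HIGH_B}")
```

Under these assumptions, the contrast at b = 1,400 s/mm² (about 3.06) is markedly higher than at b = 800 s/mm² (about 1.90), so hDWI intensities from P-x and from the local cohorts would follow different distributions even for similar tissue.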
We also investigate the effect of ensemble learning with multiple sequences, which can guide the choice of appropriate sequences for PLDC. In each DA setting, the models using multiple sequences are always more effective than those using any single sequence alone. In addition, although ADC and hDWI used alone usually lead to worse classification results than T2, ensembling T2 with one or both of them clearly enhances the model’s performance. This finding is consistent with the clinical practice of using mpMRI for PCa diagnosis, in which radiologists usually consider ADC and hDWI as secondary references. Notably, the all-sequence ensemble models (i.e., ensembles of T2, ADC, and hDWI) achieve the best performance in most DA settings. Although the three-sequence ensemble does not achieve the best performance in the P-x → LC-A setting, it still attains a remarkable AUC of 0.91, only 0.01 below the highest AUC (0.92). This suggests that using more sequences helps multi-cohort MRI harmonization [14], thus boosting the final classification performance. Moreover, for the same target domain (i.e., either LC-A or LC-B), the CMD²A-Net transferred from P-x attains a higher AUC than the one transferred from a local cohort domain in every sequence combination. This implies that more source samples can enhance the model’s cross-domain knowledge transferability, thus improving its generalization in the target domain. The superior performance also demonstrates CMD²A-Net’s capability of transferring knowledge from a public dataset to our local cohort domains.
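This section does not specify how the single-sequence models are ensembled. As a minimal sketch, assuming soft voting (averaging the per-case csPCa probabilities of the T2, ADC, and hDWI models), the seven sequence combinations can be enumerated and scored as follows. The function names and toy predictions are hypothetical and are not the data behind Table 4.

```python
from itertools import combinations
from typing import Dict, List, Sequence

SEQUENCES = ["T2", "ADC", "hDWI"]


def soft_vote(probs: Dict[str, List[float]], seqs: Sequence[str]) -> List[float]:
    """Average the per-case csPCa probabilities of the selected sequence models."""
    n_cases = len(probs[seqs[0]])
    return [sum(probs[s][i] for s in seqs) / len(seqs) for i in range(n_cases)]


def pairwise_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the fraction of (csPCa, non-csPCa) pairs ranked correctly; ties count 0.5."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical toy predictions from three single-sequence models on six target cases.
LABELS = [1, 1, 1, 0, 0, 0]
PROBS = {
    "T2":   [0.8, 0.7, 0.4, 0.5, 0.2, 0.3],
    "ADC":  [0.6, 0.3, 0.7, 0.4, 0.5, 0.2],
    "hDWI": [0.7, 0.6, 0.5, 0.6, 0.4, 0.1],
}

# Evaluate all seven non-empty sequence combinations.
for r in range(1, len(SEQUENCES) + 1):
    for combo in combinations(SEQUENCES, r):
        auc = pairwise_auc(LABELS, soft_vote(PROBS, combo))
        print(" + ".join(combo), f"AUC = {auc:.2f}")
```

In this contrived example, the three-sequence ensemble ranks every csPCa case above every non-csPCa case (AUC = 1.00) even though ADC alone is the weakest model. Averaging is only one possible fusion rule; weighted averaging or majority voting would be drop-in alternatives for `soft_vote`.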