Given the severe discrepancies among our datasets, we examine whether rigorous MR[10] image preprocessing methods can improve the joint models’ classification performance. Similar to the scaled method, whitening is a common preprocessing method; it normalizes the pixel values to zero mean and unit variance. We take the combined dataset, P-x and LC-A, as the representative case for evaluation. As listed in Table 3, six preprocessing methods in total were adopted as in [8]: scaled, whitening, and the combination of each with bias field correction (BFC) or noise filtering (NF) (see details in Supplementary Figure 3). The joint models using scaled alone and whitening alone served as the two baselines for comparison with the rigorous MR image preprocessing methods (i.e., BFC and NF). Figure 1 shows preprocessing examples for three of the methods (i.e., whitening, whitening + BFC, and whitening + NF). The left and right halves of each sample show the image before and after preprocessing, respectively. Before preprocessing, noticeable discrepancies in intensity distribution can be observed among the samples: those from LC-A contain more low-intensity grayscale pixels than the images from P-x. After preprocessing, jet color maps are used to highlight the intensity distributions of the two domains, and all color maps share the same intensity color scale. The samples show similar intensity distributions after preprocessing, which demonstrates the effectiveness of the methods in harmonizing image intensity distributions.
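The two baseline normalizations described above can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the implementation used in the study: the function names `scale_01` and `whiten` are hypothetical, "scaled" is assumed to mean min-max rescaling to [0, 1] (the text does not define it), and whitening is assumed to be applied per image. BFC and NF are not included because their exact configuration is not given here.

```python
import numpy as np


def scale_01(image: np.ndarray, eps: float = 1e-8) -> np.ndarray:
    """Min-max rescale intensities to [0, 1] (assumed meaning of "scaled")."""
    image = image.astype(np.float64)
    lo, hi = image.min(), image.max()
    # eps guards against division by zero for a constant image.
    return (image - lo) / (hi - lo + eps)


def whiten(image: np.ndarray, eps: float = 1e-8) -> np.ndarray:
    """Per-image whitening: subtract the mean, divide by the standard deviation."""
    image = image.astype(np.float64)
    return (image - image.mean()) / (image.std() + eps)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Two synthetic "slices" with different intensity ranges, standing in
    # for images from two sites. These are toy data, not study data.
    site_a = rng.normal(loc=300.0, scale=80.0, size=(64, 64))
    site_b = rng.normal(loc=90.0, scale=20.0, size=(64, 64))
    for name, img in [("site_a", site_a), ("site_b", site_b)]:
        w = whiten(img)
        s = scale_01(img)
        print(name, "whitened mean/std:", w.mean(), w.std(),
              "scaled min/max:", s.min(), s.max())
```

After either transform, the two synthetic slices occupy a comparable intensity range even though their raw ranges differ, which is the kind of harmonization the before/after panels in Figure 1 are meant to show.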
In Table 3, for the T2 sequence, BFC combined with either scaled or whitening outperforms the baselines, and BFC with whitening achieves the best AUCs of 0.91 and 0.80 on P-x and LC-A, respectively. However, these findings are not consistent with the results for ADC and hDWI. For ADC, the models preprocessed with BFC or NF underperform the baselines; the baseline models achieve the highest AUCs, with scaled alone reaching 0.73 on P-x and whitening alone reaching 0.72 on LC-A. For hDWI, BFC and NF contribute only limited improvement over the baselines. On P-x, the AUC increases marginally from 0.73 (scaled only) to 0.80 (scaled with NF); on LC-A, scaled with BFC achieves an AUC of only 0.65. The results for the three sequences show that these preprocessing approaches can, in some cases, improve CM-Net’s classification performance when combining our two datasets. However, none of the methods boosts the joint models’ generalization considerably compared with the separate models of P-x and LC-A (Table 2). This indicates that the preprocessing methods are probably insufficient to resolve the domain shift fundamentally. A possible reason is that the severe discrepancies stem not only from the intensity distributions of the heterogeneous mpMRI[11] sequences (see details in Supplementary Figure 2) but also from the inter-site discrepancies (Table 1).