Results
Running the script with 29 clusters and with 2 clusters, respectively, I obtained two confusion matrices and four Rand Index values: a regular and a chance-adjusted value for each scenario.
For the scenario with 29 clusters, the confusion matrix can be found in Table \ref{730387}. The resulting Rand Index was 0.8550275740313222, which is relatively close to 1. Taken by itself, this value could suggest a good level of similarity. However, the confusion matrix shows that the vast majority of samples were assigned to the same cluster, cluster 0. The white label dominates every other cluster, but it is also the majority label in cluster 0. Consequently, although all clusters other than 0 can be said to have a high probability of being unrelated to ransomware, there is not enough confidence to pinpoint the clusters that are related to it, which could translate into a high rate of False Negatives. This becomes clearer when considering the Rand Index adjusted for chance, which yielded -2.918553945107504. A negative adjusted value indicates agreement no better than what random labeling would be expected to produce, so the clustering cannot be considered reliable.
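The gap between the regular and the adjusted Rand Index can be illustrated with a minimal sketch. The labels below are hypothetical toy data, not the data set used in this work, and the two functions are a plain pair-counting implementation written for illustration rather than the script that produced the reported values. When one cluster absorbs almost every sample and one label dominates the ground truth, most pairs of samples agree trivially, so the regular index comes out high (around 0.85 here) while the adjusted index stays close to zero.

```python
from collections import Counter
from math import comb


def pair_counts(labels_true, labels_pred):
    """Count sample pairs grouped together, using the contingency table."""
    cells = Counter(zip(labels_true, labels_pred))
    same_both = sum(comb(c, 2) for c in cells.values())
    same_true = sum(comb(c, 2) for c in Counter(labels_true).values())
    same_pred = sum(comb(c, 2) for c in Counter(labels_pred).values())
    total = comb(len(labels_true), 2)
    return same_both, same_true, same_pred, total


def rand_index(labels_true, labels_pred):
    """Fraction of pairs on which the two labelings agree."""
    same_both, same_true, same_pred, total = pair_counts(labels_true, labels_pred)
    # Pairs separated in both labelings (inclusion-exclusion).
    apart_both = total - same_true - same_pred + same_both
    return (same_both + apart_both) / total


def adjusted_rand_index(labels_true, labels_pred):
    """Rand Index corrected for the agreement expected by chance."""
    same_both, same_true, same_pred, total = pair_counts(labels_true, labels_pred)
    expected = same_true * same_pred / total
    max_index = (same_true + same_pred) / 2
    if max_index == expected:
        return 1.0  # degenerate case, e.g. a single cluster in both labelings
    return (same_both - expected) / (max_index - expected)


# Hypothetical toy data: 95 "white" samples and 5 "ransomware" samples.
labels_true = ["white"] * 95 + ["ransomware"] * 5
# Hypothetical clustering: 3 white samples in cluster 1, everything else in cluster 0.
labels_pred = [1] * 3 + [0] * 97

ri = rand_index(labels_true, labels_pred)
ari = adjusted_rand_index(labels_true, labels_pred)
print(f"Rand Index: {ri:.4f}, adjusted: {ari:.4f}")
```

In this toy case no cluster isolates the ransomware samples, yet the regular index alone would suggest a good match, which is the same pattern observed in the confusion matrix above.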
Table \ref{178847} presents the confusion matrix for the binary scenario. The negative label was assigned to all rows whose original label was white, and the positive label to the rest, that is, to the rows known to belong to some ransomware family. The Rand Index values were somewhat better in this scenario: 0.9719860133842922 for the regular value and -0.49493780907250057 for the adjusted value. Since the adjusted value is still negative, the level of similarity remains unreliable. The confusion matrix is consistent with the findings of the previous scenario. Even though the True Negative Rate is very high (0.9857911192), the True Positive Rate is zero, which means the model was unable to correctly identify a single suspicious transaction.
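For reference, the two rates discussed above follow directly from the four cells of a binary confusion matrix. The sketch below uses hypothetical counts chosen only to reproduce the qualitative pattern (a high True Negative Rate together with a True Positive Rate of zero); they are not the counts from Table \ref{178847}.

```python
def binary_rates(tn, fp, fn, tp):
    """Return (true positive rate, true negative rate) from confusion-matrix counts."""
    # TPR: share of actual positives that were labeled positive.
    tpr = tp / (tp + fn) if (tp + fn) else 0.0
    # TNR: share of actual negatives that were labeled negative.
    tnr = tn / (tn + fp) if (tn + fp) else 0.0
    return tpr, tnr


# Hypothetical counts: no ransomware sample is recognized as positive.
tpr, tnr = binary_rates(tn=985, fp=15, fn=20, tp=0)
print(f"TPR: {tpr:.3f}, TNR: {tnr:.3f}")
```

A True Positive Rate of zero arises whenever no known ransomware sample lands in the positive class, regardless of how many negatives are classified correctly.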