Figure 7
Subsequent packets are encrypted. Figure 6 illustrates this scenario. To prevent such LE packets from skewing the entropy calculation, our algorithms wait until N sequential high-entropy packets have been detected before calculating entropy. Unfortunately, there is no clear way to estimate N, so we determine its value experimentally. For our datasets, N = 2 seems to work best.
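As an illustration, the waiting step can be sketched as follows. This is a minimal sketch under stated assumptions, not our exact implementation: it assumes byte-level Shannon entropy (in bits per byte) as the entropy measure and a per-packet threshold for deciding whether a single packet is HE. The names `shannon_entropy`, `first_he_run_start`, and `packet_threshold` are hypothetical.

```python
import math
from collections import Counter


def shannon_entropy(data: bytes) -> float:
    """Byte-level Shannon entropy in bits per byte (assumed measure)."""
    if not data:
        return 0.0
    total = len(data)
    counts = Counter(data)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def first_he_run_start(payloads, packet_threshold, n=2):
    """Return the index of the first packet of the first run of n
    consecutive HE packets, or None if the flow contains no such run."""
    run = 0
    for i, payload in enumerate(payloads):
        if shannon_entropy(payload) > packet_threshold:
            run += 1
            if run == n:
                return i - n + 1
        else:
            run = 0  # an LE packet interrupts the run
    return None
```

With n = 2, a flow whose packets are labeled LE, HE, LE, HE, HE yields index 3: the isolated HE packet at index 1 does not trigger entropy calculation.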
We now briefly describe the flow-based and packet-based algorithms. Recall that both algorithms aim at labeling a flow as HE or LE, but the former does so by examining the entire flow data, whereas the latter examines each packet separately.
1) Flow-based Entropy:
After detecting N sequential high-entropy packets, we capture the payload of all subsequent packets and then calculate the cumulative entropy of the resulting data (including the initial HE packets). We then compare the cumulative entropy with the threshold, as described earlier. If the cumulative entropy is greater than the threshold, the flow is identified as HE; otherwise it is identified as LE.
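The flow-based procedure can be sketched as follows. This is a hedged sketch and not a definitive implementation. It assumes byte-level Shannon entropy as the entropy measure, separate hypothetical parameters `packet_threshold` (for detecting the initial HE packets) and `flow_threshold` (for the cumulative comparison), and it labels a flow that never produces N sequential HE packets as LE, a case the description above does not cover.

```python
import math
from collections import Counter


def shannon_entropy(data: bytes) -> float:
    """Byte-level Shannon entropy in bits per byte (assumed measure)."""
    if not data:
        return 0.0
    total = len(data)
    counts = Counter(data)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def classify_flow_cumulative(payloads, packet_threshold, flow_threshold, n=2):
    """Label a flow "HE" or "LE" from the cumulative entropy of its payloads."""
    run, start = 0, None
    for i, payload in enumerate(payloads):
        run = run + 1 if shannon_entropy(payload) > packet_threshold else 0
        if run == n:
            start = i - n + 1  # first packet of the HE run
            break
    if start is None:
        return "LE"  # assumption: no HE run means the flow is LE
    # Concatenate the initial HE packets and every subsequent payload.
    data = b"".join(payloads[start:])
    return "HE" if shannon_entropy(data) > flow_threshold else "LE"
```

Because the entropy is computed over the concatenated payloads, a few LE packets late in a long encrypted flow change the result only slightly.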
2) Packet-based Entropy:
After detecting N sequential high-entropy packets, we calculate the entropy of each packet and classify it as HE or LE. At the end of the flow we count the number of HE and LE packets, denoted as N(HE) and N(LE). If the ratio N(HE)/(N(HE)+N(LE)) is greater than a threshold, which we call the High Entropy Packet Percentage Threshold, we consider the flow HE; otherwise we consider it LE. Figure 7 illustrates this approach.
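The packet-based procedure can be sketched in the same way. This is a minimal sketch under stated assumptions: byte-level Shannon entropy as the entropy measure, a hypothetical per-packet threshold `packet_threshold`, counting that starts at the first packet of the initial HE run (the text above does not say whether those packets are counted), and a flow without such a run being labeled LE. The parameter `percentage_threshold` stands for the High Entropy Packet Percentage Threshold, expressed as a fraction between 0 and 1.

```python
import math
from collections import Counter


def shannon_entropy(data: bytes) -> float:
    """Byte-level Shannon entropy in bits per byte (assumed measure)."""
    if not data:
        return 0.0
    total = len(data)
    counts = Counter(data)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def classify_flow_by_packets(payloads, packet_threshold,
                             percentage_threshold, n=2):
    """Label a flow "HE" or "LE" from the fraction of HE packets."""
    run, start = 0, None
    for i, payload in enumerate(payloads):
        run = run + 1 if shannon_entropy(payload) > packet_threshold else 0
        if run == n:
            start = i - n + 1  # first packet of the HE run
            break
    if start is None:
        return "LE"  # assumption: no HE run means the flow is LE
    # Classify each packet separately, then count N(HE) and N(LE).
    labels = [shannon_entropy(p) > packet_threshold for p in payloads[start:]]
    n_he = sum(labels)
    n_le = len(labels) - n_he
    ratio = n_he / (n_he + n_le)
    return "HE" if ratio > percentage_threshold else "LE"
```

For example, a flow whose counted packets are labeled HE, HE, LE, HE, HE has a ratio of 4/5 = 0.8, so it is labeled HE with a percentage threshold of 0.7 and LE with a percentage threshold of 0.9.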