3.1.3 Entropy Background
Let \(w\) be a word composed of \(N\) characters over an alphabet \(\Sigma = \{a_0, a_1, \ldots, a_{m-1}\}\). For each \(a_i \in \Sigma\), we count its occurrences, denoted \(n_i\), and the frequency \(f_i\) of \(a_i\) is then calculated as \(n_i/N\). The entropy of \(w\) is estimated as:
\[H_{N}^{\text{MLE}}(w) = -\sum_{i=0}^{m-1} f_i \log_2 f_i,\]
where MLE stands for maximum likelihood estimator. When the characters are bytes (\(m = 256\)), the maximum entropy is 8 bits per byte. The above estimator is slightly different from the original definition of entropy,
\[H(p) = -\sum_{i=0}^{m-1} p_i \log_2 p_i,\]
where \(p_i\) is the true probability of character \(a_i\).
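As a concrete illustration of the estimator above, the following is a minimal Python sketch (not part of the original text) that computes \(H_{N}^{\text{MLE}}\) over the bytes of a payload:

```python
import math
from collections import Counter

def mle_entropy(data: bytes) -> float:
    """Plug-in (MLE) entropy estimate of a byte string, in bits per byte."""
    N = len(data)
    if N == 0:
        return 0.0
    counts = Counter(data)                     # n_i for each observed byte value a_i
    return -sum((n / N) * math.log2(n / N)     # f_i = n_i / N
                for n in counts.values())
```

For example, `mle_entropy(os.urandom(1500))` typically returns a value slightly below 8 (around 7.9 bits per byte) rather than 8 exactly; this gap is the finite-sample bias that the remainder of this section addresses.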
However, the approximation \(H_{N}^{\text{MLE}}(w) \sim H(p)\) is only valid when \(N \gg m\). In practice, the condition \(N \gg m\) can be an issue because some packets are short, carrying fewer than \(m\) bytes of payload. [9] introduces an approach to solve this problem. First, the authors estimate the average sample entropy \(H_N(p)\), defined as the average of \(H_{N}^{\text{MLE}}(w)\) over all words \(w\) of length \(N\) drawn from distribution \(p\). When \(p\) is the uniform distribution \(\mu\), \(H_N(\mu)\) can be calculated in closed form as a function of \(c = N/m\), up to an \(o(1)\) term that is less than 0.004 [9]. Similar to [9], we use the Monte-Carlo method to estimate the standard deviation \(\text{SD}(H_{N}^{\text{MLE}})\). Finally, we use \(H_N(\mu) - 4 \cdot \text{SD}(H_{N}^{\text{MLE}})\) as the entropy threshold to decide whether the data under investigation is high or low entropy. Note that the above methodology applies to both entire flows and individual packets: in the former case we consider the data of the entire flow, while in the latter we consider the payload of each packet individually.
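The thresholding procedure can be sketched as follows. This is an assumed illustration rather than the authors' implementation: it approximates both \(H_N(\mu)\) and \(\text{SD}(H_{N}^{\text{MLE}})\) by Monte-Carlo simulation over uniformly random byte strings of length \(N\) (the closed-form expression for \(H_N(\mu)\) from [9] could replace the simulated mean), reuses the `mle_entropy` helper from the previous sketch, and uses an arbitrarily chosen trial count.

```python
import os
import statistics

def entropy_threshold(N: int, trials: int = 10_000) -> float:
    """Estimate the threshold H_N(mu) - 4 * SD(H_N^MLE) for words of length N."""
    # Sample entropies of uniformly random byte strings of length N.
    samples = [mle_entropy(os.urandom(N)) for _ in range(trials)]
    mean = statistics.fmean(samples)   # Monte-Carlo approximation of H_N(mu)
    sd = statistics.stdev(samples)     # Monte-Carlo estimate of SD(H_N^MLE)
    return mean - 4 * sd

def is_high_entropy(data: bytes) -> bool:
    """Classify data (a packet payload or an entire flow) as high or low entropy."""
    return mle_entropy(data) >= entropy_threshold(len(data))
```

In practice the thresholds would be precomputed once for the word lengths of interest (e.g., the observed packet sizes) instead of being re-simulated on every call.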
B. HE Flow Detection
We now present the flow-based and packet-based algorithms used to designate a flow as HE. We then compare these algorithms by applying them to data of known, popular types. Although they encrypt user data, many encryption protocols exchange packets at the beginning of a flow that are not encrypted (and are thus low entropy). After the initial exchange,