loading page

A Mathematical Assessment of the Isolation Random Forest Method for Anomaly Detection in Big Data.
  • Fernando Morales,
  • Jorge Ramírez,
  • Edgar Ramos
Fernando Morales
Universidad Nacional de Colombia Sede Medellin
Author Profile
Jorge Ramírez
Universidad Nacional de Colombia Sede Medellin
Author Profile
Edgar Ramos
Universidad Nacional de Colombia Sede Medellin
Author Profile

Abstract

We present the mathematical analysis of the Isolation Random Forest Method (IRF Method) for anomaly detection, introduced in {\sc F.~T. Liu, K.~M. Ting, Z.-H. Zhou:}, {\it Isolation-based anomaly detection}, TKDD 6 (2012) 3:1–3:39. We prove that the IRF space can be endowed with a probability induced by the Isolation Tree algorithm (iTree). In this setting, the convergence of the IRF method is proved, using the Law of Large Numbers. A couple of counterexamples are presented to show that the method is inconclusive and no certificate of quality can be given, when using it as a means to detect anomalies. Hence, an alternative version of the method is proposed whose mathematical foundation is fully justified. Furthermore, a criterion for choosing the number of sampled trees needed to guarantee confidence intervals of the numerical results is presented. Finally, numerical experiments are presented to compare the performance of the classic method with the proposed one.

Peer review status:UNDER REVIEW

26 Jan 2021Submitted to Mathematical Methods in the Applied Sciences
27 Jan 2021Assigned to Editor
27 Jan 2021Submission Checks Completed
30 Jan 2021Reviewer(s) Assigned