Introduction
In this report I describe the process I used to process a large data set containing authentication events collected from the internal network at the Los Alamos National Laboratory’s. This data represents authentication events collected from individual Windows-based desktop computers, servers, and Active Directory servers. Each event is on a separate line in the form of "time,source user@domain,destination user@domain,source computer,destination computer,authentication type,logon type,authentication orientation,success/failure" and represents an authentication event at the given time. The values are comma delimited and any fields that do not have a valid value are represented as a question mark ('?').
Here are three lines from the data as an example: