Measuring History: The Impact of Data Set Selection on Research Findings
Andrey Butenko
University of Washington
The application of data to political science changed the way research is done, and modern capabilities are advancing the field even further. GDELT and ICEWS are two modern data sets that record events and the actors involved. This investigation asks to what extent the representations of the Euromaidan protests in the GDELT and ICEWS data sets correlate.
Some existing literature comparing these data sets has been published, and its findings are summarized in this paper to establish context on the current state of the field. Two research papers were consulted: one led by Michael D. Ward and the other by Philip A. Schrodt, both experienced in the field of event data analysis.
However, the bulk of this paper consists of original analysis of the public GDELT and ICEWS data in three categories. Conflict Development explores what types of actions are represented in the two data sets. The Actors section compares the activity of Ukraine and Russia relative to each other, as well as the sub-national actors accounted for in each data set. Finally, International Response examines the intensity of actions taken by other countries before and during the conflict, showing how behaviors change during times of conflict.
The analysis finds that although there are minor differences in results between the two data sets, both arrive at roughly the same findings. The discrepancies become apparent only when looking at specific sub-national actors and event types, whereas high-level research is unlikely to be significantly affected by data set choice.