Introduction
Kidney cancer, the 8th most common cancer in the United States, is estimated to have 63,990 new cases and 14,400 deaths each year \citep{riesseeraa}\cite{Gandaglia_2014}. The severity of kidney cancer is assessed with the American Joint Committee on Cancer (AJCC) staging system, which is based on three factors: (T) the size of the tumor, (N) the presence and extent of cancer in the lymph nodes, and (M) whether the cancer has metastasized to other organs in the body \cite{Edge_2010}.

Unfortunately, staging information is often unavailable in electronic medical record (EMR) systems except in the form of text notes. Using text notes in retrospective research is problematic because they may contain HIPAA-protected information and are very resource-intensive to abstract. However, explanatory and predictor variables are readily available as coded or numeric values in most EMR cancer systems. A clinical decision support tool that leveraged these widely available variables could be updated automatically and continuously from EMR data.

Data warehouses like i2b2 \cite{Murphy_2009} make it possible for researchers to access large numbers of patient histories and use them in retrospective studies, including the development of predictive models that could form the basis of future decision support tools. In this study, we used several different approaches to evaluate 38 potential predictor variables for metastasis among patients already diagnosed with kidney cancer. This sets the stage for fitting a multivariate risk model using the variables tested here. Our scripts are publicly available via GitHub and are designed to be adaptable to analyzing EMR data for patients with other types of cancer.