To compile the network, you must specify a few additional training properties, such as the loss function and the optimization algorithm (optimizer). In this example, we will use categorical cross-entropy as the loss function, stochastic gradient descent ('sgd') as the optimizer [ref to Geoff Hinton's paper], and accuracy ('acc') as the evaluation measure.
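To make these three choices concrete, here is a minimal pure-Python sketch of what each one computes. It is an illustration under stated assumptions, not the library's implementation: the function names, the toy labels and predictions, and the learning rate are all made up for this example.

```python
import math

def categorical_cross_entropy(y_true, y_pred, eps=1e-12):
    """Mean over samples of -sum_k y_true[k] * log(y_pred[k])."""
    total = 0.0
    for t_row, p_row in zip(y_true, y_pred):
        # eps guards against log(0); it is an assumption of this sketch.
        total -= sum(t * math.log(max(p, eps)) for t, p in zip(t_row, p_row))
    return total / len(y_true)

def accuracy(y_true, y_pred):
    """Fraction of samples whose most probable class matches the one-hot label."""
    hits = sum(
        t_row.index(max(t_row)) == p_row.index(max(p_row))
        for t_row, p_row in zip(y_true, y_pred)
    )
    return hits / len(y_true)

def sgd_step(weights, grads, learning_rate=0.01):
    """One plain SGD update: w <- w - learning_rate * grad."""
    return [w - learning_rate * g for w, g in zip(weights, grads)]

# Toy data (hypothetical): three samples, three classes, one-hot labels.
y_true = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]
y_pred = [[0.7, 0.2, 0.1], [0.1, 0.8, 0.1], [0.3, 0.4, 0.3]]

loss = categorical_cross_entropy(y_true, y_pred)
acc = accuracy(y_true, y_pred)

# Hypothetical weights and gradients, only to show the update rule.
new_weights = sgd_step([0.5, -0.3], [0.2, -0.1], learning_rate=0.1)

print(f"loss={loss:.4f} accuracy={acc:.4f} weights={new_weights}")
```

If the network is built with Keras (an assumption, since the text above does not name the library), the same three choices are usually passed in one call, along the lines of `model.compile(loss='categorical_crossentropy', optimizer='sgd', metrics=['accuracy'])`.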