Table 2: Representation of the input file used in the deep neural networks. The total number of data points (including all modifications) is 1584, and three input parameters (two X-parameters and one Y-parameter) were used.
2.5.2 Tokenization of the data: Character tokenization was performed to convert the text (Table 2) into lists of characters using the Keras preprocessing library [29], which builds a corpus of all characters and assigns an integer to each character. After tokenization, each 15-character sequence was converted to a sequence of integers, yielding a 1584 x 15 integer matrix (Table 3); the sequence length of 15 corresponds to a window size of 7, i.e., 7 residues on either side of the central residue. These integers, rather than the alphabetical characters (column 2 of Table 2), were used as the X-parameters. The Y-parameter was processed with LabelBinarizer [30], which accepts categorical data as input and returns a NumPy array. The dataset was then randomly split into training and test sets in an approximate 2:1 ratio using the Sklearn function train_test_split; thus, the 1584 data points produced 1061 entries for training and 523 for testing.
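As a concrete illustration, the sketch below reproduces this preprocessing pipeline on a few hypothetical sequence windows: character-level tokenization with the Keras Tokenizer (char_level=True), one-hot encoding of the categorical labels with LabelBinarizer, and a roughly 2:1 train/test split with train_test_split. The example windows, modification labels, and random seed are placeholders for illustration, not entries or settings from the actual dataset.

```python
# Minimal sketch of the preprocessing pipeline described above.
import numpy as np
from tensorflow.keras.preprocessing.text import Tokenizer
from sklearn.preprocessing import LabelBinarizer
from sklearn.model_selection import train_test_split

# Hypothetical 15-character sequence windows (X) and their
# modification labels (Y); placeholders, not real data.
windows = ["MKVLAAGNSTRPQWE", "GGHARNDCEQILKMF", "PSTWYVAAGNKRHDE"]
labels = ["Phosphorylation", "Acetylation", "Phosphorylation"]

# Character-level tokenization: build a corpus of all characters
# and assign an integer to each one. lower=False keeps the
# amino-acid letters as given.
tokenizer = Tokenizer(char_level=True, lower=False)
tokenizer.fit_on_texts(windows)

# Each 15-character window becomes a sequence of 15 integers,
# giving an (n_samples x 15) integer matrix.
X = np.array(tokenizer.texts_to_sequences(windows))

# LabelBinarizer takes the categorical Y-parameter and returns
# a binarized NumPy array.
Y = LabelBinarizer().fit_transform(labels)

# Random split into training and test sets at roughly 2:1
# (test_size=0.33).
X_train, X_test, Y_train, Y_test = train_test_split(
    X, Y, test_size=0.33, random_state=42
)
```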