Figure 5: Elbow method for choosing the number of clusters with k-means clustering (left-hand side); median and slope plot with resulting clusters (right-hand side).

Implementation

As shown in Figure 2, the distillation PEA is equipped with a PLC as the PEA-internal control unit. An OPC UA server is located on this PLC, on which all process variables relevant for the process are published. The modular concept with OPC UA provides a standardized server interface to which the ML algorithm can be connected. The ML tool, developed in Python, uses an OPC UA client from the Python package freeopcua, which reads the process variables every second and collects them in a data frame. This data frame is then processed further as shown above: the data is pre-processed according to chapter 3.1 and the developed models are applied to the data frame. The models provide two outputs: on the one hand, the prediction of the pressure for the next 20 seconds; on the other hand, the current operating status, which is classified from this prediction. The structure of the ML forecast implementation as well as the results plotted in a real-time diagram are shown in Figure 2.
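A minimal sketch of such a client loop is given below, assuming the freeopcua package (python-opcua). The endpoint URL, node identifiers and variable names are placeholders, since the actual PLC configuration is not given in the text.

```python
import time

import pandas as pd
from opcua import Client  # OPC UA client from the freeopcua project (python-opcua)

# Hypothetical endpoint and node identifiers; the real server address and
# namespace depend on the PLC configuration and are not given in the text.
ENDPOINT = "opc.tcp://plc.example:4840"
NODE_IDS = {
    "pressure_difference": "ns=2;s=PEA.Column.dP",
    "reboiler_power": "ns=2;s=PEA.Reboiler.Power",
    "feed_flow": "ns=2;s=PEA.Feed.Flow",
}


def read_process_data(n_samples: int = 60) -> pd.DataFrame:
    """Poll the OPC UA server once per second and collect the values in a data frame."""
    client = Client(ENDPOINT)
    client.connect()
    try:
        nodes = {name: client.get_node(nid) for name, nid in NODE_IDS.items()}
        rows = []
        for _ in range(n_samples):
            rows.append({name: node.get_value() for name, node in nodes.items()})
            time.sleep(1.0)  # sampling interval of one second
        return pd.DataFrame(rows)
    finally:
        client.disconnect()
```

The resulting data frame can then be passed through the pre-processing steps and the trained models described above.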
The diagram shows the curve of the pressure difference, the filtered pressure curve and the prediction of the future pressure difference. The current operating status is displayed as text above the diagram, informing the operator whether flooding occurs.
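As an illustration of how such a status text could be derived from the forecast, the following sketch classifies the operating status with a simple threshold rule; the threshold value and the classification logic are assumptions, as the actual rule used in the tool is not specified in the text.

```python
FLOODING_THRESHOLD = 25.0  # mbar; hypothetical limit, not taken from the text


def classify_status(predicted_dp) -> str:
    """Derive an operator-facing text label from the 20 s pressure-difference forecast."""
    if max(predicted_dp) > FLOODING_THRESHOLD:
        return "Warning: flooding expected"
    return "Normal operation"


# Example: a rising forecast triggers the warning text shown above the diagram.
print(classify_status([18.0, 21.5, 26.3]))
```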

Distillation experiments and optimization

For validation, a test procedure is carried out in which the column is flooded several times. Care is taken to ensure that the flooding is triggered by different parameter changes in order to check whether the influence of all parameters on the flooding behavior is reliably captured. All three trained algorithms (gradient boost, extra trees, AdaBoost + extra trees) are able to detect and reliably indicate the flooding behavior. The goodness of fit (coefficient of determination) of the different models with respect to the validation data is R² (gradient boost) = 0.878, R² (extra trees) = 0.853 and R² (AdaBoost + extra trees) = 0.857.
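A sketch of how the three models can be trained and their coefficients of determination evaluated with scikit-learn is shown below; the feature matrix, target values and hyperparameters are placeholders, since the actual training setup is not reproduced here.

```python
import numpy as np
from sklearn.ensemble import AdaBoostRegressor, ExtraTreesRegressor, GradientBoostingRegressor
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split

# Placeholder data standing in for the pre-processed distillation features and
# the pressure-difference target; the real data set is not reproduced here.
rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 6))
y = 2.0 * X[:, 0] + rng.normal(scale=0.1, size=1000)
X_train, X_val, y_train, y_val = train_test_split(X, y, test_size=0.3, random_state=0)

models = {
    "gradient boost": GradientBoostingRegressor(),
    "extra trees": ExtraTreesRegressor(),
    "AdaBoost + extra trees": AdaBoostRegressor(ExtraTreesRegressor(n_estimators=10)),
}

# Fit each model and report the coefficient of determination on the validation split.
for name, model in models.items():
    model.fit(X_train, y_train)
    print(f"R² ({name}): {r2_score(y_val, model.predict(X_val)):.3f}")
```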
The results of the flooding detection for the trained and selected models are shown in Figure 7. It can be seen that all three models have similar accuracies, but the resulting prediction curves show significant differences. The prediction of the combined model of AdaBoost and extra trees regression fluctuates strongly, which makes the forecast difficult to evaluate. The pure extra trees regression is much smoother, but the prediction is too flat and has difficulty following the current pressure curve. In contrast, the gradient boost model shows noticeably more responsive behavior with sufficient smoothing of the prediction curve, which makes it the most suitable of the three models investigated.