where \(\alpha\) and \(\beta\) are the weighting hyperparameters of the total loss. Both were set to 0.5 in our experiments.
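As an illustration of how the two weights enter the objective, the following minimal sketch assumes that the total loss is a weighted sum of two component terms. The function name `total_loss` and the arguments `loss_a` and `loss_b` are hypothetical placeholders, not names from our implementation; only the default values of 0.5 come from the text above.

```python
def total_loss(loss_a: float, loss_b: float,
               alpha: float = 0.5, beta: float = 0.5) -> float:
    """Combine two loss terms with weights alpha and beta.

    Assumption: the total loss is a plain weighted sum of two
    components. loss_a and loss_b are hypothetical stand-ins for
    whichever terms the total loss actually contains.
    """
    return alpha * loss_a + beta * loss_b


# With the default weights of 0.5, both terms contribute equally.
equal_weighting = total_loss(2.0, 1.0)

# Setting beta to zero reduces the objective to the first term only.
first_term_only = total_loss(2.0, 1.0, alpha=1.0, beta=0.0)
```

With \(\alpha = \beta = 0.5\), neither term dominates the objective; changing the ratio shifts the emphasis toward one term.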
To leverage the benefits of multiple sequences, we adopt a weighted-average ensemble method. The outputs of the three separate models are combined to form the final ensemble prediction \(r_{ens}\) as follows: