This is the final situation once training is complete. With gradient clipping introduced, the transition information appears to be spread more widely across the cell state than before. Alternatively, the network may be capturing more "micro-transitions", which would suggest it is starting to overfit.
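For readers unfamiliar with the term, here is a minimal sketch of what clipping gradients can look like. It assumes norm-based clipping, which is only one common variant; the text does not say whether the gradients here were clipped by norm or element-wise by value. The function name `clip_by_global_norm` and the threshold `max_norm` are illustrative choices, not taken from the original code.

```python
import math

def clip_by_global_norm(grads, max_norm):
    """Rescale a list of gradient vectors so their joint L2 norm is at most max_norm.

    Hypothetical helper for illustration: returns the (possibly rescaled)
    gradients and the norm measured before clipping.
    """
    # Joint L2 norm over every component of every gradient vector.
    total = math.sqrt(sum(g * g for vec in grads for g in vec))
    if total <= max_norm:
        # Small enough already: return an unchanged copy.
        return [list(vec) for vec in grads], total
    # Shrink all components by the same factor, so the direction is preserved.
    scale = max_norm / total
    return [[g * scale for g in vec] for vec in grads], total

# Two made-up gradient vectors, e.g. for two weight groups of a recurrent cell.
grads = [[3.0, 4.0], [0.0, 12.0]]
clipped, norm_before = clip_by_global_norm(grads, max_norm=1.0)
norm_after = math.sqrt(sum(g * g for vec in clipped for g in vec))
print(norm_before, norm_after)
```

Because every component is scaled by the same factor, the update keeps its direction and only its magnitude is capped. That limits how much any single large gradient can change the weights in one step, which may be related to the more diffuse cell state described above, though this is a guess rather than something the example demonstrates.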