. We used non-Hermitian complex matrices of size up to XXX and obtained a speedup of roughly 10-15x over MAGMA's geev, together with higher accuracy (SI XXX). As explained in the main text, this speedup comes from the high performance of matrix multiplication on parallel architectures, a level of performance that standard direct algorithms, which rely on Hessenberg reduction, cannot reach.
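
For orientation only, the gap between matrix multiplication and a Hessenberg-based direct solver can be observed with a minimal CPU sketch. It assumes NumPy, whose `np.linalg.eig` calls LAPACK's geev, rather than the MAGMA/GPU setup used for the benchmark above. The function name `compare_matmul_and_geev`, the matrix size, and the residual metric are illustrative assumptions and are not taken from the benchmark itself.

```python
import time
import numpy as np


def compare_matmul_and_geev(n=512, seed=0):
    """Time one matrix product and one LAPACK geev call on the same
    random non-Hermitian complex matrix (hypothetical helper)."""
    rng = np.random.default_rng(seed)
    a = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))

    # One dense complex matrix-matrix product.
    t0 = time.perf_counter()
    _ = a @ a
    t_matmul = time.perf_counter() - t0

    # Direct eigensolver: np.linalg.eig wraps LAPACK geev, which reduces
    # the matrix to Hessenberg form before running QR iterations.
    t0 = time.perf_counter()
    w, v = np.linalg.eig(a)
    t_geev = time.perf_counter() - t0

    # Relative residual ||A V - V diag(w)||_F / ||A||_F as one possible
    # accuracy measure (an assumption; the metric in the SI may differ).
    residual = np.linalg.norm(a @ v - v * w) / np.linalg.norm(a)
    return t_matmul, t_geev, residual


if __name__ == "__main__":
    t_mm, t_ev, res = compare_matmul_and_geev()
    print(f"matmul: {t_mm:.4f} s, geev: {t_ev:.4f} s, residual: {res:.2e}")
```

The absolute timings depend on the BLAS/LAPACK build and the hardware, so this sketch only indicates how such a comparison can be set up and does not reproduce the reported speedup.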