HC uses Kronecker-factorized down and up projections, but it also needs a mixing operation H^res. mHC deliberately constrains H^res to be a doubly stochastic matrix; that set is closed under matmul, so the product of H^res across layers stays doubly stochastic. (Orthogonal matrices might also work, since they are closed under matmul too.) LatentMoE uses full matrices for its down and up projections.
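
A quick way to check the closure claim numerically is sketched below, in plain Python. This is only an illustration: the helper names (`sinkhorn`, `random_doubly_stochastic`, etc.), the 4x4 size, and the 8-layer depth are all made up for the example, and Sinkhorn-style normalization is used here just as a convenient way to produce doubly stochastic test matrices, not as a statement of how mHC parameterizes H^res.

```python
import random


def sinkhorn(matrix, iters=200):
    """Alternately normalize rows and columns of a strictly positive square matrix.

    For positive inputs this converges to a doubly stochastic matrix.
    """
    m = [row[:] for row in matrix]
    n = len(m)
    for _ in range(iters):
        # Make every row sum to 1.
        m = [[x / sum(row) for x in row] for row in m]
        # Make every column sum to 1.
        col_sums = [sum(m[i][j] for i in range(n)) for j in range(n)]
        m = [[m[i][j] / col_sums[j] for j in range(n)] for i in range(n)]
    return m


def matmul(a, b):
    """Plain square matrix product."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def is_doubly_stochastic(m, tol=1e-6):
    """Non-negative entries, and every row and column sums to 1 within tol."""
    n = len(m)
    rows_ok = all(abs(sum(row) - 1.0) < tol for row in m)
    cols_ok = all(abs(sum(m[i][j] for i in range(n)) - 1.0) < tol
                  for j in range(n))
    nonneg = all(x >= 0 for row in m for x in row)
    return rows_ok and cols_ok and nonneg


def random_doubly_stochastic(n, rng):
    """Hypothetical stand-in for one layer's H^res (illustration only)."""
    return sinkhorn([[rng.random() + 0.1 for _ in range(n)] for _ in range(n)])


rng = random.Random(0)
layers = [random_doubly_stochastic(4, rng) for _ in range(8)]

# Compose the mixing matrices across all layers.
product = layers[0]
for h in layers[1:]:
    product = matmul(product, h)

print(is_doubly_stochastic(product))
```

The reason it holds in general: if A1 = 1 and B1 = 1 then (AB)1 = A(B1) = 1, the same argument works for column sums, and a product of non-negative matrices is non-negative.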