Does compressing activations help model parallel training?

Published in MLSys, 2024