PR2: Predictive Routing Replay for MoE-Based LLM Reinforcement Learning

Published on arXiv, 2026