Self-Taught Self-Correction for Small Language Models Paper • 2503.08681 • Published Mar 11, 2025
Learning from Failures: Correction-Oriented Policy Optimization with Verifiable Rewards Paper • 2605.14539 • Published May 14