Post-training

Post-training has two main stages: supervised fine-tuning (SFT), which teaches the model to imitate desired response formats and behaviors, and reinforcement learning, which further optimizes the model against a reward signal.