What Does Reinforcement Learning Fundamentally Do?

Last updated: May 4, 2026

RL as Expanding Ability

One view is that reinforcement learning fundamentally adds new capabilities to a language model. On this view, post-training teaches the model behaviors that were not already accessible from the base model distribution.

RL as Distribution Sharpening

The other view is that reinforcement learning mostly sharpens the distribution of the base model, since many abilities may already be implicitly encoded there. The central question is whether RL adds genuinely new ability or only makes already-latent ability easier to sample.
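To make "sharpening" concrete, here is a minimal sketch under one assumed formalization: raise each probability to a power alpha > 1 and renormalize. The function name `sharpen`, the four-answer toy distribution, and the value of alpha are all hypothetical, chosen only for illustration.

```python
def sharpen(probs, alpha):
    """Raise each probability to the power alpha and renormalize."""
    powered = [p ** alpha for p in probs]
    total = sum(powered)
    return [p / total for p in powered]


# Hypothetical base-model probabilities over four candidate answers.
base = [0.4, 0.3, 0.2, 0.1]
sharp = sharpen(base, alpha=4.0)

for b, s in zip(base, sharp):
    print(f"base={b:.2f}  sharpened={s:.3f}")
```

Under this toy formalization, every answer with nonzero base probability keeps nonzero probability after sharpening. Mass moves toward the already-likely answers, but nothing outside the base model's support becomes reachable, which is the sense in which sharpening reweights ability rather than adding it.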

Karan and Du (2025) argue for the sharpening view. They show that sampling from a power distribution, in which the base model's likelihoods are raised to a power and renormalized, concentrates samples in the high-likelihood regions of the base model. They then propose a Metropolis-Hastings algorithm to sample from this sharpened distribution. Their results suggest that this training-free sampling procedure can reach success rates close to, and sometimes better than, reinforcement learning with GRPO.
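The idea can be illustrated with a toy Metropolis-Hastings sampler. This is not the paper's algorithm, which operates on token sequences from a language model. It is a simplified independence sampler over a hypothetical three-outcome categorical distribution, assuming the target is proportional to the base probability raised to a power alpha. Because proposals are drawn from the base distribution itself, the acceptance ratio reduces to (p(new) / p(old)) ** (alpha - 1), so the sampler needs only base-model samples and likelihoods.

```python
import random
from collections import Counter


def mh_power_sample(base, alpha, n_steps, rng):
    """Independence Metropolis-Hastings targeting base[x] ** alpha (unnormalized).

    Proposals come from the base distribution, so the acceptance ratio
    simplifies to (base[proposal] / base[current]) ** (alpha - 1).
    """
    items = list(base)
    weights = [base[x] for x in items]
    current = rng.choices(items, weights=weights)[0]
    samples = []
    for _ in range(n_steps):
        proposal = rng.choices(items, weights=weights)[0]
        accept_prob = min(1.0, (base[proposal] / base[current]) ** (alpha - 1))
        if rng.random() < accept_prob:
            current = proposal
        samples.append(current)
    return samples


# Hypothetical base distribution: the correct answer is only slightly favored.
base = {"correct": 0.4, "wrong_a": 0.35, "wrong_b": 0.25}
alpha = 4.0

rng = random.Random(0)
samples = mh_power_sample(base, alpha, n_steps=20000, rng=rng)
empirical = {x: c / len(samples) for x, c in Counter(samples).items()}

# Exact power distribution, computable here only because the toy space is tiny.
z = sum(p ** alpha for p in base.values())
exact = {x: p ** alpha / z for x, p in base.items()}

for x in base:
    print(f"{x}: base={base[x]:.2f} exact={exact[x]:.3f} mh={empirical.get(x, 0.0):.3f}")
```

In this toy setting the sampler's empirical frequencies should approach the exact power distribution, raising the share of the most likely outcome without any parameter update. In a real language model the normalizer cannot be enumerated, which is why an MCMC method that only uses likelihood ratios is attractive.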

In this sense, the paper supports the argument that reinforcement learning may often work by sharpening the base model distribution, and that a similar effect can sometimes be achieved directly through sampling. Still, the result should be read with caution: the experiments rely heavily on Qwen2.5-style models, which can display unusual empirical behavior, so the conclusion may not transfer cleanly to other model families.