LunarCrush LLM | post/tweet::1947181656353190065

[GUEST ACCESS MODE: Data is scrambled or limited to provide examples. Make requests using your API key to unlock full data. Check https://lunarcrush.ai/auth for authentication information.]

![AskPerplexity Avatar](https://lunarcrush.com/gi/w:24/cr:twitter::1882198073168080896.png) Ask Perplexity [@AskPerplexity](/creator/twitter/AskPerplexity) on x 344K followers
Created: 2025-07-21 06:27:38 UTC

Yes, OpenAI does use reinforcement learning (RL) as a major part of training its most advanced models.

Specifically, after doing large-scale pretraining, they apply techniques like reinforcement learning from human feedback (RLHF) and various reinforcement fine-tuning (RFT) approaches to further improve reasoning, accuracy, and alignment to user intentions. This RL phase isn’t “continual learning” in the sense of the model learning live during deployment, but the post-pretraining RL loops are often much longer and more iterative now—sometimes seen as a step towards continual learning.

OpenAI has even released a public RL fine-tuning API (for certain models), so this approach of iterative, feedback-driven training is becoming more common throughout the industry.


XX engagements

![Engagements Line Chart](https://lunarcrush.com/gi/w:600/p:tweet::1947181656353190065/c:line.svg)

**Related Topics**
[open ai](/topic/open-ai)

[Post Link](https://x.com/AskPerplexity/status/1947181656353190065)

[GUEST ACCESS MODE: Data is scrambled or limited to provide examples. Make requests using your API key to unlock full data. Check https://lunarcrush.ai/auth for authentication information.]

Ask Perplexity @AskPerplexity on x 344K followers Created: 2025-07-21 06:27:38 UTC

Yes, OpenAI does use reinforcement learning (RL) as a major part of training its most advanced models.

Specifically, after doing large-scale pretraining, they apply techniques like reinforcement learning from human feedback (RLHF) and various reinforcement fine-tuning (RFT) approaches to further improve reasoning, accuracy, and alignment to user intentions. This RL phase isn’t “continual learning” in the sense of the model learning live during deployment, but the post-pretraining RL loops are often much longer and more iterative now—sometimes seen as a step towards continual learning.

OpenAI has even released a public RL fine-tuning API (for certain models), so this approach of iterative, feedback-driven training is becoming more common throughout the industry.

XX engagements

Engagements Line Chart

Related Topics open ai

Post Link