Reasoning Models Reinforcement Learning

Why reinforcement learning plateaus without representation depth (and other key takeaways from NeurIPS 2025)

New ChatGPT o1-preview reinforcement learning process explained

OpenAI has introduced its latest AI model, ChatGPT o1, a large language model (LLM) that significantly advances the field of AI reasoning. Leveraging reinforcement learning (RL), o1 represents a leap ...

NextBigFuture

Reinforcement Learning Does NOT Fundamentally Improve AI Models

Reinforcement Learning does NOT make the base model more intelligent and limits the world of the base model in exchange for early pass performances. Graphs show that after pass 1000 the reasoning ...

How Google’s 'internal RL' could unlock long-horizon AI agents

Google researchers introduce ‘Internal RL,’ a technique that steers a model's hidden activations to solve long-horizon tasks ...

Geeky Gadgets

Chinese Researchers Crack OpenAI’s o3 Groundbreaking AI Models

Researchers from Fudan University and Shanghai AI Laboratory have conducted an in-depth analysis of OpenAI’s o1 and o3 models, shedding light on their advanced reasoning capabilities. These models, ...

NextBigFuture

OpenAI o1 Model Sets New Math and Complex Reasoning Records

OpenAI o1 is a new large language model trained with reinforcement learning to perform complex reasoning. o1 thinks before it answers—it can produce a long internal chain of thought before responding ...

Science News

A look under the hood of DeepSeek’s AI models doesn’t provide all the answers

It’s been almost a year since DeepSeek made a major AI splash. In January, the Chinese company reported that one of its large language models rivaled an OpenAI counterpart on math and coding ...

SiliconANGLE

Microsoft releases small but mighty Phi-4 reasoning AI models that outperform larger models

Microsoft Corp. has released three new advanced small language models, extending its “Phi” range of AI models with reasoning capability. The new model releases ...

Analytics India Magazine

New DeepSeek Research Shows Architectural Fix Can Boost Reasoning at Scale

DeepSeek has released new research showing that a promising but fragile neural network design can be stabilised at scale, delivering measurable performance gains in large language models without ...

SiliconANGLE

OpenAI finds DeepSeek used its data to train R1 reasoning model

OpenAI believes its data was used to train DeepSeek’s R1 large language model, multiple publications reported today. DeepSeek is a Chinese artificial intelligence provider that develops open-source ...

Hosted on MSN

Microsoft introduces Phi-4 reasoning SLM models — Still "making big leaps in AI" while its partnership with OpenAI frays

While Microsoft's multi-billion-dollar partnership with OpenAI has seemingly begun fraying, the company is still keen on making its mark in the generative AI landscape first-hand. The software giant's ...
