Large language models

Analyzing and Improving Chain-of-Thought Monitorability Through Information Theory

We use information theory to analyze and improve chain-of-thought monitorability, proposing training methods that raise monitor accuracy while preventing CoT degeneration.
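As a rough intuition for the information-theoretic framing (not the paper's actual method), monitorability can be pictured as the mutual information between a monitor's verdicts on CoT transcripts and the ground-truth behavior: the more bits the verdicts carry about the truth, the more the CoT reveals. The sketch below uses a plug-in MI estimate over toy labels; all names and data are illustrative.

```python
import math
from collections import Counter

def mutual_information(xs, ys):
    """Plug-in estimate of I(X; Y) in bits from paired samples."""
    n = len(xs)
    pxy = Counter(zip(xs, ys))
    px = Counter(xs)
    py = Counter(ys)
    mi = 0.0
    for (x, y), c in pxy.items():
        p_joint = c / n
        # p(x,y) * log2( p(x,y) / (p(x) * p(y)) )
        mi += p_joint * math.log2(p_joint * n * n / (px[x] * py[y]))
    return mi

# Toy example: monitor verdicts vs. ground truth for six hypothetical
# CoT transcripts. High MI means the CoT, as read by the monitor,
# carries information about the model's true behavior.
truth   = ["safe", "safe", "unsafe", "unsafe", "safe", "unsafe"]
monitor = ["safe", "safe", "unsafe", "unsafe", "safe", "safe"]
print(round(mutual_information(truth, monitor), 3))  # → 0.459
```

A perfectly informative monitor on balanced binary labels would score 1 bit; a monitor that ignores the CoT scores 0.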

Replacing thinking with tool usage enables reasoning in small language models

We replace natural language "thinking" with structured tool interactions, enabling even 3B-parameter models to learn effective test-time compute scaling.
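To make the idea concrete (this is a toy illustration, not the paper's system), "structured tool interactions" means the model emits machine-readable tool calls whose results feed back into the context, rather than free-form thinking tokens. The sketch below runs a hypothetical single-tool episode with a safe AST-based calculator; the tool name, call format, and episode loop are all assumptions.

```python
import ast
import operator

# Supported binary operators for the toy calculator tool.
OPS = {ast.Add: operator.add, ast.Sub: operator.sub,
       ast.Mult: operator.mul, ast.Div: operator.truediv}

def calc(expr):
    """Safely evaluate an arithmetic expression by walking its AST."""
    def ev(node):
        if isinstance(node, ast.Expression):
            return ev(node.body)
        if isinstance(node, ast.Constant):
            return node.value
        if isinstance(node, ast.BinOp):
            return OPS[type(node.op)](ev(node.left), ev(node.right))
        raise ValueError("unsupported expression")
    return ev(ast.parse(expr, mode="eval"))

def run_episode(tool_calls):
    """Execute a model's structured tool calls in order.

    The returned trace of (tool, args, result) tuples plays the role
    of a chain of thought: each intermediate step is externalized as
    a checkable tool interaction instead of natural-language text.
    """
    trace = []
    for name, args in tool_calls:
        assert name == "calc"  # only one tool in this sketch
        trace.append((name, args, calc(args)))
    return trace

# Hypothetical model output: two calls that decompose 17 * 24 + 9.
trace = run_episode([("calc", "17 * 24"), ("calc", "408 + 9")])
print(trace[-1][-1])  # → 417
```

One appeal of this shape is that test-time compute scales by issuing more tool calls, each of which is grounded by an external execution result rather than by the model's own generated text.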