AIToday

Researchers explain why larger language models learn rare tasks that smaller ones cannot master, even with extended training

THE DECODER · 19h ago · 2 min read

3 Key Points

  1. A study by researchers at Anthropic, Stanford, and other institutions found that small models can fail to reliably learn tasks that make up just 0.25 percent of the training data, and that only larger models learn such rarely interspersed tasks. The team trained OLMo models ranging from 4 million to 4 billion parameters on up to 210 billion tokens from the Dolma corpus, mixing in artificial tasks such as number comparison and modular addition at varying frequencies.

  2. The mechanism: smaller models fall into an "update-and-forget" loop, in which the signal from a rare task is erased by subsequent training steps on frequent tasks. Larger models have enough capacity to retain rare signals between observations. Once a large model has mastered the frequent tasks, its freed-up capacity lets it build on the rare patterns; small models rarely reach that point.

  3. The study proposes a practical alternative to scaling up model size: increasing the frequency of a target task in the training data can anchor that specific skill in smaller models.
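The data-mixing setup described in the key points can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the study's pipeline. The prompt formats, the function names (`number_comparison_example`, `modular_addition_example`, `mix_rare_task`), and the modulus of 97 are all hypothetical. Task frequency is measured here as a share of documents rather than tokens, which is another assumption. Placeholder strings stand in for Dolma. Each background document is swapped for a synthetic task example with probability `task_rate`, so a rate of 0.0025 yields roughly 0.25 percent task examples.

```python
import random
from typing import Callable, Iterator, List

# Hypothetical prompt formats; the study's exact task encodings are not described here.


def number_comparison_example(rng: random.Random) -> str:
    """Return one number-comparison example, e.g. 'Is 12 greater than 7? yes'."""
    a, b = rng.randint(0, 999), rng.randint(0, 999)
    answer = "yes" if a > b else "no"
    return f"Is {a} greater than {b}? {answer}"


def modular_addition_example(rng: random.Random, modulus: int = 97) -> str:
    """Return one modular-addition example, e.g. '40 + 60 mod 97 = 3'."""
    a, b = rng.randrange(modulus), rng.randrange(modulus)
    return f"{a} + {b} mod {modulus} = {(a + b) % modulus}"


# The pool of synthetic tasks that gets mixed in (hypothetical).
TASKS: List[Callable[[random.Random], str]] = [
    number_comparison_example,
    modular_addition_example,
]


def mix_rare_task(background: List[str], task_rate: float, seed: int = 0) -> Iterator[str]:
    """Yield the background documents in order, replacing each one with a
    synthetic task example with probability `task_rate` (0.0025 = 0.25 percent)."""
    rng = random.Random(seed)
    for doc in background:
        if rng.random() < task_rate:
            yield rng.choice(TASKS)(rng)
        else:
            yield doc


if __name__ == "__main__":
    # Placeholder strings stand in for real pretraining documents.
    background = [f"doc {i}" for i in range(100_000)]
    stream = list(mix_rare_task(background, task_rate=0.0025))
    n_task = sum(not s.startswith("doc ") for s in stream)
    print(f"{n_task} task examples, share = {n_task / len(stream):.4f}")
```

Raising `task_rate` (for example, from 0.0025 to 0.02) corresponds to the lever in the third point. In this sketch it only changes how often the rare examples appear. Whether a model of a given size actually learns them is the empirical question the study addresses.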
