Researcher trains small LLM on pre-1900 text to test whether it can derive quantum mechanics and relativity from experimental observations

Hacker News · April 28, 2026

AI Summary

  • A researcher preprocessed roughly 22 billion tokens of clean pre-1900 text, sourced from the Institutional Books, British Library books, and American Stories newspaper datasets on HuggingFace, applying aggressive filtering to remove post-1900 information leaks, including any document that mentions Einstein, quantum mechanics, or relativity (a minimal filtering sketch follows this list).
  • The project aims to train a transformer-based model that can generate conceptually correct explanations for four historical physics breakthroughs: the ultraviolet catastrophe and Planck's law, the photoelectric effect, special relativity, and general relativity. Success is defined as the model producing coherent explanations containing physically correct phrases such as "discrete packets of light" and "mass bending spacetime" (see the evaluation sketch after this list).
  • The experiment is framed as a test of whether modern LLM methods can perform meaningful out-of-distribution reasoning—the ability to derive new insights from data outside their training distribution—by replicating the intellectual feats of Einstein and Planck using only pre-1900 scientific knowledge.
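The article does not include the researcher's filtering code. The following is a minimal sketch of what such an anachronism filter could look like, assuming a simple case-insensitive keyword match over raw document text; only Einstein, quantum mechanics, and relativity are named in the summary, and the extra terms here are illustrative guesses rather than the project's actual list.

```python
import re

# Terms whose presence signals post-1900 leakage. Only the first three are
# named in the article; the remaining entries are illustrative additions.
LEAK_TERMS = [
    r"einstein",
    r"quantum\s+mechanics",
    r"relativity",
    r"photon",                    # assumption: not confirmed by the article
    r"planck'?s\s+constant",      # assumption: not confirmed by the article
]
LEAK_PATTERN = re.compile("|".join(LEAK_TERMS), re.IGNORECASE)


def is_pre1900_safe(text: str) -> bool:
    """Return True if the document shows no obvious post-1900 leakage."""
    return LEAK_PATTERN.search(text) is None


def filter_corpus(docs):
    """Yield only documents that pass the anachronism filter."""
    for doc in docs:
        if is_pre1900_safe(doc["text"]):
            yield doc


if __name__ == "__main__":
    sample = [
        {"text": "On the electrodynamics of moving bodies, Einstein (1905)."},
        {"text": "Observations on the spectrum of a heated body, 1897."},
    ]
    kept = list(filter_corpus(sample))
    print(f"{len(kept)} of {len(sample)} documents kept")  # 1 of 2
```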

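Likewise, the stated success criterion can be read as a phrase-level check on model generations followed by a human judgment of coherence. Below is a small sketch under that assumption, using exact substring matching and hypothetical topic keys; only the two quoted phrases come from the article.

```python
# Target phrases per breakthrough. The two listed strings are quoted in the
# article; the topic keys and overall structure are illustrative assumptions.
TARGET_PHRASES = {
    "planck_law": ["discrete packets of light"],
    "general_relativity": ["mass bending spacetime"],
}


def contains_target_phrase(generation: str, topic: str) -> bool:
    """Check whether a model generation contains any target phrase for a topic."""
    text = generation.lower()
    return any(phrase in text for phrase in TARGET_PHRASES.get(topic, []))


if __name__ == "__main__":
    out = "Light is emitted in discrete packets of light whose energy grows with frequency."
    print(contains_target_phrase(out, "planck_law"))  # True
```

Substring matching alone cannot establish that an explanation is coherent or genuinely derived rather than memorized, so a check like this would only serve as a first-pass signal before manual review.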