New StoSignSGD algorithm fixes SignSGD's convergence problems for training large language models on non-smooth objectives

arXiv cs.LG · April 20, 2026

AI Summary

  • SignSGD has been popular for distributed learning and foundation model training but fails to converge on non-smooth objectives common in modern ML (ReLUs, max-pools, mixture-of-experts)
  • StoSignSGD introduces structural stochasticity into the sign operator while keeping updates unbiased, addressing SignSGD's fundamental convergence limitations (a sketch of one such randomized sign operator follows this list)
  • Theoretical analysis proves StoSignSGD achieves sharp convergence rates matching lower bounds in convex optimization and improves performance in challenging non-convex non-smooth settings
  • The algorithm maintains the computational efficiency benefits of sign-based methods while extending applicability to modern neural network architectures
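The summary does not specify the exact form of StoSignSGD's randomized sign operator, so the following is a minimal sketch of one standard unbiased construction, assuming a known bound `B` on each gradient coordinate's magnitude: map coordinate g_i to +1 with probability (1 + g_i/B)/2, so the expected output is g_i/B, proportional to the true gradient, unlike the deterministic sign. The function names and the scalar `bound` are illustrative, not taken from the paper.

```python
import numpy as np

def stochastic_sign(g, bound):
    """One standard unbiased randomization of the sign operator
    (illustrative; not necessarily StoSignSGD's exact construction).
    Each coordinate is +1 with probability (1 + g_i/bound)/2 and -1
    otherwise, so E[output_i] = g_i / bound whenever bound >= |g_i|."""
    g = np.asarray(g, dtype=float)
    p_plus = 0.5 * (1.0 + np.clip(g / bound, -1.0, 1.0))
    return np.where(np.random.rand(*g.shape) < p_plus, 1.0, -1.0)

def sto_signsgd_step(params, grad, lr, bound):
    """A hypothetical StoSignSGD-style update: step along the negative
    stochastic sign, which equals -lr * grad / bound in expectation."""
    return params - lr * stochastic_sign(grad, bound)

# The key property claimed in the summary, unbiasedness: the Monte Carlo
# mean of the stochastic sign recovers g (up to the 1/bound scale),
# whereas np.sign(g) collapses all magnitudes to +/-1.
g = np.array([0.3, -0.7, 0.05])
est = np.mean([stochastic_sign(g, bound=1.0) for _ in range(100_000)], axis=0)
print(est)         # approximately [ 0.3, -0.7,  0.05]
print(np.sign(g))  # [ 1., -1.,  1.] -- the biased deterministic operator
```

Because each update still transmits only one sign bit per coordinate, this style of randomization preserves the communication and memory savings that make sign-based methods attractive for distributed training, which is consistent with the efficiency claim above.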
