New StoSignSGD algorithm fixes SignSGD's convergence failures on the non-smooth objectives used to train large language models
arXiv cs.LG · April 20, 2026
AI Summary
•SignSGD has been popular for distributed learning and foundation model training, but it can fail to converge on the non-smooth objectives common in modern ML (ReLU activations, max-pooling, mixture-of-experts routing)
•StoSignSGD introduces structural stochasticity into the sign operator while keeping updates unbiased, solving SignSGD's fundamental convergence limitations
•Theoretical analysis proves StoSignSGD achieves sharp convergence rates matching lower bounds in convex optimization and improves performance in challenging non-convex non-smooth settings
•The algorithm maintains the computational efficiency benefits of sign-based methods while extending applicability to modern neural network architectures
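The summary does not specify the exact operator StoSignSGD uses, but one standard way to inject stochasticity into a sign-style update while keeping it unbiased is 1-bit stochastic quantization: each coordinate is rounded to ±scale with probabilities chosen so the expectation equals the original gradient. The sketch below illustrates that general construction (the function names and the `scale` parameter are illustrative, not from the paper):

```python
import numpy as np

def stochastic_sign(g, scale, rng):
    """Illustrative unbiased sign operator (1-bit stochastic quantization).

    For each coordinate g_i with |g_i| <= scale, emit +scale with
    probability (1 + g_i/scale)/2 and -scale otherwise, so that
    E[output_i] = g_i. Plain SignSGD would instead emit sign(g_i),
    whose expectation is biased whenever g_i != +/-scale.
    """
    g = np.asarray(g, dtype=float)
    p_plus = 0.5 * (1.0 + np.clip(g / scale, -1.0, 1.0))
    return scale * np.where(rng.random(g.shape) < p_plus, 1.0, -1.0)

def sto_sign_step(params, grads, lr, scale, rng):
    """One optimizer step using the stochastic sign in place of
    the deterministic sign(grads) of plain SignSGD."""
    return params - lr * stochastic_sign(grads, scale, rng)
```

Because each update equals the true gradient in expectation, the scheme retains the 1-bit-per-coordinate communication cost of sign-based methods while avoiding the systematic bias that breaks SignSGD on non-smooth losses.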