Google launches Gemini 3.5 Flash with 1M-token context and Gemini Omni video generation at I/O 2026; processes 3.2 quadrillion tokens/month across 900M+ monthly users

Latent Space · May 20, 2026 · 2 min read

3 Key Points

Gemini 3.5 Flash is now generally available with a 1M-token context window, a 65k-token maximum output, four thinking levels (minimal, low, medium, high), and thought preservation across multi-turn conversations. Google claims it runs 4x faster than comparable frontier models and up to 12x faster in Antigravity (a platform for running background agents and long-horizon tasks).
Gemini Omni, a new model family combining Gemini reasoning with generative media capabilities, takes text, image, video, and audio inputs and generates and edits video in Gemini, Flow, and Shorts, with API access to follow. Omni Flash is available today in Gemini and Flow for paid users, and in Shorts and Create for free users starting this week.
Google reports processing 3.2 quadrillion tokens/month, up about 7x year-over-year from 480 trillion/month (3.2 quadrillion / 480 trillion ≈ 6.7); the Gemini app has 900M+ monthly users and is available in 230+ countries and 70+ languages. Artificial Analysis benchmarks put Gemini 3.5 Flash at Intelligence Index 55 (+9 vs. Gemini 3 Flash) and GDPval-AA 1656 Elo, with pricing of $1.50 / $9.00 per 1M input/output tokens. Arena, an independent leaderboard, ranks the model #9 in Text Arena and #9 in Code Arena: Frontend with a score of 1507 (+70 over Gemini 3 Flash).
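
To make the reported pricing concrete, here is a minimal sketch that estimates the cost of a single request from the $1.50 / $9.00 per 1M input/output token figures quoted above. The function name and the example token counts are hypothetical, and the summary does not say how thinking tokens are billed, so the sketch treats everything as plain input or output tokens.

```python
# Prices as reported by Artificial Analysis in the summary above (USD per 1M tokens).
INPUT_USD_PER_MILLION = 1.50
OUTPUT_USD_PER_MILLION = 9.00


def estimate_cost_usd(input_tokens: int, output_tokens: int) -> float:
    """Estimate the cost of one request in USD (hypothetical helper).

    Assumes flat per-token pricing with no caching discounts or tiers,
    which the summary does not confirm or rule out.
    """
    if input_tokens < 0 or output_tokens < 0:
        raise ValueError("token counts must be non-negative")
    input_cost = input_tokens / 1_000_000 * INPUT_USD_PER_MILLION
    output_cost = output_tokens / 1_000_000 * OUTPUT_USD_PER_MILLION
    return input_cost + output_cost


if __name__ == "__main__":
    # Hypothetical request: 200k input tokens and 4k output tokens.
    print(f"${estimate_cost_usd(200_000, 4_000):.4f}")
```

Under these assumptions, a prompt that fills the full 1M-token context would cost about $1.50 in input alone, before any output tokens are counted.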
