How does Gemini 3.5 Flash's computer control performance compare to other models?

On the OSWorld benchmark, Gemini 3.5 Flash scores 78.4, beating Gemini 3 Flash (65.1) and GPT-5.4 mini (72.1). GPT-5.5 scores 78.7 and Anthropic's Opus 4.8 leads at 83.4.

How is Google protecting against misuse of this screen-control feature?

Google uses adversarial training and offers two optional enterprise safeguards: one requires user confirmation for sensitive or irreversible actions, while the other automatically stops tasks when it detects indirect prompt injections. Google also recommends sandboxing, human oversight, and strict access controls.

Where can developers access this feature?

The feature is available through the Gemini API and the Gemini Enterprise Agent Platform. A Browserbase demo and a GitHub reference implementation are also available.

Back to articlesLarge Language Models

Large Language Models

Google embeds screen-control AI directly into Gemini 3.5 Flash, letting developers build agents that can see and operate computers, browsers, and phones automatically.

THE DECODER19h ago5 min read

Key takeaway

Google has integrated computer-control capability directly into its Gemini 3.5 Flash AI model, enabling it to see and operate screens across computers, browsers, and mobile devices. On benchmark tests, the model ranks highly for this task, making it practical for developers to build automated agents for software testing and office work. The feature includes built-in safeguards and is available now through Google's Gemini API and Enterprise Agent Platform.

Summaries like this, in your inbox every morning.

3 Key Points

What happened
Google integrated "Computer Use" into Gemini 3.5 Flash, allowing the model to see, understand, and interact with computers, browsers, and mobile devices on its own. Previously this capability was only available as a separate Gemini 2.5 model. The feature is now available through the Gemini API and the Gemini Enterprise Agent Platform.
Why it matters
On the OSWorld benchmark, Gemini 3.5 Flash scores 78.4, beating Gemini 3 Flash (65.1) and GPT-5.4 mini (72.1), putting it among the top-performing models for computer interaction tasks. This opens the door for developers to build agents that automate software testing, office tasks, and browser workflows across multiple device types.
What to watch
Google has built in two optional enterprise safeguards—one requiring user confirmation for sensitive actions, and another automatically stopping tasks when indirect prompt injections are detected. The company also recommends sandboxing, human oversight, and strict access controls to guard against abuse.

FAQ

How does Gemini 3.5 Flash's computer control performance compare to other models?: On the OSWorld benchmark, Gemini 3.5 Flash scores 78.4, beating Gemini 3 Flash (65.1) and GPT-5.4 mini (72.1). GPT-5.5 scores 78.7 and Anthropic's Opus 4.8 leads at 83.4.
How is Google protecting against misuse of this screen-control feature?: Google uses adversarial training and offers two optional enterprise safeguards: one requires user confirmation for sensitive or irreversible actions, while the other automatically stops tasks when it detects indirect prompt injections. Google also recommends sandboxing, human oversight, and strict access controls.
Where can developers access this feature?: The feature is available through the Gemini API and the Gemini Enterprise Agent Platform. A Browserbase demo and a GitHub reference implementation are also available.

Discussion

No comments yet. Be the first to share your thoughts!

Unable to generate summary — the article body provided contains no news content, only a newsletter registration prompt.

Nikkei AI Stocks1h ago

GitHub Copilot's shared AI harness matches rival model-vendor tools on task completion while using fewer tokens, letting developers mix-and-match models without sacrificing performance.

GitHub Copilot Blog4h ago

OpenAI will release GPT-5.6 in limited preview with Trump administration case-by-case approval, a less restrictive arrangement than the export controls imposed on rival Anthropic.

The Verge AI4h ago

The Trump administration is pressuring OpenAI to limit early access to its new GPT 5.6 model to select partners before a broader public release, shifting from its previous hands-off AI stance.

TechCrunch AI4h ago

Visa partners with AI and stablecoin firms to build payments infrastructure for autonomous agents and digital assets, diversifying revenue beyond traditional card fees.

Top Companies AI — US (1/2)7h ago

TrueFoundry acquires MLOps pioneer Seldon AI to combine infrastructure for deploying AI agents at scale, addressing a gap where only 14% of enterprises have moved AI pilots into production.

Top Companies AI — US (1/2)7h ago

Stay ahead with AI news

Get curated AI news from 200+ sources delivered daily to your inbox. Free to use.

Get Started Free

Free · takes 30 seconds · unsubscribe anytime

5 minutes a day. The AI essentials.

200+ sources · Email / LINE / Slack

Get it free →