AIToday

Google embeds screen-control AI directly into Gemini 3.5 Flash, letting developers build agents that can see and operate computers, browsers, and phones automatically.

THE DECODER · 19h ago · 5 min read
Key takeaway

Google has integrated computer-control capability directly into its Gemini 3.5 Flash AI model, enabling it to see and operate screens across computers, browsers, and mobile devices. The model scores 78.4 on the OSWorld benchmark for computer interaction, which makes it practical for developers to build automated agents for software testing and office work. The feature includes built-in safeguards and is available now through Google's Gemini API and Enterprise Agent Platform.

3 Key Points

  • What happened

    Google integrated "Computer Use" into Gemini 3.5 Flash, allowing the model to see, understand, and interact with computers, browsers, and mobile devices on its own. Previously this capability was only available as a separate Gemini 2.5 model. The feature is now available through the Gemini API and the Gemini Enterprise Agent Platform.

  • Why it matters

    On the OSWorld benchmark, Gemini 3.5 Flash scores 78.4, beating Gemini 3 Flash (65.1) and GPT-5.4 mini (72.1), putting it among the top-performing models for computer interaction tasks. This opens the door for developers to build agents that automate software testing, office tasks, and browser workflows across multiple device types.

  • What to watch

    Google has built in two optional enterprise safeguards: one requires user confirmation for sensitive actions, and the other automatically stops tasks when it detects indirect prompt injections. The company also recommends sandboxing, human oversight, and strict access controls to guard against abuse.
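
To make the two safeguards concrete, here is a minimal sketch of how an agent loop might apply them. It does not use the Gemini API. Every name in it (`Action`, `SENSITIVE_KINDS`, `injection_flag`, `run_actions`) is hypothetical and chosen for illustration only. The injection signal is assumed to come from some upstream detector that the sketch does not implement.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List

# Hypothetical set of action kinds treated as sensitive or irreversible.
SENSITIVE_KINDS = {"purchase", "delete", "send_email"}


@dataclass
class Action:
    kind: str                     # e.g. "click", "type", "purchase"
    target: str                   # description of the UI element
    injection_flag: bool = False  # assumed to be set by an upstream detector


def run_actions(
    actions: Iterable[Action],
    confirm: Callable[[Action], bool],
    execute: Callable[[Action], None],
) -> List[str]:
    """Run proposed actions, applying both safeguards in order."""
    log: List[str] = []
    for action in actions:
        # Safeguard 2: stop the whole task on a suspected prompt injection.
        if action.injection_flag:
            log.append(f"stopped:{action.kind}")
            break
        # Safeguard 1: sensitive actions need explicit user confirmation.
        if action.kind in SENSITIVE_KINDS and not confirm(action):
            log.append(f"skipped:{action.kind}")
            continue
        execute(action)
        log.append(f"executed:{action.kind}")
    return log
```

In this sketch a declined confirmation skips only that action, while a suspected injection halts everything after it. That mirrors the article's wording of "confirmation" versus "stopping tasks", but how the real feature behaves in each case is not specified here.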

FAQ

How does Gemini 3.5 Flash's computer control performance compare to other models?
On the OSWorld benchmark, Gemini 3.5 Flash scores 78.4, beating Gemini 3 Flash (65.1) and GPT-5.4 mini (72.1). GPT-5.5 scores 78.7 and Anthropic's Opus 4.8 leads at 83.4.

How is Google protecting against misuse of this screen-control feature?
Google uses adversarial training and offers two optional enterprise safeguards: one requires user confirmation for sensitive or irreversible actions, while the other automatically stops tasks when it detects indirect prompt injections. Google also recommends sandboxing, human oversight, and strict access controls.

Where can developers access this feature?
The feature is available through the Gemini API and the Gemini Enterprise Agent Platform. A Browserbase demo and a GitHub reference implementation are also available.
