On Wednesday, Google unveiled Gemini 2.0, the subsequent era of its AI-model household, beginning with an experimental launch known as Gemini 2.0 Flash. The mannequin household can generate textual content, photos, and speech whereas processing a number of forms of enter together with textual content, photos, audio, and video. It is just like multimodal AI fashions like GPT-4o, which powers OpenAI’s ChatGPT.
“Gemini 2.0 Flash builds on the success of 1.5 Flash, our hottest mannequin but for builders, with enhanced efficiency at equally quick response instances,” mentioned Google in an announcement. “Notably, 2.0 Flash even outperforms 1.5 Pro on key benchmarks, at twice the pace.”
Gemini 2.0 Flash—which is the smallest mannequin of the two.0 household by way of parameter rely—launches in the present day by means of Google’s developer platforms like Gemini API, AI Studio, and Vertex AI. Nonetheless, its picture era and text-to-speech options stay restricted to early entry companions till January 2025. Google plans to combine the tech into merchandise like Android Studio, Chrome DevTools, and Firebase.
The corporate addressed potential misuse of generated content material by implementing SynthID watermarking know-how on all audio and pictures created by Gemini 2.0 Flash. This watermark seems in supported Google merchandise to establish AI-generated content material.
Google’s latest bulletins lean closely into the idea of agentic AI methods that may take motion for you. “During the last 12 months, we now have been investing in growing extra agentic fashions, that means they’ll perceive extra concerning the world round you, suppose a number of steps forward, and take motion in your behalf, along with your supervision,” mentioned Google CEO Sundar Pichai in an announcement. “At present we’re excited to launch our subsequent period of fashions constructed for this new agentic period.”