Over the past year we have continued to make incredible progress in artificial intelligence. Today we are releasing the first model in the Gemini 2.0 family of models: an experimental version of Gemini 2.0 Flash. It is our workhorse model, with low latency and enhanced performance at the cutting edge of our technology, at scale.
We are also sharing the frontiers of our agentic research by showcasing prototypes enabled by Gemini 2.0's native multimodal capabilities.
Gemini 2.0 Flash
Gemini 2.0 Flash builds on the success of 1.5 Flash, our most popular model for developers, with enhanced performance at similarly fast response times. Notably, 2.0 Flash outperforms 1.5 Pro on key benchmarks, at twice the speed. 2.0 Flash also comes with new capabilities. In addition to supporting multimodal inputs like images, video and audio, 2.0 Flash now supports multimodal output, such as natively generated images mixed with text and multilingual text-to-speech (TTS) audio. It can also natively call tools like Google Search and code execution, as well as third-party user-defined functions.
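To make the native tool calling concrete, here is a minimal sketch of invoking the experimental model with the built-in code execution tool through the Gemini API's Python SDK. The SDK surface and the "gemini-2.0-flash-exp" model name are assumptions based on the experimental release and may differ in later versions:

```python
# Minimal sketch: Gemini 2.0 Flash with the native code execution tool.
# Assumes the google-generativeai SDK and the experimental model name
# "gemini-2.0-flash-exp"; both may change across releases.
import google.generativeai as genai

genai.configure(api_key="YOUR_API_KEY")  # key from Google AI Studio

# Enable the built-in code execution tool alongside ordinary text generation.
model = genai.GenerativeModel(
    model_name="gemini-2.0-flash-exp",
    tools="code_execution",
)

response = model.generate_content(
    "Write and run Python code to sum the first 50 prime numbers."
)
print(response.text)  # includes the generated code and its execution result
```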
Our goal is to get our models into people's hands safely and quickly. Over the past month, we have been sharing early experimental versions of Gemini 2.0 and receiving great feedback from developers.
Gemini 2.0 Flash is now available as an experimental model to developers via the Gemini API in Google AI Studio and Vertex AI, with multimodal input and text output available to all developers, and text-to-speech and native image generation available to early-access partners. General availability will follow in January, along with more model sizes.
To help developers build dynamic and interactive applications, we are also releasing a new Multimodal Live API that has real-time audio and video streaming input and the ability to use multiple, combined tools. More information about 2.0 Flash and the Multimodal Live API can be found in our developer blog.
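As an illustration of the session-based pattern such a live API implies, here is a hedged sketch of a simple text-in, text-out session; streaming audio and video input follow the same connect/send/receive shape. The google-genai client and method names below reflect an early version of the SDK and are assumptions that may have changed:

```python
# Minimal sketch of a Multimodal Live API session (text in, text out).
# Assumes the google-genai SDK; live sessions run over a websocket, and the
# same pattern extends to real-time audio and video streaming.
import asyncio
from google import genai

client = genai.Client(api_key="YOUR_API_KEY")

async def main():
    config = {"response_modalities": ["TEXT"]}
    async with client.aio.live.connect(
        model="gemini-2.0-flash-exp", config=config
    ) as session:
        # Send one user turn and stream the model's reply as it arrives.
        await session.send(input="Summarize the Multimodal Live API.",
                           end_of_turn=True)
        async for response in session.receive():
            if response.text:
                print(response.text, end="")

asyncio.run(main())
```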
Gemini 2.0 available in the Gemini app, our AI assistant
Also starting today, Gemini users globally can access a chat-optimized version of 2.0 Flash Experimental by selecting it in the model drop-down menu on desktop and mobile web, and it will be available in the Gemini mobile app soon. With this new model, users can experience an even more helpful Gemini assistant.
Early next year, we will expand Gemini 2.0 to more Google products.
Unlocking agentic experiences with Gemini 2.0
Gemini 2.0 Flash's native user interface action capabilities, along with other improvements like multimodal reasoning, long-context understanding, complex instruction following and planning, compositional function calling, native tool use and improved latency, all work in concert to enable a new class of agentic experiences.
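Of these capabilities, function calling is the easiest to picture in code. Below is a minimal, hypothetical sketch of exposing a user-defined function to the model via the google-generativeai SDK, with automatic function calling letting the model fold the call's result into its answer; the get_weather helper is an invented example, not a Google API:

```python
# Hypothetical sketch of user-defined function calling with Gemini 2.0 Flash.
# Assumes the google-generativeai SDK; get_weather is a toy stand-in for a
# real third-party function.
import google.generativeai as genai

genai.configure(api_key="YOUR_API_KEY")

def get_weather(city: str) -> str:
    """Return a toy weather report for the given city."""
    return f"Sunny, 22 degrees Celsius in {city}."

model = genai.GenerativeModel("gemini-2.0-flash-exp", tools=[get_weather])

# With automatic function calling, the SDK executes get_weather when the
# model requests it and feeds the result back for the final answer.
chat = model.start_chat(enable_automatic_function_calling=True)
response = chat.send_message("Should I pack an umbrella for Lisbon?")
print(response.text)
```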
The practical application of AI agents is a research area full of exciting possibilities. We are exploring this new frontier with a series of prototypes that can help people accomplish tasks and get things done. These include an update to Project Astra, our research prototype exploring future capabilities of a universal AI assistant; the new Project Mariner, which explores the future of human-agent interaction, starting with your browser; and Jules, an AI-powered code agent that can help developers.
We are still in the early stages of development, but we are excited to see how trusted testers use these new capabilities and what lessons we can learn, so we can make them more widely available in products in the future.
Project Astra: agents using multimodal understanding in the real world
Since we introduced Project Astra at I/O, we have been learning from trusted testers using it on Android phones. Their valuable feedback has helped us better understand how a universal AI assistant could work in practice, including implications for safety and ethics. Improvements in the latest version, built with Gemini 2.0, include:
- New tool use: With Gemini 2.0, Project Astra can use Google Search, Lens and Maps, making it more useful as an assistant in your everyday life.
- Better memory: We have improved Project Astra's ability to remember things while keeping you in control. It now has up to 10 minutes of in-session memory and can remember more of the conversations you had with it in the past, so it is better personalized to you.
We are working to bring these types of capabilities to Google products like the Gemini app, our AI assistant, and to other form factors like glasses. And we are starting to expand our trusted tester program to more people, including a small group that will soon begin testing Project Astra on prototype glasses.
https://www.youtube.com/watch?v=hiiljt8jeri
Project Mariner: agents that can help you accomplish complex tasks
Project Mariner is an early research prototype built with Gemini 2.0 that explores the future of human-agent interaction, starting with your browser. As a research prototype, it is able to understand and reason across information on your browser screen, including pixels and web elements like text, code, images and forms, and then uses that information via an experimental Chrome extension to complete tasks for you.
When evaluated against the WebVoyager benchmark, which tests agent performance on end-to-end web tasks, Project Mariner achieved a state-of-the-art result of 83.5% working as a single-agent setup.
It's still early, but Project Mariner shows that it is becoming technically possible to navigate within a browser, even though it is not always accurate and is slow to complete tasks today, which will improve rapidly over time.
To build this safely and responsibly, we are conducting active research on new types of risks and mitigations, while keeping humans in the loop. For example, Project Mariner can only type, scroll or click in the active tab of your browser, and it asks users for final confirmation before taking certain sensitive actions, like buying something.
Trusted testers are starting to test Project Mariner using an experimental Chrome extension now, and we are beginning conversations with the web ecosystem in parallel.
https://www.youtube.com/watch?v=2xjqlpqhtyo
Jules: agents for developers
Next, we are exploring how AI agents can help developers with Jules, an experimental AI-powered code agent that integrates directly into a GitHub workflow. It can tackle an issue, develop a plan and execute it, all under a developer's direction and supervision. This effort is part of our long-term goal of building AI agents that are helpful in all domains, including coding.
More information about this ongoing experiment can be found in our developer blog post.
Agents in games and other domains
Google DeepMind has a long history of using games to help AI models become better at following rules, planning and logic. Just last week, for example, we introduced Genie 2, our AI model that can create an endless variety of playable 3D worlds, all from a single image. Building on this tradition, we have built agents using Gemini 2.0 that can help you navigate the virtual world of video games. They can reason about the game based solely on the action on the screen, and offer suggestions for what to do next in real-time conversation.
We are collaborating with leading game developers like Supercell to explore how these agents work, testing their ability to interpret rules and challenges across a diverse range of games, from strategy titles like “Clash of Clans” to farming simulators like “Hay Day”.
Beyond acting as virtual gaming companions, these agents can even tap into Google Search to connect you with the wealth of gaming knowledge on the web.
https://www.youtube.com/watch?v=IKUNNHJBGSC
In addition to exploring agentic capabilities in the virtual world, we are experimenting with agents that can help in the physical world by applying Gemini 2.0's spatial reasoning capabilities to robotics. While it is still early, we are excited about the potential of agents that can assist in the physical environment.
You can learn more about these research prototypes and experiments at labs.google.
Building responsibly in the agentic era
Gemini 2.0 Flash and our research prototypes allow us to test and iterate on new capabilities at the cutting edge of AI research that will eventually make Google products more helpful.
As we develop these new technologies, we recognize the responsibility it entails, and the many questions AI agents open up for safety and security. That is why we are taking an exploratory and gradual approach to development, conducting research on multiple prototypes, iteratively implementing safety training, working with trusted testers and external experts, and performing extensive risk assessments and safety and assurance evaluations.
For example:
- As part of our safety process, we work with our Responsibility and Safety Committee (RSC), our longstanding internal review group, to identify and understand potential risks.
- Gemini 2.0's reasoning capabilities have enabled major advancements in our AI-assisted red teaming approach, including the ability to go beyond simply detecting risks to now automatically generating evaluations and training data to mitigate them. This means we can more efficiently optimize the model for safety at scale.
- As Gemini 2.0's multimodality increases the complexity of potential outputs, we will continue to evaluate and train the model across image and audio input and output to help improve safety.
- With Project Astra, we are exploring potential mitigations against users unintentionally sharing sensitive information with the agent, and we have already built in privacy controls that make it easy for users to delete sessions. We are also continuing to research ways to ensure AI agents act as reliable sources of information and do not take unintended actions on your behalf.
- With Project Mariner, we are working to ensure the model learns to prioritize user instructions over third-party prompt injection attempts, so it can identify potentially malicious instructions from external sources and prevent misuse. This prevents users from being exposed to fraud and phishing attempts through malicious instructions hidden in emails, documents or websites.
We firmly believe the only way to build AI is to be responsible from the start, and we will continue to prioritize making safety and responsibility a key element of our model development process as we advance our models and agents.
Gemini 2.0, AI agents and beyond
Today's releases mark a new chapter for our Gemini model. With the release of Gemini 2.0 Flash and the series of research prototypes exploring agentic possibilities, we have reached an exciting milestone in the Gemini era. And we are looking forward to continuing to safely explore all the new possibilities within reach as we build toward AGI.
By Demis Hassabis, CEO of Google DeepMind, and Koray Kavukcuoglu, CTO of Google DeepMind, on behalf of the Gemini team