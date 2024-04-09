Google announced at its Google Cloud Next event that Gemini 1.5 Pro can now listen: the model can process uploaded audio files and extract information from calls and videos without relying on a transcript.

Gemini 1.5 Pro was first launched in February and is currently Google’s most powerful language model, outperforming even Gemini Ultra. Its headline feature is the amount of context it can process: from 128,000 up to 1 million tokens. One million tokens is equivalent to about 700,000 words or about 30,000 lines of code, which is about four times more data than Anthropic’s flagship model, Claude 3, can handle and about eight times more than OpenAI’s GPT-4 Turbo.

Gemini 1.5 Pro will be available in preview on Vertex AI, Google’s platform where business customers can build their own chatbots.

The Imagen 2 text-to-image model has also been updated and now offers “inpainting” and “outpainting” functions, which let users add elements to images or remove them. All images generated by the model can also carry a SynthID mark, an invisible watermark that indicates the image’s origin.

Sources: The Verge, TechCrunch