Google Unveils Gemini 1.5 Pro With 1 Million Token Context Window
Google announced Gemini 1.5 Pro on February 15, 2024, with a context window of up to 1 million tokens. The expansion lets the model process and reason over substantially larger amounts of information in a single prompt, including entire codebases, multiple lengthy documents, or up to an hour of video. At launch the model came with a standard 128,000 token context window, and the 1 million token window was offered to a limited group of developers and enterprise customers in a private preview.
The larger context window is the model's main differentiator, enabling developers to build applications that must understand and synthesize information from extensive inputs. For instance, users can submit a 1,500-page PDF document, a 400-page book, or 11 hours of audio and have Gemini 1.5 Pro analyze the material in one pass. This capability is particularly useful for tasks such as summarizing complex legal documents, analyzing long-form research papers, or understanding intricate code structures.
Beyond the expanded context window, Gemini 1.5 Pro is built on a new "Mixture-of-Experts" (MoE) architecture. Rather than running its entire neural network on every request, an MoE model activates only the expert pathways most relevant to a given input, which leads to faster inference and lower computational cost. Google stated that this MoE architecture is a significant step towards more efficient and scalable large language models.
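To make the routing idea concrete, the following is a minimal sketch of top-k expert gating in plain Python. It is not Google's implementation, whose details have not been published in this form. The function names (`make_expert`, `moe_forward`), the toy "experts" that merely scale their input, and the choice of `top_k=2` are all assumptions made for illustration.

```python
import math


def softmax(scores):
    """Turn raw gate scores into probabilities that sum to 1."""
    peak = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def make_expert(scale):
    """Hypothetical expert: a stand-in for a feed-forward sub-network."""
    return lambda x: [scale * v for v in x]


def moe_forward(x, gate_weights, experts, top_k=2):
    """Route input x to the top_k highest-scoring experts and mix their outputs.

    gate_weights holds one weight row per expert (assumed, illustrative values).
    Returns the mixed output and the indices of the experts that actually ran.
    """
    # One gate score per expert: dot product of that expert's row with x.
    scores = [sum(w * v for w, v in zip(row, x)) for row in gate_weights]
    probs = softmax(scores)

    # Keep only the top_k experts; the others are never evaluated.
    ranked = sorted(range(len(experts)), key=lambda i: probs[i], reverse=True)
    chosen = ranked[:top_k]

    # Renormalize the selected probabilities so the mixing weights sum to 1.
    norm = sum(probs[i] for i in chosen)
    output = [0.0] * len(x)
    for i in chosen:
        expert_out = experts[i](x)
        weight = probs[i] / norm
        output = [o + weight * v for o, v in zip(output, expert_out)]
    return output, chosen


# Usage: four toy experts, of which only two run for this input.
experts = [make_expert(s) for s in (0.5, 1.0, 2.0, 3.0)]
gate_weights = [[1.0, 0.0], [0.0, 1.0], [0.0, 0.0], [-1.0, -1.0]]
output, chosen = moe_forward([2.0, 1.0], gate_weights, experts, top_k=2)
print("experts used:", chosen)
print("output:", output)
```

The efficiency gain in this sketch comes from the loop over `chosen`: the work per input scales with `top_k`, not with the total number of experts, so capacity can grow without a matching rise in per-request compute.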
The model's multimodal capabilities have also been improved, allowing it to process text, images, audio, and video together within its large context window. This integrated approach to multimodal reasoning is expected to open up new possibilities for AI-powered tools and services across industries. Developers can access Gemini 1.5 Pro via Google AI Studio and Vertex AI.
Original source: The Economist