Google Unleashes Gemini - Transforming Bard into a Multimodal AI Powerhouse

Google has announced that Bard, its generative AI chatbot and ChatGPT competitor, will now be powered by Gemini, the company's latest and most capable AI model.

Gemini comes in three versions tailored to different applications. The Nano variant is designed to run natively and offline on Android devices. The Pro version is positioned to power numerous Google AI services and serves as the foundation of the revamped Bard. The Ultra version, the most capable of the three, is slated to launch in the coming year and targets data centers and enterprise applications.

“This is the biggest single quality improvement of Bard since we’ve launched. Gemini Pro will first power text-based prompts in Bard to start, but it will expand to multimodal support, meaning texts and images or other modalities, in the coming months,” said Sissie Hsiao, VP and GM of Assistant and Bard at Google.

Gemini is widely seen as Google's answer to OpenAI's GPT-4, which was released in March 2023.

Understanding both video and audio content

One of Gemini's key strengths is its ability to understand and interact with both video and audio content. Google CEO Sundar Pichai envisions Gemini being integrated across Google products, including search, ad products, and the Chrome browser, where it would shape a wide range of user experiences.

Although Gemini is initially available only in English, Google plans to expand language support to serve a global user base. Starting December 13, developers and enterprise users can access Gemini Pro through Google AI Studio or Vertex AI in Google Cloud.

Beyond its multifaceted applications, Gemini boasts improved efficiency, surpassing its predecessors in terms of speed and cost-effectiveness. It was trained on Google's Tensor Processing Units, reflecting a commitment to technological advancement and resource optimization.

With Gemini, Google aims to improve Bard's ability to understand and summarize content, reason, brainstorm, write, and plan.

Further plans for Bard

Bard Advanced will be introduced in 2024, featuring Ultra, Gemini's most advanced model. This upgrade will enable Bard to understand and act on diverse types of information, including text, images, audio, video, and code, with multimodal reasoning capabilities. Gemini Ultra can also understand, explain, and generate high-quality code in popular programming languages.

Google will initiate a tester program for Bard Advanced before a broader user release early next year, subjecting it to additional safety checks. This update follows several enhancements to Bard since its debut eight months ago, with recent features including the ability to answer questions about YouTube videos, integration with Google apps like Gmail, Docs, Drive, and more, and the capability to double-check answers to prevent AI "hallucination" based on false information.
