It seems everyone is talking about artificial intelligence and wondering how it will affect the world around us. Here, you can find out more about Google’s new Gemini AI system and read about Invertase’s new extensions.
OpenAI’s ChatGPT chatbot launched on November 30th 2022 and provided the mass market with a glimpse of AI’s potential, even if many considered this system (based on a large language model) more of a gimmick than ‘real’ AI.
Nonetheless, Microsoft joined the AI chatbot scene in February 2023, launching Copilot as a bolt-on feature of its Bing search engine and Edge web browser. All the while, the world held its collective breath and waited to see Google’s take on artificial intelligence.
That came on March 21st 2023 in the form of Bard – a ‘conversational AI’ initially powered by Google’s LaMDA (Language Model for Dialogue Applications). Wasting no time, in May 2023 Google introduced PaLM 2 (Pathways Language Model) for Bard, bringing advancements in reasoning, coding and mathematics, translations and more.
Introducing Google Gemini
And then the big one – Google’s DeepMind artificial intelligence laboratory (based in London, UK) announced Gemini on December 6th 2023, hailing it “the most significant quality improvement Bard has seen since its launch.” Essentially one family of multimodal large language models (LLMs), Gemini can not only understand, generate and translate text, it can analyse and create code, interpret images and video, and process audio.
On February 8th 2024, Google announced that it was dropping the Bard name and running with simply ‘Gemini’. Google also revealed details of a new mobile experience for Gemini and Gemini Advanced with the launch of new Android and iOS apps.
Gemini Ultra, Pro and Nano
Gemini was launched in three forms: Ultra, Pro and Nano. Gemini Ultra is Google’s most capable and largest model for highly complex tasks. Gemini Pro is pitched as Google’s best model for scaling across a wide range of tasks. And finally, Gemini Nano is billed as Google’s most efficient model for on-device tasks.
How to access Google Gemini
Google’s Gemini AI (previously called Bard), which is widely available to the public, started using a version of Gemini Pro for its English language queries in December 2023. While initially limited to text-based input, it has since gained the ability to access and process information from the real world and engage in multi-turn conversations.
On February 8th, Gemini Advanced became available in more than 150 countries and territories in English, with more languages to follow. Gemini Advanced is part of a brand new Google One AI Premium Plan costing $19.99 (£18.99) per month, starting with a two-month free trial.
How does Gemini compare to ChatGPT?
GPT-4 is the latest system from OpenAI, and in all but a few instances Google reckons its Gemini Ultra AI model outperforms its rival. “From natural image, audio and video understanding to mathematical reasoning, Gemini Ultra’s performance exceeds current state-of-the-art results on 30 of the 32 widely-used academic benchmarks used in large language model (LLM) research and development,” reports Google.
“With a score of 90.0%, Gemini Ultra is the first model to outperform human experts on MMLU (massive multitask language understanding), which uses a combination of 57 subjects such as math, physics, history, law, medicine and ethics for testing both world knowledge and problem-solving abilities.”
Google also states that “Gemini Ultra also achieves a state-of-the-art score of 59.4% on the new MMMU (Massive Multi-discipline Multimodal Understanding) benchmark, which consists of multimodal tasks spanning different domains requiring deliberate reasoning.”
How might Google Gemini AI be useful to businesses?
Wondering how Google Gemini AI can improve your business? Here are three ways it can give your operation a boost…
1. Enhanced Customer Experience:
Personalised interactions: Gemini’s sophisticated language understanding can power chatbots and virtual assistants that provide genuinely personalised experiences for customers. Imagine an AI assistant that not only answers questions but also recommends products based on a customer’s past purchases and preferences, building relationships and boosting loyalty.
Conversational content creation: Gemini’s expertise in text generation can be combined with Gemini’s reasoning to create engaging and informative content for customer interactions. Think personalised FAQ sections, tailored product descriptions, or even AI-powered blog posts that address specific customer concerns, all in a natural and conversational style.
2. Streamlined Internal Operations:
Automated data analysis and reporting: Gemini’s ability to process and understand complex information can be harnessed to analyse vast amounts of internal data. Imagine automatically generating reports on employee performance, identifying potential supply chain bottlenecks, or summarising customer feedback, freeing up human resources for higher-level tasks.
Content curation and summarisation: Gemini’s text generation capabilities can be used to summarise lengthy reports, extract key information from various sources, or even create training materials based on existing documentation. This saves employees time and ensures everyone has access to the most relevant information.
3. Innovation and Product Development:
Brainstorming and Idea Generation: Gemini can be used as a brainstorming partner, generating new ideas, exploring different possibilities, and identifying potential solutions to challenges. This can be invaluable for product development teams seeking fresh perspectives and innovative approaches.
Rapid Prototyping and Testing: Gemini’s ability to generate different text formats can be used to create rough prototypes for marketing materials, user interfaces, or even product features. This allows for rapid testing and iteration, accelerating the development process and ensuring products meet user needs.
These are just a few examples of how Google’s Gemini AI can benefit businesses across various industries. As the technology continues to evolve, we can expect even more innovative and transformative applications to emerge, revolutionising the way businesses operate and interact with their customers.
Invertase hands-on
Leveraging its experience building extensions for Google’s PaLM API, Invertase was eager to adopt Gemini and take advantage of its advanced capabilities.
What are Firebase Extensions?
First, a quick lesson on extensions and specifically Firebase Extensions. Put simply, Firebase Extensions are pre-built solutions, often created by Google and its partners, that simplify and expedite the development process. Firebase Extensions can save you the hours and effort of implementing your own custom code and logic. Installation and configuration are handled entirely through a UI.
What are the new extensions?
Chatbot with Gemini
Following the success of Chatbot with PaLM, Firebase, with the help of Invertase, has built its successor. This extension allows developers to instantly deploy a chatbot, with chat state and history managed through a Cloud Firestore collection. Wildcards in the specified collection path make supporting multiple chat sessions just as simple.
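To make the wildcard mechanism concrete, here is a minimal sketch of resolving a per-user collection path and shaping a chat message. The path template, user id, and the "prompt" field name are illustrative assumptions; the actual names are chosen when the extension is configured.

```typescript
// Sketch: a configured collection path containing a wildcard lets one
// extension instance serve many chat sessions.
const collectionPathTemplate = "users/{uid}/messages"; // hypothetical configured path
const uid = "user_123"; // hypothetical user id

// Resolve the wildcard for this user's chat session:
const collectionPath = collectionPathTemplate.replace("{uid}", uid);

// A new user message; the extension watches the collection and writes
// the model's response back to the same document.
const message = { prompt: "What are your opening hours?" };

// With the Firestore SDK this write would look like:
// await addDoc(collection(db, collectionPath), message);
console.log(collectionPath); // "users/user_123/messages"
```

The extension then appends Gemini's reply to the same document, so the collection itself doubles as the chat history.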
The extension supports both Gemini providers, Google AI and Vertex AI. For Google AI you will have to provide a valid API key during the extension installation, which is kept securely as a GCP secret. If Vertex AI is selected instead, the extension’s service account is granted the aiplatform.user role during installation, and no API key is required.
The extension is also configurable, allowing you to preface prompts with custom context, per instance of the extension or per chat instance/session. With Gemini’s 32k token context window, this allows developers to extensively customise the chatbot they provide to their users. You can also configure the sampling and response parameters provided to Gemini. For example, the nucleus sampling threshold (“top p”) gives you more control over the vocabulary Gemini will sample from, while temperature controls the randomness and diversity of the output.
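As a rough sketch of what such a configuration might look like, the object below uses the field names from the public Gemini API’s generation settings (temperature, topP, maxOutputTokens); the extension’s own configuration labels may differ, and the values are illustrative only.

```typescript
// Hedged sketch of sampling/response parameters like those the extension exposes.
interface GenerationConfig {
  temperature: number;     // randomness/diversity of output (lower = more focused)
  topP: number;            // nucleus sampling threshold
  maxOutputTokens?: number; // cap on response length
}

const chatbotConfig: GenerationConfig = {
  temperature: 0.4,  // fairly focused answers, e.g. for a support chatbot
  topP: 0.95,        // sample from tokens covering 95% of probability mass
  maxOutputTokens: 1024,
};

console.log(chatbotConfig.temperature); // 0.4
```

Lower temperature and top-p values produce more predictable replies; raising them is better suited to creative tasks.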
Multimodal Tasks with Gemini
The Multimodal Tasks with Gemini extension watches a Cloud Firestore collection. When a document is written, the extension extracts pre-configured fields from the document and substitutes them into a pre-configured Handlebars prompt template. As an example, the prompt could be ‘Write me a short story about {{ subject }}’. The extension will extract the subject field from the incoming document and insert it into the prompt for Gemini to generate a response.
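The substitution step described above can be sketched as follows. The real extension uses the Handlebars library; this toy version only handles simple {{ field }} placeholders, and the subject value is made up for illustration.

```typescript
// Minimal stand-in for Handlebars-style {{ field }} substitution.
function fillTemplate(template: string, doc: Record<string, string>): string {
  return template.replace(
    /\{\{\s*(\w+)\s*\}\}/g,
    (_match: string, field: string) => doc[field] ?? ""
  );
}

// The extension reads the "subject" field from the incoming Firestore document:
const prompt = fillTemplate(
  "Write me a short story about {{ subject }}",
  { subject: "a robot learning to paint" }
);

console.log(prompt);
// "Write me a short story about a robot learning to paint"
```

The completed prompt is then sent to Gemini, and the response is written back onto the triggering document.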
The most exciting feature of this extension is how it leverages Gemini Pro Vision and its multimodal prompt support. For example, suppose we have a Cloud Storage bucket containing movie posters. We could give the extension the following prompt: ‘Answer the following question about this movie: {{ question }}’.
When a document is written to the specified collection (or collection group) with a question field and an image field containing a Cloud Storage link to an image, Gemini will be prompted with both the completed prompt and the image pulled from storage, and the generated response will be written back to Firestore.
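Based on that description, a triggering document might look like the sketch below. The question/image field names come from the text above, but the output field name, bucket, and file path are hypothetical placeholders, not the extension’s documented defaults.

```typescript
// Sketch of a document that would trigger the multimodal extension.
interface MultimodalTaskDoc {
  question: string; // substituted into the Handlebars prompt template
  image: string;    // Cloud Storage link to the image to include in the prompt
  output?: string;  // hypothetical field the extension writes the response to
}

const taskDoc: MultimodalTaskDoc = {
  question: "Who directed this film?",
  image: "gs://my-project.appspot.com/posters/poster-001.png", // hypothetical path
};

// In an app this would be written with the Firestore SDK, e.g.:
// await addDoc(collection(db, "posterQuestions"), taskDoc);
console.log(taskDoc.question); // "Who directed this film?"
```

Once the extension processes the document, the generated answer appears on the same document, so the client can simply listen for the update.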
Summary
In conclusion, Gemini represents a significant leap in AI development, offering a versatile and powerful platform for creating sophisticated, multimodal applications. Whether for small-scale experiments or large-scale deployments, these new Firebase Extensions deliver a flexible, time-efficient, and innovative approach to building apps which leverage Generative AI. Chatbot with Gemini and Multimodal Tasks with Gemini not only speed up the development process, but open up new horizons for creative and practical applications, helping make Gemini an indispensable tool in the evolving landscape of AI technology.