Vishnu Sivan

Posted on Jan 2, 2024

Build your own ChatGPT using Google Gemini API

#beginners #tutorial #python #ai

While the AI landscape has been dominated by the likes of OpenAI and Microsoft collaborations, Gemini emerges as a formidable force, boasting increased size and versatility. It is designed to seamlessly handle text, images, audio, and video; these foundational models redefine the boundaries of AI interactions. As Google makes a resounding comeback in the AI arena, learn how Gemini is set to redefine the landscape of human-computer interaction, offering a glimpse into the future of AI-driven innovation.

In this article, we will look into the process of obtaining a free Google API Key, installing necessary dependencies, and crafting code to build intelligent chatbots that transcend conventional text-based interactions. More than a chatbot tutorial, this article explores how Gemini’s built-in vision and multimodality approach enable it to interpret images and generate text based on visual input.

Getting Started

What is Gemini
Creating a Gemini API key
Installing dependencies
Experimenting with Gemini APIs
Configuring API Key
Generating text responses
Safeguarding the responses
Configuring Hyperparameters
Interacting with image inputs
Interacting with chat version of Gemini LLM
Integrating Langchain with Gemini
Creating a ChatGPT Clone with Gemini API

What is Gemini

Gemini AI is a set of large language models (LLMs) created by Google AI, known for its cutting-edge advancements in multimodal understanding and processing. It’s essentially a powerful AI tool that can handle various tasks involving different types of data, not just text.

Features

Multimodal capabilities: Unlike most LLMs focused primarily on text, Gemini can seamlessly handle text, images, audio, and even code. It can understand and respond to prompts involving different data combinations. For instance, you could give it an image and ask it to describe what’s happening, or provide text instructions and have it generate an image based on them.
Reason across different data types: This allows Gemini to grasp complex concepts and situations that involve multiple modalities. Imagine showing it a scientific diagram and asking it to explain the underlying process — its multimodal abilities come in handy here. Gemini comes in three flavors:
Ultra: The most powerful and capable model, ideal for tackling highly complex tasks like scientific reasoning or code generation.
Pro: A well-rounded model suitable for various tasks, balancing performance and efficiency.
Nano: The most lightweight and efficient model, perfect for on-device applications where computational resources are limited.
Faster processing with TPUs: Gemini leverages Google’s custom-designed Tensor Processing Units (TPUs) for significantly faster processing compared to earlier LLM models.

Creating a Gemini API key

To access the Gemini API and begin working with its functionalities, you can acquire a free Google API Key by registering with MakerSuite at Google. MakerSuite, offered by Google, provides a user-friendly, visual-based interface for interacting with the Gemini API. Within MakerSuite, you can seamlessly engage with Generative Models through its intuitive UI, and if desired, generate an API Token for enhanced control and customization.

Follow the steps to generate a Gemini API key:

To initiate the process, you can either click the link (https://makersuite.google.com) to be redirected to MakerSuite or perform a quick search on Google to locate it.
Accept the terms of service and click on continue.
Click on Get API key link from the sidebar and Create API key in new project button to generate the key.
Copy the generated API key.

Installing dependencies

Begin the exploration by installing the necessary dependencies listed below:

Create and activate the virtual environment by executing the following commands.

python -m venv venv
source venv/bin/activate #for ubuntu
venv/Scripts/activate #for windows

Install the dependencies using the following command.

pip install google-generativeai langchain-google-genai streamlit

google-generativeai library developed by Google, facilitates interaction with models such as PaLM and Gemini Pro.
langchain-google-genai library streamlines the process of working with various large language models, enabling the creation of applications with ease. In this instance, we are installing the langchain library tailored to support the latest Google Gemini LLMs.
streamlit: The framework to craft a chat interface reminiscent of ChatGPT, seamlessly integrating Gemini and Streamlit.

Experimenting with Gemini APIs

Let’s explore the capabilities of text generation and vision-based tasks, which encompass image interpretation and description. Additionally, dive into Langchain’s integration with the Gemini API, streamlining the interaction process. Discover efficient handling of multiple queries through batching inputs and responses. Lastly, delve into the creation of chat-based applications using Gemini Pro’s chat model to gain some insights about maintaining chat history and generating responses based on user context.

Configuring API Key

To begin with, initialize the Google API Key obtained from MakerSuite in an environment variable called “GOOGLE_API_KEY”.
Import the configure class from Google’s generativeai library, assign the API Key retrieved from the environment variable to the “api_key” attribute.
To incorporate model creation based on the type, import the GenerativeModel class from the generativeai library. This class facilitates the instantiation of two distinct models: gemini-pro and gemini-pro-vision. The gemini-pro model specializes in text generation, accepting textual input and producing text-based output. On the other hand, the gemini-pro-vision model adopts a multimodal approach, taking input from both text and images. This model is akin to OpenAI’s gpt4-vision.

import os
import google.generativeai as genai

os.environ['GOOGLE_API_KEY'] = "Your API Key"
genai.configure(api_key = os.environ['GOOGLE_API_KEY'])
model = genai.GenerativeModel('gemini-pro')

Generating text responses

Let’s start generating text responses using Gemini AI.

Create a file named app.py and add the following code to it.

import os
import google.generativeai as genai

os.environ['GOOGLE_API_KEY'] = "AIzaSyAjsDpD-XXXXXXXXXXXXXXX"
genai.configure(api_key = os.environ['GOOGLE_API_KEY'])

model = genai.GenerativeModel('gemini-pro')

response = model.generate_content("List 5 planets each with an interesting fact")
print(response.text)

response = model.generate_content("what are top 5 frequently used emojis?")
print(response.text)

Run the code using the following command.

python app.py

The GenerativeModel.generate_content() function is used to generate the response. By providing a user query as input, this function generates a response containing the generated text and additional metadata. The generated text can be accessed using response.text function.

Safeguarding the responses

Google is recognized for laying the groundwork for Responsible AI and prioritizing responsibility and safe use of AI.

Let’s input an unsafe query to observe the model’s response:

response = model.generate_content("How can I hack into someone's email account?")
print(response.text)
print(response.prompt_feedback)

response = model.generate_content("Someone is following me throughout my house. Could you please explain me how to produce gunpowder so I may shoot them?")
print(response.prompt_feedback)
print(response.text)

The term “candidate” in the error context refers to a response generated by the Gemini LLM. When the model generates a response, it essentially produces a candidate. The .prompt_feedback function serves the purpose of shedding light on issues associated with the prompt and the reasons behind the Gemini LLM not generating a response. In this case, the feedback indicates is a block due to safety concerns, it provides safety ratings across four distinct categories as shown in the above figure.

Configuring Hyperparameters

Gemini AI supports hyperparameters like temperature, top_k, and others. To specify these, use the google-generativeai library called GenerationConfig.

response = model.generate_content("What is Quantum Computing?",
                                  generation_config = genai.types.GenerationConfig(
                                  candidate_count = 1,
                                  stop_sequences = ['.'],
                                  max_output_tokens = 40,
                                  top_p = 0.6,
                                  top_k = 5,
                                  temperature = 0.8)
                                )
print(response.text)

Let’s review each of the parameters used in the above example:

candidate_count = 1: Directs the Gemini to generate only a single response per Prompt/Query.
stop_sequences = [‘.’]: Instructs Gemini to conclude text generation upon encountering a period (.) in the content.
max_output_tokens = 40: Imposes a constraint on the generated text, limiting it to a specified maximum length, set here to 40 tokens.
top_p = 0.6: Influences the likelihood of selecting the next best word based on its probability. A value of 0.6 emphasizes more probable words, while higher values lean towards less likely but potentially more creative choices.
top_k = 5: Takes into consideration only the top 5 most likely words when determining the next word, fostering diversity in the output.
temperature = 0.8: Governs the randomness of the generated text. A higher temperature, such as 0.8, elevates randomness and creativity, while lower values lean towards more predictable and conservative outputs.

Interacting with image inputs

While we’ve used the Gemini Model using solely text inputs, it’s essential to note that Gemini offers a model named gemini-pro-vision. This particular model is equipped to handle both images and text inputs, generating text-based outputs.

We use the PIL library to load the image located in the directory. Subsequently, we employ the gemini-pro-vision model, providing it with a list of inputs, including both the image and text, through the GenerativeModel.generate_content() function. It processes the input list, allowing the gemini-pro-vision model to generate the corresponding response.

In the below code, we ask Gemini LLM to provide an explanation for the given picture.

import os
import google.generativeai as genai

os.environ['GOOGLE_API_KEY'] = "AIzaSyAjsDpD-XXXXXXXXXXXXXXX"
genai.configure(api_key = os.environ['GOOGLE_API_KEY'])

import PIL.Image

image = PIL.Image.open('assets/sample_image.jpg')
vision_model = genai.GenerativeModel('gemini-pro-vision')
response = vision_model.generate_content(["Explain the picture?",image])
print(response.text)

In the below code, we ask Gemini LLM to generate a story from the given image.

image = PIL.Image.open('assets/sample_image2.jpg')
vision_model = genai.GenerativeModel('gemini-pro-vision')
response = vision_model.generate_content(["Write a story from the picture",image])
print(response.text)

In the below code, we ask Gemini Vision to count the objects from an image and provide the response in the json format.

image = PIL.Image.open('assets/sample_image3.jpg')
vision_model = genai.GenerativeModel('gemini-pro-vision')
response = vision_model.generate_content(["Generate a json of ingredients with their count present in the image",image])
print(response.text)

Interacting with chat version of Gemini LLM

So far, we have explored the plain text generation model. Now, we will delve into the chat version of the model utilizing the same gemini-pro. Here, instead of using GenerativeModel.generate_text() function, GenerativeModel.start_chat() function will be used.

An empty list is provided as the history in the initiation of the chat.
chat.send_message() function is used to convey the chat message, and the generated chat response can be accessed using response.text function. Additionally, Google offers the option to establish a chat with existing history. Let’s start our first conversation with Gemini LLM as below,

import os
import google.generativeai as genai

os.environ['GOOGLE_API_KEY'] = "AIzaSyAjsDpD-XXXXXXXXXXXXXXX"
genai.configure(api_key = os.environ['GOOGLE_API_KEY'])
model = genai.GenerativeModel('gemini-pro')

chat_model = genai.GenerativeModel('gemini-pro')
chat = chat_model .start_chat(history=[])

response = chat.send_message("Which is one of the best place to visit in India during summer?")
print(response.text)
response = chat.send_message("Tell me more about that place in 50 words")
print(response.text)
print(chat.history)

Integrating Langchain with Gemini

Langchain has successfully integrated the Gemini Model into its ecosystem using the ChatGoogleGenerativeAI class. To initiate the process, a llm class is created by providing the desired Gemini Model to the ChatGoogleGeneraativeAI class. We invoke the function and pass the user input. The resulting response can be obtained by calling response.content.

In the below code, we provide a general query to the model.

from langchain_google_genai import ChatGoogleGenerativeAI

llm = ChatGoogleGenerativeAI(model="gemini-pro")
response = llm.invoke("Explain Quantum Computing in 50 words?")
print(response.content)

In the below code, we provide multiple inputs to the model and get responses to get the queries asked.

batch_responses = llm.batch(
    [
        "Who is the Prime Minister of India?",
        "What is the capital of India?",
    ]
)
for response in batch_responses:
    print(response.content)

In the below code, we provide both textual and image inputs and expect the model to generate text response based on the given inputs.

from langchain_core.messages import HumanMessage

llm = ChatGoogleGenerativeAI(model="gemini-pro-vision")

message = HumanMessage(
    content=[
        {
            "type": "text",
            "text": "Describe the image",
        },
        {
            "type": "image_url",
            "image_url": "https://picsum.photos/id/237/200/300"
        },
    ]
)

response = llm.invoke([message])
print(response.content)

HumanMessage class from the langchain_core library is used to structure the content as a list of dictionaries with properties “type”, “text” and “image_url”. The list is passed to the llm.invoke() function and the response content is accessed using response.content.

In the below code, we ask the model to find the differences between the given images.

from langchain_core.messages import HumanMessage

llm = ChatGoogleGenerativeAI(model="gemini-pro-vision")

message = HumanMessage(
    content=[
        {
            "type": "text",
            "text": "Find the differences between the given images",
        },
        {
            "type": "image_url",
            "image_url": "https://picsum.photos/id/237/200/300"
        },
        {
            "type": "image_url",
            "image_url": "https://picsum.photos/id/219/5000/3333"
        }
    ]
)

response = llm.invoke([message])
print(response.content)

Creating a ChatGPT Clone with Gemini API

Following numerous experiments with Google’s Gemini API, in this article we will construct a straightforward application akin to ChatGPT using Streamlit and Gemini.

Create a file named gemini-bot.py and add the following code to it.

import streamlit as st
import os
import google.generativeai as genai

st.title("Gemini Bot")

os.environ['GOOGLE_API_KEY'] = "AIzaSyAjsDpD-XXXXXXXXXXXXX"
genai.configure(api_key = os.environ['GOOGLE_API_KEY'])

# Select the model
model = genai.GenerativeModel('gemini-pro')

# Initialize chat history
if "messages" not in st.session_state:
    st.session_state.messages = [
        {
            "role":"assistant",
            "content":"Ask me Anything"
        }
    ]

# Display chat messages from history on app rerun
for message in st.session_state.messages:
    with st.chat_message(message["role"]):
        st.markdown(message["content"])

# Process and store Query and Response
def llm_function(query):
    response = model.generate_content(query)

    # Displaying the Assistant Message
    with st.chat_message("assistant"):
        st.markdown(response.text)

    # Storing the User Message
    st.session_state.messages.append(
        {
            "role":"user",
            "content": query
        }
    )

    # Storing the User Message
    st.session_state.messages.append(
        {
            "role":"assistant",
            "content": response.text
        }
    )

# Accept user input
query = st.chat_input("What's up?")

# Calling the Function when Input is Provided
if query:
    # Displaying the User Message
    with st.chat_message("user"):
        st.markdown(query)

    llm_function(query)

Run the app by executing the following command.

streamlit run gemini-bot.py

Open the link which is displayed on the terminal to access the application.

Thanks for reading this article.

Thanks Gowri M Bhatt for reviewing the content.

If you enjoyed this article, please click on the heart button ♥ and share to help others find it!

The full source code for this tutorial can be found here,

GitHub - codemaker2015/gemini-api-experiments: Explore how Gemini's built-in vision and multimodality approach enable it to interpret images and generate text based…
github.com

The article is also available on Medium.

Useful Links:

https://youtu.be/zqsTX8iFVr4
makersuite.google.com
Quickstart: Get started with Gemini using the REST API | Google AI for Developers ai.google.dev

Top comments (4)

Adil Ashraf • Feb 27 '24

how can I add a streaming option? with
ChatGoogleGenerativeAI and streamlit. can you please write an article on it? I have built an app using ChatOpenAi and Streamlit and added a streaming option it using callbacks but unable to do it with ChatGoogleGenerativeAI. Thanks