DEV Community

Connie Leung for Google Developer Experts

Originally published at blueskyconnie.com

[GDE] Mastering Live Sports Data with Gemini 3: URL Context, Grounding & Structured Output

Retrieving accurate, up-to-date sports statistics with LLMs is notoriously difficult due to hallucinations and outdated training data. In this post, I explore retrieving Premier League 2025/2026 player statistics using the Gemini 3 Flash Preview model, URL Context, Grounding with Google Search, and structured output. I describe the lessons learned: how I extracted data from the official Premier League player profile pages when they were available, and how I used Google Search to find missing data on other websites.


Introduction

The Gemini 3 Flash Preview model supports structured output, URL Context, and Grounding with Google Search. When the response MIME type is application/json and a response JSON schema is provided, Gemini 3 is remarkably consistent at returning valid JSON objects. Moreover, the Pydantic library has functions to convert a generic JSON object to a Pydantic model.

In this post, I walk through the code in the Colab notebook and demonstrate how to write a prompt that uses the provided URLs as the primary source. When a URL is invalid or data is missing (e.g., net worth, preferred foot, and height), the Google Search tool is used. For each field in the answer, I apply grounding to verify that the model does not hallucinate and that it cites reputable sources.


Prerequisites

To run the provided Colab notebook, you will need:

  • Vertex AI in Express Mode: I use Gemini via Vertex AI because of regional availability (Hong Kong), but these features function identically in the public Gemini API. If you want to use Gemini in Vertex AI for free, you can sign up for Vertex AI in express mode using your Gmail account.
  • Google Cloud API Key: Properly configured within your environment.
  • Google Colab VS Code Extension: Install the extension from the Visual Studio Marketplace to run the demo in the VS Code environment.
  • Python: Python 3.12+ with the google-genai SDK installed.

Demo Overview

We will see how to force the model to stop guessing and start reading. The demo attempts to retrieve player statistics for the Premier League 2025/2026 season.

The Gemini 3 Flash Preview model is given some pages:

url_list = [
    "https://www.premierleague.com/en/players/141746/bruno-fernandes/stats",
    "https://www.premierleague.com/en/players/223094/erling-haaland/stats",
    "https://www.premierleague.com/en/players/97032/virgil-van-dijk/stats",
    "https://www.premierleague.com/en/players/244851/cole-palmer/stats"
]

urls = "\n".join(url_list)

When the requested player is Bruno Fernandes, Erling Haaland, Virgil Van Dijk, or Cole Palmer, the model will execute the URL Context tool to find the facts from the above URLs. Otherwise, the model will execute the Google Search tool to find the facts from reputable sources.
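As a rough illustration of that routing, the sketch below checks whether a player's name matches one of the provided profile URLs. The `find_profile_url` helper and its slug matching are assumptions made for this example; in the notebook, the model itself makes this decision based on the prompt.

```python
import re
from typing import Optional

url_list = [
    "https://www.premierleague.com/en/players/141746/bruno-fernandes/stats",
    "https://www.premierleague.com/en/players/223094/erling-haaland/stats",
    "https://www.premierleague.com/en/players/97032/virgil-van-dijk/stats",
    "https://www.premierleague.com/en/players/244851/cole-palmer/stats",
]


def find_profile_url(player: str, urls: list[str]) -> Optional[str]:
    """Return the provided URL whose path contains the player's slug.

    Hypothetical helper: "Erling Haaland" is turned into "erling-haaland".
    """
    slug = re.sub(r"[^a-z0-9]+", "-", player.lower()).strip("-")
    if not slug:
        return None
    for url in urls:
        if f"/{slug}/" in url:
            return url
    return None


for name in ("Erling Haaland", "Kaoru Mitoma"):
    profile_url = find_profile_url(name, url_list)
    tool = "URL Context" if profile_url else "Google Search"
    print(f"{name}: expected primary tool -> {tool}")
```

A lookup like this is only useful as a sanity check on which tool you expect the model to use for a given player.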

Name verification is performed before data retrieval. If the name does not belong to a player, or the player is not in the Premier League in the 2025/2026 season, the structured output sets is_professional_player to false and puts an explanation in verification_status. When the name belongs to a Premier League player, the prompt instructs the model to return a JSON object containing name, net_worth, is_professional_player, verification_status, height, shirt_number, preferred_foot, goals, goal_assists, appearances, and minutes_played. The notebook then converts the generic JSON object to a Pydantic PlayerStats model and displays each field's value, source quote, uri, and source type so that accuracy can be checked.


Architecture

High level architecture of Retrieving Premier League Player Stats with URL Context and Google Search Tools

The URL Context and Google Search tools enable the Gemini 3 Flash Preview model to extract the performance metrics for the Premier League 2025/2026 season from the Premier League player profile pages, while personal information such as net worth, preferred foot, and height is searched for on the web. Even though the model's internal data is outdated, it can access external systems to obtain up-to-date information.


Environment Variable

Copy .env.example to .env and replace the placeholder of GOOGLE_CLOUD_API_KEY with the API key.

GOOGLE_CLOUD_API_KEY=<GOOGLE CLOUD API KEY>

The create_vertexai_client function constructs and returns a Gemini client. I use the client to call the Gemini 3 Flash Preview model, which retrieves data from the provided URLs and falls back to the Google Search tool when no URL is provided or the page lacks the details.

The result of create_vertexai_client is assigned to a global client variable.

genai is a unified SDK for the Gemini API and Gemini in Vertex AI. Passing vertexai=True indicates that the Colab notebook uses Gemini in Vertex AI.

from google import genai
from dotenv import load_dotenv
from google.genai import types
from pydantic import BaseModel, Field
from typing import Literal
import os

def create_vertexai_client():

    cloud_api_key = os.getenv("GOOGLE_CLOUD_API_KEY")
    if not cloud_api_key:
        raise ValueError("GOOGLE_CLOUD_API_KEY not found in .env file")

    # Configure the client with your API key
    client = genai.Client(
        vertexai=True, 
        api_key=cloud_api_key, 
    )

    return client
# Load GOOGLE_CLOUD_API_KEY from the .env file, then build the shared client.
load_dotenv()

client = create_vertexai_client()

Installation

The demo uses the newer Google Gen AI SDK, so I install the google-genai library along with python-dotenv and pydantic.

%pip install google-genai
%pip install python-dotenv
%pip install pydantic

Lessons Learned

I am going to describe the challenges faced in this demo and how I tackled them one by one.

1. Lazy Behavior of Gemini 3 Flash Preview Model

Problem:

Gemini prioritized its internal training data (or generic search) over the specific url_context tool unless explicitly constrained. It relied on Google Search or internal knowledge to find facts for each field instead of the given Premier League Player profile pages.

Solution:

I added a DYNAMIC SOURCE IDENTIFICATION section to the prompt so that the model always executes the URL Context tool when a player's profile page is available. When a profile page is not available, the Grounding with Google Search tool is used to find the facts for each field in the JSON object. Finally, I spelled out the priority order: official URL first, then web citations, then internal training data.

### **1. DYNAMIC SOURCE IDENTIFICATION**
1.  **IF a Premier League URL is provided:**
    *   You **MUST** execute the `url_context` tool first. This is your **Primary Source**.
2.  **IF NO URL is provided (or if the player is non-Premier League):**
    *   The **Web Citations** (Google Search results) become your **Primary Source**. 
3.  **PRIORITY:** Official URL > Web Citations > Internal Training Data.

2. Chunk ID Hallucination: Grounding Metadata Is a Low-Confidence Source of Truth

Problem:

I wanted to verify that the model did not hallucinate and assign incorrect values to the fields in the JSON object. Therefore, I declared a Pydantic model, PlayerField, to store the id of the chunk that supplied the answer for each field.

class PlayerField(BaseModel):
    value: str | int | float
    chunk_id: int = Field(..., description="Index of the chunk in the grounding metadata that provides the value")

class PlayerStats(BaseModel):
    name: str
    net_worth: PlayerField | None = Field(..., description="Net worth of the PL player.")
    is_professional_player: bool = Field(..., description="Must be True if found in PL records, False otherwise")
    verification_status: str = Field(..., description="Explanation of where the data was found or why it failed")
    height: PlayerField
    shirt_number: PlayerField
    preferred_foot: PlayerField
    goals: PlayerField
    goal_assists: PlayerField
    appearances: PlayerField
    minutes_played: PlayerField

Moreover, I wrote functions to obtain the web and retrieved context URIs from the grounding metadata to verify the chunk id.

def get_citations(response: types.GenerateContentResponse) -> tuple[int, list[dict]]:

    citations: list[dict] = []
    num_chunks = 0  # stays 0 when the response has no grounding metadata
    if response.candidates is not None and len(response.candidates) > 0:
        candidate = response.candidates[0]
        if candidate.grounding_metadata:
            grounding_chunks = candidate.grounding_metadata.grounding_chunks or []
            num_chunks = len(grounding_chunks)
            for i, chunk in enumerate(grounding_chunks):
                if chunk and chunk.web and chunk.web.uri:
                    citations.append({
                        "chunk_id": i,
                        "uri": chunk.web.uri,
                        "title": chunk.web.title
                    })
                elif chunk and chunk.retrieved_context and chunk.retrieved_context.uri:
                    citations.append({
                        "chunk_id": i,
                        "uri": chunk.retrieved_context.uri,
                        "title": chunk.retrieved_context.text
                    })

    return num_chunks, citations

def print_citations_by_response(response: types.GenerateContentResponse):
    num_chunks, citations = get_citations(response)

    print("num_chunks ->", num_chunks)
    for i, citation in enumerate(citations):
        print(f"Citation {i}: {citation}")

The chunk_id in the PlayerField could fall outside the valid range of 0 to num_chunks - 1, which implied that the model had hallucinated a chunk_id that did not exist in the grounding metadata.
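The range check can be sketched as follows. This is a minimal example under my own assumptions: `is_valid_chunk_id` and `find_bad_chunk_ids` are hypothetical helpers, and the sample fields are illustrative data rather than real model output.

```python
def is_valid_chunk_id(chunk_id: int, num_chunks: int) -> bool:
    """A chunk id is valid only if it indexes an existing grounding chunk."""
    return 0 <= chunk_id < num_chunks


def find_bad_chunk_ids(fields: dict[str, dict], num_chunks: int) -> list[str]:
    """Return the names of the fields whose chunk_id is out of range."""
    return [
        name
        for name, field in fields.items()
        if not is_valid_chunk_id(field["chunk_id"], num_chunks)
    ]


# Illustrative data only: five grounding chunks, so ids 0 to 4 are valid.
fields = {
    "preferred_foot": {"value": "Right", "chunk_id": 10},
    "shirt_number": {"value": 8, "chunk_id": 2},
}
print(find_bad_chunk_ids(fields, num_chunks=5))  # ['preferred_foot']
```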

Bad Chunk ID

Number of chunks: 5. Chunk IDs range from 0 to 4.

{
  "value":  "Right",
  "chunk_id": 10    (Incorrect chunk id)
}

Good Chunk ID

Number of chunks: 5. Chunk IDs range from 0 to 4.

{
  "value":  "Right",
  "chunk_id": 2    (Correct chunk id)
}

Solution:

I concluded that the grounding metadata was an unreliable source of truth and stopped using it to verify chunk ids. I modified the PlayerField model: I removed the chunk_id field and added source_quote, uri, and source_type fields.

This was an architectural shift: moving from relying on grounding metadata to asking the model to generate the citation inline as part of the uri field.

class PlayerField(BaseModel):
    value: str | int | float
    source_quote: str
    uri: str | None = Field(None, description="The EXACT, UNEDITED URL provided by the tool. Do not guess or shorten. None if the source is internal training data")
    source_type: Literal["GOOGLE_SEARCH", "URL_CONTEXT", "INTERNAL_KNOWLEDGE"] = Field(None, description="Categorize the source of this specific field. Must not be None when uri is not null.")

This forces Gemini to read the pages and cite specific information rather than hallucinate an answer for each field. The uri field stores either one of the provided URLs or the URL of a Google Search result. If Gemini answers from its own knowledge, the uri field is None. When uri is None, the source_type must be INTERNAL_KNOWLEDGE. When uri contains vertexaisearch, the source_type is GOOGLE_SEARCH. Otherwise, the source_type is URL_CONTEXT.
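These three rules are simple enough to express in code. The sketch below is my own illustration with a hypothetical `classify_source_type` helper; the notebook relies on the model to fill in source_type, so treat this as a way to cross-check the output rather than part of the demo. The search URI in the example is a shortened placeholder.

```python
from typing import Optional


def classify_source_type(uri: Optional[str]) -> str:
    """Derive the expected source_type from a uri using the three rules."""
    if uri is None:
        # No citation: the answer came from the model's own knowledge.
        return "INTERNAL_KNOWLEDGE"
    if "vertexaisearch" in uri:
        # Grounding with Google Search in Vertex AI returns redirect links.
        return "GOOGLE_SEARCH"
    # Anything else is assumed to be one of the provided profile pages.
    return "URL_CONTEXT"


examples = [
    None,
    "https://vertexaisearch.cloud.google.com/grounding-api-redirect/EXAMPLE",
    "https://www.premierleague.com/en/players/223094/erling-haaland/stats",
]
for uri in examples:
    print(classify_source_type(uri))
```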

3. Broken or Invalid URIs in the JSON object

Problem:

The uri in the JSON object was often a vertexaisearch proxy link or a hallucinated string rather than the direct source URL. When uri is neither None nor one of the provided Premier League player profile pages, it should be a Google Search result. Since I am using Gemini in Vertex AI, the base URL for search results is https://vertexaisearch.cloud.google.com. If the URI has a different base URL, it is very likely broken or invalid.
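A base URL check like this can be automated. The following is a minimal sketch, assuming the result is available as a plain dictionary; `is_suspicious_uri`, `find_suspicious_fields`, and the sample data (including the placeholder redirect token and the example.com link) are hypothetical and only serve the illustration.

```python
from typing import Optional
from urllib.parse import urlparse

SEARCH_HOST = "vertexaisearch.cloud.google.com"


def is_suspicious_uri(uri: Optional[str], provided_urls: list[str]) -> bool:
    """Flag a uri that is neither None, a provided URL, nor a search redirect."""
    if uri is None or uri in provided_urls:
        return False
    return urlparse(uri).hostname != SEARCH_HOST


def find_suspicious_fields(stats: dict, provided_urls: list[str]) -> list[str]:
    """Return the names of the fields whose uri looks broken or invented."""
    return [
        name
        for name, field in stats.items()
        if isinstance(field, dict)
        and is_suspicious_uri(field.get("uri"), provided_urls)
    ]


provided_urls = [
    "https://www.premierleague.com/en/players/223094/erling-haaland/stats",
]
# Illustrative data only; the redirect token and example.com link are made up.
sample_stats = {
    "name": "Erling Haaland",
    "goals": {"value": 20, "uri": provided_urls[0]},
    "net_worth": {
        "value": "$80 million",
        "uri": f"https://{SEARCH_HOST}/grounding-api-redirect/EXAMPLE",
    },
    "height": {"value": 1.95, "uri": "https://www.example.com/erling-haaland"},
}
print(find_suspicious_fields(sample_stats, provided_urls))  # ['height']
```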

Solution:

I added a URI EXTRACTION RULES (STRICT) section to specify the rules to determine the uri field in the PlayerField model.

### **4. URI EXTRACTION RULES (STRICT):**
1.  **NO GUESSING:** You are strictly forbidden from constructing, autocompleting, or guessing a URL based on the website name. 
2.  **LITERAL COPY:** You must copy the `uri` exactly as it appears in the search result that provided the `source_quote`. 
3.  **THE JOIN RULE:** Before finalizing the JSON, verify that the `source_quote` actually appears in the content/snippet associated with the `uri` you provided.
4.  **IF IN DOUBT:** If you found a fact in your training data but cannot find a specific, working URI for it in the search results, you MUST set `source_type` to `INTERNAL_KNOWLEDGE` and `uri` to `null`.

After I re-ran the Colab notebook, the uri of each Google Search result contained vertexaisearch. When I pasted a vertexaisearch URI into the browser, it redirected me to the real page.

Pro Tip: The vertexaisearch.cloud.google.com links are specific to Grounding with Google Search in Vertex AI and act as a bridge to the original source.

This was the result for Erling Haaland (incorrect output, before the next fix).

{
  "name": "Erling Haaland",
  "net_worth": {
    "value": "$80 million",
    "source_quote": "Haaland's net worth, which according to Forbes is $80m (£59.5m)",
    "uri": "https://vertexaisearch.cloud.google.com/grounding-api-redirect/AUZIYQHwe5-b036A0QaSzU9G0uAgOVZs1OJ0NMe0MmnMy-wRh4Qh9PtR_PhO6UV4S4-6ZJXJL4cicMEt0CFm_hwEER-d5WI5uY3H6DySLRcQZOmbeXF0PQU68ny6xyWF-64jJ_2Jnht_hkr7Kk3FasVjBki_2Q-n8jvr6PIcYUkTrFyJa2wZjPt4jkkdrFDo4Y6A2_OahUrg7unsUbzWJBxPp6e52tzg9w==",
    "source_type": null
  },
  "is_professional_player": true,
  "verification_status": "Confirmed active for Manchester City in the Premier League for the 2025/26 season via official Premier League statistics.",
  "height": {
    "value": 1.95,
    "source_quote": "Height, 1.95 m (6 ft 5 in).",
    "uri": "https://vertexaisearch.cloud.google.com/grounding-api-redirect/AUZIYQHFUkt6uBYUgbC5fgMofMPjKps_9QszoJClzsHz1zkpovNWK1-OueQEj_2oFq--1dOaL7L4X9Aef7iLamfT1f4iN2aH3BxaYfJUznngJ2vXuawp-SAWl4x3R1nxHpiSCKF_pjnPg-rL",
    "source_type": null
  },
  "shirt_number": {
    "value": 9,
    "source_quote": "Man City• 9",
    "uri": "https://www.premierleague.com/en/players/223094/erling-haaland/stats",
    "source_type": null
  },
  "preferred_foot": {
    "value": "left",
    "source_quote": "Foot: left",
    "uri": "https://vertexaisearch.cloud.google.com/grounding-api-redirect/AUZIYQHf-kM-at4W05tsLLdo3dzthpWxntkWhnfwd4fWKBHxvl8qWHioz9jaDrI5ZkmhVlo71D7RQhctvDTXFqywdvl40_Q6CC2S24FujOcmdg1rhVGJiJudqtzVqLtpQ-kIWlJIkevddD68Fe40se8gRmljvgmqx4O5ATDve4F23gRmljvgmqx4O5ATDve4F23g==",
    "source_type": null
  },
  "goals": {
    "value": 20,
    "source_quote": "Goals 20",
    "uri": "https://www.premierleague.com/en/players/223094/erling-haaland/stats",
    "source_type": null
  },
  "goal_assists": {
    "value": 4,
    "source_quote": "Assists 4",
    "uri": "https://www.premierleague.com/en/players/223094/erling-haaland/stats",
    "source_type": null
  },
  "appearances": {
    "value": 22,
    "source_quote": "Appearances 22",
    "uri": "https://www.premierleague.com/en/players/223094/erling-haaland/stats",
    "source_type": null
  },
  "minutes_played": {
    "value": 1906,
    "source_quote": "Minutes Played 1,906",
    "uri": "https://www.premierleague.com/en/players/223094/erling-haaland/stats",
    "source_type": null
  }
}

One of the redirected URLs led me to https://www.sofascore.com/football/player/erling-haaland/839956 to view the preferred foot of the football player.

4. Valid URI has null Source Type

Problem:

When the player was Erling Haaland, the source_type was null even though each uri was either https://www.premierleague.com/en/players/223094/erling-haaland/stats or a valid Google Search result.

When the player was Kaoru Mitoma, the model worked correctly and assigned GOOGLE_SEARCH to the source_type field of every URI.

The difference was that the model did not classify the source_type after it successfully used the URL Context tool, so the source_type stayed null for Erling Haaland. For Kaoru Mitoma, the model had to work harder to find every result via Google Search, and it classified each source_type as GOOGLE_SEARCH.

Solution:

The solution was to provide instructions for categorizing the source_type field in the PlayerField model.

I added a new MANDATORY SOURCE_TYPE CLASSIFICATION RULES section to the prompt to derive the source_type field from the uri field.

### **2. MANDATORY SOURCE_TYPE CLASSIFICATION RULES**
You are strictly forbidden from returning `null` for `source_type` if a `uri` is present.
*   **MATCHING RULE:** If the `uri` matches one of the URLs provided below, you MUST use "URL_CONTEXT".
*   **SEARCH RULE:** If the `uri` is a search result (e.g., Transfermarkt, Wikipedia, vertexaisearch links), you MUST use "GOOGLE_SEARCH".
*   **FALLBACK RULE:** If no tool found the data and you use internal memory, `uri` must be `null` and `source_type` must be "INTERNAL_KNOWLEDGE".

After I re-ran the Colab notebook, the source_type field was no longer null for any field that had a uri.

{
  "name": "Erling Haaland",
  "net_worth": {
    "value": "$80 million",
    "source_quote": "Haaland's net worth, which according to Forbes is $80m (£59.5m)",
    "uri": "https://vertexaisearch.cloud.google.com/grounding-api-redirect/AUZIYQHwe5-b036A0QaSzU9G0uAgOVZs1OJ0NMe0MmnMy-wRh4Qh9PtR_PhO6UV4S4-6ZJXJL4cicMEt0CFm_hwEER-d5WI5uY3H6DySLRcQZOmbeXF0PQU68ny6xyWF-64jJ_2Jnht_hkr7Kk3FasVjBki_2Q-n8jvr6PIcYUkTrFyJa2wZjPt4jkkdrFDo4Y6A2_OahUrg7unsUbzWJBxPp6e52tzg9w==",
    "source_type": "GOOGLE_SEARCH"
  },
  "is_professional_player": true,
  "verification_status": "Confirmed active for Manchester City in the Premier League for the 2025/26 season via official Premier League statistics.",
  "height": {
    "value": 1.95,
    "source_quote": "Height, 1.95 m (6 ft 5 in).",
    "uri": "https://vertexaisearch.cloud.google.com/grounding-api-redirect/AUZIYQHFUkt6uBYUgbC5fgMofMPjKps_9QszoJClzsHz1zkpovNWK1-OueQEj_2oFq--1dOaL7L4X9Aef7iLamfT1f4iN2aH3BxaYfJUznngJ2vXuawp-SAWl4x3R1nxHpiSCKF_pjnPg-rL",
    "source_type": "GOOGLE_SEARCH"
  },
  "shirt_number": {
    "value": 9,
    "source_quote": "Man City• 9",
    "uri": "https://www.premierleague.com/en/players/223094/erling-haaland/stats",
    "source_type": "URL_CONTEXT"
  },
  "preferred_foot": {
    "value": "left",
    "source_quote": "Foot: left",
    "uri": "https://vertexaisearch.cloud.google.com/grounding-api-redirect/AUZIYQHf-kM-at4W05tsLLdo3dzthpWxntkWhnfwd4fWKBHxvl8qWHioz9jaDrI5ZkmhVlo71D7RQhctvDTXFqywdvl40_Q6CC2S24FujOcmdg1rhVGJiJudqtzVqLtpQ-kIWlJIkevddD68Fe40se8gRmljvgmqx4O5ATDve4F23gRmljvgmqx4O5ATDve4F23g==",
    "source_type": "GOOGLE_SEARCH"
  },
  "goals": {
    "value": 20,
    "source_quote": "Goals 20",
    "uri": "https://www.premierleague.com/en/players/223094/erling-haaland/stats",
    "source_type": "URL_CONTEXT"
  },
  "goal_assists": {
    "value": 4,
    "source_quote": "Assists 4",
    "uri": "https://www.premierleague.com/en/players/223094/erling-haaland/stats",
    "source_type": "URL_CONTEXT"
  },
  "appearances": {
    "value": 22,
    "source_quote": "Appearances 22",
    "uri": "https://www.premierleague.com/en/players/223094/erling-haaland/stats",
    "source_type": "URL_CONTEXT"
  },
  "minutes_played": {
    "value": 1906,
    "source_quote": "Minutes Played 1,906",
    "uri": "https://www.premierleague.com/en/players/223094/erling-haaland/stats",
    "source_type": "URL_CONTEXT"
  }
}

The Final Prompt Template

This is the complete prompt after applying the lessons above. It has the objective (retrieve player statistics), instructions (execute the URL Context tool first and Google Search second), constraints (the player must be active in the Premier League in the 2025/2026 season), context (the provided URLs and the player name), and the output format (a JSON object).

**OBJECTIVE:**
Search and identify the Premier League 2025/2026 Player Statistics of {player}.

---

### **1. DYNAMIC SOURCE IDENTIFICATION**
1.  **IF a Premier League URL is provided:**
    *   You **MUST** execute the `url_context` tool first. This is your **Primary Source**.
2.  **IF NO URL is provided (or if the player is non-PL):**
    *   The **Web Citations** (Google Search results) become your **Primary Source**. 
3.  **PRIORITY:** Official URL > Web Citations > Internal Training Data.

### **2. MANDATORY SOURCE_TYPE CLASSIFICATION RULES**
You are strictly forbidden from returning `null` for `source_type` if a `uri` is present.
*   **MATCHING RULE:** If the `uri` matches one of the URLs provided below, you MUST use "URL_CONTEXT".
*   **SEARCH RULE:** If the `uri` is a search result (e.g., Transfermarkt, Wikipedia, vertexaisearch links), you MUST use "GOOGLE_SEARCH".
*   **FALLBACK RULE:** If no tool found the data and you use internal memory, `uri` must be `null` and `source_type` must be "INTERNAL_KNOWLEDGE".

### **3. INACTIVE / NON-PROFESSIONAL PLAYER LOGIC**
If the player cannot be found in active professional records for the 2025/26 season:
*   `is_professional_player`: `false`.
*   **All Numeric Fields:** `{{ "value": 0, "source_quote": null, "uri": null, "source_type": null }}`.
*   **All String Fields:** `{{ "value": "n/a", "source_quote": null, "uri": null, "source_type": null }}`.
*   **Verification Status:** "Player not found in active professional databases."

### **4. URI EXTRACTION RULES (STRICT):**
1.  **NO GUESSING:** You are strictly forbidden from constructing, autocompleting, or guessing a URL based on the website name. 
2.  **LITERAL COPY:** You must copy the `uri` exactly as it appears in the search result that provided the `source_quote`. 
3.  **THE JOIN RULE:** Before finalizing the JSON, verify that the `source_quote` actually appears in the content/snippet associated with the `uri` you provided.
4.  **IF IN DOUBT:** If you found a fact in your training data but cannot find a specific, working URI for it in the search results, you MUST set `source_type` to `INTERNAL_KNOWLEDGE` and `uri` to `null`.

### **5. DATA VALIDATION & AUDIT**
*   **`net_worth`**: Must be a string (e.g., `100 million dollars`).
*   **`height`**: Must be a float (e.g., `1.85`).

### PROVIDED URLS:
{ urls }

### OUTPUT FORMAT:
Return a JSON object exactly as follows:
{{
    "name": "string",
    "net_worth": {{ "value": "string", "source_quote": "...", "uri": "...", "source_type": "Google Search" }},
    "is_professional_player": boolean,
    "verification_status": "Detailed confirmation of Premier League status for 2025/26",
    "height": {{ "value": float, "source_quote": "...", "uri": "...", "source_type": "URL Context" }},
    "shirt_number": {{ "value": int, "source_quote": "...", "uri": "...", "source_type": "URL Context" }},
    "preferred_foot": {{ "value": "string", "source_quote": "...", "uri": "...", "source_type": "URL Context" }},
    "goals": {{ "value": int, "source_quote": "...", "uri": "...", "source_type": "URL Context" }},
    "goal_assists": {{ "value": int, "source_quote": "...", "uri": "...", "source_type": "URL Context" }},
    "appearances": {{ "value": int, "source_quote": "...", "uri": "...", "source_type": "URL Context" }},
    "minutes_played": {{ "value": int, "source_quote": "...", "uri": "...", "source_type": "URL Context" }}
}}

The above is a Python f-string, where the double braces {{ }} escape the JSON structures so that Python does not treat them as replacement fields.
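To see the escaping in isolation, here is a tiny example with a shortened, made-up prompt (not the real template). Single braces are substituted with the variable, while doubled braces come out as literal braces.

```python
player = "Erling Haaland"

# {player} is replaced by the variable; {{ and }} become literal { and }.
prompt = f"""Find the statistics of {player}.
Return JSON such as: {{ "value": 0, "uri": null }}"""

print(prompt)
# Find the statistics of Erling Haaland.
# Return JSON such as: { "value": 0, "uri": null }
```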

After formulating a prompt that generates structured output, I move on to defining Pydantic models that align with it.


Final Pydantic Models

The PlayerField model has a value field that is a string, an integer, or a floating-point number.

The PlayerStats model represents the structured output containing performance metrics and personal information such as net worth, height, and name. The is_professional_player and verification_status confirm whether or not the player is playing in the Premier League in the 2025/2026 season.

These field descriptions are passed to the model as part of the JSON schema and they act as "micro-prompts".

class PlayerField(BaseModel):
    value: str | int | float
    source_quote: str
    uri: str | None = Field(None, description="The EXACT, UNEDITED URL provided by the tool. Do not guess or shorten. None if the source is internal training data")
    source_type: Literal["GOOGLE_SEARCH", "URL_CONTEXT", "INTERNAL_KNOWLEDGE"] = Field(None, description="Categorize the source of this specific field. Must not be None when uri is not null.")

class PlayerStats(BaseModel):
    name: str
    net_worth: PlayerField = Field(..., description="Net worth of the PL player.")
    is_professional_player: bool = Field(..., description="Must be True if found in PL records, False otherwise")
    verification_status: str = Field(..., description="Explanation of where the data was found or why it failed")
    height: PlayerField
    shirt_number: PlayerField
    preferred_foot: PlayerField
    goals: PlayerField
    goal_assists: PlayerField
    appearances: PlayerField
    minutes_played: PlayerField

Generate Structured Output with Tools

The get_player_stats function substitutes the urls and player variables of the prompt with the concatenated URLs and player name. Then, I pass the prompt to the Gemini 3 Flash Preview model to obtain a response.

The configuration specifies that the response_mime_type is application/json and the response_json_schema is the JSON schema of the PlayerStats model. The thinking level is set to HIGH. The configuration also includes the URL Context tool to read Premier League player pages and the Google Search tool to query for missing details on web pages.

ThinkingLevel.HIGH also adds latency and cost: the model takes more reasoning steps and uses more thinking tokens before generating the JSON object.

def get_player_stats(player: str) -> types.GenerateContentResponse:

    prompt = f"""<original prompt from the above section>"""

    response = client.models.generate_content(
        model='gemini-3-flash-preview',
        contents=types.Content(
            role="user",
            parts=[types.Part(text=prompt)]
        ),
        config=types.GenerateContentConfig(
            response_mime_type="application/json",
            response_json_schema=PlayerStats.model_json_schema(),
            thinking_config=types.ThinkingConfig(
                thinking_level=types.ThinkingLevel.HIGH,
            ),
            tools=[
                types.Tool(url_context=types.UrlContext()),
                types.Tool(google_search=types.GoogleSearch()),
            ]
        )
    )

    return response

Get the Results

The model reliably returns a JSON object, which can be found in response.parsed. I then use PlayerStats.model_validate(...) to convert response.parsed into a player_stats instance.

When response_mime_type="application/json" is specified, the parsed field is the expected path, and parsing text is a fallback for edge cases.

If response.parsed is None, the fallback is to parse response.text with PlayerStats.model_validate_json(...). Gemini in Vertex AI wraps the text response in a Markdown json code block, so clean_json_string removes the enclosing code fence to ensure proper parsing.

FENCE = "`" * 3  # Markdown code fence delimiter (three backticks)

def clean_json_string(raw_string: str) -> str:
    # Remove the surrounding Markdown json code block, if present
    clean_str = raw_string.strip()
    clean_str = clean_str.removeprefix(FENCE + "json").removesuffix(FENCE)
    return clean_str.strip()

def print_player_stats(response: types.GenerateContentResponse):
    if response.parsed:
        player_stats = PlayerStats.model_validate(response.parsed)
    else:
        player_stats = PlayerStats.model_validate_json(clean_json_string(response.text))

    print(player_stats.model_dump_json(indent=2))

When the get_player_stats function is called with Erling Haaland, the model triggers the URL Context tool first and then the Google Search tool. In contrast, when the player is Kaoru Mitoma, whose profile page is not in the provided URL list, the model triggers the Google Search tool.
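One way to confirm which tools contributed to a result is to tally the source_type values across the fields. The sketch below is an illustration under my own assumptions: `summarize_sources` is a hypothetical helper, and the dictionary is a trimmed copy of the Erling Haaland output shown earlier. In the notebook, a dictionary like this could come from `player_stats.model_dump()`.

```python
from collections import Counter


def summarize_sources(stats: dict) -> Counter:
    """Count how many PlayerField-like entries came from each source_type."""
    return Counter(
        field.get("source_type")
        for field in stats.values()
        if isinstance(field, dict)  # skip plain values such as name
    )


# Trimmed copy of the Erling Haaland result, with the uris omitted.
sample_stats = {
    "name": "Erling Haaland",
    "is_professional_player": True,
    "net_worth": {"value": "$80 million", "source_type": "GOOGLE_SEARCH"},
    "shirt_number": {"value": 9, "source_type": "URL_CONTEXT"},
    "goals": {"value": 20, "source_type": "URL_CONTEXT"},
}
print(summarize_sources(sample_stats))
```

A player whose profile page is in the URL list should show a mix of URL_CONTEXT and GOOGLE_SEARCH, while a player outside the list should show GOOGLE_SEARCH only.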

response = get_player_stats(player="Erling Haaland")
print_player_stats(response=response)

response = get_player_stats(player="Kaoru Mitoma")
print_player_stats(response)

Conclusion

I provided the URL Context and Google Search tools to allow Gemini 3 to interact with external knowledge. The internal knowledge of the Gemini 3 Flash Preview model does not have the latest player statistics of the Premier League 2025/2026 season, so it must call tools to find the statistics from the official Premier League player profile pages, and the missing net worth, height, and preferred foot from other sports pages.

I hope you enjoy the content and appreciate the ease of use of the Gemini API.

Thank you.


Resources

  1. GitHub Example: Retrieve Premier League Player Stats Colab.
  2. Use VertexAI for free: Vertex AI in express mode.
  3. Run the demo in VS Code: Google Colab VS Code Extension
