Welcome!
Hello and welcome to Part 2 of this miniseries. In the first part I covered:
- Our primary goal: to give my coding agent (I use Gemini CLI) in-depth and up-to-date knowledge about the Google Agent Development Kit. (But it could be any repo or folder.)
- How `llms.txt` is a great standard for allowing LLMs (like Gemini) to understand the structure of a folder or repo, and to help the LLM immediately look up the most appropriate documents to respond to your queries.
- How we can use the free and open-source MCP LLMS-TXT Doc Server to provide an off-the-shelf MCP server that guides the LLM to read an `llms.txt`, and use the links inside it to find the most appropriate material.
- How easy it is to integrate such an MCP server into your client tool, like Gemini CLI.
- How the ADK-Docs repo contains two sort-of `llms` files: `llms.txt` and `llms-full.txt`. But they do not align to the `llms.txt` standard, and they are not well-suited for our goal.
What You Will Find in This Part
I built a multi-agent solution using the Google ADK that will create the `llms.txt` file for any supplied folder or repo.
Here I’ll talk you through:
- How I designed it — the overall design, requirements, and design decisions.
- Setting up our development environment, including project folder structure, `TODO.md`, `pyproject.toml`, `.env`, and a `Makefile` for convenience.
- Initial utility functions for configuration and logging. (We’ll cover the actual agent code in the next part.)
Let’s go!
Solution Design
I’ll start by providing a brief approximation of a solution architecture design document. (You know I love a good solution architecture doc!!) I’m a strong believer in writing an upfront design before jumping into building the solution — even if it’s just for a home project like this one!
Solution Goals
The goal of the LLMS-Generator is to create a llms.txt file for any given code repository or folder. The llms.txt file is designed to be easily parsable by Large Language Models (LLMs), providing them with a structured and summarised understanding of the repo’s content and layout. This enables more accurate and context-aware interactions with the codebase by AI agents; e.g. with Gemini CLI.
Functional Requirements
- The application must be initiated via a Command Line Interface (CLI).
- The user must provide the absolute path to the target repository.
- The user can optionally specify an output path for the generated
llms.txtfile. - The application must intelligently discover relevant files (e.g.
.md,.py) while ignoring irrelevant directories and files (e.g..git,__pycache__,.venv). - The application must generate a concise summary for each discovered file.
- The application must generate a high-level summary of the entire project/folder/repo.
- The application must construct the `llms.txt` file in the specified format, including the project and file summaries.
The `llms.txt` file will adhere to a specific format, including:
- A main header (`H1`) with the project’s name.
- A high-level overview of the project’s purpose.
- Sections (`H2`) representing repository directories.
- A list of markdown links to files within each section, with a concise summary for each file.
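For example, here is a minimal illustrative `llms.txt`; the project name, sections, and summaries are invented for illustration:

```markdown
# my-project

A short overview of what this project does and who it is for.

## docs

- [getting-started.md](docs/getting-started.md): How to install and run the project.
- [architecture.md](docs/architecture.md): An overview of the main components.

## src

- [main.py](src/main.py): The CLI entry point for the application.
```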
The application must handle both local repositories (using relative file paths) and GitHub repositories (using full GitHub URLs).
The application’s behavior should be configurable through environment variables (e.g. log level, maximum number of files to process).
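To make the discovery requirement concrete, here is a minimal sketch of how such a function might look. The function name, extension list, and exclusion set are illustrative assumptions, not the final implementation:

```python
from pathlib import Path

# Illustrative values - the real tool's include/exclude lists may differ
EXCLUDED_DIRS = {".git", "__pycache__", ".venv", "node_modules"}
INCLUDED_SUFFIXES = {".md", ".py"}

def discover_files(repo_path: str) -> list[str]:
    """Walk the repo, returning relevant files and skipping noise directories."""
    root = Path(repo_path)
    found = []
    for path in root.rglob("*"):
        # Skip anything that lives inside an excluded directory
        if any(part in EXCLUDED_DIRS for part in path.parts):
            continue
        if path.is_file() and path.suffix in INCLUDED_SUFFIXES:
            found.append(str(path.relative_to(root)))
    return sorted(found)
```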
Quality Attributes / Architecturally Significant Requirements
- Concurrency: This is a developer-centric application. Initially it will run locally, and there is no need for concurrent use. This could be added later.
- Reliability: The application should be robust, with graceful error handling for issues like invalid paths or API failures. The system must be resilient to API rate limiting by implementing a retry mechanism.
- High availability and DR: As an infrequently and locally run developer-centric application, there is no requirement for HA or DR.
- Performance: While summarisation is time-intensive, the application should be reasonably performant.
- Extensibility: The agent-based architecture should allow for easy addition of new features and modification of existing logic.
- Maintainability: The codebase will be modular, with a clear separation of concerns (CLI, agent logic, tools), to facilitate easy maintenance.
- Testability: The project should include a suite of unit tests.
The LLMS-Generator is implemented as an agentic application using the google-adk. The architecture is composed of a CLI, an orchestrator agent, plus sub-agents and tools. The solution design below shows component interactions, and the arrow labels show the sequence of interactions:
It’s fair to say that some aspects of this design evolved during the implementation. But we’ll get to that!
Design Decisions
- Use Generative AI: since we need to summarise artifacts (including documentation and code) in a folder or repo, a generative AI solution is ideal. AI provides the core functionality of the application.
- Use Gemini-2.5-Flash: Gemini is a leading multi-modal foundation model, well-suited to the task of document summarisation. Gemini also has a very large context window, which is useful when reading large numbers of documents for summarisation. Flash will be used because it is both faster and cheaper than Gemini Pro, and we do not need the more sophisticated reasoning capabilities of the Pro model. We have no need for a custom-trained model. And finally, the model is fully managed by Google and we can consume it using the standard Gemini APIs.
- Agent-Based Architecture (`google-adk`): This provides a modular and extensible framework. By breaking down the logic into independent agents and tools, the system is easier to develop, test, and maintain. It also allows for the orchestration of complex workflows by the LLM.
- Orchestrator Agent: A primary ADK coordinator agent will orchestrate the entire workflow.
- Sequential Agent for Summarisation: The summarisation process is naturally a two-step sequence: read files, then summarise them. Using a sequential agent ensures this order of operations, leading to a more reliable and predictable workflow. (The first sketch after this list shows the idea.)
- Command Line Interface with `Typer`: A CLI is a standard and efficient interface for a developer-focused tool. The `Typer` package simplifies the creation of a clean and professional CLI in Python, complete with automatic help generation and argument parsing. (See the second sketch after this list.)
- Schema Validation with `pydantic`: We can define the expected output schema for the summarisation agent, to make the system more robust. This ensures that the data passed between agents is in the correct format, reducing the likelihood of runtime errors.
- Exponential Backoff for API Calls: The application may make frequent calls to the model in a short space of time, which can lead to 429 (rate limit) errors. We can mitigate this with exponential backoff. (The final sketch after this list illustrates it.)
- No persistence required: it is expected that the entire flow can be accomplished without any need for external working storage or databases. If the workflow exceeds what is possible within model context, we can implement external persistence later. E.g. we could implement a simple Firestore database to store content we have gathered, and to build up the summaries, before returning them to the agent.
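Here is a simplified sketch of how the sequential-agent and pydantic decisions might fit together in ADK code. The agent names, instructions, and schema fields are illustrative assumptions, not the repo’s actual implementation:

```python
from google.adk.agents import LlmAgent, SequentialAgent
from pydantic import BaseModel


class FileSummary(BaseModel):
    """Expected shape of a single file summary."""
    path: str
    summary: str


class SummaryOutput(BaseModel):
    summaries: list[FileSummary]


file_reader_agent = LlmAgent(
    name="file_reader_agent",
    model="gemini-2.5-flash",
    instruction="Read each discovered file and store its contents.",
    output_key="file_contents",  # result is written to session state
)

document_summariser_agent = LlmAgent(
    name="document_summariser_agent",
    model="gemini-2.5-flash",
    instruction="Summarise each file's contents in one or two sentences.",
    output_schema=SummaryOutput,  # pydantic-enforced structured output
    output_key="file_summaries",
)

# The SequentialAgent guarantees the order: read first, then summarise
summarisation_pipeline = SequentialAgent(
    name="summarisation_pipeline",
    sub_agents=[file_reader_agent, document_summariser_agent],
)
```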
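And the Typer decision translates into something like this minimal CLI sketch (the command and option names are illustrative):

```python
import typer

app = typer.Typer()

@app.command()
def generate(
    repo_path: str = typer.Argument(..., help="Absolute path to the target repo or folder"),
    output_path: str = typer.Option("llms.txt", help="Where to write the generated llms.txt"),
):
    """Generate an llms.txt file for the given repository."""
    typer.echo(f"Generating {output_path} for {repo_path}...")
    # ...hand off to the orchestrator agent here...

if __name__ == "__main__":
    app()
```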
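Finally, the exponential backoff idea in a nutshell. This is a generic sketch; the default delays and attempt counts mirror the configuration defaults we’ll meet later in `config.py`:

```python
import random
import time
from collections.abc import Callable

def call_with_backoff(func: Callable, *, attempts: int = 5, init_delay: float = 2,
                      max_delay: float = 60, multiplier: float = 2):
    """Call func, retrying with exponential backoff and jitter on failure."""
    delay = init_delay
    for attempt in range(1, attempts + 1):
        try:
            return func()
        except Exception:  # in practice, catch only the 429 / rate-limit error
            if attempt == attempts:
                raise  # out of retries; surface the error
            time.sleep(min(delay, max_delay) + random.uniform(0, 1))  # add jitter
            delay *= multiplier
```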
Building the Application
Now I’ll walk you through my experience of actually building the application. To follow along, you can find the complete code in my GitHub repo.
Getting Started
For this project I didn’t start with the Agent Starter Pack. The starter pack provides a bunch of stuff that I was unlikely to leverage. So I decided to just start from scratch and create this folder hierarchy with some empty files:
llms-gen/
├── notebooks/
│ └── generate_llms_experiments.ipynb
├── src/
│ ├── client_fe/
│ │ └── __init__.py
│ ├── common_utils/
│ │ └── __init__.py
│ ├── llms_gen_agent/
│ │ ├── __init__.py
│ │ ├── agent.py
│ │ └── tools.py
│ └── tests/
│ └── __init__.py
├── .gitattributes # reused from other projects
├── .gitignore # reused from other projects
├── README.md
└── TODO.md
Then I populated my README.md with the project overview. (I’ll build out the README.md as we go.)
Add a TODO
These days, I always have a TODO.md. It helps me work through my plan, and it helps my code assist agent too. Although it started out a fair bit shorter than this, here’s what my TODO.md looks like today, at the time of writing this blog:
# TODO
- [x] Create project scaffold, including `README`, `src`, agent folder, `.gitignore`, `.gitattributes`
- [x] Create initial `TODO`
- [x] Create `pyproject.toml`
- [x] Create `.env` and point to a Google Cloud project
- [x] Create environment setup script
- [x] Create `Makefile`
- [x] Create `GEMINI.md`
- [x] Create configuration and logging modules
- [x] Create Coordinator Agent
- [x] Create Discover Files tool
- [x] Create File Reader Agent and file read tool
- [x] Create Content Summariser Agent
- [x] Create initial unit tests
- [x] Create experimentation Jupyter notebook
- [x] Parameterise number of files to process
- [x] Implement pydantic to enforce output schema
- [x] Add sequential agent such that all files are read first, and then all content is summarised second.
- [x] Add callback to clean any JSON preamble or erroneous markup from the model.
- [x] Complete project summarisation step.
- [x] Eliminate 429/quota issues when calling Gemini, particularly from `document_summariser_agent`
- [x] Add callback to capture the output of read files and store in session state.
- [x] Fewer sections, controlled by folder depth.
- [x] Complete final `llms.txt` file creation.
- [x] Provide a client way to run the application without having to send a prompt, e.g. using CLI arguments.
- [x] Make repo public.
- [ ] Write blog.
- [ ] Increase test coverage by adding unit tests for the agents and other utility functions.
- [ ] Replace LangChain File Read tool with custom tool; eliminate need for callback.
- [ ] Add integration tests to test the end-to-end functionality of the agent.
- [ ] Make the list of excluded directories in `discover_files` configurable, in a deterministic way.
- [ ] Exclude also based on .gitignore.
- [ ] Make the solution iterate, e.g. if output is incomplete, or nearing filling context window.
By the way, this is part of my global .gemini/GEMINI.md context file:
## Project Plan
- Check for a `TODO.md` file in the current project. If it exists, this file captures the overall plan for this project. It can be used to determine what we've achieved so far, and what other tasks we need to do.
- When you believe you have completed a step in the `TODO.md`, offer to mark it closed.
This helps guide Gemini CLI / Code Assist Agent to properly use my TODO.md.
Create the Pyproject.toml
When you’re building a Python project, managing dependencies can be a pain. pyproject.toml is the modern solution. Think of it as the master blueprint for your Python project. A single, structured file that defines everything about your project's build system and dependencies. It's a comprehensive configuration file that ensures consistency and reproducibility.
And the blazingly fast package manager uv reads the pyproject.toml blueprint and makes it a reality. It creates virtual environments for you and installs all your dependencies without the usual fuss.
(Wondering what happened to pip and requirements.txt? We don’t need them. They are replaced by uv and pyproject.toml, respectively.)
For this project, I started by copying an existing pyproject.toml from a previous project, and tweaked it as required. So we end up with something like this:
[project]
name = "llms-generator"
version = "0.1.0"
description = "An agentic solution designed to create a `llms.txt` file for any given repo or folder"
authors = [
{name = "Dazbo (Darren Lester)", email = "my.email@address.com"},
]
dependencies = [
"google-adk",
"google-genai",
"google-cloud-logging",
"google-cloud-aiplatform[adk,evaluation,agent_engines]",
"python-dotenv",
# Web framework
"fastapi~=0.115.14",
"uvicorn~=0.34.3", # means >= 0.34.3 but < 0.35
"pyyaml",
]
requires-python = ">=3.12,<3.13"
[dependency-groups]
dev = [
"pytest",
"pytest-asyncio",
"nest-asyncio",
]
[project.optional-dependencies]
jupyter = [
"jupyter",
"ipython"
]
lint = [
"ruff>=0.4.6",
"mypy~=1.17.0",
"codespell~=2.4.1",
"types-pyyaml~=6.0.12",
"types-requests~=2.32.4",
]
[tool.ruff]
line-length = 130
target-version = "py312"
[tool.ruff.lint]
select = [
"E", # pycodestyle
"F", # pyflakes
"W", # pycodestyle warnings
"I", # isort
"C", # flake8-comprehensions
"B", # flake8-bugbear
"UP", # pyupgrade
"RUF", # ruff specific rules
]
ignore = [
"E302", # expected two blank lines between defs
"W291", # trailing whitespace
"W293" # line contains whitespace
]
[tool.ruff.lint.isort]
known-first-party = ["src"] # Because this is where my source lives
[tool.mypy]
disallow_untyped_calls = false # Allow calling functions that lack type annotations.
disallow_untyped_defs = false # Allow defining functions without type annotations.
disallow_incomplete_defs = true # Prohibit defining functions with incomplete type annotations.
no_implicit_optional = true # Require `Optional[T]` for variables that can be `None`.
check_untyped_defs = true # Type-check the body of functions without annotations. Catch potential mismatches.
disallow_subclassing_any = true # Prohibit a class from inheriting from a value of type `Any`.
warn_incomplete_stub = true # Warn about incomplete type stubs (`.pyi` files).
warn_redundant_casts = true # Warn if a type cast is unnecessary.
warn_unused_ignores = true # Warn about `# type: ignore` comments that are no longer needed.
warn_unreachable = true # Warn about code that is unreachable.
follow_imports = "silent" # Type-check imported modules but suppress errors from them.
ignore_missing_imports = true # Suppress errors about unresolved imports.
explicit_package_bases = true # Enforce explicit declaration of package bases.
disable_error_code = ["misc", "no-any-return", "no-untyped-def"]
exclude = [".venv", ".git"]
[tool.codespell]
ignore-words-list = "rouge"
skip = "./locust_env/*,uv.lock,.venv,./src/frontend,**/*.ipynb"
[build-system]
requires = ["hatchling"]
build-backend = "hatchling.build"
[tool.pytest.ini_options]
pythonpath = "."
asyncio_default_fixture_loop_scope = "function"
testpaths = ["src/tests"] # This helps pytest to find tests, making collection faster
[tool.hatch.build.targets.wheel]
packages = ["src/llms_gen_agent", "src/common_utils", "src/client_fe"]
It’s pretty self-explanatory. A couple of things worth noting:
- I can install all my dependencies using the `uv` command `uv sync`.
- There are optional dependencies that we only install when we need them. E.g. `uv sync --dev --extra jupyter --extra lint`.
- We’re using `ruff` for linting and formatting, `mypy` for static type checking, and `codespell` to look for spelling mistakes in the repo.
.env
Now I create .env for my local environment setup. Note that this file should not be checked-in to source control. For the LLMS-Generator application it should look something like this:
# .env
export GOOGLE_CLOUD_STAGING_PROJECT="your-staging-project-id"
export GOOGLE_CLOUD_PRD_PROJECT="your-prod-project-id"
# These Google Cloud variables will be set by the scripts/setup-env.sh script
# GOOGLE_CLOUD_PROJECT=""
# GOOGLE_CLOUD_LOCATION="global"
export PYTHONPATH="src"
# Agent variables
export AGENT_NAME="llms_gen_agent" # The name of the agent
export MODEL="gemini-2.5-flash" # The model used by the agent
export GOOGLE_GENAI_USE_VERTEXAI="True" # True to use Vertex AI for auth; else use API key
export LOG_LEVEL="INFO"
Again, we’ll build on this as we go.
Create a Makefile for Convenience
I love a Makefile! It’s really convenient for installing dependencies, running ruff/mypy/codespell, running tests, and launching the application.
| Command | Description |
|---|---|
| `source scripts/setup-env.sh` | Setup Google Cloud project and auth with Dev/Staging |
| `make install` | Install all required dependencies using `uv` |
| `make playground` | Launch UI for testing agent locally and remotely. This runs `uv run adk web src` |
| `make test` | Run unit and integration tests |
| `make lint` | Run code quality checks (codespell, ruff, mypy) |
| `make generate` | Execute the LLMS-Generator command line application |
And we can configure our make targets so that they check pre-reqs before running. For example, when I run make test it will first check that my GOOGLE_CLOUD_PROJECT environment variable has been set. If it hasn’t, then it means I probably haven’t yet run my setup-env.sh script; and my tests will certainly fail.
# Run unit and integration tests
test:
@test -n "$(GOOGLE_CLOUD_PROJECT)" || (echo "Error: GOOGLE_CLOUD_PROJECT is not set. Setup environment before running tests" && exit 1)
uv run pytest src/tests/unit
Provide Context to Gemini CLI / Gemini Code Assist
This is a good time to create a GEMINI.md. It builds on my “global” .gemini/GEMINI.md with context that is specific for this project. Here’s what it looks like:
# Project: LLMS-Generator
---
***IMPORTANT: Run this check at the start of EVERY session!***
Google Cloud configuration is achieved through a combination of `.env` and the `scripts/setup-env.sh` script.
Before providing your FIRST response in any conversation, you MUST perform the following steps:
1. Run `printenv GOOGLE_CLOUD_PROJECT` to check the environment variable.
2. Based only on the output of that command, state whether the variable is set.
3. If it is not set, advise me to run `scripts/setup-env.sh` before resuming the conversation.
The presence of this environment variable indicates that the script has been run.
The absence of this variable indicates that the script has NOT been run.
Note that failures with Google Cloud are likely if this script has not been run. For example, tests will fail. If tests are failing, we should check if the script has been run.
---
## Project Overview
_LLMS-Generator_ is an agentic solution designed to create a `llms.txt` file for any given repo or folder.
The `llms.txt` file is an AI/LLM-friendly markdown file that enables an AI to understand the purpose of a repo, as well as have a full understanding of the repo site map and the purpose of each file it finds. This is particularly useful when providing AIs (like Gemini) access to documentation repos.
The `llms.txt` file will have this structure:
- An H1 with the name of the project or site
- An overview of the project / site purpose.
- Zero or more markdown sections delimited by H2 headers, containing appropriate section summaries.
- Each section contains a list of markdown hyperlinks, in the format: `[name](url): summary`.
See [here](https://github.com/AnswerDotAI/llms-txt) for a more detailed description of the `llms.txt` standard.
## Building and Running
### Dependencies
- **uv:** Python package manager
- **Google Cloud SDK:** For interacting with GCP services
- **make:** For running common development tasks
Project dependencies are managed in `pyproject.toml` and can be installed using `uv`. The `make` commands streamline many `uv` and `adk` commands.
## Development Guide
- **Configuration:** Project dependencies and metadata are defined in `pyproject.toml`.
- **Dependencies:** Project dependencies are managed in `pyproject.toml`. The `[project]` section defines the main dependencies, and the `[dependency-groups]` section defines development and optional dependencies.
- **Source code:** Lives in the `src/` directory. This includes agents, frontends, notebooks and tests.
- **Notebooks:** The `notebooks/` directory contains Jupyter notebooks for prototyping, testing, and evaluating the agent.
- **Testing:** The project includes unit and integration tests in `src/tests/`. Tests are written using `pytest` and `pytest-asyncio`. They can be run with `make test`
- **Linting:** The project uses `ruff` for linting and formatting, `mypy` for static type checking, and `codespell` for checking for common misspellings. The configuration for these tools can be found in `pyproject.toml`. We can run linting with `make lint`.
- **AI-Assisted Development:** The `GEMINI.md` file provides context for AI tools like Gemini CLI to assist with development.
## Project Plan
- The `TODO.md` captures the overall plan for this project.
Note how this GEMINI.md:
- Helps Gemini understand the purpose of this project.
- Forces Gemini to check that my `setup-env.sh` script has been run before commencing any conversation.
- Helps Gemini understand the folder structure and conventions I’m following.
Google Cloud Project
This application will make use of the Google Gemini-2.5-Flash model / API. To do so, we either have to set up Google ADC locally and point it to a Google Cloud project that has this API enabled; or we need to provide an API key.
I’m doing the former. I didn’t bother creating a new Google Cloud project for this, as I already have a “scratch” project that I tend to use for this kind of development. And — at this point in time — I’m not planning on deploying the application itself into Google Cloud. I will only run it locally. But you might need to (or prefer to) create a new project.
Environment Setup Script
This application requires a few things to happen with each new session:
- We need to set our environment variables, by reading the `.env`.
- We need to authenticate to Google Cloud, in order to use Google Cloud APIs. (Specifically: Gemini.)
- We need to install dependencies, as defined in `pyproject.toml`.
So I’ve created a script to automate this: /scripts/setup-env.sh:
#!/bin/bash
# This script is meant to be sourced to set up your development environment.
# It configures gcloud, installs dependencies, and activates the virtualenv.
#
# Usage:
# source ./setup-env.sh [--noauth] [-t|--target-env <DEV|PROD>]
#
# Options:
# --noauth: Skip gcloud authentication.
# -t, --target-env: Set the target environment (DEV or PROD). Defaults to DEV.
# --- Color and Style Definitions ---
RESET='\033[0m'
BOLD='\033[1m'
RED='\033[0;31m'
GREEN='\033[0;32m'
YELLOW='\033[0;33m'
BLUE='\033[0;34m'
# --- Parameter parsing ---
TARGET_ENV="DEV"
AUTH_ENABLED=true
while [[ $# -gt 0 ]]; do
case "$1" in
-t|--target-env)
if [[ -n "$2" && "$2" != --* ]]; then
TARGET_ENV="$2"
shift 2
else
echo "Error: --target-env requires a non-empty argument."
return 1
fi
;;
--noauth)
AUTH_ENABLED=false
shift
;;
*)
shift
;;
esac
done
# Convert TARGET_ENV to uppercase
TARGET_ENV=$(echo "$TARGET_ENV" | tr '[:lower:]' '[:upper:]')
echo -e "${BLUE}${BOLD}--- ☁️ Configuring Google Cloud environment ---${RESET}"
# 1. Check for .env file
if [ ! -f .env ]; then
echo -e "${RED}❌ Error: .env file not found.${RESET}"
echo "Please create a .env file with your project variables and run this command again."
return 1
fi
# 2. Source environment variables and export them
echo -e "Sourcing variables from ${BLUE}.env${RESET} file..."
set -a # automatically export all variables (allexport = on)
source .env
set +a # disable allexport mode
# 3. Set the target project based on the parameter
if [ "$TARGET_ENV" = "PROD" ]; then
echo -e "Setting environment to ${YELLOW}PROD${RESET} ($GOOGLE_CLOUD_PRD_PROJECT)..."
export GOOGLE_CLOUD_PROJECT=$GOOGLE_CLOUD_PRD_PROJECT
else
echo -e "Setting environment to ${YELLOW}DEV/Staging${RESET} ($GOOGLE_CLOUD_STAGING_PROJECT)..."
export GOOGLE_CLOUD_PROJECT=$GOOGLE_CLOUD_STAGING_PROJECT
fi
# 4. Authenticate with gcloud and configure project
if [ "$AUTH_ENABLED" = true ]; then
echo -e "\n🔐 Authenticating with gcloud and setting project to ${BOLD}$GOOGLE_CLOUD_PROJECT...${RESET}"
gcloud auth login --update-adc 2>&1 | grep -v -e '^$' -e 'WSL' -e 'xdg-open' # Suppress blank lines and any annoying WSL messages
gcloud config set project "$GOOGLE_CLOUD_PROJECT"
gcloud auth application-default set-quota-project "$GOOGLE_CLOUD_PROJECT"
else
echo -e "\n${YELLOW}Skipping gcloud authentication as requested.${RESET}"
gcloud config set project "$GOOGLE_CLOUD_PROJECT"
fi
echo -e "\n${BLUE}--- Current gcloud project configuration ---${RESET}"
gcloud config list project
echo -e "${BLUE}------------------------------------------${RESET}"
# 5. Get project numbers
echo "Getting project numbers..."
export STAGING_PROJECT_NUMBER=$(gcloud projects describe $GOOGLE_CLOUD_STAGING_PROJECT --format="value(projectNumber)")
export PROD_PROJECT_NUMBER=$(gcloud projects describe $GOOGLE_CLOUD_PRD_PROJECT --format="value(projectNumber)")
echo -e "${BOLD}STAGING_PROJECT_NUMBER:${RESET} $STAGING_PROJECT_NUMBER"
echo -e "${BOLD}PROD_PROJECT_NUMBER:${RESET} $PROD_PROJECT_NUMBER"
echo -e "${BLUE}------------------------------------------${RESET}"
# 6. Sync Python dependencies and activate venv
echo "Activating Python virtual environment..."
source .venv/bin/activate
echo "Syncing python dependencies with uv..."
uv sync --dev --extra jupyter
echo -e "\n${GREEN}✅ Environment setup complete for ${BOLD}$TARGET_ENV${RESET}${GREEN} with project ${BOLD}$GOOGLE_CLOUD_PROJECT${RESET}${GREEN}. Your shell is now configured.${RESET}"
You run it with this command:
source scripts/setup-env.sh [--noauth]
And when it runs, it looks like this:
These days, I tend to use the same script in all of my Google Cloud-related applications. You might find it useful too!
Configuration and Logging
I like to start with a config.py convenience module, for loading configuration from environment variables.
"""This module provides configuration for the LLMS-Generator agent."""
import os
from collections.abc import Callable
from dataclasses import dataclass
import google.auth
from common_utils.exceptions import ConfigError
from common_utils.logging_utils import setup_logger
# --- Constants for default environment variables ---
DEFAULT_AGENT_NAME = "llms_gen_agent"
DEFAULT_GCP_LOCATION = "global"
DEFAULT_MODEL = "gemini-2.5-flash"
DEFAULT_GENAI_USE_VERTEXAI = "True"
DEFAULT_MAX_FILES_TO_PROCESS = "0"
DEFAULT_BACKOFF_INIT_DELAY = "2"
DEFAULT_BACKOFF_ATTEMPTS = "5"
DEFAULT_BACKOFF_MAX_DELAY = "60"
DEFAULT_BACKOFF_MULTIPLIER = "2"
agent_name = os.environ.setdefault("AGENT_NAME", DEFAULT_AGENT_NAME)
logger = setup_logger(agent_name)
@dataclass
class Config:
"""Holds application configuration."""
agent_name: str
project_id: str
location: str
model: str
genai_use_vertexai: bool
max_files_to_process: int # 0 means no limit
backoff_init_delay: int
backoff_attempts: int
backoff_max_delay: int
backoff_multiplier: int
valid: bool = True # Set this to False to force config reload from env vars
def invalidate(self):
"""
Invalidate current config. This forces the config to be refreshed
from the environment when setup_config() is next called.
"""
logger.debug("Invalidating current config.")
self.valid = False
def __str__(self):
return (
f"Agent Name: {self.agent_name}\n"
f"Project ID: {self.project_id}\n"
f"Location: {self.location}\n"
f"Model: {self.model}\n"
f"GenAI Use VertexAI: {self.genai_use_vertexai}\n"
f"Max Files To Process: {self.max_files_to_process}\n"
f"Backoff Init Delay: {self.backoff_init_delay}\n"
f"Backoff Attempts: {self.backoff_attempts}\n"
f"Backoff Max Delay: {self.backoff_max_delay}\n"
f"Backoff Multiplier: {self.backoff_multiplier}\n"
)
def _get_env_var(key: str, default_value: str, type_converter: Callable=str):
"""Helper to get environment variables with a default and type conversion."""
return type_converter(os.environ.setdefault(key, default_value))
current_config = None
def setup_config() -> Config:
"""Gets the application configuration by reading from the environment.
The expensive Google Auth call to determine the project ID is only performed once.
If the current_config is invalid, the config will be refreshed from the environment.
Otherwise, the cached config is returned.
Returns:
Config: An object containing the current application configuration.
Raises:
ConfigError: If the GCP Project ID cannot be determined on the first call.
"""
global current_config
# Load env vars
location = _get_env_var("GOOGLE_CLOUD_LOCATION", DEFAULT_GCP_LOCATION)
model = _get_env_var("MODEL", DEFAULT_MODEL)
genai_use_vertexai = _get_env_var("GOOGLE_GENAI_USE_VERTEXAI", DEFAULT_GENAI_USE_VERTEXAI, lambda x: x.lower() == "true")
max_files_to_process = _get_env_var("MAX_FILES_TO_PROCESS", DEFAULT_MAX_FILES_TO_PROCESS, int)
backoff_init_delay = _get_env_var("BACKOFF_INIT_DELAY", DEFAULT_BACKOFF_INIT_DELAY, int)
backoff_attempts = _get_env_var("BACKOFF_ATTEMPTS", DEFAULT_BACKOFF_ATTEMPTS, int)
backoff_max_delay = _get_env_var("BACKOFF_MAX_DELAY", DEFAULT_BACKOFF_MAX_DELAY, int)
backoff_multiplier = _get_env_var("BACKOFF_MULTIPLIER", DEFAULT_BACKOFF_MULTIPLIER, int)
if current_config:
# If we've already loaded the config before
if current_config.valid:
# return it as is
return current_config
else:
# Current config invalid - we need to update it
current_config.location=location
current_config.model=model
current_config.genai_use_vertexai=genai_use_vertexai
current_config.max_files_to_process=max_files_to_process
current_config.backoff_init_delay=backoff_init_delay
current_config.backoff_attempts=backoff_attempts
current_config.backoff_max_delay=backoff_max_delay
current_config.backoff_multiplier=backoff_multiplier
logger.debug(f"Updated config:\n{current_config}")
return current_config
# If we're here, then we've never created a config before
_, project_id = google.auth.default()
if not project_id:
raise ConfigError("GCP Project ID not set. Have you run scripts/setup-env.sh?")
current_config = Config(
agent_name=agent_name,
project_id=project_id,
location=location,
model=model,
genai_use_vertexai=genai_use_vertexai,
max_files_to_process=max_files_to_process,
backoff_init_delay=backoff_init_delay,
backoff_attempts=backoff_attempts,
backoff_max_delay=backoff_max_delay,
backoff_multiplier=backoff_multiplier
)
logger.debug(f"Loaded config:\n{current_config}")
return current_config
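Here’s a usage sketch showing how another module would typically consume this (remember, `setup_config` is cached after the first call):

```python
from llms_gen_agent.config import logger, setup_config

config = setup_config()  # first call resolves the GCP project via google.auth
logger.info("Using model %s in project %s", config.model, config.project_id)

config.invalidate()      # mark the config stale...
config = setup_config()  # ...so the next call re-reads the environment
```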
And here’s src/common_utils/logging_utils.py:
"""
This module provides a shared logging utility for the application.
It offers a centralized `setup_logger` function that configures
and returns a standardized logger instance. This ensures consistent
logging behavior, formatting, and level across the entire application.
To use the logger in any module, import the `setup_logger` function
and call it with a name, typically `__name__`, to get a logger
instance specific to that module.
Example:
```
from common_utils.logging_utils import setup_logger
logger = setup_logger(__name__)
```
In this application we set up the logger in `config.py`, and then expose
that logger to other modules. E.g.
```
from llms_gen_agent.config import setup_config, logger
```
"""
import logging
import os
def setup_logger(app_name: str) -> logging.Logger:
    """Sets up and returns a logger for the application."""
    # Suppress verbose logging from ADK and GenAI libraries - INFO logging is quite verbose
    logging.getLogger("google_adk").setLevel(logging.ERROR)
    logging.getLogger("google_genai").setLevel(logging.ERROR)
    # Suppress "Unclosed client session" warnings from aiohttp
    logging.getLogger('asyncio').setLevel(logging.CRITICAL)
    log_level = os.environ.get("LOG_LEVEL", "INFO").upper()
app_logger = logging.getLogger(app_name)
log_level_num = getattr(logging, log_level, logging.INFO)
app_logger.setLevel(log_level_num)
# Add a handler only if one doesn't exist to prevent duplicate logs
if not app_logger.handlers:
handler = logging.StreamHandler()
formatter = logging.Formatter(
fmt="%(asctime)s.%(msecs)03d:%(name)s - %(levelname)s: %(message)s",
datefmt="%H:%M:%S",
)
handler.setFormatter(formatter)
app_logger.addHandler(handler)
app_logger.propagate = False # Prevent propagation to the root logger
app_logger.info("Logger initialised for %s.", app_name)
app_logger.debug("DEBUG level logging enabled.")
return app_logger
This logging module also disables some of the verbose logging from the google-adk, google-genai, and asyncio packages.
What’s Next?
Now we’ve designed our solution and set up our project and development environment, we’re ready to code the agents themselves. This is what we’ll do in Part 3. See you there!
You Know What To Do!
- Please share this with anyone that you think will be interested. It might help them, and it really helps me!
- Please give me 50 claps! (Just hold down the clap button.)
- Feel free to leave a comment 💬.
- Follow and subscribe, so you don’t miss my content.
Useful Links and References
The LLMS-Generator Repo
Google Cloud ADK
Llms.Txt
- https://llmstxt.org/
- MCP Llms.txt Documentation Server
- Give Your AI Agents Deep Understanding With LLMS.txt



Top comments (0)