Build a real tool-using AI agent that understands YouTube videos and answers questions. In pure PHP
For years, using AI in PHP meant little more than sending prompts to an API and displaying the result. Valuable, yes, but also limited, fragile, and prone to hallucinations.
Today, the industry is moving toward a new paradigm: Agentic AI.
Instead of a single prompt-response interaction, we now build agents: goal-oriented AI programs that can reason, decide when to use tools, call real code, and operate inside well-defined constraints.
And yes: PHP is absolutely ready for this.
Thanks to libraries like Neuron AI, Symfony AI, Prism, and soon Laravel AI, the PHP ecosystem is becoming a first-class platform for building production-grade AI systems.
To showcase the power and practicality of this approach, we will build a concrete example: an agent that acts as an assistant, capable of answering questions about a YouTube video. To achieve this, we will give our agent the ability to fetch the video’s transcript (and we will see how easy this is to do in PHP) and then use that transcript as its source of context and knowledge.
By the end of this article, you will have an agent that you can interrogate about any specific video. Whether you want to summarize a tutorial, extract key insights, or ask precise questions about a long talk or conference recording, you will be able to do it using an agent you built yourself.
The PHP tools/libraries we are going to use for building the YouTube AI agent:
- Neuron AI as the PHP agentic framework https://www.neuron-ai.dev/
- Gemini as the LLM provider https://ai.google.dev/gemini-api/docs
- The gemini-2.5-flash model (but of course, you can use a richer model) https://ai.google.dev/gemini-api/docs/models
- The PHP YouTube Transcript open-source library, designed to fetch transcripts for YouTube videos https://github.com/MrMySQL/youtube-transcript
By the end, you’ll have an agent that can answer questions about a YouTube video using only its transcript.
What is an Agent?
An agent is a goal-driven AI program that can reason, plan, and decide when to use tools to complete a task.
In practice, an agent:
- Has clear instructions
- Has rules and boundaries
- Has tools it can use
- Can call real code
- Can chain multiple steps
- And can operate reliably inside constraints
This is the foundation of modern AI applications.
What is a Tool?
A Tool is a function that the AI is allowed to call.
Instead of guessing or hallucinating data, the model can:
- Query your database for current data
- Call external APIs
- Read files
- Execute business logic
- Fetch external resources
The flow becomes:
- User asks a question
- Agent reasons (following predefined steps and instructions)
- Agent decides to call a tool
- Tool executes real PHP code
- Agent receives the result from the Tool
- Agent uses it to answer
This is how you build reliable AI systems.
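To make this concrete, here is a minimal, hypothetical tool written with Neuron AI's Tool API (the same API we will use for real later in this article). The SQLite database, the orders table, and the column names are invented purely for illustration:

use NeuronAI\Tools\PropertyType;
use NeuronAI\Tools\Tool;
use NeuronAI\Tools\ToolProperty;

// Hypothetical example: a tool that looks up an order status in your own database,
// so the model can answer with real data instead of guessing.
// The SQLite file and the "orders" table are assumptions made for this sketch.
$pdo = new PDO('sqlite:shop.db');

$orderStatusTool = Tool::make(
    'get_order_status',
    'Return the current status of an order from the shop database.',
)->addProperty(
    new ToolProperty(
        name: 'order_id',
        type: PropertyType::STRING,
        description: 'The ID of the order to look up.',
        required: true,
    ),
)->setCallable(function (string $order_id) use ($pdo): string {
    $stmt = $pdo->prepare('SELECT status FROM orders WHERE id = ?');
    $stmt->execute([$order_id]);
    $status = $stmt->fetchColumn();

    return $status !== false ? (string) $status : 'Order not found.';
});

The agent decides on its own when (and whether) to call this function; your PHP code stays deterministic and testable.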
Agentic AI in the PHP Ecosystem
The PHP ecosystem has always been driven by strong, pragmatic, and highly active communities—and the current wave of AI adoption is no exception. Over the past few months, we’ve seen an increasing number of serious, well-designed libraries emerge, built by people who have a deep understanding of both PHP and real-world production needs.
Rather than treating AI as a gimmick or a thin API wrapper, these projects are focusing on architecture, maintainability, and integration into existing applications. As a result, the PHP ecosystem is evolving extremely fast to support AI more and more, and the number of available tools is growing almost every week, especially thanks to the energy of the Symfony, Laravel, and independent open-source communities.
Today, we already have excellent building blocks, such as Neuron AI, which provides a complete agent framework with tool support and orchestration; Symfony AI, which brings first-class AI integration into the Symfony ecosystem; and Prism, which offers a clean abstraction layer over different LLM providers. And this is only the beginning: a dedicated Laravel AI ecosystem is already on the way.
In this article, we’ll focus on Neuron AI, because it already offers a mature and complete Agent + Tool architecture, making it an ideal foundation for building real-world, production-grade Agentic AI systems in PHP.
Our goal: a YouTube transcript QA agent
We will build an agent that:
- Accepts a YouTube video ID
- Uses a dedicated tool to fetch the transcript (we will implement this tool ourselves)
- Answers questions using the transcript
Installation and dependencies
Before we start building our agent, we need to install a few libraries. Each of them plays a very specific role in the architecture of our application, and together they form a clean, modular, and production-ready stack.
You can install everything using Composer:
composer require neuron-core/neuron-ai
composer require mrmysql/youtube-transcript
composer require symfony/http-client nyholm/psr7
composer require vlucas/phpdotenv
Let’s briefly look at why we need each of these packages.
Neuron AI
composer require neuron-core/neuron-ai
This is the core of our system. Neuron AI provides the Agent abstraction, the Tool system, the orchestration layer, and the provider integrations (such as Gemini, OpenAI, Ollama, etc.). It is the framework that allows us to define instructions, expose tools to the model, and let the agent reason about when and how to use them. Without it, we would just be sending prompts to an API; with it, we are building a real Agentic system.
YouTube Transcript fetcher
composer require mrmysql/youtube-transcript
This library is a small but very useful utility that allows us to fetch the transcript of a YouTube video programmatically. Instead of scraping HTML or dealing with undocumented APIs ourselves, we can rely on a clean, tested PHP library. This will be the “real-world data source” that our agent will access through a tool.
Symfony HTTP Client and PSR-7 Implementation
composer require symfony/http-client nyholm/psr7
The YouTube transcript library relies on PSR-18 (HTTP client) and PSR-7 (HTTP messages) interfaces. Here, we use Symfony’s HTTP client as a robust, production-ready implementation, together with Nyholm’s PSR-7 implementation for request and response objects. This keeps everything standards-based and framework-agnostic.
PHP dotenv
composer require vlucas/phpdotenv
Finally, we use phpdotenv to load our Gemini API key from a .env file instead of hardcoding it into the source code. This is a best practice for both security and portability, and it allows us to easily change credentials between environments without touching the code.
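For reference, a minimal bootstrap looks like the sketch below. The key value is a placeholder; the variable name GEMINI_API_KEY matches what the agent reads later via $_ENV:

// .env (in the project root) contains a single line such as:
// GEMINI_API_KEY=your-gemini-api-key-here

require "./vendor/autoload.php";

// Load the variables from .env into $_ENV.
$dotenv = Dotenv\Dotenv::createImmutable(__DIR__);
$dotenv->load();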
This set of dependencies already shows an important aspect of modern PHP: we are building our AI system using small, composable, standards-based components, rather than a single monolithic library.
The complete Agent
<?php

use MrMySQL\YoutubeTranscript\TranscriptListFetcher;
use NeuronAI\Agent;
use NeuronAI\Chat\Messages\UserMessage;
use NeuronAI\Providers\AIProviderInterface;
use NeuronAI\Providers\Gemini\Gemini;
use NeuronAI\SystemPrompt;
use NeuronAI\Tools\PropertyType;
use NeuronAI\Tools\Tool;
use NeuronAI\Tools\ToolProperty;
use Nyholm\Psr7\Factory\Psr17Factory;
use Symfony\Component\HttpClient\HttpClient;
use Symfony\Component\HttpClient\Psr18Client;

require "./vendor/autoload.php";

class YouTubeTranscriptQAAgent extends Agent
{
    // Which LLM provider and model power the agent.
    protected function provider(): AIProviderInterface
    {
        $dotenv = Dotenv\Dotenv::createImmutable(__DIR__);
        $dotenv->load();

        return new Gemini(
            key: $_ENV["GEMINI_API_KEY"],
            model: "gemini-2.5-flash",
        );
    }

    // The behavioral contract: background, steps, and output rules.
    public function instructions(): string
    {
        return (string) new SystemPrompt(
            background: [
                "You are a helpful assistant that answers questions about a YouTube video using ONLY the provided transcript.",
                "You must use the tool to retrieve the transcript.",
                "If the transcript does not contain the answer, say so clearly and ask for another video id.",
                "Never use external knowledge.",
            ],
            steps: [
                "Ask the user for a YouTube video URL or ID if not provided.",
                "Call the get_transcription tool to retrieve the transcript.",
                "Read the transcript carefully.",
                "Answer the user's question strictly using the transcript.",
            ],
            output: [
                "Be concise and precise.",
                "If the transcript does not contain the answer, reply: 'The transcript does not contain this information.'",
            ],
        );
    }

    // The tools the agent is allowed to call.
    protected function tools(): array
    {
        return [
            Tool::make(
                "get_transcription",
                "Retrieve the transcription of a YouTube video.",
            )
                ->addProperty(
                    new ToolProperty(
                        name: "video_id",
                        type: PropertyType::STRING,
                        description: "The YouTube video ID (e.g. wT1lcJ_zn18).",
                        required: true,
                    ),
                )
                ->setCallable(function (?string $video_id): string {
                    if (is_null($video_id)) {
                        return "";
                    }

                    return $this->fetchTranscript($video_id);
                }),
        ];
    }

    // Plain PHP: fetch the transcript and return it as text.
    private function fetchTranscript(string $videoId, string $language = "en"): string
    {
        $httpClient = HttpClient::create();
        $psr18Client = new Psr18Client($httpClient);
        $psr17Factory = new Psr17Factory();

        $fetcher = new TranscriptListFetcher(
            $psr18Client,
            $psr17Factory,
            $psr17Factory,
        );

        $transcriptList = $fetcher->fetch($videoId);
        $transcript = $transcriptList->findTranscript([$language]);

        $lines = [];
        foreach ($transcript->fetch() as $item) {
            $lines[] = html_entity_decode(
                $item["text"],
                ENT_QUOTES | ENT_HTML5,
                "UTF-8",
            );
        }

        return implode(PHP_EOL, $lines);
    }
}
Understanding the Agent: a guided tour of the code
Now that we’ve seen the complete implementation, let’s walk through the most important parts of the agent step by step. The goal here is not just to understand what the code does, but how the different pieces work together to form a real Agentic AI system.
1. The Agent class
class YouTubeTranscriptQAAgent extends Agent
This is the heart of the application. By extending the Agent class provided by Neuron AI, we are not building a simple wrapper around an LLM; we are defining a full AI agent with:
- Its own instructions
- Its own tools
- Its own model provider
- And its own behavior and constraints
From this point forward, Neuron AI will handle orchestration, tool invocation, and conversation flow.
2. Choosing the model provider
protected function provider(): AIProviderInterface
This method tells Neuron AI which provider and which model should power the agent (here, the Gemini class provided by Neuron AI, but you can explore other providers, such as OpenAI or Ollama).
return new Gemini(
key: $_ENV["GEMINI_API_KEY"],
model: "gemini-2.5-flash",
);
Here, we use Gemini as the provider, but one of the strengths of Neuron AI is that this is easily swappable. The rest of the agent does not care whether you are using Gemini, OpenAI, or Ollama: the architecture stays exactly the same.
We also load the API key from a .env file, which is a best practice for security and configuration management.
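To illustrate that swappability, here is a hedged sketch: switching to a richer model touches only this one method (gemini-2.5-pro is used purely as an illustration; any model from the Gemini model list works the same way), and swapping the provider entirely would simply mean returning a different provider class.

// Sketch: same provider, different model. Swapping the provider entirely
// (OpenAI, Ollama, ...) would mean returning a different provider class here,
// while the rest of the agent stays untouched.
protected function provider(): AIProviderInterface
{
    $dotenv = Dotenv\Dotenv::createImmutable(__DIR__);
    $dotenv->load();

    return new Gemini(
        key: $_ENV["GEMINI_API_KEY"],
        model: "gemini-2.5-pro",
    );
}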
3. The system prompt: the Agent’s brain
public function instructions(): string
The instructions() method defines the entire behavioral contract of the agent.
Instead of a single free-form prompt, we build a structured SystemPrompt with:
- A background: what the agent is and what it must never do
- A set of steps: how to approach the task
- An output contract: how it should respond
This is one of the most important parts of the system. These instructions:
- Force the agent to use the tool
- Forbid it from using external knowledge
- Greatly reduce hallucinations
- Make the agent more predictable and reliable
In other words, this is where we transform a generic LLM into a specialized, constrained assistant.
4. Defining the Tool
protected function tools(): array
Here we declare what the agent is allowed to do in the real world.
Tool::make("get_transcription", "Retrieve the transcription of a YouTube video.")
This creates a tool named get_transcription. The model can decide to call it whenever it thinks it needs the transcript.
5. Describing the Tool interface
->addProperty(
    new ToolProperty(
        name: "video_id",
        type: PropertyType::STRING,
        required: true,
    ),
)
This is crucial: we are not just exposing a PHP function, we are defining a formal schema for it.
Thanks to this, the model knows:
- What arguments are required
- What type each argument must have
- How to call the tool correctly
This is what makes tool calling reliable and structured, rather than fragile and heuristic.
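Conceptually, the provider ends up receiving a function declaration built from this definition, roughly like the following sketch (an illustration only, not Neuron AI's actual internal format):

// Rough shape of the declaration the LLM sees for this tool.
$declaration = [
    'name'        => 'get_transcription',
    'description' => 'Retrieve the transcription of a YouTube video.',
    'parameters'  => [
        'type'       => 'object',
        'properties' => [
            'video_id' => [
                'type'        => 'string',
                'description' => 'The YouTube video ID (e.g. wT1lcJ_zn18).',
            ],
        ],
        'required'   => ['video_id'],
    ],
];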
6. Connecting the Tool to real PHP code
->setCallable(function (?string $video_id): string {
    return $this->fetchTranscript($video_id);
})
This is the bridge between the AI world and your application.
When the model decides to call the tool, this PHP function is executed. At this point:
- The AI stops “imagining”
- And your real, deterministic PHP code takes over
This is the exact moment where the agent becomes more than a chatbot.
7. The actual transcript fetching logic
private function fetchTranscript(...)
Everything here is plain, familiar, reliable PHP, and that's a good thing.
We use:
- Symfony’s HTTP client
- PSR-7 / PSR-18 interfaces
- The mrmysql/youtube-transcript library

Together, these fetch the transcript, normalize it, and return it as a plain text string.
From the agent’s point of view, this is just a tool. From your perspective, this provides full control over data access and behavior.
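In a real application, fetching can fail: the video may have no captions, be unavailable, or lack an English transcript. A minimal, defensive sketch (the specific exception classes thrown by the library are not assumed here, so we catch \Throwable) could wrap the call so the agent receives an explicit message it can relay to the user:

// Sketch: wrap fetchTranscript() so tool execution never throws into the agent.
private function safeFetchTranscript(string $videoId, string $language = "en"): string
{
    try {
        return $this->fetchTranscript($videoId, $language);
    } catch (\Throwable $e) {
        return "Transcript could not be retrieved for video '{$videoId}': " . $e->getMessage();
    }
}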
Using our new Agent
Once the agent is defined, using it in your application is simple.
$agent = new YouTubeTranscriptQAAgent();
$response = $agent->chat(
    new UserMessage("What is the main topic of the video 'wT1lcJ_zn18'?"),
);
echo $response->getContent();
We start by instantiating our agent just like any other PHP object. At this point, we are not dealing with a “model client” or a low-level API wrapper, but with a fully configured AI agent that already knows:
- Which model to use
- What its instructions are
- Which tools it is allowed to call
- And how it should behave
When we call the chat() method, we send a UserMessage to the agent. This message becomes part of the agent’s conversation context and triggers the whole reasoning process.
Behind the scenes, Neuron AI orchestrates everything:
- The agent analyzes the user’s question.
- It follows the rules and steps defined in the system prompt.
- It realizes that it needs the video transcript to answer.
- It decides to call the get_transcription tool.
- Because the get_transcription tool requires the video identifier, the video ID is extracted from the question (if it is missing or cannot be detected, the answer will be a message about the missing video ID).
- The tool's PHP code fetches the transcript.
- The agent receives the result (a long text with the transcription) from the Tool and uses it as context.
- Finally, it generates an answer that strictly respects the defined constraints.
The return value of chat() is a response object. By calling:
echo $response->getContent();
we get and print the final answer produced by the agent as a string.
From the outside, this appears to be a single method call. In reality, it is a full agentic workflow involving reasoning, tool usage, and controlled generation, all hidden behind a clean, idiomatic PHP API.
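As a small, hedged extension, you can wrap this in an interactive CLI loop. Whether the agent remembers earlier turns depends on the chat history configured in Neuron AI (an in-memory history by default, according to its documentation; check the docs if you need persistence):

$agent = new YouTubeTranscriptQAAgent();

// Minimal interactive session: type a question, get the agent's answer.
while (true) {
    $question = readline('You: ');
    if ($question === false || trim($question) === '') {
        break;
    }

    $response = $agent->chat(new UserMessage($question));
    echo 'Agent: ' . $response->getContent() . PHP_EOL;
}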
Why this is real Agentic AI
This approach goes far beyond simply sending prompts to a language model. What we have built here is a real agentic system, where the model is no longer a passive text generator but an active component inside a well-defined software architecture.
The model can autonomously decide when to call a tool, and it can utilize your PHP code as real, concrete capabilities rather than merely as suggestions. At the same time, it operates under strict and explicit rules defined by the system prompt, which constrain its behavior and guide its reasoning.
Thanks to the carefully designed context, background description, step-by-step instructions, and clearly defined output expectations, the agent is firmly grounded in the task it must perform. This dramatically reduces the risk of hallucinations and makes the whole system far more reliable, predictable, and suitable for real-world applications.
Final thoughts
The PHP ecosystem is clearly entering a new era. What only a short time ago looked like an experimental trend is quickly becoming a solid and mature part of everyday application architecture.
With tools like Neuron AI, Symfony AI, Prism, and soon Laravel AI, we can now build reliable, production-grade AI agents entirely in PHP, and, more importantly, integrate them naturally into our existing applications and workflows.
The agent we built in this article is intentionally simple, but it already shows the real power of this approach. From here, you could extend it in many directions: you could add memory to enable reasoning across multiple interactions, introduce multiple tools and let the agent choose between them, connect it to a vector database for semantic search and retrieval-augmented generation, or even orchestrate multiple agents that collaborate on more complex tasks.
This is not about replacing your application logic with an LLM. It is about augmenting your software with reasoning, decision-making, and controlled autonomy, while keeping PHP firmly in charge of execution and structure.
Agentic AI is not a distant future.
It is something you can start building today.