Working with Large Language Models (LLMs) like Google Gemini often presents a significant challenge: how do you effectively handle large context data without hitting token limits or incurring excessive costs? This article dives deep into a practical PHP implementation, the Gemini_Handler class, that demonstrates advanced strategies for managing extensive inputs and orchestrating multi-turn, agentic workflows with Gemini.
Whether you're generating complex code, detailed reports, or intricate UI designs, understanding how to feed large datasets and refine LLM outputs iteratively is crucial for robust AI applications. We'll break down the techniques used in this class, making them accessible even for beginners looking to level up their LLM integration skills.
The Gemini_Handler Class: An Overview for Gemini Large Context Handling
The Gemini_Handler class is designed to streamline interactions with the Google Gemini API. It encapsulates API key management, model configuration, and, most importantly, sophisticated methods for handling large inputs and implementing agentic generation patterns. Let's look at its core structure:
namespace FToE;
defined('ABSPATH') || exit;
class Gemini_Handler {
private $api_key;
private $model;
private $top_p;
private $temperature;
private $max_tokens;
private $thinking_budget;
public function __construct() {
$this->api_key = defined('GEMINI_API_KEY') ? GEMINI_API_KEY : '';
$this->model = 'gemini-2.5-flash';
$this->temperature = 0.7;
$this->max_tokens = 65535;
$this->top_p = 0.9;
$this->thinking_budget = 2048;
}
// ... rest of the class methods ...
}
Key properties like $api_key, $model, $temperature, and $max_tokens are set in the constructor, allowing for easy configuration of the Gemini API calls.
Efficient Gemini Large Context Handling with File API
One of the biggest hurdles with LLMs is the token limit. When you have very large text documents (like extensive JSON configurations) or high-resolution images, sending them directly in the prompt can quickly exceed limits and become expensive. The Gemini_Handler class intelligently addresses this using Google's Generative Language File API.
prepare_payload(): Orchestrating Input
This method is the gateway for all messages sent to Gemini. It iterates through the input messages and decides how each 'part' (text or image) should be handled.
private function prepare_payload(array $messages) {
$contents = [];
foreach ($messages as $message) {
$content = ['role' => $message['role']];
$parts = [];
// ... logic to assemble $incoming_parts from $message['content'] ...
foreach ($incoming_parts as $part) {
if (isset($part['text'])) {
$parts[] = $this->handle_large_text_part($part['text']);
}
// ... image handling and other types ...
elseif (isset($part['type']) && $part['type'] === 'image_url') {
$raw_b64 = str_replace('data:image/png;base64,', '', $part['image_url']['url']);
$parts[] = $this->handle_large_image_part($raw_b64);
}
// ... other inlineData handling ...
}
$content['parts'] = $parts;
$contents[] = $content;
}
return [
'contents' => $contents,
'generationConfig' => [
'temperature' => $this->temperature,
'maxOutputTokens' => $this->max_tokens
]
];
}
handle_large_text_part(): Smart Text Input
For text exceeding a certain size (20KB in this implementation, a pragmatic threshold rather than an API-mandated limit), this method temporarily uploads the text content to Google's File API. Instead of embedding the entire text in the prompt (which counts against token limits), Gemini receives a URI pointing to the uploaded file. This drastically reduces prompt token usage and allows for much larger textual inputs.
private function handle_large_text_part($text) {
if (strlen($text) > 20000) { // If text > 20KB
$temp_file = wp_upload_dir()['path'] . '/gemini_txt_' . uniqid() . '.txt';
file_put_contents($temp_file, $text);
try {
$file_info = $this->upload_file_to_llm($temp_file); // Uploads to Google's File API
@unlink($temp_file);
return [
'fileData' => [
'mimeType' => 'text/plain',
'fileUri' => $file_info['fileUri']
]
];
} catch (\Exception $e) {
error_log("Text File API upload failed: " . $e->getMessage());
}
}
return ['text' => $text]; // For smaller texts, send inline
}
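The upload_file_to_llm helper is referenced above but never shown. Below is a minimal sketch of what it might look like, built on the WordPress HTTP API and Google's documented resumable-upload protocol for the Gemini File API. The function name matches the call sites above, but the return shape (['fileUri', 'mimeType']) and the header-building helper are assumptions inferred from how the class consumes the result; treat this as a starting point, not the class's actual implementation.

```php
<?php
// Hypothetical sketch of upload_file_to_llm() against the Gemini File API's
// resumable upload flow. Assumes GEMINI_API_KEY and the WordPress HTTP API.

// Headers for the initial "start" request of a resumable upload session.
function gemini_upload_start_headers(string $mime_type, int $num_bytes): array {
    return [
        'X-Goog-Upload-Protocol'              => 'resumable',
        'X-Goog-Upload-Command'               => 'start',
        'X-Goog-Upload-Header-Content-Length' => (string) $num_bytes,
        'X-Goog-Upload-Header-Content-Type'   => $mime_type,
        'Content-Type'                        => 'application/json',
    ];
}

function upload_file_to_llm(string $path): array {
    $mime      = mime_content_type($path);
    $bytes     = filesize($path);
    $start_url = 'https://generativelanguage.googleapis.com/upload/v1beta/files?key=' . GEMINI_API_KEY;

    // Step 1: open the resumable session; the upload URL comes back in a header.
    $start = wp_remote_post($start_url, [
        'headers' => gemini_upload_start_headers($mime, $bytes),
        'body'    => wp_json_encode(['file' => ['display_name' => basename($path)]]),
    ]);
    $upload_url = wp_remote_retrieve_header($start, 'x-goog-upload-url');

    // Step 2: send the bytes and finalize in a single request.
    $finish = wp_remote_post($upload_url, [
        'headers' => [
            'X-Goog-Upload-Command' => 'upload, finalize',
            'X-Goog-Upload-Offset'  => '0',
        ],
        'body' => file_get_contents($path),
    ]);
    $file = json_decode(wp_remote_retrieve_body($finish), true)['file'] ?? [];

    return ['fileUri' => $file['uri'] ?? '', 'mimeType' => $file['mimeType'] ?? $mime];
}
```

Files uploaded this way are temporary on Google's side (they expire automatically), which pairs well with the temp-file-then-unlink pattern the handler already uses.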
handle_large_image_part(): Optimized Image Input
Similarly, images, especially high-resolution ones, can consume a lot of tokens. This method first scales base64 image data down to a maximum width (e.g., 1024px) to reduce its size. If the image data is still large after compression (e.g., >50KB), it's uploaded via the File API, again using a URI instead of inline data.
private function handle_large_image_part($b64_data) {
$compressed = $this->compress_base64_image($b64_data); // First, compress the image
if (strlen($compressed) > 50000) { // If compressed image > 50KB
$temp_file = wp_upload_dir()['path'] . '/gemini_img_' . uniqid() . '.png';
file_put_contents($temp_file, base64_decode($compressed));
try {
$file_info = $this->upload_file_to_llm($temp_file); // Uploads to Google's File API
@unlink($temp_file);
return [
'fileData' => [
'mimeType' => $file_info['mimeType'],
'fileUri' => $file_info['fileUri']
]
];
} catch (\Exception $e) {
error_log("Image File API upload failed: " . $e->getMessage());
}
}
return [
'inlineData' => [
'mimeType' => 'image/png',
'data' => $compressed // For smaller images, send inline
]
];
}
This dual strategy (compression plus File API upload) ensures Gemini receives inputs in their most token-efficient form, saving costs and improving performance, especially for visual-heavy tasks like converting design screenshots.
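The routing logic behind that dual strategy can be distilled into a tiny pure function. The 20KB and 50KB thresholds come straight from the handler methods above; the function name is illustrative, not part of the class:

```php
<?php
// Illustrative helper: decide how a part should travel to Gemini,
// using the same size thresholds as the handler methods above.
function gemini_transport_strategy(string $kind, int $byte_size): string {
    if ($kind === 'text') {
        return $byte_size > 20000 ? 'file_api' : 'inline';  // 20KB text threshold
    }
    if ($kind === 'image') {
        // Images are compressed first; only still-large payloads go to the File API.
        return $byte_size > 50000 ? 'file_api' : 'inline';  // 50KB image threshold
    }
    return 'inline';
}
```

Isolating the decision like this also makes the thresholds easy to tune per deployment without touching the upload code.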
Agentic Workflows for Enhanced Generation with Gemini Large Context Handling
For complex generation tasks, a single LLM call might not be enough. A Gemini Agentic Workflow involves multiple turns, where the LLM's output from one step informs the next, allowing for iterative refinement and strategic guidance. The generate_content_with_agentic method orchestrates this process.
The Multi-Turn Process
- Phase 1: Strategic Context (inject_strategic_context): In the initial turn, the system can inject a "strategic prompt" based on the desired output (e.g., a critique for redesign, or a framework-specific strategy). This guides the LLM from the start.
- Phase 2: Generation with Continuation Logic (execute_continuation_loop): The LLM generates content. If it hits its maxOutputTokens limit, the system detects this (finish_reason === 'MAX_TOKENS') and prompts Gemini with a CONTINUE_PRECISION instruction to resume exactly where it left off. This loop ensures even very long outputs are fully generated.
- Phase 3: Final Review or Next Turn Preparation (perform_final_quality_review, prepare_next_turn_payload): After generation, especially if multiple iterations are specified, the system can perform a "quality review." This might involve feeding the current draft back to the LLM with a critique, asking it to refine its output based on specific instructions. This iterative feedback loop is key to achieving high-quality, production-ready results.
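Phase 2 can be sketched as a standalone loop. Here the API call is injected as a callable so the control flow is easy to see and test; the function name, the exact wording of the continuation prompt, and the $max_rounds safety guard are illustrative assumptions (the real method also tracks token cost):

```php
<?php
// Sketch of a MAX_TOKENS continuation loop. $call_api takes a payload and
// returns ['text' => ..., 'finish_reason' => ...]; it is injected so the
// loop can be exercised without real network calls.
function run_continuation_loop(array $payload, callable $call_api, int $max_rounds = 5): string {
    $assembled = '';
    for ($round = 0; $round < $max_rounds; $round++) {
        $response   = $call_api($payload);
        $assembled .= $response['text'];
        if ($response['finish_reason'] !== 'MAX_TOKENS') {
            break; // Gemini finished naturally; stop looping.
        }
        // Feed the partial output back and ask Gemini to resume where it stopped.
        $payload['contents'][] = ['role' => 'model', 'parts' => [['text' => $response['text']]]];
        $payload['contents'][] = ['role' => 'user',  'parts' => [['text' => 'CONTINUE_PRECISION: resume exactly where you left off.']]];
    }
    return $assembled;
}
```

The hard cap on rounds matters in production: without it, a model that keeps reporting MAX_TOKENS would loop (and bill) indefinitely.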
generate_content_with_agentic(): Orchestrating Iterations
public function generate_content_with_agentic(array $messages, array $c_messages = [], array $el_messages = [], string $output = '', int $iterations = 1) {
// ... initialization ...
for ($turn_count = 1; $turn_count <= $total_iterations; $turn_count++) {
try {
// Phase 1: Strategic Context
if ($turn_count == 1) {
$this->inject_strategic_context($payload, $c_messages, $el_messages, $output);
}
// Phase 2: Generation with Continuation Logic
$assembled = $this->execute_continuation_loop($payload);
$total_estimated_cost += $assembled['cost'];
$result = [/* ... */];
// Phase 3: Final Review or Next Turn Preparation
if ($turn_count == $total_iterations) { // Simplified condition for final review
return $this->perform_final_quality_review($result, $payload, $c_messages, $el_messages);
}
$this->prepare_next_turn_payload($payload, $result['content']);
} catch (\Exception $e) {
// ... error handling ...
}
}
return $result;
}
This agentic approach allows the LLM to tackle more complex problems by breaking them down into manageable steps and leveraging self-correction or external feedback.
Deeper Dive: Core Helper Functions
compress_base64_image(): Image Optimization in Detail
Before even considering the File API, images are compressed to save bandwidth and processing time. This helper function ensures images are resized to a max_width (e.g., 1024px) while preserving aspect ratio and transparency.
private function compress_base64_image($base64_string, $max_width = 1024) {
$image_data = base64_decode($base64_string);
$src = imagecreatefromstring($image_data);
if (!$src) return $base64_string;
$width = imagesx($src);
$height = imagesy($src);
if ($width > $max_width) {
$new_width = $max_width;
$new_height = floor($height * ($max_width / $width));
$tmp = imagecreatetruecolor($new_width, $new_height);
imagealphablending($tmp, false);
imagesavealpha($tmp, true);
imagecopyresampled($tmp, $src, 0, 0, 0, 0, $new_width, $new_height, $width, $height);
ob_start();
imagepng($tmp, null, 7);
$compressed_data = ob_get_clean();
imagedestroy($src);
imagedestroy($tmp);
return base64_encode($compressed_data);
}
imagedestroy($src);
return $base64_string;
}
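The resize arithmetic above is easy to verify in isolation. This small helper (the name is mine, not part of the class) reproduces the dimension calculation so you can sanity-check it without GD:

```php
<?php
// Compute target dimensions for the proportional downscale performed by
// compress_base64_image(): cap width at $max_width, preserve aspect ratio.
function gemini_target_dimensions(int $width, int $height, int $max_width = 1024): array {
    if ($width <= $max_width) {
        return [$width, $height]; // Already small enough; no resize needed.
    }
    return [$max_width, (int) floor($height * ($max_width / $width))];
}
```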
API Interaction (make_api_request, process_response)
The make_api_request and process_response methods handle the actual communication with the Gemini API, including error checking, status code validation, and extracting the generated content. These functions encapsulate the network logic, making the main workflow cleaner and easier to manage.
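Those methods aren't shown in full, but here is a sketch of the likely shape of make_api_request, assuming the standard generateContent REST endpoint and the WordPress HTTP API. The URL builder and error handling are my assumptions, not the class's verbatim code:

```php
<?php
// Build the generateContent endpoint URL for a given model.
function gemini_endpoint(string $model, string $api_key): string {
    return sprintf(
        'https://generativelanguage.googleapis.com/v1beta/models/%s:generateContent?key=%s',
        rawurlencode($model),
        rawurlencode($api_key)
    );
}

// Illustrative make_api_request() using the WordPress HTTP API.
function make_api_request(array $payload, string $model, string $api_key): array {
    $response = wp_remote_post(gemini_endpoint($model, $api_key), [
        'headers' => ['Content-Type' => 'application/json'],
        'body'    => wp_json_encode($payload),
        'timeout' => 120, // Long generations can take a while.
    ]);
    if (is_wp_error($response)) {
        throw new \Exception($response->get_error_message());
    }
    $code = wp_remote_retrieve_response_code($response);
    if ($code !== 200) {
        throw new \Exception('Gemini API returned HTTP ' . $code);
    }
    return json_decode(wp_remote_retrieve_body($response), true);
}
```

Raising the timeout well above WordPress's 5-second default is essential here; a 65,535-token generation routinely takes longer than that.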
Robust LLM Output Parsing (parse_content_llm, parse_ele_content_llm)
LLM outputs often require careful parsing to extract the desired content, especially when structured data like JSON or HTML arrives wrapped in markdown fences (```json or ```html blocks). The parse_content_llm and parse_ele_content_llm static methods are responsible for intelligently extracting this usable content, handling edge cases like malformed output or stray backticks. parse_ele_content_llm is an example of parsing framework-specific structured output, in this case a UI builder's JSON.
private static function parse_content_llm($content) {
if (is_array($content) || is_object($content)) return $content;
if (strpos($content, '```json') !== false) {
preg_match('/```json\s*([\s\S]*?)\s*```/', $content, $matches);
$content = $matches[1] ?? $content;
} elseif (strpos($content, '```html') !== false) {
preg_match('/```html\s*([\s\S]*?)\s*```/', $content, $matches);
$content = $matches[1] ?? $content;
}
$content = preg_replace('/^`+|`+$/', '', $content);
if (preg_match('/^\s*[\{\[]/', $content)) {
$decoded = json_decode($content, true);
if (json_last_error() === JSON_ERROR_NONE) return $decoded;
}
return trim($content);
}
public static function parse_ele_content_llm($content) {
if (is_array($content)) return isset($content['ele_json']) ? $content['ele_json'] : $content;
if (preg_match('/```ele_json\s*([\s\S]*?)\s*\n```/s', $content, $matches)) {
$json_content = trim($matches[1]);
$json_content = preg_replace('/^[\s]*```(?:ele_json)?[\s]*/m', '', $json_content);
$json_content = preg_replace('/```[\s]*$/m', '', $json_content);
$decoded = json_decode($json_content, true);
if (json_last_error() === JSON_ERROR_NONE) return $decoded;
}
if (preg_match('/ele_json\s*(\{[\s\S]*?\})\s*action_response/s', $content, $matches)) {
$json_content = trim($matches[1]);
$json_content = preg_replace('/^[\s]*\{+/', '{', $json_content);
$json_content = preg_replace('/\}+[\s]*$/', '}', $json_content);
$decoded = json_decode($json_content, true);
if (json_last_error() === JSON_ERROR_NONE) return $decoded;
}
$decoded = json_decode($content, true);
return (json_last_error() === JSON_ERROR_NONE) ? $decoded : $content;
}
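Since parse_content_llm is private, here is a standalone replica of its fence-stripping logic (the name parse_llm_fenced is mine) together with a quick demonstration of the typical case: a JSON payload wrapped in a markdown fence decoding straight to a PHP array.

```php
<?php
// Standalone replica of parse_content_llm()'s fence-stripping logic,
// for demonstration outside the class.
function parse_llm_fenced(string $content) {
    if (strpos($content, '```json') !== false) {
        preg_match('/```json\s*([\s\S]*?)\s*```/', $content, $m);
        $content = $m[1] ?? $content;
    } elseif (strpos($content, '```html') !== false) {
        preg_match('/```html\s*([\s\S]*?)\s*```/', $content, $m);
        $content = $m[1] ?? $content;
    }
    // Strip any stray leading/trailing backticks the model left behind.
    $content = preg_replace('/^`+|`+$/', '', $content);
    // If the remainder looks like JSON, try to decode it to an array.
    if (preg_match('/^\s*[\{\[]/', $content)) {
        $decoded = json_decode($content, true);
        if (json_last_error() === JSON_ERROR_NONE) {
            return $decoded;
        }
    }
    return trim($content);
}

// Example: a fenced JSON response becomes a usable PHP array.
$raw    = "```json\n{\"title\": \"Hero\", \"cta\": \"Sign up\"}\n```";
$parsed = parse_llm_fenced($raw);
```

Note the defensive ordering: fence extraction first, backtick cleanup second, JSON decoding last, with plain trimmed text as the fallback when decoding fails.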
Practical Example: Implementing an Agentic Generation
Let's imagine a scenario where you want to generate a complex structured UI component layout from a high-level description and an initial image, potentially refining it over several turns.
// Assuming you have a Gemini_Handler instance and necessary messages prepared
$geminiHandler = new \FToE\Gemini_Handler();
// Example system prompt for UI component generation
$systemPrompt = 'You are an expert UI designer. Generate production-ready structured UI component data.';
// Initial user message with a placeholder image and description
$initialMessages = [
['role' => 'user', 'content' => $systemPrompt],
['role' => 'user', 'content' => [
['type' => 'text', 'text' => 'Generate a structured UI section for a modern tech landing page hero section. It should include a catchy headline, a short description, and a call-to-action button. Incorporate elements from the provided design screenshot.'],
['type' => 'image_url', 'image_url' => ['url' => 'data:image/png;base64,PLACEHOLDER_BASE64_IMAGE_DATA']]
]]
];
// Placeholder for framework-specific strategic messages (e.g., design system guidelines)
$frameworkStrategyMessages = [
['role' => 'user', 'content' => 'Ensure all sections use flexbox containers. Prioritize mobile responsiveness. Use global colors if applicable.']
];
// The 'sample_ui_data' string here is an internal identifier for the specific parsing logic
// used within the class. Replace with your framework's identifier if adapting.
$outputType = 'sample_ui_data';
$iterations = 3; // Perform 3 turns of generation and refinement
try {
$finalResult = $geminiHandler->generate_content_with_agentic(
$initialMessages,
[], // No redesign critique messages for this example
$frameworkStrategyMessages,
$outputType,
$iterations
);
echo "\nFinal Generated Content:\n";
print_r($finalResult['content']);
echo "\nEstimated Total Cost: $" . $finalResult['usage']['estimated_cost'] . "\n";
} catch (\Exception $e) {
echo "Error: " . $e->getMessage();
}
This example showcases how you'd initiate an agentic generation. The generate_content_with_agentic method would then internally manage the multi-turn conversation, ensuring large inputs are handled efficiently and outputs are iteratively refined based on the defined strategy.
Key Takeaways
- Token Management is Key: For large inputs (text or images), leveraging Google's File API to upload content and reference it via URIs is crucial for staying within token limits and optimizing costs.
- Agentic Workflows Enhance Quality: Multi-turn conversations with strategic context, continuation logic, and iterative review allow LLMs to produce higher-quality, more complex, and refined outputs.
- Image Compression is a Must: Compressing images before any File API upload reduces data size and improves overall efficiency.
- Robust Parsing is Essential: LLM outputs often require careful parsing to extract the desired content, especially when dealing with structured data like JSON or HTML that might be wrapped in markdown fences.
By implementing these strategies, developers can push the boundaries of what's possible with Gemini for large context data, building more sophisticated and reliable AI-powered applications.
What are your thoughts?
Have you implemented similar strategies for Gemini large context handling or agentic workflows in your projects? Share your experiences, tips, or challenges in the comments below! If you found this article helpful, consider following for more in-depth analyses of AI and development topics.