Sherry Walker
Building AI Agents in React Native With Gemini & Llama APIs

Mobile apps powered by AI agents are transforming user experiences across industries. As of 2025, 27.2% of professional developers report using React Native, making it a leading framework for integrating intelligent agents that can reason, act, and learn.

This guide shows you how to build AI agents in React Native using Gemini and Llama APIs. You'll discover setup steps, code implementations, and performance optimization strategies that work on both iOS and Android.

Understanding AI Agents in Mobile Development

AI agents are autonomous software programs that analyze data, make decisions, and execute tasks with minimal human intervention. Unlike simple chatbots, these agents learn from interactions and optimize their behavior over time.

React Native provides the perfect foundation for AI agents because it supports cross-platform development with a single codebase. You write your agent logic once and deploy it to millions of devices running iOS or Android.

Why React Native for AI Agents

The framework's component-based architecture makes it easy for large language models to understand and generate code. Tools like Expo standardize project scaffolding, resulting in cleaner AI-assisted development workflows.

React Native's JavaScript ecosystem offers access to powerful AI libraries. You can integrate TensorFlow.js for on-device machine learning, connect to cloud APIs like Gemini for advanced reasoning, or run Llama models locally using llama.rn bindings.

Key Capabilities of Mobile AI Agents

Mobile AI agents excel at three core functions. First, they perceive information from user behavior, app analytics, and system data. Second, they reason by analyzing patterns and generating insights. Third, they take action by executing tasks like code generation, interface design, or personalized recommendations.

These capabilities enable agents to handle complex workflows. A mobile app development project in Ohio recently used AI agents to automate 70% of repetitive coding tasks, cutting development time from weeks to days.

Setting Up Gemini API Integration

Google's Gemini API brings multimodal AI capabilities to React Native apps. At the time of writing, the Gemini 2.5 Flash model costs $0.30 per million input tokens and supports a 1 million token context window.

Installing Required Dependencies

Start by installing the official Google Generative AI SDK. Open your terminal in your React Native project directory and run this command.

npm install @google/generative-ai

The package provides methods to initialize models, generate content, and stream responses. It works seamlessly with React Native's JavaScript runtime without requiring native module bridges.

Obtaining Your API Key

Visit Google AI Studio at aistudio.google.com and sign in with your Google account. Click "Get API key" in the sidebar and create a new key for your project.

Store this key securely in environment variables. Never commit API keys directly to your codebase as they provide access to your paid usage tier.

Initializing Gemini in React Native

Create a configuration file to set up the Gemini client. Import the SDK and initialize it with your API key.

import { GoogleGenerativeAI } from '@google/generative-ai';

const genAI = new GoogleGenerativeAI(process.env.GEMINI_API_KEY);
const model = genAI.getGenerativeModel({ model: "gemini-2.5-flash" });

This setup gives you access to Gemini's text generation, vision understanding, and function calling capabilities. The model instance handles request formatting and response parsing automatically.

Building Your First Agent with Gemini

Design an agent that processes user queries and generates contextual responses. Create a function that accepts user input and returns AI-generated content.

const runAgent = async (userPrompt) => {
  try {
    const result = await model.generateContent(userPrompt);
    const response = result.response;
    const text = response.text();
    return text;
  } catch (error) {
    console.error('Agent error:', error);
    return 'Failed to process request';
  }
};

This basic agent handles single-turn conversations. For multi-turn dialogues, use the chat session API that maintains conversation history across multiple interactions.

Implementing Streaming Responses

Streaming improves perceived performance by displaying responses as they generate. Users see text appearing word by word instead of waiting for complete responses.

const streamAgent = async (userPrompt, onChunk) => {
  const result = await model.generateContentStream(userPrompt);
  for await (const chunk of result.stream) {
    const chunkText = chunk.text();
    onChunk(chunkText);
  }
};

Pass a callback function to handle each text chunk. Update your UI state with each chunk to create a typewriter effect that engages users during generation.
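As an illustration, here is a minimal sketch of wiring streamAgent into component state, assuming the streamAgent function above is in scope; the component name and prompt are illustrative.

import React, { useState } from 'react';
import { View, Text, Button } from 'react-native';

const StreamingChat = () => {
  const [output, setOutput] = useState('');

  const handleAsk = async () => {
    setOutput('');
    // Append each chunk as it arrives to produce the typewriter effect
    await streamAgent('Explain React Native in one paragraph', (chunkText) => {
      setOutput((prev) => prev + chunkText);
    });
  };

  return (
    <View>
      <Button title="Ask" onPress={handleAsk} />
      <Text>{output}</Text>
    </View>
  );
};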

Integrating Llama Models for On-Device AI

Llama models run directly on mobile devices without requiring internet connectivity. This approach protects user privacy and eliminates API costs for production apps.

Installing llama.rn

The llama.rn library provides React Native bindings for llama.cpp, enabling efficient inference of GGUF-formatted models on mobile hardware.

npm install llama.rn react-native-fs

For iOS, run npx pod-install after installation. Android requires adding ProGuard rules if minification is enabled in your build configuration.

Downloading and Loading Models

Small models with 1-3 billion parameters work best on mobile devices. Popular choices include Llama 3.2 1B Instruct, DeepSeek R1 Distill Qwen 1.5B, and Qwen 2 0.5B Instruct.

Download GGUF models from Hugging Face and store them in your app's document directory. Use react-native-fs to manage file operations.

import RNFS from 'react-native-fs';
import { initLlama } from 'llama.rn';

const downloadModel = async (url, fileName) => {
  const filePath = `${RNFS.DocumentDirectoryPath}/${fileName}`;
  const download = RNFS.downloadFile({
    fromUrl: url,
    toFile: filePath,
    progressDivider: 10,
    begin: (res) => console.log('Download started'),
    progress: (res) => {
      const progress = (res.bytesWritten / res.contentLength) * 100;
      console.log(`Progress: ${progress.toFixed(2)}%`);
    }
  });
  await download.promise;
  return filePath;
};

This function downloads models with progress tracking. Users see download status and can cancel if needed.
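For cancellation, react-native-fs exposes the download's jobId alongside its promise. A minimal sketch, separate from the function above:

const { jobId, promise } = RNFS.downloadFile({ fromUrl: url, toFile: filePath });

// Later, e.g. from a cancel button handler:
RNFS.stopDownload(jobId);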

Initializing Local Llama Models

Load the downloaded model using llama.rn's initialization function. Configure context size, batch processing, and GPU acceleration based on device capabilities.

const initializeModel = async (modelPath) => {
  const context = await initLlama({
    model: modelPath,
    n_ctx: 2048,
    n_batch: 512,
    n_gpu_layers: 0
  });
  return context;
};

Set n_gpu_layers to a positive value to offload computation to the GPU on supported Android devices with Qualcomm Adreno 700+ chips.
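A small sketch of choosing the layer count per platform; the value 99 (offload everything) and the conservative CPU-only Android default are illustrative assumptions, so benchmark on your target hardware.

import { Platform } from 'react-native';

const initializeForDevice = async (modelPath) => {
  // Assumption: enable GPU offload on iOS, keep Android on CPU unless the
  // device is known to support it
  const gpuLayers = Platform.OS === 'ios' ? 99 : 0;
  return initLlama({
    model: modelPath,
    n_ctx: 2048,
    n_batch: 512,
    n_gpu_layers: gpuLayers
  });
};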

Creating an Offline Agent

Build an agent that generates responses entirely on-device. This agent works without internet access and processes sensitive data locally.

const offlineAgent = async (context, userMessage) => {
  const completion = await context.completion({
    prompt: userMessage,
    n_predict: 256,
    stop: ['User:', '\n\n'],
    temperature: 0.7
  });
  return completion.text;
};

Adjust temperature to control creativity. Lower values produce consistent responses while higher values generate more varied outputs.

Comparing Gemini and Llama for Mobile Agents

Choosing between Gemini and Llama depends on your app's requirements. Each approach offers distinct advantages for different use cases.

Performance and Speed

Gemini provides fast responses through cloud infrastructure. Typical latency ranges from 200 to 500 milliseconds depending on network conditions and model size.

Local Llama models can begin responding in under 50 milliseconds because no network round trip is involved, though total generation time still depends on model size and device hardware. This speed advantage matters for real-time features like autocomplete or instant translations.

Cost Considerations

Gemini charges per token with prices starting at $0.30 per million tokens for the Flash model. Heavy usage can cost hundreds monthly for production apps with thousands of users.

Llama models require upfront development time but eliminate ongoing API costs. Users download models once and run unlimited inference locally.

Privacy and Data Security

Cloud APIs process data on remote servers. This raises concerns for apps handling medical records, financial information, or personal messages.

On-device models keep all data local. Photos, voice recordings, and text never leave the device, providing inherent privacy protection.

Model Capabilities

Gemini excels at complex reasoning tasks. The model handles multimodal inputs including text, images, audio, and video in a single request.

Llama models on mobile focus on text generation and basic reasoning. They work well for chat interfaces, content summarization, and simple question answering.

Building Advanced Agent Features

Production AI agents need more than basic text generation. Add function calling, context management, and error handling to create robust applications.

Implementing Function Calling

Function calling lets agents execute specific actions in your app. The agent decides when to call functions based on user requests.

Define available functions with clear descriptions. Gemini analyzes user intent and suggests appropriate function calls with parameters.

const functions = [
  {
    name: 'getWeather',
    description: 'Get current weather for a location',
    parameters: {
      type: 'object',
      properties: {
        location: { type: 'string', description: 'City name' }
      },
      required: ['location']
    }
  }
];

const model = genAI.getGenerativeModel({
  model: "gemini-2.5-flash",
  tools: [{ functionDeclarations: functions }]
});

Parse the model's response to detect function calls. Execute the requested function and return results to the agent for final response generation.
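A simplified sketch of that loop with the @google/generative-ai SDK; fetchWeather is a hypothetical app-side helper, and a full implementation would send the function result back to the model for a final answer rather than returning it directly.

const handleToolCall = async (userPrompt) => {
  const result = await model.generateContent(userPrompt);
  const calls = result.response.functionCalls(); // undefined when none

  if (calls && calls.length > 0 && calls[0].name === 'getWeather') {
    // fetchWeather is a hypothetical helper you implement yourself
    const weather = await fetchWeather(calls[0].args.location);
    return `Weather in ${calls[0].args.location}: ${weather}`;
  }
  return result.response.text();
};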

Managing Conversation Context

Agents need memory to maintain coherent multi-turn conversations. Store conversation history and include it in each new request.

const chat = model.startChat({
  history: [
    { role: 'user', parts: [{ text: 'Hello, I need help' }] },
    { role: 'model', parts: [{ text: 'Hello! How can I assist you today?' }] }
  ]
});

const result = await chat.sendMessage('Tell me about React Native');

Limit history length to prevent exceeding context windows. Keep the most recent exchanges and summarize older conversations to save tokens.
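One simple sketch of trimming history before starting a chat; MAX_TURNS and savedHistory are illustrative names.

const MAX_TURNS = 10;

// Each turn contributes one user entry and one model entry
const trimHistory = (history) => history.slice(-MAX_TURNS * 2);

const chat = model.startChat({ history: trimHistory(savedHistory) });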

Error Handling and Retry Logic

Network failures and API rate limits require robust error handling. Implement exponential backoff for retries and graceful degradation when services fail.

const retryWithBackoff = async (fn, maxRetries = 3) => {
  for (let i = 0; i < maxRetries; i++) {
    try {
      return await fn();
    } catch (error) {
      if (i === maxRetries - 1) throw error;
      const delay = Math.pow(2, i) * 1000;
      await new Promise(resolve => setTimeout(resolve, delay));
    }
  }
};

Show users helpful error messages when agents fail. Offer offline alternatives or cached responses to maintain app functionality.

Optimizing Agent Performance

Production apps require optimization to balance performance, cost, and user experience. Apply these strategies to improve your AI agents.

Reducing Token Usage

Every token costs money with cloud APIs. Craft concise prompts that provide necessary context without excessive detail.

Use system instructions to set agent behavior once instead of repeating guidelines in every prompt. This approach saves tokens across all requests.
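With the @google/generative-ai SDK, a system instruction can be attached when the model is created; the instruction text here is just an example.

const assistantModel = genAI.getGenerativeModel({
  model: "gemini-2.5-flash",
  // Keeps behavioral guidelines out of every user prompt you write
  systemInstruction: "You are a concise shopping assistant. Reply in two sentences or fewer."
});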

Implementing Response Caching

Cache frequently requested responses to avoid repeated API calls. Store common queries and their responses in local storage.

import AsyncStorage from '@react-native-async-storage/async-storage';

const getCachedResponse = async (key) => {
  const cached = await AsyncStorage.getItem(key);
  return cached ? JSON.parse(cached) : null;
};

const cacheResponse = async (key, data) => {
  await AsyncStorage.setItem(key, JSON.stringify(data));
};

Set expiration times for cached data. Refresh stale entries periodically to ensure users receive current information.
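A sketch of expiry on top of the helpers above; TTL_MS (one hour) is an arbitrary illustrative value.

const TTL_MS = 60 * 60 * 1000; // one hour

const cacheWithExpiry = async (key, data) => {
  await AsyncStorage.setItem(key, JSON.stringify({ data, savedAt: Date.now() }));
};

const getFreshResponse = async (key) => {
  const raw = await AsyncStorage.getItem(key);
  if (!raw) return null;
  const { data, savedAt } = JSON.parse(raw);
  // Treat anything older than the TTL as a cache miss
  return Date.now() - savedAt < TTL_MS ? data : null;
};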

Choosing the Right Model Size

Gemini offers multiple model variants with different capabilities and costs. Flash models cost less but provide strong performance for most tasks.

Start with smaller models and upgrade only when necessary. Many applications work perfectly with Flash instead of requiring Pro-level reasoning.

Batching Requests

Process multiple prompts together when possible. Batch API endpoints offer 50% discounts compared to standard requests.

Collect user inputs over time and submit them as a batch during off-peak hours. This strategy works well for non-urgent tasks like content analysis or data processing.

Security Best Practices for AI Agents

AI agents handle sensitive data and make autonomous decisions. Implement security measures to protect users and prevent misuse.

Protecting API Keys

Never store API keys in your React Native app code. Use backend services to proxy API requests and keep credentials server-side.

For development, store keys in environment variables using react-native-config. Load them at build time without exposing values in the final app bundle.
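A minimal sketch with react-native-config, assuming a .env file at the project root; note that values loaded this way still end up embedded in the app binary, which is why the backend proxy above remains the safer production option.

// .env (git-ignored)
// GEMINI_API_KEY=your-key-here

import Config from 'react-native-config';
import { GoogleGenerativeAI } from '@google/generative-ai';

const genAI = new GoogleGenerativeAI(Config.GEMINI_API_KEY);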

Input Validation and Sanitization

Validate user inputs before sending them to AI models. Block malicious prompts that attempt prompt injection or jailbreaking.

const validateInput = (text) => {
  if (text.length > 5000) return false;
  const bannedPatterns = ['ignore previous', 'forget instructions'];
  return !bannedPatterns.some(pattern =>
    text.toLowerCase().includes(pattern)
  );
};

This basic validation prevents common attack vectors. Expand checks based on your app's specific security requirements.

Rate Limiting User Requests

Implement client-side rate limits to prevent abuse and control costs. Track request counts per user and enforce daily or hourly limits.

Show users their remaining quota. Reset limits after appropriate time periods and offer premium tiers with higher allowances.
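Here is one possible client-side sketch backed by AsyncStorage; DAILY_LIMIT and the storage key are illustrative choices, and since this runs on the client it complements rather than replaces server-side enforcement.

import AsyncStorage from '@react-native-async-storage/async-storage';

const DAILY_LIMIT = 50; // illustrative quota

const canMakeRequest = async () => {
  const today = new Date().toDateString();
  const raw = await AsyncStorage.getItem('agent_quota');
  let quota = raw ? JSON.parse(raw) : { date: today, count: 0 };
  if (quota.date !== today) quota = { date: today, count: 0 }; // daily reset
  if (quota.count >= DAILY_LIMIT) return false;
  quota.count += 1;
  await AsyncStorage.setItem('agent_quota', JSON.stringify(quota));
  return true;
};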

Real-World Applications of AI Agents

AI agents power diverse use cases across industries. These examples demonstrate practical implementations in production apps.

E-commerce Personal Shopping Assistants

Retail apps use agents to recommend products based on browsing history and preferences. The agent analyzes user style, budget constraints, and past purchases to suggest relevant items.

Users describe what they're looking for in natural language. The agent searches inventory, compares options, and generates personalized recommendations with explanations.

Healthcare Symptom Checkers

Medical apps deploy agents that gather symptom information through conversational interfaces. The agent asks follow-up questions based on reported symptoms.

These agents run on-device to protect patient privacy. Health data never leaves the phone while users receive immediate preliminary assessments.

Financial Advisory Agents

Banking apps integrate agents that analyze spending patterns and suggest budget adjustments. The agent monitors transactions and identifies opportunities to save money.

Users ask questions about their finances in plain English. The agent retrieves account data, performs calculations, and explains recommendations in simple terms.

Educational Tutoring Systems

Learning apps create personalized tutors that adapt to student progress. The agent identifies knowledge gaps and generates practice problems at appropriate difficulty levels.

Students work through problems with real-time hints and explanations. The agent tracks performance over time and adjusts teaching strategies accordingly.

Testing and Debugging AI Agents

AI agents behave far less deterministically than traditional code. Establish testing strategies that catch issues before users encounter them.

Creating Test Scenarios

Build a suite of test prompts covering common use cases and edge cases. Include ambiguous queries, multi-step requests, and error conditions.

Document expected behaviors for each test case. Compare actual agent responses against expectations to identify regressions.
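A lightweight sketch of such a suite, reusing the runAgent function from earlier; the cases and expectations are illustrative.

const testCases = [
  { prompt: 'What is 2 + 2?', expect: (r) => r.includes('4') },
  { prompt: 'Summarize: React Native uses one codebase.', expect: (r) => r.length > 0 },
  { prompt: '', expect: (r) => r.length > 0 } // edge case: empty input
];

const runAgentTests = async () => {
  for (const { prompt, expect } of testCases) {
    const response = await runAgent(prompt);
    console.log(expect(response) ? 'PASS' : 'FAIL', JSON.stringify(prompt));
  }
};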

Monitoring Agent Responses

Log all agent interactions in development and staging environments. Review logs regularly to spot concerning patterns or failures.

const logInteraction = async (prompt, response, metadata) => {
  // `analytics` stands in for whatever analytics client your app uses
  await analytics.track('agent_interaction', {
    prompt: prompt.substring(0, 100), // truncate to avoid logging full user input
    responseLength: response.length,
    timestamp: Date.now(),
    ...metadata
  });
};

Track metrics like response time, token usage, and error rates. Set alerts for unusual patterns that might indicate problems.

A/B Testing Agent Configurations

Test different models, prompts, and parameters to find optimal configurations. Split users into groups and compare satisfaction scores.

Measure user engagement metrics like session length, retention, and task completion rates. Choose configurations that deliver the best user experience.

Future Trends in Mobile AI Agents

AI agent technology evolves rapidly. These emerging trends will shape mobile development in 2026 and beyond.

Multimodal Agents

Future agents will process combinations of text, images, audio, and video simultaneously. Users will point their camera at objects and ask questions about what they see.

Gemini already supports multimodal inputs. Expect more powerful on-device models that handle images and audio without cloud processing.

Improved On-Device Models

Smartphone chips gain AI acceleration capabilities with each generation. Newer devices run larger models faster while consuming less battery.

Gemini Nano and similar on-device models will handle complex reasoning locally. This shift reduces latency and costs while improving privacy.

Agent Collaboration

Multiple specialized agents will work together on complex tasks. One agent handles research, another writes content, and a third edits and refines.

These multi-agent systems distribute work efficiently and produce better results than single agents.

Frequently Asked Questions

What's the difference between Gemini Flash and Pro models?

Gemini Flash costs $0.30 per million input tokens and handles most tasks well, including chat, summarization, and basic reasoning. Gemini Pro costs $1.25 per million tokens and excels at complex multi-step reasoning and coding tasks.

Flash delivers responses faster with lower latency. Choose Flash for real-time features and Pro for tasks requiring deep analysis or code generation.

Can I run Llama models on older mobile devices?

Yes, but performance varies significantly. Devices with 4GB+ RAM handle 1B parameter models smoothly. Phones with 2-3GB RAM struggle with even small models.

Test on your target devices before committing. Quantized models like Q4 or Q5 variants run faster on limited hardware than full precision versions.

How do I handle API rate limits from Gemini?

Free tier users face limits of 15 requests per minute and 1,500 requests per day. Implement request queuing to stay within bounds automatically.

Upgrade to the paid tier for higher limits of 360 requests per minute. Monitor usage through Google Cloud Console and set alerts before hitting thresholds.

Are AI agent conversations private in React Native apps?

Privacy depends on your implementation. Cloud APIs like Gemini process data on Google's servers. On-device Llama models keep everything local.

For paid Gemini tiers, your data isn't used to improve Google products. Free tier usage may contribute to model training.

What's the average response time for mobile AI agents?

Cloud-based agents respond in 200-500 milliseconds depending on network speed and prompt complexity. On-device models can begin generating in under 50 milliseconds since no network is involved, though total time depends on response length and hardware.

Streaming reduces perceived latency by showing partial responses immediately. Users see text appearing instead of waiting for complete generation.

How much does it cost to run a production AI agent app?

Costs depend on usage patterns. Apps with moderate traffic spend $50-200 monthly using Gemini Flash. Heavy usage can exceed $500 monthly.

On-device models eliminate API costs but require development time and increased app size. Calculate break-even points based on your expected user base.
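As a rough worked example under assumed numbers: 10,000 monthly users each sending 20 requests that average 500 input tokens adds up to 100 million input tokens, or about $30 at Flash's $0.30-per-million input rate; output tokens are billed at a higher rate and often dominate the total.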

Can AI agents work completely offline?

Yes, using on-device models like Llama through llama.rn. Download models during app installation or first launch when users have WiFi.

Offline agents handle text generation, question answering, and basic reasoning without internet. They can't access current information or use cloud-based tools.

Making Your Decision

Building AI agents in React Native with Gemini and Llama APIs opens powerful possibilities for mobile apps. Cloud-based Gemini provides advanced reasoning and multimodal capabilities at competitive prices. On-device Llama models deliver privacy and eliminate API costs.

Choose cloud APIs for complex tasks requiring current information and sophisticated reasoning. Pick local models for privacy-sensitive applications and cost-conscious deployments.

Start with small experiments using free tiers. Test both approaches with real user scenarios. Measure performance, costs, and user satisfaction before committing to production architecture.
