The Challenge: Moving Beyond Cloud Dependencies (And My Hatred of Making Slides)
Let me be honest - this project started because I absolutely despise creating PowerPoint presentations. The tedious process of outlining content, formatting slides, finding relevant images, and making everything look professional drives me crazy. I'd rather spend hours coding than 30 minutes making slides.
So naturally, I thought: "What if I could just talk to a device and have it generate the entire presentation for me?"
But here's where it gets interesting. Most AI applications today rely on cloud inference - sending data to remote servers, waiting for responses, and dealing with latency, costs, and privacy concerns. I wanted to explore whether modern edge hardware could handle something more ambitious: a complete multi-agent AI workflow running entirely locally.
The goal became twofold: solve my personal PowerPoint problem AND push the boundaries of what's possible on edge hardware. Concretely, that meant a voice-controlled presentation generator that could understand speech, orchestrate multiple AI agents, generate structured content, and synthesize speech responses - all on a single edge device, with zero internet dependency.
Demo: See It In Action
Before diving into the technical details, here's the system working end-to-end:
The complete pipeline: "Create slides on electrical engineering" → AI processing → formatted presentation with detailed content, all running locally on Jetson Orin Nano.
Prerequisites and Setup
Installing CAMEL-AI Framework
# Install CAMEL-AI with all dependencies
pip install "camel-ai[all]"
# Or minimal installation
pip install camel-ai
# Additional dependencies for this project
pip install python-pptx faster-whisper sounddevice soundfile TTS
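A quick sanity check before going further is to confirm the key packages import cleanly (a minimal verification script, not part of the original setup):
# verify_install.py - confirm the core dependencies are importable
from camel.agents import ChatAgent        # CAMEL-AI agent class
from camel.toolkits import PPTXToolkit    # PowerPoint generation toolkit
from faster_whisper import WhisperModel   # local speech-to-text
from pptx import Presentation             # python-pptx for PowerPoint files
import sounddevice, soundfile             # microphone capture and audio file I/O
print("All core dependencies imported successfully")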
Setting up llama.cpp for Local Inference
# Clone and build llama.cpp
git clone https://github.com/ggerganov/llama.cpp
cd llama.cpp
# For Jetson Orin (ARM64 with CUDA)
mkdir build
cd build
cmake .. -DLLAMA_CUDA=ON -DCMAKE_CUDA_ARCHITECTURES=87
make -j$(nproc)
cd ..
# Download your model (example: Qwen 2.5 7B)
wget https://huggingface.co/Qwen/Qwen2.5-7B-Instruct-GGUF/resolve/main/qwen2.5-7b-instruct-q4_k_m.gguf
# Start the server
./build/bin/llama-server --model qwen2.5-7b-instruct-q4_k_m.gguf \
--port 8000 \
--host 0.0.0.0 \
--ctx-size 4096 \
--threads 4
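Once the server is running, a plain OpenAI-style request is enough to confirm everything works end to end (a minimal check using only the Python standard library; llama.cpp's server exposes an OpenAI-compatible /v1/chat/completions endpoint):
# check_server.py - send a test chat completion to the local llama.cpp server
import json
import urllib.request
payload = {
    "model": "qwen2.5-7b-instruct",  # informational; llama-server answers with the loaded model
    "messages": [{"role": "user", "content": "Say hello in one sentence."}],
    "max_tokens": 64,
}
req = urllib.request.Request(
    "http://localhost:8000/v1/chat/completions",
    data=json.dumps(payload).encode("utf-8"),
    headers={"Content-Type": "application/json"},
)
with urllib.request.urlopen(req, timeout=120) as resp:
    print(json.load(resp)["choices"][0]["message"]["content"])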
System Service Configuration
For production deployment, create a systemd service:
# /etc/systemd/system/llama-server.service
[Unit]
Description=Local LLM Server (Qwen 2.5 7B on llama.cpp)
After=network.target
[Service]
Type=simple
User=your_user
WorkingDirectory=/home/your_user/llama.cpp
ExecStart=/home/your_user/llama.cpp/build/bin/llama-server \
--model /home/your_user/models/qwen2.5-7b-instruct-q4_k_m.gguf \
--port 8000 \
--host 0.0.0.0 \
--ctx-size 4096 \
--threads 4
Restart=always
[Install]
WantedBy=multi-user.target
# Reload systemd, then enable and start the service
sudo systemctl daemon-reload
sudo systemctl enable llama-server.service
sudo systemctl start llama-server.service
Initializing CAMEL-AI Components
Setting up the multi-agent framework requires initializing the model, agents, and toolkits:
from camel.agents import ChatAgent
from camel.models import ModelFactory
from camel.messages import BaseMessage
from camel.toolkits import PPTXToolkit
from camel.types import RoleType, ModelPlatformType
# Initialize the model factory pointing to your local llama.cpp server
model = ModelFactory.create(
model_platform=ModelPlatformType.OLLAMA, # For llama.cpp compatibility
model_type="Qwen 2.5 7B",
url="http://localhost:8000/v1",
model_config_dict={
"temperature": 0.1,
"max_tokens": 512,
"top_p": 0.9,
}
)
# Load the PPTXToolkit for presentation generation
ppt_toolkit = PPTXToolkit()
tools = ppt_toolkit.get_tools()
# Create specialized agents
conversation_agent = ChatAgent(
system_message=BaseMessage(
role_name="assistant",
role_type=RoleType.ASSISTANT,
content="""You are Jetson, a helpful AI assistant that can have conversations and create PowerPoint presentations when asked.
When users ask you to create slides or presentations, tell them you'll create slides for them.
For regular conversation, respond naturally and helpfully.""",
meta_dict={}
),
model=model,
tools=[] # No tools needed for general conversation
)
slide_agent = ChatAgent(
system_message=BaseMessage(
role_name="assistant",
role_type=RoleType.ASSISTANT,
content="""You are a PowerPoint presentation assistant with access to presentation creation tools.
When asked to create slides about a topic, follow these steps:
Step 1: Create a new presentation
- Use the create_presentation function to start a new PowerPoint presentation
Step 2: Add multiple informative slides
- Use add_slide function for each slide
- Create slides with clear, descriptive titles
- Include bullet-point content that is educational and well-structured
- Make sure content is relevant to the requested topic
- Aim for 4-6 slides per presentation
Step 3: Save the presentation
- Use save_presentation function to save the file
- Save with a descriptive filename ending in .pptx
Example workflow for "Introduction to AI":
1. Create a new presentation
2. Add slide: "Introduction to Artificial Intelligence" with overview content
3. Add slide: "Types of AI" with different AI categories
4. Add slide: "Key Technologies" with AI technologies
5. Add slide: "Applications" with real-world uses
6. Add slide: "Future of AI" with trends and outlook
7. Save the presentation as "ai_introduction.pptx"
Be direct and use the available tools step by step. Focus on creating educational, well-organized content.""",
meta_dict={}
),
model=model,
tools=tools # PPTXToolkit functions available
)
# Agent usage example
def handle_request(user_input):
if "slides" in user_input.lower():
# Route to slide generation agent
response = slide_agent.step(BaseMessage(
role_name="user",
role_type=RoleType.USER,
content=user_input,
meta_dict={}
))
else:
# Route to conversation agent
response = conversation_agent.step(BaseMessage(
role_name="user",
role_type=RoleType.USER,
content=user_input,
meta_dict={}
))
return response.msg.content
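With both agents initialized, the router can be exercised directly; the prompts below are just illustrative:
# Example usage of the routing function
if __name__ == "__main__":
    # Contains "slides", so this is routed to the slide generation agent
    print(handle_request("Create slides about edge AI on the Jetson Orin Nano"))
    # No keyword match, so this falls through to the conversation agent
    print(handle_request("What can you help me with?"))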
Key CAMEL-AI Concepts:
- ModelFactory: Creates model instances with specific configurations
- ChatAgent: Individual agents with specialized roles and tools
- BaseMessage: Standardized message format for agent communication
- Toolkits: Pre-built tool collections (PPTXToolkit provides PowerPoint functions)
- Agent Orchestration: Route requests to appropriate specialized agents
Architecture Overview
Model Evaluation and Selection
Testing three candidate models revealed significant differences in edge deployment viability:
Mistral 7B Instruct Q4 GGUF
# Typical output from Mistral during function calls
{
"function": "create_slide",
"parameters": {
"title": "Introduction",
"content": "Overview of the topic..." # Often malformed JSON
Issues encountered:
- Inconsistent JSON formatting breaking CAMEL's function calling
- Good conversational ability but poor structured output reliability
- Memory usage: ~4.2GB for model weights
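One lightweight way to contain this failure mode is to validate the model's tool-call payload before acting on it (an illustrative guard, not something CAMEL requires you to write):
import json

def safe_parse_tool_call(raw: str):
    """Return the parsed tool call as a dict, or None if the JSON is malformed."""
    try:
        call = json.loads(raw)
    except json.JSONDecodeError:
        return None
    # Require the minimal structure shown above: a function name and a parameter dict
    if isinstance(call, dict) and "function" in call and isinstance(call.get("parameters"), dict):
        return call
    return None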
Meta Llama 3.1 8B Instruct Q4 GGUF
Better function calling compliance but resource constraints became apparent:
# Memory pressure observed
Model RAM: ~5.1GB
Whisper: ~1GB
TTS Models: ~800MB
System overhead: ~1.2GB
Total: 8.1GB (exceeding available memory)
Result: Frequent OOM crashes during multi-modal operations.
Qwen 2.5 7B Instruct Q4 GGUF
The optimal balance for this hardware configuration:
# Consistent structured output
{
"name": "add_slide",
"arguments": {
"title": "Technical Implementation",
"content": "• Core architecture components\n• Integration patterns\n• Performance considerations"
}
}
Performance metrics:
- Model RAM: ~4.0GB
- Inference latency: 2-4 seconds for typical responses
- Function calling success rate: >95%
- Memory efficiency allowing concurrent model execution
Multi-Agent Architecture Implementation
CAMEL-AI's agent separation proved crucial for system reliability:
# Agent initialization
conversation_agent = ChatAgent(
system_message=conversation_prompt,
model=model,
tools=[] # No tools - pure conversation
)
slide_agent = ChatAgent(
system_message=slide_generation_prompt,
model=model,
tools=pptx_toolkit.get_tools() # Specialized tools
)
This architecture provides:
- Isolation: Agent failures don't cascade
- Specialization: Each agent optimized for specific tasks
- Maintainability: Clear separation of concerns
- Extensibility: Easy to add new agent types
Performance Analysis
Successful Components
Whisper STT Performance:
- Accuracy: 95%+ in varied noise conditions
- Latency: ~1-2 seconds for 15-second audio clips
- Memory footprint: Stable at ~1GB
- CPU utilization: Efficient ARM64 optimization
CAMEL Framework:
- Agent orchestration: Reliable switching between conversation and task execution
- PPTXToolkit integration: Seamless PowerPoint generation
- Error handling: Graceful fallbacks when function calls fail
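A stripped-down version of that fallback pattern looks like this (a sketch of the idea rather than the exact production code; it reuses the agents defined earlier):
def handle_request_with_fallback(user_input: str) -> str:
    """Try the tool-using slide agent first, fall back to plain conversation."""
    user_msg = BaseMessage(
        role_name="user", role_type=RoleType.USER,
        content=user_input, meta_dict={},
    )
    if "slides" in user_input.lower():
        try:
            response = slide_agent.step(user_msg)
            if response.msgs:
                return response.msg.content
        except Exception as exc:
            print(f"Slide agent failed ({exc}); falling back to conversation")
    # Default path: answer conversationally without tools
    return conversation_agent.step(user_msg).msg.content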
Performance Bottlenecks
TTS Synthesis:
The critical bottleneck emerged in text-to-speech generation:
Average TTS generation times:
- Short responses (5-10 words): 8-12 seconds
- Medium responses (20-30 words): 15-20 seconds
- Long responses (50+ words): 25-35 seconds
Root causes:
- Tacotron2 model not optimized for ARM64
- Sequential processing without batching
- Memory bandwidth limitations during vocoder inference
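These timings are straightforward to reproduce with a wall-clock harness around the synthesis call (a measurement sketch assuming the Coqui TTS package and its Tacotron2 LJSpeech model; not the exact benchmark script used here):
import time
from TTS.api import TTS

# Load the Tacotron2-based English model (runs on CPU here)
tts = TTS(model_name="tts_models/en/ljspeech/tacotron2-DDC")
samples = [
    "Presentation saved.",  # short response
    "I have created a five slide presentation on quantum computing for you.",  # medium response
]
for text in samples:
    start = time.perf_counter()
    tts.tts_to_file(text=text, file_path="response.wav")
    print(f"{len(text.split())} words -> {time.perf_counter() - start:.1f}s")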
Model Inference Scaling:
Memory usage scaling:
Base system: 1.2GB
+ Whisper: 2.2GB (+1GB)
+ LLM (7B Q4): 6.4GB (+4.2GB)
+ TTS models: 7.8GB (+1.4GB)
Peak usage: 7.8GB/8GB (97.5% utilization)
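Keeping an eye on headroom while each component loads is easy with psutil (an illustrative helper, not the original instrumentation; install it with pip install psutil):
import psutil

def log_memory(stage: str) -> None:
    """Print overall system memory usage after loading a pipeline component."""
    mem = psutil.virtual_memory()
    used_gb = (mem.total - mem.available) / 1024**3
    print(f"[{stage}] {used_gb:.1f}GB / {mem.total / 1024**3:.1f}GB ({mem.percent}%)")

log_memory("base system")
# ...call again after loading Whisper, the LLM client, and the TTS models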
Technical Insights and Optimizations
Memory Management
# Implemented model lifecycle management: release the TTS model when idle
import torch

def cleanup_unused_models():
    global tts_model              # module-level handle to the loaded TTS model
    if not current_tts_active:
        del tts_model             # drop the reference so Python can free the weights
        torch.cuda.empty_cache()  # return cached GPU memory to the system
Prompt Engineering for Edge
Complex prompts caused timeouts. Optimization required:
# Before: Complex 500+ token prompt → 3+ minute timeouts
# After: Simplified 150 token prompt → 30-60 second responses
simplified_prompt = f"""Create 5 slides about: {topic}
Keep each slide to 3-4 bullet points.
Focus on core concepts only."""
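Comparing prompt variants is just a matter of timing the agent call (a small measurement sketch reusing slide_agent and simplified_prompt from above; it assumes topic has already been set):
import time

def timed_slide_request(prompt: str) -> float:
    """Return the wall-clock seconds for one slide-generation request."""
    start = time.perf_counter()
    slide_agent.step(BaseMessage(
        role_name="user", role_type=RoleType.USER,
        content=prompt, meta_dict={},
    ))
    return time.perf_counter() - start

print(f"Simplified prompt: {timed_slide_request(simplified_prompt):.0f}s")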
Deployment Considerations
Resource Allocation Strategy
# Jetson power mode optimization
sudo nvpmodel -m 0 # Max performance mode
sudo jetson_clocks # Lock clocks to maximum
Model Quantization Impact
Q4 quantization provided the optimal balance:
- Size reduction: 7B model from ~28GB (FP32) to ~4GB
- Quality retention: Minimal impact on structured output
- Inference speed: 2x improvement over FP16
Results and Practical Applications
The system successfully demonstrated:
- Complete offline operation: No internet dependency after setup
- Multi-modal interaction: Speech input to document output
- Real-world utility: Generated presentations with meaningful content
- Edge viability: Practical deployment on consumer hardware
Example workflow timing:
User speech: "Create slides on quantum computing"
→ Whisper transcription: 2s
→ Agent orchestration: 5s
→ Content generation: ~180s
→ PowerPoint creation: ~120s
→ TTS response: 10s
Total pipeline: ~317 seconds
Future Optimization Directions
- TTS Acceleration: Investigate lightweight models or hardware acceleration
- Model Distillation: Train smaller specialized models for specific tasks
- Memory Optimization: Implement dynamic model loading/unloading
- Quantization Research: Explore INT8 or mixed-precision inference
Conclusion
Multi-agent AI workflows are viable on edge hardware, but require careful architecture decisions and model selection. The combination of CAMEL-AI's orchestration capabilities with optimized local inference demonstrates that sophisticated AI applications can run independently of cloud infrastructure.
The key insight: edge AI success depends more on system integration and optimization than raw computational power. With thoughtful design, even modest hardware can deliver compelling AI experiences.
Code and detailed implementation notes available on request. Always interested in discussing edge AI architectures and optimization strategies.