Learn how to use all 14 Ollama API endpoints with real-world examples, best practices, and production-ready insights.
Artificial Intelligence is rapidly moving from cloud-only environments to local deployments. Developers increasingly want privacy, lower latency, reduced costs, and complete control over their AI infrastructure.
This is where Ollama shines.
Ollama allows you to run powerful Large Language Models (LLMs) such as Llama, Gemma, Mistral, Qwen, DeepSeek, and many others directly on your local machine or server. Beyond running models, Ollama provides a robust REST API that enables developers to integrate AI capabilities into applications, automation workflows, chatbots, coding assistants, search engines, and enterprise systems.
In this guide, you'll learn all 14 Ollama API endpoints, understand when to use each one, and see practical examples that go beyond the official documentation.
What Is Ollama?
Ollama is a platform designed to simplify the deployment and execution of large language models locally.
Some advantages include:
- Privacy-focused AI processing
- No dependency on external AI providers
- Reduced API costs
- Fast local inference
- OpenAI-compatible API support
- Easy model management
By default, Ollama runs on:
http://localhost:11434
1. Generate Text
Endpoint
POST /api/generate
Purpose
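Generates text from a single prompt. By default the response is streamed as newline-delimited JSON objects, each carrying a "response" fragment; add "stream": false to the request body to receive a single JSON object instead.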
Example
curl http://localhost:11434/api/generate \
-d '{
"model":"llama3",
"prompt":"Explain quantum computing in simple terms."
}'
Real Use Cases
- Content generation
- Code generation
- Documentation writing
- SEO article creation
- Email drafting
Expert Tip
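Use /api/generate for one-shot tasks where conversation history is unnecessary. Because no earlier messages are resent, prompts stay shorter, which can reduce processing time compared with a chat request that carries a long history.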
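Because /api/generate streams by default, a client has to stitch the fragments back together. The sketch below uses only the Python standard library. It assumes the default address shown earlier and that each streamed line is a JSON object with a "response" fragment and a "done" flag. The function names join_stream and generate are illustrative and not part of any Ollama SDK.

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434"  # default address; adjust for remote servers


def join_stream(lines):
    """Concatenate the "response" fragments of a streamed /api/generate reply."""
    parts = []
    for raw in lines:
        raw = raw.strip()
        if not raw:
            continue  # ignore blank lines defensively
        chunk = json.loads(raw)
        parts.append(chunk.get("response", ""))
        if chunk.get("done"):
            break  # the final object has done=true (plus timing statistics)
    return "".join(parts)


def generate(prompt, model="llama3"):
    """POST a single prompt to /api/generate and return the complete text."""
    body = json.dumps({"model": model, "prompt": prompt}).encode("utf-8")
    request = urllib.request.Request(
        f"{OLLAMA_URL}/api/generate",
        data=body,  # supplying data makes urllib send a POST
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request) as response:
        # Iterating over the HTTP response yields one line (as bytes) at a time.
        return join_stream(line.decode("utf-8") for line in response)
```

With a server running, `print(generate("Explain quantum computing in simple terms."))` prints the full answer. Handling the stream line by line also makes it straightforward to show tokens as they arrive.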
2. Chat Conversations
Endpoint
POST /api/chat
Purpose
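Generates the next message in a conversation. The server keeps no session state: each request carries the full list of messages (system, user, and assistant turns) that make up the context.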
Example
curl http://localhost:11434/api/chat \
-d '{
"model":"llama3",
"messages":[
{
"role":"user",
"content":"Create a Node.js REST API."
}
]
}'
Real Use Cases
- AI assistants
- Customer support bots
- Programming copilots
- Internal company chatbots
Expert Tip
The chat endpoint is stateless, so your application must store the conversation history and resend it with every request. For long conversations, trim or summarize older messages so the prompt stays within the model's context window.
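Since /api/chat does not remember earlier turns, the client keeps the history itself. This is a minimal sketch under stated assumptions. The ChatSession class and its max_messages cap are hypothetical. The trimming policy, which drops the oldest turns, is a deliberately naive stand-in for summarization. The sketch also assumes that a non-streaming response carries the assistant turn in a "message" field.

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434"  # default local address


class ChatSession:
    """Keeps chat history on the client, since /api/chat stores no state."""

    def __init__(self, model="llama3", system=None, max_messages=20):
        self.model = model
        self.system = system              # optional system prompt, resent every time
        self.max_messages = max_messages  # hypothetical cap on resent history
        self.history = []

    def payload(self, user_text):
        """Record the user's turn and build the request body for /api/chat."""
        self.history.append({"role": "user", "content": user_text})
        messages = self.history[-self.max_messages:]  # naive trim: drop oldest turns
        if self.system:
            messages = [{"role": "system", "content": self.system}] + messages
        return {"model": self.model, "messages": messages, "stream": False}

    def ask(self, user_text):
        """Send the conversation so far and store the assistant's reply."""
        body = json.dumps(self.payload(user_text)).encode("utf-8")
        request = urllib.request.Request(
            f"{OLLAMA_URL}/api/chat",
            data=body,
            headers={"Content-Type": "application/json"},
        )
        with urllib.request.urlopen(request) as response:
            reply = json.load(response)["message"]  # {"role": "assistant", ...}
        self.history.append(reply)  # needed so the next request includes it
        return reply["content"]
```

In production, `self.history` would typically be persisted to a database keyed by user or session rather than kept in process memory.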
3. Generate Embeddings
Endpoint
POST /api/embeddings
Purpose
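Converts text into a numerical vector (an embedding). This endpoint takes a single "prompt" and returns one "embedding". Recent Ollama releases supersede it with POST /api/embed, which accepts an "input" string or array of strings and returns an "embeddings" array.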
Example
curl http://localhost:11434/api/embeddings \
-d '{
"model":"nomic-embed-text",
"prompt":"How does machine learning work?"
}'
Real Use Cases
- Semantic search
- RAG systems
- Recommendation engines
- Knowledge bases
Expert Tip
Embeddings are the foundation of modern Retrieval-Augmented Generation (RAG) systems.
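The core of semantic search is comparing vectors, and cosine similarity is a common choice for that. The sketch below fetches a vector from /api/embeddings as in the curl example above. It assumes the response field is named "embedding" and that the nomic-embed-text model has been pulled. It then ranks stored vectors against a query vector. The helper names are illustrative.

```python
import json
import math
import urllib.request

OLLAMA_URL = "http://localhost:11434"


def embed(text, model="nomic-embed-text"):
    """Return one embedding vector from /api/embeddings."""
    body = json.dumps({"model": model, "prompt": text}).encode("utf-8")
    request = urllib.request.Request(
        f"{OLLAMA_URL}/api/embeddings",
        data=body,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)["embedding"]


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors: 1.0 means same direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0  # avoid dividing by zero for empty vectors


def rank(query_vec, doc_vecs):
    """Indices of doc_vecs ordered from most to least similar to query_vec."""
    scores = [cosine_similarity(query_vec, v) for v in doc_vecs]
    return sorted(range(len(doc_vecs)), key=lambda i: scores[i], reverse=True)
```

In practice you would embed each document once, store the vectors (for example in a vector database), and call `rank(embed(query), stored_vectors)` at query time.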
4. List Installed Models
Endpoint
GET /api/tags
Purpose
Displays all downloaded models.
Example
curl http://localhost:11434/api/tags
Why It Matters
Useful for:
- Admin dashboards
- Deployment scripts
- Health checks
- Monitoring systems
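For health checks, /api/tags serves two purposes: it confirms the server is reachable and reports what is installed. This is a small sketch. It assumes the response has the shape {"models": [{"name": ...}, ...]}, and the function names are hypothetical.

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434"


def model_names(tags_response):
    """Extract sorted model names from a parsed /api/tags response."""
    return sorted(m["name"] for m in tags_response.get("models", []))


def health_check(timeout=5):
    """Return installed model names, or None if the server cannot be reached."""
    try:
        with urllib.request.urlopen(f"{OLLAMA_URL}/api/tags", timeout=timeout) as resp:
            return model_names(json.load(resp))
    except OSError:  # URLError, connection refused, and timeouts are all OSErrors
        return None
```

A monitoring job could alert when `health_check()` returns None, or when an expected model is missing from the returned list.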
5. Display Model Details
Endpoint
POST /api/show
Purpose
Returns detailed model information.
Example
curl http://localhost:11434/api/show \
-d '{
"model":"llama3"
}'
Useful Information Returned
- Modelfile, prompt template, and runtime parameters
- Model family and parameter count (for example, 8B)
- Quantization level
- Context length
- Architecture details
Expert Tip
Use this endpoint to validate a model automatically before deployment, for example by checking that its context length and quantization level meet your application's requirements.
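An automated pre-deployment check could read the /api/show output as follows. This sketch assumes a recent Ollama release that accepts a "model" field, reports the context length in model_info under a key of the form "<architecture>.context_length", and reports the quantization in details.quantization_level. The policy thresholds are hypothetical.

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434"


def show(model):
    """Fetch model metadata from /api/show."""
    body = json.dumps({"model": model}).encode("utf-8")
    request = urllib.request.Request(
        f"{OLLAMA_URL}/api/show",
        data=body,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)


def context_length(info):
    """Return the '<architecture>.context_length' value from model_info, if any."""
    for key, value in info.get("model_info", {}).items():
        if key.endswith(".context_length"):
            return value
    return None


def check_model(info, min_context=8192, allowed_quant=None):
    """List the reasons a model fails a (hypothetical) deployment policy."""
    problems = []
    ctx = context_length(info)
    if ctx is None or ctx < min_context:
        problems.append(f"context length {ctx} is below {min_context}")
    quant = info.get("details", {}).get("quantization_level")
    if allowed_quant is not None and quant not in allowed_quant:
        problems.append(f"quantization {quant} is not in {sorted(allowed_quant)}")
    return problems
```

A deployment script might stop when `check_model(show("llama3"))` returns a non-empty list.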
6. Download a Model
Endpoint
POST /api/pull
Purpose
Downloads a model from the Ollama registry. Progress is streamed as a series of JSON status objects unless "stream": false is set.
Example
curl http://localhost:11434/api/pull \
-d '{
"model":"deepseek-r1"
}'
Automation Scenario
When deploying a new server, a startup script such as startup.sh can pull the required models automatically before the application starts.
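The startup logic can be sketched as "list what is installed, pull what is missing". This assumes the REQUIRED_MODELS list (a hypothetical per-server configuration). It also assumes that an untagged name such as "llama3" refers to "llama3:latest", and that "stream": false makes /api/pull block until the download completes.

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434"
REQUIRED_MODELS = ["llama3", "nomic-embed-text"]  # hypothetical per-server list


def normalize(name):
    """Append the implicit ':latest' tag when a name has no explicit tag."""
    last_segment = name.rsplit("/", 1)[-1]  # a ':' before the last '/' is not a tag
    return name if ":" in last_segment else f"{name}:latest"


def missing_models(required, installed):
    """Required models that do not appear in the installed list."""
    have = {normalize(n) for n in installed}
    return [m for m in required if normalize(m) not in have]


def ensure_models(required=REQUIRED_MODELS):
    """Pull every required model that is not installed yet."""
    with urllib.request.urlopen(f"{OLLAMA_URL}/api/tags") as response:
        installed = [m["name"] for m in json.load(response).get("models", [])]
    for name in missing_models(required, installed):
        body = json.dumps({"model": name, "stream": False}).encode("utf-8")
        request = urllib.request.Request(
            f"{OLLAMA_URL}/api/pull",
            data=body,
            headers={"Content-Type": "application/json"},
        )
        # With "stream": false the call blocks until the download finishes.
        with urllib.request.urlopen(request) as response:
            print(name, json.load(response).get("status"))
```

startup.sh could run a small Python file that calls `ensure_models()` before launching the application, so a fresh server never starts without its models.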
7. Upload a Model
Endpoint
POST /api/push
Purpose
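Publishes a model to a registry (ollama.com by default). The name must include a namespace, in the form <namespace>/<model>:<tag>, and pushing to ollama.com requires an account with your Ollama public key added to it.
Example
curl http://localhost:11434/api/push \
-d '{
"model":"mycompany/assistant:latest"
}'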
Real Use Cases
- Internal AI distribution
- Team collaboration
- Enterprise model sharing
8. Create a Custom Model
Endpoint
POST /api/create
Purpose
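Creates a custom model on top of an existing one, for example with its own system prompt. Older Ollama releases accepted the contents of a Modelfile in a "modelfile" field; current releases take structured fields such as "from" and "system" instead.
Example
curl http://localhost:11434/api/create \
-d '{
"model":"seo-expert",
"from":"llama3",
"system":"You are an SEO expert who writes clear, keyword-aware copy."
}'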
Why This Is Powerful
You can:
- Add custom system prompts
- Create branded assistants
- Standardize AI behavior
- Build department-specific AI agents
9. Copy a Model
Endpoint
POST /api/copy
Purpose
Duplicates an existing model.
Example
curl http://localhost:11434/api/copy \
-d '{
"source":"llama3",
"destination":"llama3-backup"
}'
Common Use Cases
- Versioning
- Testing
- Experimentation
- Safe upgrades
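A safe upgrade can be sketched as "copy, then pull". The copy keeps a named snapshot of the current version, so the pull can replace the original name without losing the old weights. The backup naming scheme and helper names here are hypothetical. The sketch assumes /api/copy takes "source" and "destination" as in the example above.

```python
import datetime
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434"


def backup_name(model, today=None):
    """Hypothetical naming scheme: 'llama3:8b' -> 'llama3-8b-backup-YYYYMMDD'."""
    today = today or datetime.date.today()
    return f"{model.replace(':', '-')}-backup-{today:%Y%m%d}"


def _post(path, payload):
    """POST JSON and return the HTTP status; urlopen raises HTTPError on 4xx/5xx."""
    request = urllib.request.Request(
        f"{OLLAMA_URL}{path}",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request) as response:
        response.read()  # drain the body (empty for /api/copy)
        return response.status


def safe_upgrade(model):
    """Snapshot the current model under a backup name, then pull the newer build."""
    backup = backup_name(model)
    _post("/api/copy", {"source": model, "destination": backup})
    # Only reached if the copy succeeded, because a failed request raises.
    _post("/api/pull", {"model": model, "stream": False})
    return backup
```

If the upgraded model misbehaves, the returned backup name can be copied back over the original name.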
10. Delete a Model
Endpoint
DELETE /api/delete
Purpose
Removes a model from local storage.
Example
curl -X DELETE http://localhost:11434/api/delete \
-d '{
"model":"old-model"
}'
Best Practice
In shared environments, confirm that no application or teammate still depends on a model before deleting it.
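One automated guard, though only a partial one, is to refuse deletion while /api/ps shows the model loaded in memory. A model that is not loaded right now may still be needed later, so this does not replace checking with its users. The sketch assumes that /api/ps lists loaded models with a "name" field and that /api/delete accepts a "model" field. The helper names are hypothetical.

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434"


def normalize(name):
    """Append the implicit ':latest' tag when a name has no explicit tag."""
    last_segment = name.rsplit("/", 1)[-1]
    return name if ":" in last_segment else f"{name}:latest"


def is_loaded(model, ps_response):
    """True if /api/ps lists the model as currently loaded in memory."""
    target = normalize(model)
    return any(normalize(m.get("name", "")) == target
               for m in ps_response.get("models", []))


def safe_delete(model):
    """Refuse to delete a model that is loaded right now, otherwise delete it."""
    with urllib.request.urlopen(f"{OLLAMA_URL}/api/ps") as response:
        if is_loaded(model, json.load(response)):
            raise RuntimeError(f"{model} is loaded; refusing to delete it")
    request = urllib.request.Request(
        f"{OLLAMA_URL}/api/delete",
        data=json.dumps({"model": model}).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="DELETE",
    )
    with urllib.request.urlopen(request) as response:
        return response.status
```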
11. View Running Models
Endpoint
GET /api/ps
Purpose
Shows models currently loaded in memory.
Example
curl http://localhost:11434/api/ps
Why It Matters
Helpful for:
- Memory monitoring
- Resource optimization
- Capacity planning
- Troubleshooting
Expert Tip
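Loaded models stay in RAM or GPU memory for a keep-alive period after their last request (5 minutes by default), so a large model can hold several gigabytes while idle. Set "keep_alive" on a request to change this; for example, 0 unloads the model immediately after the response.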
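For memory monitoring, the /api/ps output can be summarized per model. This sketch assumes each entry carries "size" and "size_vram" in bytes and an "expires_at" timestamp marking when the keep-alive period ends. The report format is illustrative.

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434"
GIB = 1024 ** 3


def running_models():
    """Fetch the parsed /api/ps response."""
    with urllib.request.urlopen(f"{OLLAMA_URL}/api/ps") as response:
        return json.load(response)


def memory_report(ps_response):
    """(name, total GiB, GiB in GPU memory, unload time) for each loaded model."""
    return [
        (m.get("name"), m.get("size", 0) / GIB, m.get("size_vram", 0) / GIB,
         m.get("expires_at"))
        for m in ps_response.get("models", [])
    ]


def total_gib(ps_response):
    """Combined size of all loaded models, in GiB."""
    return sum(m.get("size", 0) for m in ps_response.get("models", [])) / GIB
```

A monitoring job might poll `total_gib(running_models())` and alert when it approaches the machine's available memory.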
12. Check Ollama Version
Endpoint
GET /api/version
Purpose
Returns the installed Ollama version.
Example
curl http://localhost:11434/api/version
Production Use
Useful for:
- CI/CD validation
- Compatibility checks
- Deployment audits
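A CI compatibility check needs to compare version strings numerically, because a plain string comparison would rank "0.9.0" above "0.10.0". This sketch assumes /api/version returns {"version": "x.y.z"}. The minimum version a pipeline enforces is up to you.

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434"


def server_version():
    """Return the version string reported by /api/version."""
    with urllib.request.urlopen(f"{OLLAMA_URL}/api/version") as response:
        return json.load(response)["version"]


def parse_version(text):
    """'0.5.7' -> (0, 5, 7); a '-suffix' such as '-rc1' is ignored."""
    parts = [int(p) for p in text.split("-", 1)[0].split(".")]
    parts += [0] * (3 - len(parts))  # treat '0.5' as '0.5.0'
    return tuple(parts)


def meets_minimum(installed, minimum):
    """Numeric comparison, so '0.10.0' correctly ranks above '0.9.0'."""
    return parse_version(installed) >= parse_version(minimum)
```

A CI step could end with `sys.exit(0 if meets_minimum(server_version(), "0.5.0") else 1)`, where "0.5.0" is a placeholder for your required minimum.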
13. OpenAI-Compatible Chat Completions
Endpoint
POST /v1/chat/completions
Purpose
Provides OpenAI API compatibility.
Example
curl http://localhost:11434/v1/chat/completions \
-H "Content-Type: application/json" \
-d '{
"model":"llama3",
"messages":[
{
"role":"user",
"content":"Write a Python function for sorting."
}
]
}'
Why Developers Love This
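Applications built for OpenAI can often switch to Ollama with minimal code changes. Typically you point the client's base URL at http://localhost:11434/v1 and supply a placeholder API key, which OpenAI client libraries require but Ollama ignores.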
Real Benefits
- Lower costs
- Local execution
- Better privacy
- Vendor independence
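Vendor independence is easiest to see when the provider is just a base URL. This standard-library sketch sends an OpenAI-format request to Ollama. It assumes the response follows the OpenAI shape (choices[0].message.content). The api_key value is a placeholder that Ollama ignores. Changing base_url, api_key, and model would target another OpenAI-compatible server.

```python
import json
import urllib.request

BASE_URL = "http://localhost:11434/v1"  # Ollama's OpenAI-compatible prefix


def chat_completion(messages, model="llama3", base_url=BASE_URL, api_key="ollama"):
    """POST an OpenAI-style chat request and return the parsed JSON."""
    body = json.dumps({"model": model, "messages": messages}).encode("utf-8")
    request = urllib.request.Request(
        f"{base_url}/chat/completions",
        data=body,
        headers={
            "Content-Type": "application/json",
            # Ollama ignores the key; hosted OpenAI-compatible services require one.
            "Authorization": f"Bearer {api_key}",
        },
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)


def completion_text(response):
    """Extract the assistant text from an OpenAI-format response."""
    return response["choices"][0]["message"]["content"]
```

For example, `completion_text(chat_completion([{"role": "user", "content": "Write a Python function for sorting."}]))` mirrors the curl request above.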
14. OpenAI-Compatible Model Listing
Endpoint
GET /v1/models
Purpose
Lists available models using the OpenAI format.
Example
curl http://localhost:11434/v1/models
Best Use Cases
- AI gateways
- SDK integrations
- Multi-provider platforms
- Existing OpenAI-based projects
Building Production Systems with Ollama
Many developers stop at generating text, but modern AI applications usually combine several endpoints:
AI Chatbot
/api/chat
/api/show
/api/ps
RAG Search Engine
/api/embeddings
/api/chat
Internal AI Platform
/api/pull
/api/show
/api/chat
/api/delete
OpenAI Replacement
/v1/chat/completions
/v1/models
Combining endpoints intelligently is what separates a proof of concept from a production-ready AI solution.
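To illustrate the "RAG Search Engine" combination, the sketch below retrieves the most similar chunks with /api/embeddings and then asks /api/chat to answer from them. It assumes the document chunks were embedded ahead of time as (text, vector) pairs with the same embedding model. The prompt wording, the value of k, and the model names are illustrative choices, not recommendations.

```python
import json
import math
import urllib.request

OLLAMA_URL = "http://localhost:11434"


def _post(path, payload):
    """POST JSON to the Ollama server and return the parsed response."""
    request = urllib.request.Request(
        f"{OLLAMA_URL}{path}",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)


def embed(text, model="nomic-embed-text"):
    return _post("/api/embeddings", {"model": model, "prompt": text})["embedding"]


def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_k(query_vec, indexed_chunks, k=3):
    """indexed_chunks: (text, vector) pairs embedded ahead of time."""
    ranked = sorted(indexed_chunks, key=lambda item: cosine(query_vec, item[1]),
                    reverse=True)
    return [text for text, _ in ranked[:k]]


def build_messages(question, context):
    """Put the retrieved text in a system message, then ask the question."""
    return [
        {"role": "system",
         "content": "Answer using only this context:\n\n" + "\n\n".join(context)},
        {"role": "user", "content": question},
    ]


def answer(question, indexed_chunks, model="llama3"):
    """Retrieve relevant chunks, then generate a grounded answer."""
    context = top_k(embed(question), indexed_chunks)
    reply = _post("/api/chat", {"model": model,
                                "messages": build_messages(question, context),
                                "stream": False})
    return reply["message"]["content"]
```

Keeping retrieval (top_k) separate from generation (answer) makes it possible to test and tune each stage on its own.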
Security Best Practices
Before exposing Ollama publicly:
- Place it behind a reverse proxy
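- Enable authentication at the proxy (Ollama itself has no built-in authentication)
- Limit access with firewalls, and keep Ollama bound to 127.0.0.1 (its default) unless you deliberately change OLLAMA_HOST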
- Monitor resource consumption
- Restrict model management endpoints
- Use HTTPS in production
Never expose an unrestricted Ollama instance directly to the internet.
Performance Optimization Tips
To achieve better performance:
- Use quantized models when possible.
- Keep frequently used models loaded (for example, with the keep_alive request field or the OLLAMA_KEEP_ALIVE environment variable).
- Monitor RAM utilization.
- Cache embeddings.
- Use SSD storage.
- Separate inference and application servers for high traffic.
These practices can significantly reduce latency and improve throughput.
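Caching embeddings avoids recomputing vectors for text that has already been embedded. Below is a minimal in-memory sketch. The EmbeddingCache class is hypothetical. The model name is part of the cache key because vectors from different models are not comparable. The default fetcher assumes /api/embeddings returns an "embedding" field.

```python
import hashlib
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434"


def ollama_embed(model, text):
    """Fetch one embedding from /api/embeddings."""
    request = urllib.request.Request(
        f"{OLLAMA_URL}/api/embeddings",
        data=json.dumps({"model": model, "prompt": text}).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)["embedding"]


class EmbeddingCache:
    """In-memory memo of embeddings keyed by a hash of (model, text)."""

    def __init__(self, embed_fn=ollama_embed):
        self._embed_fn = embed_fn  # any callable (model, text) -> list of floats
        self._store = {}
        self.hits = 0
        self.misses = 0

    @staticmethod
    def _key(model, text):
        # Include the model: vectors from different models are not comparable.
        return hashlib.sha256(f"{model}\0{text}".encode("utf-8")).hexdigest()

    def get(self, model, text):
        """Return a cached vector, computing and storing it on a miss."""
        key = self._key(model, text)
        if key in self._store:
            self.hits += 1
        else:
            self.misses += 1
            self._store[key] = self._embed_fn(model, text)
        return self._store[key]
```

A production system would usually keep the same hashed keys in a persistent store such as Redis or a database, so the cache survives restarts.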
Conclusion
Ollama is much more than a tool for running local language models; it is a complete AI platform with endpoints covering text generation, conversational AI, embeddings, model lifecycle management, monitoring, and OpenAI compatibility.
Understanding all 14 endpoints allows developers to build sophisticated AI solutions without relying entirely on external providers. Whether you're creating a chatbot, a RAG-powered knowledge base, a coding assistant, or an enterprise AI platform, Ollama provides the building blocks needed to deploy AI locally, securely, and efficiently.
As organizations increasingly prioritize privacy, cost control, and infrastructure ownership, mastering the Ollama API is becoming a valuable skill for modern software engineers, DevOps professionals, and AI developers.