By Ryan, Tenthe AI (Report Source: Tenthe AI). Data as of: April 24, 2025
Ollama is a popular open-source tool that makes it easy for developers and researchers to run, create, and share large language models (LLMs) right on their own computers or servers. This report dives into data from over 174,500 Ollama instances worldwide to uncover trends in how it’s being deployed, what models people prefer, where it’s most used, and some network characteristics.
(Note: Some statistics cover all 174,590 instances, while model-specific data comes from the 41,021 instances whose APIs were accessible.)
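For the curious: "accessible" here means an instance answered a model-listing request. Ollama serves its API on port 11434 by default and lists locally available models at GET /api/tags. Below is a minimal sketch of such a check in Python, assuming the requests library; the host and timeout are illustrative, not the scanning setup actually used for this report.

```python
import requests

def list_ollama_models(host: str, timeout: float = 5.0) -> list[str]:
    """Return model names from an Ollama instance's /api/tags, or [] if unreachable."""
    try:
        # Ollama listens on port 11434 by default; /api/tags returns
        # a JSON payload like {"models": [{"name": "llama3:latest", ...}, ...]}.
        resp = requests.get(f"http://{host}:11434/api/tags", timeout=timeout)
        resp.raise_for_status()
        return [m["name"] for m in resp.json().get("models", [])]
    except requests.RequestException:
        return []  # Treat timeouts and connection errors as "not accessible".

print(list_ollama_models("127.0.0.1"))  # e.g. ['llama3:latest', 'mistral:latest']
```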
Quick Summary: Key Findings
Out of ~174,590 Ollama instances identified, 41,021 had publicly accessible APIs, meaning we could see what models they were running. That’s an accessibility rate of roughly 23.5%.
Top Countries: The United States and China lead in Ollama deployments.
Hosting: Most Ollama instances are hosted on cloud platforms, especially AWS, Alibaba Cloud, and Tencent Cloud.
Popular Models: llama3, deepseek-r1, mistral, and qwen series models are favorites. llama3:latest and deepseek-r1:latest are the most deployed.
Model Size: Models with 7 billion to 8 billion (7B-8B) parameters are the most common choice.
Efficiency: 4-bit quantized models (like Q4_K_M and Q4_0) are widely used for their balance of performance and resource savings.
Network: Most instances use the default port 11434 and are exposed via HTTP.
Where in the World is Ollama?
Ollama has a global footprint, but some areas show much higher concentrations.
Top 20 Countries: The United States (29,195 unique IPs) and China (16,464 unique IPs) are far ahead. They are followed by Japan, Germany, the United Kingdom, India, and Singapore.
Top Cities Globally: Leading cities include Ashburn (US), Portland (US), Singapore, Frankfurt am Main (Germany), and Beijing (China).
US Hotspots: Ashburn, Portland, and Columbus.
China Hotspots: Beijing, Hangzhou, and Shanghai.
Generally, deployments are concentrated in tech hubs and areas with strong data center infrastructure.
What AI Models Are People Using?
Ollama users have access to a wide range of open-source LLMs. Here’s what’s popular:
Model Names:
llama3:latest (12,659 unique IP deployments)
deepseek-r1:latest (12,572 unique IP deployments)
mistral:latest (11,163 unique IP deployments)
qwen:latest (9,868 unique IP deployments)
llama3:8b-text-q4_K_S (9,845 unique IP deployments)
The :latest tag is common, showing users often want the newest versions. Chinese models like DeepSeek and Qwen are also very popular.
Model Families:
The llama family (from Meta AI) is overwhelmingly dominant.
qwen2 (from Alibaba) is the second most popular.
Interestingly, embedding models like nomic-bert and bert see high usage, possibly because Ollama uses them for default embedding tasks.
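The heavy use of embedding families is plausible given how simple Ollama makes embedding generation. As a minimal sketch, assuming the requests library and a locally pulled embedding model such as nomic-embed-text (the model name and prompt are illustrative):

```python
import requests

# Ollama exposes embeddings at POST /api/embeddings; the response
# contains a single "embedding" vector for the given prompt.
resp = requests.post(
    "http://127.0.0.1:11434/api/embeddings",
    json={"model": "nomic-embed-text", "prompt": "Ollama deployment statistics"},
    timeout=30,
)
resp.raise_for_status()
vector = resp.json()["embedding"]
print(len(vector))  # Dimensionality depends on the model, e.g. 768 for nomic-embed-text.
```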
Model Size (Parameters):
7B-8B parameter models (e.g., “8.0B”, “7.6B”, “7.2B”, “7B”) are the most common. These offer a good balance of capability and resource needs.
Small models (e.g., “134.52M”, “137M”) are also widely used, likely for embedding tasks or lightweight applications.
Quantization (Model Efficiency): Quantization makes models smaller and faster.
4-bit quantization (like Q4_K_M, Q4_0, Q4_K_S) is the clear winner, offering a great compromise between performance and size.
F16 (16-bit, less compressed) is also popular for users wanting higher fidelity or having more powerful hardware.
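These size and quantization figures correspond to metadata Ollama reports per model in the /api/tags response (details.parameter_size and details.quantization_level in recent versions). A rough sketch of how such a distribution could be tallied across instances, again assuming the requests library (ollama_hosts is an illustrative placeholder, not our scan list):

```python
from collections import Counter

import requests

ollama_hosts = ["198.51.100.10", "203.0.113.7"]  # Illustrative placeholder IPs.
param_sizes, quant_levels = Counter(), Counter()

for host in ollama_hosts:
    try:
        models = requests.get(f"http://{host}:11434/api/tags", timeout=5).json()["models"]
    except (requests.RequestException, KeyError, ValueError):
        continue  # Skip unreachable instances and unexpected payloads.
    for m in models:
        details = m.get("details", {})
        # e.g. parameter_size "8.0B", quantization_level "Q4_K_M".
        param_sizes[details.get("parameter_size", "unknown")] += 1
        quant_levels[details.get("quantization_level", "unknown")] += 1

print(param_sizes.most_common(5))
print(quant_levels.most_common(5))
```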
Model Sizes in China vs. USA:
USA: Strong preference for medium-sized models (1B to <10B parameters).
China: Also uses medium-sized models widely, but deploys significantly more small models (<1B) and shows more activity with extra-large models (>=50B).
Network Setup Insights
Ports: Most instances listen on the default port 11434. Other common web ports like 80 and 8080 also appear, often indicating a reverse proxy in front of Ollama.
Protocols: A majority of instances serve over HTTP (unencrypted), with a smaller but significant number using HTTPS (encrypted).
Hosting Providers: Cloud providers dominate.
AWS (AMAZON-02, AMAZON-AES) is the largest host.
Followed by China Telecom, Alibaba Cloud, and Tencent Cloud.
The large number of Ollama instances exposed over plain HTTP is a security concern: traffic to and from these models can be intercepted, and Ollama’s API has no built-in authentication, so anyone who finds an open instance can use it. It’s recommended to serve Ollama over HTTPS, typically by placing a reverse proxy in front of it.
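To see which side of that line a given deployment falls on, here is a minimal sketch that probes HTTPS first and falls back to HTTP, assuming the requests library (the hostname is illustrative):

```python
import requests

def ollama_transport(host: str, port: int = 11434) -> str | None:
    """Return 'https' or 'http' depending on how the instance answers, else None."""
    for scheme in ("https", "http"):
        try:
            # Note: a self-signed certificate will fail verification here
            # and register as HTTP-only; adjust `verify` if that matters.
            resp = requests.get(f"{scheme}://{host}:{port}/api/tags", timeout=5)
            if resp.ok:
                return scheme
        except requests.RequestException:
            continue
    return None

if ollama_transport("example.com") == "http":
    print("Exposed over plain HTTP -- consider an HTTPS reverse proxy.")
```

In practice, terminating TLS at a reverse proxy in front of port 11434 is the usual fix.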
This report is based on instances whose /api/tags endpoint was publicly accessible. The actual number of Ollama deployments is likely higher, as many run inside private networks.
In Conclusion
Ollama is clearly a popular tool for developers and AI enthusiasts worldwide, making it easier to experiment with powerful language models.
Global but Concentrated: Usage is widespread, but heavily focused in the US and China, particularly in tech hubs.
Llama Leads the Pack: The Llama model family is the most popular, with 7B-8B parameter models and 4-bit quantization being common choices that balance performance and resource use.
Cloud-Powered: Most users run Ollama on major cloud platforms.
Open Source Vibrancy: The diversity of models and configurations highlights a booming open-source AI ecosystem.
Security Matters: Users should be mindful of securing their Ollama instances, especially if exposed to the internet.
Ollama is playing a key role in bridging the gap between cutting-edge AI models and the broader community. As models become more efficient, we can expect its adoption to grow even further.