
Self-Host Ollama with Open WebUI Online

Introduction

In the rapidly advancing AI landscape, running large language models (LLMs) such as Meta’s Llama 3, Google’s Gemma, and Mistral on local systems offers clear advantages for data privacy and customization. Making those local tools securely accessible online multiplies their usefulness: developers can demo prototypes, researchers can collaborate remotely, and businesses can integrate AI into customer-facing applications.

This guide provides step-by-step instructions for securely sharing Ollama’s API and Open WebUI online using Pinggy, a simple tunneling service. You will learn how to make your local AI setup accessible worldwide without cloud infrastructure or complex configuration.

Summary of the steps:

  1. Install Ollama & Download a Model:

    • Get Ollama from ollama.com and run a model:
    ollama run llama3:8b
    
    
  2. Deploy Open WebUI

    • Run via Docker
    docker run -d -p 3000:8080 --add-host=host.docker.internal:host-gateway ghcr.io/open-webui/open-webui:main
    
    
  3. Expose WebUI Online

    • Tunnel port 3000:
    ssh -p 443 -R0:localhost:3000 a.pinggy.io
    
    

Share the generated URL for ChatGPT-like access to your LLMs.

Why Share Ollama API and Open WebUI Online?

The Rise of Local AI Deployments:

Due to growing concerns about data privacy and API expenses, running LLMs locally using tools like Ollama and Open WebUI has become a popular choice. However, keeping access limited to your local network restricts their usability. Sharing these tools online enables:

  • AI integration into web and mobile applications.
  • Project demonstrations without cloud deployment.
  • Remote access from anywhere while inference stays on your own hardware.

Why Use Pinggy for Tunneling?

Pinggy simplifies the process of port forwarding by providing secure tunnels. Its standout features include:

  • Free HTTPS URLs without requiring signup.
  • No rate limitations on the free plan.
  • SSH-based encrypted connections for enhanced security.

Prerequisites for Sharing Ollama and Open WebUI

A. Install Ollama

  1. Download and install Ollama based on your operating system:

    • Windows: Run the .exe installer.
    • macOS/Linux: Execute:
     curl -fsSL https://ollama.com/install.sh | sh
    
  2. Verify the installation:

    ollama --version
    

B. Download a Model

Ollama supports a wide range of models. Start with a lightweight one:

ollama run qwen:0.5b

For multimodal models:

ollama run llava:13b
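Once a model has been pulled, ollama run also accepts a one-shot prompt as an argument, which is handy for quick scripted checks (the model name here is just the lightweight example from above):

ollama run qwen:0.5b "Explain SSH tunneling in one sentence."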

C. Install Open WebUI

Open WebUI offers a ChatGPT-like interface for Ollama. Install it via Docker:

docker run -d -p 3000:8080 --add-host=host.docker.internal:host-gateway -v open-webui:/app/backend/data --name open-webui --restart always ghcr.io/open-webui/open-webui:main


Access the interface at http://localhost:3000 and set up an admin account.
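If the WebUI starts but shows no models, the container may not be reaching Ollama on the host. Open WebUI supports an OLLAMA_BASE_URL environment variable; a variant of the command above that sets it explicitly (assuming Ollama listens on its default port 11434) looks like this:

# Same as above, but pointing Open WebUI at the host's Ollama explicitly
docker run -d -p 3000:8080 \
  --add-host=host.docker.internal:host-gateway \
  -e OLLAMA_BASE_URL=http://host.docker.internal:11434 \
  -v open-webui:/app/backend/data \
  --name open-webui --restart always \
  ghcr.io/open-webui/open-webui:main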


Sharing Ollama API Online: Detailed Steps

  1. Start Ollama Locally

    By default, Ollama runs on port 11434. Launch the server:

    ollama serve
    
  2. Create a Public URL with Pinggy

    Run this SSH command to tunnel the Ollama API:

    ssh -p 443 -R0:localhost:11434 -t qr@a.pinggy.io "u:Host:localhost:11434"
    
    

    After executing, you will receive a URL such as https://abc123.pinggy.link.

  3. Verify API Access

    Test the shared API using curl:

    curl https://abc123.pinggy.link/api/tags
    

Alternatively, use a browser to verify access.
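Beyond listing models, the same tunnel exposes the full Ollama API, so remote clients can run generations too. A minimal sketch using the placeholder URL from above (it assumes llama3:8b has already been pulled locally):

# Request a completion through the public tunnel
curl https://abc123.pinggy.link/api/generate -d '{
  "model": "llama3:8b",
  "prompt": "Why is the sky blue?",
  "stream": false
}'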


Sharing Open WebUI Online: Step-by-Step

  1. Expose Open WebUI via Pinggy

    To share port 3000, execute:

    ssh -p 443 -R0:localhost:3000 a.pinggy.io
    
    

    You will receive a unique URL, such as
    https://xyz456.pinggy.link.

  2. Access WebUI Remotely

    1. Open the provided URL in a browser.
    2. Log in using your Open WebUI credentials.
    3. Utilize features such as:
      • Chatting with various models.
      • Uploading documents for Retrieval-Augmented Generation (RAG).
      • Switching between different models.

Advanced Security and Optimization Tips

  1. Enhance Security

    Add basic authentication to your Pinggy tunnel by appending username/password credentials (a quick verification check follows this list):

    ssh -p 443 -R0:localhost:3000 user:pass@a.pinggy.io
    
  2. Utilize Custom Domains

    Upgrade to Pinggy Pro to configure custom domains:

   ssh -p 443 -R0:localhost:3000 -T yourdomain.com@a.pinggy.io
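To confirm that the basic authentication from step 1 is actually enforced, a quick check with curl (using the placeholder URL and credentials from above) should fail without credentials and succeed with them:

# Expect 401 Unauthorized without credentials
curl -I https://xyz456.pinggy.link

# Expect 200 OK once the username/password are supplied
curl -I -u user:pass https://xyz456.pinggy.link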

Real-World Applications for Remote AI Access

Collaborative Development

  • Share an Ollama instance for collaborative code reviews and documentation generation.
  • Co-train custom models using Open WebUI.

Customer-Facing Applications

  • Power AI-driven chatbots for enhanced customer support.
  • Automate content generation for blogs and social media.

Academic and Research Projects

  • Securely share proprietary models with research collaborators.

Troubleshooting Common Issues

Connection Refused

  • Ensure Ollama is running with ollama serve.
  • Check firewall settings for ports 11434 and 3000.
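Before debugging the tunnel itself, confirm both services answer locally on their default ports; if these checks fail on the machine itself, the problem is local rather than with Pinggy:

# Ollama API should return the installed model list
curl http://localhost:11434/api/tags

# Open WebUI should return an HTTP response
curl -I http://localhost:3000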

Model Loading Failures

  • Verify model compatibility with your current Ollama version.
  • Free up system memory for larger models such as llama3:70b.
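Two quick commands help narrow this down (ollama ps requires a reasonably recent Ollama release): check which models are actually downloaded, and which are currently loaded into memory:

# Models available on disk
ollama list

# Models currently loaded and their memory footprint
ollama ps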

Conclusion

By combining Ollama, Open WebUI, and Pinggy, you can transform your local AI environment into a secure, shareable platform without relying on cloud services. This setup caters perfectly to startups, researchers, and anyone prioritizing data privacy and performance.


Top comments (2)

artydev: Thank you :-)

R Jones: Thanks for this.
