Lightning Developer

A Guide to Securely Exposing Ollama on Colab via Pinggy

Working with large language models locally often presents challenges—expensive GPUs, complex software setups, and high electricity costs. But with Google Colab and Pinggy, you can run Ollama models remotely, access them anywhere, and even provide a web interface for interactive use. This guide will walk you through every step, including all commands, so you can get started immediately.

Why Use Google Colab for Ollama?

Google Colab offers free GPU resources (like NVIDIA T4s) that make it possible to run models that would otherwise require costly hardware. Combined with Pinggy, a tunneling service, you can expose your Colab instance to the internet securely.

This setup is ideal for:

  • Developers: Quickly experiment with different models and APIs.
  • Researchers: Test large models without investing in local hardware.
  • Students: Learn by running real-world models without high costs.

Colab comes preinstalled with CUDA drivers and essential ML libraries, so you don’t have to worry about environment setup.

Step 1: Setting Up Your Colab Environment

  1. Open Google Colab and create a new notebook.
  2. Enable GPU runtime:
   Runtime > Change runtime type > Hardware accelerator: GPU

GPU acceleration is essential for practical use. Small models can run on CPU, but any model larger than ~2-3B parameters requires a GPU to run efficiently.
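
Before installing anything, you can confirm that a GPU is actually attached to the runtime; nvidia-smi is preinstalled on Colab's GPU runtimes:

# Check that the runtime has a GPU attached (lists the device, driver, and memory)
!nvidia-smi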

Step 2: Installing Ollama

Ollama provides a simple installation script for Linux-based environments like Colab. Run the following commands in a Colab cell:

# Install required utilities
!sudo apt-get update
!sudo apt-get install -y pciutils curl

# Download and install Ollama
!curl -fsSL https://ollama.ai/install.sh | sh
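
You can verify the installation by asking the CLI for its version (it may warn that no server is running yet, which is expected at this point):

# Confirm the Ollama CLI was installed
!ollama --version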

Once installed, start the Ollama server in the background using Python:

import subprocess

def start_ollama_server():
    subprocess.Popen(['ollama', 'serve'])
    print("Ollama server launched successfully!")

start_ollama_server()

This will launch the Ollama service on port 11434, which is the default API port.
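
The server needs a few seconds before it accepts requests. The small helper below (wait_for_ollama is just an illustrative name, not part of Ollama's API) polls the local endpoint until it responds; Ollama's root URL returns a short "Ollama is running" message once it is up:

import time
import requests

def wait_for_ollama(url="http://localhost:11434", timeout=60):
    """Poll the local Ollama API until it responds or the timeout expires."""
    deadline = time.time() + timeout
    while time.time() < deadline:
        try:
            resp = requests.get(url)
            if resp.status_code == 200:
                print("Ollama API is ready:", resp.text.strip())
                return True
        except requests.exceptions.ConnectionError:
            pass  # Server not up yet; keep polling
        time.sleep(2)
    raise TimeoutError("Ollama did not start within the timeout")

wait_for_ollama()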

Step 3: Installing and Configuring Pinggy

Pinggy allows you to create a secure tunnel from your Colab instance to the internet, giving you a publicly accessible URL.

  1. Install the Pinggy Python SDK:
!pip install pinggy
  2. Start a tunnel to your Ollama server:
import pinggy

# Create a Pinggy tunnel forwarding public traffic to your local Ollama server
tunnel = pinggy.start_tunnel(
    forwardto="localhost:11434",
    headermodification=["u:Host:localhost:11434"]
)

# Display the public URL
print(f"Tunnel started! Access Ollama API at: {tunnel.urls}")

You now have a public URL like https://randomstring.a.pinggy.link that forwards requests to your Ollama instance. You can use this URL from any device or share it with collaborators.
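
As a quick sanity check, you can call Ollama's /api/tags endpoint (which lists locally available models) through the public URL. Replace the placeholder with the URL printed above; at this stage the model list will still be empty:

import requests

# Replace with the URL printed by tunnel.urls above
public_url = "https://your-tunnel-url.a.pinggy.link"

# /api/tags lists the models available to this Ollama instance
resp = requests.get(f"{public_url}/api/tags")
print(resp.status_code)
print(resp.json())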

Step 4: Downloading and Running Your First Model

With Ollama running and the tunnel active, you can download a model and start interacting with it.

# Download a small Llama model
!ollama pull llama3.2:1b

# Run a quick one-off prompt (without a prompt, ollama run opens an
# interactive session, which is awkward inside a notebook cell)
!ollama run llama3.2:1b "Say hello in one sentence."
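
You can confirm the download completed with ollama list, which shows every locally stored model and its size:

# List locally available models
!ollama list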

Test your model via the API using Python:

import requests
import json

tunnel_url = "https://your-tunnel-url.a.pinggy.link"  # Replace with your Pinggy URL

response = requests.post(
    f"{tunnel_url}/api/generate",
    json={
        "model": "llama3.2:1b",
        "prompt": "Hello, how are you?",
        "stream": False
    }
)

print(response.json()["response"])

This confirms that your model is running and accessible remotely.
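
For longer generations you may prefer streaming. When stream is set to true, Ollama returns newline-delimited JSON chunks as tokens are produced; here is a minimal sketch of consuming that stream through the tunnel:

import json
import requests

tunnel_url = "https://your-tunnel-url.a.pinggy.link"  # Replace with your Pinggy URL

# With "stream": True, Ollama sends one JSON object per line as it generates
with requests.post(
    f"{tunnel_url}/api/generate",
    json={"model": "llama3.2:1b", "prompt": "Write a haiku about GPUs.", "stream": True},
    stream=True,
) as response:
    for line in response.iter_lines():
        if not line:
            continue
        chunk = json.loads(line)
        print(chunk.get("response", ""), end="", flush=True)
        if chunk.get("done"):
            break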

Step 5: Setting Up OpenWebUI

While API access is convenient for developers, OpenWebUI adds a ChatGPT-style browser interface for interactive chat.

  1. Install OpenWebUI:
!pip install open-webui
  2. Create a separate tunnel for the web interface (we will run it on port 8000):
tunnel_ui = pinggy.start_tunnel(
    forwardto="localhost:8000"
)

print(f"OpenWebUI is accessible at: {tunnel_ui.urls}")
  3. Launch OpenWebUI:
# Note: this cell keeps running while OpenWebUI is serving requests
!open-webui serve --port 8000
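
Open WebUI discovers the local Ollama server at http://localhost:11434 by default, so no extra wiring is needed for this setup. If your Ollama instance ever runs on a different host or port, you can point the UI at it through the OLLAMA_BASE_URL environment variable, set in the notebook before running the serve cell (a small sketch):

import os

# Tell OpenWebUI where the Ollama API lives (this value is also its default)
os.environ["OLLAMA_BASE_URL"] = "http://localhost:11434"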

You now have a fully functional, ChatGPT-like web interface for chatting with your models.

Step 6: Using the Setup

  • Experimentation: Test multiple prompts and models quickly without cluttering your local system.
  • Collaboration: Share Pinggy URLs with teammates or students for temporary access.
  • Prototyping: Build small applications that integrate with your Ollama API (see the sketch below).
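
For prototyping, Ollama also exposes an OpenAI-compatible API under /v1, so you can reuse the official openai Python client by pointing it at your tunnel. A minimal sketch, assuming the openai package is installed and the placeholder URL is replaced with yours:

# pip install openai
from openai import OpenAI

# Point the OpenAI client at Ollama's OpenAI-compatible endpoint behind the tunnel
client = OpenAI(
    base_url="https://your-tunnel-url.a.pinggy.link/v1",  # Replace with your Pinggy URL
    api_key="ollama",  # Ollama ignores the key, but the client requires one
)

completion = client.chat.completions.create(
    model="llama3.2:1b",
    messages=[{"role": "user", "content": "Give me one tip for learning Python."}],
)
print(completion.choices[0].message.content)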

Remember, Colab instances are temporary. For longer sessions or bigger models, consider Colab Pro or other hosting options.

Step 7: Cleaning Up

When your session ends, Colab will automatically terminate your instance and Pinggy tunnels. You can also stop the server manually:

# Terminate the Ollama server
subprocess.call(["pkill", "-f", "ollama"])
# Terminate OpenWebUI as well, if it is still running
subprocess.call(["pkill", "-f", "open-webui"])

This ensures no resources are running unnecessarily.

Conclusion

Running Ollama on Google Colab through Pinggy combines the best of both worlds:

  • Free GPU resources for running models
  • Secure public access through Pinggy tunnels
  • A clean, user-friendly interface with OpenWebUI

This setup is perfect for experimentation, learning, or even temporary demos. By following these steps, you can host your own language models without investing in expensive hardware or complex local setups.
