Working with large language models locally often presents challenges—expensive GPUs, complex software setups, and high electricity costs. But with Google Colab and Pinggy, you can run Ollama models remotely, access them anywhere, and even provide a web interface for interactive use. This guide will walk you through every step, including all commands, so you can get started immediately.
Why Use Google Colab for Ollama?
Google Colab offers free GPU resources (like NVIDIA T4s) that make it possible to run models that would otherwise require costly hardware. Combined with Pinggy, a tunneling service, you can expose your Colab instance to the internet securely.
This setup is ideal for:
- Developers: Quickly experiment with different models and APIs.
- Researchers: Test large models without investing in local hardware.
- Students: Learn by running real-world models without high costs.
Colab comes preinstalled with CUDA drivers and essential ML libraries, so you don’t have to worry about environment setup.
Step 1: Setting Up Your Colab Environment
- Open Google Colab and create a new notebook.
- Enable GPU runtime:
   Runtime > Change runtime type > Hardware accelerator: GPU
GPU acceleration is essential for practical use. Small models can run on CPU, but any model larger than ~2-3B parameters requires a GPU to run efficiently.
Step 2: Installing Ollama
Ollama provides a simple installation script for Linux-based environments like Colab. Run the following commands in a Colab cell:
# Install required utilities
!sudo apt-get update
!sudo apt-get install -y pciutils curl
# Download and install Ollama
!curl -fsSL https://ollama.ai/install.sh | sh
Once installed, start the Ollama server in the background using Python:
import subprocess
def start_ollama_server():
    subprocess.Popen(['ollama', 'serve'])
    print("Ollama server launched successfully!")
start_ollama_server()
This will launch the Ollama service on port 11434, which is the default API port.
Step 3: Installing and Configuring Pinggy
Pinggy allows you to create a secure tunnel from your Colab instance to the internet, giving you a publicly accessible URL.
- Install the Pinggy Python SDK:
!pip install pinggy
- Start a tunnel to your Ollama server:
import pinggy
# Create a tunnel forwarding traffic from [Pinggy](https://pinggy.io/) to your local Ollama server
tunnel = pinggy.start_tunnel(
    forwardto="localhost:11434",
    headermodification=["u:Host:localhost:11434"]
)
# Display the public URL
print(f"Tunnel started! Access Ollama API at: {tunnel.urls}")
You now have a public URL like https://randomstring.a.pinggy.link that forwards requests to your Ollama instance. You can use this URL from any device or share it with collaborators.
Step 4: Downloading and Running Your First Model
With Ollama running and the tunnel active, you can download a model and start interacting with it.
# Download a small Llama model
!ollama pull llama3.2:1b
# Run the model
!ollama run llama3.2:1b
Test your model via the API using Python:
import requests
import json
tunnel_url = "https://your-tunnel-url.a.pinggy.link"  # Replace with your Pinggy URL
response = requests.post(
    f"{tunnel_url}/api/generate",
    json={
        "model": "llama3.2:1b",
        "prompt": "Hello, how are you?",
        "stream": False
    }
)
print(response.json()["response"])
This confirms that your model is running and accessible remotely.
Step 5: Setting Up OpenWebUI
While API access is beneficial for developers, OpenWebUI offers a browser interface that is similar to ChatGPT.
- Install OpenWebUI:
!pip install open-webui
- Create a separate tunnel for the web interface (default port 8000):
tunnel_ui = pinggy.start_tunnel(
    forwardto="localhost:8000"
)
print(f"OpenWebUI is accessible at: {tunnel_ui.urls}")
- Launch OpenWebUI:
!open-webui serve --port 8000
You now have a fully functional web interface where you can interact with your models using a ChatGPT-like interface.
Step 6: Using the Setup
- Experimentation: Test multiple prompts and models quickly without cluttering your local system.
- Collaboration: Share Pinggy URLs with teammates or students for temporary access.
- Prototyping: Build small applications that integrate with your Ollama API.
Remember, Colab instances are temporary. For longer sessions or bigger models, consider Colab Pro or other hosting options.
Step 7: Cleaning Up
When your session ends, Colab will automatically terminate your instance and Pinggy tunnels. You can also stop the server manually:
# Terminate Ollama server
subprocess.call(["pkill", "-f", "ollama"])
This ensures no resources are running unnecessarily.
Conclusion
Running Ollama on Google Colab through Pinggy combines the best of both worlds:
- Free GPU resources for running models
- Secure public access through Pinggy tunnels
- A clean, user-friendly interface with OpenWebUI
This setup is perfect for experimentation, learning, or even temporary demos. By following these steps, you can host your own language models without investing in expensive hardware or complex local setups.
 

 
    
Top comments (0)