Deploy AI Applications on Google Colab - No Cost, No Server Needed

Building an AI-powered application is thrilling, but hosting it can quickly become expensive. Between GPU rentals, cloud servers, and data transfer costs, even small projects can eat into your budget. But what if you could deploy a working AI app online without paying a single penny?

Why Google Colab Makes Sense for Hosting

Normally, AI applications require GPU-backed cloud servers that cost anywhere from a few cents to several dollars per hour. These costs quickly pile up when testing or iterating frequently.

Google Colab changes the game. It offers free access to Tesla T4 GPUs, which are powerful enough for small to medium-sized language models like Llama 3.2 1B. While Colab was built primarily for machine learning experiments, it’s perfectly capable of running lightweight web apps — especially when paired with a tunneling service like Pinggy.

Pinggy creates a public, secure URL for your Colab app, allowing anyone to access it directly. This makes it a great tool for developers, students, or researchers who want to prototype or share AI projects without worrying about infrastructure or cost.

Step 1: Preparing the Colab Notebook

Start with a new Colab notebook. Since running an AI model requires GPU acceleration, enable it first:

  1. Go to Runtime → Change runtime type
  2. Select GPU under the hardware accelerator options.
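
To confirm that the GPU is attached, you can run NVIDIA's standard diagnostic in a cell; on the free tier it should report a Tesla T4:

!nvidia-smi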

Then, install Ollama and the required system package by running:

!sudo apt-get install -y pciutils
!curl -fsSL https://ollama.ai/install.sh | sh

The first command installs pciutils, which provides the lspci tool the Ollama installer uses to detect your GPU; the second downloads and runs the official install script, setting up everything needed to serve language models.

Step 2: Launching the Ollama Server

Once Ollama is installed, we’ll start its server process in the background so that it stays active even when the cell finishes executing:

import subprocess
import os

def start_ollama_server():
    subprocess.Popen(
        ['nohup', 'ollama', 'serve'],
        stdout=open('ollama.log', 'w'),
        stderr=open('ollama_error.log', 'w'),
        preexec_fn=os.setsid
    )
    print("Ollama server is running in the background.")

start_ollama_server()

The server will handle all API requests from your Flask app and log outputs for troubleshooting.
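
If anything misbehaves, those two log files are the first place to look. For example, a quick peek at the most recent entries:

!tail -n 20 ollama.log ollama_error.log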

Step 3: Confirming the Server Is Active

Next, make sure Ollama is listening on its default port — 11434. This quick check ensures that the API endpoint is available.

def check_ollama_port(port='11434'):
    try:
        output = subprocess.run(['sudo', 'lsof', '-i', '-P', '-n'],
                                capture_output=True, text=True).stdout
        if f":{port} (LISTEN)" in output:
            print(f"Ollama is active on port {port}")
        else:
            print("Ollama is not running.")
    except Exception as e:
        print(f"Error checking port: {e}")

check_ollama_port()

If the server is properly configured, you’ll get a confirmation message.
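
As an extra sanity check (not part of the original steps), you can also query the server directly: Ollama's root endpoint responds with a short "Ollama is running" message.

import requests
print(requests.get("http://localhost:11434").text)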

Step 4: Loading the AI Model

With the server ready, it’s time to load the model. The Llama 3.2 1B model is lightweight, efficient, and well-suited for Colab’s free GPU tier.

!ollama pull llama3.2:1b

This version of the Llama model is compact enough to run smoothly while still generating meaningful, coherent responses. If you’re using Colab Pro, you can try larger models like llama3.2:3b for more advanced results.
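
To confirm the download finished, you can list the models Ollama has stored locally and run a quick one-off prompt from the command line:

!ollama list
!ollama run llama3.2:1b "Say hello in one sentence."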

Step 5: Making It Public Using Pinggy

Colab sessions are isolated by default, meaning external users can’t access them directly. To share your AI app, install Pinggy and create a public tunnel.

!pip install pinggy

Then, start the tunnel to forward requests to the local Flask server on port 8000:

import pinggy

tunnel = pinggy.start_tunnel(forwardto="localhost:8000")
print(f"Public URLs: {tunnel.urls}")

You’ll get an HTTPS link (something like https://randomstring.a.pinggy.link) that anyone can use to access your app during the Colab session.

Step 6: Building the Flask Web App

Now it’s time to design the user interface and connect it with the model. Below is a simple Flask app that allows users to input a topic and receive a blog post generated by the Llama model.

from flask import Flask, request, render_template_string
import requests, json

app = Flask(__name__)

HTML_TEMPLATE = """
<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="UTF-8">
<title>AI Blog Writer</title>
<style>
body {
  font-family: system-ui, sans-serif;
  background: linear-gradient(135deg, #6a11cb, #2575fc);
  min-height: 100vh;
  display: flex;
  align-items: center;
  justify-content: center;
  margin: 0;
}
.container {
  background: #fff;
  padding: 2rem;
  border-radius: 15px;
  box-shadow: 0 10px 30px rgba(0,0,0,0.2);
  width: 90%;
  max-width: 800px;
}
</style>
</head>
<body>
<div class="container">
  <h1>AI Blog Writer</h1>
  <form method="POST">
    <input type="text" name="title" placeholder="Enter blog topic..." required style="width:70%; padding:0.5rem;">
    <button type="submit">Generate</button>
  </form>
  {% if blog %}
    <h2>{{ title }}</h2>
    <p>{{ blog }}</p>
  {% endif %}
</div>
</body>
</html>
"""

@app.route("/", methods=["GET", "POST"])
def index():
    blog, title = None, None
    if request.method == "POST":
        title = request.form.get("title")
        prompt = f"Write an informative and creative blog post about: '{title}'"
        try:
            response = requests.post(
                "http://localhost:11434/api/generate",
                json={"model": "llama3.2:1b", "prompt": prompt},
                stream=True
            )
            blog_parts = []
            for line in response.iter_lines():
                if line:
                    data = json.loads(line.decode("utf-8"))
                    if "response" in data:
                        blog_parts.append(data["response"])
            blog = "".join(blog_parts)
        except Exception as e:
            blog = f"Error reaching Ollama API: {e}"
    return render_template_string(HTML_TEMPLATE, blog=blog, title=title)

if __name__ == "__main__":
    app.run(host='0.0.0.0', port=8000)

When a user submits a topic, the app sends a prompt to the Ollama API, streams the response, and displays it neatly on the web page. The HTML layout is responsive and simple, making it easy to use on both desktop and mobile devices.
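
If you are curious about the raw stream the route is parsing, you can hit the same endpoint directly from a cell. Ollama returns one JSON object per line, each carrying a small "response" fragment, followed by a final object with "done": true:

!curl -s http://localhost:11434/api/generate -d '{"model": "llama3.2:1b", "prompt": "Say hello in five words."}'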

Step 7: Running the App

Execute the Flask cell and wait for it to start. Once it’s running, open the Pinggy URL printed earlier. You’ll see your AI Blog Writer live on the internet.
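
Note that app.run() keeps the cell busy for as long as the server is alive, which is expected here. If you would rather keep the notebook interactive while the app is up, one option (an adjustment on my part, not part of the original steps) is to launch Flask in a background thread instead:

import threading

def run_app():
    # use_reloader=False prevents Flask from trying to re-exec inside the notebook
    app.run(host="0.0.0.0", port=8000, debug=False, use_reloader=False)

threading.Thread(target=run_app, daemon=True).start()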

Try entering a topic like “Benefits of Remote Work” or “How AI is Transforming Education” and watch as your app generates an entire article in real time.

The Power of This Setup

This free hosting approach combines the best of three platforms:

  • Google Colab gives you free GPU compute.
  • Ollama runs powerful language models locally.
  • Pinggy bridges the gap by making the Colab app publicly accessible.

It’s an ideal environment for testing AI ideas, building quick prototypes, or sharing interactive demos without investing in dedicated servers.

Conclusion

Hosting an AI web app doesn’t always require paid cloud infrastructure. By combining Colab’s free GPU, Ollama’s efficient LLM serving, and Pinggy’s tunneling, you can run and share your AI projects entirely for free.

While Colab has some session limits, it’s perfect for experimentation, learning, and low-traffic apps. This workflow can easily be extended beyond blog writing; you can build chatbots, summarization tools, or creative writing assistants using the same foundation.
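
For instance, turning the blog writer into a summarizer mostly means swapping the prompt; here is a minimal sketch, assuming a hypothetical "text" field submitted from a textarea in the form:

text = request.form.get("text")
prompt = f"Summarize the following text in three concise bullet points:\n\n{text}"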

In short, this setup shows that with a bit of creativity, a GPU-backed AI application can be built and hosted at zero cost.

