
How to Easily Share OpenLLM API Online

Deploying and Exposing Self-Hosted AI Models with OpenLLM and Pinggy

As generative AI adoption grows, developers increasingly seek ways to self-host large language models (LLMs) for enhanced control over data privacy and model customization. OpenLLM is an excellent framework for deploying models like Llama 3 and Mistral locally, but exposing them over the internet can be challenging. Enter Pinggy, a tunneling solution that allows secure remote access to self-hosted LLM APIs without complex infrastructure.

This guide walks you through the process of deploying an OpenLLM instance and sharing it with a public URL using Pinggy—making your AI services accessible in just a few minutes.

Why Self-Host LLMs?

The Rise of Local AI Deployment

Many developers prefer to host LLMs locally due to:

  • Data Privacy: Avoid sending sensitive data to third-party API providers.
  • Cost Efficiency: Reduce API usage costs associated with cloud-based services.
  • Customization: Fine-tune and optimize models based on specific needs.

However, a major drawback of self-hosting is that the models remain confined to a local machine, limiting access for collaboration, integration, and testing. This is where Pinggy simplifies remote exposure.

Why Use Pinggy for Tunneling?

Pinggy provides a lightweight, secure, and efficient solution for exposing local services over the internet. Compared to other tunneling tools, it offers:

  • Free HTTPS URLs with minimal setup
  • No rate limits on free-tier usage
  • Persistent URLs with the Pinggy Pro plan
  • Built-in web debugger to monitor incoming requests

By integrating Pinggy, you can share your OpenLLM API remotely without complex networking configurations.

Step-by-Step Guide to Deploy and Share OpenLLM

Step 1: Install OpenLLM & Deploy a Model

Prerequisites:

  • Python installed
  • pip package manager

Install OpenLLM:

pip install openllm

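Once installed, you can optionally confirm the CLI is available on your PATH (a quick sanity check; the output varies by version):

openllm --help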

Start a Model Server:

To launch an LLM, use the following command (replace llama3.2:1b-instruct-ggml-fp16-linux with your preferred model):

openllm serve llama3.2:1b-instruct-ggml-fp16-linux


Supported Models: Mistral, Falcon, Qwen, Dolly-v2, and more.

At this point, OpenLLM is running on localhost:3000 but inaccessible outside your machine. Let’s expose it using Pinggy.
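(Optional) Before creating the tunnel, you can verify the server responds locally, assuming the default port 3000 shown above:

curl http://localhost:3000/v1/models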

Step 2: Expose OpenLLM API via Pinggy

Create a Secure Tunnel:

Run the following command to create a secure remote tunnel:

ssh -p 443 -R0:localhost:3000 a.pinggy.io

Upon execution, Pinggy will generate a public URL that allows remote access to your model. For example:

https://xyz123.pinggy.link

Access API Endpoints:

Once exposed, use the provided URL to interact with OpenLLM:

  • Check API status:

    curl https://xyz123.pinggy.link/

  • Open the OpenLLM chat WebUI (in a browser):

    https://xyz123.pinggy.link/chat

  • List available models:

    curl https://xyz123.pinggy.link/v1/models
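Because OpenLLM serves an OpenAI-compatible API under /v1, you can also send a chat completion request through the tunnel. A minimal sketch, assuming the example URL above and a model id taken from the /v1/models response:

curl https://xyz123.pinggy.link/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "llama3.2:1b-instruct-ggml-fp16-linux",
    "messages": [{"role": "user", "content": "Say hello in one sentence."}]
  }'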

Advanced Configuration and Security

Secure Your API with Authentication:

To restrict access, append a username and password to your SSH command:

ssh -p 443 -R0:localhost:3000 -t a.pinggy.io b:username:password

This adds an authentication layer, ensuring only authorized users can access the endpoint.
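Clients must then supply these credentials with each request; for example, with curl (assuming the example URL and the username/password you set above):

curl -u username:password https://xyz123.pinggy.link/v1/models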

With Pinggy Pro, you can configure a custom domain for your LLM service, improving branding and ease of access.

Real-World Use Cases

  1. Collaborative AI Development
     Teams can share an OpenLLM instance for testing and model fine-tuning.
     Remote developers can integrate AI models into applications without local installations.

  2. AI-Powered Customer Support & Content Generation
     Expose OpenLLM’s API to build chatbots for businesses.
     Use LLMs for automated content creation in marketing and social media.

  3. Academic & Research Workflows
     Researchers can collaborate on AI models without exposing internal infrastructure.
     OpenLLM can be used for real-time experiments and AI benchmarking.

Troubleshooting & Optimization

Model Loading Issues?

Ensure your machine meets the hardware requirements (RAM/GPU availability).

Try using a lower-precision model:

openllm run llama3.2:1b-instruct-ggml-fp16-linux --quantize int4

Connection Timeouts?

For unstable networks, use Pinggy’s persistent tunnel mode:

# Re-establish the tunnel automatically if the SSH connection drops
while true; do
  ssh -p 443 -o StrictHostKeyChecking=no -R0:localhost:3000 a.pinggy.io;
  sleep 10;  # brief pause before reconnecting
done

Conclusion

Combining OpenLLM for model deployment with Pinggy for secure remote access creates a straightforward and effective solution for AI developers. It enables full control over models, remote access without infrastructure complexity, and enhanced security with authentication and custom domains.
