
How to Easily Share OpenLLM API Online

Deploying and Exposing Self-Hosted AI Models with OpenLLM and Pinggy

As generative AI adoption grows, developers increasingly seek ways to self-host large language models (LLMs) for enhanced control over data privacy and model customization. OpenLLM is an excellent framework for deploying models like Llama 3 and Mistral locally, but exposing them over the internet can be challenging. Enter Pinggy, a tunneling solution that allows secure remote access to self-hosted LLM APIs without complex infrastructure.

This guide walks you through the process of deploying an OpenLLM instance and sharing it with a public URL using Pinggy—making your AI services accessible in just a few minutes.

Why Self-Host LLMs?

The Rise of Local AI Deployment

Many developers prefer to host LLMs locally due to:

  • Data Privacy: Avoid sending sensitive data to third-party API providers.
  • Cost Efficiency: Reduce API usage costs associated with cloud-based services.
  • Customization: Fine-tune and optimize models based on specific needs.

However, a major drawback of self-hosting is that the models remain confined to a local machine, limiting access for collaboration, integration, and testing. This is where Pinggy simplifies remote exposure.

Why Use Pinggy for Tunneling?

Pinggy provides a lightweight, secure, and efficient solution for exposing local services over the internet. Compared to other tunneling tools, it offers:

  • Free HTTPS URLs with minimal setup
  • No rate limits on free-tier usage
  • Persistent URLs with the Pinggy Pro plan
  • Built-in web debugger to monitor incoming requests

By integrating Pinggy, you can share your OpenLLM API remotely without complex networking configurations.

Step-by-Step Guide to Deploy and Share OpenLLM

Step 1: Install OpenLLM & Deploy a Model

Prerequisites:

  • Python installed
  • pip package manager

Install OpenLLM:

pip install openllm

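Once installed, you can optionally confirm the CLI is available on your PATH (a quick sanity check; the output varies by version):

openllm --help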

Start a Model Server:

To launch an LLM, use the following command (replace llama3.2:1b-instruct-ggml-fp16-linux with your preferred model):

openllm serve llama3.2:1b-instruct-ggml-fp16-linux


Supported Models: Mistral, Falcon, Qwen, Dolly-v2, and more.

At this point, OpenLLM is running on localhost:3000 but inaccessible outside your machine. Let’s expose it using Pinggy.
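(Optional) Before creating the tunnel, you can verify the server responds locally, assuming the default port 3000 shown above:

curl http://localhost:3000/v1/models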

Step 2: Expose OpenLLM API via Pinggy

Create a Secure Tunnel:

Run the following command to create a secure remote tunnel:

ssh -p 443 -R0:localhost:3000 a.pinggy.io

Upon execution, Pinggy will generate a public URL that allows remote access to your model. For example:

https://xyz123.pinggy.link

Access API Endpoints:

Once exposed, use the provided URL to interact with OpenLLM:

  • Check API status:

    curl https://xyz123.pinggy.link/

  • Open the OpenLLM chat WebUI (in a browser):

    https://xyz123.pinggy.link/chat

  • List available models:

    curl https://xyz123.pinggy.link/v1/models
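Because OpenLLM serves an OpenAI-compatible API under /v1, you can also send a chat completion request through the tunnel. A minimal sketch, assuming the example URL above and a model id taken from the /v1/models response:

curl https://xyz123.pinggy.link/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "llama3.2:1b-instruct-ggml-fp16-linux",
    "messages": [{"role": "user", "content": "Say hello in one sentence."}]
  }'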

Advanced Configuration and Security

Secure Your API with Authentication:

To restrict access, append a username and password to your SSH command:

ssh -p 443 -R0:localhost:3000 -t a.pinggy.io b:username:password

This adds an authentication layer, ensuring only authorized users can access the endpoint.
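Clients must then supply these credentials with each request; for example, with curl (assuming the example URL and the username/password you set above):

curl -u username:password https://xyz123.pinggy.link/v1/models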

With Pinggy Pro, you can configure a custom domain for your LLM service, improving branding and ease of access.

Real-World Use Cases

  1. Collaborative AI Development
     Teams can share an OpenLLM instance for testing and model fine-tuning.
     Remote developers can integrate AI models into applications without local installations.

  2. AI-Powered Customer Support & Content Generation
     Expose OpenLLM’s API to build chatbots for businesses.
     Use LLMs for automated content creation in marketing and social media.

  3. Academic & Research Workflows
     Researchers can collaborate on AI models without exposing internal infrastructure.
     OpenLLM can be used for real-time experiments and AI benchmarking.

Troubleshooting & Optimization

Model Loading Issues?

Ensure your machine meets the hardware requirements (RAM/GPU availability).

Try using a lower-precision model:

openllm run llama3.2:1b-instruct-ggml-fp16-linux --quantize int4

Connection Timeouts?

For unstable networks, use Pinggy’s persistent tunnel mode:

# Re-establish the tunnel automatically if the SSH connection drops
while true; do
  ssh -p 443 -o StrictHostKeyChecking=no -R0:localhost:3000 a.pinggy.io;
  sleep 10;  # brief pause before reconnecting
done

Conclusion

Combining OpenLLM for model deployment with Pinggy for secure remote access creates a straightforward and effective solution for AI developers. It enables full control over models, remote access without infrastructure complexity, and enhanced security with authentication and custom domains.
