Miguel Ángel Cabrera Miñagorri

Offloading AI inference to your users' devices

Integrating LLMs into existing web applications is becoming the norm, and there are more and more AI-native companies. These companies build autonomous agents that put the LLM at the center and give it tools that allow it to perform actions on different systems.

In this post I will present a new project called Offload, which lets you move all that processing to your users' devices, increasing their data privacy and reducing your inference costs.

The two problems

There are two big concerns when integrating AI into an application: cost and user data privacy.

1. Cost. The typical way to connect an LLM is through a third-party API, such as OpenAI's or Anthropic's; there are many alternatives on the market. These APIs are very practical: with just an HTTP request you can integrate an LLM into your application. However, they are expensive at scale. Providers are putting big efforts into reducing prices, but if you make many API calls per user per day, the bill becomes huge.
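To make "expensive at scale" concrete, here is a back-of-the-envelope estimate. The user count, call rate, token count, and per-token price below are all assumed figures for illustration, not numbers from this post or from any specific provider's price list:

```typescript
// Rough monthly API bill for token-priced inference.
// All inputs are illustrative assumptions.
function monthlyBill(
  users: number,
  callsPerUserPerDay: number,
  tokensPerCall: number,
  pricePer1kTokens: number,
): number {
  // Assume a 30-day month.
  const tokensPerMonth = users * callsPerUserPerDay * 30 * tokensPerCall;
  return (tokensPerMonth / 1000) * pricePer1kTokens;
}

// 10,000 users making 20 calls/day at ~1,000 tokens/call,
// priced at $0.002 per 1K tokens:
console.log(monthlyBill(10_000, 20, 1_000, 0.002)); // → 12000 (dollars/month)
```

Even with modest per-user usage, token-metered pricing multiplied across a user base adds up quickly, which is the scaling problem the post describes. On-device inference removes this line item entirely.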

2. User data privacy. Using third-party APIs for inference is not the best option if you work with sensitive user data. These APIs often use the data you send to continue training the model, which can expose your confidential data. The data could also become visible at some point after it reaches the third-party provider (for example, in a logging system). This is not just a problem for companies, but also for consumers who may not want to send their data to those API providers.

Addressing them

Offload addresses both problems at once. The application "invokes" the LLM via an SDK that, behind the scenes, runs the model directly on each user's device instead of calling a third-party API. This saves money on the inference bill, because you do not pay for API usage, and it keeps user data on each user's device, since nothing needs to be sent to any API.
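To illustrate the pattern, here is a minimal sketch of the offloading idea: prefer an on-device model, and only fall back to a remote endpoint when none is available. This is not Offload's actual SDK; `offloadInvoke`, `Infer`, and the result shape are all hypothetical names invented for this example:

```typescript
// Hypothetical sketch of on-device-first inference (not Offload's real API).
type Infer = (prompt: string) => Promise<string>;

interface InvokeResult {
  text: string;
  ranLocally: boolean; // true means the prompt never left the device
}

async function offloadInvoke(
  prompt: string,
  localModel: Infer | null, // null when the device cannot run the model
  remoteApi: Infer,
): Promise<InvokeResult> {
  if (localModel) {
    // On-device path: no API bill, and the data stays on the user's machine.
    return { text: await localModel(prompt), ranLocally: true };
  }
  // Fallback path: the prompt is sent to a third-party provider.
  return { text: await remoteApi(prompt), ranLocally: false };
}
```

The fallback branch reflects a practical design choice: not every device can run a model locally, so a real integration would likely detect device capability first and degrade gracefully to a hosted API, trading back some cost and privacy only when necessary.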

If this interests you and you want to remain in the loop, check out the Offload website here.

