"We Have No Moat, And Neither Does OpenAI"
"Open-source models are faster, more customizable, more private, and pound-for-pound more capable."
You've probably all heard these quotes by now, and if you haven't, check them out here.
And both Google and OpenAI should rightfully be worried. We're well and truly in the age of democratized machine learning.
As big tech and the wider tech community quickly churn out open source models, our lives get easier: we can pick and choose which models serve us, and in what way.
That said, just open sourcing a model isn't enough; we have to be able to deploy, run and maintain it. Thankfully, even this is becoming easier and easier.
In this post I want to walk you through one of the simplest ways of deploying and (if you have money) scaling these models.
HuggingFace
Deploying on HuggingFace is definitely my favourite: it's super easy to get started, and there's a burgeoning community developing in the open source machine learning space.
If you're not already familiar with HuggingFace, it's an end-to-end machine learning platform with a GitHub-like interface.
My favourite features are the model hub and how easy it is to deploy a model, and that's what I'm going to show you now.
First let's take a look at the model hub: https://huggingface.co/models
Here, you'll find open source models from individuals as well as companies like Google, Meta, OpenAI etc.
For the purpose of this blog, let's go ahead and find StabilityAI's stable-diffusion-2, a text-to-image model (because playing with images is fun): https://huggingface.co/stabilityai/stable-diffusion-2
Here, you can see all the information about the model: relevant research papers, licenses, the model card, examples, usage, the code etc.

To the right, you'll notice a deploy button. Go ahead and click that. There are three different options:
- Inference API (which allows you to use the model in API form for testing purposes: this is not for production use!)
- Inference Endpoints (which is what we'll use)
- Spaces (this allows you to deploy the model behind a GUI using Gradio, Streamlit etc. We won't cover this in this post).
With Inference Endpoints, you can choose a cloud provider: so far AWS has the most GPU options (and I haven't actually seen GCP usable, so I don't know if that's available yet or not!).
For this model, I chose a large GPU on AWS. Feel free to try others, but small ran out of memory for me!
Once you've clicked deploy, maybe go get a coffee or something, because deploying will take some time.
As soon as it's up and running, you can start using the built in UI for testing. More importantly, you can also use it as an API to power your applications.
For example, testing:
And the equivalent cURL:
curl https://endpoint-name.us-east-1.aws.endpoints.huggingface.cloud \
-X POST \
-d '{"inputs":"A beautiful sunset over the horizon on a clear day in Rome"}' \
-H "Authorization: Bearer <hf_token>" \
-H "Content-Type: application/json"
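The same request can be sketched in Python. This is a minimal sketch using only the standard library, not an official client: the URL and token are the same placeholders as in the cURL command, `build_request` and `generate_image` are hypothetical helper names, and I'm assuming the endpoint answers with raw image bytes for a text-to-image model, so check the response's Content-Type on your own endpoint before relying on it.

```python
import json
import urllib.request

# Placeholders copied from the cURL command: swap in your own endpoint URL and token.
ENDPOINT_URL = "https://endpoint-name.us-east-1.aws.endpoints.huggingface.cloud"
HF_TOKEN = "<hf_token>"


def build_request(prompt: str, url: str = ENDPOINT_URL, token: str = HF_TOKEN) -> urllib.request.Request:
    """Build the same POST request as the cURL command, without sending it."""
    body = json.dumps({"inputs": prompt}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        method="POST",
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
    )


def generate_image(prompt: str, out_path: str = "output.png") -> str:
    """Send the request and write the response body to out_path.

    Assumption: the endpoint returns raw image bytes. The .png extension
    is only a guess; the actual format may differ on your endpoint.
    """
    with urllib.request.urlopen(build_request(prompt), timeout=120) as response:
        image_bytes = response.read()
    with open(out_path, "wb") as f:
        f.write(image_bytes)
    return out_path
```

Once the endpoint is running, calling `generate_image("A beautiful sunset over the horizon on a clear day in Rome")` should save the result locally. Keeping `build_request` separate means you can inspect the headers and payload without hitting (and paying for) the endpoint.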
Now, you can use this API and your token to build any apps or tools you want.
At this point, you're probably wondering about scalability, monitoring and observability.
Scalability
If you flick through the tabs at the top, you can configure replica autoscaling to suit your needs.
Monitoring
The analytics tab is a good starting point for monitoring latency, requests, and utilization.
I would suggest building out a more detailed monitoring dashboard once you actually build an application on top of this.
Observability
The logs tab is, again, a good starting point to see what's going on with your model.
I would suggest more verbose, detailed logging at the various levels of your actual application (LB, API gateway, server etc).
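On the application side, a small wrapper around each model call is one way to start. The following is a minimal sketch using Python's standard `logging` module; `call_with_logging` and the `image_api` logger name are hypothetical, and a real application would likely add request IDs and ship these logs somewhere central.

```python
import logging
import time
from typing import Callable, TypeVar

T = TypeVar("T")

logging.basicConfig(
    level=logging.INFO,
    format="%(asctime)s %(levelname)s %(name)s %(message)s",
)
logger = logging.getLogger("image_api")  # hypothetical logger name


def call_with_logging(label: str, fn: Callable[[], T]) -> T:
    """Run fn, logging its latency on success and the traceback on failure."""
    start = time.perf_counter()
    try:
        result = fn()
    except Exception:
        elapsed_ms = (time.perf_counter() - start) * 1000
        # logger.exception records the message plus the current traceback.
        logger.exception("%s failed after %.1f ms", label, elapsed_ms)
        raise
    elapsed_ms = (time.perf_counter() - start) * 1000
    logger.info("%s succeeded in %.1f ms", label, elapsed_ms)
    return result
```

You would wrap whatever function calls your endpoint, for example `call_with_logging("generate", lambda: my_endpoint_call(prompt))`, where `my_endpoint_call` stands in for your own client code.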
And there you have it: a fairly production-ready generative AI API for you to utilize.
Note: Always remember to delete or pause your deployed models, if you don't want a nasty surprise at the end of the month!
One last thought I want to leave you with: really consider the cost of hosting a model vs using one as a service through something like OpenAI. A lot of these services are very cheap compared to running a model yourself (on prem or in the cloud), so always do some maths and projections before making a choice!
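To make the "do some maths" part concrete, here is a rough break-even sketch. Every number in it is a made-up placeholder, not a real price from HuggingFace, AWS or OpenAI, and the function names are hypothetical; plug in the rates from the actual pricing pages before drawing any conclusions.

```python
def monthly_self_hosted_cost(hourly_rate: float, hours_per_day: float, days: int = 30) -> float:
    """Cost of keeping a dedicated endpoint running, billed per hour."""
    return hourly_rate * hours_per_day * days


def monthly_service_cost(price_per_request: float, requests_per_month: int) -> float:
    """Cost of a pay-per-request hosted service."""
    return price_per_request * requests_per_month


def break_even_requests(hourly_rate: float, hours_per_day: float,
                        price_per_request: float, days: int = 30) -> float:
    """Monthly request volume at which both options cost the same."""
    return monthly_self_hosted_cost(hourly_rate, hours_per_day, days) / price_per_request


# Hypothetical inputs: a GPU endpoint at $1.50/hour running 24/7,
# versus a hosted service charging $0.02 per generated image.
HOURLY_RATE = 1.50
PRICE_PER_REQUEST = 0.02

self_hosted = monthly_self_hosted_cost(HOURLY_RATE, hours_per_day=24)
threshold = break_even_requests(HOURLY_RATE, 24, PRICE_PER_REQUEST)
print(f"Self-hosted: ${self_hosted:,.2f}/month")
print(f"Break-even at roughly {threshold:,.0f} requests/month")
```

Below the break-even volume the pay-per-request service is cheaper; above it, the dedicated endpoint starts to pay off. This ignores autoscaling, idle time and engineering effort, so treat it as a first pass only.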
 



 
    