So what is this CLIP?
CLIP (Contrastive Language–Image Pretraining) is basically a model from OpenAI that can look at an image and a piece of text and figure out how well they match. It's kind of like a search engine for images: you give it a description and the model figures out which of the given images is closest to it.
OpenAI released CLIP in a few different sizes, and the smaller ones can run on just a CPU instead of needing a GPU, but as you've guessed, there are trade-offs.
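To make that concrete, here's a quick sketch of how you can score one image against a few descriptions using the Hugging Face transformers port of CLIP (the checkpoint and labels here are just for illustration):

```python
# Quick sketch: score one image against a few text descriptions with CLIP.
# Assumes `pip install transformers torch pillow`; runs fine on a plain CPU.
import torch
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

image = Image.open("cat.jpg")  # any local image
texts = ["a photo of a cat", "a photo of a dog", "a photo of a human"]

inputs = processor(text=texts, images=image, return_tensors="pt", padding=True)
with torch.no_grad():
    outputs = model(**inputs)

# Cosine similarity between the image embedding and each text embedding
image_emb = outputs.image_embeds / outputs.image_embeds.norm(dim=-1, keepdim=True)
text_emb = outputs.text_embeds / outputs.text_embeds.norm(dim=-1, keepdim=True)
scores = (image_emb @ text_emb.T).squeeze(0)

for text, score in zip(texts, scores.tolist()):
    print(f"{text}: {score:.3f}")
```

The base model is happy on CPU, it's just slower than it would be on a GPU.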
Hosting CLIP on GKE
Since CLIP isn't generally available as a chat app or a similar hosted tool like the usual AI products, I decided to host it myself.
Host it where?
A normal VM works for testing, but let's go brrr and spin up some pods.
Setting up GKE
I have IaC handy that controls all the infra for my personal project, so I added a GKE cluster and ran terraform apply.
I'm gonna be using e2-standard-2 for this example, and I think it meets the minimum requirements for our model.
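I won't paste my whole module, but the GKE bits of the Terraform look roughly like this; the names, project and region below are placeholders, not the exact values from my IaC:

```hcl
# Rough sketch of the GKE pieces added to the existing IaC.
# Cluster/pool names and region are placeholders.
resource "google_container_cluster" "clip" {
  name     = "clip-demo"
  location = "us-central1"

  # We manage our own node pool, so drop the default one.
  remove_default_node_pool = true
  initial_node_count       = 1
}

resource "google_container_node_pool" "clip_nodes" {
  name       = "clip-pool"
  cluster    = google_container_cluster.clip.name
  location   = "us-central1"
  node_count = 1

  node_config {
    machine_type = "e2-standard-2" # 2 vCPU / 8 GB, enough for CPU inference
    disk_size_gb = 50
  }
}
```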
Now we have the cluster -> let's set up a script to serve the model through an endpoint.
FastAPI and Docker
So we need an endpoint to talk to the model, and I'm using FastAPI for that. Let's wrap CLIP with an endpoint like /predict, where we send an image along with a few text descriptions and it sends back a similarity score for each as the response.
The similarity score ranges from -1 to 1; the closer it is to 1, the better that description matches the image.
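Here's a minimal sketch of what that /predict endpoint could look like. The request shape (one uploaded image plus repeated texts fields) and the model size are assumptions on my part for illustration; the actual script in the repo may differ:

```python
# Minimal sketch of a /predict endpoint wrapped around CLIP.
# Request shape (multipart image + repeated "texts" fields) is an assumption.
# Needs `pip install fastapi uvicorn python-multipart transformers torch pillow`.
import io

import torch
from fastapi import FastAPI, File, Form, UploadFile
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

app = FastAPI()
model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")
model.eval()


@app.post("/predict")
async def predict(image: UploadFile = File(...), texts: list[str] = Form(...)):
    pil_image = Image.open(io.BytesIO(await image.read())).convert("RGB")
    inputs = processor(text=texts, images=pil_image, return_tensors="pt", padding=True)
    with torch.no_grad():
        outputs = model(**inputs)

    # Cosine similarity between the image and each text, in [-1, 1]
    img = outputs.image_embeds / outputs.image_embeds.norm(dim=-1, keepdim=True)
    txt = outputs.text_embeds / outputs.text_embeds.norm(dim=-1, keepdim=True)
    scores = (img @ txt.T).squeeze(0).tolist()

    return {text: round(score, 3) for text, score in zip(texts, scores)}
```

Worth running it locally with uvicorn first to sanity-check before containerizing.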
Once the script is ready (link to repo) we can start the containerization process.
If we do the multi-stage build properly, we can bring the image size down to around ~1.1GB or maybe less.
You can read here how I reduced the container size
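For a rough idea of the shape, a multi-stage Dockerfile for an app like this looks something like the sketch below; the base image tags and file names are illustrative, not the exact ones from my build:

```dockerfile
# Sketch of a multi-stage build: dependencies are installed in a builder stage,
# and only the installed packages plus the app land in the final image.
FROM python:3.11-slim AS builder
WORKDIR /app
COPY requirements.txt .
RUN pip install --no-cache-dir --prefix=/install -r requirements.txt

FROM python:3.11-slim
WORKDIR /app
COPY --from=builder /install /usr/local
COPY main.py .
EXPOSE 8000
CMD ["uvicorn", "main:app", "--host", "0.0.0.0", "--port", "8000"]
```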
Push the image to Artifact Registry
Let's push the image to Artifact Registry so it's easy for the pod to pull and run it later.
Let's create an Artifact Registry repo by adding it to the IaC.
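The resource is roughly the sketch below; the repo name, region and project are placeholders, so swap in your own:

```hcl
# Docker repo in Artifact Registry for the CLIP image (names are placeholders).
resource "google_artifact_registry_repository" "clip" {
  repository_id = "clip-images"
  location      = "us-central1"
  format        = "DOCKER"
  description   = "Images for the CLIP demo"
}
```

Then pushing is the usual gcloud + docker dance (PROJECT_ID is a placeholder):

```bash
gcloud auth configure-docker us-central1-docker.pkg.dev
docker tag clip-api us-central1-docker.pkg.dev/PROJECT_ID/clip-images/clip-api:latest
docker push us-central1-docker.pkg.dev/PROJECT_ID/clip-images/clip-api:latest
```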
Now that the script is containerized and the artifact repo is ready, let's write a deployment yml for our K8s cluster.
Kubernetes -> GKE
So as you know, we need a deployment.yml and a service to expose our app to the public; for this demo we're gonna be using a LoadBalancer.
I don't think we are going to have that much of a load, so let's humble ourselves and just go with one replica
You can find the yml file here
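If you just want the shape of it, here's a trimmed-down sketch of the deployment plus the LoadBalancer service; the image path, port and resource numbers are illustrative rather than copied from the actual file:

```yaml
# Trimmed-down sketch of the deployment + LoadBalancer service.
# Image path, port and resource requests are illustrative.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: clip-api
spec:
  replicas: 1
  selector:
    matchLabels:
      app: clip-api
  template:
    metadata:
      labels:
        app: clip-api
    spec:
      containers:
        - name: clip-api
          image: us-central1-docker.pkg.dev/PROJECT_ID/clip-images/clip-api:latest
          ports:
            - containerPort: 8000
          resources:
            requests:
              cpu: "500m"
              memory: "1Gi"
---
apiVersion: v1
kind: Service
metadata:
  name: clip-api
spec:
  type: LoadBalancer
  selector:
    app: clip-api
  ports:
    - port: 80
      targetPort: 8000
```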
CI/CD pipeline with GitHub Actions
So the idea is to build a pipeline that builds the image, pushes the latest one to Artifact Registry, and then spins up a pod that pulls that image and runs it:
build image -> artifact -> spin up pod -> pull the image -> run -> expose it to the world
Pretty straightforward.
You can find the yml for the pipeline here
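The workflow is roughly the sketch below. I'm showing service-account-key auth and placeholder project/cluster/image names here, so treat it as the shape of the pipeline rather than a copy-paste of mine:

```yaml
# Rough shape of the pipeline: build -> push to Artifact Registry -> roll out on GKE.
# Secrets, project, cluster and image names are placeholders.
name: build-and-deploy
on:
  push:
    branches: [main]

jobs:
  deploy:
    runs-on: ubuntu-latest
    env:
      PROJECT_ID: my-project   # placeholder
      IMAGE: us-central1-docker.pkg.dev/my-project/clip-images/clip-api
    steps:
      - uses: actions/checkout@v4

      - uses: google-github-actions/auth@v2
        with:
          credentials_json: ${{ secrets.GCP_SA_KEY }}

      - uses: google-github-actions/setup-gcloud@v2
        with:
          install_components: gke-gcloud-auth-plugin

      - name: Build and push image
        run: |
          gcloud auth configure-docker us-central1-docker.pkg.dev
          docker build -t $IMAGE:${{ github.sha }} .
          docker push $IMAGE:${{ github.sha }}

      - uses: google-github-actions/get-gke-credentials@v2
        with:
          cluster_name: clip-demo
          location: us-central1

      - name: Roll out new image
        run: |
          kubectl set image deployment/clip-api clip-api=$IMAGE:${{ github.sha }}
```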
Issues I faced
- The usual credential permission stuff in the pipeline
- The image not getting pushed to Artifact Registry, partly because of the credential issue above
- The pod getting choked since the VM was initially an e2-micro :)
But hey, we finally got it running ...
Final
The model worked as expected. I mean, with the advanced AI models we have at our disposal this might not come across as that fascinating, but considering when it was released (2021) it's still cool, and there's a lot of stuff we can do with it. If all you need is to match text with images without burning a hole in your wallet, CLIP is still a solid little workhorse.
Let's see some examples and metrics now 😼
I was wondering what the above image is and just asked CLIP: is that a cat, a dog or a human? 😳
As you can see, cat got a 0.9 similarity, making it a cat (looks like the cat is more similar to a human than to a dog).
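If you want to poke it yourself, a call could look something like this; the field names follow my endpoint sketch above and the IP is whatever the LoadBalancer hands you:

```python
# Hypothetical client call; field names follow the /predict sketch above.
import requests

resp = requests.post(
    "http://EXTERNAL_IP/predict",  # LoadBalancer IP from `kubectl get svc`
    files={"image": open("mystery.jpg", "rb")},
    data=[
        ("texts", "a photo of a cat"),
        ("texts", "a photo of a dog"),
        ("texts", "a photo of a human"),
    ],
)
print(resp.json())
```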
Okay now to the metrics --
TODO: Add a monitoring/observability dashboard to this, preferably Prometheus + Grafana.