NSFW Porn detection microservice
I built a low-cost NSFW API hosted on Digital Ocean's new App Platform.
How do image tagging ML models work?
Making predictions based on images involves two basic steps: training the model and then running the prediction. Instructions for training the ML model are in the GitHub repo GantMan/nsfw_model.
The prediction API first fetches the remote image and saves the bytes to disk. Persisting to disk simplifies communicating with the ML library since the library accepts a file path, not a byte stream.
Then the image is resized to the input dimensions the ML model expects. The model needs to compare apples to apples, so matching the size of the images it was trained on is critical for a meaningful comparison.
The resized image is categorized using the attached model. This provides a float score for each of the categories: drawings, hentai, neutral, porn, and sexy. The higher the score, the more likely the image belongs to that category.
Once the prediction is created, we clean up after ourselves by deleting the image from the disk and return the response.
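The fetch, persist, predict, and clean-up flow described above can be sketched roughly as follows. This is a minimal standard-library sketch, not the service's actual code: `fetch_bytes`, `predict_from_url`, and the `classify` callable are hypothetical names, and `classify` stands in for whatever the ML library exposes to resize and score an image given its file path.

```python
import os
import tempfile
from typing import Callable, Dict
from urllib.request import urlopen


def fetch_bytes(url: str, timeout: float = 10.0) -> bytes:
    """Download the remote image and return its raw bytes."""
    with urlopen(url, timeout=timeout) as response:
        return response.read()


def predict_from_url(
    url: str,
    classify: Callable[[str], Dict[str, float]],
    fetch: Callable[[str], bytes] = fetch_bytes,
) -> Dict[str, float]:
    """Save the image to a temp file, score it, and always delete the file."""
    data = fetch(url)
    fd, path = tempfile.mkstemp(suffix=".img")
    try:
        with os.fdopen(fd, "wb") as handle:
            handle.write(data)
        # `classify` is a placeholder for the model call, which is assumed
        # to resize the image and return one float score per category.
        return classify(path)
    finally:
        # Clean up even if the model call raises.
        os.remove(path)
```

Putting the delete in a `finally` block means a failed prediction does not leave image files behind on disk.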
On the client, these scores are converted to 3 states:
- Definitely Adult Content
- Unknown
- Definitely Safe Content
Images in the "Unknown" state need to be human-reviewed and bucketed into one of the "definite" categories. For my first pass, I use a combination of the "sexy" and "porn" scores to decide whether an image is "Definitely Adult Content", and I look at the "neutral" score to decide whether it is "Definitely Safe Content".
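A client-side bucketing rule along those lines might look like the sketch below. The function name and both 0.8 thresholds are assumptions chosen for illustration, not the values I use; tune them against your own reviewed images.

```python
from typing import Dict

ADULT = "Definitely Adult Content"
SAFE = "Definitely Safe Content"
UNKNOWN = "Unknown"


def bucket(
    scores: Dict[str, float],
    adult_threshold: float = 0.8,  # hypothetical cutoff
    safe_threshold: float = 0.8,   # hypothetical cutoff
) -> str:
    """Map the five category scores to one of the three client states."""
    # Combine the two adult-leaning scores, as described above.
    adult_score = scores.get("sexy", 0.0) + scores.get("porn", 0.0)
    if adult_score >= adult_threshold:
        return ADULT
    if scores.get("neutral", 0.0) >= safe_threshold:
        return SAFE
    # Anything in between goes to human review.
    return UNKNOWN
```

Checking the adult rule first means an image that somehow scores high on both sides is never auto-approved.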
Quick Start
Self-hosting and using the API only takes a couple of hours, since the API is so simple and Digital Ocean's App Platform allows for Heroku-like deployment.
Flask API
You will need to develop your own client, but there are only 2 HTTP endpoints you would need to support: POST /predict and GET /health.
POST /predict
The service accepts the URL of an image to fetch and process. Passing a URL instead of the image bytes reduces the workload on the client and avoids the overhead of base64 encoding images for the transfer (base64 inflates the payload by about 33%).
$ curl -XPOST 'http://localhost:8080/predict?url=https://www.kcoleman.me/images/hills.jpg'
{"drawings":0.11510543525218964,"hentai":0.024719053879380226,"neutral":0.803202748298645,"porn":0.0172234196215868,"sexy":0.039749305695295334}
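For a client written in Python, the same call could be sketched with the standard library as below. The function names are hypothetical, and the sketch assumes the service reads the image location from the `url` query parameter exactly as in the curl example; percent-encoding the image URL keeps any `?` or `&` it contains from being mistaken for part of the outer query string.

```python
import json
from typing import Dict
from urllib.parse import urlencode
from urllib.request import Request, urlopen


def build_predict_url(base_url: str, image_url: str) -> str:
    """Return the /predict URL with the image URL percent-encoded."""
    query = urlencode({"url": image_url})
    return base_url.rstrip("/") + "/predict?" + query


def predict(base_url: str, image_url: str, timeout: float = 30.0) -> Dict[str, float]:
    """POST to /predict and return the decoded category scores."""
    request = Request(build_predict_url(base_url, image_url), method="POST")
    with urlopen(request, timeout=timeout) as response:
        return json.load(response)
```

Calling `predict("http://localhost:8080", "https://www.kcoleman.me/images/hills.jpg")` against a running instance should return a dict shaped like the JSON above.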
GET /health
The health endpoint helps you monitor if the service is running without needing to process an image.
$ curl 'http://localhost:8080/health'
{"status":"ok"}
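A monitoring script could poll this endpoint with something like the following sketch. The helper names and the 5-second timeout are assumptions; the only thing taken from the service is the `{"status":"ok"}` response body shown above.

```python
import json
from urllib.error import URLError
from urllib.request import urlopen


def is_ok(body: bytes) -> bool:
    """Return True only if the body is JSON with status == "ok"."""
    try:
        return json.loads(body).get("status") == "ok"
    except (ValueError, AttributeError):
        # Not JSON, or JSON that is not an object.
        return False


def service_is_healthy(base_url: str, timeout: float = 5.0) -> bool:
    """Call GET /health and report whether the service answered ok."""
    try:
        with urlopen(base_url.rstrip("/") + "/health", timeout=timeout) as response:
            return response.status == 200 and is_ok(response.read())
    except (URLError, OSError):
        # Connection refused, timeout, or a non-2xx HTTP status.
        return False
```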
Hosting ML microservices
Heroku
Unfortunately, Heroku limits the slug size to 500MB. After compilation, the Flask app is 635MB, because it needs to bundle the ML model (250MB) and PyTorch. That puts this service over the limit, so it cannot be hosted on Heroku.
Digital Ocean
The $10/mo Digital Ocean 1GB/1vCPU App Platform instance hosts this project perfectly. The first deployment takes 20+ minutes, but it will eventually start up. There is a health check endpoint at /health where you can verify the service is running.
This machine takes about 600ms per request and has 2 workers, so it can handle about 0.8 requests per second, or roughly 72,000 images per day. Not too shabby for a $10/mo ML microservice.
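The daily figure is just a per-second rate multiplied by the 86,400 seconds in a day. The sketch below works through that arithmetic; the function and variable names are hypothetical. The 72,000 figure corresponds to one request finishing every 1.2 seconds (600ms × 2). The second number is the idealized ceiling if both workers ran fully in parallel, which a single vCPU is unlikely to sustain, so treat it as an upper bound rather than a measurement.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400


def images_per_day(requests_per_second: float) -> int:
    """Convert a sustained request rate into images processed per day."""
    return round(requests_per_second * SECONDS_PER_DAY)


latency_s = 0.6  # about 600ms per request
workers = 2

# Conservative: one request completes every latency_s * workers seconds.
quoted_rps = 1 / (latency_s * workers)

# Idealized: both workers each finish a request every latency_s seconds.
ideal_parallel_rps = workers / latency_s

print(images_per_day(quoted_rps))
print(images_per_day(ideal_parallel_rps))
```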
Sample App config
name: nsfw-flask
region: nyc
services:
- environment_slug: python
  github:
    branch: master
    deploy_on_push: true
    repo: KevinColemanInc/NSFW-FLASK
  health_check:
    http_path: /health
  http_port: 8080
  instance_count: 1
  instance_size_slug: basic-s
  name: nsfw-flask
  routes:
  - path: /
  run_command: gunicorn --worker-tmp-dir /dev/shm app:app
  source_dir: /
Special Thanks
The Flask service is a wrapper around GantMan/nsfw_model. Its authors did the heavy lifting of developing the ML model and the prediction code.
You can play with a web-hosted version of the model on nsfwjs.com, since we use the same model.