DEV Community: wjiuhe

Stop Writing Flask to Serve/Deploy Your Model: Pinferencia is Here

wjiuhe — Wed, 27 Apr 2022 23:40:14 +0000

Are you still writing Flask apps to serve your model? Stop doing that; you have a much better choice now: Pinferencia.

Pinferencia is a Python library that aims to be the simplest way to serve your model.

Check it out on GitHub: https://github.com/underneathall/pinferencia (Python + Inference: a model deployment library in Python, the simplest model inference server ever).

What will you get from Pinferencia?

Fast to code, fast to go live. Minimal code to write and minimal modifications to what you already have.
100% test coverage: both statement and branch coverage, no kidding.
Easy to use, easy to understand.
Automatic API documentation page: every API is explained in detail, with an online try-out feature, thanks to FastAPI and Starlette.
Serve any model; even a single function can be served.
Supports the KServe API, compatible with Kubeflow, TF Serving, Triton and TorchServe. Switching to or from them is painless, and Pinferencia is much faster for prototyping!

Is it really simple and easy?

Yes, and a lot easier than other tools.

You just need to add three extra lines.

Check out the sample on its page that serves a HuggingFace model.

Ready to get started?

Visit https://pinferencia.underneathall.app/ for detailed examples.

Google T5 Translation as a Service with Just 7 lines of Codes

wjiuhe — Thu, 21 Apr 2022 00:09:44 +0000

What is T5? The Text-To-Text Transfer Transformer (T5) from Google gives you the power of translation.

In this article, we will deploy the Google T5 model as a REST API service. Difficult? What if I told you that you only need to write 7 lines of code?

**HuggingFace** makes it easy to use the pretrained model with just several lines.

**Pinferencia** makes it super easy to serve any model with just three extra lines.

Install Dependencies

HuggingFace

pip install "transformers[pytorch]"

If it doesn’t work, please check the official HuggingFace installation documentation (huggingface.co).

Pinferencia

pip install "pinferencia[uvicorn]"

If it doesn’t work, please check the official Pinferencia installation documentation (underneathall.app).

Define the Service

First let’s create the app.py to define the service:

Start the Service

uvicorn app:service --reload

Wait for the model to be downloaded. When it’s finished, you’ll see:

Call the Service

You can use curl or the interactive API page from Pinferencia.
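
Either way, the request is a JSON POST to the predict endpoint of the model. Here is a minimal sketch using only the Python standard library. It assumes the model was registered under the hypothetical name `t5`, that the server listens on the default uvicorn address 127.0.0.1:8000, and that the input carries the usual T5 task prefix. The helper names are made up for illustration and are not part of Pinferencia.

```python
import json
from urllib import request

BASE_URL = "http://127.0.0.1:8000"  # assumption: uvicorn's default host and port


def build_predict_request(model_name, data, base_url=BASE_URL):
    """Build (but do not send) a POST request for /v1/models/<name>/predict."""
    body = json.dumps({"data": data}).encode("utf-8")
    return request.Request(
        url=f"{base_url}/v1/models/{model_name}/predict",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def predict(model_name, data, base_url=BASE_URL):
    """Send the request and return the "data" field of the JSON response."""
    req = build_predict_request(model_name, data, base_url)
    with request.urlopen(req) as resp:
        return json.loads(resp.read().decode("utf-8"))["data"]


# "t5" is a hypothetical model name; use whatever name app.py registered.
req = build_predict_request("t5", "translate English to German: Good morning.")
print(req.get_method(), req.full_url)
```

With the service running, `predict("t5", "translate English to German: Good morning.")` would send the request and return the `data` field of the response.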

Curl

The result:

Interactive API Page

Result:

If you like Pinferencia, don’t forget to go to GitHub and add it to your favorites.

HuggingFace Transformers Bert — Unmask the Myth: Play and Deploy within 10 Lines of Codes

wjiuhe — Wed, 20 Apr 2022 08:17:08 +0000

BERT is a fantastic model to play with. It can infer a missing word in a sentence.

In this article, we will deploy a BERT model as a REST API service. Difficult? What if I told you that you only need to write 6 lines of code?

Pinferencia makes it super easy to serve any model with just three extra lines.
HuggingFace makes it easy to use the pretrained model with just several lines.

Install Dependencies

HuggingFace

pip install "transformers[pytorch]"

If it doesn’t work, please check the official HuggingFace installation documentation (huggingface.co).

Pinferencia

pip install "pinferencia[uvicorn]"

If it doesn’t work, please check the official Pinferencia installation documentation (underneathall.app).

Define the Service

First let’s create the app.py to define the service:

Start the Service

    uvicorn app:service --reload

Wait for the model to be downloaded. When it’s finished, you’ll see:

Call the Service

You can use curl or the interactive API page from Pinferencia.

Curl

Result:

The sentence with the highest score is: penguins cannot fly. Makes sense, right?
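
Picking the top candidate in code is a one-liner. Here is a small sketch, assuming the response data is a list of candidates that each carry a `score` and a `sequence` (the shape the transformers fill-mask pipeline returns). The candidates and numbers below are made up for illustration and are not real model output.

```python
# Illustrative candidates only: the scores are invented, not real model output.
candidates = [
    {"score": 0.12, "sequence": "penguins can fly."},
    {"score": 0.61, "sequence": "penguins cannot fly."},
    {"score": 0.05, "sequence": "penguins will fly."},
]


def best_sequence(items):
    """Return the sequence of the candidate with the highest score."""
    return max(items, key=lambda item: item["score"])["sequence"]


print(best_sequence(candidates))
```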

Besides Curl, you can also use Pinferencia’s:

Interactive API Page


GPT2 — Text Generation Transformer: How to Use & How to Serve

wjiuhe — Mon, 18 Apr 2022 16:09:58 +0000

What is text generation? Input some text, and the model will predict the text that follows.

Sounds interesting. But how interesting can it be without trying out the model ourselves?

How to Use

The model will be downloaded automatically.

from transformers import pipeline, set_seed

generator = pipeline("text-generation", model="gpt2")
set_seed(42)


def predict(text):
    return generator(text, max_length=50, num_return_sequences=3)

That's it!

Let's try it out a little bit:

predict("You look amazing today,")

And the result:

[{'generated_text': 'You look amazing today, guys. If you\'re still in school and you still have a job where you work in the field… you\'re going to look ridiculous by now, you\'re going to look really ridiculous."\n\nHe turned to his friends'},
 {'generated_text': 'You look amazing today, aren\'t you?"\n\nHe turned and looked at me. He had an expression that was full of worry as he looked at me. Even before he told me I\'d have sex, he gave up after I told him'},
 {'generated_text': 'You look amazing today, and look amazing in the sunset."\n\nGarry, then 33, won the London Marathon at age 15, and the World Triathlon in 2007, the two youngest Olympians to ride 100-meters. He also'}]

Let's have a look at the first result.

You look amazing today, guys. If you're still in school and you still have a job where you work in the field… you're going to look ridiculous by now, you're going to look really ridiculous."
He turned to his friends

🤣 That's the thing we're looking for! If you run the prediction again, it'll give different results every time.
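
Why different every time, even though the seed is fixed? `set_seed(42)` runs once when the code is loaded, and every sampled generation after that advances the random state. The standard library's `random` module behaves the same way, so it can serve as a stand-in here. This sketch illustrates seeding in general and does not call the model.

```python
import random

random.seed(42)
first = random.random()
second = random.random()  # the generator state has advanced, so this differs

random.seed(42)           # re-seeding rewinds the generator
replay = random.random()

print(first == second)  # False
print(first == replay)  # True
```

By analogy, calling `set_seed(42)` inside `predict` before each generation should make repeated calls reproducible on the same machine and library versions, at the cost of variety.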

Deploy the model

Without deployment, how could a machine learning tutorial be complete?

First, let's install Pinferencia.

pip install "pinferencia[uvicorn]"

If you haven't heard of Pinferencia, go to its GitHub page or its homepage to check it out. It's an amazing library that helps you deploy your model with ease.

Create the Service

Now let's create an app.py file with the following code:

from transformers import pipeline, set_seed

from pinferencia import Server

generator = pipeline("text-generation", model="gpt2")
set_seed(42)


def predict(text):
    return generator(text, max_length=50, num_return_sequences=3)


service = Server()
service.register(model_name="gpt2", model=predict)

Start the Server

uvicorn app:service --reload

Test the Service

You can use curl to test, or you can use the interactive UI.

Curl

curl -X 'POST' \
    'http://127.0.0.1:8000/v1/models/gpt2/predict' \
    -H 'accept: application/json' \
    -H 'Content-Type: application/json' \
    -d '{
        "id": "string",
        "parameters": {},
        "data": "You look amazing today,"
    }'

Result:

{
    "id": "string",
    "model_name": "gpt2",
    "data": [
        {
            "generated_text": "You look amazing today, I was in front of my friends. I wanted everyone to see me. But that's all. No one really cares about me in the eyes of the whole world unless I love them.\"\n\nIn a second Facebook post"
        },
        {
            "generated_text": "You look amazing today, and I know I am going to get the job done! So thank you all for all those donations, money, help, and hugs. I hope to see you again soon."
        },
        {
            "generated_text": "You look amazing today, but I will have to wait until early June for what will go down as the first NBA championship (a thing I had been expecting). If it's not the biggest, it is also not great. Now let's look at"
        }
    ]
}

Even cooler, go to http://127.0.0.1:8000, and you will have an interactive UI.

You can send predict requests right there!

Go Pinferencia!

If you like Pinferencia, don't forget to go to https://github.com/underneathall/pinferencia and give it a star.

Easiest Way to Deploy HuggingFace Transformers

wjiuhe — Mon, 18 Apr 2022 01:28:58 +0000

You must know transformers. Well, I don’t mean the Autobots or Bumblebee, but the famous machine learning architecture.
You have probably used the HuggingFace transformers models. But have you ever deployed them?

With Pinferencia, just add three more lines and your model goes online!

Never heard of Pinferencia? It’s not too late. Go to its GitHub to take a look. Don’t forget to give it a star if you like it.

HuggingFace transformer pipeline

How do you use a HuggingFace transformer pipeline?

from transformers import pipeline

vision_classifier = pipeline(task="image-classification")


def predict(data):
    return vision_classifier(images=data)

And you can classify an image from its URL:

predict("https://cdn.pixabay.com/photo/2018/08/12/16/59/parrot-3601194_1280.jpg")

Result:

[[{'score': 0.9489120244979858, 'label': 'macaw'},
  {'score': 0.014800671488046646, 'label': 'broom'},
  {'score': 0.009150494821369648, 'label': 'swab, swob, mop'},
  {'score': 0.0018255198374390602, 'label': "plunger, plumber's helper"},
  {'score': 0.0017631321679800749,
   'label': 'African grey, African gray, Psittacus erithacus'}]]

Deploy

Now deploy it with Pinferencia: just add three lines and save the file as app.py.

from transformers import pipeline
from pinferencia import Server

vision_classifier = pipeline(task="image-classification")


def predict(data):
    return vision_classifier(images=data)


service = Server()
service.register(model_name="vision", model=predict)

Now go to the terminal and run:

uvicorn app:service --reload

Your service is online! Go to http://127.0.0.1:8000 and check out the API.
Now you can send a request:

curl --location --request POST 'http://127.0.0.1:8000/v1/models/vision/predict' \
    --header 'Content-Type: application/json' \
    --data-raw '{
        "data": "https://cdn.pixabay.com/photo/2018/08/12/16/59/parrot-3601194_1280.jpg"
    }'

The result:

[[{'score': 0.9489120244979858, 'label': 'macaw'},
  {'score': 0.014800671488046646, 'label': 'broom'},
  {'score': 0.009150494821369648, 'label': 'swab, swob, mop'},
  {'score': 0.0018255198374390602, 'label': "plunger, plumber's helper"},
  {'score': 0.0017631321679800749,
   'label': 'African grey, African gray, Psittacus erithacus'}]]

Or just use the interactive UI Pinferencia provides:

Simple enough, huh?

If you like Pinferencia, go to https://github.com/underneathall/pinferencia/star and give it a star.

HuggingFace Transformer Pipeline - Vision: How to Use, Deploy and Serve

wjiuhe — Sun, 17 Apr 2022 03:04:05 +0000

In this tutorial, we will explore how to use the Hugging Face pipeline and how to deploy it with Pinferencia as a REST API.

Never heard of Pinferencia? It's not too late. Check it out on GitHub.

Download the model and predict
The model will be automatically downloaded.

from transformers import pipeline
vision_classifier = pipeline(task="image-classification")

vision_classifier(
    images="https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/pipeline-cat-chonk.jpeg"
)

Result:

[{'label': 'lynx, catamount', 'score': 0.4403027892112732},
 {'label': 'cougar, puma, catamount, mountain lion, painter, panther, Felis concolor',
  'score': 0.03433405980467796},
 {'label': 'snow leopard, ounce, Panthera uncia',
  'score': 0.032148055732250214},
 {'label': 'Egyptian cat', 'score': 0.02353910356760025},
 {'label': 'tiger cat', 'score': 0.023034192621707916}]

Let's try another image, and this time predict two images in one batch:

image = "https://cdn.pixabay.com/photo/2018/08/12/16/59/parrot-3601194_1280.jpg"
vision_classifier(
    images=[image, image]
)

Result:

[[{'score': 0.9489120244979858, 'label': 'macaw'},
  {'score': 0.014800671488046646, 'label': 'broom'},
  {'score': 0.009150494821369648, 'label': 'swab, swob, mop'},
  {'score': 0.0018255198374390602, 'label': "plunger, plumber's helper"},
  {'score': 0.0017631321679800749,
   'label': 'African grey, African gray, Psittacus erithacus'}],
 [{'score': 0.9489120244979858, 'label': 'macaw'},
  {'score': 0.014800671488046646, 'label': 'broom'},
  {'score': 0.009150494821369648, 'label': 'swab, swob, mop'},
  {'score': 0.0018255198374390602, 'label': "plunger, plumber's helper"},
  {'score': 0.0017631321679800749,
   'label': 'African grey, African gray, Psittacus erithacus'}]]
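
With a batch, the pipeline returns one list of candidates per image. Here is a short sketch for pulling out just the best label of each image, using a trimmed copy of the result above. The helper name `top_labels` is made up for this example.

```python
# Trimmed copy of the batch result above: one list of candidates per image.
batch_result = [
    [
        {"score": 0.9489120244979858, "label": "macaw"},
        {"score": 0.014800671488046646, "label": "broom"},
    ],
    [
        {"score": 0.9489120244979858, "label": "macaw"},
        {"score": 0.014800671488046646, "label": "broom"},
    ],
]


def top_labels(results):
    """Return the highest-scoring label for every image in a batch result."""
    return [
        max(candidates, key=lambda c: c["score"])["label"]
        for candidates in results
    ]


print(top_labels(batch_result))  # ['macaw', 'macaw']
```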

Amazingly easy! Now let's try:

Deploy the model

Without deployment, how could a machine learning tutorial be complete?

First, let's install Pinferencia.

pip install "pinferencia[uvicorn]"

If you haven't heard of Pinferencia, go to its GitHub page or its homepage to check it out. It's an amazing library that helps you deploy your model with ease.

Now let's create an app.py file with the following code:

from transformers import pipeline
from pinferencia import Server
vision_classifier = pipeline(task="image-classification")

def predict(data):
    return vision_classifier(images=data)

service = Server()
service.register(
    model_name="vision",
    model=predict,
)

Easy, right?

Predict

Curl

curl --location --request POST 'http://127.0.0.1:8000/v1/models/vision/predict' \
--header 'Content-Type: application/json' \
--data-raw '{
    "data": "https://cdn.pixabay.com/photo/2018/08/12/16/59/parrot-3601194_1280.jpg"
}'

Or use Python requests:

import requests

response = requests.post(
    url="http://localhost:8000/v1/models/vision/predict",
    json={
        "data": "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/pipeline-cat-chonk.jpeg"  # noqa
    },
)
print("Prediction:", response.json()["data"])

Run the script, saved here as test.py:

python test.py

Result:

Prediction: [
    {'score': 0.433499813079834, 'label': 'lynx, catamount'},
    {'score': 0.03479616343975067, 'label': 'cougar, puma, catamount, mountain lion, painter, panther, Felis concolor'},
    {'score': 0.032401904463768005, 'label': 'snow leopard, ounce, Panthera uncia'},
    {'score': 0.023944756016135216, 'label': 'Egyptian cat'},
    {'score': 0.022889181971549988, 'label': 'tiger cat'}
]

Even cooler, go to http://127.0.0.1:8000, and you will have an interactive UI.

You can send predict requests right there!

Go Pinferencia!

If you like Pinferencia, don't forget to go to https://github.com/underneathall/pinferencia and give it a star.

Popular Machine Learning Deployment Tools

wjiuhe — Sat, 16 Apr 2022 15:05:09 +0000

To train machine learning models, you may use PyTorch or TensorFlow.

But how do you deploy them? Below are some selected tools to serve your machine learning model without introducing much overhead. Go online in no time!

Tensorflow Serving

GitHub

Pro:

TensorFlow Serving is a good choice to serve your model if you already use TensorFlow to train your models.

Con:

If you are not using TensorFlow, oops.
If you follow the documentation, err, it takes time to set up and debug.
If your prediction logic gets complicated, writing its handlers and configuration, and debugging them, can take a lot of time.

TorchServe

GitHub

Pro:

As with TensorFlow Serving, it's a good choice if you are using PyTorch. The backend uses Python, the language you already know from training.

Con:

Likewise, if you're not using PyTorch, it's not a good choice. Also, if your prediction logic gets complicated, the overhead is high.

Triton

GitHub

Pro:

Written in C++, so it's fast.
Supports multiple machine learning frameworks.

Cons:

Written in C++. Inference itself is what takes the time, so a web server in C++ cannot help much, and it becomes harder to read and debug.
Really poor documentation. It reads as if written for its developers, not for its users. Also, a lot of configuration.

Pinferencia

GitHub

Pro:

Supports any machine learning framework.
Pure Python and just a simple library. Overhead is very low.
The implementation is intuitive and integrates with your current code, which means that no matter how complicated your prediction workflow is, you don't need to make many changes. Actually, almost no changes, so you save time.
User-friendly documentation.

Con:

Relatively young project. But it has full 100% test coverage, which beats all the other candidates, so it is young, yet also mature.
As for documentation, more examples are being added. You need to check whether there is already an example for your framework: https://pinferencia.underneathall.app/

Those are the model serving tools for today. Next, we will introduce how to deploy your models to Kubernetes.

Huggingface Transformers Pytorch Tutorial: Load, Predict and Serve/Deploy

wjiuhe — Sat, 16 Apr 2022 10:20:49 +0000

Many of you must have heard of Bert, or transformers.
And you may also know huggingface.

In this tutorial, let's play with its PyTorch transformer model and serve it through a REST API.

How does the model work?

With an input of an incomplete sentence, the model will give its prediction:

Input:

Paris is the [MASK] of France.

Output:

Paris is the capital of France.

Cool~let's try this out now~

Prerequisite

For mac users

If you're working on an M1 Mac like me, you need to install cmake and rust:

brew install cmake

curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh

Install dependencies

You can install dependencies using pip.

pip install tqdm boto3 requests regex sentencepiece sacremoses

or you can use a docker image instead:

docker run -it -p 8000:8000 -v $(pwd):/opt/workspace huggingface/transformers-pytorch-cpu:4.18.0 bash

Load the model

This will load the tokenizer and the model. It may take some time to download.

import torch

# load tokenizer
tokenizer = torch.hub.load(
    "huggingface/pytorch-transformers",
    "tokenizer",
    "bert-base-cased",
)
# load masked model
masked_lm_model = torch.hub.load(
    "huggingface/pytorch-transformers",
    "modelForMaskedLM",
    "bert-base-cased",
)

Define the predict function

The input text is: Paris is the [MASK] of France.

input_text = "Paris is the [MASK] of France."

First we need to tokenize the input text:

tokens = tokenizer(input_text)

Let's have a look at the masked index:

mask_index = [
    i
    for i, token_id in enumerate(tokens["input_ids"])
    if token_id == tokenizer.mask_token_id
]

Prepare the tensor:

segments_tensors = torch.tensor([tokens["token_type_ids"]])
tokens_tensor = torch.tensor([tokens["input_ids"]])

Predict:

with torch.no_grad():
    predictions = masked_lm_model(
        tokens_tensor, token_type_ids=segments_tensors
    )

Now, let's have a look at the result:

pred_tokens = torch.argmax(predictions[0][0], dim=1)

# replace the initial input text's mask with predicted text
for i in mask_index:
    tokens["input_ids"][i] = pred_tokens[i]
tokenizer.decode(tokens["input_ids"], skip_special_tokens=True)

Output:

'Paris is the capital of France.'
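
What does `torch.argmax(predictions[0][0], dim=1)` do here? `predictions[0][0]` holds one row of scores per token position, with one score per vocabulary entry, and `argmax` picks the index of the largest score in each row. Here is a pure-Python sketch of the same idea, with a hypothetical four-word vocabulary and invented scores.

```python
# Hypothetical tiny vocabulary and invented scores, for illustration only.
vocab = ["Paris", "capital", "of", "France"]

# One row per token position, one score per vocabulary entry.
scores = [
    [9.1, 0.3, 0.2, 1.0],  # position 0
    [0.4, 7.5, 0.1, 0.9],  # position 1 (imagine this was the [MASK])
    [0.2, 0.6, 8.8, 0.3],  # position 2
]


def argmax_rows(rows):
    """Index of the largest value in each row, like torch.argmax(..., dim=1)."""
    return [max(range(len(row)), key=row.__getitem__) for row in rows]


pred_ids = argmax_rows(scores)
print(pred_ids)                      # [0, 1, 2]
print([vocab[i] for i in pred_ids])  # ['Paris', 'capital', 'of']
```

Only the entries at the masked positions are then copied back into the input ids, which is what the loop over `mask_index` does.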

Let's organize the code into a predict function:

def predict(input_text):
    # tokenize the input text
    tokens = tokenizer(input_text)

    # get all the mask index
    mask_index = [
        i
        for i, token_id in enumerate(tokens["input_ids"])
        if token_id == tokenizer.mask_token_id
    ]

    # convert the input ids and type ids to tensor
    segments_tensors = torch.tensor([tokens["token_type_ids"]])
    tokens_tensor = torch.tensor([tokens["input_ids"]])

    # run predictions
    with torch.no_grad():
        predictions = masked_lm_model(
            tokens_tensor, token_type_ids=segments_tensors
        )

    # pick the most likely predictions
    pred_tokens = torch.argmax(predictions[0][0], dim=1)

    # replace the initial input text's mask with predicted text
    for i in mask_index:
        tokens["input_ids"][i] = pred_tokens[i]
    return tokenizer.decode(tokens["input_ids"], skip_special_tokens=True)

Run:

predict("Paris is the [MASK] of France.")

Output:

'Paris is the capital of France.'

Serve it through REST API

First, let's install Pinferencia.

pip install "pinferencia[uvicorn]"

If you haven't heard of Pinferencia, go to its GitHub page https://github.com/underneathall/pinferencia or its homepage https://pinferencia.underneathall.app/ to check it out. It's an amazing library that helps you deploy your model with ease.

Let's save our predict function into a file app.py and add some lines to register it.

import torch
from pinferencia import Server

# load tokenizer
tokenizer = torch.hub.load(
    "huggingface/pytorch-transformers",
    "tokenizer",
    "bert-base-cased",
)
# load masked model
masked_lm_model = torch.hub.load(
    "huggingface/pytorch-transformers",
    "modelForMaskedLM",
    "bert-base-cased",
)


def predict(input_text):
    # tokenize the input text
    tokens = tokenizer(input_text)

    # get all the mask index
    mask_index = [
        i
        for i, token_id in enumerate(tokens["input_ids"])
        if token_id == tokenizer.mask_token_id
    ]

    # convert the input ids and type ids to tensor
    segments_tensors = torch.tensor([tokens["token_type_ids"]])
    tokens_tensor = torch.tensor([tokens["input_ids"]])

    # run predictions
    with torch.no_grad():
        predictions = masked_lm_model(
            tokens_tensor, token_type_ids=segments_tensors
        )

    # pick the most likely predictions
    pred_tokens = torch.argmax(predictions[0][0], dim=1)

    # replace the initial input text's mask with predicted text
    for i in mask_index:
        tokens["input_ids"][i] = pred_tokens[i]
    return tokenizer.decode(tokens["input_ids"], skip_special_tokens=True)


service = Server()
service.register(model_name="transformer", model=predict)

Run the service, and wait for it to load the model and start the server:

uvicorn app:service --reload

Test the service:

Using curl:

curl --location --request POST 'http://127.0.0.1:8000/v1/models/transformer/predict' \
--header 'Content-Type: application/json' \
--data-raw '{
    "data": "Paris is the [MASK] of France."
}'

Response:

{
    "model_name":"transformer",
    "data":"Paris is the capital of France."
}

Cool~~ Not yet, even cooler:

You can use the Swagger UI at http://127.0.0.1:8000 (the server's address) to try the prediction.

Serve machine learning models with Pinferencia

wjiuhe — Thu, 14 Apr 2022 15:47:28 +0000

By the time you read this post, you may already know or have tried TorchServe, Triton, Seldon Core, TF Serving, or even KServe. They are good products. However, if your model is not a very simple one, or you have written a lot of code and the model is just one part of it, it is not that easy to integrate your code with them.
Here you have an alternative: Pinferencia (for more tutorials, please visit https://pinferencia.underneathall.app/).

GitHub: Pinferencia. If you like it, give it a star.

Install

pip install "pinferencia[uvicorn]"

Quick Start

Serve Any Model

app.py

from pinferencia import Server


class MyModel:
    def predict(self, data):
        return sum(data)


model = MyModel()

service = Server()
service.register(
    model_name="mymodel",
    model=model,
    entrypoint="predict",
)

Just run:

uvicorn app:service --reload

Hooray, your service is alive. Go to http://127.0.0.1:8000/ and have fun.
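
What does `entrypoint="predict"` mean? It names the method that should receive the `data` of a request. Conceptually, the dispatch amounts to the sketch below. This illustrates the idea only; it is not Pinferencia's actual implementation, and `call_model` is a made-up helper.

```python
class MyModel:
    def predict(self, data):
        return sum(data)


def call_model(model, data, entrypoint=None):
    """Call model.<entrypoint>(data), or model(data) when no entrypoint is given."""
    target = getattr(model, entrypoint) if entrypoint else model
    return target(data)


# A request body of {"data": [1, 2, 3]} would be handled like this:
print(call_model(MyModel(), [1, 2, 3], entrypoint="predict"))  # 6

# A plain function can be used without an entrypoint:
print(call_model(sum, [1, 2, 3]))  # 6
```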

You will have a full API documentation page to play with:

You can test your model right here:

Any deep learning model? Just as easy. Simply train or load your model, and register it with the service. It goes live immediately.

PyTorch

import torch

from pinferencia import Server


# train your models
model = "..."

# or load your models
# from state_dict
model = TheModelClass(*args, **kwargs)
model.load_state_dict(torch.load(PATH))

# entire model
model = torch.load(PATH)

# torchscript
model = torch.jit.load('model_scripted.pt')

model.eval()

service = Server()
service.register(
    model_name="mymodel",
    model=model,
)

Tensorflow

import tensorflow as tf

from pinferencia import Server


# train your models
model = "..."

# or load your models
# saved_model
model = tf.keras.models.load_model('saved_model/model')

# HDF5
model = tf.keras.models.load_model('model.h5')

# from weights
model = create_model()
model.load_weights('./checkpoints/my_checkpoint')
loss, acc = model.evaluate(test_images, test_labels, verbose=2)

service = Server()
service.register(
    model_name="mymodel",
    model=model,
    entrypoint="predict",
)

Any model of any framework will just work the same way. Now run uvicorn app:service --reload and enjoy!