DEV Community



Stop Writing Flask to Serve/Deploy Your Model: Pinferencia is Here


Are you still writing Flask code to serve your model? Stop doing that: you now have a much better choice, Pinferencia.

Pinferencia is a Python library that aims to be the simplest way to serve your model.

Check it out at underneathall/pinferencia (Python + Inference: a model deployment library in Python and the simplest model inference server ever).

What will you get from Pinferencia?

  • Fast to code, fast to go live. Write minimal code and make minimal changes: it builds on what you already have.

  • 100% test coverage: both statement and branch coverage, no kidding.

  • Easy to use, easy to understand.

  • Automatic API documentation page: every API is explained in detail, with an online try-it-out feature, thanks to FastAPI and Starlette.

  • Serve any model; even a single function can be served.

  • Supports the KServe API and is compatible with Kubeflow, TF Serving, Triton and TorchServe. Switching to or from them is painless, and Pinferencia is much faster for prototyping!
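To make the KServe claim a little more concrete: under the KServe v2 inference protocol, a client sends tensors as JSON to `POST /v2/models/{model_name}/infer`. The sketch below only builds such a request body with the standard library. It assumes Pinferencia exposes this v2 route as described above; the model name `mymodel` and tensor name `input-0` are hypothetical.

```python
import json


def v2_infer_path(model_name):
    """Inference route under the KServe v2 inference protocol."""
    return f"/v2/models/{model_name}/infer"


def make_v2_infer_request(tensor_name, data, datatype="FP32"):
    """Build a v2 request body holding one flat (1-D) input tensor."""
    return {
        "inputs": [
            {
                "name": tensor_name,
                "shape": [len(data)],  # 1-D tensor, so the shape is just its length
                "datatype": datatype,  # v2 datatype names, e.g. "FP32", "INT64", "BYTES"
                "data": list(data),
            }
        ]
    }


# Hypothetical model and tensor names, for illustration only.
body = make_v2_infer_request("input-0", [1.0, 2.0, 3.0])
print("POST", v2_infer_path("mymodel"))
print(json.dumps(body))
```

Because the body is plain JSON in a shared format, the same request should work against any v2-compatible server, which is what keeps switching between them cheap.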

Is it really simple and easy?

Yes, and a lot easier than other tools.

You only need to add three lines to your existing code.

Check out the sample on the project page for serving a Hugging Face model.
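The Hugging Face sample itself lives on the project page; the sketch below uses a toy stand-in model so it runs without downloading any weights. It assumes the `Server` class and the `register(model_name=..., model=..., entrypoint=...)` method described in Pinferencia's README; `MyModel`, `create_service` and `"mymodel"` are hypothetical names. The Pinferencia import sits inside a factory function only so the model code can be imported and tested without the library installed.

```python
# app.py: a minimal sketch, assuming Pinferencia's Server/register API
# as described in its README. MyModel and "mymodel" are hypothetical.


class MyModel:
    """Stand-in for the model you already have: any object with a predict method."""

    def predict(self, data):
        return sum(data)


model = MyModel()


def create_service():
    # The three extra lines. Importing here (not at module top) lets this
    # file be imported without Pinferencia installed.
    from pinferencia import Server

    service = Server()
    service.register(model_name="mymodel", model=model, entrypoint="predict")
    return service
```

Start it with `uvicorn app:create_service --factory`; the `--factory` flag tells uvicorn to call the function to obtain the app. If you create `service` at module level instead, you would run `uvicorn app:service`.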

Ready to get started?

Visit Pinferencia for detailed examples.
