Yuelin Wen for Squash.io

Posted on Jan 14, 2020 • Edited on Jan 17, 2020 • Originally published at squash.io

16 Amazing Python libraries you should be using now (2020 updated)

#python #productivity #opensource

In this article we will get familiar with several amazing Python libraries being used by the best software teams. With the exception of HTTPX (which is in beta), the libraries listed below are being actively developed & maintained and are backed by a strong community.

HTTPX

HTTPX was developed by Tom Christie, a software engineer specializing in API design and development.

The async paradigm is becoming increasingly common in high-performance applications, but the Requests library is synchronous and does not fit well into async code.

HTTPX was created to solve this problem. It is an asynchronous HTTP client that builds on the well-established usability of Requests and supports both HTTP/1.1 and HTTP/2. Its API is compatible with Requests wherever possible, and it keeps tight control over timeouts. HTTPX can also call directly into a Python web application using the ASGI protocol, and it is fully type annotated. The library comes with all the standard features of Requests, such as International Domains and URLs, Keep-Alive & Connection Pooling, and Sessions with Cookie Persistence.

Moreover, you can run HTTPX on either asyncio or trio and use it inside high-performance async web frameworks. As a result, HTTPX is able to handle a large number of requests.
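To see why an async client helps with large numbers of requests, here is a minimal sketch that uses only the standard library. It does not use HTTPX itself: `fake_fetch` is a hypothetical stand-in for a request, and `asyncio.sleep` plays the role of network latency.

```python
import asyncio
import time


async def fake_fetch(url, delay=0.1):
    """Hypothetical stand-in for an HTTP request; the sleep simulates network latency."""
    await asyncio.sleep(delay)
    return {"url": url, "status": 200}


async def fetch_all(urls):
    # gather() schedules every coroutine before waiting, so the delays overlap.
    return await asyncio.gather(*(fake_fetch(u) for u in urls))


urls = [f"https://example.com/item/{i}" for i in range(20)]
start = time.perf_counter()
responses = asyncio.run(fetch_all(urls))
elapsed = time.perf_counter() - start  # roughly one delay, not twenty of them
```

Because the waits overlap, twenty simulated requests finish in about the time of one. A blocking client would have to wait for each response before sending the next request.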

Arrow

As many Python developers know, one way to work with dates and times is to use the somewhat incomplete date, time, and timezone functionality in Python's standard library and a few other low-level modules. However, these are neither particularly fast nor easy to use: there are many modules and types to remember and tell apart, and conversions between timezones and timestamps are long-winded.
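For reference, here is a rough sketch of a few common date operations using only the standard library. The fixed UTC-8 offset is an assumption made for illustration; handling a real named timezone takes additional modules or data.

```python
from datetime import datetime, timedelta, timezone

# Parse an ISO 8601 string (fromisoformat accepts this form on Python 3.7+).
utc_time = datetime.fromisoformat("2020-01-14T12:34:56+00:00")

# Convert to another zone; a fixed UTC-8 offset stands in for a named timezone.
pacific = timezone(timedelta(hours=-8), "PST")
local_time = utc_time.astimezone(pacific)

# Relative offset of one week.
next_week = utc_time + timedelta(weeks=1)

# "Floor" to the start of the hour by zeroing the smaller fields by hand.
hour_floor = utc_time.replace(minute=0, second=0, microsecond=0)

# POSIX timestamp and a formatted string.
timestamp = int(utc_time.timestamp())
label = local_time.strftime("%Y-%m-%d %H:%M %Z")
```

Even this small example needs three different types (`datetime`, `timedelta`, `timezone`) and a manual floor operation.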

Luckily, a sensible and human-friendly Python library called Arrow can help users create, manipulate, format, and convert dates. It mainly aims to reduce the imports and code you need for dealing with dates and times.

Arrow supports Python 2.7, 3.5, 3.6, 3.7 and 3.8, and broadly supports ISO 8601. It also handles timezone conversion and exposes the timestamp as a property.

A few of Arrow's features give a good sense of what it can do. Arrow provides a drop-in replacement for datetime and is timezone-aware by default. It offers easy-to-use creation options for many common input scenarios. The shift method handles relative offsets, including offsets in weeks. Strings can be formatted and parsed automatically, which saves a lot of time. One more attractive feature is that Arrow can create periods, ranges, floors, and ceilings for time frames ranging from microseconds to years.

Python Fire

Python Fire can automatically generate CLIs for any project. The library makes the process of creating CLIs super simple. You only need to write the functionality you want to expose at the command line as a function, module, or class, and once you call Fire, the CLI you want is ready for you.
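To make the idea concrete, here is a toy version of turning a plain function into a CLI. It uses only `inspect` and `argparse` from the standard library. This is a sketch of the concept, not how Fire is implemented, and the helper names `make_cli` and `run` are made up for this example.

```python
import argparse
import inspect


def make_cli(func):
    """Build an argparse parser whose flags mirror func's parameters."""
    parser = argparse.ArgumentParser(prog=func.__name__, description=func.__doc__)
    for name, param in inspect.signature(func).parameters.items():
        # Use the annotation as the converter when present, otherwise keep strings.
        has_annotation = param.annotation is not inspect.Parameter.empty
        kind = param.annotation if has_annotation else str
        if param.default is inspect.Parameter.empty:
            parser.add_argument(f"--{name}", type=kind, required=True)
        else:
            parser.add_argument(f"--{name}", type=kind, default=param.default)
    return parser


def run(func, argv):
    """Parse argv and call func with the resulting keyword arguments."""
    args = make_cli(func).parse_args(argv)
    return func(**vars(args))


def greet(name: str, times: int = 1):
    """Return a greeting repeated `times` times."""
    return " ".join([f"Hello {name}!"] * times)


result = run(greet, ["--name", "Ada", "--times", "2"])
```

The function `greet` knows nothing about the command line; the flags are derived entirely from its signature. Fire applies the same idea to functions, modules, and classes.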

You will probably want to test early when writing a Python library. Without Python Fire, you have to write a main method that runs the functionality you want to try out, and you have to update that main method every time you add a feature you want to test, which is time-consuming and annoying. With Fire, you don't need to keep changing a main method to exercise your code from the command line.

It is usually not super quick to understand a function by looking at its code, particularly when the function was written by someone else. A better way is to call Fire on the module. This feature allows you to easily inspect all module functions/methods.

Moreover, Fire lets you move easily between Bash and Python, so you can use Unix tools alongside your Python code.

Starlette

Starlette is a lightweight ASGI framework or toolkit for building high-performance asyncio services.

This production-ready library has many features, including support for WebSocket and GraphQL. Starlette can handle in-process background tasks, CORS, GZip, static files, and streaming responses. The library has extensive test coverage, a code base that is 100% type annotated, and zero hard dependencies.

Starlette is meant to be used either as a complete framework or as an ASGI toolkit, giving users the flexibility to use any of its components independently. Moreover, reusable components can be shared between ASGI frameworks, creating an ecosystem of shared middleware and mountable applications.
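It helps to see how small the ASGI interface is. The sketch below is a bare ASGI application written without Starlette, driven in-process by a hand-built request. The helper `call_app` is hypothetical and stands in for what a server or test client would do.

```python
import asyncio
import json


async def app(scope, receive, send):
    """A bare ASGI application: one async callable that handles an HTTP request."""
    assert scope["type"] == "http"
    body = json.dumps({"path": scope["path"]}).encode()
    await send({
        "type": "http.response.start",
        "status": 200,
        "headers": [(b"content-type", b"application/json")],
    })
    await send({"type": "http.response.body", "body": body})


async def call_app(path):
    """Drive the app in-process with a hand-built scope; no server or socket involved."""
    sent = []

    async def receive():
        return {"type": "http.request", "body": b"", "more_body": False}

    async def send(message):
        sent.append(message)

    scope = {"type": "http", "method": "GET", "path": path, "headers": []}
    await app(scope, receive, send)
    return sent


messages = asyncio.run(call_app("/hello"))
status = messages[0]["status"]
payload = json.loads(messages[1]["body"])
```

Because every ASGI component shares this `(scope, receive, send)` signature, middleware and applications can be wrapped, mounted, and reused across frameworks.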

Mypy

Mypy is an optional static type checker for Python 3 and Python 2.7, and works much like a static analyzer or linter. Once you add type annotations to your program, Mypy can type check your code and catch common bugs. The annotations guide Mypy without interfering with your application, because they behave like comments: they have no effect on how your code executes.

Mypy gives developers the flexibility to decide on their workflow. Its purpose is to combine the advantages of dynamic and static typing, so you can fall back on dynamic typing where static typing doesn't work well, such as in legacy code. When you run Mypy on your program like a linter, errors are reported in a compiler-style format. Mypy provides programmers with a robust and consistent check for a project and its dependencies.
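Here is a small sketch of how annotated and unannotated code can sit side by side. The function names are made up for illustration. The file runs as ordinary Python whether or not Mypy is installed.

```python
from typing import List, Optional


def average(values: List[float]) -> Optional[float]:
    """Return the mean of values, or None for an empty list."""
    if not values:
        return None
    return sum(values) / len(values)


def legacy_parse(raw):
    # No annotations: Mypy treats this function as dynamically typed
    # and, by default, does not check its body.
    return [float(x) for x in raw.split(",")]


mean = average(legacy_parse("1.5,2.5,4.0"))

# A type checker would flag the call below, because a str is not a List[float]:
# average("1.5,2.5")
```

You can annotate `average` today and leave `legacy_parse` untouched until you have time to come back to it.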

Another advantage of Mypy is its minimal learning curve. Most new users should be able to annotate their code correctly on the first try, and the Mypy cheat sheet is a perfect place to start. Mypy also has a much lower false-positive rate than most static analyzers.

FastAPI

FastAPI is a high-performance web framework for API development. FastAPI is based on standard Python type hints for Python 3.6+.

FastAPI comes with many interesting features. Perhaps the most important one is speed, since it is one of the fastest Python frameworks available. By the project's own estimate, it also speeds up feature development by about 200% compared to other frameworks.

And if that wasn't enough, FastAPI helps keep the bug rate low: the project estimates that it reduces developer-induced errors by about 40%. The framework is easy to learn and comes with interactive documentation.

FastAPI is based on open standards such as OpenAPI. This covers declarations of path operations, parameters, request bodies, and security. Automatic client code generation in different languages is also available.

Immutables

Immutables is an immutable mapping type for Python. The underlying data structure is a Hash Array Mapped Trie (HAMT), which is also used in Clojure, Scala, Haskell, and other functional languages.

Immutable mappings based on HAMT have O(log N) performance for both set() and get() operations, which is essentially O(1) for relatively small mappings.
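To show where the O(log N) comes from, here is a toy HAMT in pure Python. It is a simplified sketch, not the Immutables implementation: the names `hamt_get` and `hamt_set` are made up, nodes are plain 32-slot tuples, and deletion is left out.

```python
BITS = 5                       # consume 5 bits of the hash per level (32-way branching)
MASK = (1 << BITS) - 1
EMPTY = (None,) * (1 << BITS)  # an empty node: 32 empty slots


def _index(h, shift):
    return (h >> shift) & MASK


def hamt_get(node, key, default=None):
    h, shift = hash(key), 0
    while True:
        slot = node[_index(h, shift)]
        if slot is None:
            return default
        if slot[0] == "leaf":
            _, leaf_hash, pairs = slot
            if leaf_hash == h:
                for k, v in pairs:
                    if k == key:
                        return v
            return default
        node, shift = slot[1], shift + BITS  # descend into the child node


def hamt_set(node, key, value, h=None, shift=0):
    """Return a new root; only the nodes on the path to `key` are copied."""
    if h is None:
        h = hash(key)
    i = _index(h, shift)
    slot = node[i]
    if slot is None:
        new = ("leaf", h, ((key, value),))
    elif slot[0] == "leaf":
        _, leaf_hash, pairs = slot
        if leaf_hash == h:
            # Same hash: replace the key or add it to the collision bucket.
            kept = tuple(p for p in pairs if p[0] != key)
            new = ("leaf", h, kept + ((key, value),))
        else:
            # Different hashes share this slot: push the old leaf one level down.
            child = list(EMPTY)
            child[_index(leaf_hash, shift + BITS)] = slot
            new = ("node", hamt_set(tuple(child), key, value, h, shift + BITS))
    else:
        new = ("node", hamt_set(slot[1], key, value, h, shift + BITS))
    return node[:i] + (new,) + node[i + 1:]


v1 = hamt_set(EMPTY, "a", 1)
v2 = hamt_set(v1, "b", 2)   # v1 is untouched; v2 shares structure with it
```

Each update copies one small node per level and reuses everything else, so old versions stay valid and the cost grows with the depth of the trie rather than its size.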

Expiring Dict

Expiring Dict is a very handy Python caching library. It provides a dictionary with ordering and auto-expiring values for caching purposes. Dictionary elements have a TTL (max age) and max length, which are checked on each access.
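The idea can be sketched in a few lines of standard-library Python. The `TTLCache` class below is a hypothetical toy, not the Expiring Dict implementation. The injectable `clock` parameter is an assumption added so the example is deterministic.

```python
import time
from collections import OrderedDict


class TTLCache:
    """Toy ordered cache with a max length and a per-item max age in seconds."""

    def __init__(self, max_len, max_age_seconds, clock=time.monotonic):
        self.max_len = max_len
        self.max_age = max_age_seconds
        self.clock = clock
        self._data = OrderedDict()  # key -> (value, time stored)

    def __setitem__(self, key, value):
        self._data.pop(key, None)           # re-inserting moves the key to the end
        if len(self._data) >= self.max_len:
            self._data.popitem(last=False)  # evict the oldest entry
        self._data[key] = (value, self.clock())

    def get(self, key, default=None):
        item = self._data.get(key)
        if item is None:
            return default
        value, stored_at = item
        if self.clock() - stored_at >= self.max_age:
            del self._data[key]             # expired: checked lazily, on access
            return default
        return value


# A fake clock keeps the demonstration deterministic.
now = [0.0]
cache = TTLCache(max_len=2, max_age_seconds=10, clock=lambda: now[0])
cache["a"] = 1
cache["b"] = 2
cache["c"] = 3                 # exceeds max_len, so "a" is evicted
evicted = cache.get("a")
fresh = cache.get("c")
now[0] = 11.0                  # advance past the TTL
expired = cache.get("c")
```

Both limits are enforced lazily: nothing runs in the background, and stale entries are dropped only when they are touched.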

VCR.py

Have you ever used Ruby's VCR library? It records your test suite's HTTP interactions and replays them during future test runs for fast, deterministic and accurate tests.

VCR.py is similar to Ruby's VCR library. It makes tests that involve HTTP requests simpler and faster. If your code runs inside a VCR.py context manager or decorated function, VCR.py records all the HTTP interactions that take place the first time it runs. It then serializes those interactions and writes them to a flat file called a cassette. When you run the same code again, VCR.py replays the serialized requests and responses from the cassette file.

This process has many benefits because the replayed requests do not generate any HTTP traffic. VCR.py can therefore work offline, make tests deterministic, and significantly increase their execution speed.

When you make changes to your tests, all you need to do is delete the cassette file. The next time you run the code, VCR.py will record the HTTP interactions again and generate a new cassette.
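The record-and-replay cycle can be sketched with a JSON file standing in for the cassette. This is a toy illustration, not VCR.py's API: `with_cassette` and `slow_fetch` are hypothetical names, and the wrapper works on a plain function rather than on real HTTP traffic.

```python
import json
import os
import tempfile


def with_cassette(path, fetch):
    """Wrap `fetch(url)` so responses are recorded to, and replayed from, a JSON file."""
    def wrapper(url):
        cassette = {}
        if os.path.exists(path):
            with open(path) as f:
                cassette = json.load(f)
        if url in cassette:
            return cassette[url]   # replay: fetch is never called
        response = fetch(url)      # record: perform the real call once
        cassette[url] = response
        with open(path, "w") as f:
            json.dump(cassette, f)
        return response
    return wrapper


calls = []


def slow_fetch(url):
    """Hypothetical stand-in for a real HTTP request."""
    calls.append(url)
    return {"status": 200, "body": f"content of {url}"}


path = os.path.join(tempfile.mkdtemp(), "cassette.json")
fetch = with_cassette(path, slow_fetch)
first = fetch("https://example.com/a")   # recorded
second = fetch("https://example.com/a")  # replayed from the cassette
os.remove(path)                          # deleting the cassette forces a re-record
third = fetch("https://example.com/a")
```

Only the first and third calls reach `slow_fetch`; the second is served entirely from the file.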

Transformers

Transformers (formerly known as pytorch-transformers and pytorch-pretrained-bert) provides general-purpose architectures for Natural Language Understanding (NLU) and Natural Language Generation (NLG), with more than 32 pretrained models in over 100 languages and deep interoperability between TensorFlow 2.0 and PyTorch. The supported architectures include BERT, GPT-2, RoBERTa, XLM, DistilBert, XLNet, and others.

Transformers is an easy-to-use library with a low barrier to entry. It aims to be as powerful and concise as Keras, with the added benefit of high performance on NLU and NLG tasks.

The goal of Transformers is to make state-of-the-art NLP available to everyone. Whether you are a deep learning researcher, a hands-on practitioner, or an AI/ML/NLP teacher or educator, you can get a lot out of Transformers. The library also helps reduce compute costs. For example, you can share trained models with your colleagues so that they do not need to spend extra effort retraining them, and the pretrained models save computing time and thus help to reduce production costs.

Lastly, Transformers lets you choose the most suitable framework for each stage of a model's lifetime. According to the project, you can train a state-of-the-art model in as few as three lines of code. The deep interoperability between TensorFlow 2.0 and PyTorch models is an added benefit.

Modin

If you are a data scientist you are probably familiar with the Pandas library. Modin can scale your Pandas workflows by changing only one line of code. Sounds great, right?

Before providing more details on Modin, we first need to introduce Ray, which is the key to how Modin works. Ray is a fast and accessible framework for building and running distributed applications. Ray is packaged with Tune, RLlib, and Distributed Training.

Modin uses Ray to speed up Pandas notebooks, scripts, and libraries in an effortless way. A weakness of other distributed DataFrame libraries is that they often require changes to existing Pandas code. Modin solves this problem by integrating seamlessly with existing Pandas code.

Using Modin, you do not need to know details about the cores of your system and the distribution of the data. You do not need to abandon the Pandas notebooks you previously used to take advantage of the enormous acceleration provided by Modin.

Moreover, the modin.pandas DataFrame is an extremely lightweight parallel DataFrame. After you install Modin, there is no need to stop using the Pandas API you already know, because Modin distributes the data and computation transparently.

Data scientists often need different tools for different data sizes: solutions built for 1KB DataFrames do not scale to 1TB, and solutions built for 1TB often carry significant overhead at 1KB. Modin is designed to handle DataFrames across that whole range.

Dash

If you are building web apps with complex or large datasets, Dash might be for you. Dash is very suitable for visualizing data and it also provides apps with customized user interfaces in pure Python.

Dash can even help you to build user interfaces with Python code in a short amount of time. Dash replaces many of the tools and technologies used for building an interactive web application and it does its job in a simple way.

Detectron 2

Detectron is an object detection platform. It is one of the open-source projects developed by the Facebook AI Research group.

Detectron 2 is the second generation of this library, with many performance enhancements. The library is flexible and extensible, and it makes training on GPU servers a very quick process. It also comes with state-of-the-art object detection algorithms, allowing developers to do advanced research without the complete dataset.

Detectron 2 was rewritten from scratch in PyTorch, which is a great tool for deep learning. The vast and active community behind PyTorch is an added benefit for Detectron2 users.

With Detectron2, users can insert their customized code into the object detection system as they see fit. As a result, a new research project can be implemented in a few hundred lines of code, with a clean separation between the core Detectron2 library and the new research code. Moreover, Detectron2 also supports semantic segmentation and panoptic segmentation.

Detectron2 is faster than the original version because the entire training pipeline has been moved to the GPU. Distributing the training across multiple GPU servers also makes it much easier to scale to large data sets.

Streamlit

Streamlit is a library that can provide you with the fastest way to build custom Machine Learning tools.

The library embraces Python scripting, which results in clean code and fast prototyping. Every time you change the Python code, the script reruns from top to bottom. Streamlit's cache primitive behaves like a persistent, immutable-by-default data store, so users can reuse information effortlessly.

Imbalanced-learn

Imbalanced-learn offers many re-sampling techniques for machine learning projects. This Python library is commonly used with datasets that show a strong between-class imbalance.

Imbalanced-learn is compatible with scikit-learn. Scikit-learn is a simple and efficient tool for predictive data analysis. It is built on NumPy, SciPy, and matplotlib. It is open source and very reusable. The library is available for Python 3.6+.
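One of the simplest re-sampling techniques is random over-sampling: duplicate minority-class samples until the classes are balanced. The sketch below shows the idea with the standard library only. The function `random_oversample` is a hypothetical helper, not part of Imbalanced-learn's API.

```python
import random
from collections import Counter


def random_oversample(X, y, seed=0):
    """Duplicate randomly chosen minority-class rows until every class matches the largest."""
    rng = random.Random(seed)
    counts = Counter(y)
    target = max(counts.values())
    X_out, y_out = list(X), list(y)
    for label, count in counts.items():
        rows = [x for x, lab in zip(X, y) if lab == label]
        for _ in range(target - count):
            X_out.append(rng.choice(rows))
            y_out.append(label)
    return X_out, y_out


X = [[0.1], [0.2], [0.3], [0.4], [0.9]]
y = ["neg", "neg", "neg", "neg", "pos"]
X_res, y_res = random_oversample(X, y)
balanced = Counter(y_res)
```

After re-sampling, both classes have the same number of rows, so a classifier trained on the result is no longer dominated by the majority class.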

PyTorch

PyTorch offers rapid prototyping for dynamic neural networks and strong GPU support.

PyTorch consists of the following components: torch, torch.autograd, torch.jit, torch.nn, torch.multiprocessing, and torch.utils. The torch component is a NumPy-like Tensor library with strong GPU support. torch.autograd is a tape-based automatic differentiation library that supports all differentiable Tensor operations in torch. torch.jit is a compilation stack for creating serializable and optimizable models. torch.nn is a neural network library that is deeply integrated with autograd and designed for maximum flexibility. torch.multiprocessing is useful for data loading and Hogwild training. Lastly, torch.utils provides a data loader and other utility functions for convenience.
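"Tape-based" automatic differentiation means that each operation is recorded as it runs, and gradients are computed by replaying that record backwards. Here is a toy scalar version in pure Python. It is a sketch of the technique only, not PyTorch code, and the `Var` class is made up for this example.

```python
class Var:
    """A scalar that records, on a shared tape, how it was computed."""
    tape = []

    def __init__(self, value):
        self.value = value
        self.grad = 0.0

    def __add__(self, other):
        out = Var(self.value + other.value)
        # Local derivatives: d(out)/d(self) = 1 and d(out)/d(other) = 1.
        Var.tape.append((out, [(self, 1.0), (other, 1.0)]))
        return out

    def __mul__(self, other):
        out = Var(self.value * other.value)
        # Local derivatives: d(out)/d(self) = other and d(out)/d(other) = self.
        Var.tape.append((out, [(self, other.value), (other, self.value)]))
        return out


def backward(output):
    """Replay the tape in reverse, applying the chain rule at each step."""
    output.grad = 1.0
    for out, parents in reversed(Var.tape):
        for parent, local_grad in parents:
            parent.grad += local_grad * out.grad


x, y = Var(3.0), Var(4.0)
z = x * y + x * x   # dz/dx = y + 2x, dz/dy = x
backward(z)
```

After `backward(z)`, `x.grad` and `y.grad` hold the partial derivatives of `z`. The tape is rebuilt on every forward pass, which is why this approach suits dynamic neural networks.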