Yohei Seki

Posted on Oct 9, 2018

KurumiPy: A memoization tool that accelerates Python processing

#kurumipy #python #memoization #opensource

We developed KurumiPy, a tool that makes it easy to apply memoization in Python, and released it on GitHub as open-source software.

KurumiPy can also reuse the cache when you run the program again.

https://github.com/FujitsuLaboratories/kurumipy

What is memoization?

Wikipedia describes it as follows (as of October 7, 2018).

In computing, memoization or memoisation is an optimization technique used primarily to speed up computer programs by storing the results of expensive function calls and returning the cached result when the same inputs occur again.

It is inefficient for a function to repeat, many times over, a computation that returns the same value for the same input data (argument values).

Memoization therefore caches (saves) the result the first time the function runs. On the second and subsequent calls with the same arguments, the program skips the function body and returns the cached data as is.

Because the same computation no longer has to be repeated, processing speeds up accordingly.
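The idea can be illustrated with a minimal in-memory sketch. This is not KurumiPy's implementation; `simple_memo`, `slow_square` and `calls` are hypothetical names used only for illustration, and the cache lives in a plain dict that is lost when the program exits.

```python
import functools

def simple_memo(func):
    """Cache results in a dict keyed by the positional arguments."""
    cache = {}

    @functools.wraps(func)
    def wrapper(*args):
        if args not in cache:
            cache[args] = func(*args)  # first call: compute and store
        return cache[args]             # later calls: return the stored result
    return wrapper

calls = []  # records every time the function body really runs

@simple_memo
def slow_square(n):
    calls.append(n)
    return n * n

slow_square(12)  # computed
slow_square(12)  # served from the cache, the body does not run again
```

After the two calls, `calls` contains a single entry, which shows that the function body ran only once.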

Why did you make KurumiPy?

If the system you are developing does not involve heavy processing, the need for memoization is low.

Recently, however, AI and data analysis have been attracting a lot of attention. With this trend, Python is increasingly used to handle large amounts of data.

For example, as shown in the figure below, consider a system that takes a large amount of sensor data as input and runs it through several processing stages.

In such a system, if you change the program slightly and run it again to check the output, the data conversion functions that you did not modify are executed all over again.

For example, suppose you change the function for "Data Conversion processing C" in the above figure and re-execute the program. The results of "Data Preprocessing", "Data Conversion processing A" and "Data Conversion processing B" are the same as in the previous run, so they can simply be reused.

In such a case, by reusing the data cached by memoization, the processing before "Data Conversion processing C" can be skipped and the output is produced quickly.
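The following sketch shows the effect on a toy pipeline. It is an assumption-laden illustration, not KurumiPy itself: `disk_memo`, the stage functions and `CACHE_DIR` are hypothetical, and results are simply pickled to files. A temporary directory is used here; with a fixed directory the files would also survive between program runs.

```python
import hashlib
import os
import pickle
import tempfile

CACHE_DIR = tempfile.mkdtemp(prefix="memo_sketch_")  # hypothetical cache location
calls = []  # records which stages actually ran

def disk_memo(func):
    """Store each result in a pickle file named after the function and its arguments."""
    def wrapper(*args):
        key = hashlib.sha256(repr((func.__name__, args)).encode()).hexdigest()
        path = os.path.join(CACHE_DIR, key + ".pkl")
        if os.path.exists(path):          # cache hit: load instead of recomputing
            with open(path, "rb") as f:
                return pickle.load(f)
        result = func(*args)              # cache miss: compute and save
        with open(path, "wb") as f:
            pickle.dump(result, f)
        return result
    return wrapper

@disk_memo
def preprocess(n):
    calls.append("preprocess")
    return tuple(range(n))

@disk_memo
def convert_a(data):
    calls.append("convert_a")
    return tuple(x * 2 for x in data)

def convert_c(data, offset):              # the stage being edited, left uncached
    calls.append("convert_c")
    return sum(data) + offset

def run(offset):
    return convert_c(convert_a(preprocess(5)), offset)

first = run(0)   # every stage runs
second = run(1)  # only convert_c runs; the earlier stages come from the cache
```

On the second run, only the stage that changed is executed, which is the situation described for "Data Conversion processing C".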

With a small amount of data this may not be necessary, but it becomes worthwhile once the input data exceeds a few megabytes.

In one system I was developing, it took over 40 seconds to process the data and output the result.

After applying memoization with KurumiPy, it took less than 1 second to output the result.

When you develop a data analysis system by trial and error, you often want to change the value of a variable slightly and re-execute to see the result.

If it takes many seconds to output the result each time, that becomes very stressful.

That is where KurumiPy comes in.

If you cache using a DB ...

If you make good use of a DB (database), you do not have to repeat the same processing many times.

In that case, you do not need a memoization tool.

However, in a hypothesis verification phase or when building a prototype, setting up a DB is troublesome.

Even if you use a DB, it may be difficult to optimize it well.

That is when a memoization tool comes in handy.

How to use KurumiPy

Python 3.x is supported.

Install the dependency package "fasteners 0.14.1" beforehand, for example as follows.

pip install fasteners==0.14.1

Copy the 'memoization' folder into your project, then import the module. Add the decorator to the functions you want to memoize.

from memoization.memo_decorator import memo

@memo
def your_function(n):
    # ...

That is all it takes to apply memoization.

Cache files are stored in the following folder.

[./memoization/memocache]

Changes in your function
- KurumiPy automatically invalidates the cache when you change the implementation of the target function. This is useful for test-driven development.
Changes in dependent variables
- KurumiPy automatically invalidates the cache when any dependent variable of the target function changes. This is useful when you modify parameters and re-execute your program.
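One way such invalidation could work is to make a fingerprint of the function's code part of the cache key, so that an edited function no longer matches its old cache entries. The sketch below covers only the first case (a changed implementation) and is an assumption for illustration; how KurumiPy actually detects changes is not described here, and `code_fingerprint` and the `scale_*` functions are hypothetical names.

```python
import hashlib

def code_fingerprint(func):
    """Hash the compiled bytecode, constants and referenced names of a function."""
    code = func.__code__
    payload = repr((code.co_code, code.co_consts, code.co_names))
    return hashlib.sha256(payload.encode()).hexdigest()

def scale_v1(x):
    return x * 2

def scale_v1_copy(x):  # same body as scale_v1
    return x * 2

def scale_v2(x):       # "edited" version of scale_v1
    return x * 3

same = code_fingerprint(scale_v1) == code_fingerprint(scale_v1_copy)
changed = code_fingerprint(scale_v1) != code_fingerprint(scale_v2)
```

If the fingerprint is stored together with each cached result, a mismatch signals that the cached value came from an older implementation and should be recomputed.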

Apart from this, cache files keep accumulating in the folder above, so delete them when necessary.

Restrictions

Functions must meet the following restrictions.

Pure functions
Arguments are of string type, numeric type, etc. (Not supported: list type, dict type, set type, file objects, etc.)
Not supported: mutually recursive functions in multiple threads (Dining Philosophers Problem).
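A plausible reason for the argument restriction, stated here as an assumption rather than as a description of KurumiPy's internals, is that a cache key has to be derived from the arguments, and Python's built-in `hash` rejects mutable containers. The hypothetical helper `is_hashable` below shows which values pass that check.

```python
def is_hashable(value):
    """Return True if Python's built-in hash() accepts the value."""
    try:
        hash(value)
    except TypeError:
        return False
    return True

results = {
    "str": is_hashable("sensor-01"),
    "int": is_hashable(42),
    "float": is_hashable(3.14),
    "tuple": is_hashable((1, 2, 3)),
    "list": is_hashable([1, 2, 3]),
    "dict": is_hashable({"a": 1}),
    "set": is_hashable({1, 2}),
}
```

Strings, numbers and tuples of them are hashable, while lists, dicts and sets are not. Whether KurumiPy accepts tuples is not stated in this article, so check the repository before relying on that.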

Other settings

Since Python 3.3, a random "salt" is applied by default to the hashes of str, bytes and datetime objects.

Because of this salt, hash values are not consistent across processes.

You need to disable it before running your Python programs.

# bash
export PYTHONHASHSEED=0

# Command prompt
set PYTHONHASHSEED=0
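The effect of this setting can be checked with a small experiment that starts fresh Python processes and compares the hash of the same string. The helper name `string_hash_in_new_process` is hypothetical; the sketch assumes Python 3.7 or later for `capture_output`.

```python
import os
import subprocess
import sys

def string_hash_in_new_process(text, seed=None):
    """Run hash(text) in a fresh interpreter, optionally with PYTHONHASHSEED set."""
    env = dict(os.environ)
    env.pop("PYTHONHASHSEED", None)       # start from the default (randomized) setting
    if seed is not None:
        env["PYTHONHASHSEED"] = str(seed)
    completed = subprocess.run(
        [sys.executable, "-c", f"print(hash({text!r}))"],
        env=env, capture_output=True, text=True, check=True,
    )
    return int(completed.stdout)

# With the seed fixed to 0, two separate processes agree on the hash value.
fixed_a = string_hash_in_new_process("kurumi", seed=0)
fixed_b = string_hash_in_new_process("kurumi", seed=0)

# Without a fixed seed, each process picks its own salt, so these usually differ.
random_a = string_hash_in_new_process("kurumi")
random_b = string_hash_in_new_process("kurumi")
```

A cache that relies on hash values and is shared between runs needs the stable behavior of the first pair.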

Why is it called "KurumiPy"?

Kurumi means walnut in Japanese.

Walnuts are said to improve memory.

That is why we chose the name "Kurumi": it is a good match for memoization.

Let's develop smartly and efficiently with KurumiPy!
