DEV Community

Sharath Hebbar

Posted on

Joblib

Joblib is a set of tools that provides lightweight pipelining in Python. In particular, it offers transparent disk-caching of functions with lazy re-evaluation (the memoize pattern) and easy, simple parallel computing.

Why use it?

  • Better performance
  • Reproducibility
  • Avoid computing the same thing twice
  • Persist to disk transparently

Features

  • Transparent and fast disk-caching of output values
  • Embarrassingly parallel helper
  • Fast compressed persistence
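Disk caching relies on hashing a function's inputs: the same arguments produce the same key, so a stored result can be reused instead of recomputed. Joblib exposes its hasher as `joblib.hash`. The sketch below only illustrates that idea; in normal use `Memory` does this hashing for you.

```python
import numpy as np
from joblib import hash as joblib_hash

x = np.arange(3)

key_1 = joblib_hash(x)
key_2 = joblib_hash(np.arange(3))  # equal contents, separate object
key_3 = joblib_hash(np.arange(4))  # different contents

print(key_1 == key_2)  # True: identical data gives an identical key
print(key_1 == key_3)  # False: any change in the data changes the key
print(len(key_1))      # 32 hex characters (MD5 is the default hash)
```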

Importing libraries

```python
from joblib import Memory, Parallel, delayed, dump, load
import numpy as np
```

Data Creation

```python
my_dir = '/content/sample_data'  # output/cache directory (a Google Colab path)

# Vandermonde matrix of [0, 1, 2]: columns are x**2, x, 1
a = np.vander(np.arange(3))
print(a)
# [[0 0 1]
#  [1 1 1]
#  [4 2 1]]
```

Memory

```python
mem = Memory(my_dir)         # cached results are stored under my_dir
sqr = mem.cache(np.square)   # wrap np.square with a disk cache
b = sqr(a)
print(b)
# [[ 0  0  1]
#  [ 1  1  1]
#  [16  4  1]]
```
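To see the memoize pattern at work, here is a small sketch that counts how often the cached function body actually runs. `slow_square` and the `calls` counter are hypothetical names used only for illustration, and a temporary directory stands in for `/content/sample_data` so the example runs outside Colab.

```python
import tempfile

import numpy as np
from joblib import Memory

calls = {"n": 0}  # hypothetical counter, only used to observe cache hits

cache_dir = tempfile.mkdtemp()
mem = Memory(cache_dir, verbose=0)  # verbose=0 silences the cache log messages


@mem.cache
def slow_square(x):
    # The body only runs on a cache miss; hits are loaded from disk.
    calls["n"] += 1
    return np.square(x)


a = np.vander(np.arange(3))
first = slow_square(a)       # miss: computes and stores the result
second = slow_square(a)      # hit: same input, body is skipped
third = slow_square(a + 1)   # new input, so it is computed again

print(calls["n"])  # 2
```

Only a change in the inputs (or in the function's code) triggers re-evaluation; repeated calls with the same arguments are served from the cache directory.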

Parallel

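Each snippet below is a separate notebook cell, since `%%time` must be the first line of a cell.

```python
%%time
Parallel(n_jobs=1)(delayed(np.square)(i) for i in range(10))
# CPU times: user 2.85 ms, sys: 0 ns, total: 2.85 ms
# Wall time: 3 ms
# [0, 1, 4, 9, 16, 25, 36, 49, 64, 81]
```

```python
%%time
Parallel(n_jobs=2)(delayed(np.square)(i) for i in range(10))
# CPU times: user 42.7 ms, sys: 762 µs, total: 43.5 ms
# Wall time: 75.9 ms
# [0, 1, 4, 9, 16, 25, 36, 49, 64, 81]
```

```python
%%time
Parallel(n_jobs=3)(delayed(np.square)(i) for i in range(10))
# CPU times: user 92.9 ms, sys: 8.93 ms, total: 102 ms
# Wall time: 151 ms
# [0, 1, 4, 9, 16, 25, 36, 49, 64, 81]
```

Note that the wall time grows with n_jobs here: squaring ten integers is so cheap that the cost of starting workers and dispatching tasks outweighs any gain. Parallelism pays off only when each task does substantial work.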
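Parallel execution starts to help once individual tasks are slow. The sketch below uses a hypothetical `slow_square` that sleeps for a quarter second to imitate an I/O-bound task; with `prefer="threads"`, joblib runs the four calls in a thread pool so the waits overlap. Exact timings vary by machine.

```python
import time

from joblib import Parallel, delayed


def slow_square(x):
    # Hypothetical stand-in for an expensive task. time.sleep releases
    # the GIL, so threads can wait concurrently.
    time.sleep(0.25)
    return x * x


start = time.perf_counter()
results = Parallel(n_jobs=4, prefer="threads")(
    delayed(slow_square)(i) for i in range(4)
)
elapsed = time.perf_counter() - start

print(results)             # [0, 1, 4, 9], returned in input order
print(f"{elapsed:.2f} s")  # roughly 0.25 s, versus about 1 s for a serial loop
```

Threads suit tasks that release the GIL (sleeping, I/O, many NumPy operations); for CPU-bound pure-Python work, joblib's default process-based backend is usually the better choice.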

Dump

`dump` writes the object to disk and returns the list of files it created.

```python
dump(a, '/content/sample_data/a.job')
# ['/content/sample_data/a.job']
```

Load

```python
aa = load('/content/sample_data/a.job')
print(aa)
# [[0 0 1]
#  [1 1 1]
#  [4 2 1]]
```
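The feature list mentions fast compressed persistence, which the example above does not use. `dump` accepts a `compress` argument (for example an integer level from 0 to 9). Here is a short sketch comparing file sizes for a highly compressible array, written to a temporary directory rather than the Colab path; the file names are arbitrary.

```python
import os
import tempfile

import numpy as np
from joblib import dump, load

data = np.zeros((1000, 1000))  # 8 MB of float64 zeros, very compressible

out_dir = tempfile.mkdtemp()
raw_path = os.path.join(out_dir, "raw.joblib")
zip_path = os.path.join(out_dir, "compressed.joblib")

dump(data, raw_path)              # no compression (the default)
dump(data, zip_path, compress=3)  # compression level 3

raw_size = os.path.getsize(raw_path)
zip_size = os.path.getsize(zip_path)
print(raw_size, zip_size)  # the compressed file is far smaller

restored = load(zip_path)  # load detects the compression automatically
print(np.array_equal(restored, data))  # True
```

Higher levels trade speed for smaller files; a low level such as 3 is a common compromise.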

References

Documentation: https://joblib.readthedocs.io
Download: https://pypi.python.org/pypi/joblib#downloads
Source code: https://github.com/joblib/joblib
Report issues: https://github.com/joblib/joblib/issues

Source:
https://github.com/SharathHebbar/Data-Science-and-ML/tree/main/codes/joblib
