Sharath Hebbar
Joblib
Joblib is a set of tools that provide lightweight pipelining in Python. In particular:
- transparent disk-caching of functions and lazy re-evaluation (memoize pattern)
- easy, simple parallel computing
Why use it?
- Better performance
- Reproducibility
- Avoid computing the same thing twice
- Persist to disk transparently
Features
- Transparent and fast disk-caching of output values
- Embarrassingly parallel helper
- Fast compressed persistence
Importing libraries
```python
from joblib import Memory, Parallel, delayed, dump, load
import numpy as np
```
Data Creation
```python
my_dir = '/content/sample_data'  # directory used for the cache and the dumped files
a = np.vander(np.arange(3))
print(a)
```
Output:
```
[[0 0 1]
 [1 1 1]
 [4 2 1]]
```
Memory
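Wrapping a function with mem.cache stores its results in my_dir. Calling it again with the same arguments loads the result from disk instead of recomputing it.
```python
mem = Memory(my_dir)
sqr = mem.cache(np.square)
b = sqr(a)
print(b)
```
Output:
```
[[ 0  0  1]
 [ 1  1  1]
 [16  4  1]]
```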
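To check that a repeated call really skips the computation, here is a small sketch that is not from the original post. It wraps a hypothetical slow_square function that increments a call counter, uses a temporary directory as the cache location (an assumption, standing in for my_dir), and passes verbose=0 only to silence joblib's log lines.

```python
import tempfile

import numpy as np
from joblib import Memory

# Hypothetical counter, used only to observe when the function body actually runs.
call_count = 0

def slow_square(x):
    """Square an array; the counter shows whether joblib re-ran the body."""
    global call_count
    call_count += 1
    return np.square(x)

# Assumption: a throwaway temporary directory instead of '/content/sample_data'.
cache_dir = tempfile.mkdtemp()
mem = Memory(cache_dir, verbose=0)
cached_square = mem.cache(slow_square)

a = np.vander(np.arange(3))
b1 = cached_square(a)      # first call: computed and written to the cache
b2 = cached_square(a)      # same input: loaded from disk, body not executed
c = cached_square(a + 1)   # different input: computed again
print(call_count)
```

The counter ends at 2: the repeated call with the same array is served from the cache, while the call with a + 1 hashes to a new input and is recomputed.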
Parallel
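The %%time cell magic (Jupyter/IPython) reports how long each cell takes.
```python
%%time
Parallel(n_jobs=1)(delayed(np.square)(i) for i in range(10))
```
Output:
```
CPU times: user 2.85 ms, sys: 0 ns, total: 2.85 ms
Wall time: 3 ms
[0, 1, 4, 9, 16, 25, 36, 49, 64, 81]
```
```python
%%time
Parallel(n_jobs=2)(delayed(np.square)(i) for i in range(10))
```
Output:
```
CPU times: user 42.7 ms, sys: 762 µs, total: 43.5 ms
Wall time: 75.9 ms
[0, 1, 4, 9, 16, 25, 36, 49, 64, 81]
```
```python
%%time
Parallel(n_jobs=3)(delayed(np.square)(i) for i in range(10))
```
Output:
```
CPU times: user 92.9 ms, sys: 8.93 ms, total: 102 ms
Wall time: 151 ms
[0, 1, 4, 9, 16, 25, 36, 49, 64, 81]
```
More jobs made this slower, not faster: each task squares a single integer, so the overhead of starting workers and dispatching tasks to them outweighs the work itself (wall time grows from 3 ms with one job to 151 ms with three). Parallelism only pays off when each task does substantially more work.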
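One common way to reduce that overhead is to give each task a larger slice of work. The sketch below illustrates the pattern under stated assumptions and is not a benchmark: square_chunk is a hypothetical helper, the split into 4 chunks is arbitrary, and prefer="threads" is chosen because NumPy ufuncs release the GIL on numeric arrays, so threads can run them without copying data between processes. Whether it beats a plain np.square(x) depends on the array size and the machine.

```python
import numpy as np
from joblib import Parallel, delayed

def square_chunk(chunk):
    """Hypothetical helper: square one block of the array."""
    return np.square(chunk)

x = np.arange(1_000_000, dtype=np.float64)

# Each task gets a large slice instead of a single scalar, so the per-task
# dispatch overhead is amortised. The chunk count (4) is an arbitrary choice.
chunks = np.array_split(x, 4)

# prefer="threads" keeps the work in one process (no pickling of chunks).
results = Parallel(n_jobs=2, prefer="threads")(
    delayed(square_chunk)(chunk) for chunk in chunks
)

# Parallel returns results in input order, so the chunks can be rejoined directly.
y = np.concatenate(results)
```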
Dump
dump returns the list of files it wrote.
```python
dump(a, '/content/sample_data/a.job')
```
Output:
```
['/content/sample_data/a.job']
```
Load
```python
aa = load('/content/sample_data/a.job')
print(aa)
```
Output:
```
[[0 0 1]
 [1 1 1]
 [4 2 1]]
```
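The features list mentions fast compressed persistence. A minimal sketch of it follows, assuming a temporary directory in place of /content/sample_data and an all-zeros array (chosen only because it compresses well). compress=3 writes a zlib-compressed file, and an uncompressed dump can be reopened with mmap_mode="r" so the array is memory-mapped instead of read fully into RAM. The file names are hypothetical.

```python
import os
import tempfile

import numpy as np
from joblib import dump, load

# Assumption: a temporary directory stands in for '/content/sample_data'.
out_dir = tempfile.mkdtemp()
big = np.zeros((1000, 1000))  # highly repetitive data, so it compresses well

raw_path = os.path.join(out_dir, "big_raw.job")    # hypothetical file name
zip_path = os.path.join(out_dir, "big_zlib3.job")  # hypothetical file name

dump(big, raw_path)              # no compression (the default)
dump(big, zip_path, compress=3)  # zlib, compression level 3

raw_size = os.path.getsize(raw_path)
zip_size = os.path.getsize(zip_path)

# Memory-map the uncompressed file read-only; load the compressed one normally.
mapped = load(raw_path, mmap_mode="r")
restored = load(zip_path)
```

joblib does not memory-map compressed files, which is why only the uncompressed dump is opened with mmap_mode.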
References
Documentation: https://joblib.readthedocs.io
Download: https://pypi.python.org/pypi/joblib#downloads
Source code: https://github.com/joblib/joblib
Report issues: https://github.com/joblib/joblib/issues