Python: Cache function output
Joblib tracks the arguments passed to a function; if the function has already been called with the same arguments, it returns the result cached on disk instead of recomputing it.
With only a few lines of code, you get caching of the output of any function:
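import tempfile
import joblib  # sklearn.externals.joblib was removed from recent scikit-learn releases

# Store cached results in a fresh temporary directory
memory = joblib.Memory(
    location=tempfile.mkdtemp(),  # called "cachedir" in older joblib versions
    verbose=0,
)

@memory.cache
def computation(p1, p2):
    pass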
Via Roman Kierzkowski.
To clear the cache, use memory.clear().
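A minimal sketch of the behaviour described above, assuming joblib is installed as a standalone package. The function slow_square and the calls list are hypothetical names, used here only to count how many times the function body actually runs:

```python
import tempfile

from joblib import Memory

memory = Memory(location=tempfile.mkdtemp(), verbose=0)
calls = []  # records every real evaluation of the function body


@memory.cache
def slow_square(x):
    calls.append(x)  # side effect: only happens on a cache miss
    return x * x


first = slow_square(4)           # computed and written to disk
second = slow_square(4)          # same argument: read back from disk
calls_before_clear = len(calls)

memory.clear(warn=False)         # wipe the on-disk cache
third = slow_square(4)           # cache is empty again, so this recomputes
calls_after_clear = len(calls)
```

Because the body has a side effect, the counter makes the cache hit visible: the second call returns the stored value without running the body.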
Comparison with memoize
The memoize decorator caches all the inputs and outputs of a function call in memory. Using memoize with large objects can exhaust memory, whereas Memory persists objects to disk, using a persister optimized for speed and memory usage. Memoize is best suited to functions with “small” input and output objects; Memory is best suited to functions with complex input and output objects and to aggressive persistence to disk.
Via PythonHosted.
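For contrast, here is a rough sketch of in-memory memoization. It assumes that a typical memoize decorator behaves like functools.lru_cache from the standard library, which may not match the exact decorator the quoted text has in mind; memoized_square and calls are hypothetical names:

```python
from functools import lru_cache

calls = []  # records every real evaluation of the function body


@lru_cache(maxsize=None)  # unbounded cache held in process memory
def memoized_square(x):
    calls.append(x)
    return x * x


memoized_square(3)  # computed
memoized_square(3)  # served from the in-memory cache

info = memoized_square.cache_info()
```

Everything stored this way lives in the process and disappears when it exits, which is why it suits small, hashable inputs and outputs rather than large arrays.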
Gotchas
Memory is designed for pure functions, and using it for methods is not recommended. If you want to use caching inside a class, the recommended pattern is to cache a pure function and call the cached function from your class, like this:
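@memory.cache
def compute_func(arg1, arg2, arg3):
    # long computation goes here; this placeholder just packs the arguments
    result = (arg1, arg2, arg3)
    return result


class Foo(object):
    def __init__(self, arg1, arg2):
        self.arg1 = arg1
        self.arg2 = arg2
        self.data = None

    def compute(self):
        # Delegate to the cached pure function instead of caching a method
        self.data = compute_func(self.arg1, self.arg2, 40)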
Using Memory for methods is not recommended: it has caveats that make it fragile from a maintenance point of view, because they are easy to forget as your software evolves. If you still want to do it, here are a few known caveats. First, you cannot decorate a method at class definition time. The following code won’t work:
class Foo(object):

    @memory.cache  # WRONG
    def method(self, args):
        pass
The right way to do this is to decorate at instantiation time:
class Foo(object):

    def __init__(self, args):
        # Wrap the bound method once the instance exists
        self.method = memory.cache(self.method)

    def method(self, args):
        pass
Second, the cached method will have self as one of its arguments. That means the result will be recomputed whenever anything in self changes.
Via PythonHosted.
Ignoring some arguments
It may be useful not to recalculate a function when certain arguments change, for instance a debug flag. Memory provides the ignore list:
>>> @memory.cache(ignore=['debug'])
... def my_func(x, debug=True):
...     print('Called with x = %s' % x)
>>> my_func(0)
Called with x = 0
>>> my_func(0, debug=False)
>>> my_func(0, debug=True)
>>> # my_func was not reevaluated
Via PythonHosted.
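The same idea as a script, extended to show that arguments not on the ignore list still trigger recomputation. This is a sketch that assumes joblib is installed as a standalone package; the evaluations list is a hypothetical helper that records real calls:

```python
import tempfile

from joblib import Memory

memory = Memory(location=tempfile.mkdtemp(), verbose=0)
evaluations = []  # records every real evaluation of the function body


@memory.cache(ignore=["debug"])
def my_func(x, debug=True):
    evaluations.append(x)
    return x + 1


a = my_func(0)               # computed
b = my_func(0, debug=False)  # "debug" is ignored, so this is a cache hit
c = my_func(1, debug=False)  # "x" changed, so this is recomputed
```

Ignored arguments are left out of the cache key, so they should not influence the return value; otherwise a stale result may be returned.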