lonelyenvoy/python-memoizationPublic

NotificationsYou must be signed in to change notification settings
Fork15
Star240

A powerful caching library for Python, with TTL support and multiple algorithm options.

License

MIT license

240 stars 15 forks Branches Tags Activity

Star

Notifications

You must be signed in to change notification settings

Branches Tags

Folders and files

Name		Name	Last commit message	Last commit date
Latest commit History 145 Commits
memoization		memoization
.coveragerc		.coveragerc
.gitignore		.gitignore
.mypy.ini		.mypy.ini
.travis.yml		.travis.yml
CONTRIBUTING.md		CONTRIBUTING.md
CONTRIBUTORS.md		CONTRIBUTORS.md
LICENSE		LICENSE
README.md		README.md
examples.py		examples.py
setup.py		setup.py
test.py		test.py

Repository files navigation

python-memoization

A powerful caching library for Python, with TTL support and multiple algorithm options.

If you like this work, pleasestar it on GitHub.

Why choose this library?

Perhaps you know aboutfunctools.lru_cachein Python 3, and you may be wondering why we are reinventing the wheel.

Well, actually not. This lib is based onfunctools. Please find below the comparison withlru_cache.

Features	`functools.lru_cache`	`memoization`
Configurable max size	✔️	✔️
Thread safety	✔️	✔️
Flexible argument typing (typed & untyped)	✔️	Always typed
Cache statistics	✔️	✔️
LRU (Least Recently Used) as caching algorithm	✔️	✔️
LFU (Least Frequently Used) as caching algorithm	No support	✔️
FIFO (First In First Out) as caching algorithm	No support	✔️
Extensibility for new caching algorithms	No support	✔️
TTL (Time-To-Live) support	No support	✔️
Support for unhashable arguments (dict, list, etc.)	No support	✔️
Custom cache keys	No support	✔️
On-demand partial cache clearing	No support	✔️
Iterating through the cache	No support	✔️
Python version	3.2+	3.4+

memoization solves some drawbacks offunctools.lru_cache:

lru_cache does not supportunhashable types, which means function arguments cannot contain dict or list.

>>>fromfunctoolsimportlru_cache>>>@lru_cache()...deff(x):returnx...>>>f([1,2])# unsupportedTraceback (mostrecentcalllast):File"<stdin>",line1,in<module>TypeError:unhashabletype:'list'

lru_cache is vulnerable tohash collision attackand can be hacked or compromised. Using this technique, attackers can make your programunexpectedly slow byfeeding the cached function with certain cleverly designed inputs. However, inmemoization, caching is alwaystyped, which meansf(3) andf(3.0) will be treated as different calls and cached separately. Also,you can build your own cache key with a unique hashing strategy. These measuresprevents the attack fromhappening (or at least makes it a lot harder).

>>>hash((1,))3430019387558>>>hash(3430019387558.0)# two different arguments with an identical hash value3430019387558

Unlikelru_cache,memoization is designed to be highly extensible, which make it easy for developers to add and integrateany caching algorithms (beyond FIFO, LRU and LFU) into this library. SeeContributing Guidance for further detail.

Installation

pip install -U memoization

1-Minute Tutorial

frommemoizationimportcached@cacheddeffunc(arg):    ...# do something slow

Simple enough - the results offunc() are cached.Repetitive calls tofunc() with the same arguments runfunc() only once, enhancing performance.

⚠️WARNING: for functions with unhashable arguments, the default setting may not enablememoization to work properly. Seecustom cache keys section below for details.

15-Minute Tutorial

You will learn about the advanced features in the following tutorial, which enable you to customizememoization .

Configurable options includettl,max_size,algorithm,thread_safe,order_independent andcustom_key_maker.

TTL (Time-To-Live)

@cached(ttl=5)# the cache expires after 5 secondsdefexpensive_db_query(user_id):    ...

For impure functions, TTL (in second) will be a solution. This will be useful when the function returns resources that is valid only for a short time, e.g. fetching something from databases.

Limited cache capacity

@cached(max_size=128)# the cache holds no more than 128 itemsdefget_a_very_large_object(filename):    ...

By default, if you don't specifymax_size, the cache can hold unlimited number of items.When the cache is fully occupied, the former data will be overwritten by a certain algorithm described below.

Choosing your caching algorithm

frommemoizationimportcached,CachingAlgorithmFlag@cached(max_size=128,algorithm=CachingAlgorithmFlag.LFU)# the cache overwrites items using the LFU algorithmdeffunc(arg):    ...

Possible values foralgorithm are:

CachingAlgorithmFlag.LRU:Least Recently Used (default)
CachingAlgorithmFlag.LFU:Least Frequently Used
CachingAlgorithmFlag.FIFO:First In First Out

This option is valid only when amax_size is explicitly specified.

Thread safe?

@cached(thread_safe=False)deffunc(arg):    ...

thread_safe isTrue by default. Setting it toFalse enhances performance.

Order-independent cache key

By default, the following function calls will be treated differently and cached twice, which means the cache misses at the second call.

func(a=1,b=1)func(b=1,a=1)

You can avoid this behavior by passing anorder_independent argument to the decorator, although it will slow down the performance a little bit.

@cached(order_independent=True)deffunc(**kwargs):    ...

Custom cache keys

Prior to memorize your function inputs and outputs (i.e. putting them into a cache),memoization needs tobuild acache key using the inputs, so that the outputs can be retrieved later.

By default,memoization tries to combine all your functionarguments and calculate its hash value usinghash(). If it turns out that parts of your arguments areunhashable,memoization will fall back to turning them into a string usingstr(). This behavior relieson the assumption that the string exactly represents the internal state of the arguments, which is true forbuilt-in types.

However, this is not true for all objects.If you pass objects which areinstances of non-built-in classes, sometimes you will need to override the default key-making procedure,because thestr() function on these objects may not hold the correct information about their states.

Here are some suggestions.Implementations of a valid key maker:

MUST be a function with the same signature as the cached function.
MUST produce unique keys, which means two sets of different arguments always map to two different keys.
MUST produce hashable keys, and a key is comparable with another key (memoization only needs to check for their equality).
should compute keys efficiently and produce small objects as keys.

Example:

defget_employee_id(employee):returnemployee.id# returns a string or a integer@cached(custom_key_maker=get_employee_id)defcalculate_performance(employee):    ...

Note that writing a robust key maker function can be challenging in some situations. If you find it difficult,feel free to ask for help by submitting anissue.

Knowing how well the cache is behaving

>>>@cached...deff(x):returnx...>>>f.cache_info()CacheInfo(hits=0,misses=0,current_size=0,max_size=None,algorithm=<CachingAlgorithmFlag.LRU:2>,ttl=None,thread_safe=True,order_independent=False,use_custom_key=False)

Withcache_info, you can retrieve the number ofhits andmisses of the cache, and other information indicating the caching status.

hits: the number of cache hits
misses: the number of cache misses
current_size: the number of items that were cached
max_size: the maximum number of items that can be cached (user-specified)
algorithm: caching algorithm (user-specified)
ttl: Time-To-Live value (user-specified)
thread_safe: whether the cache is thread safe (user-specified)
order_independent: whether the cache is kwarg-order-independent (user-specified)
use_custom_key: whether a custom key maker is used

Other APIs

Access the original undecorated functionf byf.__wrapped__.
Clear the cache byf.cache_clear().
Check whether the cache is empty byf.cache_is_empty().
Check whether the cache is full byf.cache_is_full().
DisableSyntaxWarning bymemoization.suppress_warnings().

Advanced API References

Details

Checking whether the cache contains something

cache_contains_argument(function_arguments, alive_only)

Return True if the cache contains a cached item with the specified function call arguments:param function_arguments:  Can be a list, a tuple or a dict.                            - Full arguments: use a list to represent both positional arguments and keyword                              arguments. The list contains two elements, a tuple (positional arguments) and                              a dict (keyword arguments). For example,                                f(1, 2, 3, a=4, b=5, c=6)                              can be represented by:                                [(1, 2, 3), {'a': 4, 'b': 5, 'c': 6}]                            - Positional arguments only: when the arguments does not include keyword arguments,                              a tuple can be used to represent positional arguments. For example,                                f(1, 2, 3)                              can be represented by:                                (1, 2, 3)                            - Keyword arguments only: when the arguments does not include positional arguments,                              a dict can be used to represent keyword arguments. For example,                                f(a=4, b=5, c=6)                              can be represented by:                                {'a': 4, 'b': 5, 'c': 6}:param alive_only:          Whether to check alive cache item only (default to True).:return:                    True if the desired cached item is present, False otherwise.

cache_contains_result(return_value, alive_only)

Return True if the cache contains a cache item with the specified user function return value. O(n) timecomplexity.:param return_value:        A return value coming from the user function.:param alive_only:          Whether to check alive cache item only (default to True).:return:                    True if the desired cached item is present, False otherwise.

Iterating through the cache

cache_arguments()

Get user function arguments of all alive cache elementssee also: cache_items()Example:   @cached   def f(a, b, c, d):       ...   f(1, 2, c=3, d=4)   for argument in f.cache_arguments():       print(argument)  # ((1, 2), {'c': 3, 'd': 4}):return: an iterable which iterates through a list of a tuple containing a tuple (positional arguments) and        a dict (keyword arguments)

cache_results()

Get user function return values of all alive cache elementssee also: cache_items()Example:   @cached   def f(a):       return a   f('hello')   for result in f.cache_results():       print(result)  # 'hello':return: an iterable which iterates through a list of user function result (of any type)

cache_items()

Get cache items, i.e. entries of all alive cache elements, in the form of (argument, result).argument: a tuple containing a tuple (positional arguments) and a dict (keyword arguments).result: a user function return value of any type.see also: cache_arguments(), cache_results().Example:   @cached   def f(a, b, c, d):       return 'the answer is ' + str(a)   f(1, 2, c=3, d=4)   for argument, result in f.cache_items():       print(argument)  # ((1, 2), {'c': 3, 'd': 4})       print(result)    # 'the answer is 1':return: an iterable which iterates through a list of (argument, result) entries

cache_for_each()

Perform the given action for each cache element in an order determined by the algorithm until allelements have been processed or the action throws an error:param consumer:           an action function to process the cache elements. Must have 3 arguments:                             def consumer(user_function_arguments, user_function_result, is_alive): ...                           user_function_arguments is a tuple holding arguments in the form of (args, kwargs).                             args is a tuple holding positional arguments.                             kwargs is a dict holding keyword arguments.                             for example, for a function: foo(a, b, c, d), calling it by: foo(1, 2, c=3, d=4)                             user_function_arguments == ((1, 2), {'c': 3, 'd': 4})                           user_function_result is a return value coming from the user function.                           is_alive is a boolean value indicating whether the cache is still alive                           (if a TTL is given).

Removing something from the cache

cache_clear()

Clear the cache and its statistics information

cache_remove_if(predicate)

Remove all cache elements that satisfy the given predicate:param predicate:           a predicate function to judge whether the cache elements should be removed. Must                            have 3 arguments, and returns True or False:                              def consumer(user_function_arguments, user_function_result, is_alive): ...                            user_function_arguments is a tuple holding arguments in the form of (args, kwargs).                              args is a tuple holding positional arguments.                              kwargs is a dict holding keyword arguments.                              for example, for a function: foo(a, b, c, d), calling it by: foo(1, 2, c=3, d=4)                              user_function_arguments == ((1, 2), {'c': 3, 'd': 4})                            user_function_result is a return value coming from the user function.                            is_alive is a boolean value indicating whether the cache is still alive                            (if a TTL is given).:return:                    True if at least one element is removed, False otherwise.

Q&A

Q: There are duplicated code inmemoization and most of them can be eliminated by using another level ofabstraction (e.g. classes and multiple inheritance). Why not refactor?
A: We would like to keep the code in a proper level of abstraction. However, these abstractions make it run slower.As this is a caching library focusing on speed, we have to give up some elegance for better performance. Refactoringis our future work.
Q: I have submitted an issue and not received a reply for a long time. Anyone can help me?
A: Sorry! We are not working full-time, but working voluntarily on this project, so you might experience some delay.We appreciate your patience.

Contributing

This project welcomes contributions from anyone.

Read Contributing Guidance first.
Submit bugs and help us verify fixes.
Submit pull requests for bug fixes and features and discuss existing proposals. Please make sure that your PR passes the tests intest.py.
See contributors of this project.

License

The MIT License

About

A powerful caching library for Python, with TTL support and multiple algorithm options.

Releases12

Version 0.4.0 Latest

Aug 1, 2021

+ 11 releases

Packages

No packages published

Contributors2

Languages

Python100.0%

Movatterモバイル変換

License

lonelyenvoy/python-memoization

Folders and files

Latest commit

History

Repository files navigation

python-memoization

Why choose this library?

Installation

1-Minute Tutorial

15-Minute Tutorial

TTL (Time-To-Live)

Limited cache capacity

Choosing your caching algorithm

Thread safe?

Order-independent cache key

Custom cache keys

Knowing how well the cache is behaving

Other APIs

Advanced API References

Checking whether the cache contains something

cache_contains_argument(function_arguments, alive_only)

cache_contains_result(return_value, alive_only)

Iterating through the cache

cache_arguments()

cache_results()

cache_items()

cache_for_each()

Removing something from the cache

cache_clear()

cache_remove_if(predicate)

Q&A

Contributing

License

About

Topics

Resources

License

Uh oh!

Stars

Watchers

Forks

Releases12

Packages0

Uh oh!

Contributors2

Uh oh!

Languages

Packages