NumPy security
Security issues can be reported privately as described in the project README and when opening a new issue on the issue tracker. The Python security reporting guidelines are a good resource, and their notes also apply to NumPy.
NumPy’s maintainers are not security experts. However, we are conscientious about security and are experts in both the NumPy codebase and how it is used. Please notify us before creating security advisories against NumPy, as we are happy to prioritize issues or help with assessing the severity of a bug. A security advisory we are not aware of beforehand can lead to a lot of work for all involved parties.
Advice for using NumPy on untrusted data
A user who can freely execute NumPy (or Python) functions must be considered to have the same privilege as the process/Python interpreter.
That said, NumPy should generally be safe to use on data provided by unprivileged users and read through safe API functions (e.g. loaded from a text file or .npy file without pickle support). Malicious values or data sizes should never lead to privilege escalation.
Note that the above refers to array data. We do not currently consider, for example, f2py to be safe: it is typically used to compile a program that is then run. Any f2py invocation must thus use the same privilege as the later execution.
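As a minimal sketch of reading a .npy file without pickle support, the snippet below passes allow_pickle=False to np.load explicitly. The helper name load_untrusted_npy is hypothetical and only used for illustration; the in-memory buffers stand in for files received from an untrusted source.

```python
import io

import numpy as np


def load_untrusted_npy(file):
    """Load a .npy file while refusing pickled (object) payloads.

    Hypothetical helper for illustration. allow_pickle=False is the
    default in current NumPy releases; it is passed explicitly here so
    the intent is visible at the call site.
    """
    return np.load(file, allow_pickle=False)


# A plain numeric array round-trips without pickle.
buf = io.BytesIO()
np.save(buf, np.arange(5, dtype=np.int64))
buf.seek(0)
safe = load_untrusted_npy(buf)

# An object array needs pickle to be stored, so loading it is refused.
buf = io.BytesIO()
np.save(buf, np.array([{"a": 1}], dtype=object), allow_pickle=True)
buf.seek(0)
try:
    load_untrusted_npy(buf)
    rejected = False
except ValueError:
    rejected = True
```

Refusing pickle only addresses code execution through unpickling. It does not bound the size of the loaded array or restrict its dtype, so those still need to be checked separately.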
The following points may be useful or should be noted when working with untrusted data:
Exhausting memory can result in an out-of-memory kill, which is a possible denial of service attack. Possible causes could be:
Functions reading text files, which may require much more memory than the original input file size.
If users can create arbitrarily shaped arrays, NumPy’s broadcasting means that intermediate or result arrays can be much larger than the inputs.
NumPy structured dtypes allow for a large amount of complexity. Fortunately, most code fails gracefully when a structured dtype is provided unexpectedly. However, code should either prevent untrusted users from providing these (e.g. via .npy files) or carefully check the fields included for nested structured/subarray dtypes.
Passing on user input should generally be considered unsafe (except for the data being read). An example would be np.dtype(user_string) or dtype=user_string.
The speed of operations can depend on values, and memory order can lead to larger temporary memory use and slower execution. This means that operations may be significantly slower or use more memory compared to simple test cases.
When reading data, consider enforcing a specific shape (e.g. one dimensional) or dtype such as float64, float32, or int64 to reduce complexity.
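Several of the points above can be sketched in code. This is a minimal illustration, not a complete defense: it maps a user-supplied dtype name through a fixed allow-list instead of passing it to np.dtype, enforces a one-dimensional array of the expected dtype, and estimates a broadcast result size before computing it. The names ALLOWED_DTYPES, MAX_ELEMENTS, parse_dtype, check_array, and checked_broadcast_shape are hypothetical, and the element limit is an arbitrary assumption to be adapted to the application.

```python
import math

import numpy as np

# Hypothetical allow-list and size limit; adjust both to the application.
ALLOWED_DTYPES = {"float64": np.float64, "float32": np.float32, "int64": np.int64}
MAX_ELEMENTS = 10_000_000


def parse_dtype(user_string):
    """Map a user-supplied name to a dtype without calling np.dtype on it."""
    try:
        return np.dtype(ALLOWED_DTYPES[user_string])
    except KeyError:
        raise ValueError(f"dtype {user_string!r} is not allowed") from None


def check_array(arr, dtype):
    """Require a one-dimensional array of exactly the expected dtype.

    Structured and subarray dtypes never compare equal to a plain
    numeric dtype, so they are rejected here as well.
    """
    if arr.dtype != dtype:
        raise ValueError(f"unexpected dtype {arr.dtype}")
    if arr.ndim != 1:
        raise ValueError(f"expected 1 dimension, got {arr.ndim}")
    return arr


def checked_broadcast_shape(*shapes):
    """Return the broadcast shape, or raise if the result would be too large."""
    shape = np.broadcast_shapes(*shapes)
    # math.prod works on Python ints, so the count cannot overflow.
    if math.prod(shape) > MAX_ELEMENTS:
        raise ValueError(f"broadcast result {shape} exceeds the element limit")
    return shape
```

For example, checked_broadcast_shape((3, 1), (1, 4)) returns (3, 4), while two inputs of shapes (100000, 1) and (1, 100000) are rejected before any large array is allocated. np.broadcast_shapes requires NumPy 1.20 or later.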
When working with non-trivial untrusted data, it is advisable to sandbox the analysis to guard against potential privilege escalation. This is especially advisable if further libraries based on NumPy are used, since these introduce additional complexity and potential security issues.