What is NumPy?#

NumPy is the fundamental package for scientific computing in Python.It is a Python library that provides a multidimensional array object,various derived objects (such as masked arrays and matrices), and anassortment of routines for fast operations on arrays, includingmathematical, logical, shape manipulation, sorting, selecting, I/O,discrete Fourier transforms, basic linear algebra, basic statisticaloperations, random simulation and much more.

At the core of the NumPy package, is thendarray object. Thisencapsulatesn-dimensional arrays of homogeneous data types, withmany operations being performed in compiled code for performance.There are several important differences between NumPy arrays and thestandard Python sequences:

  • NumPy arrays have a fixed size at creation, unlike Python lists(which can grow dynamically). Changing the size of anndarray willcreate a new array and delete the original.

  • The elements in a NumPy array are all required to be of the samedata type, and thus will be the same size in memory. The exception:one can have arrays of (Python, including NumPy) objects, therebyallowing for arrays of different sized elements.

  • NumPy arrays facilitate advanced mathematical and other types ofoperations on large numbers of data. Typically, such operations areexecuted more efficiently and with less code than is possible usingPython’s built-in sequences.

  • A growing plethora of scientific and mathematical Python-basedpackages are using NumPy arrays; though these typically supportPython-sequence input, they convert such input to NumPy arrays priorto processing, and they often output NumPy arrays. In other words,in order to efficiently use much (perhaps even most) of today’sscientific/mathematical Python-based software, just knowing how touse Python’s built-in sequence types is insufficient - one alsoneeds to know how to use NumPy arrays.

The points about sequence size and speed are particularly important inscientific computing. As a simple example, consider the case ofmultiplying each element in a 1-D sequence with the correspondingelement in another sequence of the same length. If the data arestored in two Python lists,a andb, we could iterate overeach element:

c=[]foriinrange(len(a)):c.append(a[i]*b[i])

This produces the correct answer, but ifa andb each containmillions of numbers, we will pay the price for the inefficiencies oflooping in Python. We could accomplish the same task much morequickly in C by writing (for clarity we neglect variable declarationsand initializations, memory allocation, etc.)

for(i=0;i<rows;i++){c[i]=a[i]*b[i];}

This saves all the overhead involved in interpreting the Python codeand manipulating Python objects, but at the expense of the benefitsgained from coding in Python. Furthermore, the coding work requiredincreases with the dimensionality of our data. In the case of a 2-Darray, for example, the C code (abridged as before) expands to

for(i=0;i<rows;i++){for(j=0;j<columns;j++){c[i][j]=a[i][j]*b[i][j];}}

NumPy gives us the best of both worlds: element-by-element operationsare the “default mode” when anndarray is involved, but theelement-by-element operation is speedily executed by pre-compiled Ccode. In NumPy

c=a*b

does what the earlier examples do, at near-C speeds, but with the codesimplicity we expect from something based on Python. Indeed, the NumPyidiom is even simpler! This last example illustrates two of NumPy’sfeatures which are the basis of much of its power: vectorization andbroadcasting.

Why is NumPy fast?#

Vectorization describes the absence of any explicit looping, indexing,etc., in the code - these things are taking place, of course, just“behind the scenes” in optimized, pre-compiled C code. Vectorizedcode has many advantages, among which are:

  • vectorized code is more concise and easier to read

  • fewer lines of code generally means fewer bugs

  • the code more closely resembles standard mathematical notation(making it easier, typically, to correctly code mathematicalconstructs)

  • vectorization results in more “Pythonic” code. Withoutvectorization, our code would be littered with inefficient anddifficult to readfor loops.

Broadcasting is the term used to describe the implicitelement-by-element behavior of operations; generally speaking, inNumPy all operations, not just arithmetic operations, butlogical, bit-wise, functional, etc., behave in this implicitelement-by-element fashion, i.e., they broadcast. Moreover, in theexample above,a andb could be multidimensional arrays of thesame shape, or a scalar and an array, or even two arrays withdifferent shapes, provided that the smaller array is “expandable” tothe shape of the larger in such a way that the resulting broadcast isunambiguous. For detailed “rules” of broadcasting seeBroadcasting.

Who else uses NumPy?#

NumPy fully supports an object-oriented approach, starting, onceagain, withndarray. For example,ndarray is a class, possessingnumerous methods and attributes. Many of its methods are mirrored byfunctions in the outer-most NumPy namespace, allowing the programmerto code in whichever paradigm they prefer. This flexibility has allowed theNumPy array dialect and NumPyndarray class to become thede-facto languageof multi-dimensional data interchange used in Python.