PEP: 465
Title: A dedicated infix operator for matrix multiplication
Version: d0b969132758
Last-Modified: 2016-05-03 12:03:16 +0300 (Tue, 03 May 2016)
Author: Nathaniel J. Smith <njs at pobox.com>
Status: Final
Type: Standards Track
Content-Type: text/x-rst
Created: 20-Feb-2014
Python-Version: 3.5
Post-History: 13-Mar-2014
This PEP proposes a new binary operator to be used for matrix multiplication, called @. (Mnemonic: @ is * for mATrices.)
A new binary operator is added to the Python language, together with the corresponding in-place version:
=====  =========================  ========================
Op     Precedence/associativity   Methods
=====  =========================  ========================
@      Same as *                  __matmul__, __rmatmul__
@=     n/a                        __imatmul__
=====  =========================  ========================
No implementations of these methods are added to the builtin or standard library types. However, a number of projects have reached consensus on the recommended semantics for these operations; see Intended usage details below for details.
For details on how this operator will be implemented in CPython, see Implementation details.
In numerical code, there are two important operations which compete for use of Python's * operator: elementwise multiplication, and matrix multiplication. In the nearly twenty years since the Numeric library was first proposed, there have been many attempts to resolve this tension [13]; none have been really satisfactory. Currently, most numerical Python code uses * for elementwise multiplication, and function/method syntax for matrix multiplication; however, this leads to ugly and unreadable code in common circumstances. The problem is bad enough that significant amounts of code continue to use the opposite convention (which has the virtue of producing ugly and unreadable code in different circumstances), and this API fragmentation across codebases then creates yet more problems. There does not seem to be any good solution to the problem of designing a numerical API within current Python syntax -- only a landscape of options that are bad in different ways. The minimal change to Python syntax which is sufficient to resolve these problems is the addition of a single new infix operator for matrix multiplication.
Matrix multiplication has a singular combination of features which distinguish it from other binary operations, which together provide a uniquely compelling case for the addition of a dedicated infix operator:
When we crunch numbers on a computer, we usually have lots and lots of numbers to deal with. Trying to deal with them one at a time is cumbersome and slow -- especially when using an interpreted language. Instead, we want the ability to write down simple operations that apply to large collections of numbers all at once. The n-dimensional array is the basic object that all popular numeric computing environments use to make this possible. Python has several libraries that provide such arrays, with numpy being at present the most prominent.
When working with n-dimensional arrays, there are two different ways we might want to define multiplication. One is elementwise multiplication:
[[1, 2],     [[11, 12],     [[1 * 11, 2 * 12],
 [3, 4]]  x   [13, 14]]  =   [3 * 13, 4 * 14]]
and the other is matrix multiplication [19]:
[[1, 2],     [[11, 12],     [[1 * 11 + 2 * 13, 1 * 12 + 2 * 14],
 [3, 4]]  x   [13, 14]]  =   [3 * 11 + 4 * 13, 3 * 12 + 4 * 14]]
Elementwise multiplication is useful because it lets us easily and quickly perform many multiplications on a large collection of values, without writing a slow and cumbersome for loop. And this works as part of a very general schema: when using the array objects provided by numpy or other numerical libraries, all Python operators work elementwise on arrays of all dimensionalities. The result is that one can write functions using straightforward code like a * b + c / d, treating the variables as if they were simple values, but then immediately use this function to efficiently perform this calculation on large collections of values, while keeping them organized using whatever arbitrarily complex array layout works best for the problem at hand.
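For example, with numpy (using the same small arrays as above; np.dot is the function-call spelling of matrix multiplication that this PEP aims to replace with @):

import numpy as np

a = np.array([[1, 2],
              [3, 4]])
b = np.array([[11, 12],
              [13, 14]])

# Elementwise multiplication: * combines matching entries.
a * b          # array([[11, 24],
               #        [39, 56]])

# Matrix multiplication: rows of `a` are combined with columns of `b`.
np.dot(a, b)   # array([[37, 40],
               #        [85, 92]])

# The "general schema": an expression written as if for simple scalars
# applies elementwise to whole arrays, with no explicit loops.
c = np.array([[5.0, 6.0], [7.0, 8.0]])
d = np.array([[2.0, 4.0], [2.0, 4.0]])
a * b + c / d  # still elementwise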
Matrix multiplication is more of a special case. It's only defined on 2d arrays (also known as "matrices"), and multiplication is the only operation that has an important "matrix" version -- "matrix addition" is the same as elementwise addition; there is no such thing as "matrix bitwise-or" or "matrix floordiv"; "matrix division" and "matrix to-the-power-of" can be defined but are not very useful, etc. However, matrix multiplication is still used very heavily across all numerical application areas; mathematically, it's one of the most fundamental operations there is.
Because Python syntax currently allows for only a single multiplication operator *, libraries providing array-like objects must decide: either use * for elementwise multiplication, or use * for matrix multiplication. And, unfortunately, it turns out that when doing general-purpose number crunching, both operations are used frequently, and there are major advantages to using infix rather than function call syntax in both cases. Thus it is not at all clear which convention is optimal, or even acceptable; often it varies on a case-by-case basis.
Nonetheless, network effects mean that it is very important that we pick just one convention. In numpy, for example, it is technically possible to switch between the conventions, because numpy provides two different types with different __mul__ methods. For numpy.ndarray objects, * performs elementwise multiplication, and matrix multiplication must use a function call (numpy.dot). For numpy.matrix objects, * performs matrix multiplication, and elementwise multiplication requires function syntax. Writing code using numpy.ndarray works fine. Writing code using numpy.matrix also works fine. But trouble begins as soon as we try to integrate these two pieces of code together. Code that expects an ndarray and gets a matrix, or vice-versa, may crash or return incorrect results. Keeping track of which functions expect which types as inputs, and return which types as outputs, and then converting back and forth all the time, is incredibly cumbersome and impossible to get right at any scale. Functions that defensively try to handle both types as input and DTRT, find themselves floundering into a swamp of isinstance and if statements.
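A minimal sketch of the hazard (numpy.matrix is real, though discouraged; the scale_rows helper is just an illustrative stand-in for any function written with ndarray semantics in mind):

import numpy as np

arr = np.array([[1, 2], [3, 4]])    # ndarray: * means elementwise multiply
mat = np.matrix([[1, 2], [3, 4]])   # matrix:  * means matrix multiply

arr * arr   # array([[ 1,  4], [ 9, 16]])  -- elementwise
mat * mat   # matrix([[ 7, 10], [15, 22]]) -- matrix product

def scale_rows(x):
    # Written assuming ndarray semantics: multiply each row by a weight.
    return x * np.array([[1], [10]])

scale_rows(arr)   # array([[ 1,  2], [30, 40]])  -- rows scaled, as intended
scale_rows(mat)   # matrix([[21], [43]])         -- silently a matrix product instead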
PEP 238 split / into two operators: / and //. Imagine the chaos that would have resulted if it had instead split int into two types: classic_int, whose __div__ implemented floor division, and new_int, whose __div__ implemented true division. This, in a more limited way, is the situation that Python number-crunchers currently find themselves in.
In practice, the vast majority of projects have settled on the convention of using * for elementwise multiplication, and function call syntax for matrix multiplication (e.g., using numpy.ndarray instead of numpy.matrix). This reduces the problems caused by API fragmentation, but it doesn't eliminate them. The strong desire to use infix notation for matrix multiplication has caused a number of specialized array libraries to continue to use the opposing convention (e.g., scipy.sparse, pyoperators, pyviennacl) despite the problems this causes, and numpy.matrix itself still gets used in introductory programming courses, often appears in StackOverflow answers, and so forth. Well-written libraries thus must continue to be prepared to deal with both types of objects, and, of course, are also stuck using unpleasant funcall syntax for matrix multiplication. After nearly two decades of trying, the numerical community has still not found any way to resolve these problems within the constraints of current Python syntax (see Rejected alternatives to adding a new operator below).
This PEP proposes the minimum effective change to Python syntax that will allow us to drain this swamp. It splits * into two operators, just as was done for /: * for elementwise multiplication, and @ for matrix multiplication. (Why not the reverse? Because this way is compatible with the existing consensus, and because it gives us a consistent rule that all the built-in numeric operators also apply in an elementwise manner to arrays; the reverse convention would lead to more special cases.)
So that's why matrix multiplication doesn't and can't just use *. Now, in the rest of this section, we'll explain why it nonetheless meets the high bar for adding a new operator.
Right now, most numerical code in Python uses syntax like numpy.dot(a, b) or a.dot(b) to perform matrix multiplication. This obviously works, so why do people make such a fuss about it, even to the point of creating API fragmentation and compatibility swamps?
Matrix multiplication shares two features with ordinary arithmetic operations like addition and multiplication on numbers: (a) it is used very heavily in numerical programs -- often multiple times per line of code -- and (b) it has an ancient and universally adopted tradition of being written using infix syntax. This is because, for typical formulas, this notation is dramatically more readable than any function call syntax. Here's an example to demonstrate:
One of the most useful tools for testing a statistical hypothesis is the linear hypothesis test for OLS regression models. It doesn't really matter what all those words I just said mean; if we find ourselves having to implement this thing, what we'll do is look up some textbook or paper on it, and encounter many mathematical formulas that look like:

S = (H β - r)^T (H V H^T)^{-1} (H β - r)

Here the various variables are all vectors or matrices (details for the curious: [5]).
Now we need to write code to perform this calculation. In current numpy, matrix multiplication can be performed using either the function or method call syntax. Neither provides a particularly readable translation of the formula:
import numpy as np
from numpy.linalg import inv, solve

# Using dot function:
S = np.dot((np.dot(H, beta) - r).T,
           np.dot(inv(np.dot(np.dot(H, V), H.T)), np.dot(H, beta) - r))

# Using dot method:
S = (H.dot(beta) - r).T.dot(inv(H.dot(V).dot(H.T))).dot(H.dot(beta) - r)
With the @ operator, the direct translation of the above formula becomes:
S = (H @ beta - r).T @ inv(H @ V @ H.T) @ (H @ beta - r)
Notice that there is now a transparent, 1-to-1 mapping between the symbols in the original formula and the code that implements it.
Of course, an experienced programmer will probably notice that this is not the best way to compute this expression. The repeated computation of (H β - r) should perhaps be factored out; and, expressions of the form dot(inv(A), B) should almost always be replaced by the more numerically stable solve(A, B). When using @, performing these two refactorings gives us:
# Version 1 (as above)
S = (H @ beta - r).T @ inv(H @ V @ H.T) @ (H @ beta - r)

# Version 2
trans_coef = H @ beta - r
S = trans_coef.T @ inv(H @ V @ H.T) @ trans_coef

# Version 3
S = trans_coef.T @ solve(H @ V @ H.T, trans_coef)
Notice that when comparing between each pair of steps, it's very easy to see exactly what was changed. If we apply the equivalent transformations to the code using the .dot method, then the changes are much harder to read out or verify for correctness:
# Version 1 (as above)
S = (H.dot(beta) - r).T.dot(inv(H.dot(V).dot(H.T))).dot(H.dot(beta) - r)

# Version 2
trans_coef = H.dot(beta) - r
S = trans_coef.T.dot(inv(H.dot(V).dot(H.T))).dot(trans_coef)

# Version 3
S = trans_coef.T.dot(solve(H.dot(V).dot(H.T)), trans_coef)
Readability counts! The statements using @ are shorter, contain more whitespace, can be directly and easily compared both to each other and to the textbook formula, and contain only meaningful parentheses. This last point is particularly important for readability: when using function-call syntax, the required parentheses on every operation create visual clutter that makes it very difficult to parse out the overall structure of the formula by eye, even for a relatively simple formula like this one. Eyes are terrible at parsing non-regular languages. I made and caught many errors while trying to write out the 'dot' formulas above. I know they still contain at least one error, maybe more. (Exercise: find it. Or them.) The @ examples, by contrast, are not only correct, they're obviously correct at a glance.
If we are even more sophisticated programmers, and writing code that we expect to be reused, then considerations of speed or numerical accuracy might lead us to prefer some particular order of evaluation. Because @ makes it possible to omit irrelevant parentheses, we can be certain that if we do write something like (H @ V) @ H.T, then our readers will know that the parentheses must have been added intentionally to accomplish some meaningful purpose. In the dot examples, it's impossible to know which nesting decisions are important, and which are arbitrary.
Infix @ dramatically improves matrix code usability at all stages of programmer interaction.
A large proportion of scientific code is written by people who are experts in their domain, but are not experts in programming. And there are many university courses run each year with titles like "Data analysis for social scientists" which assume no programming background, and teach some combination of mathematical techniques, introduction to programming, and the use of programming to implement these mathematical techniques, all within a 10-15 week period. These courses are more and more often being taught in Python rather than special-purpose languages like R or Matlab.
For these kinds of users, whose programming knowledge is fragile, the existence of a transparent mapping between formulas and code often means the difference between succeeding and failing to write that code at all. This is so important that such classes often use the numpy.matrix type which defines * to mean matrix multiplication, even though this type is buggy and heavily disrecommended by the rest of the numpy community for the fragmentation that it causes. This pedagogical use case is, in fact, the only reason numpy.matrix remains a supported part of numpy. Adding @ will benefit both beginning and advanced users with better syntax; and furthermore, it will allow both groups to standardize on the same notation from the start, providing a smoother on-ramp to expertise.
The world is full of continuous data, and computers are increasingly called upon to work with it in sophisticated ways. Arrays are the lingua franca of finance, machine learning, 3d graphics, computer vision, robotics, operations research, econometrics, meteorology, computational linguistics, recommendation systems, neuroscience, astronomy, bioinformatics (including genetics, cancer research, drug discovery, etc.), physics engines, quantum mechanics, geophysics, network analysis, and many other application areas. In most or all of these areas, Python is rapidly becoming a dominant player, in large part because of its ability to elegantly mix traditional discrete data structures (hash tables, strings, etc.) on an equal footing with modern numerical data types and algorithms.
We all live in our own little sub-communities, so some Python users may be surprised to realize the sheer extent to which Python is used for number crunching -- especially since much of this particular sub-community's activity occurs outside of traditional Python/FOSS channels. So, to give some rough idea of just how many numerical Python programmers are actually out there, here are two numbers: In 2013, there were 7 international conferences organized specifically on numerical Python [3] [4]. At PyCon 2014, ~20% of the tutorials appear to involve the use of matrices [6].
To quantify this further, we used Github's "search" function to look at what modules are actually imported across a wide range of real-world code (i.e., all the code on Github). We checked for imports of several popular stdlib modules, a variety of numerically oriented modules, and various other extremely high-profile modules like django and lxml (the latter of which is the #1 most downloaded package on PyPI). Starred lines indicate packages which export array- or matrix-like objects which will adopt @ if this PEP is approved:
Count of Python source files on Github matching given search terms
(as of 2014-04-10, ~21:00 UTC)

================  ==========  ===============  =======  ===========
module            "import X"  "from X import"    total  total/numpy
================  ==========  ===============  =======  ===========
sys                  2374638            63301  2437939         5.85
os                   1971515            37571  2009086         4.82
re                   1294651             8358  1303009         3.12
numpy *               337916            79065   416981         1.00
warnings              298195            73150   371345         0.89
subprocess            281290            63644   344934         0.83
django                 62795           219302   282097         0.68
math                  200084            81903   281987         0.68
threading             212302            45423   257725         0.62
pickle+cPickle        215349            22672   238021         0.57
matplotlib            119054            27859   146913         0.35
sqlalchemy             29842            82850   112692         0.27
pylab *                36754            41063    77817         0.19
scipy *                40829            28263    69092         0.17
lxml                   19026            38061    57087         0.14
zlib                   40486             6623    47109         0.11
multiprocessing        25247            19850    45097         0.11
requests               30896              560    31456         0.08
jinja2                  8057            24047    32104         0.08
twisted                13858             6404    20262         0.05
gevent                 11309             8529    19838         0.05
pandas *               14923             4005    18928         0.05
sympy                   2779             9537    12316         0.03
theano *                3654             1828     5482         0.01
================  ==========  ===============  =======  ===========
These numbers should be taken with several grains of salt (see footnote for discussion: [12]), but, to the extent they can be trusted, they suggest that numpy might be the single most-imported non-stdlib module in the entire Pythonverse; it's even more-imported than such stdlib stalwarts as subprocess, math, pickle, and threading. And numpy users represent only a subset of the broader numerical community that will benefit from the @ operator. Matrices may once have been a niche data type restricted to Fortran programs running in university labs and military clusters, but those days are long gone. Number crunching is a mainstream part of modern Python usage.
In addition, there is some precedent for adding an infix operator to handle a more-specialized arithmetic operation: the floor division operator //, like the bitwise operators, is very useful under certain circumstances when performing exact calculations on discrete values. But it seems likely that there are many Python programmers who have never had reason to use // (or, for that matter, the bitwise operators). @ is no more niche than //.
We've seen that @ makes matrix formulas dramatically easier to work with for both experts and non-experts, that matrix formulas appear in many important applications, and that numerical libraries like numpy are used by a substantial proportion of Python's user base. But numerical libraries aren't just about matrix formulas, and being important doesn't necessarily mean taking up a lot of code: if matrix formulas only occurred in one or two places in the average numerically-oriented project, then it still wouldn't be worth adding a new operator. So how common is matrix multiplication, really?
When the going gets tough, the tough get empirical. To get a rough estimate of how useful the @ operator will be, the table below shows the rate at which different Python operators are actually used in the stdlib, and also in two high-profile numerical packages -- the scikit-learn machine learning library, and the nipy neuroimaging library -- normalized by source lines of code (SLOC). Rows are sorted by the 'combined' column, which pools all three code bases together. The combined column is thus strongly weighted towards the stdlib, which is much larger than both projects put together (stdlib: 411575 SLOC, scikit-learn: 50924 SLOC, nipy: 37078 SLOC). [7]
The dot row counts how common matrix multiply operations are in each codebase.
====  ======  ============  ====  ========
op    stdlib  scikit-learn  nipy  combined
====  ======  ============  ====  ========
=       2969          5536  4932      3376  / 10,000 SLOC
-        218           444   496       261
+        224           201   348       231
==       177           248   334       196
*        156           284   465       192
%        121           114   107       119
**        59           111   118        68
!=        40            56    74        44
/         18           121   183        41
>         29            70   110        39
+=        34            61    67        39
<         32            62    76        38
>=        19            17    17        18
<=        18            27    12        18
dot        0            99    74        16
|         18             1     2        15
&         14             0     6        12
<<        10             1     1         8
//         9             9     1         8
-=         5            21    14         8
*=         2            19    22         5
/=         0            23    16         4
>>         4             0     0         3
^          3             0     0         3
~          2             4     5         2
|=         3             0     0         2
&=         1             0     0         1
//=        1             0     0         1
^=         1             0     0         0
**=        0             2     0         0
%=         0             0     0         0
<<=        0             0     0         0
>>=        0             0     0         0
====  ======  ============  ====  ========
These two numerical packages alone contain ~780 uses of matrix multiplication. Within these packages, matrix multiplication is used more heavily than most comparison operators (< != <= >=). Even when we dilute these counts by including the stdlib into our comparisons, matrix multiplication is still used more often in total than any of the bitwise operators, and 2x as often as //. This is true even though the stdlib, which contains a fair amount of integer arithmetic and no matrix operations, makes up more than 80% of the combined code base.
By coincidence, the numeric libraries make up approximately the same proportion of the 'combined' codebase as numeric tutorials make up of PyCon 2014's tutorial schedule, which suggests that the 'combined' column may not be wildly unrepresentative of new Python code in general. While it's impossible to know for certain, from this data it seems entirely possible that across all Python code currently being written, matrix multiplication is already used more often than // and the bitwise operations.
It's certainly unusual (though extended slicing existed for some time before builtin types gained support for it, Ellipsis is still unused within the stdlib, etc.). But the important thing is whether a change will benefit users, not where the software is being downloaded from. It's clear from the above that @ will be used, and used heavily. And this PEP provides the critical piece that will allow the Python numerical community to finally reach consensus on a standard duck type for all array-like objects, which is a necessary precondition to ever adding a numerical array type to the stdlib.
Currently, the only legal use of the @ token in Python code is at statement beginning in decorators. The new operators are both infix; the one place they can never occur is at statement beginning. Therefore, no existing code will be broken by the addition of these operators, and there is no possible parsing ambiguity between decorator-@ and the new operators.
Another important kind of compatibility is the mental cost paid by users to update their understanding of the Python language after this change, particularly for users who do not work with matrices and thus do not benefit. Here again, @ has minimal impact: even comprehensive tutorials and references will only need to add a sentence or two to fully document this PEP's changes for a non-numerical audience.
This section is informative, rather than normative -- it documents the consensus of a number of libraries that provide array- or matrix-like objects on how @ will be implemented.
This section uses the numpy terminology for describing arbitrary multidimensional arrays of data, because it is a superset of all other commonly used models. In this model, the shape of any array is represented by a tuple of integers. Because matrices are two-dimensional, they have len(shape) == 2, while 1d vectors have len(shape) == 1, and scalars have shape == (), i.e., they are "0 dimensional". Any array contains prod(shape) total entries. Notice that prod(()) == 1 [20] (for the same reason that sum(()) == 0); scalars are just an ordinary kind of array, not a special case. Notice also that we distinguish between a single scalar value (shape == (), analogous to 1), a vector containing only a single entry (shape == (1,), analogous to [1]), a matrix containing only a single entry (shape == (1, 1), analogous to [[1]]), etc., so the dimensionality of any array is always well-defined. Other libraries with more restricted representations (e.g., those that support 2d arrays only) might implement only a subset of the functionality described here.
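In numpy terms, for example (the prod helper here is written out just to make the empty-product point explicit):

import numpy as np
from functools import reduce
from operator import mul

def prod(shape):
    # Product of a tuple of integers; the empty product is 1.
    return reduce(mul, shape, 1)

scalar = np.array(3.0)          # shape == (),     0-dimensional
vector = np.array([1.0, 2.0])   # shape == (2,),   1-dimensional
matrix = np.ones((2, 3))        # shape == (2, 3), 2-dimensional

scalar.shape, vector.shape, matrix.shape   # ((), (2,), (2, 3))
prod(()), prod((2,)), prod((2, 3))         # (1, 2, 6) -- total entries in each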
The recommended semantics for @ for different inputs are:
2d inputs are conventional matrices, and so the semantics are obvious: we apply conventional matrix multiplication. If we write arr(2, 3) to represent an arbitrary 2x3 array, then arr(2, 3) @ arr(3, 4) returns an array with shape (2, 4).
1d vector inputs are promoted to 2d by prepending or appending a '1' to the shape, the operation is performed, and then the added dimension is removed from the output. The 1 is always added on the "outside" of the shape: prepended for left arguments, and appended for right arguments. The result is that matrix @ vector and vector @ matrix are both legal (assuming compatible shapes), and both return 1d vectors; vector @ vector returns a scalar. This is clearer with examples; see the shape sketch after this list.
An infelicity of this definition for 1d vectors is that it makes @ non-associative in some cases ((Mat1 @ vec) @ Mat2 != Mat1 @ (vec @ Mat2)). But this seems to be a case where practicality beats purity: non-associativity only arises for strange expressions that would never be written in practice; if they are written anyway then there is a consistent rule for understanding what will happen (Mat1 @ vec @ Mat2 is parsed as (Mat1 @ vec) @ Mat2, just like a - b - c); and, not supporting 1d vectors would rule out many important use cases that do arise very commonly in practice. No-one wants to explain to new users why to solve the simplest linear system in the obvious way, they have to type (inv(A) @ b[:, np.newaxis]).flatten() instead of inv(A) @ b, or perform an ordinary least-squares regression by typing solve(X.T @ X, X.T @ y[:, np.newaxis]).flatten() instead of solve(X.T @ X, X.T @ y). No-one wants to type (a[np.newaxis, :] @ b[:, np.newaxis])[0, 0] instead of a @ b every time they compute an inner product, or (a[np.newaxis, :] @ Mat @ b[:, np.newaxis])[0, 0] for general quadratic forms instead of a @ Mat @ b. In addition, sage and sympy (see below) use these non-associative semantics with an infix matrix multiplication operator (they use *), and they report that they haven't experienced any problems caused by it.
For inputs with more than 2 dimensions, we treat the last two dimensions as being the dimensions of the matrices to multiply, and 'broadcast' across the other dimensions. This provides a convenient way to quickly compute many matrix products in a single operation. For example, arr(10, 2, 3) @ arr(10, 3, 4) performs 10 separate matrix multiplies, each of which multiplies a 2x3 and a 3x4 matrix to produce a 2x4 matrix, and then returns the 10 resulting matrices together in an array with shape (10, 2, 4). The intuition here is that we treat these 3d arrays of numbers as if they were 1d arrays of matrices, and then apply matrix multiplication in an elementwise manner, where now each 'element' is a whole matrix. Note that broadcasting is not limited to perfectly aligned arrays; in more complicated cases, it allows several simple but powerful tricks for controlling how arrays are aligned with each other; see [10] for details. (In particular, it turns out that when broadcasting is taken into account, the standard scalar * matrix product is a special case of the elementwise multiplication operator *.)
If one operand is >2d, and another operand is 1d, then the above rules apply unchanged, with 1d->2d promotion performed before broadcasting. E.g., arr(10, 2, 3) @ arr(3) first promotes to arr(10, 2, 3) @ arr(3, 1), then broadcasts the right argument to create the aligned operation arr(10, 2, 3) @ arr(10, 3, 1), multiplies to get an array with shape (10, 2, 1), and finally removes the added dimension, returning an array with shape (10, 2). Similarly, arr(2) @ arr(10, 2, 3) produces an intermediate array with shape (10, 1, 3), and a final array with shape (10, 3).
0d (scalar) inputs raise an error. Scalar * matrix multiplication is a mathematically and algorithmically distinct operation from matrix @ matrix multiplication, and is already covered by the elementwise * operator. Allowing scalar @ matrix would thus both require an unnecessary special case, and violate TOOWTDI.
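A quick shape sketch of these rules (using numpy, whose implementation follows the semantics described above):

import numpy as np

A = np.ones((2, 3))
B = np.ones((3, 4))
v = np.ones((3,))
w = np.ones((2,))

(A @ B).shape    # (2, 4)  -- 2d @ 2d: ordinary matrix product
(A @ v).shape    # (2,)    -- matrix @ vector returns a 1d vector
(w @ A).shape    # (3,)    -- vector @ matrix also returns a 1d vector
v @ v            # 3.0     -- vector @ vector is a scalar (inner product)

S = np.ones((10, 2, 3))
T = np.ones((10, 3, 4))
(S @ T).shape    # (10, 2, 4) -- ten 2x3 @ 3x4 products, broadcast over the first axis
(S @ v).shape    # (10, 2)    -- 1d promotion plus broadcasting

try:
    A @ 3        # 0d (scalar) operands are rejected; use * for scaling
except (TypeError, ValueError) as exc:
    print("scalar @ matrix raises:", exc)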
We group existing Python projects which provide array- or matrix-like types based on what API they currently use for elementwise and matrix multiplication.
Projects which currently use * for elementwise multiplication, and function/method calls for matrix multiplication:
The developers of the following projects have expressed an intention to implement @ on their array-like types using the above semantics:
The following projects have been alerted to the existence of the PEP, but it's not yet known what they plan to do if it's accepted. We don't anticipate that they'll have any objections, though, since everything proposed here is consistent with how they already do things:
Projects which currently use * for matrix multiplication, and function/method calls for elementwise multiplication:
The following projects have expressed an intention, if this PEP is accepted, to migrate from their current API to the elementwise-*, matmul-@ convention (i.e., this is a list of projects whose API fragmentation will probably be eliminated if this PEP is accepted):
The following projects have been alerted to the existence of the PEP, but it's not known what they plan to do if it's accepted (i.e., this is a list of projects whose API fragmentation may or may not be eliminated if this PEP is accepted):
Projects which currently use * for matrix multiplication, and which don't really care about elementwise multiplication of matrices:
There are several projects which implement matrix types, but from a very different perspective than the numerical libraries discussed above. These projects focus on computational methods for analyzing matrices in the sense of abstract mathematical objects (i.e., linear maps over free modules over rings), rather than as big bags full of numbers that need crunching. And it turns out that from the abstract math point of view, there isn't much use for elementwise operations in the first place; as discussed in the Background section above, elementwise operations are motivated by the bag-of-numbers approach. So these projects don't encounter the basic problem that this PEP exists to address, making it mostly irrelevant to them; while they appear superficially similar to projects like numpy, they're actually doing something quite different. They use * for matrix multiplication (and for group actions, and so forth), and if this PEP is accepted, their expressed intention is to continue doing so, while perhaps adding @ as an alias. These projects include:
New functions operator.matmul and operator.__matmul__ are added to the standard library, with the usual semantics.
A corresponding function PyObject* PyObject_MatrixMultiply(PyObject *o1, PyObject *o2) is added to the C API.
A new AST node is added named MatMult, along with a new token ATEQUAL and new bytecode opcodes BINARY_MATRIX_MULTIPLY and INPLACE_MATRIX_MULTIPLY.
Two new type slots are added; whether this is to PyNumberMethods or a new PyMatrixMethods struct remains to be determined.
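As an illustration of the protocol, a minimal sketch of a user-defined type hooking into the new special methods (the toy Matrix class is hypothetical; only the method names and operator.matmul come from this PEP):

from operator import matmul   # functional spelling added by this PEP

class Matrix:
    """Toy list-of-lists matrix, just to show the @ protocol hooks."""

    def __init__(self, rows):
        self.rows = rows

    def __matmul__(self, other):
        a, b = self.rows, other.rows
        return Matrix([[sum(a[i][k] * b[k][j] for k in range(len(b)))
                        for j in range(len(b[0]))]
                       for i in range(len(a))])

    def __rmatmul__(self, other):
        # Called when the left operand doesn't know how to @ with us.
        return NotImplemented

    def __imatmul__(self, other):
        # In-place @=; here it just falls back on the ordinary product.
        self.rows = (self @ other).rows
        return self

    def __repr__(self):
        return "Matrix(%r)" % (self.rows,)

m = Matrix([[1, 2], [3, 4]])
m @ m            # Matrix([[7, 10], [15, 22]])
matmul(m, m)     # same result, via operator.matmul
m @= m           # dispatches to __imatmul__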
Why @ instead of some other spelling? There isn't any consensus across other programming languages about how this operator should be named [11]; here we discuss the various options.
Restricting ourselves only to symbols present on US English keyboards, the punctuation characters that don't already have a meaning in Python expression context are: @, backtick, $, !, and ?. Of these options, @ is clearly the best; ! and ? are already heavily freighted with inapplicable meanings in the programming context, backtick has been banned from Python by BDFL pronouncement (see PEP 3099), and $ is uglier, even more dissimilar to conventional multiplication notation, and has Perl/PHP baggage. $ is probably the second-best option of these, though.
Symbols which are not present on US English keyboards start at a significant disadvantage (having to spend 5 minutes at the beginning of every numeric Python tutorial just going over keyboard layouts is not a hassle anyone really wants). Plus, even if we somehow overcame the typing problem, it's not clear there are any that are actually better than @. Some options that have been suggested include various Unicode multiplication-like symbols, such as the ones discussed below (×, ⋅, °, ⊗).
What we need, though, is an operator that means "matrix multiplication, as opposed to scalar/elementwise multiplication". There is no conventional symbol with this meaning in either programming or mathematics, where these operations are usually distinguished by context. (And U+2297 CIRCLED TIMES is actually used conventionally to mean exactly the wrong things: elementwise multiplication -- the "Hadamard product" -- or outer product, rather than matrix/inner product like our operator). @ at least has the virtue that it looks like a funny non-commutative operator; a naive user who knows maths but not programming couldn't look at A * B versus A × B, or A * B versus A ⋅ B, or A * B versus A ° B and guess which one is the usual multiplication, and which one is the special case.
Finally, there is the option of using multi-character tokens. Some options:
So, it doesn't matter much, but @ seems as good or better than any of the alternatives:
There was a long discussion [15] about whether @ should be right- or left-associative (or even something more exotic [18]). Almost all Python operators are left-associative, so following this convention would be the simplest approach, but there were two arguments that suggested matrix multiplication might be worth making right-associative as a special case:
First, matrix multiplication has a tight conceptual association with function application/composition, so many mathematically sophisticated users have an intuition that an expression like Mat1 @ Mat2 @ vec proceeds from right-to-left, with Mat2 first transforming the vector, and then Mat1 transforming the result. This isn't universally agreed (and not all number-crunchers are steeped in the pure-math conceptual framework that motivates this intuition [16]), but at the least this intuition is more common than for other operations like a - b - c, which everyone reads as going from left-to-right.
Second, if expressions like Mat @ Mat @ vec appear often in code, then programs will run faster (and efficiency-minded programmers will be able to use fewer parentheses) if this is evaluated as Mat @ (Mat @ vec) than if it is evaluated like (Mat @ Mat) @ vec, since the former needs only two cheap matrix-vector products rather than a full matrix-matrix product followed by a matrix-vector product.
However, weighing against these arguments are the following:
Regarding the efficiency argument, empirically, we were unable to find any evidence that Mat @ Mat @ vec type expressions actually dominate in real-life code. Parsing a number of large projects that use numpy, we found that when forced by numpy's current funcall syntax to choose an order of operations for nested calls to dot, people actually use left-associative nesting slightly more often than right-associative nesting [17]. And anyway, writing parentheses isn't so bad -- if an efficiency-minded programmer is going to take the trouble to think through the best way to evaluate some expression, they probably should write down the parentheses regardless of whether they're needed, just to make it obvious to the next reader that the order of operations matters.
In addition, it turns out that other languages, including those with much more of a focus on linear algebra, overwhelmingly make their matmul operators left-associative. Specifically, the @ equivalent is left-associative in R, Matlab, Julia, IDL, and Gauss. The only exceptions we found are Mathematica, in which a @ b @ c would be parsed non-associatively as dot(a, b, c), and APL, in which all operators are right-associative. There do not seem to exist any languages that make @ right-associative and * left-associative. And these decisions don't seem to be controversial -- I've never seen anyone complaining about this particular aspect of any of these other languages, and the left-associativity of * doesn't seem to bother users of the existing Python libraries that use * for matrix multiplication. So, at the least we can conclude from this that making @ left-associative will certainly not cause any disasters. Making @ right-associative, OTOH, would be exploring new and uncertain ground.
And another advantage of left-associativity is that it is much easier to learn and remember that @ acts like *, than it is to remember first that @ is unlike other Python operators by being right-associative, and then on top of this, also have to remember whether it is more tightly or more loosely binding than *. (Right-associativity forces us to choose a precedence, and intuitions were about equally split on which precedence made more sense. So this suggests that no matter which choice we made, no-one would be able to guess or remember it.)
On net, therefore, the general consensus of the numerical community is that while matrix multiplication is something of a special case, it's not special enough to break the rules, and @ should parse like * does.
No __matmul__ or __matpow__ are defined for builtin numeric types (float, int, etc.) or for the numbers.Number hierarchy, because these types represent scalars, and the consensus semantics for @ are that it should raise an error on scalars.
We do not -- for now -- define a __matmul__ method on the standard memoryview or array.array objects, for several reasons. Of course this could be added if someone wants it, but these types would require quite a bit of additional work beyond __matmul__ before they could be used for numeric work -- e.g., they have no way to do addition or scalar multiplication either! -- and adding such functionality is beyond the scope of this PEP. In addition, providing a quality implementation of matrix multiplication is highly non-trivial. Naive nested loop implementations are very slow and shipping such an implementation in CPython would just create a trap for users. But the alternative -- providing a modern, competitive matrix multiply -- would require that CPython link to a BLAS library, which brings a set of new complications. In particular, several popular BLAS libraries (including the one that ships by default on OS X) currently break the use of multiprocessing [8]. Together, these considerations mean that the cost/benefit of adding __matmul__ to these types just isn't there, so for now we'll continue to delegate these problems to numpy and friends, and defer a more systematic solution to a future proposal.
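For concreteness, the kind of "naive nested loop implementation" referred to above looks roughly like this (a sketch only; it does O(n^3) work in slow pure-Python loops, which is exactly why shipping it would be a trap):

def naive_matmul(a, b):
    """Multiply two list-of-lists matrices with three nested Python loops."""
    n, k, m = len(a), len(b), len(b[0])
    assert len(a[0]) == k, "inner dimensions must match"
    out = [[0] * m for _ in range(n)]
    for i in range(n):
        for j in range(m):
            for x in range(k):
                out[i][j] += a[i][x] * b[x][j]
    return out

naive_matmul([[1, 2], [3, 4]], [[11, 12], [13, 14]])
# [[37, 40], [85, 92]]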
There are also non-numeric Python builtins which define __mul__ (str, list, ...). We do not define __matmul__ for these types either, because why would we even do that.
Earlier versions of this PEP also proposed a matrix power operator, @@, analogous to **. But on further consideration, it was decided that the utility of this was sufficiently unclear that it would be better to leave it out for now, and only revisit the issue if -- once we have more experience with @ -- it turns out that @@ is truly missed. [14]
Over the past few decades, the Python numeric community has explored a variety of ways to resolve the tension between matrix and elementwise multiplication operations. PEP 211 and PEP 225, both proposed in 2000 and last seriously discussed in 2008 [9], were early attempts to add new operators to solve this problem, but suffered from serious flaws; in particular, at that time the Python numerical community had not yet reached consensus on the proper API for array objects, or on what operators might be needed or useful (e.g., PEP 225 proposes 6 new operators with unspecified semantics). Experience since then has now led to consensus that the best solution, for both numeric Python and core Python, is to add a single infix operator for matrix multiply (together with the other new operators this implies like @=).
We review some of the rejected alternatives here.
Use a second type that defines __mul__ as matrix multiplication: As discussed above (Background: What's wrong with the status quo?), this has been tried for many years via the numpy.matrix type (and its predecessors in Numeric and numarray). The result is a strong consensus among both numpy developers and developers of downstream packages that numpy.matrix should essentially never be used, because of the problems caused by having conflicting duck types for arrays. (Of course one could then argue we should only define __mul__ to be matrix multiplication, but then we'd have the same problem with elementwise multiplication.) There have been several pushes to remove numpy.matrix entirely; the only counter-arguments have come from educators who find that its problems are outweighed by the need to provide a simple and clear mapping between mathematical notation and code for novices (see Transparent syntax is especially crucial for non-expert programmers). But, of course, starting out newbies with a dispreferred syntax and then expecting them to transition later causes its own problems. The two-type solution is worse than the disease.
Add lots of new operators, or add a new generic syntax for defining infix operators: In addition to being generally un-Pythonic and repeatedly rejected by BDFL fiat, this would be using a sledgehammer to smash a fly. The scientific python community has consensus that adding one operator for matrix multiplication is enough to fix the one otherwise unfixable pain point. (In retrospect, we all think PEP 225 was a bad idea too -- or at least far more complex than it needed to be.)
Add a new @ (or whatever) operator that has some other meaning in general Python, and then overload it in numeric code: This was the approach taken by PEP 211, which proposed defining @ to be the equivalent of itertools.product. The problem with this is that when taken on its own terms, it's pretty clear that itertools.product doesn't actually need a dedicated operator. It hasn't even been deemed worthy of a builtin. (During discussions of this PEP, a similar suggestion was made to define @ as a general purpose function composition operator, and this suffers from the same problem; functools.compose isn't even useful enough to exist.) Matrix multiplication has a uniquely strong rationale for inclusion as an infix operator. There almost certainly don't exist any other binary operations that will ever justify adding any other infix operators to Python.
Add a .dot method to array types so as to allow "pseudo-infix" A.dot(B) syntax: This has been in numpy for some years, and in many cases it's better than dot(A, B). But it's still much less readable than real infix notation, and in particular still suffers from an extreme overabundance of parentheses. See Why should matrix multiplication be infix? above.
Use a 'with' block to toggle the meaning of * within a single code block: E.g., numpy could define a special context object so that we'd have:
c = a * b   # element-wise multiplication
with numpy.mul_as_dot:
    c = a * b   # matrix multiplication
However, this has two serious problems: first, it requires that every array-like type's __mul__ method know how to check some global state (numpy.mul_is_currently_dot or whatever). This is fine if a and b are numpy objects, but the world contains many non-numpy array-like objects. So this either requires non-local coupling -- every numpy competitor library has to import numpy and then check numpy.mul_is_currently_dot on every operation -- or else it breaks duck-typing, with the above code doing radically different things depending on whether a and b are numpy objects or some other sort of object. Second, and worse, with blocks are dynamically scoped, not lexically scoped; i.e., any function that gets called inside the with block will suddenly find itself executing inside the mul_as_dot world, and crash and burn horribly -- if you're lucky. So this is a construct that could only be used safely in rather limited cases (no function calls), and which would make it very easy to shoot yourself in the foot without warning.
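To see the second problem concretely, here is a self-contained sketch; mul_as_dot and the Arr type are hypothetical stand-ins (numpy provides neither), purely to illustrate how dynamic scoping changes the meaning of * inside code that was never written with that in mind:

import threading

_state = threading.local()

class mul_as_dot:
    # Hypothetical context manager that flips a global interpretation flag.
    def __enter__(self):
        _state.mul_is_dot = True
    def __exit__(self, *exc):
        _state.mul_is_dot = False

class Arr(list):
    # Hypothetical 1d array type whose * consults the global flag.
    def __mul__(self, other):
        if getattr(_state, "mul_is_dot", False):
            return sum(x * y for x, y in zip(self, other))    # dot product
        return Arr(x * y for x, y in zip(self, other))        # elementwise

def rescale(a, weights):
    # Third-party helper, written assuming * is elementwise.
    return a * weights

a = Arr([1.0, 2.0, 3.0])
w = Arr([0.5, 0.5, 0.5])

rescale(a, w)          # [0.5, 1.0, 1.5] -- elementwise, as the author intended

with mul_as_dot():
    rescale(a, w)      # 3.0 -- the helper's * silently changed meaning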
Use a language preprocessor that adds extra numerically-oriented operators and perhaps other syntax: (As per recent BDFL suggestion: [1]) This suggestion seems based on the idea that numerical code needs a wide variety of syntax additions. In fact, given @, most numerical users don't need any other operators or syntax; it solves the one really painful problem that cannot be solved by other means, and that causes painful reverberations through the larger ecosystem. Defining a new language (presumably with its own parser which would have to be kept in sync with Python's, etc.), just to support a single binary operator, is neither practical nor desirable. In the numerical context, Python's competition is special-purpose numerical languages (Matlab, R, IDL, etc.). Compared to these, Python's killer feature is exactly that one can mix specialized numerical code with code for XML parsing, web page generation, database access, network programming, GUI libraries, and so forth, and we also gain major benefits from the huge variety of tutorials, reference material, introductory classes, etc., which use Python. Fragmenting "numerical Python" from "real Python" would be a major source of confusion. A major motivation for this PEP is to reduce fragmentation. Having to set up a preprocessor would be an especially prohibitive complication for unsophisticated users. And we use Python because we like Python! We don't want almost-but-not-quite-Python.
Use overloading hacks to define a "new infix operator" like *dot*, as in a well-known Python recipe: (See: [2]) Beautiful is better than ugly. This is... not beautiful. And not Pythonic. And especially unfriendly to beginners, who are just trying to wrap their heads around the idea that there's a coherent underlying system behind these magic incantations that they're learning, when along comes an evil hack like this that violates that system, creates bizarre error messages when accidentally misused, and whose underlying mechanisms can't be understood without deep knowledge of how object oriented systems work.
Use a special "facade" type to support syntax like arr.M * arr:This is very similar to the previous proposal, in that the.Mattribute would basically return the same object asarr *dot` would,and thus suffers the same objections about 'magicalness'. Thisapproach also has somenon-obvious complexities: for example, while``arr.M * arr must return an array,arr.M * arr.M andarr *arr.M must return facade objects, or elsearr.M * arr.M * arrandarr * arr.M * arr will not work. But this means that facadeobjects must be able to recognize both other array objects and otherfacade objects (which creates additional complexity for writinginteroperating array types from different libraries who must nowrecognize both each other's array types and their facade types). Italso creates pitfalls for users who may easily typearr * arr.M orarr.M * arr.M and expect to get back an array object; instead,they will get a mysterious object that throws errors when they attemptto use it. Basically with this approach users must be careful tothink of.M* as an indivisible unit that acts as an infix operator-- and as infix-operator-like token strings go, at least*dot*is prettier looking (look at its cute little ears!).
Collected here for reference:
[1] From a comment by GvR on a G+ post by GvR; the comment itself does not seem to be directly linkable: https://plus.google.com/115212051037621986145/posts/hZVVtJ9bK3u
[2] http://code.activestate.com/recipes/384122-infix-operators/ and http://www.sagemath.org/doc/reference/misc/sage/misc/decorators.html#sage.misc.decorators.infix_operator
[3] http://conference.scipy.org/past.html
[4] http://pydata.org/events/
[5] In this formula, β is a vector or matrix of regression coefficients, V is the estimated variance/covariance matrix for these coefficients, and we want to test the null hypothesis that Hβ = r; a large S then indicates that this hypothesis is unlikely to be true. For example, in an analysis of human height, the vector β might contain one value which was the average height of the measured men, and another value which was the average height of the measured women, and then setting H = [1, -1], r = 0 would let us test whether men and women are the same height on average. Compare to eq. 2.139 in http://sfb649.wiwi.hu-berlin.de/fedc_homepage/xplore/tutorials/xegbohtmlnode17.html -- Example code is adapted from https://github.com/rerpy/rerpy/blob/0d274f85e14c3b1625acb22aed1efa85d122ecb7/rerpy/incremental_ls.py#L202
[6] Out of the 36 tutorials scheduled for PyCon 2014 (https://us.pycon.org/2014/schedule/tutorials/), we guess that the 8 below will almost certainly deal with matrices:
In addition, the following tutorials could easily involve matrices:
This gives an estimated range of 8 to 12 / 36 = 22% to 33% of tutorials dealing with matrices; saying ~20% then gives us some wiggle room in case our estimates are high.
[7] SLOCs were defined as physical lines which contain at least one token that is not a COMMENT, NEWLINE, ENCODING, INDENT, or DEDENT. Counts were made by using the tokenize module from Python 3.2.3 to examine the tokens in all files ending in .py underneath some directory. Only tokens which occur at least once in the source trees are included in the table. The counting script is available in the PEP repository.
Matrix multiply counts were estimated by counting how often certain tokens which are used as matrix multiply function names occurred in each package. This creates a small number of false positives for scikit-learn, because we also count instances of the wrappers around dot that this package uses, and so there are a few dozen tokens which actually occur in import or def statements.
All counts were made using the latest development version of each project as of 21 Feb 2014.
'stdlib' is the contents of the Lib/ directory in commit d6aa3fa646e2 to the cpython hg repository, and treats the following tokens as indicating matrix multiply: n/a.
'scikit-learn' is the contents of the sklearn/ directory in commit 69b71623273ccfc1181ea83d8fb9e05ae96f57c7 to the scikit-learn repository (https://github.com/scikit-learn/scikit-learn), and treats the following tokens as indicating matrix multiply: dot, fast_dot, safe_sparse_dot.
'nipy' is the contents of the nipy/ directory in commit 5419911e99546401b5a13bd8ccc3ad97f0d31037 to the nipy repository (https://github.com/nipy/nipy/), and treats the following tokens as indicating matrix multiply: dot.
[8] BLAS libraries have a habit of secretly spawning threads, even when used from single-threaded programs. And threads play very poorly with fork(); the usual symptom is that attempting to perform linear algebra in a child process causes an immediate deadlock.
[9] http://fperez.org/py4science/numpy-pep225/numpy-pep225.html
[10] http://docs.scipy.org/doc/numpy/user/basics.broadcasting.html
[11] http://mail.scipy.org/pipermail/scipy-user/2014-February/035499.html
[12] Counts were produced by manually entering the string "import foo" or "from foo import" (with quotes) into the Github code search page, e.g.: https://github.com/search?q=%22import+numpy%22&ref=simplesearch&type=Code on 2014-04-10 at ~21:00 UTC. The reported values are the numbers given in the "Languages" box on the lower-left corner, next to "Python". This also causes some undercounting (e.g., leaving out Cython code, and possibly one should also count HTML docs and so forth), but these effects are negligible (e.g., only ~1% of numpy usage appears to occur in Cython code, and probably even less for the other modules listed). The use of this box is crucial, however, because these counts appear to be stable, while the "overall" counts listed at the top of the page ("We've found ___ code results") are highly variable even for a single search -- simply reloading the page can cause this number to vary by a factor of 2 (!!). (They do seem to settle down if one reloads the page repeatedly, but nonetheless this is spooky enough that it seemed better to avoid these numbers.)
These numbers should of course be taken with multiple grains of salt; it's not clear how representative Github is of Python code in general, and limitations of the search tool make it impossible to get precise counts. AFAIK this is the best data set currently available, but it'd be nice if it were better. In particular:
Also, it's possible there exist other non-stdlib modules we didn't think to test that are even more-imported than numpy -- though we tried quite a few of the obvious suspects. If you find one, let us know! The modules tested here were chosen based on a combination of intuition and the top-100 list at pypi-ranking.info.
Fortunately, it doesn't really matter if it turns out that numpy is, say, merely the third most-imported non-stdlib module, since the point is just that numeric programming is a common and mainstream activity.
Finally, we should point out the obvious: whether a package is import**ed** is rather different from whether it's import**ant**. No-one's claiming numpy is "the most important package" or anything like that. Certainly more packages depend on distutils, e.g., than depend on numpy -- and far fewer source files import distutils than import numpy. But this is fine for our present purposes. Most source files don't import distutils because most source files don't care how they're distributed, so long as they are; these source files thus don't care about details of how distutils' API works. This PEP is in some sense about changing how numpy's and related packages' APIs work, so the relevant metric is to look at source files that are choosing to directly interact with that API, which is sort of like what we get by looking at import statements.
[13] The first such proposal occurs in Jim Hugunin's very first email to the matrix SIG in 1995, which lays out the first draft of what became Numeric. He suggests using * for elementwise multiplication, and % for matrix multiplication: https://mail.python.org/pipermail/matrix-sig/1995-August/000002.html
[14] http://mail.scipy.org/pipermail/numpy-discussion/2014-March/069502.html
[15] http://mail.scipy.org/pipermail/numpy-discussion/2014-March/069444.html and http://mail.scipy.org/pipermail/numpy-discussion/2014-March/069605.html
[16] http://mail.scipy.org/pipermail/numpy-discussion/2014-March/069610.html
[17] http://mail.scipy.org/pipermail/numpy-discussion/2014-March/069578.html
[18] http://mail.scipy.org/pipermail/numpy-discussion/2014-March/069530.html
[19] https://en.wikipedia.org/wiki/Matrix_multiplication
[20] https://en.wikipedia.org/wiki/Empty_product
This document has been placed in the public domain.