A growing list of things I dislike about Python.
There are workarounds for some of them (often half-broken and usually unintuitive) and others may even be considered virtues by some people.
The most critical problem of Python is the complete lack of static checking (it does not even detect missing variable definitions), which increases debugging time and makes refactoring more time-consuming than needed. This becomes particularly obvious when you run your app on a huge amount of data overnight, just to detect a missing initialization in some rarely called function in the morning.
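A minimal sketch of the kind of bug that sails through to runtime (the function and variable names are made up for illustration):

```python
def summarize(items):
    total = 0
    for item in items:
        total += item
    if not items:
        return totl  # Typo: NameError, but only when this branch actually runs
    return total
```

Nothing complains about `totl` until the empty-input path is actually executed.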
There is Pylint but it is a linter (i.e. a style checker) rather than a real static analyzer, so it is unable, by design, to detect many serious errors which require dataflow analysis. For example it fails on basic stuff like
- invalid string formatting (fixed here)
- iterating over unsorted dicts (reported here with a draft patch, rejected because maintainers consider it unimportant (no particular reasons provided))
- dead list computations (e.g. using `sorted(lst)` instead of `lst.sort()`)
- modifying a list while iterating over it (reported here with a draft patch, rejected because maintainers consider it unimportant (no particular reasons provided)); see the sketch after this list
- etc.
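As an illustration of the last class of bugs, a minimal sketch (my own example, not taken from the Pylint reports):

```python
lst = [1, 2, 3, 4]
for x in lst:
    if x % 2 == 0:
        lst.remove(x)  # Mutates the list under the iterator

print(lst)  # [1, 3] - and the element 3 was never even examined
```

No warning is emitted even though the iterator silently skips elements.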
The GIL precludes high-performance multithreading, which is surprising in the age of multicores.
Type annotations have finally been introduced in Python 3.5. Google has even developed a pytype type inferencer/checker but it seems to have serious limitations so it's unclear whether it's production-ready.
Lack of type annotations forces people to use Hungarian notation in complex programs (hello '90s!).
This is not a valid syntax:

```python
x == not y
```
It's not possible to overload `and`, `or` or `not` (which might have been handy to represent e.g. operations on set-like or geometric objects). There's even a PEP which was rejected because Guido disliked the particular implementation.
It's very easy to make a mistake of writing `len(lst1) == lst2` instead of the intended `len(lst1) == len(lst2)`. Python will (un)helpfully make it harder to find this error by silently evaluating the first variant to `False` (instead of aborting with a type error).
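A quick REPL illustration of the silent `False`:

```python
>>> lst1 = [1, 2, 3]
>>> lst2 = [1, 2, 3]
>>> len(lst1) == lst2   # int compared to list: no error, just False
False
>>> len(lst1) == len(lst2)
True
```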
For unclear reasons lambda functions only support expressions, so anything that has control flow requires a local named function. PEP 3113 tries to persuade you that this was a good design decision:

> While an informal poll of the handful of Python programmers I know personally ... indicates a huge majority of people do not know of this feature ...
The `is` and `is not` operators have the same precedence as comparisons and participate in comparison chaining, so this code

```python
op.post_modification is None != full_op.post_modification is None
```

would rather unexpectedly evaluate as a chained comparison:

```python
(op.post_modification is None) and (None != full_op.post_modification) and (full_op.post_modification is None)
```
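A standalone sketch with made-up variables:

```python
>>> x, y = None, 1
>>> (x is None) != (y is None)   # What was meant
True
>>> x is None != y is None       # Chained comparison, hence a different answer
False
```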
Explicitly writing out `self` in all method declarations and calls should not be needed. Apart from being unnecessary boilerplate, this enables another class of bugs:

```python
class A:
    def first(x, *y):
        return x

a = A()
print(a.first(1, 2, 3))  # Will print a, not 1
```
Python does not require a call to the parent class constructor:

```python
class A(B):
    def __init__(self, x):
        super().__init__()  # Nothing complains if this line is dropped
        self.x = x
```

so when it's missing you'll have a hard time understanding whether it's been omitted deliberately or accidentally.
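A sketch of how this bites (class names made up):

```python
class B:
    def __init__(self):
        self.items = []

class A(B):
    def __init__(self, x):
        self.x = x  # Forgot to call super().__init__()

a = A(1)
a.items.append(1)  # AttributeError: 'A' object has no attribute 'items'
```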
Python allows omission of parentheses around tuples in most cases:

```python
for i, x in enumerate(xs):
    pass
x, y = y, x
return x, y
```

but not all cases:

```python
foo = [x, y for x in range(5) for y in range(5)]
# SyntaxError: invalid syntax
```
It's not possible to do tuple unpacking in lambdas, so instead of the concise and readable

```python
lst = [(1, 'Helen', None), (3, 'John', '121')]
lst.sort(key=lambda n, name, phone: (name, phone))  # TypeError: <lambda>() missing 2 required positional arguments
```

you should use

```python
lst.sort(key=lambda n_name_phone: (n_name_phone[1], n_name_phone[2]))
```

This seems to be an intentional decision, as tuple unpacking did work in Python 2.
Sets can be initialized via syntactic sugar:

```python
>>> x = {1, 2, 3}
>>> type(x)
<class 'set'>
```

but it breaks for empty sets:

```python
>>> x = {}
>>> type(x)
<class 'dict'>
```
It's too easy to inadvertently share references:

```python
a = b = []
```

or

```python
def foo(x=[]):  # foo() will return [1], [1, 1], [1, 1, 1], etc.
    x.append(1)
    return x
```

or even

```python
def foo(obj, lst=[]):
    obj.lst = lst

foo(obj)
obj.lst.append(1)  # Hoorah, this modifies the default value of foo
```
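The standard idiom to avoid the mutable-default trap (a sketch):

```python
def foo(x=None):
    if x is None:
        x = []  # A fresh list on every call
    x.append(1)
    return x
```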
The default return value from a function (when `return` is omitted) is `None`. This makes it impossible to declare subroutines which are not supposed to return anything (and verify this at runtime).
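A small illustration (the function name is made up):

```python
def log_event(event):
    print(event)  # No return statement

result = log_event("start")
print(result)  # None - the caller cannot tell "no value" from a real None
```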
Assigning a non-existent object field adds it instead of throwing an exception:

```python
class A:
    def __init__(self):
        self.x = 0
...
a = A()
a.y = 1  # OK
```

This complicates refactoring because forgetting to update an outdated field name deep inside your (or your colleague's) program will silently work, breaking your program much later. This can be overcome with `__slots__`, but when did you last see them used?
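For reference, a minimal sketch of the `__slots__` workaround:

```python
class A:
    __slots__ = ('x',)  # Only 'x' may ever be assigned on instances

    def __init__(self):
        self.x = 0

a = A()
a.y = 1  # AttributeError: 'A' object has no attribute 'y'
```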
When accessing an object attribute via the `obj.attr` syntax, Python will first search for `attr` in `obj`'s instance variables. If it's not present, it will search for `attr` in class variables of `obj`'s class. This behavior is reasonable and matches other languages. The problem is that the status of `attr` will change if we write to it:

```python
class A:
    x = 1

a = A()

# Here a.x means A.x ...
print(A.x)  # 1
print(a.x)  # 1

# ... and here it does not
a.x = 2
print(A.x)  # 1
print(a.x)  # 2
```
This leads to this particularly strange and unexpected semantics:

```python
class A:
    x = 1
    def __init__(self):
        self.x += 1

print(A.x)  # 1
a = A()
print(A.x)  # 1
print(a.x)  # 2
```
To understand what's going on, note that `__init__` is interpreted as

```python
def __init__(self):
    self.x = A.x + 1
```
No comments:
```python
>>> 2+2 is 4
True
>>> 999+1 is 1000
False
```
This happens because only sufficiently small integer objects are reused:
```python
# Two different instances of number "1000"
>>> id(999+1)
140068481622512
>>> id(1000)
140068481624112

# Single instance of number "4"
>>> id(2+2)
10968896
>>> id(4)
10968896
```
Invalid indexing throws an exception but invalid slicing does not:

```python
>>> a = list(range(4))
>>> a[4]
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
IndexError: list index out of range
>>> a[4:5]
[]
```
Python got rid of spurious bracing but introduced a spurious `:` lexeme instead. The lexeme is not needed for parsing and its only purpose was to somehow "enhance readability".
Normally all unary operators have higher priority than binary ones but of course not in Python:
```python
>>> not 'x' in ['x', False]
False
>>> (not 'x') in ['x', False]
True
>>> not ('x' in ['x', False])
False
```
A funny consequence of this is that the `x not in lst` and `not x in lst` notations are equivalent.
When you call `super().__init__` in your class constructor:

```python
class B(A):
    def __init__(self):
        super(B, self).__init__()
```

it will NOT necessarily call the constructor of the superclass (`A` in this case). Instead it will call a constructor of some other class from the class hierarchy of `self`'s class (if this sounds a bit complicated, that's because it actually is).
Let's look at a simple example:

```python
# object
#  /  \
# A    B
# |    |
# C    D
#  \  /
#   E

class A(object):
    def __init__(self):
        print("A")
        super().__init__()

class B(object):
    def __init__(self):
        print("B")
        super().__init__()

class C(A):
    def __init__(self, arg):
        print(f"C {arg}")
        super().__init__()

class D(B):
    def __init__(self, arg):
        print(f"D {arg}")
        super().__init__()

class E(C, D):
    def __init__(self, arg):
        print(f"E {arg}")
        super().__init__(arg)
```
If we try to construct an instance of `E` we'll get a puzzling error:

```python
>>> E(10)
E 10
C 10
A
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "<stdin>", line 4, in __init__
  File "<stdin>", line 4, in __init__
  File "<stdin>", line 4, in __init__
TypeError: __init__() missing 1 required positional argument: 'arg'
```
What happens here is that for diamond class hierarchies Python will execute constructors in a strange, unintuitive order (called MRO, explained in detail here). In our case the order happens to be E, C, A, D, B.
As you can see, poor `A` suddenly has to call `D` instead of the expected `object`. `D` requires an `arg` of which `A` is of course not aware, hence the crash. So when you call `super()` you have no idea which class you are calling into, nor what the expected `__init__` signature is.
When using `super()`, ALL your `__init__` methods have to use keyword arguments only and pass all of them to the caller:

```python
class A(object):
    def __init__(self, **kwargs):
        print("A")
        if type(self).__mro__[-2] is A:
            # Avoid "TypeError: object.__init__() takes no parameters" error
            super().__init__()
            return
        super().__init__(**kwargs)

class B(object):
    def __init__(self, **kwargs):
        print("B")
        if type(self).__mro__[-2] is B:
            # Avoid "TypeError: object.__init__() takes no parameters" error
            super().__init__()
            return
        super().__init__(**kwargs)

class C(A):
    def __init__(self, **kwargs):
        arg = kwargs["arg"]
        print(f"C {arg}")
        super().__init__(**kwargs)

class D(B):
    def __init__(self, **kwargs):
        arg = kwargs["arg"]
        print(f"D {arg}")
        super().__init__(**kwargs)

class E(C, D):
    def __init__(self, **kwargs):
        arg = kwargs["arg"]
        print(f"E {arg}")
        super().__init__(**kwargs)

E(arg=1)
```
Notice the especially beautiful `__mro__` checks which I needed to avoid an error in `object.__init__`, which just so happens to NOT support the kwargs convention (see super() and changing the signature of cooperative methods for details).
I'll let the reader decide how much more readable and efficient this makes your code. See Python's Super is nifty, but you can't use it for an in-depth discussion.
Thanks to active use of generators in Python 3 it became easier to misuse standard APIs:

```python
if filter(lambda x: x == 0, [1, 2]):
    print("Yes")  # Prints "Yes"!
```
Surprisingly enough, this does not apply to `range` (i.e. `bool(range(0))` returns `False` as expected).
`argparse` does not provide automatic support for `--no-XXX` flags.
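A typical manual workaround (a sketch; the option name `--foo` is made up) registers both flags against a single destination:

```python
import argparse

parser = argparse.ArgumentParser()
parser.add_argument('--foo', dest='foo', action='store_true')
parser.add_argument('--no-foo', dest='foo', action='store_false')
parser.set_defaults(foo=True)
```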
By default the formatter used by `argparse`

- won't display default option values
- will make code examples provided via `epilog` unreadable by stripping leading whitespace
Enabling both features requires defining a custom formatter:
```python
class Formatter(argparse.ArgumentDefaultsHelpFormatter, argparse.RawDescriptionHelpFormatter):
    pass
```
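The custom formatter is then passed via the standard `formatter_class` parameter (a usage sketch):

```python
parser = argparse.ArgumentParser(epilog='Examples:\n  app --help', formatter_class=Formatter)
```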
`getopt` stops processing options at the first positional argument:

```python
>>> opts, args = getopt.getopt(['A', '-o', 'B'], 'o:')
>>> opts
[]
>>> args
['A', '-o', 'B']
>>> opts, args = getopt.getopt(['-o', 'B', 'A'], 'o:')
>>> opts
[('-o', 'B')]
>>> args
['A']
```
`split` and `join` accept the list and the separator in opposite orders:

```python
sep.join(lst)   # Called on the separator
s.split(sep)    # Called on the string
```
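In REPL terms:

```python
>>> ','.join(['a', 'b'])
'a,b'
>>> 'a,b'.split(',')
['a', 'b']
```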
Builtin functions do not support named arguments, e.g.

```python
>>> x = {1: 2}
>>> x.get(2, 0)
0
>>> x.get(2, default=0)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: get() takes no keyword arguments
```
`string.strip()` and `list.sort()`, although named similarly (a verb in imperative mood), have very different behavior: the string's method returns a stripped copy whereas the list's one sorts the object in place (and returns `None`).
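Side by side:

```python
s = '  hi  '
print(s.strip())   # 'hi' - a new string, s itself is unchanged

lst = [3, 1, 2]
print(lst.sort())  # None - the list was sorted in place
print(lst)         # [1, 2, 3]
```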
`os.path.join` will silently drop preceding inputs on an argument with a leading slash:

```python
>>> print(os.path.join('/home/yugr', '/../libexec'))
/../libexec
```
Python docs mention this behavior:

> If a component is an absolute path, all previous components are thrown away and joining continues from the absolute path component.
but unfortunately do not provide any reasons for such an irrational choice. Throwing an exception would be much less error-prone. A Google search for "why os.path.join throws away" returns 22K results at the time of this writing...
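A sketch of what a less error-prone join could look like (`safe_join` is a hypothetical helper, not a standard function):

```python
import os.path

def safe_join(base, *parts):
    # Hypothetical: reject absolute components instead of silently dropping base
    for p in parts:
        if os.path.isabs(p):
            raise ValueError(f"absolute path component: {p}")
    return os.path.join(base, *parts)
```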
Python uses an unusual banker's rounding algorithm (round half to even):

```python
# Python3 (banker's rounding)
>>> round(0.5)
0
>>> round(1.5)
2
>>> round(2.5)
2
```
Apart from being counterintuitive, this is also different from most other languages (C, Java, Ruby, etc.).
Python lacks lexical scoping, i.e. there is no way to localize a variable in a scope smaller than a function. This often hurts when renaming variables during code refactoring. Forgetting to rename a variable in a single place causes the interpreter to pick up an unrelated name from an unrelated block 50 lines above or from a previous loop iteration.
This is especially inconvenient for one-off variables (e.g. loop counters):

```python
for i in range(100):
    ...

# 100 lines later

for j in range(200):
    a[i] = 0  # Yikes, forgot to rename!
```
One could remove a name from scope via `del i` but this is considered too verbose so no one uses it.
Similarly to the above, there is no control over visibility (you can't hide class methods, can't hide module functions). You are left with a convention to prefix private functions with `_` and hope for the best.
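A sketch of how toothless the convention is (module and function names made up):

```python
# mylib.py
def _internal_helper():  # "Private" by convention only
    return 42

# client.py
import mylib
print(mylib._internal_helper())  # Nothing stops the caller
```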
Python allows different syntaxes for the aliasing functionality:

```python
from mod import submod as X
import mod.submod as X
```
Python scoping rules require that assigning a variable automatically declares it local. This causes inconsistencies and weird limitations in practice. E.g. a variable would be considered local even if the assignment follows its first use:

```python
xxx = 0

def foo():
    a = xxx  # Throws UnboundLocalError
    xxx = 2
```
and even if the assignment is never executed:

```python
def foo():
    a = xxx  # Still aborts...
    if False:
        xxx = 2
```
This is particularly puzzling in long functions when someone accidentally adds a local variable which matches the name of a global variable used in another part of the same function:

```python
def value():
    ...

def foo():
    ...  # A lot of code
    value(1)  # Surprise! UnboundLocalError
    ...  # Yet more code
    for value in data:
        ...
```
Once you've lost some time debugging the issue, you can work around it for global variables by declaring their names as `global` before first use:

```python
def foo():
    global xxx
    a = xxx
    xxx = 2
```
But prior to Python 3's `nonlocal` there were no magic keywords for variables from non-global outer scopes, so they were essentially unwritable from nested scopes, i.e. closures:

```python
def foo():
    xxx = 1
    def bar():
        xxx = 2  # Creates a new local instead of modifying foo's xxx
```
The only available "solution" was to wrap the variable into a fake 1-element array (whaat?!):

```python
def foo():
    xxx = [1]
    def bar():
        xxx[0] = 2
```
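For completeness, the Python 3 `nonlocal` version (a minimal sketch):

```python
def foo():
    xxx = 1
    def bar():
        nonlocal xxx  # Rebind the name in the enclosing scope
        xxx = 2
    bar()
    return xxx  # 2
```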
Normally statements that belong to false branches are not executed. E.g. this code works:

```python
if True:
    import re
re.search(r'1', '1')
```
and this one raises `NameError`:

```python
if False:
    import re
re.search(r'1', '1')
```
This does not apply to `global` declarations though:

```python
xxx = 1

def foo():
    if False:
        global xxx
    xxx = 42

foo()
print(xxx)  # Prints 42
```
Relative imports (`from .xxx.yyy import mymod`) have many weird limitations, e.g. they will not allow you to import a module from a parent folder and they will cease to work in the main script:

```
ModuleNotFoundError: No module named '__main__.xxx'; '__main__' is not a package
```
A workaround is to use extremely ugly `sys.path` hackery:

```python
import sys
import os.path

sys.path.append(os.path.join(os.path.dirname(__file__), 'xxx', 'yyy'))
```
Search for "python relative imports" on Stack Overflow to see some really clumsy Python code (e.g. here or here). Also see When are circular imports fatal? for more weird limitations of relative imports with respect to circular dependencies.
It's very hard to automatically optimize Python code because there are far too many ways in which a program may change its execution environment, e.g.

```python
for re in regexes:
    ...
```

(see e.g. this quote from Guido). Existing optimizers (e.g. PyPy) have to rely on idioms and heuristics.
Syntax error reporting in Python is extremely primitive. In most cases you simply get `SyntaxError: invalid syntax`.
Windows and Linux use different naming conventions for Python executables (`python` on Windows, `python2`/`python3` on Linux).
Python debugging is super-slow (a few orders of magnitude slower than interpretation). Already mentioned in Zero static checking.
The Python debugger will ignore breakpoints set on `pass` statements. Thus poor man's conditional breakpoints like

```python
if x > 0:
    pass
```

will silently fail to work, leaving a false impression that the condition is never true.
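A common workaround (a sketch) is to use a statement that actually generates bytecode instead of `pass`:

```python
if x > 0:
    dummy = 0  # A breakpoint on this line does trigger
```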
Most people blame Python 3 for syntax changes which break existing code (e.g. making `print` a regular function) but the real problem is semantic changes, as they are much harder to detect and debug. Some examples:
- integer division:

  ```python
  print(1/2)  # Prints "0" in 2, "0.5" in 3
  ```
- checking a `filter`ed list for emptiness:

  ```python
  if filter(lambda x: x, [0]):
      print("X")  # Executes in 3 but not in 2
  ```
- order of keys in a dictionary is arbitrary until Python 3.6
- different rounding algorithm:

  ```python
  # Python2.7
  >>> round(0.5)
  1.0
  >>> round(1.5)
  2.0
  >>> round(2.5)
  3.0

  # Python3 (banker's rounding)
  >>> round(0.5)
  0
  >>> round(1.5)
  2
  >>> round(2.5)
  2
  ```
The Python community does not seem to have a strong culture of preserving API backwards compatibility or following the SemVer convention (which is hinted at by the fact that there are no widespread tools for checking Python package API compatibility). This is not surprising given that even minor versions of Python 3 itself break old and popular APIs (e.g. time.clock). Another likely reason is the lack of good mechanisms to control what's exported from a module (prefixing methods and objects with an underscore is not a good mechanism).
In practice this means that it's too risky to allow differences in minor (and even patch) versions of dependencies. Instead the most robust (and thus most common) solution is to fix all app dependencies (including the transitive ones) down to patch versions (via a blind `pip freeze > requirements.txt`) and run each app in a dedicated virtualenv or Docker container.
Apart from complicating deployment, fixing versions also complicates importing the module in other applications (due to an increased chance of conflicting dependencies) and upgrading dependencies later on to get bugfixes and security patches.
For more details see the excellent "Dependency hell: a library author's guide" talk and an alternative view in Should You Use Upper Bound Version Constraints?