This repository was archived by the owner on Aug 3, 2020. It is now read-only.

JeffPaine/beautiful_idiomatic_python (public archive)

Notes from Raymond Hettinger's talk at PyCon US 2013.

Transforming Code into Beautiful, Idiomatic Python

Notes from Raymond Hettinger's talk at PyCon US 2013 (video, slides).

The code examples and direct quotes are all from Raymond's talk. I've reproduced them here for my own edification and in the hope that others will find them as handy as I have!

Looping over a range of numbers

    for i in [0, 1, 2, 3, 4, 5]:
        print i**2

    for i in range(6):
        print i**2

Better

    for i in xrange(6):
        print i**2

xrange creates an iterator over the range, producing the values one at a time. This approach is much more memory efficient than range. xrange was renamed to range in Python 3.
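
As a rough illustration of the memory point in Python 3 (where range already behaves like xrange), the sketch below compares the size of a lazy range object with the size of the list built from it. The variable names are illustrative, and the exact byte counts depend on the interpreter.

```python
import sys

# Python 3: range is lazy, like Python 2's xrange.
lazy = range(1_000_000)
eager = list(lazy)  # materializes every value, like Python 2's range

lazy_size = sys.getsizeof(lazy)    # small, does not grow with the length
eager_size = sys.getsizeof(eager)  # grows with the number of elements

print(lazy_size < eager_size)
```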

Looping over a collection

    colors = ['red', 'green', 'blue', 'yellow']

    for i in range(len(colors)):
        print colors[i]

Better

    for color in colors:
        print color

Looping backwards

    colors = ['red', 'green', 'blue', 'yellow']

    for i in range(len(colors)-1, -1, -1):
        print colors[i]

Better

    for color in reversed(colors):
        print color

Looping over a collection and indices

    colors = ['red', 'green', 'blue', 'yellow']

    for i in range(len(colors)):
        print i, '--->', colors[i]

Better

    for i, color in enumerate(colors):
        print i, '--->', color

It's fast and beautiful and saves you from tracking the individual indices and incrementing them.

Whenever you find yourself manipulating indices [in a collection], you're probably doing it wrong.

Looping over two collections

    names = ['raymond', 'rachel', 'matthew']
    colors = ['red', 'green', 'blue', 'yellow']

    n = min(len(names), len(colors))
    for i in range(n):
        print names[i], '--->', colors[i]

    for name, color in zip(names, colors):
        print name, '--->', color

Better

    from itertools import izip

    for name, color in izip(names, colors):
        print name, '--->', color

zip creates a new list in memory, so it takes more memory. izip is more efficient than zip. Note: in Python 3, izip was renamed to zip and promoted to a builtin, replacing the old zip.
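
A small Python 3 sketch of the behaviour described in the note: zip hands out pairs one at a time and stops at the shorter input. The names first and rest are only there for illustration.

```python
names = ['raymond', 'rachel', 'matthew']
colors = ['red', 'green', 'blue', 'yellow']

pairs = zip(names, colors)  # Python 3: a lazy iterator, no list is built yet

first = next(pairs)         # values are produced one pair at a time
rest = list(pairs)          # 'yellow' has no partner, so it is dropped
```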

Looping in sorted order

    colors = ['red', 'green', 'blue', 'yellow']

    # Forward sorted order
    for color in sorted(colors):
        print color

    # Backwards sorted order
    for color in sorted(colors, reverse=True):
        print color

Custom Sort Order

    colors = ['red', 'green', 'blue', 'yellow']

    def compare_length(c1, c2):
        if len(c1) < len(c2): return -1
        if len(c1) > len(c2): return 1
        return 0

    print sorted(colors, cmp=compare_length)

Better

    print sorted(colors, key=len)

The original is slow and unpleasant to write. Also, comparison functions are no longer accepted by sorted() in Python 3, which takes only a key function.
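
If you are stuck with an old comparison function under Python 3, the standard library's functools.cmp_to_key can wrap it into a key function. The sketch below (not from the talk notes) checks that the wrapped comparison and the plain key=len version agree.

```python
from functools import cmp_to_key

colors = ['red', 'green', 'blue', 'yellow']

def compare_length(c1, c2):
    # Old-style comparison function: negative, zero, or positive
    return len(c1) - len(c2)

by_key = sorted(colors, key=len)
by_cmp = sorted(colors, key=cmp_to_key(compare_length))
```

Where a key function exists, as it does here with len, it is still the simpler choice.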

Call a function until a sentinel value

    blocks = []
    while True:
        block = f.read(32)
        if block == '':
            break
        blocks.append(block)

Better

    from functools import partial

    blocks = []
    for block in iter(partial(f.read, 32), ''):
        blocks.append(block)

iter takes two arguments. The first is a callable that you call over and over again, and the second is a sentinel value.
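
Here is a self-contained Python 3 version of the same idea. It assumes an in-memory io.BytesIO object in place of a real file, so the sentinel is the empty bytes object b'' rather than ''.

```python
import io
from functools import partial

f = io.BytesIO(b'x' * 70)  # in-memory stand-in for a file opened in binary mode

blocks = []
# iter() calls f.read(32) repeatedly and stops when it returns the sentinel b''
for block in iter(partial(f.read, 32), b''):
    blocks.append(block)

sizes = [len(block) for block in blocks]
```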

Distinguishing multiple exit points in loops

    def find(seq, target):
        found = False
        for i, value in enumerate(seq):
            if value == target:
                found = True
                break
        if not found:
            return -1
        return i

Better

    def find(seq, target):
        for i, value in enumerate(seq):
            if value == target:
                break
        else:
            return -1
        return i

Inside of every for loop is an else.
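
The else clause of a for loop runs only when the loop finishes without hitting break. A short Python 3 check of the three cases (found, not found, empty input), reusing the find function from above:

```python
def find(seq, target):
    for i, value in enumerate(seq):
        if value == target:
            break        # skips the else clause
    else:
        return -1        # reached only if the loop never hit break
    return i

hit = find(['red', 'green', 'blue'], 'green')
miss = find(['red', 'green', 'blue'], 'pink')
empty = find([], 'red')
```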

Looping over dictionary keys

    d = {'matthew': 'blue', 'rachel': 'green', 'raymond': 'red'}

    for k in d:
        print k

    for k in d.keys():
        if k.startswith('r'):
            del d[k]

When should you use the second and not the first? When you're mutating the dictionary.

If you mutate something while you're iterating over it, you're living in a state of sin and deserve whatever happens to you.

d.keys() makes a copy of all the keys and stores them in a list. Then you can modify the dictionary. Note: in Python 3, d.keys() returns a "dictionary view" (an iterable that provides a dynamic view of the dictionary's keys), so to mutate a dictionary while looping over its keys you have to make the copy explicitly with list(d.keys()). See the documentation.
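
A Python 3 sketch of both halves of that note: looping over an explicit copy of the keys is safe, while deleting through the live view fails. The second dictionary d2 and the view_error variable are only there for the demonstration.

```python
d = {'matthew': 'blue', 'rachel': 'green', 'raymond': 'red'}

# list(d.keys()) snapshots the keys, so deleting inside the loop is safe
for k in list(d.keys()):
    if k.startswith('r'):
        del d[k]

# Iterating over the live view while deleting raises RuntimeError
d2 = {'rachel': 'green', 'raymond': 'red'}
try:
    for k in d2.keys():
        del d2[k]
    view_error = None
except RuntimeError as exc:
    view_error = exc
```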

Looping over dictionary keys and values

    # Not very fast, has to re-hash every key and do a lookup
    for k in d:
        print k, '--->', d[k]

    # Makes a big huge list
    for k, v in d.items():
        print k, '--->', v

Better

    for k, v in d.iteritems():
        print k, '--->', v

iteritems() is better as it returns an iterator. Note: in Python 3 there is no iteritems(), and the behaviour of items() is close to what iteritems() had. See the documentation.

Construct a dictionary from pairs

    names = ['raymond', 'rachel', 'matthew']
    colors = ['red', 'green', 'blue']

    d = dict(izip(names, colors))
    # {'matthew': 'blue', 'rachel': 'green', 'raymond': 'red'}

For Python 3: d = dict(zip(names, colors))

Counting with dictionaries

    colors = ['red', 'green', 'red', 'blue', 'green', 'red']

    # Simple, basic way to count. A good start for beginners.
    d = {}
    for color in colors:
        if color not in d:
            d[color] = 0
        d[color] += 1

    # {'blue': 1, 'green': 2, 'red': 3}

Better

    d = {}
    for color in colors:
        d[color] = d.get(color, 0) + 1

    # Slightly more modern but has several caveats, better for advanced users
    # who understand the intricacies
    d = collections.defaultdict(int)
    for color in colors:
        d[color] += 1
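
The notes do not spell out the defaultdict caveats. One that is easy to demonstrate (my assumption about what is meant, not a quote from the talk) is that merely reading a missing key inserts it. A Python 3 sketch:

```python
import collections

colors = ['red', 'green', 'red', 'blue', 'green', 'red']

d = collections.defaultdict(int)
for color in colors:
    d[color] += 1

before = len(d)
_ = d['purple']          # a plain lookup of a missing key silently inserts it
after = len(d)

plain = dict(d)          # convert back to a regular dict to stop auto-insertion
```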

Grouping with dictionaries -- Part I and II

    names = ['raymond', 'rachel', 'matthew', 'roger',
             'betty', 'melissa', 'judith', 'charlie']

    # In this example, we're grouping by name length
    d = {}
    for name in names:
        key = len(name)
        if key not in d:
            d[key] = []
        d[key].append(name)

    # {5: ['roger', 'betty'], 6: ['rachel', 'judith'], 7: ['raymond', 'matthew', 'melissa', 'charlie']}

    d = {}
    for name in names:
        key = len(name)
        d.setdefault(key, []).append(name)

Better

    d = collections.defaultdict(list)
    for name in names:
        key = len(name)
        d[key].append(name)

Is a dictionary popitem() atomic?

    d = {'matthew': 'blue', 'rachel': 'green', 'raymond': 'red'}

    while d:
        key, value = d.popitem()
        print key, '-->', value

popitem is atomic so you don't have to put locks around it to use it in threads.
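
A rough Python 3 sketch of what that buys you in threads, assuming CPython: several workers drain one shared dictionary without a lock, and every item is handed out exactly once. The worker function, the data, and the seen list are illustrative. Note that the emptiness check and the pop are separate steps, so each worker catches KeyError instead of relying on a while d test.

```python
import threading

d = {i: i * i for i in range(1000)}   # shared work items (illustrative data)
seen = []                              # relies on list.append being thread-safe in CPython

def worker():
    while True:
        try:
            key, value = d.popitem()   # no lock around the shared dict
        except KeyError:               # raised once the dict is empty
            return
        seen.append(key)

threads = [threading.Thread(target=worker) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
```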

Linking dictionaries

    defaults = {'color': 'red', 'user': 'guest'}
    parser = argparse.ArgumentParser()
    parser.add_argument('-u', '--user')
    parser.add_argument('-c', '--color')
    namespace = parser.parse_args([])
    command_line_args = {k: v for k, v in vars(namespace).items() if v}

    # The common approach below allows you to use defaults at first, then override them
    # with environment variables and then finally override them with command line arguments.
    # It copies data like crazy, unfortunately.
    d = defaults.copy()
    d.update(os.environ)
    d.update(command_line_args)

Better

    from collections import ChainMap

    d = ChainMap(command_line_args, os.environ, defaults)

ChainMap was introduced in Python 3.3 (collections.ChainMap). Fast and beautiful.
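
To make the lookup order concrete, here is a small Python 3 sketch that uses plain dictionaries as stand-ins for os.environ and the parsed command line (both stand-ins are assumptions for the sake of a self-contained example):

```python
from collections import ChainMap

defaults = {'color': 'red', 'user': 'guest'}
environ = {'user': 'raymond'}            # stand-in for os.environ
command_line_args = {'color': 'blue'}    # stand-in for parsed arguments

d = ChainMap(command_line_args, environ, defaults)

color = d['color']   # found in command_line_args, the first mapping
user = d['user']     # not on the command line, so environ wins over defaults

defaults['theme'] = 'dark'   # no copy was made: later changes show through
theme = d['theme']
```

Lookups search the mappings from left to right, and nothing is copied, which is what avoids the "copies data like crazy" problem.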

Improving Clarity

Positional arguments and indices are nice
Keywords and names are better
The first way is convenient for the computer
The second corresponds to how humans think

Clarify function calls with keyword arguments

    twitter_search('@obama', False, 20, True)

Better

    twitter_search('@obama', retweets=False, numtweets=20, popular=True)

This is slightly (microseconds) slower, but it is worth it for the code clarity and developer time savings.

Clarify multiple return values with named tuples

    # Old testmod return value
    doctest.testmod()
    # (0, 4)
    # Is this good or bad? You don't know because it's not clear.

Better

    # New testmod return value, a named tuple
    doctest.testmod()
    # TestResults(failed=0, attempted=4)

A named tuple is a subclass of tuple, so it still works like a regular tuple, but is more friendly.

To make a named tuple, call the namedtuple factory function in the collections module:

    from collections import namedtuple

    TestResults = namedtuple('TestResults', ['failed', 'attempted'])
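
A short Python 3 sketch of what "works like a regular tuple, but more friendly" means in practice. The result value here is constructed by hand rather than returned by doctest.

```python
from collections import namedtuple

TestResults = namedtuple('TestResults', ['failed', 'attempted'])

result = TestResults(failed=0, attempted=4)

failed, attempted = result      # unpacks like a regular tuple
by_index = result[1]            # indexing still works
by_name = result.attempted      # but names say what the value means
text = repr(result)
```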

Unpacking sequences

    p = 'Raymond', 'Hettinger', 0x30, 'python@example.com'

    # A common approach / habit from other languages
    fname = p[0]
    lname = p[1]
    age = p[2]
    email = p[3]

Better

    fname, lname, age, email = p

The second approach uses tuple unpacking and is faster and more readable.

Updating multiple state variables

    def fibonacci(n):
        x = 0
        y = 1
        for i in range(n):
            print x
            t = y
            y = x + y
            x = t

Better

    def fibonacci(n):
        x, y = 0, 1
        for i in range(n):
            print x
            x, y = y, x + y

Problems with first approach

x and y are state, and state should be updated all at once; in between the lines that update it, the state is mismatched, which is a common source of issues
ordering matters
it's too low level

The second approach is more high-level, doesn't risk getting the order wrong and is fast.
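
The tuple assignment is safe because the whole right-hand side is evaluated before either name is rebound. A Python 3 variant that collects the values in a list instead of printing them (the list-returning form is my adaptation, so that the result can be checked):

```python
def fibonacci(n):
    """Return the first n Fibonacci numbers as a list."""
    result = []
    x, y = 0, 1
    for _ in range(n):
        result.append(x)
        # The right-hand side is evaluated in full before x and y are rebound
        x, y = y, x + y
    return result

first_ten = fibonacci(10)
```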

Simultaneous state updates

    tmp_x = x + dx * t
    tmp_y = y + dy * t
    # NOTE: The "influence" function here is just an example function, what it does
    # is not important. The important part is how to manage updating multiple
    # variables at once.
    tmp_dx = influence(m, x, y, dx, dy, partial='x')
    tmp_dy = influence(m, x, y, dx, dy, partial='y')
    x = tmp_x
    y = tmp_y
    dx = tmp_dx
    dy = tmp_dy

Better

    # NOTE: The "influence" function here is just an example function, what it does
    # is not important. The important part is how to manage updating multiple
    # variables at once.
    x, y, dx, dy = (x + dx * t,
                    y + dy * t,
                    influence(m, x, y, dx, dy, partial='x'),
                    influence(m, x, y, dx, dy, partial='y'))

Efficiency

A fundamental rule of optimization
Don’t cause data to move around unnecessarily
It takes only a little care to get linear behavior instead of O(n**2) behavior

Basically, just don't move data around unnecessarily.

Concatenating strings

    names = ['raymond', 'rachel', 'matthew', 'roger',
             'betty', 'melissa', 'judith', 'charlie']

    s = names[0]
    for name in names[1:]:
        s += ', ' + name
    print s

Better

    print ', '.join(names)

Updating sequences

    names = ['raymond', 'rachel', 'matthew', 'roger',
             'betty', 'melissa', 'judith', 'charlie']

    del names[0]
    # The below are signs you're using the wrong data structure
    names.pop(0)
    names.insert(0, 'mark')

Better

    names = collections.deque(['raymond', 'rachel', 'matthew', 'roger',
                               'betty', 'melissa', 'judith', 'charlie'])

    # More efficient with collections.deque
    del names[0]
    names.popleft()
    names.appendleft('mark')
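
A list has to shift every remaining element when you insert or remove at the front, while a deque is built for cheap operations at both ends. A small Python 3 sketch with a shortened name list (illustrative data):

```python
from collections import deque

names = deque(['raymond', 'rachel', 'matthew', 'roger'])

first = names.popleft()      # removes 'raymond' without shifting the rest
names.appendleft('mark')     # adds at the left end, also without shifting
names.append('betty')        # the right end works like a list

snapshot = list(names)
```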

Decorators and Context Managers

Helps separate business logic from administrative logic
Clean, beautiful tools for factoring code and improving code reuse
Good naming is essential.
Remember the Spiderman rule: With great power comes great responsibility!

Using decorators to factor-out administrative logic

    # Mixes business / administrative logic and is not reusable
    def web_lookup(url, saved={}):
        if url in saved:
            return saved[url]
        page = urllib.urlopen(url).read()
        saved[url] = page
        return page

Better

    @cache
    def web_lookup(url):
        return urllib.urlopen(url).read()

Note: since Python 3.2 there is a decorator for this in the standard library: functools.lru_cache.
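
The notes use @cache without showing it. Below is one way such a decorator could look in Python 3, next to the standard functools.lru_cache. This is a sketch under my own assumptions, not the talk's code: slow_lookup, slow_lookup_std, and the calls list are hypothetical stand-ins for a real network fetch.

```python
from functools import lru_cache, wraps

def cache(func):
    """Hand-rolled memoizing decorator for functions with hashable positional arguments."""
    saved = {}

    @wraps(func)
    def wrapper(*args):
        if args not in saved:
            saved[args] = func(*args)   # call the real function once per argument tuple
        return saved[args]
    return wrapper

calls = []   # records real (non-cached) calls, for demonstration only

@cache
def slow_lookup(url):
    # Hypothetical stand-in for a network fetch
    calls.append(url)
    return 'page for ' + url

@lru_cache(maxsize=None)
def slow_lookup_std(url):
    calls.append(url)
    return 'page for ' + url

slow_lookup('http://example.com')
slow_lookup('http://example.com')          # served from the cache
slow_lookup_std('http://example.com')
slow_lookup_std('http://example.com')      # served from the cache
```

Either way, the caching (administrative logic) lives in the decorator and the function body keeps only the business logic.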

Factor-out temporary contexts

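    from decimal import Decimal, getcontext, setcontext

    # Save the old context, then restore it afterwards
    old_context = getcontext().copy()
    getcontext().prec = 50
    print Decimal(355) / Decimal(113)
    setcontext(old_context)

Better

    from decimal import Context, Decimal, localcontext

    with localcontext(Context(prec=50)):
        print Decimal(355) / Decimal(113)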
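
A Python 3 sketch that checks what the with block guarantees: the precision is 50 inside the block and back to whatever it was before once the block ends. The variable names are illustrative.

```python
from decimal import Context, Decimal, getcontext, localcontext

default_prec = getcontext().prec

with localcontext(Context(prec=50)):
    inside_prec = getcontext().prec
    approx_pi = Decimal(355) / Decimal(113)   # computed with 50 significant digits

after_prec = getcontext().prec                # the previous context is back in force
digits = len(approx_pi.as_tuple().digits)
```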

How to open and close files

    f = open('data.txt')
    try:
        data = f.read()
    finally:
        f.close()

Better

    with open('data.txt') as f:
        data = f.read()

How to use locks

    # Make a lock
    lock = threading.Lock()

    # Old-way to use a lock
    lock.acquire()
    try:
        print 'Critical section 1'
        print 'Critical section 2'
    finally:
        lock.release()

Better

    # New-way to use a lock
    with lock:
        print 'Critical section 1'
        print 'Critical section 2'

Factor-out temporary contexts

    try:
        os.remove('somefile.tmp')
    except OSError:
        pass

Better

    with ignored(OSError):
        os.remove('somefile.tmp')

ignored is new in Python 3.4 (documentation). Note: ignored is actually called suppress in the standard library (contextlib.suppress).

To make your own ignored context manager in the meantime:

    from contextlib import contextmanager

    @contextmanager
    def ignored(*exceptions):
        try:
            yield
        except exceptions:
            pass

Stick that in your utils directory and you too can ignore exceptions.
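
On Python 3.4 and later you can use contextlib.suppress directly. The sketch below tries to remove a file that is known not to exist (the path inside a fresh temporary directory is a made-up example) and records which steps actually ran:

```python
import os
import tempfile
from contextlib import suppress

# A path inside a fresh temporary directory, so the file cannot exist
missing = os.path.join(tempfile.mkdtemp(), 'somefile.tmp')

steps = []
with suppress(OSError):
    steps.append('before')
    os.remove(missing)        # raises FileNotFoundError, a subclass of OSError
    steps.append('after')     # never runs: the block is abandoned at the exception
steps.append('continued')     # execution resumes after the with block
```

The exception is swallowed, but the rest of the with block is skipped, so keep the block down to the one call you expect to fail.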

Factor-out temporary contexts

    # Temporarily redirect standard out to a file and then return it to normal
    with open('help.txt', 'w') as f:
        oldstdout = sys.stdout
        sys.stdout = f
        try:
            help(pow)
        finally:
            sys.stdout = oldstdout

Better

    with open('help.txt', 'w') as f:
        with redirect_stdout(f):
            help(pow)

redirect_stdout was proposed for Python 3.4 (bug report) and is available there as contextlib.redirect_stdout.

To roll your own redirect_stdout context manager:

    import sys
    from contextlib import contextmanager

    @contextmanager
    def redirect_stdout(fileobj):
        oldstdout = sys.stdout
        sys.stdout = fileobj
        try:
            yield fileobj
        finally:
            sys.stdout = oldstdout
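
With Python 3.4 or later the standard library version can be used as is. A minimal sketch, assuming an in-memory io.StringIO buffer instead of the help.txt file so that the captured text can be inspected:

```python
import io
import sys
from contextlib import redirect_stdout

buffer = io.StringIO()          # in-memory stand-in for the help.txt file
original = sys.stdout

with redirect_stdout(buffer):
    print('captured line')      # goes to the buffer, not the terminal

restored = sys.stdout is original
captured = buffer.getvalue()
```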

Concise Expressive One-Liners

Two conflicting rules:

Don’t put too much on one line
Don’t break atoms of thought into subatomic particles

Raymond’s rule:

One logical line of code equals one sentence in English

List Comprehensions and Generator Expressions

    result = []
    for i in range(10):
        s = i ** 2
        result.append(s)
    print sum(result)

Better

    print sum(i**2 for i in xrange(10))

First way tells you what to do, second way tells you what you want.
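
The same comparison in Python 3, where range is already lazy, with both versions kept side by side so that the results can be compared:

```python
# Loop version: spells out how to build the answer
result = []
for i in range(10):
    result.append(i ** 2)
loop_total = sum(result)

# Generator expression: states what is wanted, with no intermediate list
gen_total = sum(i ** 2 for i in range(10))
```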
