Counting Things in Python: A History

Nov9th,201512:30 pm |Comments

Sometimes thePythonic way to solve a problem changes over time. As Python has evolved, so has the Pythonic way to count list items.

Let’s look at different techniques for counting the number of times things appear in a list. While analyzing these techniques, we willonly be looking at code style. We’ll worry about performance later.

We will need some historical context to understand these different techniques. Fortunately we live in the__future__ and we have a time machine. Let’s jump in our DeLorean and head to 1997.

if Statement

It’s January 1, 1997 and we’re using Python 1.4. We have a list of colors and we’d love to know how many times each color occurs in this list. Let’s usea dictionary!

colors=["brown","red","green","yellow","yellow","brown","brown","black"]color_counts={}forcincolors:ifcolor_counts.has_key(c):color_counts[c]=color_counts[c]+1else:color_counts[c]=1

Note: we’re not using+= because augmented assignment won’t be added untilPython 2.0 and we’re not using thec in color_counts idiom because that won’t be invented untilPython 2.2!

After running this we’ll see that ourcolor_counts dictionary now contains the counts of each color in our list:

12	`>>>color_counts{'brown': 3, 'yellow': 2, 'green': 1, 'black': 1, 'red': 1}`

That was pretty simple. We just looped through each color, checked if it was in the dictionary, added the color if it wasn’t, and incremented the count if it was.

We could also write this as:

colors=["brown","red","green","yellow","yellow","brown","brown","black"]color_counts={}forcincolors:ifnotcolor_counts.has_key(c):color_counts[c]=0color_counts[c]=color_counts[c]+1

This might be a little slower on sparse lists (lists with lots of non-repeating colors) because it executes two statements instead of one, but we’re not worried about performance, we’re worried about code style. After some thought, we decide to stick with this new version.

try Block

It’s January 2, 1997 and we’re still using Python 1.4. We woke up this morning with a sudden realization: our code is practicing “Look Before You Leap” (LBYL) when we should be practicing “Easier to Ask Forgiveness, Than Permission” (EAFP) because EAFP is more Pythonic. Let’s refactor our code to use a try-except block:

colors=["brown","red","green","yellow","yellow","brown","brown","black"]color_counts={}forcincolors:try:color_counts[c]=color_counts[c]+1exceptKeyError:color_counts[c]=1

Now our code attempts to increment the count for each color and if the color isn’t in the dictionary, aKeyError will be raised and we will instead set the color count to 1 for the color.

get Method

It’s January 1, 1998 and we’ve upgraded to Python 1.5. We’ve decided to refactor our code to use thenewget method on dictionaries:

1234	`colors=["brown","red","green","yellow","yellow","brown","brown","black"]color_counts={}forcincolors:color_counts[c]=color_counts.get(c,0)+1`

Now our code loops through each color, gets the current count for the color from the dictionary, defaulting this count to0, adds1 to the count, and sets the dictionary key to this new value.

It’s cool that this is all one line of code, but we’re not entirely sure if this is more Pythonic. We decide this might be too clever so we revert this change.

setdefault

It’s January 1, 2001 and we’re now using Python 2.0! We’ve heard thatdictionaries have asetdefault method now and we decide to refactor our code to use this new method. We also decide to use the new+= augmented assignment operator:

colors=["brown","red","green","yellow","yellow","brown","brown","black"]color_counts={}forcincolors:color_counts.setdefault(c,0)color_counts[c]+=1

Thesetdefault method is being called on every loop, regardless of whether it’s needed, but this does seem a little more readable. We decide that this is more Pythonic than our previous solutions and commit our change.

fromkeys

It’s January 1, 2004 and we’re using Python 2.3. We’ve heard about anewfromkeys class method on dictionaries for constructing dictionaries from a list of keys. We refactor our code to use this new method:

1234	`colors=["brown","red","green","yellow","yellow","brown","brown","black"]color_counts=dict.fromkeys(colors,0)forcincolors:color_counts[c]+=1`

This creates a new dictionary using our colors as keys, with all values set to0 initially. This allows us to increment each key without worrying whether it has been set. We’ve removed the need for any checking or exception handling which seems like an improvement. We decide to keep this change.

Comprehension and set

It’s January 1, 2005 and we’re using Python 2.4. We realize that we could solve our counting problem using sets (released in Python 2.3 and made intoa built-in in 2.4) and list comprehensions (released in Python 2.0). After further thought, we remember thatgenerator expressions were also just released in Python 2.4 and we decide to use one of those instead of a list comprehension:

12	`colors=["brown","red","green","yellow","yellow","brown","brown","black"]color_counts=dict((c,colors.count(c))forcinset(colors))`

Note: we didn’t use a dictionary comprehension because those won’t be invented untilPython 2.7.

This works. It’s one line of code. But is it Pythonic?

We remember theZen of Python, whichstarted in a python-list email thread and wassnuck into Python 2.2.1. We typeimport this at our REPL:

12345678910111213141516171819202122

>>>importthisThe Zen of Python, by Tim PetersBeautiful is better than ugly.Explicit is better than implicit.Simple is better than complex.Complex is better than complicated.Flat is better than nested.Sparse is better than dense.Readability counts.Special cases aren't special enough to break the rules.Although practicality beats purity.Errors should never pass silently.Unless explicitly silenced.In the face of ambiguity, refuse the temptation to guess.There should be one-- and preferably only one --obvious way to do it.Although that way may not be obvious at first unless you're Dutch.Now is better than never.Although never is often better than *right* now.If the implementation is hard to explain, it's a bad idea.If the implementation is easy to explain, it may be a good idea.Namespaces are one honking great idea -- let's do more of those!

Our code ismore complex (O(n²) instead ofO(n)),less beautiful, andless readable. That change was a fun experiment, but this one-line solution isless Pythonic than what we already had. We decide to revert this change.

defaultdict

It’s January 1, 2007 and we’re using Python 2.5. We’ve just found out thatdefaultdict is in the standard library now. This should allow us to set0 as the default value in our dictionary. Let’s refactor our code to count using adefaultdict instead:

fromcollectionsimportdefaultdictcolors=["brown","red","green","yellow","yellow","brown","brown","black"]color_counts=defaultdict(int)forcincolors:color_counts[c]+=1

Thatfor loop is so simple now! This is almost certainly more Pythonic.

We realize that ourcolor_counts variable does act differently, however itdoes inherit fromdict and supports all the same mapping functionality.

12	`>>>color_countsdefaultdict(<type 'int'>, {'brown': 3, 'yellow': 2, 'green': 1, 'black': 1, 'red': 1})`

Instead of convertingcolor_counts to adict, we’ll assume the rest of our code practicesduck typing and leave this dict-like object as-is.

Counter

It’s January 1, 2011 and we’re using Python 2.7. We’ve been told that ourdefaultdict code is no longer the most Pythonic way to count colors.ACounter class was included in the standard library in Python 2.7 and it does all of the work for us!

123	`fromcollectionsimportCountercolors=["brown","red","green","yellow","yellow","brown","brown","black"]color_counts=Counter(colors)`

Could this get any simpler? This must be the most Pythonic way.

Likedefaultdict, this returns a dict-like object (adict subclass actually), which should be good enough for our purposes, so we’ll stick with it.

12	`>>>color_countsCounter({'brown': 3, 'yellow': 2, 'green': 1, 'black': 1, 'red': 1})`

After thought: Performance

Notice that we didn’t focus on efficiency for these solutions. Most of these solutions have the same time complexity (O(n) in big O notation) but runtimes could vary based on the Python implementation.

While performance isn’t our main concern,I did measure the run-times on CPython 3.5.0 (you canmeasure performance yourself here). It’s interesting to see how each implementation changes in relative efficiency based on the density of color names in the list.

Conclusion

Per theZen of Python, “there should be one– and preferably only one– obvious way to do it”. This is an aspirational message. There isn’t always one obvious way to do it. The “obvious” way can vary by time, need, and level of expertise.

“Pythonic” is a relative term.

Related Resources

import this and the Zen of Python: Zen of Python trivia borrowed from this post
Permission or Forgiveness: Alex Martelli discusses Grace Hopper’s EAFP
Python How To: Group and Count with Dictionaries: while writing this post, I discovered this related article

Credits

Thanks toBrian Schrader andDavid Lord for proof-reading this post andMicah Denbraver for actuallytesting out these solutions on the correct versions of Python.

Comments

Hi! My name is Trey Hunner.

I help Python teamswrite better Python code through Python team training.

I also help individualslevel-up their Python skills withweekly Python skill-building.

Python Team Training

Write Pythonic code

The best way to improve your skills is towrite more code, but it's time consuming to figure out what code to write. I've madea Python skill-building service to help solve this problem.

Each week you'll get an exercise that'll help you dive deeper into Python and carefullyreflect on your own coding style. The first 3 exercises are free.

See thePython Morsels Privacy Policy.
This form is reCAPTCHA protected (see GooglePrivacy Policy &Terms of Service)

Favorite Posts

Follow @treyhunner

Movatterモバイル変換

Trey Hunner

I help developers level-up their Python skills