Movatterモバイル変換


[0]ホーム

URL:


Following system colour schemeSelected dark colour schemeSelected light colour scheme

Python Enhancement Proposals

PEP 265 – Sorting Dictionaries by Value

Author:
Grant Griffin <g2 at iowegian.com>
Status:
Rejected
Type:
Standards Track
Created:
08-Aug-2001
Python-Version:
2.2
Post-History:


Table of Contents

Abstract

This PEP suggests a “sort by value” operation for dictionaries.The primary benefit would be in terms of “batteries included”support for a common Python idiom which, in its current form, isboth difficult for beginners to understand and cumbersome for allto implement.

BDFL Pronouncement

This PEP is rejected because the need for it has been largelyfulfilled by Py2.4’ssorted() builtin function:

>>>sorted(d.iteritems(),key=itemgetter(1),reverse=True)[('b', 23), ('d', 17), ('c', 5), ('a', 2), ('e', 1)]

or for just the keys:

sorted(d,key=d.__getitem__,reverse=True)['b','d','c','a','e']

Also, Python 2.5’sheapq.nlargest() function addresses the common usecase of finding only a few of the highest valued items:

>>>nlargest(2,d.iteritems(),itemgetter(1))[('b', 23), ('d', 17)]

Motivation

A common use of dictionaries is to count occurrences by settingthe value ofd[key] to 1 on its first occurrence, then incrementthe value on each subsequent occurrence. This can be done severaldifferent ways, but theget() method is the most succinct:

d[key]=d.get(key,0)+1

Once all occurrences have been counted, a common use of theresulting dictionary is to print the occurrences inoccurrence-sorted order, often with the largest value first.

This leads to a need to sort a dictionary’s items by value. Thecanonical method of doing so in Python is to first used.items()to get a list of the dictionary’s items, then invert the orderingof each item’s tuple from (key, value) into (value, key), thensort the list; since Python sorts the list based on the first itemof the tuple, the list of (inverted) items is therefore sorted byvalue. If desired, the list can then be reversed, and the tuplescan be re-inverted back to (key, value). (However, in myexperience, the inverted tuple ordering is fine for most purposes,e.g. printing out the list.)

For example, given an occurrence count of:

>>>d={'a':2,'b':23,'c':5,'d':17,'e':1}

we might do:

>>>items=[(v,k)fork,vind.items()]>>>items.sort()>>>items.reverse()# so largest is first>>>items=[(k,v)forv,kinitems]

resulting in:

>>>items[('b', 23), ('d', 17), ('c', 5), ('a', 2), ('e', 1)]

which shows the list in by-value order, largest first. (In thiscase,'b' was found to have the most occurrences.)

This works fine, but is “hard to use” in two aspects. First,although this idiom is known to veteran Pythoneers, it is not atall obvious to newbies – either in terms of its algorithm(inverting the ordering of item tuples) or its implementation(using list comprehensions – which are an advanced Pythonfeature.) Second, it requires having to repeatedly type a lot of“grunge”, resulting in both tedium and mistakes.

We therefore would rather Python provide a method of sortingdictionaries by value which would be both easy for newbies tounderstand (or, better yet, not tohave to understand) andeasier for all to use.

Rationale

As Tim Peters has pointed out, this sort of thing brings on theproblem of trying to be all things to all people. Therefore, wewill limit its scope to try to hit “the sweet spot”. Unusualcases (e.g. sorting via a custom comparison function) can, ofcourse, be handled “manually” using present methods.

Here are some simple possibilities:

Theitems() method of dictionaries can be augmented with newparameters having default values that provide for fullbackwards-compatibility:

(1)items(sort_by_values=0,reversed=0)

or maybe just:

(2)items(sort_by_values=0)

since reversing a list is easy enough.

Alternatively,items() could simply let us control the (key, value)order:

(3)items(values_first=0)

Again, this is fully backwards-compatible. It does less work thanthe others, but it at least eases the most complicated/tricky partof the sort-by-value problem: inverting the order of item tuples.Using this is very simple:

items=d.items(1)items.sort()items.reverse()# (if desired)

The primary drawback of the preceding three approaches is theadditional overhead for the parameter-lessitems() case, due tohaving to process default parameters. (However, if one assumesthatitems() gets used primarily for creating sort-by-value lists,this is not really a drawback in practice.)

Alternatively, we might add a new dictionary method which somehowembodies “sorting”. This approach offers two advantages. First,it avoids adding overhead to theitems() method. Second, it isperhaps more accessible to newbies: when they go looking for amethod for sorting dictionaries, they hopefully run into this one,and they will not have to understand the finer points of tupleinversion and list sorting to achieve sort-by-value.

To allow the four basic possibilities of sorting by key/value and inforward/reverse order, we could add this method:

(4)sorted_items(by_value=0,reversed=0)

I believe the most common case would actually beby_value=1,reversed=1, but the defaults values given here might lead tofewer surprises by users:sorted_items() would be the same asitems() followed bysort().

Finally (as a last resort), we could use:

(5)items_sorted_by_value(reversed=0)

Implementation

The proposed dictionary methods would necessarily be implementedin C. Presumably, the implementation would be fairly simple sinceit involves just adding a few calls to Python’s existingmachinery.

Concerns

Aside from the run-time overhead already addressed inpossibilities 1 through 3, concerns with this proposal probablywill fall into the categories of “feature bloat” and/or “codebloat”. However, I believe that several of the suggestions madehere will result in quite minimal bloat, resulting in a goodtradeoff between bloat and “value added”.

Tim Peters has noted that implementing this in C might not besignificantly faster than implementing it in Python today.However, the major benefits intended here are “accessibility” and“ease of use”, not “speed”. Therefore, as long as it is notnoticeably slower (in the case of plainitems(), speed need not bea consideration.

References

A related thread called “counting occurrences” appeared oncomp.lang.python in August, 2001. This included examples ofapproaches to systematizing the sort-by-value problem byimplementing it as reusable Python functions and classes.

Copyright

This document has been placed in the public domain.


Source:https://github.com/python/peps/blob/main/peps/pep-0265.rst

Last modified:2025-02-01 08:55:40 GMT


[8]ページ先頭

©2009-2025 Movatter.jp