Python Enhancement Proposals

PEP 378 – Format Specifier for Thousands Separator

Author:
Raymond Hettinger <python at rcn.com>
Status:
Final
Type:
Standards Track
Created:
12-Mar-2009
Python-Version:
2.7, 3.1
Post-History:
12-Mar-2009


Motivation

Provide a simple, non-locale aware way to format a number with a thousands separator.

Adding thousands separators is one of the simplest ways to humanize a program’s output, improving its professional appearance and readability.

In the finance world, output with thousands separators is the norm. Finance users and non-professional programmers find the locale approach to be frustrating, arcane and non-obvious.

The locale module presents two other challenges. First, it is a global setting and not suitable for multi-threaded apps that need to serve requests in multiple locales. Second, the name of a relevant locale (such as “de_DE”) can vary from platform to platform or may not be defined at all. The docs for the locale module describe these and many other challenges in detail.

It is not the goal to replace the locale module, to perform internationalization tasks, or to accommodate every possible convention. Such tasks are better suited to robust tools like Babel. Instead, the goal is to make a common, everyday task easier for many users.

Main Proposal (from Alyssa Coghlan, originally called Proposal I)

A comma will be added to the format() specifier mini-language:

[[fill]align][sign][#][0][width][,][.precision][type]

The ‘,’ option indicates that commas should be included in the output as a thousands separator. As with locales which do not use a period as the decimal point, locales which use a different convention for digit separation will need to use the locale module to obtain appropriate formatting.
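As a minimal sketch of the option in use, the snippet below applies ‘,’ to an int, a float and a Decimal through format(), str.format() and an f-string. The variable names are illustrative only, and the f-string form assumes Python 3.6 or later.

```python
from decimal import Decimal

# The "," option behaves the same in format(), str.format() and f-strings.
n = 1234567
as_int = format(n, ",d")                             # '1,234,567'
as_float = format(1234567.891, ",.2f")               # '1,234,567.89'
as_decimal = format(Decimal("1234567.891"), ",.2f")  # '1,234,567.89'
via_method = "{:,}".format(n)                        # '1,234,567'
via_fstring = f"{n:,}"                               # '1,234,567'

print(as_int, as_float, as_decimal, via_method, via_fstring)
```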

The proposal works well with floats, ints, and decimals. It also allows easy substitution for other separators. For example:

format(n, "6,d").replace(",", "_")

This technique is completely general but it is awkward in the one case where the commas and periods need to be swapped:

format(n, "6,f").replace(",", "X").replace(".", ",").replace("X", ".")
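One way to avoid the placeholder character is str.translate, which maps all characters in a single pass. This is only a sketch: the helper name format_swapped and its default spec are assumptions made for illustration, not part of the proposal.

```python
# str.translate maps every character simultaneously, so no temporary
# placeholder such as "X" is needed to swap commas and periods.
SWAP_SEPARATORS = str.maketrans(",.", ".,")


def format_swapped(value, spec=",.2f"):
    """Format with the ',' option, then swap ',' and '.' (hypothetical helper)."""
    return format(value, spec).translate(SWAP_SEPARATORS)


# The chained replace() calls from the text, for comparison.
chained = (
    format(1234567.891, ",.2f")
    .replace(",", "X")
    .replace(".", ",")
    .replace("X", ".")
)

print(format_swapped(1234567.891))  # 1.234.567,89
```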

The width argument means the total length including the commas and decimal point:

format(1234, "08,d")     -->    '0001,234'
format(1234.5, "08,.1f") -->    '01,234.5'
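The short check below illustrates how width interacts with the separators. It reflects the behaviour of CPython as far as I am aware, and the variable names are illustrative only.

```python
# Width is a minimum total length that already counts commas and the
# decimal point; longer results are never truncated.
padded_int = format(1234, "8,d")         # '   1,234'
padded_float = format(1234.5, "12,.1f")  # '     1,234.5'
too_narrow = format(1234567, "6,d")      # '1,234,567'

# Zero padding: CPython groups the padding zeros as well.
zero_float = format(1234.5, "08,.1f")    # '01,234.5'
zero_int = format(1234, "08,d")          # '0,001,234'

print(repr(padded_int), repr(padded_float), repr(too_narrow))
print(repr(zero_float), repr(zero_int))
```

Note that the zero-padded integer comes out as '0,001,234' in CPython, one character wider than requested, and not as the '0001,234' shown above. The padding zeros are grouped too, and the result cannot begin with a separator.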

The ‘,’ option is defined as shown above for types ‘d’, ‘e’, ‘f’, ‘g’, ‘E’, ‘G’, ‘%’, ‘F’ and ‘’. To allow future extensions, it is undefined for other types: binary, octal, hex, character, etc.
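In CPython, to the best of my knowledge, combining ‘,’ with one of the undefined types raises ValueError. The probe below is a sketch built on that assumption; accepts_comma is a hypothetical helper name.

```python
def accepts_comma(value, type_code):
    """Return True if format() accepts ',' together with type_code."""
    try:
        format(value, "," + type_code)
    except ValueError:
        return False
    return True


# Types listed in the PEP are accepted.
print(accepts_comma(1234, "d"), accepts_comma(1234.5, "f"))
# Binary, octal, hex and character are rejected.
print([accepts_comma(1234, code) for code in "box"], accepts_comma(65, "c"))
```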

This proposal has the virtue of being simpler than the alternative proposal but is much less flexible and meets the needs of fewer users right out of the box. It is expected that some other solution will arise for specifying alternative separators.

Current Version of the Mini-Language

Research into what Other Languages Do

Scanning the web, I’ve found that thousands separators are usually one of COMMA, DOT, SPACE, APOSTROPHE or UNDERSCORE.

C-Sharp provides both styles (picture formatting and type specifiers). The type specifier approach is locale aware. The picture formatting only offers a COMMA as a thousands separator:

String.Format("{0:n}", 12400)     ==>    "12,400"
String.Format("{0:0,0}", 12400)   ==>    "12,400"

Common Lisp uses a COLON before the ~D decimal type specifier to emit a COMMA as a thousands separator. The general form of ~D is ~mincol,padchar,commachar,commaintervalD. The padchar defaults to SPACE. The commachar defaults to COMMA. The commainterval defaults to three.

(format nil "~:D" 229345007)    =>    "229,345,007"
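To make the four ~D parameters concrete, here is a rough Python emulation of the ~:D behaviour. It is a sketch only: lisp_style_d is a hypothetical name, the parameter names simply mirror the Lisp ones, and edge cases of the real directive are not modelled.

```python
def lisp_style_d(n, mincol=0, padchar=" ", commachar=",", commainterval=3):
    """Group the digits of an int in the manner of Common Lisp's ~:D (sketch)."""
    if commainterval < 1:
        raise ValueError("commainterval must be at least 1")
    sign = "-" if n < 0 else ""
    digits = str(abs(n))
    groups = []
    # Peel off groups of commainterval digits from the right.
    while digits:
        groups.append(digits[-commainterval:])
        digits = digits[:-commainterval]
    body = sign + commachar.join(reversed(groups))
    # mincol is a minimum width, padded on the left with padchar.
    return body.rjust(mincol, padchar)


print(lisp_style_d(229345007))                                  # 229,345,007
print(lisp_style_d(229345007, commachar=" ", commainterval=4))  # 2 2934 5007
print(lisp_style_d(229345007, mincol=12, padchar="*"))          # *229,345,007
```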

Visual Basic and its brethren (like MS Excel) use a completely different style and have ultra-flexible custom format specifiers like:

"_($* #,##0_)".

COBOL uses picture clauses like:

PICTURE $***,**9.99CR

Java offers a DecimalFormat class that uses picture patterns (one for positive numbers and an optional one for negatives) such as: "#,##0.00;(#,##0.00)". It allows arbitrary groupings including hundreds and ten-thousands and uneven groupings. The special pattern characters are non-localized (using a DOT for a decimal separator and a COMMA for a grouping separator). The user can supply an alternate set of symbols using the formatter’s DecimalFormatSymbols object.
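For comparison, the effect of the two-part pattern "#,##0.00;(#,##0.00)" can be approximated in Python with the ‘,’ option. This is an illustrative sketch, not a port of the Java class; the function name accounting is hypothetical.

```python
def accounting(value):
    """Approximate "#,##0.00;(#,##0.00)": negatives are wrapped in parentheses."""
    text = format(abs(value), ",.2f")
    return f"({text})" if value < 0 else text


print(accounting(1234.5))   # 1,234.50
print(accounting(-1234.5))  # (1,234.50)
```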

Alternative Proposal (from Eric Smith, originally called Proposal II)

Make both the thousands separator and decimal separator user specifiable but not locale aware. For simplicity, limit the choices to a COMMA, DOT, SPACE, APOSTROPHE or UNDERSCORE. The SPACE can be either U+0020 or U+00A0.

Whenever a separator is followed by a precision, it is a decimal separator and an optional separator preceding it is a thousands separator. When the precision is absent, a lone specifier means a thousands separator:

[[fill]align][sign][#][0][width][tsep][dsep precision][type]

Examples:

format(1234, "8.1f")     -->    '  1234.0'
format(1234, "8,1f")     -->    '  1234,0'
format(1234, "8.,1f")    -->    ' 1.234,0'
format(1234, "8 ,f")     -->    ' 1 234,0'
format(1234, "8d")       -->    '    1234'
format(1234, "8,d")      -->    '   1,234'
format(1234, "8_d")      -->    '   1_234'
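The built-in format() does not accept this alternative grammar in general, so the parsing rule can only be sketched with a helper. The function format_alt and its regular expression below are hypothetical. They cover just [width][tsep][dsep precision][type], leaving out fill, align, sign, ‘#’ and ‘0’. They also require precision digits after a decimal separator, so a spec such as "8 ,f" is rejected.

```python
import re

# Hypothetical parser for: [width][tsep][dsep precision][type]
_SEP = r"[,._' \u00a0]"
_SPEC = re.compile(
    rf"(?P<width>\d+)?(?P<tsep>{_SEP})?"
    rf"(?:(?P<dsep>{_SEP})(?P<precision>\d+))?(?P<type>[defgEFG%])?"
)


def format_alt(value, spec):
    """Emulate the alternative proposal on top of the built-in ',' option."""
    match = _SPEC.fullmatch(spec)
    if match is None:
        raise ValueError(f"unsupported format spec: {spec!r}")
    tsep, dsep = match["tsep"], match["dsep"]
    # Build the equivalent built-in spec, which always uses ',' and '.'.
    builtin = "," if tsep else ""
    if dsep:
        builtin += "." + match["precision"]
    builtin += match["type"] or ""
    text = format(value, builtin)
    # Swap both separators in one pass so they cannot clobber each other.
    text = text.translate({ord(","): tsep or ",", ord("."): dsep or "."})
    # Width counts the separators, as in the examples above.
    return text.rjust(int(match["width"] or 0))


for spec in ("8.1f", "8,1f", "8.,1f", "8d", "8,d", "8_d"):
    print(repr(spec), "-->", repr(format_alt(1234, spec)))
```

A separator followed by digits is matched as the decimal separator because the regular expression backtracks from the thousands-separator reading when no type code follows. This is the same disambiguation a reader has to perform mentally.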

This proposal meets most needs, but it comes at the expense of taking a bit more effort to parse. Not every possible convention is covered, but at least one of the options (spaces or underscores) should be readable, understandable, and useful to folks from many diverse backgrounds.

As shown in the examples, the width argument means the total length including the thousands separators and decimal separators.

No change is proposed for the locale module.

The thousands separator is defined as shown above for types ‘d’, ‘e’, ‘f’, ‘g’, ‘%’, ‘E’, ‘G’ and ‘F’. To allow future extensions, it is undefined for other types: binary, octal, hex, character, etc.

The drawback to this alternative proposal is the difficulty of mentally parsing whether a single separator is a thousands separator or decimal separator. Perhaps it is too arcane to link the decimal separator with the precision specifier.

Commentary

  • Some commenters do not like the idea of format strings at all and find them to be unreadable. Suggested alternatives include the COBOL style PICTURE approach or a convenience function with keyword arguments for every possible combination.
  • Some newsgroup respondents think there is no place for any scripts that are not internationalized and that it is a step backwards to provide a simple way to hardwire a particular choice (thus reducing incentive to use a locale sensitive approach).
  • Another thought is that embedding some particular convention in individual format strings makes it hard to change that convention later. No workable alternative was suggested but the general idea is to set the convention once and have it apply everywhere (others commented that locale already provides a way to do this).
  • There are some precedents for grouping digits in the fractional part of a floating point number, but this PEP does not venture into that territory. Only digits to the left of the decimal point are grouped. This does not preclude future extensions; it just focuses on a single, generally useful extension to the formatting language.
  • James Knight observed that Indian/Pakistani numbering systems group by hundreds. Ben Finney noted that Chinese group by ten-thousands. Eric Smith pointed out that these are already handled by the “n” specifier in the locale module (albeit only for integers). This PEP does not attempt to support all of those possibilities. It focuses on a single, relatively common grouping convention that offers a quick way to improve readability in many (though not all) contexts.
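The groupings mentioned in the last item fall outside what the ‘,’ option provides, but they can be produced by hand. The sketch below is not part of the proposal. The helper group_with_sizes is hypothetical, and its sizes argument lists group widths from the right, with the last width repeating.

```python
def group_with_sizes(n, sizes, sep=","):
    """Group an int's digits using widths from the right; the last size repeats."""
    if not sizes or min(sizes) < 1:
        raise ValueError("sizes must be a non-empty list of positive ints")
    sign = "-" if n < 0 else ""
    digits = str(abs(n))
    groups = []
    index = 0
    while digits:
        size = sizes[min(index, len(sizes) - 1)]
        groups.append(digits[-size:])
        digits = digits[:-size]
        index += 1
    return sign + sep.join(reversed(groups))


print(group_with_sizes(12345678, [3, 2]))  # 1,23,45,678  (thousands, then hundreds)
print(group_with_sizes(123456789, [4]))    # 1,2345,6789  (ten-thousands)
print(group_with_sizes(1234567, [3]))      # 1,234,567    (same as the ',' option)
```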

Copyright

This document has been placed in the public domain.


Source: https://github.com/python/peps/blob/main/peps/pep-0378.rst

Last modified: 2025-02-01 08:59:27 GMT

