PNG (Portable Network Graphics) Specification

Version 1.0

W3C Recommendation 01-October-1996

12. Appendix: Rationale

(This appendix is not part of the formal PNG specification.)

This appendix gives the reasoning behind some of the design decisions in PNG. Many of these decisions were the subject of considerable debate. The authors freely admit that another group might have made different decisions; however, we believe that our choices are defensible and consistent.

12.1. Why a new file format?

Does the world really need yet another graphics format? We believe so. GIF is no longer freely usable, but no other commonly used format can directly replace it, as is discussed in more detail below. We might have used an adaptation of an existing format, for example GIF with an unpatented compression scheme. But this would require new code anyway; it would not be all that much easier to implement than a whole new file format. (PNG is designed to be simple to implement, with the exception of the compression engine, which would be needed in any case.) We feel that this is an excellent opportunity to design a new format that fixes some of the known limitations of GIF.

12.2. Why these features?

The features chosen for PNG are intended to address the needs of applications that previously used the special strengths of GIF. In particular, GIF is well adapted for online communications because of its streamability and progressive display capability. PNG shares those attributes.

We have also addressed some of the widely known shortcomings of GIF. In particular, PNG supports truecolor images. We know of no widely used image format that losslessly compresses truecolor images as effectively as PNG does. We hope that PNG will make use of truecolor images more practical and widespread.

Some form of transparency control is desirable for applications in which images are displayed against a background or together with other images. GIF provided a simple transparent-color specification for this purpose. PNG supports a full alpha channel as well as transparent-color specifications. This allows both highly flexible transparency and compression efficiency.

Robustness against transmission errors has been an important consideration. For example, images transferred across the Internet are often mistakenly processed as text, leading to file corruption. PNG is designed so that such errors can be detected quickly and reliably.

PNG has been expressly designed not to be completely dependent on a single compression technique. Although deflate/inflate compression is mentioned in this document, PNG would still exist without it.

12.3. Why not these features?

Some features have been deliberately omitted from PNG. These choices were made to simplify implementation of PNG, promote portability and interchangeability, and make the format as simple and foolproof as possible for users. In particular:

There is no uncompressed variant of PNG. It is possible to store uncompressed data by using only uncompressed deflate blocks (a feature normally used to guarantee that deflate does not make incompressible data much larger). However, PNG software must support full deflate/inflate; any software that does not is not compliant with the PNG standard. The two most important features of PNG---portability and compression---are absolute requirements for online applications, and users demand them. Failure to support full deflate/inflate compromises both of these objectives.
There is no lossy compression in PNG. Existing formats such as JFIF already handle lossy compression well. Furthermore, available lossy compression methods (e.g., JPEG) are far from foolproof --- a poor choice of quality level can ruin an image. To avoid user confusion and unintentional loss of information, we feel it is best to keep lossy and lossless formats strictly separate. Also, lossy compression is complex to implement. Adding JPEG support to a PNG decoder might increase its size by an order of magnitude. This would certainly cause some decoders to omit support for the feature, which would destroy our goal of interchangeability.
There is no support for CMYK or other unusual color spaces. Again, this is in the name of promoting portability. CMYK, in particular, is far too device-dependent to be useful as a portable image representation.
There is no standard chunk for thumbnail views of images. In discussions with software vendors who use thumbnails in their products, it has become clear that most would not use a "standard" thumbnail chunk. For one thing, every vendor has a different idea of what the dimensions and characteristics of a thumbnail ought to be. Also, some vendors keep thumbnails in separate files to accommodate varied image formats; they are not going to stop doing that simply because of a thumbnail chunk in one new format. Proprietary chunks containing vendor-specific thumbnails appear to be more practical than a common thumbnail format.

It is worth noting that private extensions to PNG could easily add these features. We will not, however, include them as part of the basic PNG standard.

PNG also does not support multiple images in one file. This restriction is a reflection of the reality that many applications do not need and will not support multiple images per file. In any case, single images are a fundamentally different sort of object from sequences of images. Rather than make false promises of interchangeability, we have drawn a clear distinction between single-image and multi-image formats. PNG is a single-image format. (But see Multiple-image extension.)

12.4. Why not use format X?

Numerous existing formats were considered before deciding to develop PNG. None could meet the requirements we felt were important for PNG.

GIF is no longer suitable as a universal standard because of legal entanglements. Although just replacing GIF's compression method would avoid that problem, GIF does not support truecolor images, alpha channels, or gamma correction. The spec has more subtle problems too. Only a small subset of the GIF89 spec is actually portable across a variety of implementations, but there is no codification of the most portable part of the spec.

TIFF is far too complex to meet our goals of simplicity and interchangeability. Defining a TIFF subset would meet that objection, but would frustrate users making the reasonable assumption that a file saved as TIFF from their existing software would load into a program supporting our flavor of TIFF. Furthermore, TIFF is not designed for stream processing, has no provision for progressive display, and does not currently provide any good, legally unencumbered, lossless compression method.

IFF has also been suggested, but is not suitable in detail: available image representations are too machine-specific or not adequately compressed. The overall chunk structure of IFF is a useful concept that PNG has liberally borrowed from, but we did not attempt to be bit-for-bit compatible with IFF chunk structure. Again this is due to detailed issues, notably the fact that IFF FORMs are not designed to be serially writable.

Lossless JPEG is not suitable because it does not provide for the storage of indexed-color images. Furthermore, its lossless truecolor compression is often inferior to that of PNG.

12.5. Byte order

It has been asked why PNG uses network byte order. We have selected one byte ordering and used it consistently. Which order in particular is of little relevance, but network byte order has the advantage that routines to convert to and from it are already available on any platform that supports TCP/IP networking, including all PC platforms. The functions are trivial and will be included in the reference implementation.
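
As an illustration only, the conversion can be sketched in a few lines of Python. The helper names read_u32_be and write_u32_be are ours and are not taken from the reference implementation; the sketch assumes a 4-byte unsigned field such as an image width.

```python
import struct

def write_u32_be(value: int) -> bytes:
    """Encode an unsigned 32-bit integer in network (big-endian) byte order."""
    return struct.pack(">I", value)

def read_u32_be(buf: bytes, offset: int = 0) -> int:
    """Decode an unsigned 32-bit integer stored in network byte order."""
    return struct.unpack_from(">I", buf, offset)[0]

# The most significant byte comes first, regardless of the host CPU.
width_field = write_u32_be(640)
print(width_field.hex())         # 00000280
print(read_u32_be(width_field))  # 640
```

Because the byte order is fixed by the format rather than by the host, the same code gives the same bytes on every platform.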

12.6. Interlacing

PNG's two-dimensional interlacing scheme is more complex to implement than GIF's line-wise interlacing. It also costs a little more in file size. However, it yields an initial image eight times faster than GIF (the first pass transmits only 1/64th of the pixels, compared to 1/8th for GIF). Although this initial image is coarse, it is useful in many situations. For example, if the image is a World Wide Web imagemap that the user has seen before, PNG's first pass is often enough to determine where to click. The PNG scheme also looks better than GIF's, because horizontal and vertical resolution never differ by more than a factor of two; this avoids the odd "stretched" look seen when interlaced GIFs are filled in by replicating scanlines. Preliminary results show that small text in an interlaced PNG image is typically readable about twice as fast as in an equivalent GIF, i.e., after PNG's fifth pass or 25% of the image data, instead of after GIF's third pass or 50%. This is again due to PNG's more balanced increase in resolution.
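
The fractions quoted above can be checked with a short calculation. The following Python sketch assumes the seven-pass start and step offsets given in the interlacing description in the main body of the specification, and the usual four-pass GIF row order; the variable and function names are ours.

```python
from fractions import Fraction
from itertools import accumulate

# (x start, y start, x step, y step) for each of the seven PNG passes,
# expressed within one 8x8 pixel tile.
ADAM7_PASSES = [
    (0, 0, 8, 8), (4, 0, 8, 8), (0, 4, 4, 8), (2, 0, 4, 4),
    (0, 2, 2, 4), (1, 0, 2, 2), (0, 1, 1, 2),
]

def adam7_pass_fractions():
    """Fraction of all pixels transmitted by each PNG interlace pass."""
    fractions = []
    for x0, y0, dx, dy in ADAM7_PASSES:
        pixels = len(range(x0, 8, dx)) * len(range(y0, 8, dy))
        fractions.append(Fraction(pixels, 64))
    return fractions

# GIF sends every 8th row, then the rows halfway between, and so on.
GIF_PASS_FRACTIONS = [Fraction(1, 8), Fraction(1, 8), Fraction(1, 4), Fraction(1, 2)]

png_cumulative = list(accumulate(adam7_pass_fractions()))
gif_cumulative = list(accumulate(GIF_PASS_FRACTIONS))

print(png_cumulative[0], gif_cumulative[0])  # data sent after the first pass
print(png_cumulative[4], gif_cumulative[2])  # after PNG pass 5 and GIF pass 3
```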

12.7. Why gamma?

It might seem natural to standardize on storing sample values that are linearly proportional to light intensity (that is, have gamma of 1.0). But in fact, it is common for images to have a gamma of less than 1. There are three good reasons for this:

For reasons detailed in Gamma Tutorial, all video cameras apply a "gamma correction" function to the intensity information. This causes the video signal to have a gamma of about 0.5 relative to the light intensity in the original scene. Thus, images obtained by frame-grabbing video already have a gamma of about 0.5.
The human eye has a nonlinear response to intensity, so linear encoding of samples either wastes sample codes in bright areas of the image, or provides too few sample codes to avoid banding artifacts in dark areas of the image, or both. At least 12 bits per sample are needed to avoid visible artifacts in linear encoding with a 100:1 image intensity range. An image gamma in the range 0.3 to 0.5 allocates sample values in a way that roughly corresponds to the eye's response, so that 8 bits/sample are enough to avoid artifacts caused by insufficient sample precision in almost all images. This makes "gamma encoding" a much better way of storing digital images than the simpler linear encoding.
Many images are created on PCs or workstations with no gamma correction hardware and no software willing to provide gamma correction either. In these cases, the images have had their lighting and color chosen to look best on this platform --- they can be thought of as having "manual" gamma correction built in. To see what the image author intended, it is necessary to treat such images as having a file_gamma value in the range 0.4-0.6, depending on the room lighting level that the author was working in.
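
The second reason can be made concrete by counting how many 8-bit sample codes fall within a given intensity interval. The sketch below is only illustrative: it assumes the simple power-law model sample = intensity ^ gamma with intensity normalized to 0..1, and the function name codes_in_range and the chosen intervals are ours.

```python
def codes_in_range(lo: float, hi: float, gamma: float, bits: int = 8) -> int:
    """Count the sample codes that represent intensities from lo to hi
    when samples are stored as intensity ** gamma."""
    max_code = (1 << bits) - 1
    first = round(max_code * lo ** gamma)
    last = round(max_code * hi ** gamma)
    return last - first + 1

# Darkest part of a 100:1 range: intensities from 1% to 2% of full scale.
dark_linear = codes_in_range(0.01, 0.02, gamma=1.0)
dark_encoded = codes_in_range(0.01, 0.02, gamma=0.45)

# Brightest half of the range: intensities from 50% to 100%.
bright_linear = codes_in_range(0.5, 1.0, gamma=1.0)
bright_encoded = codes_in_range(0.5, 1.0, gamma=0.45)

print("dark:  ", dark_linear, "linear codes vs", dark_encoded, "gamma-encoded")
print("bright:", bright_linear, "linear codes vs", bright_encoded, "gamma-encoded")
```

Under this model the gamma-encoded form spends noticeably more codes on the dark interval and fewer on the bright one, which is the reallocation the text describes.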

In practice, image gamma values around 1.0 and around 0.5 are both widely found. Older image standards such as GIF often do not account for this fact. The JFIF standard specifies that images in that format should use linear samples, but many JFIF images found on the Internet actually have a gamma somewhere near 0.4 or 0.5. The variety of images found and the variety of systems that people display them on have led to widespread problems with images appearing "too dark" or "too light".

PNG expects viewers to compensate for image gamma at the time that the image is displayed. Another possible approach is to expect encoders to convert all images to a uniform gamma at encoding time. While that method would speed viewers slightly, it has fundamental flaws:

Gamma correction is inherently lossy due to quantization and roundoff error. Requiring conversion at encoding time thus causes irreversible loss. Since PNG is intended to be a lossless storage format, this is undesirable; we should store unmodified source data.
The encoder might not know the source gamma value. If the decoder does gamma correction at viewing time, it can adjust the gamma (change the displayed brightness) in response to feedback from a human user. The encoder has no such recourse.
Whatever "standard" gamma we settled on would be wrong for some displays. Hence viewers would still need gamma correction capability.

Since there will always be images with no gamma or an incorrect recorded gamma, good viewers will need to incorporate gamma adjustment code anyway. Gamma correction at viewing time is thus the right way to go.
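
The quantization loss mentioned in the first point above is easy to demonstrate. The following sketch assumes 8-bit samples, a source gamma of 0.45, and a hypothetical "uniform" target gamma of 1.0; the helper name regamma is ours.

```python
def regamma(sample: int, exponent: float, max_code: int = 255) -> int:
    """Re-encode one sample by raising its normalized value to `exponent`."""
    return round(max_code * (sample / max_code) ** exponent)

SOURCE_GAMMA = 0.45   # assumed gamma of the original samples
TARGET_GAMMA = 1.0    # assumed uniform gamma imposed at encoding time

# Convert every possible 8-bit source value to the target gamma.
converted = [regamma(s, TARGET_GAMMA / SOURCE_GAMMA) for s in range(256)]

# Distinct source values that collide after conversion cannot be told
# apart again, so the original data is not recoverable.
surviving = len(set(converted))
print(surviving, "of 256 distinct sample values survive the conversion")
```

A viewer that applies the same correction at display time loses the same precision on screen, but the file itself still holds the unmodified source samples.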

See Gamma Tutorial for more information.

12.8. Non-premultiplied alpha

PNG uses "unassociated" or "non-premultiplied" alpha so that images with separate transparency masks can be stored losslessly. Another common technique, "premultiplied alpha", stores pixel values premultiplied by the alpha fraction; in effect, the image is already composited against a black background. Any image data hidden by the transparency mask is irretrievably lost by that method, since multiplying by a zero alpha value always produces zero.

Some image rendering techniques generate images with premultiplied alpha (the alpha value actually represents how much of the pixel is covered by the image). This representation can be converted to PNG by dividing the sample values by alpha, except where alpha is zero. The result will look good if displayed by a viewer that handles alpha properly, but will not look very good if the viewer ignores the alpha channel.
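
A minimal sketch of the two conversions, assuming 8-bit RGBA samples and round-to-nearest integer arithmetic; the function names and the choice of black for fully transparent pixels are ours, since any color may be stored where alpha is zero.

```python
def premultiply(r: int, g: int, b: int, a: int, max_val: int = 255):
    """Scale each color sample by the alpha fraction (rounded to nearest)."""
    return tuple((c * a + max_val // 2) // max_val for c in (r, g, b)) + (a,)

def unpremultiply(r: int, g: int, b: int, a: int, max_val: int = 255):
    """Divide each color sample by the alpha fraction, except where alpha is 0."""
    if a == 0:
        # The original color cannot be recovered; store black as a placeholder.
        return (0, 0, 0, 0)
    return tuple(min(max_val, (c * max_val + a // 2) // a) for c in (r, g, b)) + (a,)

# A partially covered pixel converts to the non-premultiplied form PNG stores.
print(unpremultiply(100, 50, 0, 100))   # (255, 128, 0, 100)

# A hidden pixel loses its color entirely once premultiplied.
print(premultiply(200, 100, 50, 0))     # (0, 0, 0, 0)
```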

Although each form of alpha storage has its advantages, we did not want to require all PNG viewers to handle both forms. We standardized on non-premultiplied alpha as being the lossless and more general case.

12.9. Filtering

PNG includes filtering capability because filtering can significantly reduce the compressed size of truecolor and grayscale images. Filtering is also sometimes of value on indexed-color images, although this is less common.

The filter algorithms are defined to operate on bytes, rather than pixels; this gains simplicity and speed with very little cost in compression performance. Tests have shown that filtering is usually ineffective for images with fewer than 8 bits per sample, so providing pixelwise filtering for such images would be pointless. For 16 bit/sample data, bytewise filtering is nearly as effective as pixelwise filtering, because MSBs are predicted from adjacent MSBs, and LSBs are predicted from adjacent LSBs.
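
To illustrate the last point, here is a sketch of the Sub filter applied bytewise to one scanline of 16-bit grayscale data. It assumes the Sub definition from the filter chapter of the specification (each byte minus the corresponding byte of the pixel to the left, modulo 256); the function names are ours.

```python
def sub_filter(scanline: bytes, bpp: int) -> bytes:
    """Apply the Sub filter; bpp is the number of bytes per complete pixel."""
    out = bytearray(len(scanline))
    for i, value in enumerate(scanline):
        left = scanline[i - bpp] if i >= bpp else 0
        out[i] = (value - left) & 0xFF
    return bytes(out)

def sub_unfilter(filtered: bytes, bpp: int) -> bytes:
    """Reverse the Sub filter, rebuilding the scanline left to right."""
    out = bytearray(len(filtered))
    for i, value in enumerate(filtered):
        left = out[i - bpp] if i >= bpp else 0
        out[i] = (value + left) & 0xFF
    return bytes(out)

# Three 16-bit samples 0x1234, 0x1236, 0x1238, most significant byte first.
row = bytes.fromhex("123412361238")
filtered = sub_filter(row, bpp=2)

# With bpp = 2 each MSB is predicted from the previous MSB and each LSB
# from the previous LSB, so a smooth gradient becomes small repeated values.
print(filtered.hex())   # 123400020002
```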

The encoder is allowed to change filters for each new scanline. This creates no additional complexity for decoders, since a decoder is required to contain defiltering logic for every filter type anyway. The only cost is an extra byte per scanline in the pre-compression datastream. Our tests showed that when the same filter is selected for all scanlines, this extra byte compresses away to almost nothing, so there is little storage cost compared to a fixed filter specified for the whole image. And the potential benefits of adaptive filtering are too great to ignore. Even with the simplistic filter-choice heuristics so far discovered, adaptive filtering usually outperforms fixed filters. In particular, an adaptive filter can change behavior for successive passes of an interlaced image; a fixed filter cannot.

12.10. Text strings

Most graphics file formats include the ability to store some textual information along with the image. But many applications need more than that: they want to be able to store several identifiable pieces of text. For example, a database using PNG files to store medical X-rays would likely want to include patient's name, doctor's name, etc. A simple way to do this in PNG would be to invent new private chunks holding text. The disadvantage of such an approach is that other applications would have no idea what was in those chunks, and would simply ignore them. Instead, we recommend that textual information be stored in standard tEXt chunks with suitable keywords. Use of tEXt tells any PNG viewer that the chunk contains text that might be of interest to a human user. Thus, a person looking at the file with another viewer will still be able to see the text, and even understand what it is if the keywords are reasonably self-explanatory. (To this end, we recommend spelled-out keywords, not abbreviations that will be hard for a person to understand. Saving a few bytes on a keyword is false economy.)

The ISO 8859-1 (Latin-1) character set was chosen as a compromise between functionality and portability. Some platforms cannot display anything more than 7-bit ASCII characters, while others can handle characters beyond the Latin-1 set. We felt that Latin-1 represents a widely useful and reasonably portable character set. Latin-1 is a direct subset of character sets commonly used on popular platforms such as Microsoft Windows and X Windows. It can also be handled on Macintosh systems with a simple remapping of characters.
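
As a sketch of what such a chunk looks like on disk, the following Python builds a tEXt chunk. It assumes the layout given in the chunk definitions of the specification (keyword, a null separator, then the text, all in Latin-1) and the general chunk layout of length, type, data, and CRC; the function name make_text_chunk and the sample keyword and text are ours.

```python
import struct
import zlib

def make_text_chunk(keyword: str, text: str) -> bytes:
    """Build a complete tEXt chunk: length, type, data, CRC."""
    key_bytes = keyword.encode("latin-1")   # raises if not representable
    if not 1 <= len(key_bytes) <= 79:
        raise ValueError("keyword must be 1 to 79 bytes long")
    data = key_bytes + b"\x00" + text.encode("latin-1")
    chunk_type = b"tEXt"
    crc = zlib.crc32(chunk_type + data) & 0xFFFFFFFF
    return struct.pack(">I", len(data)) + chunk_type + data + struct.pack(">I", crc)

# A spelled-out keyword stays readable in any viewer that shows text chunks.
chunk = make_text_chunk("Author", "Dr. Müller")
print(len(chunk), chunk[4:8])
```

Text containing characters outside Latin-1 makes the encode call raise UnicodeEncodeError, which mirrors the restriction described above.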

There is presently no provision for text employing character sets other than Latin-1. We recognize that the need for other character sets will increase. However, PNG already requires that programmers implement a number of new and unfamiliar features, and text representation is not PNG's primary purpose. Since PNG provides for the creation and public registration of new ancillary chunks of general interest, we expect that text chunks for other character sets, such as Unicode, eventually will be registered and increase gradually in popularity.

12.11. PNG file signature

The first eight bytes of a PNG file always contain the following values:

   (decimal)              137  80  78  71  13  10  26  10
   (hexadecimal)           89  50  4e  47  0d  0a  1a  0a
   (ASCII C notation)    \211   P   N   G  \r  \n \032 \n

This signature both identifies the file as a PNG file and provides for immediate detection of common file-transfer problems. The first two bytes distinguish PNG files on systems that expect the first two bytes to identify the file type uniquely. The first byte is chosen as a non-ASCII value to reduce the probability that a text file may be misrecognized as a PNG file; also, it catches bad file transfers that clear bit 7. Bytes two through four name the format. The CR-LF sequence catches bad file transfers that alter newline sequences. The control-Z character stops file display under MS-DOS. The final line feed checks for the inverse of the CR-LF translation problem.

A decoder may further verify that the next eight bytes contain an IHDR chunk header with the correct chunk length; this will catch bad transfers that drop or alter null (zero) bytes.
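
The checks described above might be sketched as follows. This is one possible way to report which part of the signature was damaged, not a required decoder behavior; the function name diagnose_signature and the message strings are ours.

```python
PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"

def diagnose_signature(header: bytes) -> str:
    """Classify the first bytes of a file against the PNG signature."""
    if header.startswith(PNG_SIGNATURE):
        return "ok"
    if header[1:4] != b"PNG":
        return "not a PNG file"
    if header[0] != 0x89:
        return "first byte altered (bit 7 may have been cleared)"
    if header[4:6] != b"\r\n":
        return "CR-LF sequence altered"
    if header[6:7] != b"\x1a":
        return "control-Z altered"
    return "final line feed altered"

print(diagnose_signature(b"\x89PNG\r\n\x1a\n\x00\x00\x00\rIHDR"))  # ok
print(diagnose_signature(b"\x09PNG\r\n\x1a\n"))   # 7-bit transfer
print(diagnose_signature(b"\x89PNG\n\x1a\n"))     # CR-LF turned into LF
print(diagnose_signature(b"\x89PNG\r\n\x1a\r\n")) # LF turned into CR-LF
```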

Note that there is no version number in the signature, nor indeed anywhere in the file. This is intentional: the chunk mechanism provides a better, more flexible way to handle format extensions, as explained in Chunk naming conventions.

12.12. Chunk layout

The chunk design allows decoders to skip unrecognized or uninteresting chunks: it is simply necessary to skip the appropriate number of bytes, as determined from the length field.

Limiting chunk length to (2^31)-1 bytes avoids possible problems for implementations that cannot conveniently handle 4-byte unsigned values. In practice, chunks will usually be much shorter than that anyway.

A separate CRC is provided for each chunk in order to detect badly-transferred images as quickly as possible. In particular, critical data such as the image dimensions can be validated before being used.

The chunk length is excluded from the CRC so that the CRC can be calculated as the data is generated; this avoids a second pass over the data in cases where the chunk length is not known in advance. Excluding the length from the CRC does not create any extra risk of failing to discover file corruption, since if the length is wrong, the CRC check will fail: the CRC will be computed on the wrong set of bytes and then be tested against the wrong value from the file.
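
A minimal sketch of a chunk walker built on these properties, assuming the whole file is already in memory. It uses the CRC-32 from Python's zlib module, which is the same CRC that PNG specifies; the function names make_chunk and iter_chunks are ours.

```python
import struct
import zlib

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"

def make_chunk(chunk_type: bytes, data: bytes) -> bytes:
    """Write one chunk; the CRC runs over type and data, not the length."""
    # The running CRC can be extended piece by piece as data is generated.
    crc = zlib.crc32(data, zlib.crc32(chunk_type)) & 0xFFFFFFFF
    return struct.pack(">I", len(data)) + chunk_type + data + struct.pack(">I", crc)

def iter_chunks(png: bytes):
    """Yield (type, data, crc_ok) for every chunk after the signature."""
    pos = len(PNG_SIGNATURE)
    while pos < len(png):
        if pos + 12 > len(png):
            raise ValueError("truncated chunk")
        length, chunk_type = struct.unpack_from(">I4s", png, pos)
        end = pos + 8 + length
        if length > 0x7FFFFFFF or end + 4 > len(png):
            raise ValueError("bad chunk length")
        (stored_crc,) = struct.unpack_from(">I", png, end)
        crc_ok = (zlib.crc32(png[pos + 4:end]) & 0xFFFFFFFF) == stored_crc
        yield chunk_type, png[pos + 8:end], crc_ok
        pos = end + 4   # skipping a chunk needs only its length field

# A 1x1 grayscale header followed directly by IEND (no image data).
ihdr = struct.pack(">IIBBBBB", 1, 1, 8, 0, 0, 0, 0)
png = PNG_SIGNATURE + make_chunk(b"IHDR", ihdr) + make_chunk(b"IEND", b"")
for chunk_type, data, crc_ok in iter_chunks(png):
    print(chunk_type, len(data), crc_ok)
```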

12.13. Chunk naming conventions

The chunk naming conventions allow safe, flexible extension of the PNG format. This mechanism is much better than a format version number, because it works on a feature-by-feature basis rather than being an overall indicator. Decoders can process newer files if and only if the files use no unknown critical features (as indicated by finding unknown critical chunks). Unknown ancillary chunks can be safely ignored. We decided against having an overall format version number because experience has shown that format version numbers hurt portability as much as they help. Version numbers tend to be set unnecessarily high, leading to older decoders rejecting files that they could have processed (this was a serious problem for several years after the GIF89 spec came out, for example). Furthermore, private extensions can be made either critical or ancillary, and standard decoders should react appropriately; overall version numbers are no help for private extensions.

A hypothetical chunk for vector graphics would be a critical chunk, since if ignored, important parts of the intended image would be missing. A chunk carrying the Mandelbrot set coordinates for a fractal image would be ancillary, since other applications could display the image without understanding what the image represents. In general, a chunk type should be made critical only if it is impossible to display a reasonable representation of the intended image without interpreting that chunk.

The public/private property bit ensures that any newly defined public chunk type name cannot conflict with proprietary chunks that could be in use somewhere. However, this does not protect users of private chunk names from the possibility that someone else may use the same chunk name for a different purpose. It is a good idea to put additional identifying information at the start of the data for any private chunk type.

When a PNG file is modified, certain ancillary chunks may need to be changed to reflect changes in other chunks. For example, a histogram chunk needs to be changed if the image data changes. If the file editor does not recognize histogram chunks, copying them blindly to a new output file is incorrect; such chunks should be dropped. The safe/unsafe property bit allows ancillary chunks to be marked appropriately.
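
The property bits can be read directly from the chunk type. The sketch below assumes the encoding defined in the chunk naming conventions of the main specification, where bit 5 (value 32) of each of the four type bytes carries one property, so that a lowercase letter means the bit is set; the function names are ours.

```python
def chunk_properties(chunk_type: bytes) -> dict:
    """Decode the four property bits carried by a chunk type's letter case."""
    if len(chunk_type) != 4:
        raise ValueError("chunk type must be exactly 4 bytes")
    return {
        "ancillary": bool(chunk_type[0] & 0x20),
        "private": bool(chunk_type[1] & 0x20),
        "reserved": bool(chunk_type[2] & 0x20),
        "safe_to_copy": bool(chunk_type[3] & 0x20),
    }

def editor_may_copy_unknown(chunk_type: bytes) -> bool:
    """Whether an editor that changed critical data may keep an unrecognized chunk."""
    props = chunk_properties(chunk_type)
    return props["ancillary"] and props["safe_to_copy"]

print(chunk_properties(b"IHDR"))        # critical, public
print(editor_may_copy_unknown(b"tEXt")) # True: text survives image edits
print(editor_may_copy_unknown(b"hIST")) # False: histogram depends on image data
```

An unknown critical chunk is a different case: a decoder that meets one cannot display the file reliably at all, so the question of copying does not arise in this sketch.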

Not all possible modification scenarios are covered by the safe/unsafe semantics. In particular, chunks that are dependent on the total file contents are not supported. (An example of such a chunk is an index of IDAT chunk locations within the file: adding a comment chunk would inadvertently break the index.) Definition of such chunks is discouraged. If absolutely necessary for a particular application, such chunks can be made critical chunks, with consequent loss of portability to other applications. In general, ancillary chunks can depend on critical chunks but not on other ancillary chunks. It is expected that mutually dependent information should be put into a single chunk.

In some situations it may be unavoidable to make one ancillary chunk dependent on another. Although the chunk property bits are insufficient to represent this case, a simple solution is available: in the dependent chunk, record the CRC of the chunk depended on. It can then be determined whether that chunk has been changed by some other program.

The same technique can be useful for other purposes. For example, if a program relies on the palette being in a particular order, it can store a private chunk containing the CRC of the PLTE chunk. If this value matches when the file is again read in, then it provides high confidence that the palette has not been tampered with. Note that it is not necessary to mark the private chunk unsafe-to-copy when this technique is used; thus, such a private chunk can survive other editing of the file.
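
A sketch of the palette check, assuming the private chunk's payload is simply the 4-byte CRC of the PLTE chunk (computed over type and data, as for any chunk). The payload layout and the function names are our own illustration; a real private chunk would also carry identifying information as recommended above.

```python
import struct
import zlib

def chunk_crc(chunk_type: bytes, data: bytes) -> int:
    """CRC exactly as stored in the file: over chunk type and data."""
    return zlib.crc32(chunk_type + data) & 0xFFFFFFFF

def make_palette_guard(plte_data: bytes) -> bytes:
    """Payload for a hypothetical private chunk that depends on PLTE."""
    return struct.pack(">I", chunk_crc(b"PLTE", plte_data))

def palette_unchanged(guard_payload: bytes, plte_data: bytes) -> bool:
    """True if the palette still matches the CRC recorded earlier."""
    (recorded,) = struct.unpack(">I", guard_payload[:4])
    return recorded == chunk_crc(b"PLTE", plte_data)

palette = bytes([255, 0, 0,  0, 255, 0,  0, 0, 255])     # red, green, blue
guard = make_palette_guard(palette)

reordered = bytes([0, 255, 0,  255, 0, 0,  0, 0, 255])   # green, red, blue
print(palette_unchanged(guard, palette))    # True
print(palette_unchanged(guard, reordered))  # False
```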

12.14. Palette histograms

A viewer may not be able to provide as many colors as are listed in the image's palette. (For example, some colors could be reserved by a window system.) To produce the best results in this situation, it is helpful to have information about the frequency with which each palette index actually appears, in order to choose the best palette for dithering or to drop the least-used colors. Since images are often created once and viewed many times, it makes sense to calculate this information in the encoder, although it is not mandatory for the encoder to provide it.

Other image formats have usually addressed this problem by specifying that the palette entries should appear in order of frequency of use. That is an inferior solution, because it doesn't give the viewer nearly as much information: the viewer can't determine how much damage will be done by dropping the last few colors. Nor does a sorted palette give enough information to choose a target palette for dithering, in the case that the viewer needs to reduce the number of colors substantially. A palette histogram provides the information needed to choose such a target palette without making a pass over the image data.
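
As a sketch of how a viewer might use the histogram, the following Python keeps the most frequently used palette entries and reports how much of the image is affected by dropping the rest. The function name and the sample frequencies are ours; a real viewer would normally remap or dither the dropped colors rather than discard them.

```python
def choose_colors(histogram: list, max_colors: int):
    """Return (indices to keep, fraction of pixels using dropped colors)."""
    # Rank palette indices from most to least frequently used.
    ranked = sorted(range(len(histogram)), key=lambda i: histogram[i], reverse=True)
    kept = sorted(ranked[:max_colors])
    dropped_weight = sum(histogram[i] for i in ranked[max_colors:])
    total = sum(histogram)
    return kept, (dropped_weight / total if total else 0.0)

# Approximate usage counts for a five-entry palette.
histogram = [500, 3, 120, 0, 77]
kept, damage = choose_colors(histogram, max_colors=3)
print(kept)                   # [0, 2, 4]
print(f"{damage:.2%} of pixels use a dropped color")
```

A palette that is merely sorted by frequency would yield the same kept list but could not supply the damage estimate.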
