
DjVu

From Wikipedia, the free encyclopedia
Computer file format
DjVu
Filename extensions: .djvu, .djv
Internet media type: image/vnd.djvu, image/x-djvu
Magic number: AT&T
Developed by: AT&T Labs – Research
Initial release: 1998
Latest release: Version 26[1] (April 2005)
Type of format: Image file formats
Contained by: Interchange File Format
Open format?: Yes

DjVu[a] is a computer file format designed primarily to store scanned documents, especially those containing a combination of text, line drawings, indexed color images, and photographs. It uses technologies such as image layer separation of text and background/images, progressive loading, arithmetic coding, and lossy compression for bitonal (monochrome) images. This allows high-quality, readable images to be stored in a minimum of space, so that they can be made available on the web.

DjVu has been promoted as providing smaller files than PDF for most scanned documents.[3] The DjVu developers report that color magazine pages compress to 40–70 kB, black-and-white technical papers compress to 15–40 kB, and ancient manuscripts compress to around 100 kB; a satisfactory JPEG image typically requires 500 kB.[4] Like PDF, DjVu can contain an OCR text layer, making it easy to perform copy and paste and text search operations.

History

The DjVu technology was originally developed, from 1996 to 2001,[4] by Yann LeCun, Léon Bottou, Patrick Haffner, Paul G. Howard, Patrice Simard, and Yoshua Bengio at AT&T Labs in Red Bank, New Jersey.[5]

Prior to the standardization of PDF in 2008,[6][7] DjVu was considered superior because it is an open file format,[citation needed] in contrast to the proprietary nature of PDF at the time. The declared higher compression ratio (and thus smaller file size) and the claimed ease of converting large volumes of text into DjVu format were other arguments made for DjVu's superiority over PDF in 2004. In a 2004 talk on IT Conversations, independent technologist Brewster Kahle discussed the benefits of allowing easier access to DjVu files.[8][9]

The DjVu library distributed as part of the open-source package DjVuLibre has become the reference implementation for the DjVu format. DjVuLibre has been maintained and updated by the original developers of DjVu since 2002.[10]

The DjVu file format specification has gone through a number of revisions, the most recent being from 2005.

Revision history

Version | Release date | Status | Notes
1–19[citation needed] | 1996–1999 | Unsupported | Developmental versions by AT&T Labs preceding the sale of the format to LizardTech.
20[1] | April 1999 | Unsupported | DjVu version 3. DjVu changed from a single-page format to a multipage format.
21[1] | September 1999 | Supported | Indirect storage format replaced. The searchable text layer was added.
22[1] | April 2001 | Supported | Page orientation, color JB2.
23[1] | July 2002 | Unsupported | CID chunk.
24[1] | February 2003 | Unsupported | LTAnno chunk.
25[1] | May 2003 | Supported | NAVM chunk. Support for DjVu bookmarks (outlines) was added. Changes made by Versions 23 and 24 were made obsolete.
26[1] | April 2005 | Latest version | Text/line annotations.

The primary usage of the DjVu format has been the electronic distribution of documents with a quality comparable to that of printed documents. As that niche is also the primary usage for PDF, it was inevitable that the two formats would become competitors. The two formats, however, approach the problem of delivering high-resolution documents in very different ways: PDF primarily encodes graphics and text as vectorised data, whereas DjVu primarily encodes them as pixmap images. This means PDF places the burden of rendering the document on the reader, whereas DjVu places that burden on the creator.

For a number of years, significantly overlapping with the period when DjVu was being developed, there were no PDF viewers for free operating systems. A particular stumbling block was the rendering of vectorised fonts, which are essential for combining small file size with high resolution in PDF. Since displaying DjVu was a simpler problem for which free software was available, there were suggestions that the free software movement should employ DjVu instead of PDF for distributing documentation; rendering for creating DjVu is in principle not much different from rendering for a device-specific printer driver, and DjVu can as a last resort be generated from scans of paper media. However, when FreeType 2.0 began in 2000 to provide rendering of all major vectorised font formats, that specific advantage of DjVu began to erode.

In the 2000s, with the growth of the World Wide Web and before widespread adoption of broadband, DjVu was often adopted by digital libraries as their format of choice. Reasons included its integration with software like Greenstone[11] and the Internet Archive,[12] browser plugins which allowed advanced online browsing, smaller file size for comparable quality of book scans and other image-heavy documents,[13] and support for embedding[14] and searching full text from OCR.[15][16] Some features, such as thumbnail previews, were later integrated into the Internet Archive's BookReader,[17] and DjVu browsing was deprecated in its favour around 2015, when some major browsers stopped supporting NPAPI and, with it, DjVu plugins.[18]

Design

The DjVu file format is based on the Interchange File Format (IFF) and is composed of hierarchically organized chunks. The IFF structure is preceded by a 4-byte AT&T magic number. Following is a single FORM chunk with a secondary identifier of either DJVU or DJVM for a single-page or a multi-page document, respectively.
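The layout described above can be checked with a few lines of Python. This is a minimal sketch, not the DjVuLibre implementation: it assumes the standard IFF convention of a 4-byte chunk identifier followed by a 4-byte big-endian length, and the function name `read_djvu_header` is hypothetical.

```python
import struct


def read_djvu_header(data: bytes) -> dict:
    """Parse the first 16 bytes of a DjVu file (illustrative sketch).

    Assumed layout: 4-byte magic "AT&T", 4-byte chunk ID "FORM",
    4-byte big-endian length, 4-byte secondary identifier.
    """
    if len(data) < 16:
        raise ValueError("too short to contain a DjVu header")
    if data[:4] != b"AT&T":
        raise ValueError("missing AT&T magic number")

    chunk_id, length = struct.unpack(">4sI", data[4:12])
    if chunk_id != b"FORM":
        raise ValueError("expected a FORM chunk after the magic number")

    secondary = data[12:16]
    kinds = {b"DJVU": "single-page", b"DJVM": "multi-page"}
    if secondary not in kinds:
        raise ValueError("unexpected secondary identifier")

    return {
        "form_length": length,
        "secondary_id": secondary.decode("ascii"),
        "kind": kinds[secondary],
    }


# Usage with a synthetic header (an empty single-page FORM).
sample = b"AT&T" + b"FORM" + struct.pack(">I", 4) + b"DJVU"
header = read_djvu_header(sample)
```

Checking the magic number and the secondary identifier first makes it cheap to reject non-DjVu input before any chunk data is decoded.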

All the chunks can be contained in a single file, in the case of so-called bundled documents, or can be spread over several files: one file for every page plus some files with shared chunks.

Chunk types in DjVu files

Chunk identifier | Contained by | Description
FORM:DJVU | FORM:DJVM | Describes a single page. Can either be at the root of a document and be a single-page document or referred to from a DIRM chunk.
FORM:DJVM | N/a | Describes a multi-page document. Is the document's root chunk.
FORM:DJVI | FORM:DJVM | Contains data shared by multiple pages.
FORM:THUM | FORM:DJVM | Contains thumbnails.
INFO | FORM:DJVU | Must be the first chunk. Describes the page width, height, format version, resolution, gamma, and rotation.
DIRM | FORM:DJVM | Must be the first chunk. References other FORM chunks. These chunks can either follow this chunk inside the FORM:DJVM chunk or be contained in external files. These types of documents are referred to as bundled or indirect, respectively.
NAVM | FORM:DJVM | If present, must immediately follow the DIRM chunk. Contains a BZZ-compressed outline of the document.
ANTa, ANTz | FORM:DJVI or FORM:DJVU | Annotations.
TXTa, TXTz | FORM:DJVU | Unicode text and layout information.
INCL | FORM:DJVU | The ID of an included FORM:DJVI chunk.
Sjbz | FORM:DJVU | BZZ-compressed JB2 bitonal data used to store the mask.
Djbz | FORM:DJVI or FORM:DJVU | Shared shape table.
WMRM | ? | JB2 data required to remove a watermark.
CIDa | FORM:DJVU | Obsolete chunk with unknown content.
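To see which of the chunks listed above a page actually contains, the body of a FORM chunk can be walked chunk by chunk. The sketch below is illustrative only. It assumes the usual IFF conventions (4-byte identifier, 4-byte big-endian length, one padding byte after an odd-length payload) and that the FORM body starts at an even file offset; `iter_chunks` is a hypothetical helper, not part of any DjVu library.

```python
import struct
from typing import Iterator, Tuple


def iter_chunks(body: bytes) -> Iterator[Tuple[str, bytes]]:
    """Yield (identifier, payload) for each chunk in a FORM body.

    `body` is the content of a FORM chunk after its 4-byte secondary
    identifier. Nested FORM chunks are reported as "FORM:XXXX" and
    their payload is returned undecoded.
    """
    pos = 0
    while pos + 8 <= len(body):
        chunk_id, length = struct.unpack_from(">4sI", body, pos)
        pos += 8
        payload = body[pos:pos + length]
        if len(payload) < length:
            raise ValueError("truncated chunk")

        name = chunk_id.decode("ascii", errors="replace")
        if name == "FORM":
            name = "FORM:" + payload[:4].decode("ascii", errors="replace")
        yield name, payload

        # Assumed IFF rule: chunks start on even offsets, so an
        # odd-length payload is followed by one padding byte.
        pos += length + (length & 1)


# Usage with a synthetic body holding two chunks.
body = (
    b"INFO" + struct.pack(">I", 3) + b"abc" + b"\x00"
    + b"Sjbz" + struct.pack(">I", 2) + b"xy"
)
names = [name for name, _ in iter_chunks(body)]
```

Payloads such as Sjbz or TXTz remain compressed at this level; decoding them requires the JB2 and BZZ decoders, which are outside the scope of this sketch.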

DjVu divides a single image into many different images, then compresses them separately. To create a DjVu file, the initial image is first separated into three images: a background image, a foreground image, and a mask image. The background and foreground images are typically lower-resolution color images (e.g., 100 dpi); the mask image is a high-resolution bilevel image (e.g., 300 dpi) and is typically where the text is stored. The background and foreground images are then compressed using a wavelet-based compression algorithm named IW44.[4] The mask image is compressed using a method called JB2 (similar to JBIG2). The JB2 encoding method identifies nearly identical shapes on the page, such as multiple occurrences of a particular character in a given font, style, and size. It compresses the bitmap of each unique shape separately, and then encodes the locations where each shape appears on the page. Thus, instead of compressing a letter "e" in a given font multiple times, it compresses the letter "e" once (as a compressed bit image) and then records every place on the page it occurs.

Optionally, these shapes may be mapped to UTF-8 codes (either by hand or potentially by a text recognition system) and stored in the DjVu file. If this mapping exists, it is possible to select and copy text.

Since JB2 (also called DjVuBitonal) is a variation on JBIG2, working on the same principles,[19] both compression methods have the same problems when performing lossy compression. In 2013 it emerged that Xerox photocopiers and scanners had been substituting digits for similar-looking ones, for example replacing a 6 with an 8.[20] A DjVu document has been spotted in the wild with character substitutions, such as an n with bleeding serifs turning into a u and an o with a spot inside turning into an e.[21] Whether lossy compression has occurred is not stored in the file.[1] Thus the DjView viewing application cannot warn the user that glyph substitutions might have occurred, either when opening a lossy compressed file or in the Information or Metadata dialogue boxes.[22]
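The substitution risk follows directly from the shape-dictionary idea. The toy encoder below is not the JB2 matching criterion, which is more elaborate; it is a sketch that treats two bitmaps as the same shape when they differ in at most `max_diff` pixels. All names (`encode_page`, `hamming`, `max_diff`) are hypothetical.

```python
from typing import List, Tuple

Bitmap = Tuple[str, ...]      # rows of "0"/"1" characters
Position = Tuple[int, int]    # (x, y) on the page


def hamming(a: Bitmap, b: Bitmap) -> int:
    """Count the pixels that differ between two same-sized bitmaps."""
    return sum(pa != pb for ra, rb in zip(a, b) for pa, pb in zip(ra, rb))


def encode_page(glyphs: List[Tuple[Bitmap, Position]], max_diff: int):
    """Store each distinct shape once and record where it is placed.

    With max_diff=0 only identical bitmaps share a dictionary entry
    (lossless). A larger max_diff merges near-identical shapes, which
    saves space but can replace one glyph with another.
    """
    shapes: List[Bitmap] = []
    placements: List[Tuple[int, Position]] = []
    for bitmap, position in glyphs:
        for index, known in enumerate(shapes):
            same_size = (
                len(known) == len(bitmap)
                and len(known[0]) == len(bitmap[0])
            )
            if same_size and hamming(known, bitmap) <= max_diff:
                break  # reuse the existing dictionary entry
        else:
            index = len(shapes)
            shapes.append(bitmap)
        placements.append((index, position))
    return shapes, placements


# Two 3x5 digits that differ in a single pixel.
SIX = ("111", "100", "111", "101", "111")
EIGHT = ("111", "101", "111", "101", "111")
page = [(SIX, (0, 0)), (EIGHT, (4, 0))]

lossless_shapes, _ = encode_page(page, max_diff=0)
lossy_shapes, lossy_placements = encode_page(page, max_diff=1)
```

In the lossy run both placements point at the first dictionary entry, so the decoded page shows the 6 in both positions. Nothing in the resulting structure records that a merge took place, which mirrors the point above that the file does not store whether lossy compression was used.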

Licensing

DjVu is an open file format with patents.[3] The file format specification is published, as well as source code for the reference library.[3] The original authors distribute an open-source implementation named "DjVuLibre" under the GNU General Public License and a patent grant.[23] The rights to the commercial development of the encoding software have been transferred to different companies over the years, including AT&T Corporation, LizardTech,[24] Celartem[25] and ePapyrus Solutions K.K. (formerly Cuminas[26] before joining ePapyrus Solutions, Inc.[27]).[28] Patents typically have an expiry term of about 20 years.

Celartem acquired LizardTech and Extensis.[29][30][25][31][32]

Format adoption

Free creators, manipulators, converters, web browser plug-ins, and desktop viewers are available.[2]

In 2002, the DjVu file format was chosen by the Internet Archive as a format in which its Million Book Project provides scanned public-domain books online (along with TIFF and PDF).[33] In February 2016, the Internet Archive announced that DjVu would no longer be used for new uploads, among other reasons citing the format's declining use and the difficulty of maintaining their Java applet based viewer for the format.[18]

Wikimedia Commons, a media repository used by Wikipedia among others, conditionally permits PDF and DjVu media files.[34]

Format software

any2djvu converts .ps, .ps.gz, and .pdf files to .djvu (a DjVu file) via the Any2DjVu server, maintained by Léon Bottou and Yann LeCun, hosted by the Courant Institute of Mathematical Sciences at New York University, with hardware donated by Caminova, Inc.[35][36]

Jakub Wilk's pdf2djvu creates DjVu files from PDF files for GNU/Linux OS[37] (archived),[38] including Ubuntu, and Cygwin (orphaned).[39][40]

The selection of downloadable DjVu viewers is wider on Linux distributions than it is on Windows or macOS. Additionally, the format is rarely supported by proprietary scanning software.

DjVu is supported by a number of multi-format document viewers and e-book reader software on Linux (Okular, Evince, Zathura), Windows (Okular and SumatraPDF) and Android (Document Viewer,[41] FBReader, EBookDroid,[42] PocketBook).

DjVu.js Viewer is a project that develops a program library, a web application, and browser extensions for Firefox[43] and Google Chrome,[44] to view DjVu files.[45]

Notes

  a. Although usually pronounced as an initialism "D-J-V-U", the file type was intended to have the pronunciation DAY-zhah-VOO (/ˌdeɪʒɑːˈvuː/) after French déjà vu.[2]

References

  1. "Lizardtech DjVu Reference" (PDF). Cuminas.jp. p. 25. Retrieved 7 December 2021.
  2. "DjVu.org – the premier menu for djvu resources". djvu.org. Archived from the original on 2017-06-29. Retrieved 2017-07-02.
  3. "What is DjVu – DjVu.org". DjVu.org. Archived from the original on 2019-01-21. Retrieved 2009-03-05.
  4. Bottou, Leon; Haffner, Patrick; Howard, Paul G.; Simard, Patrice; Bengio, Yoshua; Le Cun, Yann (1 July 1998). "High Quality Document Image Compression with DjVu" (PDF). Journal of Electronic Imaging. 7 (3): 410–425. doi:10.1117/1.482609.
  5. "Yann's DjVu Page". yann.lecun.com. Retrieved 8 July 2025.
  6. "ISO 32000-1:2008 – Document management – Portable document format – Part 1: PDF 1.7". Iso.org. 2008-07-01. Retrieved 2010-02-21.
  7. Orion, Egan (2007-12-05). "PDF 1.7 is approved as ISO 32000". The Inquirer. Incisive Media. Archived from the original on December 13, 2007. Retrieved 2007-12-05.
  8. Brewster Kahle (December 16, 2004). "Universal Access to All Knowledge" (Audio; Speech at 1h:31m:20s). Conversations Network.
  9. "LizardTech To Open Source A DjVu Java Viewer". ECM Connection. 7 December 2004. Retrieved 18 August 2017.
  10. "DjVuLibre: Open Source DjVu library and viewer". djvu.sourceforge.net.
  11. "nzdl:projects - Greenstone". Wiki.greenstone.org. Retrieved 7 December 2021.
  12. Eric Rumsey (2018-09-05). "Google Books vs DjVu in Internet Archive". Blog.libuiowa.edu. Archived from the original on 2018-08-22. Retrieved 2018-08-21.
  13. Eric Rumsey (2018-09-10). "DjVu again". Blog.libuiowa.edu. Archived from the original on 2018-08-22. Retrieved 2018-08-21.
  14. Solcan, Mihail Radu (2009-02-03). "Insert OCRed text in DjVu (automatic method)". www.ub-filosofie.ro. Faculty of Philosophy at the University of Bucharest.
  15. Jeff Kaplan (2004-12-09). "New book collection: color scans, djvu, some pdf" (PDF). Blog.archive.org.
  16. Janusz S. Bień (2011-09-12). "Efficient search in hidden text of large DjVu documents". Advanced Language Technologies for Digital Libraries (PDF). Lecture Notes in Computer Science. Vol. 6699. pp. 1–14. doi:10.1007/978-3-642-23160-5_1. ISBN 978-3-642-23159-9. S2CID 3095526. Archived from the original (PDF) on 2021-11-03. Retrieved 2021-10-16.
  17. Eric Rumsey (2010-09-10). "Internet Archive's BookReader Thumbnail View". Blog.libuiowa.edu. Archived from the original on 2018-08-22. Retrieved 2018-08-21.
  18. Brewster Kahle; Jeff Kaplan (2016-02-26). "DjVu files for new uploads". Archive.org.
  19. Artem Mikheev, Luc Vincent, Mike Hawrylycz & Léon Bottou: Electronic Document Publishing Using DjVu.
  20. See the JBIG2 article for more details and references.
  21. "This document caused me a fair bit of consternation transcribing it on a site th... | Hacker News". News.ycombinator.com. Retrieved 7 December 2021.
  22. "DjVuLibre". SourceForge.net. Retrieved 7 December 2021.
  23. "DjVuLibre: Open Source DjVu library and viewer".
  24. Extensis. "Company – About – LizardTech". Lizardtech.com. Archived from the original on 2018-01-15. Retrieved 2018-01-14.
  25. "Celartem, Inc.: Private Company Information – Bloomberg". Bloomberg.com.
  26. "会社情報 - Cuminas Corporation" [Company information - Cuminas Corporation]. Cuminas.jp. Archived from the original on 2018-01-15. Retrieved 2018-01-14.
  27. 株式譲渡および完全子会社化のお知らせ [Notice regarding share transfer and becoming a wholly owned subsidiary]. epapyrus.jp (in Japanese). 2022-06-03. Retrieved 2024-12-08.
  28. 会社名変更のお知らせ [Notice of company name change]. epapyrus.jp (in Japanese). 2023-11-06. Retrieved 2024-12-08.
  29. "Company Overview – Celartem Technology, Inc". Celartem.com. Archived from the original on 27 May 2019. Retrieved 7 December 2021.
  30. "Celartem Technology Announces Merger of US Holdings – Extensis.com". Archived from the original on 2018-01-15. Retrieved 2018-01-14.
  31. "Celartem Technology Inc.: Private Company Information – Bloomberg". Bloomberg.com.
  32. "Celartem Sells Extensis and LizardTech Plugins and XTensions to onOne Software – Big Picture – Wide Format Printing". bigpicture.net. 28 July 2005.
  33. "Image file formats – OLPC". Wiki.laptop.org. Retrieved 2008-09-09.
  34. Wikimedia Commons. Project scope: PDF and DjVu.
  35. "Welcome to the Any2DjVu Server". DjVu.org. Retrieved 8 July 2025.
  36. "help: What the Any2Djvu Server Does". any2djvu.djvu.org. Retrieved 8 July 2025.
  37. "pdf2djvu". command-not-found.com. Retrieved 8 July 2025.
  38. Wilk, Jakub (15 April 2025). "pdf2djvu". jwilk's archive. Retrieved 8 July 2025 – via github.com.
  39. "pdf2djvu". Package Summary. Cygwin.com. Retrieved 8 July 2025.
  40. "PDF Tricks for the Linux Command-Line". Open Source For You. 4 December 2023. Retrieved 8 July 2025. Converting PDF to DjVu.
  41. "Document Viewer". Sufficiently Secure. 2022-04-04. Retrieved 2022-04-09.
  42. "EBookDroid". Google Code Archive - code.google.com. Archived from the original on 30 August 2021. Retrieved 8 July 2025. A document viewer for Android.
  43. DjVu.js Viewer Firefox.
  44. DjVu.js Viewer Google Chrome.
  45. DjVu.js Viewer (github): "It requires access to third-party websites only to render embedded documents (<embed> tag) and open links to .djvu files (on any website). The extensions, by and large, are a local copy of the DjVu.js Viewer which is available on djvu.js.org".
