
 Sustainability of Digital Formats: Planning for Library of Congress Collections


MBOX Email Format


Identification and description

Full name: MBOX Email Format
Description

MBOX (sometimes known as Berkeley format) is a generic term for a family of related file formats used for storing collections of electronic mail messages. MBOX formats store all of the messages of an entire folder (not an entire mailbox) in a single database file, and new messages are appended to the end of the file. Each message is immediately preceded by a separator line and terminated by an empty line. The first message in an MBOX database file is preceded only by a separator line, while every other message begins with two end-of-line sequences (one at the end of the previous message itself, and another to mark the end of that message within the MBOX database file stream) followed by a separator line (marking the new message). The end of the database file is implicitly reached when no more message data or separator lines are found.

A message encoded in MBOX format begins with a "From " line, continues with a series of non-"From " lines, and ends with a blank line. A "From " line means any line in the message or header that begins with the five characters 'F', 'r', 'o', 'm', and ' ' (space). The "From " line has the structure "From sender date moreinfo", where:

  • sender, usually the envelope sender of the message (e.g., [email protected]), is one word without spaces or tabs.
  • date is the delivery date of the message, which always contains exactly 24 characters in Standard C asctime format (i.e., in English, with the redundant weekday, and without timezone information).
  • moreinfo is optional and may contain arbitrary information.

After the "From " line comes the message itself in RFC 5322 format. The final line is a completely blank line with no spaces or tabs.
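The "From " line structure described above can be sketched in Python. This is a minimal illustration, not a reference parser: the function name `parse_from_line` and the sample address are hypothetical, and the regular expression assumes a single space between the three fields.

```python
import re
import time

# "From sender date moreinfo": sender is one word, date is exactly
# 24 characters, moreinfo is optional (assumed single-space separators).
FROM_LINE = re.compile(r"^From (\S+) (.{24})(?: (.*))?$")


def parse_from_line(line):
    """Return (sender, date, moreinfo), or None if this is not a "From " line."""
    match = FROM_LINE.match(line.rstrip("\r\n"))
    if match is None:
        return None
    sender, date_text, moreinfo = match.groups()
    try:
        # asctime layout, e.g. "Thu Jan  1 00:00:00 1970" (no timezone).
        date = time.strptime(date_text, "%a %b %d %H:%M:%S %Y")
    except ValueError:
        return None
    return sender, date, moreinfo


if __name__ == "__main__":
    print(parse_from_line("From alice@example.org Thu Jan  1 00:00:00 1970"))
```

Real-world files often deviate from this layout (for example, dates with timezone information), so a tolerant reader would probably need to relax the date check.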

There are four variants of MBOX: MBOXO, MBOXRD, MBOXCL and MBOXCL2. The four variants all build on the common MBOX structure and are differentiated primarily by how they treat "From " lines and by whether they use the "Content-Length:" field in the message header to determine the start of a new message within the aggregated file. Moreover, the variants and the tool sets written for them are not necessarily compatible with one another. See the General section for incompatibility details.
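To illustrate how the handling of "From " lines can differ between variants, the sketch below contrasts the quoting rules commonly attributed to MBOXO and MBOXRD. These rules are not spelled out in this description; they are an assumption drawn from the usual accounts of the two variants (MBOXO prefixes ">" to body lines starting with "From ", while MBOXRD also prefixes ">" to lines that already start with one or more ">" followed by "From "). All function names are hypothetical.

```python
import re

# Assumed MBOXRD rule: any body line matching ">*From " gets one more ">".
_RD_QUOTABLE = re.compile(r"^>*From ")
_RD_QUOTED = re.compile(r"^>+From ")


def quote_mboxo(body_line):
    """Assumed MBOXO rule: quote only lines that start with "From "."""
    return ">" + body_line if body_line.startswith("From ") else body_line


def quote_mboxrd(body_line):
    """Assumed MBOXRD rule: quote "From " and already-quoted ">From " lines."""
    return ">" + body_line if _RD_QUOTABLE.match(body_line) else body_line


def unquote_mboxrd(stored_line):
    """Reverse MBOXRD quoting by stripping exactly one leading ">"."""
    return stored_line[1:] if _RD_QUOTED.match(stored_line) else stored_line


if __name__ == "__main__":
    for original in ["From the start", ">From a reply", "Plain text"]:
        stored = quote_mboxrd(original)
        print(repr(original), "->", repr(stored), "->", repr(unquote_mboxrd(stored)))
```

Under these assumptions, MBOXRD quoting is reversible, whereas MBOXO stores "From here" and ">From here" identically, so a reader cannot tell which one the author wrote.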

MBOX files also include the message attachments, if any, in their original MIME format.
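As a rough illustration of attachments travelling inside the file as MIME parts, the following sketch uses Python's standard `mailbox` and `email` modules to write a one-message file and then list attachment filenames. The helper names, addresses, and file names are hypothetical, and the standard library reader makes its own choices about MBOX variant handling.

```python
import mailbox
import os
import tempfile
from email.message import EmailMessage


def build_sample_mbox(path):
    """Write a one-message MBOX file with a small attachment (sample data)."""
    msg = EmailMessage()
    msg["From"] = "alice@example.org"
    msg["To"] = "bob@example.org"
    msg["Subject"] = "Report"
    msg.set_content("See attached.")
    msg.add_attachment(b"col1,col2\n1,2\n", maintype="application",
                       subtype="octet-stream", filename="data.csv")
    box = mailbox.mbox(path)
    try:
        box.add(msg)
        box.flush()
    finally:
        box.close()


def list_attachments(path):
    """Return (subject, [attachment filenames]) for each message in the file."""
    results = []
    box = mailbox.mbox(path, create=False)
    try:
        for message in box:
            # walk() visits every MIME part; only named parts are reported.
            names = [part.get_filename() for part in message.walk()
                     if part.get_filename() is not None]
            results.append((message["Subject"], names))
    finally:
        box.close()
    return results


if __name__ == "__main__":
    sample = os.path.join(tempfile.mkdtemp(), "sample.mbox")
    build_sample_mbox(sample)
    print(list_attachments(sample))
```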

Production phase: Used for content in initial (by message authors), middle (by archives) or final state (by message recipients/other end users).
Relationship to other formats
    Defined via: IMF, Internet Mail Format
    Has subtype: MBOXO, MBOXO Email Format
    Has subtype: MBOXRD, MBOXRD Email Format
    Has subtype: MBOXCL, MBOXCL Email Format
    Has subtype: MBOXCL2, MBOXCL2 Email Format
    Affinity to: EMLX, Apple Mail Email Format

Local use

LC experience or existing holdings: The Library of Congress includes MBOX files in its collections, especially in the Manuscripts and Music Divisions as well as other personal papers repositories.
LC preference: The Library of Congress Recommended Formats Statement (RFS) lists MBOX as an acceptable format for Email. The RFS does not specify a variant of MBOX.

Sustainability factors

Disclosure: There is no authoritative specification aside from RFC 4155. The subtypes are partially documented, and there is variation within the subtypes.
    Documentation: Information is available from a number of sources, including RFC 4155 and Qmail.org's "mbox - file containing mail messages" (link via Internet Archive). In addition, Jonathan de Boyne Pollard (link via Internet Archive) informally describes the many incompatibilities among the MBOX variants.
Adoption

Prom reports that, while not a native format for many proprietary clients, MBOX (and EML) has "achieved a certain status as de facto standards because most modern email clients and servers can import and export one or both of the formats", including Thunderbird, Apple Mail, Outlook and Eudora. In addition, external programs such as Aid4Mail, Emailchemy and Xena can convert between the two formats and numerous proprietary formats. Once in an MBOX or EML format, the data can be parsed into XML using standardized schemas such as the Email Account Schema defined in the CERP project.

The Smithsonian Institution Archives uses the CERP-developed toolset to normalize messages to MBOX before converting to XML. The ePADD project developed at Stanford University Libraries ingests and exports MBOX (link via Internet Archive) alongside other formats as of 2023 with version 10.0. Native or normalized MBOX files also can be used as access copies because they can be imported into a variety of email clients.

    Licensing and patents: [Unknown, probably none].
Transparency: Text processing tools can be readily used on the plain text files used to store the email messages.
Self-documentation

The message structure helps declare the subtype, but there is considerable variation even within the established patterns.

Accessibility Features

MBOX has no specific attributes to support accessibility. Comments welcome.

External dependencies: None
Technical protection considerations: Encryption is possible through external applications, but no encryption options are natively built into the format. See, for example, "Protecting the contents of the profile - mail".

Quality and functionality factors


File type signifiers and format identifiers

Filename extension: mbox
    Note: MBOX database files sometimes have an "mbox" extension, but according to the specification, this is neither required nor expected.
Internet Media Type: application/mbox
    Note: See IANA and also RFC 4155.
Magic numbers: Not applicable.
    Note: MBOX database files, which are the focus of this document, do not have a magic number. As described in RFC 4155, MBOX database files can be recognized by having a leading character sequence of "From" followed by a single Space character (0x20), followed by additional printable character data. Gary Kessler states that MBOX TOC files, which act as an index to the MBOX database file, have the magic number 00 0D BB A0, followed by four bytes which appear to be the number of e-mails in the associated MBOX file. Comments welcome.
Pronom PUID: fmt/720
    Note: See http://www.nationalarchives.gov.uk/PRONOM/fmt/720.
Wikidata Title ID: Q285972
    Note: See https://www.wikidata.org/wiki/Q285972
Other: NF00247
    Note: See https://www.archives.gov/files/lod/dpframework/id/NF00247.ttl.
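The recognition rule given in the Magic numbers note can be expressed as a small heuristic. The sketch below is illustrative only: the function names are hypothetical, "printable" is assumed to mean ASCII 0x21 to 0x7E, and the TOC check relies solely on the four-byte value attributed to Gary Kessler above.

```python
MBOX_PREFIX = b"From "
# TOC magic number as reported in the note above (00 0D BB A0).
TOC_MAGIC = bytes.fromhex("000DBBA0")


def looks_like_mbox(data):
    """Heuristic: "From", one space, then at least one printable byte."""
    if not data.startswith(MBOX_PREFIX):
        return False
    following = data[len(MBOX_PREFIX):len(MBOX_PREFIX) + 1]
    # Assumption: printable means visible ASCII (0x21 through 0x7E).
    return len(following) == 1 and 0x21 <= following[0] <= 0x7E


def looks_like_mbox_toc(data):
    """Heuristic: file starts with the reported TOC magic number."""
    return data.startswith(TOC_MAGIC)


if __name__ == "__main__":
    print(looks_like_mbox(b"From alice@example.org Thu Jan  1 00:00:00 1970\n"))
    print(looks_like_mbox(b"From: alice@example.org\n"))
```

Because any text file may happen to begin with "From ", a positive result is only a hint; a stricter check would also validate the rest of the separator line.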

Notes

General

Jonathan de Boyne Pollard (link via Internet Archive) describes the many incompatibilities among the MBOX formats:

  • Messages cannot be reliably read from MBOXO and MBOXRD format mailboxes by MBOXCL and MBOXCL2 readers.
  • MBOXO and MBOXRD preserve any "Content-Length:" headers in the original message exactly as they are, so there is no guarantee that those headers are correct and appropriate.
  • MBOXCL2 readers cannot return messages with no "Content-Length:" headers.
  • Messages cannot be reliably read from MBOXCL2 format mailboxes by MBOXO or MBOXRD readers.
  • Delivering messages to MBOXCL2 format mailboxes with MBOXO or MBOXRD tools will corrupt the mailbox, rendering all subsequently delivered messages irretrievable.
  • Because "From " at the start of a line is more probable than ">From " in real-world messages, an MBOXRD reader will restore a greater number of messages written to a mailbox by an MBOXO tool to their original forms than an MBOXRD tool, but will not restore all messages.
  • Conversely, when an MBOXO reader is used, less message corruption will be observed in the final results if the messages were written by an MBOXO tool than if they were written by an MBOXRD tool.

Wikipedia reports that "different MBOX formats use various mutually incompatible mechanisms to enable message file locking, including fcntl(), lockf(), and "dot locking" which are problematic in network mounted file systems, such as the Network File System (NFS). Because more than one message is stored in a single file, some form of file locking is needed to avoid the corruption that can result from two or more processes modifying the mailbox simultaneously. This could happen if a network email delivery program delivers a new message at the same time as a mail reader is deleting an existing message. MBOX files should be locked also while they are being read. Otherwise the reader may see corrupted message contents if another process is modifying the mbox at the same time, even though no actual file corruption occurs."
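The locking concern can be illustrated with Python's standard `mailbox` module, whose `mbox.lock()` method acquires a dot lock and, where available, advisory locks via system calls. This is a minimal sketch under the assumption that all cooperating programs honor the same mechanisms; the helper name and sample message are hypothetical.

```python
import mailbox
import os
import tempfile


def append_with_lock(mbox_path, raw_message):
    """Append one message while holding the mailbox lock."""
    box = mailbox.mbox(mbox_path)
    # lock() raises mailbox.ExternalClashError if another process holds it.
    box.lock()
    try:
        box.add(raw_message)
        box.flush()
    finally:
        box.unlock()
        box.close()


if __name__ == "__main__":
    path = os.path.join(tempfile.mkdtemp(), "inbox.mbox")
    append_with_lock(path, "From: alice@example.org\nSubject: one\n\nBody one\n")
    append_with_lock(path, "From: alice@example.org\nSubject: two\n\nBody two\n")
    box = mailbox.mbox(path)
    print(len(box))
    box.close()
```

As the quoted passage notes, these mechanisms can be unreliable on network file systems, so a lock obtained this way is only as strong as the weakest participant.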

Because MBOX stores the contents of an entire folder in one file, that single file can become exceedingly large. Any corruption in the file may affect the ability of certain clients to access individual messages or even the entire folder.

History

The naming scheme was developed by Daniel J. Bernstein, Rahul Dhesi, and others in 1996. Each version originated from a different version of Unix.



Last Updated: 04/10/2025

 
