Network Working Group                                 Annette L. DeSchon
Request for Comments: 971                                            ISI
                                                            January 1986

               A SURVEY OF DATA REPRESENTATION STANDARDS

Status of This Memo

   This RFC discusses data representation conventions in the
   ARPA-Internet and suggests possible resolutions.  No proposals in
   this document are intended as standards for the ARPA-Internet at this
   time.  Rather, it is hoped that a general consensus will emerge as to
   the appropriate approach to these issues, leading eventually to the
   adoption of ARPA-Internet standards.  Distribution of this memo is
   unlimited.

1. Introduction

   This report is a comparison of several data representation standards
   that are currently in use.  The standards, or system type
   definitions, that will be discussed are the CCITT X.409
   recommendation, the NBS Computer Based Message System (CBMS)
   standard, the DARPA Multimedia Mail system, the Courier remote
   procedure call protocol, and the SUN Remote Procedure Call package.

   One purpose of this report is to determine how the CCITT standard,
   which is gaining wide acceptance internationally, compares with some
   of the other standards that have been developed in the areas of
   electronic mail, distributed interprocess communication, and remote
   procedure call.  The CCITT X.409 recommendation, which is entitled
   "Presentation Transfer Syntax and Notation", is an international
   standard which is a part of the X.400 series Message Handling Systems
   (MHS) specifications [1].  It has been adopted by both the NBS and
   the ISO standards organizations.  In addition, some commercial
   organizations have announced intentions to support a CCITT interface
   for electronic mail.  The NBS Computer Based Message System (CBMS)
   standard was developed previously and was published as a Federal
   Information Processing Standard (FIPS Publication 98) in 1983 [3].
   The DARPA Multimedia Mail system is an experimental electronic mail
   system which is in use in the DARPA Internet [2,4,5].  It is used to
   create and distribute messages that incorporate text, graphics,
   stored speech, and images and has been implemented on several very
   different machines.  Courier is the XEROX network systems remote
   procedure call protocol [7].  The SUN Remote Procedure Call package
   implements "network pipes" between UNIX machines [6].

2. Background

   This section presents a brief overview of the basic terminology and
   approach of each data representation standard.

   2.1. Interprocess Communication Standards

      The standards that are oriented towards distributed interprocess
      communication or remote procedure call, between like machines,
      generally favor the use of types that map easily into the types
      defined in the programming language in use on the system.  For
      example, the types defined for the XEROX Courier system resemble
      the types found in the Mesa programming language.  Similarly, the
      SUN Remote Procedure Call system types resemble the types found in
      the C programming language.  An advantage of a system implemented
      using like machines is that the external data representation can
      be defined in such a way that the conversion to and from the local
      format is minimal.

      2.1.1. Courier

         The Courier standard data types are used to define the data
         objects which are transported bi-directionally between system
         elements that are running the Courier remote procedure call
         protocol.  The "standard representation" of a type is the
         encoding of the data which is transmitted.  The "standard
         notation" refers to the conventions for the interpretation of
         the data by higher-level applications.  The standard
         representation of a data object encodes the value of the
         object, but the type of the object is determined by the
         software that generates or interprets the representation.

      2.1.2. SUN Remote Procedure Call Package

         The SUN Remote Procedure Call package includes routines which
         allow a process on one UNIX machine to consume data produced by
         a process on another UNIX machine.  This is called a "network
         pipe" and is an extension of the standard UNIX pipe.  The
         "eXternal Data Representation (XDR)" standard defines the
         routines that are used to encode or "serialize" data for
         transmission, or to decode or "deserialize" data for local
         interpretation.  The syntax suggests that perhaps it should be
         called "remote interprocess communication" rather than "remote
         procedure call".
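
         The serialize/deserialize idea can be sketched as follows.
         This is an illustration only, not the SUN library.  The
         function names and the sample record (an integer followed by
         a string) are hypothetical.  The wire layout is an assumption
         not stated in this memo: 32-bit big-endian integers, and
         strings sent as a 32-bit length followed by the data,
         zero-padded to a multiple of four octets, as documented in
         later XDR specifications.

```python
import struct


def xdr_int(value: int) -> bytes:
    """Serialize a signed 32-bit integer as four big-endian octets."""
    return struct.pack(">i", value)


def xdr_string(text: str) -> bytes:
    """Serialize a string as a 32-bit length, the data, and zero padding."""
    data = text.encode("ascii")
    pad = (4 - len(data) % 4) % 4  # pad up to a four-octet boundary
    return struct.pack(">I", len(data)) + data + b"\x00" * pad


def serialize_record(ident: int, name: str) -> bytes:
    """Hypothetical record: both sides agree in advance on (int, string)."""
    return xdr_int(ident) + xdr_string(name)


def deserialize_record(buf: bytes) -> tuple:
    """Decode the record; no type information travels with the data."""
    (ident,) = struct.unpack_from(">i", buf, 0)
    (length,) = struct.unpack_from(">I", buf, 4)
    name = buf[8:8 + length].decode("ascii")
    return ident, name


wire = serialize_record(971, "ISI")
assert deserialize_record(wire) == (971, "ISI")
```

         Nothing in the octet stream identifies a field as an integer
         or a string; the receiver can decode it only because it knows
         the record layout beforehand.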

   2.2. Message Standards

      The message oriented standards, including DARPA Multimedia Mail,
      NBS CBMS, and the CCITT X.409 standards, seem to favor more
      general, highly extensible type definitions.  This may have
      something to do with the expectation that a system will include
      many different machines, programmed using many different
      programming languages.

      2.2.1. DARPA Multimedia Mail

         The DARPA Multimedia Mail system was developed for use in the
         DoD Internet community.  The set of data elements used in the
         Multimedia Message Handling Facility (MMHF) is referred to as
         its "presentation transfer syntax".  The encoding of these data
         elements varies with the data type being represented.  Each
         begins with a one-octet "element-code".  Some data elements are
         of a pre-determined length.  For example, the INTEGER data
         element occupies five octets, one for the element-code, and
         four which contain the "value component".  Other data elements,
         however, may vary in length.  For example, the TEXT data
         element is made up of a one-octet element-code, a three-octet
         count of the characters to follow, and a variable number of
         octets, each containing one right-justified seven bit ASCII
         character.  The element-code and the length constitute the "tag
         component".

         A "base data element" is self contained, while a "structured
         data element" is formed using other data elements.  The LIST
         data element is used to create structures composed of other
         elements.  The tag component of a LIST is made up of a
         one-octet element-code, a three-octet count of the number of
         octets to follow, and a two-octet count of the number of
         elements that follow.  The PROPLIST data element is used to
         create a structure that consists of a set of unordered
         name-value pairs.  The tag component of a PROPLIST is made up
         of a one-octet element-code, a three-octet count of the number
         of octets to follow, and a one-octet count of the number of
         name-value pairs in the PROPLIST.  Both the LIST and the
         PROPLIST elements are followed by an ENDLIST data element.

      2.2.2. NBS Computer Based Message System

         The NBS Computer Based Message System (CBMS) standard was
         developed to specify the format of a message at the interface
         between different computer-based message systems.  Each data
         element consists of a series of "components".  The five
         possible types of component are the "identifier octet", the
         "length code", the "qualifier", the "property-list" component,
         and the "data element contents".  Every data element contains
         an identifier octet and a length code.  The identifier octet
         contains a one-bit flag that signifies whether the data element
         contains a property-list, and a code identifying the data
         element and signifying whether it contains a qualifier.  In the
         NBS standard, the property-list is associated with a data
         element and contains properties such as a "printing-name" or a
         "comment".  The meaning of the qualifier depends on the data
         element code.  The length code indicates the number of octets
         following, and is between one and three octets in length.

         Each data element is inherently a "primitive data element",
         which contains a basic item of information, or a "constructor
         data element", which contains one or more data elements.  The
         "field" data element (itself a constructor) uses a qualifier
         component, which contains a "field identifier" to indicate
         which specific field is being represented within a message.

      2.2.3. CCITT Recommendation X.409

         The CCITT recommendation X.409 defines the notation and the
         representational technique used to specify and to encode the
         Message Handling System (MHS) protocols.  The following is a
         description of the CCITT approach to encoding type definitions.

         A data element consists of three components, the "identifier"
         (type), the "length", and the "contents".  An element and its
         components consist of a sequence of an integral number of
         octets.  An identifier consists of a "class" ("universal",
         "application-wide", "context-specific", or "private-use"), a
         "form" ("primitive" or "constructor"), and the "id code".
         There is a convention defined for both single-octet and
         multi-octet identifiers.  The length specifies the length of
         the contents in octets, and is itself variable in length.
         There is also an "indefinite" value defined for the length;
         this means that no length for the contents is specified, and
         the contents are terminated with the "end-of-contents" (EOC)
         element.  In X.409 it is possible to determine whether a data
         element is a primitive or a constructor from the form part of
         the identifier.  In addition it is possible to "tag" the data
         by attaching meaning to an id code within the context of a
         specific application.
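
         The fixed-length and variable-length layouts described for
         DARPA Multimedia Mail in Section 2.2.1 can be sketched as
         follows.  The element-codes (4 for INTEGER, 8 for TEXT) are
         the ones listed in the Section 4 table.  The function names
         are hypothetical, and the octet order (most significant octet
         first) and two's complement value encoding are assumptions
         that this memo does not state.

```python
import struct

INTEGER_CODE = 4  # MMM INTEGER element-code, per the Section 4 table
TEXT_CODE = 8     # MMM TEXT element-code, per the Section 4 table


def mmm_integer(value: int) -> bytes:
    """One-octet element-code plus a four-octet value component."""
    # Assumption: two's complement, most significant octet first.
    return bytes([INTEGER_CODE]) + struct.pack(">i", value)


def mmm_text(text: str) -> bytes:
    """Element-code, three-octet character count, one octet per character."""
    data = text.encode("ascii")  # seven-bit characters, high bit left zero
    if len(data) >= 1 << 24:
        raise ValueError("count does not fit in three octets")
    return bytes([TEXT_CODE]) + len(data).to_bytes(3, "big") + data


def mmm_decode(buf: bytes, offset: int = 0) -> tuple:
    """Decode one element; return (value, offset of the next element)."""
    code = buf[offset]
    if code == INTEGER_CODE:
        (value,) = struct.unpack_from(">i", buf, offset + 1)
        return value, offset + 5
    if code == TEXT_CODE:
        count = int.from_bytes(buf[offset + 1:offset + 4], "big")
        start = offset + 4
        return buf[start:start + count].decode("ascii"), start + count
    raise ValueError(f"element-code {code} not handled in this sketch")


stream = mmm_integer(1986) + mmm_text("mail")
first, nxt = mmm_decode(stream)
second, end = mmm_decode(stream, nxt)
assert (first, second, end) == (1986, "mail", len(stream))
```

         Because the element-code leads every element, the decoder
         needs no prior agreement about which element comes next.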

3. Implicit Versus Explicit Representation

   In both the SUN Remote Procedure Call system and the XEROX Courier
   system the type definitions of external data are implicit.  This
   means that for a given type of call, or message, the type definitions
   that are to be used to interpret the data are agreed upon by the
   sender and the receiver in advance.  In other words, parameters (or
   message fields) are assumed to be in a predefined order.  Each
   parameter is assumed to be of a predefined type.  This means the data
   cannot be reformatted into the local form until it reaches a process
   that knows about the types of specific parameters.  At this point,
   the conversion can be accomplished using system routines that know
   how to convert from the external format to the local format.  If the
   system is homogeneous there may be very little conversion required.
   In addition, no extra overhead of sending the type definitions with
   the data is incurred.

   In the DARPA Multimedia Mail system, the NBS CBMS standard, and the
   CCITT X.409 recommendation, type definitions are explicit.  In this
   case the type definitions are encoded into the message.  There are
   several advantages to this approach.  One advantage is that it allows
   a low level receiver process in the destination host to convert the
   data from the standard form to a form appropriate for the local host,
   as it is received.  This can increase efficiency if it allows the
   destination host to avoid passing around data that does not conform
   to the local word boundaries.  Another advantage is that it provides
   flexibility for future expansion.  Since the overall length is a part
   of the type definition, it allows a host to deal with or ignore data
   of types that it does not necessarily understand.  Since the
   interpretation of the data is not dependent on its position, message
   fields (or parameters) can be reordered, or optionally omitted.  The
   disadvantages of this approach are as follows.  Assuming that no
   field could be omitted, the external representation of the message
   may be longer than it would have been if an implicit representation
   had been used.  In addition, extra time may be consumed by the
   conversion between external format and local format, since the
   external format almost certainly will not match the local format for
   any of the participants.
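
   The ability of a receiver to ignore elements it does not understand
   can be sketched as follows.  The example uses X.409-style elements
   restricted to the universal class, primitive form, and single-octet
   identifier and length.  It assumes that in this case the identifier
   octet equals the id code and the length octet holds a count below
   128, which is the layout later carried into the ASN.1 Basic Encoding
   Rules.  The id codes (2 Integer, 22 IA5, 23 UTC Time) come from the
   Section 4 table; the function names and sample contents are
   hypothetical.

```python
def encode(id_code: int, contents: bytes) -> bytes:
    """Build a universal, primitive element: identifier, length, contents."""
    if not 0 <= id_code < 31 or len(contents) > 127:
        raise ValueError("sketch covers single-octet identifier and length only")
    return bytes([id_code, len(contents)]) + contents


# Element types this hypothetical receiver understands.
HANDLERS = {
    2: lambda c: int.from_bytes(c, "big", signed=True),  # Integer
    22: lambda c: c.decode("ascii"),                     # IA5 string
}


def decode_known(buf: bytes) -> list:
    """Decode understood elements and step over the rest using the length."""
    decoded = []
    pos = 0
    while pos < len(buf):
        id_code, length = buf[pos], buf[pos + 1]
        if length & 0x80:
            raise ValueError("multi-octet or indefinite length not handled")
        contents = buf[pos + 2:pos + 2 + length]
        pos += 2 + length  # the length locates the next element
        if id_code in HANDLERS:
            decoded.append((id_code, HANDLERS[id_code](contents)))
        # Any other element is ignored without being interpreted.
    return decoded


message = (
    encode(22, b"hello")
    + encode(23, b"860101000000Z")  # UTC Time, unknown to this receiver
    + encode(2, b"\x03\xcb")
)
assert decode_known(message) == [(22, "hello"), (2, 971)]
```

   The same receiver would accept the Integer and IA5 elements in
   either order, which is the reordering property noted above.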

4. Data Representation Standards Scorecard

   The following table is a comparison of the data elements defined for
   the various standards being discussed.  It is provided in order to
   give a general idea of the types defined for each standard, but it
   should be noted that the grouping of these types does not indicate
   that one type corresponds exactly to any other.  Where it is
   applicable, the identifier code appears in parentheses following the
   name of the data element.  Under "NUMBER", "S" stands for signed,
   "U" stands for unsigned, "V" stands for variable, and the number
   represents the number of bits.  For example, "Integer S16" means a
   "signed 16-bit integer".

 Type       CCITT        MMM         NBS         XEROX       Sun
 -----------------------------------------------------------------------
 END    | End-of-   | ENDLIST   | End-of-    |    --     |    --
        |  Contents |   (11)    | Constructor|           |
        |    (0)    |           |    (1)     |           |
        |           |           |            |           |
 PAD    | Null (5)  | NOP (0)   | No-Op (0)  |    --     |    --
        |           | PAD (1)   | Padding    |           |
        |           |           |   (33)     |           |
        |           |           |            |           |
 RECORD | Set (17)  | PROPLIST  | Set (11)   |    --     |    --
        |           |   (14)    |            |           |
        | Sequence  | LIST (9)  | Sequence   | Sequence  | Structure
        |   (16)    |           |   (10)     |           |
        |           |           |            | Record    |
        |           |           | Message    |           |
        |           |           |   (77)     |           |
        |    --     |    --     |     --     | Array     | Fixed Array
        |           |           |            |           | Counted Array
        | "Choice"  |    --     |     --     | Choice    |Discriminated-
        | "Any"     |           |            |           |   Union
        |           |           |            |           |
        | "Tagged"  | "name"    | Field (76) |    --     |    --
        |           |           |Unique-ID(9)|           |
        |    --     | SHARE-TAG |     --     |    --     |    --
        |           |   (12)    |            |           |
        |           | SHARE-REF |            |           |
        |           |   (13)    |            |           |
        |           |           |            |           |
        |    --     |    --     | Compressed |    --     |    --
        |           |           |   (70)     |           |
        |    --     | ENCRYPT   | Encrypted  |    --     |    --
        |           |   (14)    |    (71)    |           |

 Type       CCITT        MMM         NBS         XEROX       Sun
 -----------------------------------------------------------------------
 BOOLEAN| Boolean(1)| BOOLEAN(2)| Boolean(8) | Boolean   | Boolean
        |           |           |            |           |
 NUMBER | Integer(2)| EPI (5)   | Integer(32)| Integer   | Integer
        |   SV      |   SV      |   SV       |   S16     |  S32
        |           | INDEX (3) |            | Cardinal  | Unsigned Int
        |           |   U16     |            |   U16     |  U32
        |           | INTEGER(4)|            |Unspecified|Enumeration
        |           |   S32     |            |   16      |  32
        |           |           |            | Long Int  |Hyper Integer
        |           |           |            |   S32     |  S64
        |           |           |            | Long Card |Uns Hyper Int
        |           |           |            |   U32     |  U64
        |           |           |            |           | Double Prec
        |           |           |            |           |   64
        |    --     | FLOAT (15)|     --     |    --     | Float Pt
        |           |   64      |            |           |   32
        |           |           |            |           |
 BIT-   | Bit String| BITSTR(6) | Bit-String |    --     |    --
  STRING|   (3)     |           |   (67)     |           |
        | Octet-    |    --     |     --     |    --     | Opaque
        |  String(4)|           |            |           |
        |           |           |            |           |
 STRING | IA5 (22)  | TEXT (8)  | ASCII-     | String    | Counted-
        |           |           |  String (2)|           |  Byte String
        |           | NAME (7)  |            |           |
        | Numeric   |           |            |           |
        |   (18)    |           |            |           |
        | Printable |           |            |           |
        |   (19)    |           |            |           |
        | T.61 (20) |           |            |           |
        | Videotex  |           |            |           |
        |   (21)    |           |            |           |

 Type       CCITT        MMM         NBS         XEROX       Sun
 -----------------------------------------------------------------------
 OTHER  | UTC Time  |    --     | Date (40)  |    --     |    --
        |   (23)    |           |            |           |
        | Gen Time  |           |            |           |
        |   (24)    |           |            |           |
        |    --     |    --     | Property-  |    --     |    --
        |           |           |   List (36)|           |
        |    --     |    --     |Property(69)|    --     |    --
        |           |           |            |           |
        |    --     |    --     |    --      | Procedure |    --
        |           |           |            |           |
        |    --     |    --     | Vendor-    |    --     |    --
        |           |           |  Defined   |           |
        |           |           |   (127)    |           |
        |           |           | Extension  |           |
        |           |           |   (126)    |           |

5. Conclusions

   Of the standards discussed in this survey, the CCITT approach (X.409)
   has already gained wide acceptance.  For a system that will include a
   number of dissimilar hosts, as might be the case for an Internet
   application, a standard that employs explicit representation, such as
   CCITT X.409, would probably work well.  Using the CCITT X.409
   standard it is possible to construct most of the data elements that
   are specified for the other standards, with the possible exception of
   the "floating point" type.  However, some of the flexibility that has
   been built into this standard, such as the "private-use" class, may
   lead to ambiguity and a lack of coordination between implementors at
   different sites.  If a standard such as CCITT X.409 were to be used
   in an Internet experiment, a fully defined (but large) subset would
   probably have to be selected.

6. References

   [1]  "Message Handling Systems: Presentation Transfer Syntax and
        Notation", Recommendation X.409, Document AP VIII-66-E,
        International Telegraph and Telephone Consultative Committee
        (CCITT), Malaga-Torremolinos, June, 1984.

   [2]  J. Garcia-Luna, A. Poggio, and D. Elliot, "Research into
        Multimedia Message System Architecture", SRI International,
        February, 1984.

   [3]  "Specification for Message Format for Computer Based Message
        Systems", FIPS Pub 98 (also published as RFC 841), National
        Bureau of Standards, January, 1983.

   [4]  J. Postel, "Internet Multimedia Mail Transfer Protocol", USC
        Information Sciences Institute, MMM-11 (RFC-759 revised), March,
        1982.

   [5]  J. Postel, "Internet Multimedia Mail Document Format", USC
        Information Sciences Institute, MMM-12 (RFC-767 revised), March,
        1982.

   [6]  "Extended Data Representation Reference Manual", SUN
        Microsystems, September, 1984.

   [7]  "Courier: The Remote Procedure Call Protocol", XSIS-038112,
        XEROX Corporation, December, 1981.
