MARC-8

From Wikipedia, the free encyclopedia
Metadata standard

The MARC-8 charset is a MARC standard used in MARC 21 library records.[1] The MARC formats are standards for the representation and communication of bibliographic and related information in machine-readable form, and they are frequently used in library database systems. The character encoding now known as MARC-8 was introduced in 1968 as part of the MARC format. It was originally based on the Latin alphabet. From 1979 to 1983 the JACKPHY initiative expanded the repertoire to include Japanese, Arabic, Chinese, and Hebrew characters (among others), and Cyrillic and Greek scripts were added later. If a MARC 21 record contains a character that MARC-8 cannot represent, the record must be encoded in UTF-8 instead. UTF-8 supports far more characters than MARC-8, which is rarely used outside library data.

Technical details


MARC-8 uses a variant of the ISO/IEC 2022 encoding. It uses escape sequences to switch to character sets that cover characters beyond the 7-bit ASCII range.

For bidirectional (BiDi) text it generally uses the same logical ordering as Unicode.

Combining characters and base characters appear in a different order than in Unicode: in MARC-8 a combining mark precedes the base character it modifies, whereas in Unicode it follows it. When a base character carries several combining marks, MARC-8 does not always store them in the reverse of their Unicode normalization order. The MARC 21 specification describes these MARC-8 to Unicode conversion issues in more detail. Some examples:

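| Displayed character | Unicode (NFD) | MARC-8 |
|---|---|---|
| á | a, U+0301 combining acute accent | combining acute accent, a |
| ậ | a, U+0323 combining dot below, U+0302 combining circumflex accent | combining circumflex accent, combining dot below, a |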
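The reordering shown in the examples can be illustrated with a minimal Python sketch. It assumes the MARC-8 text has already been decoded into Unicode code points that are still in MARC-8 order (the ANSEL byte-to-code-point mapping is not shown), and it treats any character with a nonzero canonical combining class as a combining mark, which is a simplification. The function name `marc8_order_to_unicode` is hypothetical.

```python
import unicodedata


def marc8_order_to_unicode(chars: str) -> str:
    """Move combining marks that precede a base character to after it (hypothetical helper)."""
    out = []
    pending = []  # combining marks seen before their base character
    for ch in chars:
        if unicodedata.combining(ch):
            pending.append(ch)
        else:
            # MARC-8 stores the marks before the base; Unicode wants them after.
            out.append(ch)
            out.extend(pending)
            pending.clear()
    out.extend(pending)  # marks with no following base are kept at the end
    # Normalization puts marks of different combining classes into canonical
    # order and composes them where a precomposed character exists.
    return unicodedata.normalize("NFC", "".join(out))


print(marc8_order_to_unicode("\u0301a"))        # combining acute, a
print(marc8_order_to_unicode("\u0302\u0323a"))  # circumflex, dot below, a
```

The sketch keeps the marks in their MARC-8 relative order and relies on normalization to sort marks of different combining classes; marks that share a combining class are not reordered, so whether their order is correct is left open here.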

Code structure


The ISO/IEC 2022 coding specifies a two-layer mapping between character codes and displayed characters. In MARC-8, character codes from the 7-bit ASCII graphic range (0x20–0x7F) are referred to as "G0" codes, while codes from the "high ASCII" range (0xA0–0xFF) are referred to as "G1" codes. Graphic character sets are designated and invoked by means of a multiple-byte escape sequence consisting of the escape character (ESC), one or more intermediate bytes (I), and a final byte (F), in the form ESC I F.

The following table shows the intermediate byte after the ESC byte (hexadecimal 1B), and the corresponding ASCII characters.

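Intermediate Bytes[2] (SBCS: single-byte character set; MBCS: multibyte character set)

| | G0 set, SBCS | G0 set, MBCS | G1 set, SBCS | G1 set, MBCS |
|---|---|---|---|---|
| Normal ISO-2022 | 28 ( | 24 $ | 29 ) | 24 29 $) |
| Alternate ISO-2022 (additional 63+16 sets) | 2C , | 24 2C $, | 2D - | 24 2D $- |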

The following table shows the final bytes that follow the intermediate bytes, in hexadecimal and as the corresponding ASCII characters.

Final Bytes[2]

| Bytes (hex) | Characters | Name | Type | Comment |
|---|---|---|---|---|
| 31 | 1 | Chinese, Japanese, Korean (EACC) | MBCS | |
| 32 | 2 | Basic Hebrew | SBCS | |
| 33 | 3 | Basic Arabic | SBCS | |
| 34 | 4 | Extended Arabic | SBCS | |
| 42 | B | Basic Latin (ASCII) | SBCS | |
| 21 45 | !E | Extended Latin (ANSEL) | SBCS | The 21 (hex) is technically a second byte of the intermediate segment of this escape sequence. |
| 4E | N | Basic Cyrillic | SBCS | |
| 51 | Q | Extended Cyrillic | SBCS | |
| 53 | S | Basic Greek (ISO 5428) | SBCS | |
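The intermediate-byte and final-byte tables above can be combined into a small lookup. This Python sketch decodes one complete designation escape sequence into its target graphic set, whether that set is multibyte, and the set's name. The function name `describe_designation` is hypothetical, the caller is assumed to have already isolated the sequence, and the custom sets (which have no intermediate byte) are not handled.

```python
# Intermediate byte(s) after ESC -> (graphic set, multibyte?), from the first table
INTERMEDIATES = {
    b"(": ("G0", False), b"$": ("G0", True),
    b")": ("G1", False), b"$)": ("G1", True),
    b",": ("G0", False), b"$,": ("G0", True),  # alternate forms
    b"-": ("G1", False), b"$-": ("G1", True),
}

# Final byte(s) -> character set name, from the second table
FINALS = {
    b"1": "Chinese, Japanese, Korean (EACC)",
    b"2": "Basic Hebrew",
    b"3": "Basic Arabic",
    b"4": "Extended Arabic",
    b"B": "Basic Latin (ASCII)",
    b"!E": "Extended Latin (ANSEL)",
    b"N": "Basic Cyrillic",
    b"Q": "Extended Cyrillic",
    b"S": "Basic Greek (ISO 5428)",
}


def describe_designation(seq: bytes) -> tuple[str, bool, str]:
    """Decode a single ESC I... F designation sequence (hypothetical helper)."""
    if len(seq) < 3 or seq[0] != 0x1B:
        raise ValueError("expected ESC, intermediate byte(s), final byte")
    inter, final = seq[1:-1], seq[-1:]
    # ANSEL's "!" (0x21) is a second intermediate byte; treat it as part of the set name
    if inter.endswith(b"!"):
        inter, final = inter[:-1], b"!" + final
    graphic_set, multibyte = INTERMEDIATES[inter]
    return graphic_set, multibyte, FINALS[final]


print(describe_designation(b"\x1b)!E"))
```

An unknown intermediate or final byte raises `KeyError`, which keeps the sketch strict rather than guessing.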

EACC is the only multibyte character set in MARC-8; it encodes each CJK character as three bytes, each in the ASCII graphic range.

For example, the CJK character 人 (U+4EBA) is encoded as the following byte sequence:

 \x1B\x24\x31\x21\x30\x64

The escape sequence \x1B\x24\x31 (ESC $ 1) switches to the EACC (CJK) set, and \x21\x30\x64 is the EACC code for U+4EBA.
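To show how three-byte EACC codes interleave with single-byte sets in a byte stream, here is a minimal Python sketch that groups 7-bit MARC-8 bytes into characters tagged with the final byte of the current G0 set. It assumes Basic Latin (final byte B) is the initial G0 set, handles only G0 designations, and skips all other escape sequences. The name `split_g0` is hypothetical, and converting EACC codes to Unicode (which needs the full EACC table) is not attempted.

```python
def split_g0(data: bytes, initial: str = "B") -> list[tuple[str, bytes]]:
    """Group 7-bit MARC-8 bytes into (G0 set final byte, character bytes) pairs."""
    current, width = initial, 1  # assumed starting G0 set: Basic Latin (ASCII)
    out = []
    i = 0
    while i < len(data):
        if data[i] == 0x1B:  # start of an escape sequence
            j = i + 1
            while j < len(data) and 0x20 <= data[j] <= 0x2F:
                j += 1  # skip intermediate bytes
            if j >= len(data):
                raise ValueError("truncated escape sequence")
            inter, final = data[i + 1:j], chr(data[j])
            if inter in (b"(", b",", b"$", b"$,"):  # G0 designations only
                current = final
                # "$" marks a multibyte set; in MARC-8 that is EACC, 3 bytes per char
                width = 3 if inter.startswith(b"$") else 1
            i = j + 1
            continue
        char = data[i:i + width]
        if len(char) < width:
            raise ValueError("truncated multibyte character")
        out.append((current, char))
        i += width
    return out


print(split_g0(b"A\x1b$1!0d\x1b(BZ"))
```

Applied to the example above, `split_g0(b"\x1b\x24\x31\x21\x30\x64")` yields a single EACC character, `("1", b"!0d")`.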

Custom set extension


In addition to the ISO/IEC 2022 character sets, MARC-8 defines the following custom sets. Each is selected by the escape byte (hexadecimal 1B) followed directly by the final byte shown; there is no intermediate byte.

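Final Bytes[2]

| Bytes (hex) | Characters | Name | Type | Comment |
|---|---|---|---|---|
| 62 | b | Subscript set | SBCS | |
| 67 | g | Greek Symbol set | SBCS | The alpha, beta, and gamma characters normally do not round-trip when mapped to Unicode. |
| 70 | p | Superscript set | SBCS | |
| 73 | s | Basic Latin (ASCII) | SBCS | |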
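Because the custom sets use a two-byte escape with no intermediate byte, recognizing them is simpler than parsing an ISO/IEC 2022 designation. The following Python sketch splits 7-bit text into runs labeled with the active custom set. It assumes Basic Latin is active at the start and passes any other escape sequence through unchanged as ordinary bytes; the function name `custom_set_runs` is hypothetical.

```python
# Final byte (directly after ESC) -> custom set name, from the table above
CUSTOM_SETS = {
    0x62: "Subscript",            # b
    0x67: "Greek Symbol",         # g
    0x70: "Superscript",          # p
    0x73: "Basic Latin (ASCII)",  # s
}


def custom_set_runs(data: bytes, start: str = "Basic Latin (ASCII)") -> list[tuple[str, bytes]]:
    """Split bytes into (custom set name, bytes) runs at ESC b/g/p/s (hypothetical helper)."""
    runs = []
    current, buf = start, bytearray()
    i = 0
    while i < len(data):
        if data[i] == 0x1B and i + 1 < len(data) and data[i + 1] in CUSTOM_SETS:
            if buf:  # close the run collected under the previous set
                runs.append((current, bytes(buf)))
                buf = bytearray()
            current = CUSTOM_SETS[data[i + 1]]
            i += 2
        else:
            buf.append(data[i])
            i += 1
    if buf:
        runs.append((current, bytes(buf)))
    return runs


print(custom_set_runs(b"H\x1bb2\x1bsO"))
```

For example, `H\x1bb2\x1bsO` splits into a Basic Latin run `H`, a Subscript run `2`, and a Basic Latin run `O`, which a renderer could display as H₂O.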

C0 control codes


MARC 21 uses GS (0x1D) as a record terminator, RS (0x1E) as a field terminator, and US (0x1F) as a subfield delimiter.[3]
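These three separators are enough to split raw record data into records, fields, and subfields. The Python sketch below does only that: a real MARC 21 reader would also use the leader and directory (left here as the first chunk of each record) to identify field tags, which this sketch does not parse. The function name `split_marc_records` is hypothetical.

```python
GS, RS, US = b"\x1d", b"\x1e", b"\x1f"  # record, field, subfield separators


def split_marc_records(data: bytes) -> list[list[list[bytes]]]:
    """Split raw MARC data into records -> fields -> subfield chunks (hypothetical helper)."""
    records = []
    for record in data.split(GS):
        if not record:
            continue  # empty piece after the final record terminator
        # Each field ends with RS; within a field, US precedes each subfield.
        fields = [field.split(US) for field in record.split(RS) if field]
        records.append(fields)
    return records


print(split_marc_records(b"LDR\x1e  \x1faTitle\x1fbSub\x1e\x1d"))
```

In a data field, the first chunk holds the indicators and each following chunk starts with its one-character subfield code (here `a` and `b`).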

C1 control codes


The following alternative C1 control code set is defined for bibliographic applications such as library systems. It is mostly concerned with string collation and with markup of bibliographic fields. Slightly different variants are defined in the German standard DIN 31626[4] (published in 1978 and since withdrawn)[5] and the ISO standard ISO 6630,[6][7] the latter of which has also been adopted in Germany as DIN ISO 6630.[8] Differences between the variants are noted in the table below. MARC-8 uses the coding of NSB and NSE from this set, and adds some further format effectors in positions not used by the ISO version; however, MARC 21 uses this control set only in MARC-8 records, not in Unicode-format records.[3]

When the ISO/IEC 2022 extension mechanism is used, the DIN 31626 set is designated as the active C1 control character set with the sequence 0x1B 0x22 0x45 (ESC " E),[4] and the ISO 6630 / DIN ISO 6630 set is designated with the sequence 0x1B 0x22 0x42 (ESC " B).[6] The 1985 expansion of the ISO 6630 set can also be explicitly specified by using the sequence 0x1B 0x26 0x40 0x1B 0x22 0x42 (ESC & @ ESC " B).[7]
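A reader that encounters one of these designation sequences can identify the C1 set in force with a simple prefix check. This Python sketch tests whether the bytes at a given position begin with one of the three sequences quoted above; the function name `designated_c1_set` and the label strings are hypothetical.

```python
from typing import Optional

# Designation sequences quoted above, mapped to the C1 set they select
C1_DESIGNATIONS = {
    b"\x1b\x22\x45": "DIN 31626",                              # ESC " E
    b"\x1b\x22\x42": "ISO 6630 / DIN ISO 6630",                # ESC " B
    b"\x1b\x26\x40\x1b\x22\x42": "ISO 6630 (1985 expansion)",  # ESC & @ ESC " B
}


def designated_c1_set(data: bytes, pos: int = 0) -> Optional[str]:
    """Return the C1 set designated by the sequence starting at data[pos], or None."""
    for seq, name in C1_DESIGNATIONS.items():
        if data.startswith(seq, pos):
            return name
    return None


print(designated_c1_set(b"\x1b\x22\x45"))
```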

| ESC + | Dec | Hex | Acronym | Name | Description[4][6][7] |
|---|---|---|---|---|---|
| G | 135 | 87 | CUS | Close-Up for Sorting | (DIN 31626, ISO 6630) Declares that two successive character sequences separated by a space or separator should be treated as one word for collation purposes. |
| H | 136 | 88 | NSB | Non-Sorting Characters Begin | (DIN 31626, ISO 6630, MARC 21) Marks the start of a sequence of characters to be ignored for collation purposes. MARC 21 uses this character in MARC-8 records, but uses 0x98 (SOS) in Unicode records for the same purpose.[3][9] |
| I | 137 | 89 | NSE | Non-Sorting Characters End | (DIN 31626, ISO 6630, MARC 21) Marks the end of a sequence of characters to be ignored for collation purposes. MARC 21 uses this character in MARC-8 records, but uses 0x9C (ST) in Unicode records for the same purpose.[3][9] |
| J | 138 | 8A | FIL | Filler Character | (DIN 31626) Substitutes for a mandatory alphanumeric character in a field. |
| K | 139 | 8B | TCI | Tag in Context Indicator | (DIN 31626) Within a bibliographic field, used to refer to data in another bibliographic field by its tag number. |
| K | 139 | 8B | PLD | Partial Line Down | (ISO 6630) Not in the original edition of ISO 6630.[6] In the 1985 edition of ISO 6630,[7] used for Partial Line Down (see PLD). |
| L | 140 | 8C | ICI | Identification Number in Context Indicator | (DIN 31626) Within a bibliographic field, used to refer to data in another bibliographic record by its ID number. |
| L | 140 | 8C | PLU | Partial Line Up | (ISO 6630) Not in the original edition of ISO 6630.[6] In the 1985 edition of ISO 6630,[7] used for Partial Line Up (see PLU). |
| M | 141 | 8D | OSC[a] | Optional Syllabification[b] Control | (DIN 31626) Marks a syllable boundary in a long word. See also soft hyphen. |
| M | 141 | 8D | ZWJ | Joiner | (MARC 21) In MARC-8, used for the Zero-Width Joiner, while U+200D is used in Unicode-format MARC records.[3][9] |
| N | 142 | 8E | SS2 | Single-Shift 2 | (DIN 31626) Non-locking shift code; see SS2. |
| N | 142 | 8E | ZWNJ | Non-Joiner | (MARC 21) In MARC-8, used for the Zero-Width Non-Joiner, while U+200C is used in Unicode-format MARC records.[3][9] |
| O | 143 | 8F | SS3 | Single-Shift 3 | (DIN 31626) Non-locking shift code; see SS3. |
| P | 144 | 90 | - | (reserved) | |
| Q | 145 | 91 | EAB | Embedded Annotation Beginning | (DIN 31626, ISO 6630) Marks the start of a variable-length annotation which is embedded within a bibliographic field, as opposed to separated using content designation. |
| R | 146 | 92 | EAE | Embedded Annotation End | (DIN 31626, ISO 6630) Marks the end of a variable-length embedded annotation. |
| S | 147 | 93 | ISB | Item Specification Beginning | (DIN 31626) Marks the start of a string of specific information of some description, other than a keyword or a permutation string. |
| T | 148 | 94 | ISE | Item Specification End | (DIN 31626) Marks the end of a string of specific information. |
| U | 149 | 95 | SIB | Sorting Interpolation Beginning | (ISO 6630) Marks the beginning of a sequence of characters used for collation purposes only. |
| V | 150 | 96 | SIE | Sorting Interpolation End | (ISO 6630) Marks the end of a sequence of characters used for collation purposes only. |
| W | 151 | 97 | SSB | Secondary Sorting Value Beginning | (ISO 6630) Marks the start of a string with subordinate collation value. |
| X | 152 | 98 | SSE | Secondary Sorting Value End | (ISO 6630) Marks the end of a string with subordinate collation value. |
| Y | 153 | 99 | INC | Indicator for Non-Standard Character | (DIN 31626) Identifies a following non-standard character. |
| Z | 154 | 9A | - | (reserved) | |
| [ | 155 | 9B | - | (reserved) | |
| \ | 156 | 9C | KWB | Keyword Beginning | (DIN 31626, ISO 6630) Marks the start of a keyword within a bibliographic field. |
| ] | 157 | 9D | KWE | Keyword End | (DIN 31626, ISO 6630) Marks the end of a keyword within a bibliographic field. |
| ^ | 158 | 9E | PSB | Permutation String Beginning | (DIN 31626, ISO 6630) Marks the start of a string which is to be permuted to the front of the element when references or indices are generated. Terminated by PSE or by the end of the element. |
| _ | 159 | 9F | PSE | Permutation String End | (DIN 31626, ISO 6630) Marks the end of a string which is to be permuted to the front of the element. |
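The table gives each control both as a single 8-bit byte (Hex) and in the 7-bit form ESC followed by the character in the first column; the byte value is that character's code plus 0x40 (for example, ESC H is 0x48 + 0x40 = 0x88). The Python sketch below uses this relationship to rewrite the 7-bit forms as single bytes and then builds a collation key by dropping the text between NSB and NSE. It assumes non-sorting text is marked only with the NSB/NSE pair; other collation controls such as SIB/SIE or SSB/SSE are left in place. Both function names are hypothetical.

```python
NSB, NSE = 0x88, 0x89  # Non-Sorting Characters Begin / End


def to_8bit_c1(data: bytes) -> bytes:
    """Rewrite 7-bit ESC Fe forms (ESC 0x40-0x5F) as single C1 bytes (0x80-0x9F)."""
    out = bytearray()
    i = 0
    while i < len(data):
        if data[i] == 0x1B and i + 1 < len(data) and 0x40 <= data[i + 1] <= 0x5F:
            out.append(data[i + 1] + 0x40)  # e.g. ESC H (0x48) -> 0x88
            i += 2
        else:
            out.append(data[i])
            i += 1
    return bytes(out)


def sort_key(data: bytes) -> bytes:
    """Drop the characters between NSB and NSE, which are ignored for collation."""
    out = bytearray()
    skipping = False
    for b in to_8bit_c1(data):
        if b == NSB:
            skipping = True
        elif b == NSE:
            skipping = False
        elif not skipping:
            out.append(b)
    return bytes(out)


print(sort_key(b"\x88The \x89Hobbit"))
```

With this key, a title stored as NSB "The " NSE "Hobbit" files under "Hobbit" whether the controls are written as 0x88/0x89 or as ESC H / ESC I.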

Notes

  a. Not the same as the Operating System Command (OSC) in the ISO/IEC 6429 C1 code set.
  b. Spelled "Syllabication [sic]" in the ISO-IR-040 document, along with "syllable" being spelled "syllabe [sic]" in the description. These are presumably typographical errors.

References

  1. "Character Sets: Introduction: MARC 21 Specifications for Record Structure, Character Sets, and Exchange Media (Library of Congress)". Library of Congress.
  2. "Character Sets: MARC-8 Encoding Environment: MARC 21 Specifications for Record Structure, Character Sets, and Exchange Media (Library of Congress)". Library of Congress.
  3. "Control function codes". MARC 21 Specifications for Record Structure, Character Sets, and Exchange Media. Library of Congress. 2007-12-04.
  4. DIN (1979-07-15). Additional Control Codes for Bibliographic Use according to German Standard DIN 31626 (PDF). ITSCJ/IPSJ. ISO-IR-40.
  5. "Information processing; bibliographic control characters". Beuth: publishing DIN. DIN 31626:1978-12.
  6. ISO/TC 46 (1983-06-01). Additional Control Codes for Bibliographic Use according to International Standard ISO 6630 (PDF). ITSCJ/IPSJ. ISO-IR-67.
  7. ISO/TC 46 (1986-02-01). Additional Control Codes for Bibliographic Use according to International Standard ISO 6630 (PDF). ITSCJ/IPSJ. ISO-IR-124.
  8. "DIN ISO 6630 December 1997". AFNOR Editions Online Store.
  9. "Code Table Extended Latin (ANSEL)". MARC 21 Specifications for Record Structure, Character Sets, and Exchange Media. Library of Congress. 2007-12-05.

Retrieved from "https://en.wikipedia.org/w/index.php?title=MARC-8&oldid=1248028862"