Movatterモバイル変換
[0]ホーム
[RFC Home] [TEXT|PDF|HTML] [Tracker] [IPR] [Info page]
UNKNOWN
RFC 803 Dacom 450/500 Facsimile Data Transcoding A. Agarwal, M. J. O'Connor and D. L. Mills 2 November 19811. Introduction As part of our effort in support of the DARPA Internet Program,software modules to encode and decode facsimile data for the Dacom 450and 500 models Computerfax facsimile machines have been constructed.The purpose of these modules is to map the data representations used bythese machines to and from bit-map and run-length representations inprograms for editing, transmission and archiving facsimile images. Themodules are written in the PDP-11 MACRO-11 assembly language and can beincorporated into programs for, among others, the RT-11 operating systemand the DCNET BOS or VOS operating systems. The first part of this report describes in detail the Dacom 450data compression algorithm and is an update and correction to an earliermemorandum [2]. Following this, the encoding and decoding algorithmsare described along with the supporting programs and file formats.Reference [3] describes another implementation of the decodingalgorithm. Grateful acknowledgment is made to E. A. Poe of Rapicomfor his assistance in this effort. The second part of this report describes briefly the Dacom 500 datacompression algorithm as used by the INTELPOST electronic-mail networkunder development by the US Postal Service and several foreignadministrations. These machines conform to the CCITT T.4 DraftRecommendation, described in [5]. Supporting programs and file formatsare described.2. Dacom 450 Data Compression Principles The encoding algorithm for the Dacom 450 processes lines scanned bythe machine to produce a two-dimensional run-length code described byWeber [1]; however, this article contains a number of errors andomissions, many of which were discovered only after considerableanalysis and experimentation [2,3]. The machine operates over acoordinate space of l726 by approximately 2200 pels when inhigh-resolution (detail) mode. In normal (quality) mode the verticalresolution is halved, so that about 1100 lines are transmitted, while inexpress mode about 733 lines are transmitted (missed lines are filled inon playback by replicating previous lines). Data are encoded two rows at a time using a two-dimensionalrun-length code. Each row-pair is scanned left-to-right and theline-pairs themselves processed top-to-bottom of the document. Figure 1shows how the pels are represented.
Dacom 450/500 Facsimile Data Transcoding PAGE 2 | | | ----+----------+----------+---- ... | x(1,j) | x(1,j+1) | ... ----+----------+----------+---- ... | x(2,j) | x(2,j+1) | ... ----+----------+----------+---- | | | Direction of scan -> Figure 1. Data Representation For each j the vector (x(1,j),x(2,j)) represents the contents ofthe jth column, where x(i,j) can take on values of zero (white) or one(black). Each of the four possible vectors ranging over these valueswill be called a state (Dacom calls these "modes") with the successionof transitions between these states determined by the picture content ofthe particular line-pair. Scanning of the line-pairs follows one afterthe other with no special end-of-line code in the data itself. For thepurpose of later discussion and comparison with the published data, thefollowing conventions will be used (note: the pels read top-bottom): Pels Vector State --------------------- W-W (0,0) 0 B-W (1,0) 1 W-B (0,1) 2 B-B (1,1) 3 The algorithm used by Dacom to generate the transmitted data as thecolumns are scanned can be described as the non-deterministicfinite-state automaton (nfsa) shown in Figure 2. Conceptually, the nfsastarts at the beginning of a page in a designated state and at a pointjust after scanning the jth column in the jth state. It then scans the(j + 1)th column and enters that state while emitting the string of bitsshown in the figure. In the states corresponding to W-W (0) and B-B (3) a specialrun-length encoding techniques is used. There are two state variablesassociated with each of these two states, one variable used as arun-length counter and the other the field length (in bits) of thiscounter. Upon each entry to either of these two states the counter isinitialized at zero and counts up for every additional column of thesame state. At the end of the run the value of this counter istransmitted extending with high-order zeros, if necessary, to fill thefield length specified. If, however, the counter equals 2**n - 1, wheren is the field length, then a sequence of n one-bits is emitted and thecounter re-initialized at zero with a field length of n + 1. Thus, ifn = 3, a run length of three is transmitted as {010} and a run length ofseven as {110}, while a run length of eight as two words, {111} followedby {0000}. The field-length variables are maintained separately forboth the W-W and B-B states, and at each re-entry to either of thesestates the previous values are used.
Dacom 450/500 Facsimile Data Transcoding PAGE 3 0100 .--------------------->----------------------------------. | | | .-----------------<------------------------------. | | | 1 | | | V | | .--------------. .---------------. | | | | | | | | | | 010 | | | | .->| 1 |-------------------->| 2 |->. | | | | | | | | | | 0| | B-W | 101 | W-B | |1 | | \<-| |<--------------------| |<-' | | | | | | | | | | .---->| | | | \--------------' | \---------------' | | | A | | | A | | | | .--------->------' | | | | | | | | 1 | | | | | | | | | | | A V | | | | | | | | 0111| |1 | | 1000| |1 | | | | | | | | | | | | | | | | | | | | | | | | | | | | | 1011 | | | | | | | | .-------<----------' | | | | V | | | V | | | .--------------. | .---------------. | | | |<--' | | | | | | 0 | | | | | 3 |<--------------------| 0 |-----' | | | | | | | B-B | | W-W | | | |-------------------->| |<--------' | | 0 | | | | | | \--------------' \---------------' | A | A | | | | \----' \----' run run Figure 2. NFSA Model of Encoding
Dacom 450/500 Facsimile Data Transcoding PAGE 4 Field-length values are constrained not to exceed seven, so thatruns exceeding l27 with n = 7 will be encoded as a separate 7-bit wordof one-bits for each run of l27 except the last, which must alwayscontain at least one zero-bit. The field length n is decreased by oneunder the following circumstances: the current run has been encoded as asingle n-bit field, and for n in the range four through seven the twohigh-order bits are zero or for n equal to three the single high-orderbit is zero. The field length is not allowed to be reduced below twobits. The encoding algorithm starts in state 0 with both field lengthsset to 7.2.1. Dacom 450 Decoding Algorithm For reasons of speed and simplicity it is desirable that the Dacom450 decodingalgorithm be modeled on the basis of a deterministicfinite-state automaton (dfsa). Using straightforward formal procedures,the dfsa of Figure 3 can be constructed. This machine makes one statetransition for every bit, except for the W-W (0) and B-B (3) states,which must be treated specially in any case. The states are labeled insuch a way as to correspond to those of Figure 2 for states numberedfrom zero to three. The decoded output symbols, in this case the columns correspondingto each of the states, are represented by the states themselves. Uponentry to state B-W (1) or W-B (2) a run-length counter is initialized toone. Each traversal of a loop back to the same state increments thiscounter and, upon exit to any other state, the value of this counterrepresents the number of columns to be produced. Upon entry to stateW-W (0) or B-B (3) the run-length counter is initialized to zero and theassociated field-length state variable n established. For eachsuccessive n bits of all-ones, the counter is increased by 2**n - 1 andthen n itself increased by one, but not above seven. If the next n bitsare not all ones, then the counter is increased by the value representedby the n-bit field plus one. Finally, if upon entry to either state thenext n bits are not all ones, n is decreased by one according to therule mentioned in the preceding section.
Dacom 450/500 Facsimile Data Transcoding PAGE 5 .-----------. .-----------. .-----| | | |-----. | | 9 | | 6 | | | .-| |<--. .-->| |-. | | | \-----------' \ / \-----------' | | 1| 0| \ / |1 |0 | | .->Error \ / Error<-. | | | | 0| \ / |1 | | | | .-----------. \ / .-----------. | | | 1 | | | \ / | | | 0 | | .---| 7 | \ | 10 |---. | | | | | | / \ | | | | | | | | \-----------' / \ \-----------' | | | | | | A / \ A | | | | | | | / \ | | | | | | | 1| / \ |0 | | | | | | .-----------. 0 / \ 1 .-----------. | | | | | | | |---' \---| | | | | | | | | 5 | | 8 | | | | | | | | | | | | | | | | | \-----------' \-----------' | | | | | | A A | | | | | | | | | | | | | | 1| |0 | | | | | | .-----------. .-----------. | | | | | ->| | | |<- | | | | | 1 | | 2 | | | | | | B-W |<-----. .----->| W-B | | | | | \-----------' | | \-----------' | | | | | A | | A | | | | | | | |0 1| | | | | | | \-----' | | \-----' | | | | 0 .-----------. 0 | | | | | | | | | | | 4 | | | | | RUN | | RUN | | | | .-----. \-----------' .-----. | | | | | | A A | | | | | | | V | | V | | | | | .-----------. 1 | | 1 .-----------. | | | \-->| |------' 0 \------| |<--' | | | 3 |<--------------------| 0 | | \---->| B-B |-------------------->| W-W |<----' \-----------' 0 \-----------' Figure 3. DFSA Model of Encoding
Dacom 450/500 Facsimile Data Transcoding PAGE 62.2. Formatting Considerations Data are encoded for transmission by the Dacom 450 in 585-bitframes, consisting of a 24-bit synchronization code, 37-bit leader,512-bit information area and l2-bit checksum. There are two kinds offrames distinguished by leader format, one for setup or initializationand the other for the data itself. Serial binary image data are placedin the data area of succeeding data frames. The header of each frame is shown in Figure 4. The various fieldsare defined in Table 1 following the Figure. +-----------+--------+-------------------+----------+ | Sync Code | Leader | Data | CRC Code | +-----------+--------+-------------------+----------+ 24 / 37 \ 512 12 .-------' \----------------------. / \ +-------+-------+-------+-------+-------+-------+ | Flags | Count | X Pos | Black | White | State | +-------+-------+-------+-------+-------+-------+ | 7 \ 10 12 3 3 2 | \--------------------------. | \ +-----+-----+------+-----+-------+-----+ | Seq | RUN | COFB | RPT | Spare | SUB | +-----+-----+------+-----+-------+-----+ 2 1 1 1 1 1 Figure 4. Frame Format
Dacom 450/500 Facsimile Data Transcoding PAGE 7 Table 1. Header Field DefinitionsField Width Function Setup Data (bits) Block Block-----------------------------------------------------Sync Code 24 Synchronization 30474730 (octal)Seq 2 Sequence number 00 00,01,10,11RUN 1 Initialize-start 0 1COFB 1 Unknown 0 0RPT 1 Unknown 1 0Spare 1 Unknown 0 0SUB 1 Indicates setup frame 1 0Count 10 Number of bits in data All 1's field (0 - 512)X Pos 12 Current position on All 1's scan line (0 - 1725)Black 3 Current black field All 1's length (2 - 7)White 3 Current white field All 1's length (2 - 7)State 2 Current state (0 - 3) All 1'sData 512 Data (0 - 512 bits)CRC Code 12 CRC checksum. Uses polynomial x**12 + x**8 + x**7 + x**5 + x**3 + 1
Dacom 450/500 Facsimile Data Transcoding PAGE 8 Setup frames have additional information in the data field; thevarious fields and their functions are described in Table 2. Table 2. Field Definitions for Setup Frame.Field Width Function--------------------------------Start bit 1 Always zeroSpeed bit 1 Set if express modeDetail bit 1 Set if detail mode (speed and detail bits both zero for quality mode)14 inch 1 Set if 14-inch paper5 inch 1 Set if 5-inch inch paper (14-inch and 5-inch inch paper bits both zero for 11-inch paper)Paper present 1 Set if paper present in scannerSpare 5 Can have any valueMulti-page 1 Set if multi-page mode 20 All 0's 480 Alternate 1's and 0's The tailing setup frames differ from the leading setup frames onlyin bits which indicate whether the system is operating in single ormultiple page mode and whether paper is present in the scanner. All n-bit numeric fields (except Seq) are transmitted by the Dacom450 machineleast-significant-bit (LSB) first (i.e. Count, X Pos,Black, White, State, CRC, and run length words in the data field).All other fields are transmitted left-most bit first. There are a few important points to be considered in regard to theheader of a data frame. The header contains enough information aboutthe state of the decoding algorithm to be able to re-establish correctdecoding in the event of loss or mutilation of a data frame. Thedecoding algorithm resets its state variables to those in the headereach time it begins decoding a new data frame. One of the mostdifficult problems encountered while constructing the decoding algorithmwas the correct synchronization of the algorithm as it proceeds acrossthe frame boundary with respect to the header information. In order forsynchronization to be maintained, the operation of the algorithm must
Dacom 450/500 Facsimile Data Transcoding PAGE 9follow exactly that described in the previous section. This requirement for every data frame to be self-synchronizing,leads to a few subtleties in the encoding algorithm which seem quitenatural, but were not very obvious in the beginning.1. Transition bits(s) labeling the arcs on the state transition diagram in Figure 2 are not broken across frames. Similarly, individual run-length words are not broken across frames.2. If a frame ends with a transition, theheader of the next frame contains the state to which the transition is made.3. If a frame ends with a transition out of state0 or 3, then the transition bit (0 or 1) is inserted at the end of the current frame (not at the beginning of the next frame).4. The field lengths for black and white runsin the header include changes that may have been caused at the end of the previous frame.5. If a frame begins with a whiteor black run, then this run is treated (for purpose of decreasing its field length) as if it were the beginning of a new run, since there is no information in the header to indicate otherwise. The decoding algorithm is initialized at the first data framereceived after the sequence of setup frames at the beginning oftransmission. The first data frame has a count of zero, indicating nodata bits are in the frame. The second data frame begins the actualdocument; however, its X position appears to be irrelevant. Instead, weassume the initial X position at this time is one pel to the left of theright margin (-l mod l726). With these assumptions succeeding Xpositions of the algorithm and the frame headers agree.2.3. The Decoding Program The decoding algorithm described above has been implemented in thePDP-11 MACRO-11 assembly language for the RT-11 operating system. Thisprogram contains extensive features for selectively dumping frames andtracing the operation of the algorithm. It is designed to operate on afile containing the raw data generated by the machine and does notdepend upon any prior reformatting of the data. However, it willoperate also on files in the so-called UCL format [4], which has beenadopted as the standard for use in the Internet Program. The existingDCNET supporting software for the Dacom 450 uses the UCL format andoperates normally to copy data directly between the machine and thefile, with decoding operations done at a later time. However, there isno intrinsic factor, except processing-rate limitations, why input datacould not be decoded directly from the machine. In operation, the program scans the input data one bit at a timeand searches for the synchronization pattern. Note that all dataprocessed are inverted from the natural interface conventions. When a
Dacom 450/500 Facsimile Data Transcoding PAGE 10synchronization pattern is found, the header and data portions areextracted and the various state variable checked and reset, ifnecessary. Checksum verification is performed according to thepolynomial 1 + x**3 + x**5 + x**7 + x**8 + x**12. In the case of setupframes the format (detail, quality, express), page length (14, 8-l/2,5-l/4) and multiple-page indicators are extracted from the data area.Finally, under control of specified options, the header and dataportions of the frame are printed with appropriate headings. The decoding algorithm itself is called for each data frame. Itproduces an output consisting of a sequence of run-length pairs whichcan be used to form bit maps and other representations of the data.Optionally, a printed trace of the operations performed by the algorithmcan be produced.2.4. The Encoding Program The encoding algorithm has been implemented in the PDP-11 MACRO-11assembly language for the RT-11 operating system. The program acceptsfacsimile data in 16-bit run-length format or bit-map format. The inputdata would normally be in a file, possibly obtained by translating someother representation (e.g., T.4 format) to run-length or bit-map format.The program produces an output consisting of data compressed in Dacom450 format and packed in 585-bit framesalong with the correspondingheader and checksum information. The encoding program needs to be careful about how to break dataacross frames and how many bits of data to insert in each frame. Therules mentioned insection 2.2. help to solve the first problem. Thesecond problem is a little less understood. The problem arises becausedata bits are required by the printing mechanism at a constant rate, butsuccessive frames transmitted at the line rate can contain differentamounts of decoded information, leading to buffer overrun in extremecases. In order to compensate for the rate mismatch, it has been foundsufficient to control the size of the data portion of the frameaccording to a simple set of empirical rules which produce results quitesimilar to the scanner iteslf. According to these rules, a frame is"full" when it contains more than 500 bits of data or when the datarepresents more than 4800*X pels (or columns) of information,where X = 2 for transmission rate 2.4 kbs, X = 1 for transmission rate 4.8 kbs, X = 1/2 for transmission rate 9.6 kbs.2.5. Dacom 450 File Formats Dacom 450 facsimile data is ordinarily stored as an RT-11 file inthe so-called UCL format [4]. In this format, each 585-bit frame isstored in a 76-byte record. The first byte specifies the length of therecord, the second specifies a command and the remaining 72 bytescontain the 585 bits of the original Dacom 450 frame zero-filled at the
Dacom 450/500 Facsimile Data Transcoding PAGE 11end. The command byte is coded as follows:a. 56 (70 octal): The data field contains a setup frame (the first record of the file). The length byte is 76 (114 octal).b. 57 (71 octal): The data field contains a data frame (the remaining records in the file except the last one). The length byte is 76 (114 octal).c. 58 (72 octal): End of file (the last frame of the file). There is no data field and the length byte is 2.2.6. Run-Length and Bit-Map File Formats The decode program produces 16-bit run length words as its output.Each run is encoded in a 16-bit word, with white in positive and blackin negative two's complement values. A zero word terminates each line,with the trailing white run suppressed if present. An all-white line isencoded as a single run of length one followed by a zero word. The fileis terminated by a line of length zero, that is, a single zero word. Bit-map files consist of a four-byte header followed by the data.The header consists of two 2-byte quantities, the first of whichrepresents the number of pels in a line and the second the number oflines in the page. Each scanning line of data is represented in anintegral number of bytes, the last byte of a line zero-filled ifnecessary.3. Dacom 500 Data Compression Principles The Dacom 500 machines are high-speed versions of the Dacom 450machines and operate in the 50-Kbps range using the T.4 compressionalgorithm. This algorithm, described in the [5], is a one-dimensionalone, rather than the two-dimensional one used in the Dacom 450 anddescribed in previous sections. Since this algorithm is well known andthe subject of an international standard, it will not be furtherdiscussed here.3.1. Dacom 500 Decoding Algorithm The decoding program has been implemented in the PDP-11 MACRO-11assembly language for the DCNET and RT-11 operating systems. Itoperates on a file containing facsimile data encoded using the T.4algorithm and produces a file in bit-map format. The decoding program scans the input data bit-by-bit and recognizessequences of bits which form valid run-length codes (see the tables in[5]). The table of Huffman codes can be represented as a binary treewith the values of the run lengths (e.g. 1, 2, 64, 1728, etc.) at theterminal nodes and each branch labeled 0 or 1. The code for any runlength then is the sequence of branch labels on the path from the rootto the terminal node representing this length.
Dacom 450/500 Facsimile Data Transcoding PAGE 12 The tables for black and white run-length codes are stored asseparate binary trees in the decoding program. The decoding algorithmstarts by initializing an accumulator at zero. It then begins at theroot of the corresponding tree and traverses the tree as it consumesbits one-by-one from the input. When a terminal node is reached, thevalue stored at that node is added to the accumulator. If a make-upnode is reached, the value at that node is added to the accumulator andthe search is resumed with the same tree to obtain the terminatingvalue; otherwise, the accumulator represents the current run length andthe search resumes with the alternate tree.3.2. Dacom 500 Encoding Program The encoding program is also implemented in the PDP-11 MACRO-11assembly language for the DCNET and RT-11 operating systems. It scansthe bit-map input and encodes each run of black or white pels by asimple table lookup of the Huffman codes. It operates on a filecontaining facsimile data in bit-map format and produces a file in T.4format. The T.4 specifications [5] require a minimum transmission timeper scan line of 4.3 milliseconds, which at 50-Kbps corresponds to 242bits (DATA bits plus any required FILL bits plus the EOL bits equal 242bits minimum).3.3. Dacom 500 File Formats The file consists of a number of 512-byte blocks, the first ofwhich is the header. The header contains a list of two-byte entries,the first of which contains the number of pages and the remaining thelengths (in blocks) of each page in turn. The remaining blocks of thefile contain the data for each page in T.4 format. The data for eachpage is preceded by a page-setup command and succeeded by apage-end-of-record command, as transmitted by the Dacom 500. The formatof both commands consists of the 12-bit T.4 EOL code (000000000001)repeated six times and followed by a special 4-bit code word alsorepeated six times. The special code word consists of bits B1 throughB4 as defined below.B1: VERTICAL RESOLUTION 0 = 7.7 lines per millimeter 1 = future option, not implementedB2: OUTPUT PAPER LENGTH 0 = short length (Letter size) 1 = long length (Legal size)B3: DOCUMENT IN SCANNER 0 = no document present (end of page) 1 = document present (beginning of page)B4: ODD PARITY OVER B1-B4
Dacom 450/500 Facsimile Data Transcoding PAGE 133.4. Comparison of Facsimile Formats and Transcoding Speeds Four different file formats are presently used in our system forfacsimile data storage: Dacom 450, Dacom 500 (T.4), 16-bit run-lengthand bit-map. The sizes of typical files (in megabits) in these formatsare shown below for comparison. File Dacom Dacom 16-bit 450 500 run-length ---------------------------------- PNGUIN 0.22 0.5 0.27 INTELP 0.62 0.77 3.3 PANDA 1.02 2.03 6.41The file called PNGUIN is a block drawing of dancing penguins andrepresents a "small" file. The file called INTELP is a compositeINTELPOST test image with text and graphics and represents a "medium"file. Finally, the file called PANDA is a half-tone newspaperphotograph of a giant panda and represents a "monster" file (this filewas recorded on the Dacom 450 in quality mode and is therefore abouthalf the size it would be in detail mode). The size of the bit-map filefor all these images is 3.8 megabits. Figure 5 shows the file sizes (in 512-byte blocks) and transcodingtimes (in seconds) for the INTELPOST test page. The file sizes areindicated on the boxes, while the transcoding times are indicated on thearrows. All times were obtained on the LSI-11/23 processor. 193 925 +-----------+ 95 +-----------+ | |----------->| | | T.4 | | Bit-map | | |<-----------| | +-----------+ 165 +-----------+ A | 60 | | .----------------------' |110 | | | V +-----------+ 89 +-----------+ | |----------->| | |Run-length | | Dacom 450 | | |<-----------| | +-----------+ 153 +-----------+ 413 155 Figure 5. File Sizes and Transcoding Times
Dacom 450/500 Facsimile Data Transcoding PAGE 144. References1. Weber, D.R.An adaptive run-length encoding algorithm. ICC-75, IEEE, San Francisco, California, June 1975.2. Palmer, L.C.Final Report, COMSAT Participation in the DARPA Packet Satellite Internetworking and Speech Applications Program. COMSAT Laboratories, July 1980.3. Katz, A.Decoding Facsimile Data from the Rapicom 450. DARPA Network Working Group ReportRFC-798, USC/Information Sciences Institute, September 1981.4. Postel, J.Rapicom 450 Facsimile File Formats. DARPA Network Working Group ReportRFC-769, USC/Information Sciences Institute, September 1980.5. Draft Recommendation T.4 - Standardization of Group 3 Facsimilefor Document Transmission. CCITT Study Group XIV Contribution #25-E, December 1977. (Also inRFC-804).
[8]ページ先頭