SPECIFICATION
A method of and apparatus for specifying and forming characters
This invention relates to a method of and apparatus for character processing including specifying and forming characters and in particular to a method of and apparatus for specifying and forming language characters such as Chinese, Japanese, Korean and other characters.
A standard Chinese typewriter is normally provided with a font of some 2,000 most used characters and a standby font of a further 2,000 lesser used characters and, in areas where a high degree of literacy or technical application is necessary, there may be still a further font of some 2,000 characters.
In type setting the fonts available to the type setter can well contain of the order of some 8,000 to 10,000 characters.
When typewriting or type setting, the correct character has to be selected, imprinted and returned to the font, which makes typewriting and type setting an extremely tedious and complicated procedure.
There have been many proposals for providing a simple way of specifying and forming Chinese characters but none of these have, as yet, been practical. Some have been developed by computer companies and use a specific number of digits per character, which digits are based on an arbitrary coding system, such as the four digit telegraphic code; others use a predetermined number of strokes which give simplified characters but which can be ambiguous; others transliterate Chinese characters into Roman letters and also provide a co-ordinated arrangement to locate the particular portions of a character, which method overcomes ambiguity but is extremely complicated. There is another system, a Chinese system, which defines each character by a predetermined set of digits, normally six, which draws its information from a few predetermined locations of the character and the digits are determined by the features found at these locations. This system still has ambiguities.
A still further arrangement stores complete material in chips but which, apparently, is designed for the storage of documents.
It is an object of the invention to provide a method of and apparatus for character processing including specifying and forming characters and in particular to provide means whereby a character can be transliterated at sight into an alphanumeric code which code can be used to provide the character in some other manner.
The invention includes a method of character processing by specifying and forming a character from a limited number of components and which is formed by forming the components in a particular order which method comprises specifying the character by allotting each element type of each component a unique alphanumeric code, forming a code train for the particular character by ordering the codes in a predetermined order of the components and using the train to effect identification of the characters, providing means to display the characters so identified and means, which on actuation, form the character.
The ordering of the codes may be as the components and specifically the elements thereof are traditionally ordered in writing.
It is preferred that the method includes means whereby ambiguities can be determined and resolved.
Each character can be formed of one or more strokes, each of which may comprise a number of components, the components each comprising elements which, in this particular form, are a dot (a relatively short or small component, hereinafter called a dot) or a vertical, horizontal or diagonal line, the diagonal being in either sense, located in a particular position in a grid. We have found that the use of only this small number of components leads to minimal ambiguities, as will be described hereafter, and by allotting one of the elements to each part of each component the operator can rapidly and readily specify the character. The dot element refers to any short stroke, which may be a dot, comma-like mark or serif, which does not fit clearly into any of the other categories of elements.
Previously it has been believed that to specify characters in this way without major ambiguity it would be essential to use a very much larger number of elements.
The invention also includes, using the method previously specified, an apparatus for printing characters in which each component is identified as element(s) (as hereinbefore described); each component comprising the character, in an ordinate/abscissa position within a grid, is selected and printed, either mechanically, electrostatically, optically or photographically, the apparatus then moving to an abscissa/ordinate and repeating the process until the character is completed.
In this specification it will be appreciated that the printer does not 'read' the character so in practice the abscissa/ordinate can readily be reversed and the specification shall be so read.
The components are made to start at points which bear a constant relationship to a grid. This relationship need not be identical for all components. The grid need not necessarily be rectangular or square, although a square grid may be preferred.
The number of grid points and the variety of different components required to produce all possible characters in a given font is determined by trial and the complexity of character font required. We have found a 10 x 10 grid sufficient to provide satisfactorily readable Chinese characters.
The minimum number of components required is less than 5n (where n is the number of grid points along one side of the square into which the character is intended to fit). By using the basic components, all strokes, and therefore characters, can be approximated by combinations of these. A more elaborate choice of font was found to require less than 150 components and positions.
The printing of a character occurs by a sequence of impressions of the required components. The abscissa may be provided by successive movements of the printer carriage or component head and the ordinate, by movements of the platen or of the component head.
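The sequence of impressions described above can be illustrated with a minimal sketch. It assumes a 10 x 10 grid, a hypothetical component format of (element, abscissa, ordinate, length), and assumed directions for the two diagonals; none of these details are fixed by the specification, and the sample character is invented for illustration.

```python
GRID = 10  # assumed grid size (the specification reports 10 x 10 as sufficient)

# Unit step (dx, dy) for each element type; y grows downward.
# The diagonal directions are assumptions made for this sketch.
STEPS = {
    "dot": (0, 0),
    "horizontal": (1, 0),
    "vertical": (0, 1),
    "positive_diagonal": (1, -1),
    "negative_diagonal": (1, 1),
}


def render(components, size=GRID):
    """Impress each (element, x, y, length) component onto a blank grid.

    Components are applied in the order given, mimicking successive
    impressions; cells outside the grid are ignored.
    """
    grid = [[" "] * size for _ in range(size)]
    for element, x, y, length in components:
        dx, dy = STEPS[element]
        for k in range(length):
            cx, cy = x + k * dx, y + k * dy
            if 0 <= cx < size and 0 <= cy < size:
                grid[cy][cx] = "#"
    return ["".join(row) for row in grid]


# Hypothetical cross-shaped character: one horizontal and one vertical component.
CROSS = [("horizontal", 1, 4, 8), ("vertical", 4, 1, 8)]

if __name__ == "__main__":
    for row in render(CROSS):
        print(row)
```

Storing the start position with each component mirrors the idea that the carriage or platen moves to an abscissa/ordinate before each impression.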
Alternatively, depending on the type of printing being used, each element may be duplicated but located to be printed at different positions on the ordinate so the movement of the platen or component head can be minimised.
The elements may be located on a golf-ball, on keys, on cylinders, on daisy wheels, on a dot matrix or an x-y plotter, or on a transparent backing for use in photocomposition. These devices will also include punctuation marks, digits and other symbols. In Japanese printers they will also include katakana and hiragana symbols. In Korean they will include the hangul alphabet.
In order to avoid overprinting at points where components cross or overlap, a single-use ribbon may be used, which ribbon remains stationary during the printing of all components of a single character. The ribbon is then advanced after the completion of each character.
The sequence of elements required and their abscissas (ordinates) i.e. components can be composed by an operator or stored in some memory device (mechanical, electronic, including any type of computer memory).
More than one golf ball, set of keys, cylinders, daisy wheels or other devices may be used either to increase the types of available components, or to increase the rate of printing by simultaneous printing of different parts of the text or of different portions of characters. More than one pass may be desirable for printing the characters in order to improve their quality or for any other reason.
Once a sufficient number of components is defined additional characters can be formed at will and included in the memory of the system.
Each stroke can be a single element component, as is normally understood, or can be comprised of multiple element components, either along the abscissa or the ordinate, or both.
The particular keyboard can be varied depending on the language concerned. For example, to provide the elements only five keys are required but on the Chinese keyboard we may provide a full set of numerals with the numbers from 6 to 9 and 0 being useable for auxiliary purposes and permitting up to approximately thirty keys for characters, components or instructions.
As indicated, we prefer to use alphanumeric codes to identify the elements and provided the number of elements is sufficiently small the codes may satisfactorily be digits. In the preferred form there are only five elements.
As will be further described hereinafter, we may adopt a convention of using a predetermined maximum number of elements or component codes to define a character. Under such a system there can be ambiguities of two types if there are two or more characters equally described by this maximum number of elements. In the first type the two characters are fully specified by the same code and have a total number of elements equal to or less than the number used in the code. In the second type, although the codes of the two characters are different they are identical to the number of elements selected.
One way of resolving ambiguity may comprise both means whereby when a number of alphanumeric codes sufficient to overcome ambiguities is reached an indication to this effect is given and wherein when all of the presented codes have been considered and ambiguity is still present then the possible characters are formed either at one time or sequentially and a selection can be made by an operator.
The method is particularly suitable for the formation of Chinese (and Japanese, Korean and other) characters in which there can be considered to be five elements, a dot and horizontal, vertical, positive diagonal and negative diagonal strokes and these can be provided with the analogue digits 1 to 5 respectively. The alphanumeric code may then be a digital train suitable to be introduced, by a keyboard, to a computer which has stored therein completed characters which can be displayed on a V.D.U., or which can cause the display or any selected part thereof to be printed or otherwise formed.
Whilst the method of the invention is particularly useful for the formation of characters of the types indicated hereinbefore, it is also applicable to other alphabetic or symbolic representations. For example, the method can be used to form Latin letters from a restricted keyboard. For example, a printing calculator or the like using a thermal printing system can be used to print letters rather than numbers by the changing of a mode and applying a numeric code as an analogue of each letter of the alphabet. All letters of the Latin alphabet can be defined by, say, a three number code with little ambiguity which can be satisfied in the same manner as is described herein for other characters. As only twenty-six letters are required, any ambiguity can be avoided simply by applying a predetermined convention to the code.
The invention also includes apparatus for character formation comprising an input means which can receive alphanumeric codes corresponding to the order of formation of components of the character, processing means whereby the alphanumeric codes identify, with or without ambiguity, the particular characters or character, means whereby the characters or character can be displayed and if ambiguous characters are displayed means whereby the required character can be selected and output means whereby the character can be retrieved and formed.
The input means may be a typewriter, a standard or a specially formed keyboard, the processing means may be a computer, mini computer or a microprocessor, the display means may be a V.D.U. and the output means may be a mechanical, electrical, optical, photographic or thermal device.
The processing means may comprise means to reject further members of the alphanumeric train as soon as there is no ambiguity. It may also comprise means for various forms of character processing so that arrangement of the various characters can be widely manipulated.
In this specification we shall basically refer to the formation of Chinese characters although the invention is not to be considered specifically for this. It is completely applicable to Japanese and Korean text and may well also be applied to other languages using characters, an alphabet or symbolic representations which are formed in a particular manner. Comment relating to the application of the invention to Japanese and Korean text will be made hereafter.
In forming Chinese characters, hereinafter for ease simply stated to be characters, the characters are commonly formed by writing or painting the strokes of the character in a particular order. It can also be shown that any character can be considered to be made up of elements of only five types, there being dots and horizontal, vertical, positive diagonal and negative diagonal strokes in specific positions. We have adopted as alphanumeric codes of these elements the digits 1 to 5 respectively and thus by applying the required digit to each element of the character in the traditional order that it is made when forming the character we obtain a code train which represents the formation of a particular character. It will be appreciated that this train can be very short in that it can have as few as one or two digits or, for more complicated characters, can have thirty or more digits. We have found, however, that when the 2,000 most frequently used characters are considered the number of ambiguities formed by defining each character in this way, that is, the percentage of times more than one character is defined by the same code, is of the order of only some 4% to 5%. In these cases the characters are different and employ the same elements written in the same order, but the placing of the elements varies.
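As a minimal sketch of the coding just described, the following maps the five element types to the digits 1 to 5 and joins them in writing order. The element names used as dictionary keys and the function name code_train are labels chosen for this illustration only; the specification fixes only the digit assignment.

```python
# Digit assignment stated in the specification:
# dot, horizontal, vertical, positive diagonal, negative diagonal -> 1 to 5.
# The string labels themselves are assumptions made for this sketch.
ELEMENT_CODES = {
    "dot": "1",
    "horizontal": "2",
    "vertical": "3",
    "positive_diagonal": "4",
    "negative_diagonal": "5",
}


def code_train(elements):
    """Return the digit train for elements listed in traditional writing order."""
    return "".join(ELEMENT_CODES[element] for element in elements)


if __name__ == "__main__":
    # Hypothetical character written as a horizontal stroke, then a vertical one.
    print(code_train(["horizontal", "vertical"]))
```

Because the train depends only on element type and order, two characters that differ solely in where the elements are placed receive the same train, which is the source of the ambiguity discussed in the text.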
We have found that by restricting the alphanumeric codes to a predetermined number of digits, say seven, although the number of ambiguities increase the total percentage is still low, and as will be described hereinafter, we can provide means to eliminate these ambiguities.
We have applied this to a method of forming characters. The method makes use of an input, a computer, a mini computer or microprocessor, a V.D.U. and an output which may well be associated with the V.D.U. but preferably is adapted to provide a hard copy.
The computer is programmed with the required number of characters, each of which can be addressed by its corresponding alphanumeric code. The input device, which may be a typewriter or teletype or which may be a specially designed unit, may have only the five digit keys or may, as we shall discuss later, have additional keys. An operator can then, by viewing the character, determine the required alphanumeric code, to the required number of digits, and feed this through the input device to the correct address in the computer memory and the character is then available. Preferably the character is immediately brought up on the V.D.U. and the operator can ascertain whether or not the input was correct and the character shown is that which was required.
Should there be any ambiguity and this could be because of an inherent ambiguity by two characters being defined by the same code or an ambiguity introduced by restricting the total code length, then we can provide an indication of such ambiguity and means whereby it can be overcome.
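The two sources of ambiguity can be distinguished mechanically. The sketch below groups characters by their code truncated to a maximum length (seven is the figure the specification suggests) and labels each clash as "inherent" when the full codes are identical or "truncation" when they differ only beyond the cut-off. The table SAMPLE, its labels A to E and the function name are hypothetical.

```python
from collections import defaultdict

MAX_DIGITS = 7  # "say seven" in the specification; any limit could be used


def find_ambiguities(full_codes, max_digits=MAX_DIGITS):
    """Group characters by truncated code and label each clash.

    full_codes maps a character label to its complete digit train.
    Returns {truncated_code: (kind, labels)} for clashing groups only.
    """
    groups = defaultdict(list)
    for label, code in full_codes.items():
        groups[code[:max_digits]].append(label)

    clashes = {}
    for key, labels in groups.items():
        if len(labels) < 2:
            continue  # uniquely identified, nothing to resolve
        distinct_full_codes = {full_codes[label] for label in labels}
        kind = "inherent" if len(distinct_full_codes) == 1 else "truncation"
        clashes[key] = (kind, sorted(labels))
    return clashes


# Hypothetical code table, invented for illustration.
SAMPLE = {
    "A": "232",
    "B": "232",        # same full code as A: inherent ambiguity
    "C": "2233215",
    "D": "22332154",   # differs from C only in the eighth digit
    "E": "23",
}

if __name__ == "__main__":
    for key, (kind, labels) in find_ambiguities(SAMPLE).items():
        print(key, kind, labels)
```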
A first way is for each of the possible characters to be displayed and to provide means to select the required character from the display. The second is to display the most likely character, and we have found that normally, statistically, the most likely character will be correct at least 75% of the time and should this character not be correct provide means to select the next most likely character, and so on. In this case the selection may be left to an edit mode effected after the input is completed and the ambiguous characters can be tagged for consideration.
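The second approach, offering the most likely character first and stepping to the next on rejection, can be sketched as a frequency-ordered lookup. The three sample characters are each conventionally written horizontal, vertical, horizontal, so all receive the train 232; the frequency counts are invented for illustration and the function names are hypothetical.

```python
def ranked_candidates(code, table, frequency):
    """Return the characters whose code equals `code`, most frequent first."""
    matches = [ch for ch, ch_code in table.items() if ch_code == code]
    # Sort by descending frequency; ties are broken by the character itself.
    return sorted(matches, key=lambda ch: (-frequency.get(ch, 0), ch))


def select(code, table, frequency, rejections=0):
    """Return the candidate shown after `rejections` operator rejections, or None."""
    ranked = ranked_candidates(code, table, frequency)
    return ranked[rejections] if rejections < len(ranked) else None


# Illustrative table: three characters sharing the train 232.
TABLE = {"土": "232", "士": "232", "工": "232", "十": "23"}

# Invented usage counts, not data from the specification.
FREQUENCY = {"工": 900, "土": 400, "士": 100, "十": 700}

if __name__ == "__main__":
    print(ranked_candidates("232", TABLE, FREQUENCY))
    print(select("232", TABLE, FREQUENCY, rejections=1))
```

Deferring the choice to an edit mode would amount to recording the code alongside the first candidate and calling select again later with a higher rejection count.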
Alternatively, as the number of digits necessary for non-ambiguous recall of a particular character is very often less than the total number of digits necessary to fully specify the character, we can arrange for the computer to provide an indication as soon as a condition of non-ambiguity is indicated, thus reducing the time taken by the operator in introducing the total number of digits necessary to fully determine the character.
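The early indication of non-ambiguity can be sketched as a filter applied digit by digit: the candidate set is narrowed after each keystroke, and the function stops as soon as exactly one character remains. The table and its labels X, Y and Z are hypothetical, and the sketch assumes no stored code is a prefix of another.

```python
def feed_digits(digits, table):
    """Consume digits one at a time and stop once one character remains.

    Returns (character, digits_used). The character is None if the input
    matches nothing or is still ambiguous when the digits run out.
    """
    candidates = dict(table)
    for i, digit in enumerate(digits):
        # Keep only the characters whose code has this digit at position i.
        candidates = {
            ch: code
            for ch, code in candidates.items()
            if len(code) > i and code[i] == digit
        }
        if len(candidates) == 1:
            return next(iter(candidates)), i + 1  # non-ambiguity reached
        if not candidates:
            return None, i + 1  # no stored character fits this input
    return None, len(digits)


# Hypothetical code table, invented for illustration.
TABLE = {"X": "2232", "Y": "2315", "Z": "3221"}

if __name__ == "__main__":
    print(feed_digits("2232", TABLE))
```

In this invented table the second digit already separates X from Y, so the remaining two digits of the four-digit train never need to be keyed.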
We have found that the forming of the characters of the invention can be realised by using a grid which is 10 units square, although this is not necessarily essential and to provide the strokes necessary we can use as few as 40 individual components. We have found that a standard daisy wheel or, more particularly, a slightly modified daisy wheel can operate perfectly satisfactorily, particularly for commercial Chinese, providing the necessary elements, numerals and punctuation, but it is also readily possible to use a machine with two daisy wheels to give an expanded range or high speed and, similarly, a machine using two or more golf balls can provide a full range of characters in that components from the various positions can be provided along each ordinate (abscissa). Further, if the additional components are available it is possible to provide strokes which more nearly follow brush marks in that instead of being of a fixed width there can be variation of width along the particular stroke. The control of the forming device is by the computer memory relating to each character.
The apparatus used can be so arranged that where two components overlap the same portion of tape, particularly where this is of the carbon face type, they can be impressed so that at the overlap position the portion of the tape which has already been used, and has thus lost its coating, is above the position of previous use and the intensity of the strike does not vary across the character as a whole. It is also possible to provide the strokes to make the character, or the complete characters, on a photographic card or fiche with a carrier which is arranged to locate the transparency to a required position before a lens or to vary the lens position relative to the transparency.
Also, if required, particular strokes or part strokes can be provided as well as the elements on the printing device so that speed can be achieved and, further, the strokes or part strokes can be angled to the ordinate/abscissa and can vary in thickness.
As can be understood, the output may be of any one of a number of forms, either mechanical, electrical, thermal or optical.
Purely for example, beyond the specific embodiments previously discussed, the elements could be formed on the arms of a large number of daisy wheels which could be addressed by the computer so that the required daisy wheel would be located in position and the required character imprinted on paper. Such an arrangement would be satisfactory for standard typing. Alternatively, a multiple ball typewriter could be provided, photographic or xerographic copies could be either directly exposed or charged or could work from the V.D.U. image, jet-ink, dot matrix, electron beam, electrostatic or thermal printers could also be readily used and means could be provided to remove particular characters from a type font.
We can, if we require, incorporate supplementary keys on the input which relate to very frequently occurring component combinations. In this way, providing the operator recognises a particular combination, several digits of the train can be inputted on the stroke of only one key, although if the operator does not immediately recognise the combination then the component elements can still be initiated individually. Also we can provide certain keys which relate to characters which are very often used.
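A supplementary key can be modelled as a macro that expands to several digits of the train. In the sketch below the key labels A and B and the combinations assigned to them are hypothetical; the specification does not say which combinations would be given keys.

```python
ELEMENT_KEYS = "12345"  # the five element digits

# Hypothetical supplementary keys mapped to frequently occurring combinations.
MACRO_KEYS = {"A": "232", "B": "3223"}


def expand_keys(keystrokes, macros=MACRO_KEYS):
    """Expand a sequence of key presses into the full digit train."""
    train = []
    for key in keystrokes:
        if key in ELEMENT_KEYS:
            train.append(key)          # ordinary element key
        elif key in macros:
            train.append(macros[key])  # one stroke enters several digits
        else:
            raise ValueError(f"unknown key: {key!r}")
    return "".join(train)


if __name__ == "__main__":
    print(expand_keys("2A1"))
```

An operator who does not recognise a combination can still key its digits individually, and both routes yield the same train.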
Also by use of combinations of alphanumeric codes frequently occurring phrases may be rapidly inputted into the device. For example, the first character or component could be entered together with a special symbol and part or all of the next relevant component to indicate the complete phrase.
The invention is also applicable to telex transmission of messages constituted of characters. In this case, of course, the operator of the sending telex digitises the characters and these are transmitted to the receiving telex where they can either be received normally, or, preferably, used as a direct input to a computer which can provide an output in characters.
It will be appreciated that where the system is arranged for different languages the necessary number of digits may need to be varied and where being used for Japanese the keyboard could include the katakana and hiragana and the Korean keyboard could include the hangul alphabet.
Japanese poses a more complex situation than does Chinese. To fully meet the Japanese requirements it is necessary to provide
1. Katakana
2. Hiragana
3. Kanji
4. Punctuation
5. Numerals
6. Abbreviations of Kanji and Kana
7. Latin Alphabet
It is possible to provide each of these requirements by using each key for a number of functions, as is presently the case with word and data processors, by providing one or more additional keys, or one or more of the keys of the keyboard to effect a change in mode.
In this way it is relatively simple to provide a keyboard which is identical for both languages, with the operator using Chinese not generally having the necessity to enter the specialised modes.
The Korean hangul alphabet may also necessitate different modes and also uses different sized symbols which may be provided by the operation of selected keys or by computer controlled selection of different size symbols and their composition into the traditionally shaped syllables.
The system of the invention is also applicable to computer processing of mixed characters and numeric data and can be applied to both input and output operations which have to do with such data.
The apparatus can also be used as in a word processor to manipulate the various characters and other text displayed and can also be provided with a statistical function, communications option, typesetting and computer programming in Chinese or Japanese, rather than English.
Other applications such as pattern recognition can be provided.