In computer data, a substitute character (␚) is a control character that is used to pad transmitted data in order to send it in blocks of fixed size, or to stand in place of a character that is recognized to be invalid, erroneous or unrepresentable on a given device. It is also used as an escape sequence in some programming languages.
In the ASCII character set, this character is encoded by the number 26 (1A hex). Standard keyboards transmit this code when the Ctrl and Z keys are pressed simultaneously (Ctrl+Z, often documented by convention as ^Z).[1] Unicode inherits this character from ASCII, but recommends that the replacement character (�, U+FFFD) be used instead to represent un-decodable inputs, when the output encoding is compatible with it.
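As an illustration, the relationship between Ctrl+Z and code 26 can be sketched in Python. The helper name `ctrl_code` is hypothetical, and the masking rule (keeping the low five bits of the letter's code) is the usual ASCII convention for control keys rather than something stated above. The second part shows Python's built-in `replace` error handler emitting U+FFFD for an undecodable byte.

```python
def ctrl_code(letter: str) -> int:
    """Return the ASCII control code conventionally produced by Ctrl+<letter>.

    Hypothetical helper: keeps only the low five bits of the letter's code.
    """
    return ord(letter.upper()) & 0x1F


# Ctrl+Z maps to code 26 (0x1A), the substitute character.
SUB = chr(ctrl_code("Z"))

# Decoding an invalid UTF-8 byte with errors="replace" yields U+FFFD,
# the Unicode replacement character, instead of raising an exception.
decoded = b"caf\xff".decode("utf-8", errors="replace")

print(ctrl_code("Z"), repr(SUB), repr(decoded))
```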
Historically, under PDP-6 monitor,[2] RT-11, VMS, and TOPS-10,[3] and in the early PC CP/M 1 and 2 operating systems (and derivatives like MP/M), it was necessary to explicitly mark the end of a file (EOF) because the native filesystem could not record the exact file size by itself; files were allocated in extents (records) of a fixed size, typically leaving some allocated but unused space at the end of each file.[4][5][6][7] This extra space was filled with 1A (hex) characters under CP/M. The extended CP/M filesystems used by CP/M 3 and higher (and derivatives like Concurrent CP/M, Concurrent DOS, and DOS Plus) did support byte-granular files,[8][9] so this was no longer a requirement, but it remained as a convention (especially for text files) in order to ensure backward compatibility.
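The padding convention can be sketched as follows. This is a minimal illustration, not a reproduction of any CP/M routine: the function names are hypothetical, and the 128-byte record size and the rule that an exact multiple of the record size receives no marker are assumptions taken from CP/M documentation.

```python
RECORD_SIZE = 128   # assumed CP/M record size in bytes
SUB = b"\x1a"       # substitute character, Ctrl+Z


def pad_to_records(text: bytes, record_size: int = RECORD_SIZE) -> bytes:
    """Fill the unused tail of the last record with SUB bytes."""
    remainder = len(text) % record_size
    if remainder == 0:
        # The data already ends on a record boundary, so no marker is added.
        return text
    return text + SUB * (record_size - remainder)


def strip_at_sub(data: bytes) -> bytes:
    """Return the bytes that precede the first SUB, or all of them if none."""
    end = data.find(SUB)
    return data if end == -1 else data[:end]


padded = pad_to_records(b"HELLO\r\n")
print(len(padded), strip_at_sub(padded))
```

Because the marker is found by scanning the content, this approach only works for files in which byte 1A cannot occur as data, which is why it was limited to text files.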
In CP/M, 86-DOS, MS-DOS, PC DOS, DR-DOS, and their various derivatives, the SUB character was also used to indicate the end of a character stream,[citation needed] and thereby used to terminate user input in an interactive command line window (and as such, often used to finish console input redirection, e.g. as instigated by the command COPY CON: TYPEDTXT.TXT).
While no longer technically required to indicate the end of a file, as of 2017, many text editors[which?] and programming languages still support this convention, or can be configured to insert this character at the end of a file when editing, or at least properly cope with it in text files.[citation needed] In such cases, it is often termed a "soft" EOF, as it does not necessarily represent the physical end of the file, but is more a marker indicating that "there is no useful data beyond this point". In reality, more data may exist beyond this character up to the actual end of the data in the file system, thus it can be used to hide file content when the file is entered at the console or opened in editors. Many file format standards (e.g. PNG or GIF) include the SUB character in their headers to perform precisely this function. Some modern text file formats (e.g. CSV-1203[10]) still recommend a trailing EOF character to be appended as the last character in the file. However, typing Control+Z does not embed an EOF character into a file in either DOS or Windows, nor do the APIs of those systems use the character to denote the actual end of a file.
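The PNG case can be checked directly. The sketch below spells out the eight-byte PNG signature as defined in the PNG specification and locates the SUB byte in it; the names `has_png_signature`, `sub_index` and `shown` are illustrative only.

```python
# The eight-byte PNG signature; byte 0x1A (SUB) sits at offset 6.
PNG_SIGNATURE = bytes([0x89, 0x50, 0x4E, 0x47, 0x0D, 0x0A, 0x1A, 0x0A])


def has_png_signature(data: bytes) -> bool:
    """Report whether the data starts with the PNG signature."""
    return data[:8] == PNG_SIGNATURE


# Position of the SUB byte inside the signature.
sub_index = PNG_SIGNATURE.index(0x1A)

# What a viewer that honours the soft EOF would display: everything before SUB.
shown = PNG_SIGNATURE[:sub_index]

print(sub_index, shown)
```

A tool that stops at the first SUB therefore prints only the short prefix containing the letters "PNG" and a line break, rather than the binary image data that follows.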
Some programming languages (e.g. Visual Basic) will not read past a "soft" EOF when using the built-in text file reading primitives (INPUT, LINE INPUT etc.),[citation needed] and alternate methods must be adopted, e.g. opening the file in binary mode or using the File System Object to progress beyond it.
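The binary-mode workaround can be illustrated in Python (used here instead of Visual Basic purely for illustration). The sketch writes a small temporary file containing a soft EOF, reads it back as raw bytes, and separates the part a soft-EOF-aware reader would show from the part beyond the marker. The function name `read_soft_eof` and the file name `demo.txt` are hypothetical.

```python
import os
import tempfile

SUB = b"\x1a"  # soft EOF marker


def read_soft_eof(path):
    """Read a file in binary mode and split it at the first SUB byte.

    Returns (bytes before the marker, bytes after the marker).
    """
    with open(path, "rb") as f:
        data = f.read()
    before, _marker, after = data.partition(SUB)
    return before, after


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "demo.txt")
    with open(path, "wb") as f:
        f.write(b"visible text\r\n" + SUB + b"data past the marker")
    visible, trailing = read_soft_eof(path)

print(visible, trailing)
```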
Character 26 was used to mark "End of file" even though ASCII calls this character Substitute, and has other characters to indicate "End of file". Character 28, which is called "File Separator", has also been used for similar purposes.
In Unix-like operating systems, this character is typically used in shells as a way for the user to suspend the currently executing interactive process.[11] The suspended process can then be resumed in foreground (interactive) mode, or be made to resume execution in background mode, or be terminated. When entered by a user at their computer terminal, the currently running foreground process is sent a "terminal stop" (SIGTSTP) signal, which generally causes the process to suspend its execution. The user can later continue the process execution by using the "foreground" command (fg) or the "background" command (bg).
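The signal "generally" suspends the process because, unlike SIGSTOP, SIGTSTP can be caught by the program. The following sketch, which assumes a Unix-like system and must run in the main thread, installs a handler and then sends SIGTSTP to the current process programmatically rather than from a terminal. The function name `catch_sigtstp` is hypothetical.

```python
import os
import signal


def catch_sigtstp():
    """Send SIGTSTP to this process and record that a handler caught it."""
    received = []

    def on_tstp(signum, frame):
        # Runs instead of the default action (stopping the process).
        received.append(signal.Signals(signum).name)

    previous = signal.signal(signal.SIGTSTP, on_tstp)
    try:
        # Same signal the terminal driver sends on Ctrl+Z.
        os.kill(os.getpid(), signal.SIGTSTP)
    finally:
        # Restore whatever disposition was in place before.
        signal.signal(signal.SIGTSTP, previous)
    return received


print(catch_sigtstp())
```

Interactive programs such as full-screen editors commonly use a handler of this kind to restore the terminal state before suspending themselves.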
The Unicode Security Considerations report[12] recommends this character as a safe replacement for unmappable characters during character set conversion.
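A conversion that follows this recommendation might look like the sketch below. Python's built-in `replace` handler substitutes a question mark when encoding, so the example registers a custom error handler through `codecs.register_error`. The handler name `sub_replace` is hypothetical and chosen only for this illustration.

```python
import codecs


def sub_replace(exc):
    """Replace each unmappable character with SUB (0x1A) during encoding."""
    if isinstance(exc, UnicodeEncodeError):
        # One SUB per unencodable character; resume after the bad span.
        return ("\x1a" * (exc.end - exc.start), exc.end)
    raise exc


# Register the handler under a hypothetical name for use with errors=...
codecs.register_error("sub_replace", sub_replace)

encoded = "naïve ☃".encode("ascii", errors="sub_replace")
print(encoded)
```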
In many GUIs and applications, Control+Z (⌘ Command+Z on macOS) can be used to undo the last action. In many applications, earlier actions than the last one can also be undone by pressing Control+Z multiple times. Control+Z was one of a handful of keyboard sequences chosen by the program designers at Xerox PARC to control text editing.
ASCII and Unicode representation of "substitute": decimal 26, hexadecimal 1A (U+001A).
[...] The end of an ASCII file is denoted by a control-Z character (1AH) or a real end of file, returned by the CP/M read operation. Control-Z characters embedded within machine code files (e.g., COM files) are ignored, however, and the end of file condition returned by CP/M is used to terminate read operations. [...] (56 pages)
[...] CP/M marks the end of an ASCII file by placing a CONTROL-Z character in the file after the last data character. If the file contains an exact multiple of 128 characters, in which case adding the CONTROL-Z would waste 127 characters, CP/M does not do so. Use of the CONTROL-Z character as the end-of-file marker is possible because CONTROL-Z is seldom used as data in ASCII files. In a non-ASCII file, however, CONTROL-Z is just as likely to occur as any other character. Therefore, it cannot be used as the end-of-file marker. CP/M uses a different method to mark the end of a non-ASCII file. CP/M assumes it has reached the end of the file when it has read the last record (basic unit of disk space) allocated to the file. The disk directory entry for each file contains a list of the disk records allocated to that file. This method relies on the size of the file, rather than its content, to locate the end of the file. [...][1][2]