text
Computing concept and file format
Text, in computing terms, is a sequence of bytes which have an unambiguous mapping to units of human writing. These units are called characters, although their precise definition varies. In computer languages, segments of text are represented by data structures known as strings. Text may also be saved to the file system as a text file.
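As a rough illustration of the byte-to-character mapping, the sketch below decodes a short byte sequence into a string. It assumes the bytes are UTF‐8 and takes Python's notion of a character (a Unicode code point), which is only one of several possible definitions.

```python
# Five bytes which, under the assumed UTF-8 encoding, spell the word "café".
raw = bytes([0x63, 0x61, 0x66, 0xC3, 0xA9])

# Decoding applies the mapping from bytes to characters and yields a string.
decoded = raw.decode("utf-8")

print(decoded)       # café
print(len(raw))      # 5 bytes
print(len(decoded))  # 4 characters; "é" occupies two bytes in UTF-8

# Encoding applies the mapping in the other direction.
round_trip = decoded.encode("utf-8")
```

The byte count and the character count differ, which is one reason the two should not be treated as interchangeable.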
§ Text Files
Text files are exactly those files whose bytes are intended to be directly interpreted as symbols of human writing. Not all files which represent human writing are text files; for example, a PNG of street graffiti contains human writing, but because the bytes of the file represent image data (and not letters or other symbols), it is not a text file. Additionally, the textual representation of some text files may not be their primary or most useful one; for example, an SVG file is a kind of text file, but it is usually rendered as an image.
Because all text files may be represented as a linear stream of human writing (even if they have other representations), any program which knows how to translate bytes into writing can open any kind of text file. When the goal of this program is to display and edit the text file for a human user, the program is called a text editor.
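A minimal sketch of this idea: the program below knows nothing about SVG, yet it can read an SVG file as ordinary text. The file name `dot.svg` and the SVG content are made up for illustration, and UTF‐8 is assumed as the encoding.

```python
import tempfile
from pathlib import Path

# A tiny, hypothetical SVG document; to this program it is just a string.
svg_source = (
    '<svg xmlns="http://www.w3.org/2000/svg" width="10" height="10">'
    '<circle cx="5" cy="5" r="4"/></svg>\n'
)

with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "dot.svg"  # hypothetical file name
    path.write_text(svg_source, encoding="utf-8")

    # The same file viewed two ways: as raw bytes, and as decoded text.
    raw_bytes = path.read_bytes()
    as_text = path.read_text(encoding="utf-8")

print(type(raw_bytes).__name__)  # bytes
print(as_text)                   # the SVG markup, readable as plain writing
```

An image viewer would render the same bytes as a circle; a text editor would show the markup printed here.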
Numerous text editors exist for every platform, and their accessibility and ease of use have made human‐readable text files a core part of both the Unix philosophy and the Web.
§ Text Encodings
The mapping of bytes into writing used for a piece of text is its encoding. Today, most text is encoded as either UTF‐8 or UTF‐16, both of which map bytes to the set of characters defined by Unicode. If a program tries to read text but assumes the wrong encoding, the result is often an illegible string of characters known as mojibake. For example, if a computer accidentally tries to read the UTF‐8 string ‹ Hello world! › as big‐endian UTF‐16, the result is ‹ 䡥汬漠睯牬搡 ›.
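The mojibake example can be reproduced in a few lines. This sketch assumes the misreading uses big‐endian UTF‐16 (Python's `utf-16-be` codec), which is the byte order that yields the characters shown above.

```python
original = "Hello world!"

# Encode the string correctly as UTF-8: twelve bytes, one per character.
utf8_bytes = original.encode("utf-8")

# Misread those same bytes as big-endian UTF-16: each pair of bytes
# is now treated as a single 16-bit code unit.
garbled = utf8_bytes.decode("utf-16-be")
print(garbled)  # 䡥汬漠睯牬搡

# No information was lost in this case, so reversing the mistake
# recovers the original text.
repaired = garbled.encode("utf-16-be").decode("utf-8")
print(repaired)  # Hello world!
```

Recovery works here only because every byte pair happened to form a valid UTF‐16 code unit; in general a wrong decoding may fail outright or discard data.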
§ Plain & Rich Text
Text comes in two varieties: plain and rich. In plain text, every character in the text is expected to hold its literal meaning, and no information about the semantics, formatting, or presentation of the text is provided. In contrast, rich text uses certain sequences of characters to imbue the text with additional meaning or properties, for example annotating that a given span is emphasized, or that it should appear in the colour blue. These sequences are known as markup, and defined collections of markup symbols together form a markup language.
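To make the distinction concrete, the sketch below takes a short piece of rich text and separates the markup from the writing it annotates. HTML is used here purely as a familiar example of a markup language; the sample sentence and the `TextOnly` class are illustrative, not part of any standard.

```python
from html.parser import HTMLParser


class TextOnly(HTMLParser):
    """Collects character data and ignores every tag (illustrative helper)."""

    def __init__(self):
        super().__init__()
        self.parts = []

    def handle_data(self, data):
        # Called for runs of text that sit between markup sequences.
        self.parts.append(data)


# Rich text: the tags mark one span as emphasized and another as blue.
rich = 'Text may be <em>emphasized</em> or <span style="color: blue">blue</span>.'

parser = TextOnly()
parser.feed(rich)
parser.close()

plain = "".join(parser.parts)
print(plain)  # Text may be emphasized or blue.
```

Read as plain text, the string `rich` is simply a sequence of literal characters, angle brackets included; it is only under the rules of the markup language that `<em>` stops being writing and becomes an annotation.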