Data conversion

From Wikipedia, the free encyclopedia
Conversion of digital data between formats
This article is about conversion of file formats. For conversion of data types, see Type conversion. For conversion of analog information to digital data, see Analog-to-digital converter.

Data conversion is the conversion of computer data from one format to another. Throughout a computer environment, data is encoded in a variety of ways. For example, computer hardware is built on the basis of certain standards, which requires that data contain, for example, parity bit checks. Similarly, the operating system is predicated on certain standards for data and file handling. Furthermore, each computer program handles data in a different manner. Whenever any one of these variables is changed, data must be converted in some way before it can be used by a different computer, operating system or program. Even different versions of these elements usually involve different data structures. For example, changing bits from one format to another, usually for the purpose of application interoperability or to enable the use of new features, is a data conversion. Data conversions may be as simple as the conversion of a text file from one character encoding system to another, or more complex, such as the conversion of office file formats or the conversion of image formats and audio file formats.

There are many ways in which data is converted within the computer environment. The conversion may be seamless, as in the case of upgrading to a newer version of a computer program. Alternatively, it may require processing by a special conversion program, or it may involve a complex process of going through intermediary stages or "exporting" and "importing" procedures, which may include converting to and from a tab-delimited or comma-separated text file. In some cases, a program can recognize several data file formats at the data input stage and can also store its output data in several different formats. Such a program may be used to convert a file format. If the source format or target format is not recognized, a third program may sometimes be available that permits conversion to an intermediate format, which can then be reformatted using the first program. There are many possible scenarios.
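
As a minimal sketch of the "export and import" route through delimited text mentioned above, the following Python function rewrites tab-delimited text as comma-separated text using the standard csv module. The function name tsv_to_csv and the sample data are illustrative assumptions, not part of any particular product.

```python
import csv
import io


def tsv_to_csv(tsv_text: str) -> str:
    """Convert tab-delimited text to comma-separated text (hypothetical helper)."""
    # newline="" lets the csv module see line endings exactly as written.
    reader = csv.reader(io.StringIO(tsv_text, newline=""), delimiter="\t")
    out = io.StringIO(newline="")
    writer = csv.writer(out, lineterminator="\n")
    # Fields that contain a comma are quoted automatically by the writer.
    writer.writerows(reader)
    return out.getvalue()


if __name__ == "__main__":
    print(tsv_to_csv("name\tnote\nAda\thello, world\n"), end="")
```

Note that the field containing a comma has to be quoted on output; even this simple conversion is governed by the rules of the target format rather than by a plain character substitution.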

Information basics

Before any data conversion is carried out, the user or application programmer should keep a few basics of computing and information theory in mind. These include:

  • Information can easily be discarded by the computer, but adding information takes effort.
  • The computer can add information only in a rule-based fashion.[citation needed]
  • Upsampling the data or converting to a more feature-rich format does not add information; it merely makes room for that addition, which usually has to be supplied by a human.
  • Data stored in an electronic format can be quickly modified and analyzed.

For example, a true color image can easily be converted to grayscale, while the opposite conversion is a painstaking process. Converting a Unix text file to a Microsoft (DOS/Windows) text file involves adding characters, but this does not increase the entropy, since it is rule-based; whereas the addition of color information to a grayscale image cannot be reliably done programmatically, as it requires adding new information, so any attempt to add color would require estimation by the computer based on previous knowledge. Converting a 24-bit PNG to a 48-bit one does not add information to it; it only pads existing RGB pixel values with zeroes[citation needed], so that a pixel with a value of FF C3 56, for example, becomes FF00 C300 5600. The conversion makes it possible to change a pixel to have a value of, for instance, FF80 C340 56A0, but the conversion itself does not do that; only further manipulation of the image can. Converting an image or audio file in a lossy format (like JPEG or Vorbis) to a lossless (like PNG or FLAC) or uncompressed (like BMP or WAV) format only wastes space, since the same image with its loss of original information (the artifacts of lossy compression) becomes the target. A JPEG image can never be restored to the quality of the original image from which it was made, no matter how much the user tries the "JPEG Artifact Removal" feature of their image manipulation program.
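
The three cases above can be sketched in a few lines of Python. The function names are hypothetical, pixels are modelled as plain (R, G, B) tuples, and the grayscale weights are the common Rec. 601 luma coefficients, which is an assumption: the text does not name a specific formula.

```python
def unix_to_dos(text):
    """Rule-based: every LF becomes CR LF. Characters are added, no information is."""
    return text.replace("\r\n", "\n").replace("\n", "\r\n")


def dos_to_unix(text):
    """Exact inverse of unix_to_dos for text with consistent line endings."""
    return text.replace("\r\n", "\n")


def widen_8_to_16(pixel):
    """Zero-pad each 8-bit channel to 16 bits, e.g. 0xFF -> 0xFF00."""
    return tuple(channel << 8 for channel in pixel)


def narrow_16_to_8(pixel):
    """Drop the low byte of each 16-bit channel."""
    return tuple(channel >> 8 for channel in pixel)


def to_gray(pixel):
    """Collapse RGB to one luma value (Rec. 601 weights, integer arithmetic)."""
    r, g, b = pixel
    return (299 * r + 587 * g + 114 * b) // 1000


if __name__ == "__main__":
    pixel = (0xFF, 0xC3, 0x56)
    wide = widen_8_to_16(pixel)
    print([f"{c:04X}" for c in wide])       # ['FF00', 'C300', '5600']
    print(narrow_16_to_8(wide) == pixel)    # True: padding was reversible
    # Two different colours collapse to the same gray level, so the
    # original colour cannot be recovered from the gray value alone.
    print(to_gray((255, 0, 0)), to_gray((76, 76, 76)))  # 76 76
```

The line-ending and bit-depth conversions have exact inverses because they follow a fixed rule, whereas the grayscale conversion maps many inputs to one output and therefore has none.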

Automatic restoration of information that was lost through a lossy compression process would probably require important advances in artificial intelligence.

Because of these realities of computing and information theory, data conversion is often a complex and error-prone process that requires the help of experts.

Pivotal conversion

Data conversion can occur directly from one format to another, but many applications that convert between multiple formats use an intermediate representation by way of which any source format is converted to its target.[1] For example, it is possible to convert Cyrillic text from KOI8-R to Windows-1251 using a lookup table between the two encodings, but the modern approach is to convert the KOI8-R file to Unicode first and from that to Windows-1251. This is a more manageable approach; rather than needing lookup tables for all possible pairs of character encodings, an application needs only one lookup table for each character set, which it uses to convert to and from Unicode, thereby scaling the number of tables down from hundreds to a few tens.[citation needed]
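
A minimal sketch of the Unicode pivot in Python, whose standard codecs include both "koi8_r" and "cp1251" (Windows-1251). The function name convert_encoding is an illustrative assumption. As a rough count, direct tables for n encodings would need n × (n − 1) directed pairs (380 for 20 encodings), while the pivot needs only one table per encoding.

```python
def convert_encoding(data: bytes, source: str, target: str) -> bytes:
    """Re-encode bytes from one character encoding to another via Unicode."""
    text = data.decode(source)   # source encoding -> Unicode (the pivot)
    return text.encode(target)   # Unicode -> target encoding


if __name__ == "__main__":
    koi8 = "Привет".encode("koi8_r")
    cp1251 = convert_encoding(koi8, "koi8_r", "cp1251")
    print(cp1251.decode("cp1251"))  # Привет
```

If the text contains a character that the target encoding lacks, the encode step raises UnicodeEncodeError, which is the point at which a real converter would have to choose between failing and approximating.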

Pivotal conversion is similarly used in other areas. Office applications, when employed to convert between office file formats, use their internal, default file format as a pivot. For example, a word processor may convert an RTF file to a WordPerfect file by converting the RTF to OpenDocument and then that to WordPerfect format. An image conversion program does not convert a PCX image to PNG directly; instead, when loading the PCX image, it decodes it to a simple bitmap format for internal use in memory, and when commanded to convert to PNG, that memory image is converted to the target format. An audio converter that converts from FLAC to AAC decodes the source file to raw PCM data in memory first, and then performs the lossy AAC compression on that memory image to produce the target file.
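
The same structure can be sketched for a toy converter that uses an in-memory list of rows as its pivot. This is only an illustration under stated assumptions: the DECODERS and ENCODERS registries and the convert function are hypothetical names, and small tabular text formats stand in for the image and audio formats discussed above.

```python
import csv
import io
import json


def _read_delimited(text, delimiter):
    """Decode delimited text into the in-memory pivot: a list of rows."""
    return list(csv.reader(io.StringIO(text, newline=""), delimiter=delimiter))


def _write_delimited(rows, delimiter):
    """Encode the in-memory pivot as delimited text."""
    out = io.StringIO(newline="")
    csv.writer(out, delimiter=delimiter, lineterminator="\n").writerows(rows)
    return out.getvalue()


# One decoder and one encoder per format, instead of one converter per pair.
DECODERS = {
    "csv": lambda text: _read_delimited(text, ","),
    "tsv": lambda text: _read_delimited(text, "\t"),
    "json": json.loads,
}
ENCODERS = {
    "csv": lambda rows: _write_delimited(rows, ","),
    "tsv": lambda rows: _write_delimited(rows, "\t"),
    "json": json.dumps,
}


def convert(text, source, target):
    """Convert between any two registered formats through the pivot."""
    rows = DECODERS[source](text)
    return ENCODERS[target](rows)


if __name__ == "__main__":
    print(convert("a\tb\n1\t2\n", "tsv", "json"))  # [["a", "b"], ["1", "2"]]
```

Adding a fourth format to this sketch means writing one decoder and one encoder, after which conversion to and from every existing format is available without further work.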

Lost and inexact data conversion

The objective of data conversion is to maintain all of the data, and as much of the embedded information as possible. This can only be done if the target format supports the same features and data structures present in the source file. Conversion of a word processing document to a plain text file necessarily involves loss of formatting information, because plain text format does not support word processing constructs such as marking a word as boldface. For this reason, conversion to a format that does not support a feature important to the user is rarely carried out, though it may be necessary for interoperability, e.g. converting a file from one version of Microsoft Word to an earlier version so that it can be transferred to and used by people who do not have the later version of Word installed on their computer.

Loss of information can be mitigated by approximation in the target format. There is no way of converting a character like ä to ASCII, since the ASCII standard lacks it, but the information may be retained by approximating the character as ae. Of course, this is not an optimal solution, and can impact operations like searching and copying; and if a language makes a distinction between ä and ae, then that approximation does involve loss of information.
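
A minimal sketch of such an approximation in Python follows. Only the ä to ae mapping comes from the text above; the other table entries follow the usual German transliteration convention and are included as an assumption, as are the function name to_ascii and the "?" fallback.

```python
# Hypothetical transliteration table; only "ä" -> "ae" is taken from the text.
TRANSLITERATION = {
    "ä": "ae", "ö": "oe", "ü": "ue",
    "Ä": "Ae", "Ö": "Oe", "Ü": "Ue",
    "ß": "ss",
}


def to_ascii(text, replacement="?"):
    """Approximate text in ASCII, substituting where a mapping is known."""
    out = []
    for ch in text:
        if ch.isascii():
            out.append(ch)
        else:
            # Unknown characters fall back to a visible placeholder.
            out.append(TRANSLITERATION.get(ch, replacement))
    return "".join(out)


if __name__ == "__main__":
    print(to_ascii("Käse"))   # Kaese
    print(to_ascii("Kaese"))  # Kaese
```

Both inputs produce the same output, so the conversion cannot be reversed without outside knowledge, which is exactly the loss described above for languages that distinguish ä from ae.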

Data conversion can also suffer from inexactitude, the result of converting between formats that are conceptually different. The WYSIWYG paradigm, extant in word processors and desktop publishing applications, versus the structural-descriptive paradigm, found in SGML, XML and many applications derived therefrom, like HTML and MathML, is one example. Using a WYSIWYG HTML editor conflates the two paradigms, and the result is HTML files with suboptimal, if not nonstandard, code. In the WYSIWYG paradigm a double linebreak signifies a new paragraph, as that is the visual cue for such a construct, but a WYSIWYG HTML editor will usually convert such a sequence to <BR><BR>, which is structurally no new paragraph at all. As another example, converting from PDF to an editable word processor format is a tough chore, because PDF records the textual information like engraving on stone, with each character given a fixed position and linebreaks hard-coded, whereas word processor formats accommodate text reflow. PDF does not know of a word space character: the space between two letters and the space between two words differ only in quantity. Therefore, a title with ample letter-spacing for effect will usually end up with spaces in the word processor file; for example, INTRODUCTION set with a letter-spacing of 1em appears as I N T R O D U C T I O N in the word processor.
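
The letter-spacing problem can be illustrated with a deliberately simplified model. The sketch below assumes that a page is just a list of (character, x_start, x_end) glyph boxes and that a converter inserts a space whenever the gap between two glyphs exceeds a threshold; the names lay_out and glyphs_to_text, the units and the threshold value are all hypothetical, and real PDF text extraction is considerably more involved.

```python
def lay_out(word, glyph_width, spacing):
    """Place glyphs left to right, as a typesetter with fixed spacing would."""
    glyphs = []
    x = 0
    for ch in word:
        glyphs.append((ch, x, x + glyph_width))
        x += glyph_width + spacing
    return glyphs


def glyphs_to_text(glyphs, gap_threshold):
    """Rebuild text, guessing a word space wherever the gap is large."""
    out = []
    prev_end = None
    for ch, x_start, x_end in glyphs:
        if prev_end is not None and x_start - prev_end > gap_threshold:
            out.append(" ")
        out.append(ch)
        prev_end = x_end
    return "".join(out)


if __name__ == "__main__":
    normal = lay_out("INTRODUCTION", glyph_width=6, spacing=1)
    spaced = lay_out("INTRODUCTION", glyph_width=6, spacing=10)
    print(glyphs_to_text(normal, gap_threshold=3))  # INTRODUCTION
    print(glyphs_to_text(spaced, gap_threshold=3))  # I N T R O D U C T I O N
```

Because the positioned glyphs carry no marker that distinguishes decorative letter-spacing from a word boundary, any fixed threshold will misjudge some layouts.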

Open vs. secret specifications

Successful data conversion requires thorough knowledge of the workings of both source and target formats. In the case where the specification of a format is unknown, reverse engineering will be needed to carry out conversion. Reverse engineering can achieve close approximation of the original specifications, but errors and missing features can still result.

Electronics

Data format conversion can also occur at the physical layer of an electronic communication system. Conversion between line codes such as NRZ and RZ can be accomplished when necessary.
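
As a rough illustration, the sketch below models a unipolar signal as a list of levels with two samples per bit period: in NRZ the level is held for the whole period, while in RZ it returns to zero for the second half. The two-samples-per-bit model and the function names are assumptions made for the example; actual conversion is performed on electrical signals by hardware.

```python
def encode_nrz(bits):
    """NRZ: hold the bit's level for both halves of the bit period."""
    return [level for bit in bits for level in (bit, bit)]


def encode_rz(bits):
    """Unipolar RZ: send the level in the first half, then return to zero."""
    return [level for bit in bits for level in (bit, 0)]


def nrz_to_rz(samples):
    """Force the second half of every bit period to zero."""
    if len(samples) % 2:
        raise ValueError("expected two samples per bit")
    return [lvl if i % 2 == 0 else 0 for i, lvl in enumerate(samples)]


def rz_to_nrz(samples):
    """Stretch the first-half level across the whole bit period."""
    if len(samples) % 2:
        raise ValueError("expected two samples per bit")
    out = []
    for i in range(0, len(samples), 2):
        out.extend((samples[i], samples[i]))
    return out


if __name__ == "__main__":
    bits = [1, 0, 1, 1]
    print(encode_nrz(bits))             # [1, 1, 0, 0, 1, 1, 1, 1]
    print(nrz_to_rz(encode_nrz(bits)))  # [1, 0, 0, 0, 1, 0, 1, 0]
```

In this model the conversion is lossless in both directions, since both line codes carry the same bit sequence and differ only in how each bit period is filled.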

References

  1. Dragos-Anton Manolescu; Markus Voelter; James Noble (2006). Pattern Languages of Program Design 5. Addison-Wesley Professional. pp. 271–. ISBN 978-0-321-32194-7.

