Document processing is a field of research and a set of production processes aimed at making an analog document digital. Document processing does not simply aim to photograph or scan a document to obtain a digital image, but also to make it digitally intelligible. This includes extracting the structure of the document or the layout and then the content, which can take the form of text or images. The process can involve traditional computer vision algorithms, convolutional neural networks or manual labor. The problems addressed are related to semantic segmentation, object detection, optical character recognition (OCR), handwritten text recognition (HTR) and, more broadly, transcription, whether automatic or not.[1] The term can also include the phase of digitizing the document using a scanner and the phase of interpreting the document, for example using natural language processing (NLP) or image classification technologies. It is applied in many industrial and scientific fields for the optimization of administrative processes, mail processing and the digitization of analog archives and historical documents.
Document processing was initially, and still is to some extent, a kind of production line work dealing with the treatment of documents, such as letters and parcels, with the aim of sorting them or extracting data from them, often in bulk. This work could be performed in-house or through business process outsourcing.[2][3] Document processing can indeed involve some kind of externalized manual labor, such as Mechanical Turk.
As an example of manual document processing, as recently as 2007,[4] document processing for "millions of visa and citizenship applications" relied on "approximately 1,000 contract workers" working to "manage mail room and data entry."
While document processing involved data entry via keyboard well before the use of a computer mouse or a computer scanner, a 1990 article in The New York Times regarding what it called the "paperless office" stated that "document processing begins with the scanner".[5] In this context, a former Xerox vice-president, Paul Strassmann, expressed a critical opinion, saying that computers add to rather than reduce the volume of paper in an office.[5] It was said that the engineering and maintenance documents for an airplane weigh "more than the airplane itself"[citation needed].
As the state of the art advanced, document processing transitioned to handling "document components ... as database entities."[6]
A technology called automatic document processing, or sometimes intelligent document processing (IDP), emerged as a specific form of Intelligent Process Automation (IPA), combining artificial intelligence techniques such as Machine Learning (ML), Natural Language Processing (NLP) or Intelligent Character Recognition (ICR) to extract data from several types of documents.[7][8] Advancements in automatic document processing improve the ability to process unstructured data with fewer exceptions and greater speed.[9]
Automatic document processing applies to a whole range of documents, whether structured or not. For instance, in the world of business and finance, technologies may be used to process paper-based invoices, forms, purchase orders, contracts, and currency bills.[10] Financial institutions use intelligent document processing to process high volumes of forms such as regulatory forms or loan documents. IDP uses AI to extract and classify data from documents, replacing manual data entry.[11]
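As a rough illustration of what replacing manual data entry can mean in practice, the following minimal sketch pulls a few fields out of text that is assumed to have already been produced by OCR. It is a rule-based stand-in, not how IDP products work internally: such systems typically rely on trained models. The field names, the patterns, the sample invoice and the function `extract_fields` are all hypothetical.

```python
import re

# Hypothetical patterns for a hypothetical invoice layout. Real IDP systems
# usually learn where fields are with ML models rather than fixed rules.
FIELD_PATTERNS = {
    "invoice_number": re.compile(
        r"Invoice\s*(?:No\.?|#)\s*:?\s*([A-Z0-9-]+)", re.IGNORECASE
    ),
    "date": re.compile(r"Date\s*:?\s*(\d{4}-\d{2}-\d{2})", re.IGNORECASE),
    "total": re.compile(
        r"Total\s*(?:due)?\s*:?\s*\$?\s*([0-9]+(?:\.[0-9]{2})?)", re.IGNORECASE
    ),
}


def extract_fields(text):
    """Return a dict of the fields found in OCR text; missing fields map to None."""
    record = {}
    for name, pattern in FIELD_PATTERNS.items():
        match = pattern.search(text)
        record[name] = match.group(1) if match else None
    return record


# Made-up sample standing in for the OCR output of a scanned invoice.
SAMPLE_TEXT = """ACME Supplies
Invoice No: INV-2041
Date: 2023-05-17
Total due: $1250.00
"""

if __name__ == "__main__":
    print(extract_fields(SAMPLE_TEXT))
```

Fields that cannot be found are returned as `None` rather than guessed, so that a downstream step can route the document to manual review as an exception.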
In medicine, document processing methods have been developed to facilitate patient follow-up and streamline administrative procedures, in particular by digitizing medical or laboratory analysis reports. The goal is also to standardize medical databases.[12] Algorithms are also directly used to assist physicians in medical diagnosis, e.g. by analyzing magnetic resonance images,[13][14] or microscopic images.[15]
Document processing is also widely used in the humanities and digital humanities, in order to extract historical big data from archives or heritage collections. Specific approaches were developed for various sources, including textual documents, such as newspaper archives,[16] but also images,[17] or maps.[18][19]
From the 1980s onward, traditional computer vision algorithms were widely used to solve document processing problems,[20][21] but they were gradually replaced by neural network technologies in the 2010s.[22] However, traditional computer vision technologies are still used in some sectors, sometimes in conjunction with neural networks.
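The article does not name specific traditional algorithms. As one well-known example of the kind, Otsu's method chooses a global gray-level threshold to separate ink from background by maximizing the between-class variance of the two resulting pixel groups. The sketch below is a minimal pure-Python version operating on a flat list of 8-bit gray values; the function names and the toy data are assumptions made for illustration.

```python
def otsu_threshold(pixels, levels=256):
    """Return the gray level t that maximizes between-class variance.

    Pixels with value <= t form the dark class, the others the bright class.
    """
    hist = [0] * levels
    for p in pixels:
        hist[p] += 1
    total = len(pixels)
    sum_all = sum(i * h for i, h in enumerate(hist))

    best_t, best_var = 0, -1.0
    w0 = 0    # number of pixels in the dark class so far
    sum0 = 0  # sum of gray values in the dark class so far
    for t in range(levels):
        w0 += hist[t]
        sum0 += t * hist[t]
        if w0 == 0:
            continue  # no dark pixels yet, nothing to split
        w1 = total - w0
        if w1 == 0:
            break  # every pixel is already in the dark class
        mean0 = sum0 / w0
        mean1 = (sum_all - sum0) / w1
        # Between-class variance, up to a constant factor of 1 / total**2.
        var_between = w0 * w1 * (mean0 - mean1) ** 2
        if var_between > best_var:
            best_t, best_var = t, var_between
    return best_t


def binarize(pixels, threshold):
    """Map each pixel to 1 (ink) if it is at or below the threshold, else 0."""
    return [1 if p <= threshold else 0 for p in pixels]


if __name__ == "__main__":
    # Toy "scan": dark ink around gray level 10, bright paper around 200.
    scan = [10] * 50 + [200] * 50
    t = otsu_threshold(scan)
    print(t, sum(binarize(scan, t)))
```

A single global threshold assumes roughly uniform lighting across the page, which is one reason such methods are sometimes combined with, or replaced by, learned models on degraded documents.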
Many technologies support the development of document processing, in particular optical character recognition (OCR) and handwritten text recognition (HTR), which allow the text to be transcribed automatically. Text segments as such are identified using instance or object detection algorithms, which can sometimes also be used to detect the structure of the document. The resolution of the latter problem sometimes also uses semantic segmentation algorithms.
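To make the idea of identifying text segments concrete, the sketch below uses a much simpler classical technique than the detection algorithms mentioned above: a horizontal projection profile, which counts ink pixels per row and treats each run of non-empty rows as one text line. It assumes an already binarized, unskewed page given as a list of rows of 0 (background) and 1 (ink); the function name and the toy page are hypothetical.

```python
def find_text_lines(binary_image, min_ink=1):
    """Return inclusive (top, bottom) row ranges that contain ink.

    binary_image is a list of rows; each row is a list of 0/1 values.
    A row belongs to a text line if it has at least min_ink ink pixels.
    """
    # Horizontal projection profile: number of ink pixels in each row.
    profile = [sum(row) for row in binary_image]

    lines = []
    start = None
    for y, ink in enumerate(profile):
        if ink >= min_ink and start is None:
            start = y  # a new text line begins on this row
        elif ink < min_ink and start is not None:
            lines.append((start, y - 1))  # the line ended on the previous row
            start = None
    if start is not None:
        lines.append((start, len(profile) - 1))  # line runs to the last row
    return lines


# Toy page: two "text lines" separated by an empty row.
PAGE = [
    [0, 0, 0, 0, 0],
    [0, 1, 1, 1, 0],
    [0, 1, 0, 1, 0],
    [0, 0, 0, 0, 0],
    [1, 1, 0, 1, 1],
    [0, 0, 0, 0, 0],
]

if __name__ == "__main__":
    print(find_text_lines(PAGE))
```

Each returned row range could then be cropped and passed to an OCR or HTR step. This approach breaks down on skewed, multi-column or handwritten pages, which is where object detection and semantic segmentation become useful.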
These technologies often form the core of document processing. However, other algorithms may intervene before or after these processes. Indeed, document digitization technologies are also involved, whether in the form of classical or three-dimensional scanning.[23] The digitization of 3D documents can in particular resort to derivatives of photogrammetry. Sometimes, specific 2D scanners must also be developed to adapt to the size of the documents or for reasons of scanning ergonomics.[17] Document processing also depends on the digital encoding of the documents in a suitable file format. Furthermore, the processing of heterogeneous databases can rely on image classification technologies.
At the other end of the chain are various image completion, extrapolation or data cleanup algorithms. For textual documents, the interpretation can use natural language processing (NLP) technologies.
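As one possible example of a data cleanup step on transcribed text, the sketch below corrects OCR tokens against a known vocabulary using Levenshtein edit distance. The lexicon, the distance limit and the function names are assumptions chosen for illustration; production systems generally use larger dictionaries or language models.

```python
def edit_distance(a, b):
    """Levenshtein distance between strings a and b (dynamic programming)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(
                prev[j] + 1,         # delete ca
                curr[j - 1] + 1,     # insert cb
                prev[j - 1] + cost,  # substitute ca with cb, or keep a match
            ))
        prev = curr
    return prev[-1]


def correct_token(token, lexicon, max_distance=1):
    """Replace token with the closest lexicon word if it is close enough.

    Ties are broken alphabetically; tokens far from every word are kept.
    """
    if not lexicon or token in lexicon:
        return token
    best = min(lexicon, key=lambda word: (edit_distance(token, word), word))
    return best if edit_distance(token, best) <= max_distance else token


# Hypothetical vocabulary for the sketch.
LEXICON = {"document", "processing", "invoice"}

if __name__ == "__main__":
    noisy = ["invo1ce", "processing", "xyz"]
    print([correct_token(tok, LEXICON) for tok in noisy])
```

Keeping `max_distance` small limits the risk of "correcting" rare but valid words, such as proper names in historical documents, into unrelated dictionary entries.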