A machine-readable document is a document whose content can be readily processed by computers. Such documents are distinguished from more general machine-readable data by virtue of having further structure to provide the necessary context to support the business processes for which they are created.
Data without context is meaningless and lacks the four essential characteristics of trustworthy business records specified in ISO 15489 Information and documentation – Records management: authenticity, reliability, integrity, and usability.[1]
The vast bulk of information is unstructured data and, from a business perspective, that means it is "immature", i.e., Level 1 (chaotic) of the Capability Maturity Model. Such immaturity fosters inefficiency, diminishes quality, and limits effectiveness. Unstructured information is also ill-suited for records management functions, provides inadequate evidence for legal purposes, drives up the cost of discovery in litigation, and makes access and usage needlessly cumbersome in routine, ongoing business processes.
There are at least four aspects to machine-readability:
As early as 1983, the U.S. Government Accountability Office (GAO) began emphasizing the benefits of machine-readable information.[2] Even earlier, in 1981, GAO began reporting on the problem of inadequate record-keeping practices in the U.S. federal government.[3] Such deficiencies are not unique to government, and advances in information technology mean that most information is now "born digital" and thus potentially far more easily managed by automated means.[4] However, in testimony to Congress in 2010, GAO highlighted problems with managing electronic records, and as recently as 2015, GAO has continued to report inadequacies in the performance of Executive Branch agencies in meeting records management requirements.[5][6] Moreover, more than a decade after a major and formerly highly respected auditing firm, Arthur Andersen, met its demise due to a records destruction scandal, record-keeping practices became a central issue in the 2016 Presidential election.
On January 4, 2011, President Obama signed H.R. 2142, the Government Performance and Results Act (GPRA) Modernization Act of 2010 (GPRAMA), into law as P.L. 111-352. Section 10 of GPRAMA requires U.S. federal agencies to publish their strategic and performance plans and reports in searchable, machine-readable format.[7] Additionally, in 2013, he issued Executive Order 13642, Making Open and Machine Readable the New Default for Government Information in general.[8] On July 28, 2016, the Office of Management and Budget (OMB) followed up by including in the revised issuance of Circular A-130 direction for agencies to use open, machine-readable formats,[9] and to publish "public information online in a manner that promotes analysis and reuse for the widest possible range of purposes",[10] meaning that the information is both publicly accessible and machine-readable. On January 14, 2019, President Trump signed into law H.R. 4174,[11] the OPEN Government Data Act (OGDA), which codifies in law the requirement for agencies to make their public data assets available in machine-readable format. On June 28, 2019, in Circular A-11,[12] OMB expressed intent to begin complying with section 10 of GPRAMA.[13]
In support of such policy direction, technological advancement is enabling more efficient and effective management and use of machine-readable electronic records. Document-oriented databases have been developed for storing, retrieving, and managing document-oriented information, also known as semi-structured data. Extensible Markup Language (XML) is a World Wide Web Consortium (W3C) Recommendation setting forth rules for encoding documents in a format that is both human-readable and machine-readable. Many XML editor tools have been developed, and most, if not all, major information technology applications support XML to greater or lesser degrees. The fact that XML itself is an open, standard, machine-readable format makes it relatively easy for application developers to do so.
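As a rough illustration of what "both human-readable and machine-readable" means in practice, the following sketch parses a small XML document with Python's standard `xml.etree.ElementTree` module. The element and attribute names (`plan`, `goal`, `objective`, `number`) are hypothetical and chosen only for this example; they do not come from any particular published schema.

```python
import xml.etree.ElementTree as ET

# Hypothetical document: element and attribute names are illustrative only.
PLAN_XML = """\
<plan id="example-plan-1">
  <title>Example Strategic Plan</title>
  <goal number="1">
    <name>Improve record-keeping</name>
    <objective>Capture records in machine-readable form</objective>
    <objective>Attach metadata at creation</objective>
  </goal>
  <goal number="2">
    <name>Publish open data</name>
    <objective>Release public data assets in open formats</objective>
  </goal>
</plan>
"""


def summarize_plan(xml_text):
    """Turn the XML text into plain Python data that a program can query."""
    root = ET.fromstring(xml_text)
    return {
        "id": root.get("id"),
        "title": root.findtext("title"),
        "goals": [
            {
                "number": int(goal.get("number")),
                "name": goal.findtext("name"),
                "objectives": [o.text for o in goal.findall("objective")],
            }
            for goal in root.findall("goal")
        ],
    }


summary = summarize_plan(PLAN_XML)
```

A person can read the markup directly, while a program can reach any goal or objective by name rather than by guessing at layout, which is the practical difference from unstructured text.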
The W3C's accompanying XML Schema (XSD) Recommendation specifies how to formally describe the elements in an XML document. With respect to the specification of XML schemas, the Organization for the Advancement of Structured Information Standards (OASIS) is a leading standards-developing organization. However, many technical developers prefer to work with JSON, and to define the structure of JSON data for validation, documentation, and interaction control, JSON Schema was developed by the Internet Engineering Task Force (IETF).
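The idea of checking a document against a formal description of its structure can be sketched in a few lines. The example below is not a conforming JSON Schema implementation: it is a deliberately simplified checker that understands only four keywords (`type`, `required`, `properties`, `items`), and the record layout in `RECORD_SCHEMA` is an assumption made for illustration.

```python
import json

# Mapping from a few JSON Schema type names to Python types (subset only).
_TYPES = {
    "object": dict,
    "array": list,
    "string": str,
    "integer": int,
    "number": (int, float),
    "boolean": bool,
}


def validate(instance, schema, path="$"):
    """Return a list of error strings; an empty list means no problems found."""
    errors = []
    expected = schema.get("type")
    if expected is not None:
        ok = isinstance(instance, _TYPES[expected])
        # bool is a subclass of int in Python, but JSON keeps them distinct.
        if expected in ("integer", "number") and isinstance(instance, bool):
            ok = False
        if not ok:
            return [f"{path}: expected {expected}"]
    if isinstance(instance, dict):
        for key in schema.get("required", []):
            if key not in instance:
                errors.append(f"{path}: missing required property '{key}'")
        for key, subschema in schema.get("properties", {}).items():
            if key in instance:
                errors.extend(validate(instance[key], subschema, f"{path}.{key}"))
    if isinstance(instance, list) and "items" in schema:
        for i, item in enumerate(instance):
            errors.extend(validate(item, schema["items"], f"{path}[{i}]"))
    return errors


# Hypothetical schema for a simple record description.
RECORD_SCHEMA = {
    "type": "object",
    "required": ["id", "title", "created"],
    "properties": {
        "id": {"type": "string"},
        "title": {"type": "string"},
        "created": {"type": "string"},
        "pages": {"type": "integer"},
        "keywords": {"type": "array", "items": {"type": "string"}},
    },
}

good = json.loads(
    '{"id": "r-1", "title": "Memo", "created": "2019-06-28",'
    ' "pages": 2, "keywords": ["records"]}'
)
bad = json.loads('{"id": "r-2", "pages": "two", "keywords": ["ok", 7]}')

good_errors = validate(good, RECORD_SCHEMA)
bad_errors = validate(bad, RECORD_SCHEMA)
```

Here `good_errors` is empty, while `bad_errors` reports the two missing required properties and the two values of the wrong type. A production system would normally rely on an established validator rather than a hand-written subset like this one.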
The Portable Document Format (PDF) is a file format used to present documents in a manner independent of application software, hardware, and operating systems. Each PDF file encapsulates a complete description of the presentation of the document, including the text, fonts, graphics, and other information needed to display it. PDF/A is an ISO-standardized version of the PDF specialized for use in the archiving and long-term preservation of electronic documents. PDF/A-3 allows embedding of other file formats, including XML, into PDF/A conforming documents, thus potentially providing the best of both human- and machine-readability. The W3C's XSL-FO (XSL Formatting Objects) markup language is commonly used to generate PDF files.
Metadata, data about data, can be used to organize electronic resources, provide digital identification, and support the archiving and preservation of resources. In well-structured, machine-readable electronic records, the content can be repurposed as both data and metadata. In the context of electronic record-keeping systems, the terms "management" and "metadata" are virtually synonymous. Given proper metadata, records management functions can be automated, thereby reducing the risk of spoliation of evidence and other fraudulent manipulations of records. Moreover, such records can be used to automate the process of auditing data maintained in databases, thereby reducing the risk of single points of failure associated with the Machiavellian concept of a single source of truth.
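How metadata can drive automated records management may be easier to see in a small sketch. The example below assumes a hypothetical metadata layout (`RecordMetadata` with a creation date, a retention period in years, and a SHA-256 digest recorded at capture time); real record-keeping systems define their own fields and rules. It shows two automated checks: detecting that a record's content has changed since capture, and deciding whether its retention period has elapsed.

```python
import hashlib
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class RecordMetadata:
    """Hypothetical metadata kept alongside a record's content."""
    record_id: str
    created: date
    retention_years: int
    sha256: str  # digest recorded when the record was captured


def digest(content: bytes) -> str:
    """Hex SHA-256 digest of the record content."""
    return hashlib.sha256(content).hexdigest()


def is_unaltered(content: bytes, meta: RecordMetadata) -> bool:
    """True if the content still matches the digest stored in the metadata."""
    return digest(content) == meta.sha256


def eligible_for_disposal(meta: RecordMetadata, today: date) -> bool:
    """True once the retention period, counted in whole years, has elapsed."""
    before_anniversary = (today.month, today.day) < (
        meta.created.month,
        meta.created.day,
    )
    years_elapsed = today.year - meta.created.year - int(before_anniversary)
    return years_elapsed >= meta.retention_years


content = b"Original memo text"
meta = RecordMetadata("r-1", date(2015, 3, 1), 7, digest(content))

unaltered = is_unaltered(content, meta)
altered_detected = not is_unaltered(b"Altered memo text", meta)
too_early = eligible_for_disposal(meta, date(2020, 1, 1))
due = eligible_for_disposal(meta, date(2022, 3, 1))
```

Because both checks depend only on the metadata and the stored content, they can run on a schedule without human review, which is the sense in which proper metadata makes such functions automatable.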
Blockchains make it possible to create and maintain continuously growing lists of records secured from tampering and revision. A key feature is that every node in a decentralized system has a copy of the blockchain, so there is no single point of failure subject to manipulation and fraud.
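The tamper-evidence property can be illustrated with a minimal hash chain, in which each block stores the hash of the block before it. This is only a sketch under simplifying assumptions: the block layout and function names are hypothetical, and the distribution of copies across nodes and any consensus mechanism are omitted entirely.

```python
import hashlib
import json

GENESIS_PREV_HASH = "0" * 64  # placeholder "previous hash" for the first block


def block_hash(index, record, prev_hash):
    """Hash the block's fields in a canonical (sorted-key) JSON encoding."""
    payload = json.dumps(
        {"index": index, "record": record, "prev_hash": prev_hash},
        sort_keys=True,
    )
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def append_block(chain, record):
    """Append a record, linking it to the hash of the previous block."""
    prev_hash = chain[-1]["hash"] if chain else GENESIS_PREV_HASH
    index = len(chain)
    chain.append(
        {
            "index": index,
            "record": record,
            "prev_hash": prev_hash,
            "hash": block_hash(index, record, prev_hash),
        }
    )


def verify_chain(chain):
    """Recompute every hash and check every link; False on any mismatch."""
    prev_hash = GENESIS_PREV_HASH
    for i, block in enumerate(chain):
        if block["index"] != i or block["prev_hash"] != prev_hash:
            return False
        expected = block_hash(block["index"], block["record"], block["prev_hash"])
        if block["hash"] != expected:
            return False
        prev_hash = block["hash"]
    return True


chain = []
for record in ({"doc": "r-1", "action": "created"},
               {"doc": "r-1", "action": "approved"},
               {"doc": "r-2", "action": "created"}):
    append_block(chain, record)

intact = verify_chain(chain)

# Copy the chain and quietly rewrite one earlier record.
tampered = [dict(block) for block in chain]
tampered[1]["record"] = {"doc": "r-1", "action": "rejected"}
tamper_detected = not verify_chain(tampered)
```

Changing an earlier record invalidates its stored hash, and recomputing that hash would break the link held by the next block, so a revision cannot be made silently. In a decentralized system the other nodes' copies provide the additional check that this single-process sketch leaves out.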