Page Analysis and Ground Truth Elements (PAGE) is an XML standard for encoding digitised documents.[1] Comparable to Analyzed Layout and Text Object (ALTO), it allows the organisation and structure of a page and its contents to be described.
PAGE XML can be used to describe:[citation needed]
The format is developed by the Pattern Recognition & Image Analysis Lab (PRIMA) at the University of Salford in Manchester.[citation needed]
It was designed to be used in conjunction with automatic segmentation and transcription techniques such as optical character recognition (OCR) and handwritten text recognition (HTR). PAGE aims to support each step in the processing chain for document image analysis, from image enhancement through layout analysis to OCR.[citation needed]
The PAGE XML schema is used as an export and import format by automatic transcription software such as eScriptorium[2] and Transkribus.[3] It is also an export format used by Kraken, a turnkey OCR system optimised for documents in historical and non-Latin scripts,[4] and by the OCR software Tesseract.[5]
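As an illustration of how software might read such an export, the following is a minimal Python sketch that extracts text lines and their outline polygons from a PAGE XML document using only the standard library. The sample document, the file name `page_0001.png`, the region and line identifiers, and the helper names `parse_points` and `read_lines` are all hypothetical. The namespace is assumed to be the 2019-07-15 version of the PAGE content schema; files produced by a given tool may declare a different schema version, in which case the namespace string would need to be adjusted.

```python
import xml.etree.ElementTree as ET

# Assumption: the document uses the 2019-07-15 PAGE content schema namespace.
NS = {"pc": "http://schema.primaresearch.org/PAGE/gts/pagecontent/2019-07-15"}

# Hypothetical, hand-written sample; real exports contain more metadata.
SAMPLE = """<PcGts xmlns="http://schema.primaresearch.org/PAGE/gts/pagecontent/2019-07-15">
  <Page imageFilename="page_0001.png" imageWidth="1000" imageHeight="1500">
    <TextRegion id="r1">
      <Coords points="100,100 900,100 900,300 100,300"/>
      <TextLine id="r1l1">
        <Coords points="100,100 900,100 900,200 100,200"/>
        <TextEquiv><Unicode>First line</Unicode></TextEquiv>
      </TextLine>
      <TextLine id="r1l2">
        <Coords points="100,200 900,200 900,300 100,300"/>
        <TextEquiv><Unicode>Second line</Unicode></TextEquiv>
      </TextLine>
    </TextRegion>
  </Page>
</PcGts>"""


def parse_points(points):
    """Turn a 'x1,y1 x2,y2 ...' string into a list of (x, y) integer tuples."""
    return [tuple(int(v) for v in pair.split(",")) for pair in points.split()]


def read_lines(xml_text):
    """Return one dict per TextLine with its region id, line id, polygon and text."""
    root = ET.fromstring(xml_text)
    lines = []
    for region in root.iterfind(".//pc:TextRegion", NS):
        for line in region.iterfind("pc:TextLine", NS):
            coords = line.find("pc:Coords", NS)
            polygon = parse_points(coords.get("points", "")) if coords is not None else []
            # The transcription, if present, sits in TextEquiv/Unicode.
            text = line.findtext("pc:TextEquiv/pc:Unicode", default="", namespaces=NS)
            lines.append({
                "region": region.get("id"),
                "line": line.get("id"),
                "polygon": polygon,
                "text": text,
            })
    return lines


if __name__ == "__main__":
    for entry in read_lines(SAMPLE):
        print(entry["region"], entry["line"], entry["text"], entry["polygon"])
```

The sketch only handles text lines that are direct children of a text region; nested regions, word-level and glyph-level elements, and non-text region types are ignored.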