Page Analysis and Ground Truth Elements


Page Analysis and Ground Truth Elements (PAGE) is an XML standard for encoding digitised documents.[1] Comparable to Analyzed Layout and Text Object (ALTO), it allows the organisation and structure of a page and its contents to be described.

PAGE XML can be used to describe:[citation needed]

  • page content (regions, lines of text, words, glyphs, reading order, text content, ...)
  • the evaluation of the layout analysis (evaluation profiles, evaluation results, ...)
  • the cutting of the document image (cutting grids)
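As an illustration of the page-content use case, the following minimal sketch reads text lines and their outline polygons from a small PAGE XML document using only the Python standard library. The sample document, the file name `example.png`, and the helper names `parse_points` and `read_lines` are hypothetical and chosen for illustration. The namespace shown is that of the 2019-07-15 page-content schema; files produced against other schema versions carry a different date in the namespace URI, so it would need adjusting.

```python
import xml.etree.ElementTree as ET

# Namespace of the 2019-07-15 page-content schema (assumption: the file
# being read uses this version; other versions use a different date).
NS = {"pc": "http://schema.primaresearch.org/PAGE/gts/pagecontent/2019-07-15"}

# Hypothetical, hand-written sample: one region containing two text lines.
SAMPLE = """
<PcGts xmlns="http://schema.primaresearch.org/PAGE/gts/pagecontent/2019-07-15">
  <Page imageFilename="example.png" imageWidth="1000" imageHeight="1500">
    <TextRegion id="r1">
      <Coords points="100,100 900,100 900,300 100,300"/>
      <TextLine id="r1l1">
        <Coords points="100,100 900,100 900,200 100,200"/>
        <TextEquiv><Unicode>First line</Unicode></TextEquiv>
      </TextLine>
      <TextLine id="r1l2">
        <Coords points="100,200 900,200 900,300 100,300"/>
        <TextEquiv><Unicode>Second line</Unicode></TextEquiv>
      </TextLine>
    </TextRegion>
  </Page>
</PcGts>
"""


def parse_points(points):
    """Turn a 'x1,y1 x2,y2 ...' string into a list of (x, y) integer tuples."""
    return [tuple(int(v) for v in pair.split(",")) for pair in points.split()]


def read_lines(xml_text):
    """Return one dict per TextLine with its region id, line id, polygon and text."""
    root = ET.fromstring(xml_text)
    lines = []
    for region in root.iterfind(".//pc:TextRegion", NS):
        for line in region.iterfind("pc:TextLine", NS):
            coords = line.find("pc:Coords", NS)
            text = line.findtext("pc:TextEquiv/pc:Unicode", default="", namespaces=NS)
            lines.append({
                "region": region.get("id"),
                "line": line.get("id"),
                "polygon": parse_points(coords.get("points")) if coords is not None else [],
                "text": text,
            })
    return lines


if __name__ == "__main__":
    for entry in read_lines(SAMPLE):
        print(entry["region"], entry["line"], entry["text"])
```

The sketch covers only regions and lines; words, glyphs and reading order are further elements of the schema and are not handled here.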

The format is developed by the Pattern Recognition & Image Analysis Lab (PRIMA) at the University of Salford in Manchester.[citation needed]

It was designed for use alongside automatic segmentation and transcription techniques such as optical character recognition (OCR) and handwritten text recognition (HTR): PAGE aims to support each step in the processing chain for document image analysis, from image enhancement through layout analysis to OCR.[citation needed]

The PAGE XML schema is used as an export and import format by automatic transcription software such as eScriptorium[2] and Transkribus.[3] It is also an export format used by Kraken, a turnkey OCR system optimised for documents in historical and non-Latin scripts,[4] and by the OCR software Tesseract.[5]

References

  1. "PAGE-XML". July 12, 2022 – via GitHub.
  2. "eScripta – Digital Tools and Techniques for the Study of Ancient Writing".
  3. "How To Export Documents from Transkribus". READ-COOP.
  4. Kiessling, Benjamin (April 5, 2022). "The Kraken OCR system" – via GitHub.
  5. "Tesseract Open Source OCR Engine". GitHub. Retrieved 2025-07-07.
