JP2023001657A


Info

Publication number: JP2023001657A
Application number: JP2021102506A
Authority: JP
Inventors: Katsuhito Shimazaki; Shota Yokokawa
Original assignee: PFU Ltd
Current assignee: PFU Ltd
Priority date: 2021-06-21
Filing date: 2021-06-21
Publication date: 2023-01-06

Abstract

To provide an image processing apparatus which extracts a title from image data with higher accuracy.

SOLUTION: The image processing apparatus includes: a content extraction unit which extracts information about contents of an image from optically read image data; a logic selection unit which selects a determination logic for determining a title, in accordance with the information extracted by the content extraction unit; and a title determination unit which determines the title of the image data on the basis of the determination logic selected by the logic selection unit.

SELECTED DRAWING: Figure 4

Description

Translated from Japanese

The present invention relates to an image processing device, an image processing method, and a program.

For example, Patent Literature 1 discloses an information processing device comprising: a specifying unit that specifies a plurality of locations in image data where character strings are written; a layout information acquisition unit that acquires, for each of the specified locations, layout information including its position in the image data and the size of the written characters; and an estimation unit that calculates, for each of the locations, the likelihood that it is a character string having a predetermined attribute that may be included in the image data, based on its positional relationship and size relationship with the other locations, and estimates, based on the likelihood, the location where the character string having the predetermined attribute is written.

Further, Patent Literature 2 discloses a file management device that assigns a file name to a document image Ik according to page information extracted from the document image Ik.

Further, Patent Literature 3 discloses a document reading device 1 comprising: an image reading unit 10 that optically reads a document to acquire image data; a character string recognition unit 102 that recognizes, in the acquired image data, character blocks each consisting of a group of characters and the character strings contained in those blocks; a character string appearance identification unit 103 that identifies the character size, number of lines, area, and position in the document of each recognized character string; a document type determination unit 104 that determines the type of the document based on the identified information; a character string evaluation unit 105 that performs a weighted evaluation of each recognized character string on a plurality of evaluation items according to criteria corresponding to the document type determined by the document type determination unit 104; and a file name candidate selection unit 106 that selects, from among the recognized character strings, those given a high evaluation score by the character string evaluation unit 105 as file name candidates for the image data.

Patent Literature 1: Japanese Patent No. 6050843
Patent Literature 2: JP 2006-252455 A
Patent Literature 3: Japanese Patent No. 6753370

An object of the present invention is to provide an image processing device that extracts a title from image data with higher accuracy.

An image processing device according to the present invention includes: a content extraction unit that extracts information about the content of an image from optically read image data; a logic selection unit that selects, according to the information extracted by the content extraction unit, a determination logic for determining a title; and a title determination unit that determines the title of the image data based on the determination logic selected by the logic selection unit.

Preferably, the determination logic is logic for extracting a title from the content of the image, and the title determination unit extracts title elements from the image data according to the selected determination logic.

Preferably, the content extraction unit extracts from the image data at least one of information on words, writing style, character type, information on the appearance of character strings, color distribution, and character size distribution, and the logic selection unit selects the determination logic based on at least one of these items extracted by the content extraction unit.

Preferably, the content extraction unit extracts from the image data the frequency with which predetermined words appear or the positions at which predetermined words appear, and the logic selection unit selects the determination logic based on the appearance frequency or the appearance positions extracted by the content extraction unit.

Preferably, the logic selection unit selects the determination logic using a machine learning model whose input data are the frequency with which predetermined words appear and the positions at which predetermined words appear.

Preferably, the device further includes a file name assigning unit that assigns a file name including the title determined by the title determination unit to the data file of the image data.

Preferably, the determination logics include a determination logic that selects, based on the content of the image, the category to adopt from among a plurality of category names.

Further, an image processing method according to the present invention includes: a content extraction step of extracting information about the content of an image from optically read image data; a logic selection step of selecting, according to the information extracted in the content extraction step, a determination logic for determining a title; and a title determination step of determining the title of the image data based on the determination logic selected in the logic selection step.

Further, a program according to the present invention causes a computer to execute: a content extraction step of extracting information about the content of an image from optically read image data; a logic selection step of selecting, according to the information extracted in the content extraction step, a determination logic for determining a title; and a title determination step of determining the title of the image data based on the determination logic selected in the logic selection step.

Titles can be extracted from image data with higher accuracy.

FIG. 1 is a diagram illustrating the overall configuration of the image processing system 1.
FIG. 2 is a diagram explaining an example of failed title extraction.
FIG. 3 is a diagram illustrating the hardware configuration of the image processing device 2.
FIG. 4 is a diagram illustrating the functional configuration of the image processing device 2.
FIG. 5 is a flowchart explaining the overall operation (S10) of the image processing system 1.
FIG. 6 is a diagram illustrating determination logics corresponding to content.

FIG. 1 is a diagram illustrating the overall configuration of the image processing system 1.
As illustrated in FIG. 1, the image processing system 1 includes an image processing device 2 and a scanner device 4, which are connected to each other via a cable 7. In this example, connection by a cable 7 such as a USB cable is described as a specific example, but the connection is not limited to this and may, for example, be wireless. The functions of the image processing device 2 may also be built into the housing of the scanner device 4.
The image processing device 2 is a computer terminal that processes image data read by the scanner device 4. Specifically, for image data read continuously by the scanner device 4, the image processing device 2 determines a title and assigns a file name that includes the determined title.
The scanner device 4 is an optical image reading device. The scanner device 4 of this example includes an automatic document feeder that feeds documents set on the document table one sheet at a time, generates image data from the documents set on the document table, and transfers a data file of the generated image data to the image processing device 2.

In the above configuration, the scanner device 4 may scan a document, and the image processing device 2 may automatically extract the title of the scanned document and assign a file name including it, so that the content of the document can be grasped easily.
In this case, for conventional documents such as papers and reports, the title is usually written in large characters near the top center, so extracting it was easy. However, there are many kinds of documents, and titles often could not be extracted properly from presentation materials, booklets, pamphlets, newspapers, and the like. For example, in the case of the presentation material illustrated in FIG. 2, large characters at the top of the document are prioritized as title candidates, so the company name ("IT Corporation" in the figure) is mistakenly extracted instead of the document name ("Introduction of New Products for 20XX" in the figure). The same applies when the document is a newspaper, a newsletter, or the like.

Therefore, the image processing system 1 of the present embodiment switches the determination logic used for title determination according to the image content, which makes it possible to extract a title suited to that content. That is, because a single type of logic cannot handle varied content, the image processing device 2 selects, from among a plurality of determination logics, the one that matches the image content and determines the title with the selected logic.

FIG. 3 is a diagram illustrating the hardware configuration of the image processing device 2.
As illustrated in FIG. 3, the image processing device 2 has a CPU 200, a memory 202, an HDD 204, a network interface 206 (network IF 206), a display device 208, and an input device 210, and these components are connected to each other via a bus 212.
The CPU 200 is, for example, a central processing unit.
The memory 202 is, for example, a volatile memory and functions as a main storage device.
The HDD 204 is, for example, a hard disk drive and, as a non-volatile storage device, stores computer programs (for example, the image processing program 3 in FIG. 4) and other data files.
The network IF 206 is an interface for wired or wireless communication and realizes, for example, communication with the scanner device 4.
The display device 208 is, for example, a liquid crystal display.
The input device 210 is, for example, a keyboard and a mouse.

FIG. 4 is a diagram illustrating the functional configuration of the image processing device 2.
As illustrated in FIG. 4, the image processing program 3 is installed in the image processing device 2.
The image processing program 3 has a content extraction unit 300, a logic selection unit 310, a title determination unit 320, and a file name assigning unit 330.
Part or all of the image processing program 3 may be realized by hardware such as an ASIC, or may be realized by borrowing some functions of an OS (Operating System).

In the image processing program 3, the content extraction unit 300 extracts information about the content of an image from optically read image data. The extracted information includes, for example, information on words, writing style, character type, information on the appearance of character strings, color distribution, and character size distribution. More specifically, the content extraction unit 300 applies character recognition processing to the image data and extracts the frequency with which predetermined words appear and the positions at which they appear.

The logic selection unit 310 selects the determination logic for determining the title according to the information extracted by the content extraction unit 300. For example, the logic selection unit 310 selects the determination logic to adopt from among a plurality of determination logics based on at least one of information on words, writing style, character type, information on the appearance of character strings, color distribution, and character size distribution.

The title determination unit 320 determines the title of the image data based on the determination logic selected by the logic selection unit 310. For example, the title determination unit 320 extracts, as title elements, the character strings in the image region specified by the selected determination logic and determines the title using the extracted character strings.

The file name assigning unit 330 assigns a file name including the title determined by the title determination unit 320 to the data file of the image data. For example, the file name assigning unit 330 forms the file name by arranging the title determined by the title determination unit 320 together with the scan date.

FIG. 5 is a flowchart explaining the overall operation (S10) of the image processing system 1.
As illustrated in FIG. 5, in step 100 (S100), the scanner device 4 feeds a document set on the document table to the reading position, reads an image from the document, and transmits the read image data to the image processing device 2.
The content extraction unit 300 (FIG. 4) of the image processing device 2 extracts information about the content of the image from the image data received from the scanner device 4 and identifies the image content based on the extracted information. The extracted information includes, for example, the following items obtained by OCR processing: the appearance frequency of predetermined words; the positions at which predetermined words are written; numerical representations of the meaning of words and sentences as vectors (word2vec, sec2vec, etc.); the type of writing style (the polite "desu/masu" style, the hortative "-mashou" style, the assertive "da" style, etc.); and the proportion of each character type (kanji, hiragana, katakana, numerals, alphabet, etc.). It also includes image information such as layout and color. Certain words tend to appear frequently in school-related, community-related, and hobby-related documents; in newsletters, periodic documents, and forms, the same word is often written at a specific position; and writing style is often uniform within each content genre. The extracted information therefore makes it possible to identify the content.
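Two of the features named above, the appearance frequency of predetermined words and the proportion of each character type, can be sketched as follows. This is a minimal illustration that assumes the OCR text is already available as a string; the keyword list, the function names, and the Unicode ranges used for each character type are assumptions, not part of the patent.

```python
import re
from collections import Counter

# Hypothetical keyword list; the text only speaks of "predetermined words".
KEYWORDS = ["invoice", "meeting", "school", "PTA"]

def keyword_frequency(text, keywords=KEYWORDS):
    """Count case-insensitive occurrences of each predetermined word."""
    words = Counter(re.findall(r"\w+", text.lower()))
    return {k: words[k.lower()] for k in keywords}

def char_type_ratio(text):
    """Share of each character type among the non-whitespace characters."""
    classes = {
        "kanji": r"[\u4e00-\u9fff]",
        "hiragana": r"[\u3040-\u309f]",
        "katakana": r"[\u30a0-\u30ff]",
        "digit": r"[0-9\uff10-\uff19]",
        "latin": r"[A-Za-z\uff21-\uff3a\uff41-\uff5a]",
    }
    chars = re.sub(r"\s", "", text)
    total = len(chars) or 1  # avoid division by zero on empty input
    return {name: len(re.findall(pat, chars)) / total
            for name, pat in classes.items()}
```

A downstream classifier could take both dictionaries as its input vector; word positions and writing style would need additional OCR layout data that this sketch does not model.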

In step 105 (S105), the logic selection unit 310 selects the determination logic for determining the title according to the content identified by the content extraction unit 300. For example, as illustrated in FIG. 6, determination logics are prepared for each content classification (business documents, local government documents, children's school-related documents, etc.). The business document logic, the local government document logic, and the children's school-related logic in FIG. 6 are applied when the content has been identified successfully. The category-selection logic is applied when the content could not be identified: based on the content of the image, it selects one category from among a plurality of category names and takes the character string in the region corresponding to the selected category as the title.
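The selection in S105, including the fallback to the category-selection logic, might look like the sketch below. The logic identifiers, the per-class confidence scores, and the 0.5 threshold are all assumptions made for illustration; the patent does not state how "identification failed" is detected.

```python
from typing import Dict, Optional

# Hypothetical identifiers for the determination logics listed in FIG. 6.
LOGIC_BY_CONTENT = {
    "business": "business_document_logic",
    "local_government": "local_government_document_logic",
    "school": "school_document_logic",
}
FALLBACK_LOGIC = "category_selection_logic"

def identify_content(scores: Dict[str, float],
                     threshold: float = 0.5) -> Optional[str]:
    """Return the best-scoring content class, or None if none is confident."""
    if not scores:
        return None
    best = max(scores, key=scores.get)
    return best if scores[best] >= threshold else None

def select_logic(content: Optional[str]) -> str:
    """Pick the content-specific logic, falling back when content is unknown."""
    return LOGIC_BY_CONTENT.get(content, FALLBACK_LOGIC)
```

For example, `select_logic(identify_content({"business": 0.8, "school": 0.1}))` yields the business document logic, while low scores across all classes yield the fallback.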

In step 110 (S110), the title determination unit 320 determines the title of the image data based on the determination logic selected by the logic selection unit 310. For example, the title determination unit 320 extracts the character strings and their features from the image region specified by the determination logic and, based on the extracted features and the weighting coefficients defined by the determination logic, calculates for each extracted character string a score indicating how title-like it is. The features used to calculate the score are, for example, the absolute size of the character string, its size ratio (relative to the surrounding character size), its position in the document, the margin around it, specific keywords it contains, its color, and its decoration (bold, underline, enclosure in a frame or ornament, etc.). The title determination unit 320 determines the character string with the highest score to be the title.
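The weighted scoring in S110 can be illustrated with a linear score over a few of the listed features. The feature normalization, the weight values, and the two sample candidates (modeled loosely on the FIG. 2 example) are assumptions; the patent does not give the score formula.

```python
from dataclasses import dataclass

@dataclass
class Candidate:
    text: str
    size: float        # absolute character height in points
    size_ratio: float  # height relative to the surrounding text
    y: float           # vertical position: 0.0 = top, 1.0 = bottom
    has_keyword: bool  # contains a keyword defined by the logic
    bold: bool

def features(c: Candidate) -> dict:
    return {
        "size": c.size / 72.0,  # arbitrary normalization constant
        "size_ratio": c.size_ratio,
        "top": 1.0 - c.y,       # closer to the top gives a larger value
        "keyword": 1.0 if c.has_keyword else 0.0,
        "bold": 1.0 if c.bold else 0.0,
    }

def score(c: Candidate, weights: dict) -> float:
    """Weighted sum of features; weights come from the selected logic."""
    return sum(weights.get(k, 0.0) * v for k, v in features(c).items())

def determine_title(cands, weights) -> str:
    return max(cands, key=lambda c: score(c, weights)).text

company = Candidate("IT Corporation", 36, 2.0, 0.05, False, True)
product = Candidate("Introduction of New Products for 20XX", 30, 1.8, 0.45, True, True)

# Illustrative weight sets for two determination logics.
REPORT_WEIGHTS = {"size": 1.0, "size_ratio": 1.0, "top": 2.0, "bold": 0.5}
PRESENTATION_WEIGHTS = {"size": 1.0, "size_ratio": 1.0, "top": 0.2,
                        "keyword": 2.0, "bold": 0.5}
```

With the report-style weights the large string at the top wins, which reproduces the failure described for FIG. 2; with the presentation-style weights the document name wins instead.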

In step 115 (S115), the file name assigning unit 330 forms a file name by arranging the title determined by the title determination unit 320 together with the scan date, and automatically assigns that file name to the data file of the image data received from the scanner device 4.
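A file name of this kind could be assembled as in the sketch below. The order of title and date, the underscore separator, the date format, the default extension, and the character sanitization rule are all assumptions; the text only says that the title and the scan date are arranged into a file name.

```python
import re
from datetime import date

def build_file_name(title: str, scanned_on: date, ext: str = "pdf") -> str:
    """Join a sanitized title and the scan date into a file name."""
    # Replace whitespace and characters that many file systems reject.
    safe = re.sub(r'[\\/:*?"<>|\s]+', "_", title).strip("_")
    return f"{safe or 'untitled'}_{scanned_on:%Y%m%d}.{ext}"
```

Falling back to a placeholder such as `untitled` keeps the name valid when title determination returns an empty string.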

As described above, the image processing system 1 of the present embodiment selects a determination logic according to the content of the image data scanned by the scanner device 4 and extracts the title from the image data using the selected logic. Because the title is extracted with a determination logic suited to the document classification, the accuracy of title extraction can be expected to improve. As a result, an appropriate file name is assigned to the scanned image automatically, and users can find document files more efficiently.

(Modification of content identification processing)
Next, modifications of the above embodiment will be described. First, a modification of the content identification processing by the content extraction unit 300 will be described.
The content extraction unit 300 may identify the content of the scanned image using a machine learning model. That is, the content extraction unit 300 extracts features from the image data and identifies the content based on the extracted features and the learning model. The learning model is, for example, naive Bayes, logistic regression, an SVM (Support Vector Machine), or a random forest. A plurality of sample documents are prepared, and the decision boundary in feature space used for identification is calculated in advance with the learning model.
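As one concrete instance of the models listed, a multinomial naive Bayes classifier over OCR words can be written in a few lines of plain Python. This is a toy sketch: the class name, the training samples, and the use of add-one smoothing are assumptions, and a real system would more likely rely on an established machine learning library.

```python
import math
from collections import Counter, defaultdict

class NaiveBayes:
    """Multinomial naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, docs, labels):
        self.word_counts = defaultdict(Counter)
        self.class_counts = Counter(labels)
        for words, label in zip(docs, labels):
            self.word_counts[label].update(words)
        self.vocab = {w for c in self.word_counts.values() for w in c}
        return self

    def predict(self, words):
        total_docs = sum(self.class_counts.values())

        def log_posterior(label):
            counts = self.word_counts[label]
            denom = sum(counts.values()) + len(self.vocab)
            lp = math.log(self.class_counts[label] / total_docs)
            for w in words:
                if w in self.vocab:  # words never seen in training are ignored
                    lp += math.log((counts[w] + 1) / denom)
            return lp

        return max(self.class_counts, key=log_posterior)

# Hypothetical sample documents, given as lists of OCR words.
SAMPLES = [
    (["invoice", "payment", "due"], "business"),
    (["quotation", "payment", "order"], "business"),
    (["school", "lunch", "pta"], "school"),
    (["school", "trip", "parents"], "school"),
]
model = NaiveBayes().fit([d for d, _ in SAMPLES], [l for _, l in SAMPLES])
```

Calling `model.predict(["payment", "invoice"])` classifies the words as a business document under these samples.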

The content extraction unit 300 may also combine the above method with content classification by advanced language processing such as BERT (Bidirectional Encoder Representations from Transformers) or topic models, content classification by layout information such as the position and size of character strings and ruled lines, or content classification by object recognition in photographs and illustrations. More accurate content identification can then be expected.
For example, when object recognition is applied to photographs and illustrations, the content can be classified as cooking-related if pasta, steak, omelet rice, or the like is recognized; as pet-related if a dog, cat, tropical fish, chameleon, or the like is recognized; as camping-related if a tent, bonfire, sleeping bag, lantern, or the like is recognized; and as sports-car-related if a Ferrari, Corvette, Honda NSX, or the like is recognized.
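The mapping from recognized objects to a content category can be sketched as a simple vote. The label sets below are taken from the examples in the text; the category keys, the voting rule, and the assumption that an object recognizer has already produced a list of label strings are illustrative.

```python
from collections import Counter

# Label sets from the examples in the text; category keys are illustrative.
CATEGORY_LABELS = {
    "cooking": {"pasta", "steak", "omelet rice"},
    "pets": {"dog", "cat", "tropical fish", "chameleon"},
    "camping": {"tent", "bonfire", "sleeping bag", "lantern"},
    "sports cars": {"ferrari", "corvette", "honda nsx"},
}

def classify_by_objects(labels):
    """Return the category with the most recognized objects, or None."""
    votes = Counter()
    for label in labels:
        for category, members in CATEGORY_LABELS.items():
            if label.lower() in members:
                votes[category] += 1
    return votes.most_common(1)[0][0] if votes else None
```

For instance, `classify_by_objects(["Tent", "lantern", "dog"])` returns `"camping"` because two of the three labels belong to that set.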

The content extraction unit 300 may also classify content based on image features such as color distribution and layout in the image data. Examples of image features include: a color distribution feature of the document (the image is divided into small regions and the number of colors used in each region is counted); a per-color pixel count distribution (for each color, the number of pixels drawn in that color is counted); a character position distribution (the image is divided into small regions and the number of characters in each region is counted); a character size distribution (for each character size, the number of characters written in that size is counted); a line direction feature (character strings are separated into vertical and horizontal writing and the number of characters in each is counted); and a multi-column feature (the number of columns is estimated from a block of character strings).
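Two of these features, the character position distribution and the character size distribution, reduce to simple counting once OCR has returned character coordinates and sizes. The 3 x 3 grid, the 4-point size bins, and the function names in this sketch are assumptions.

```python
from collections import Counter

def char_position_distribution(chars, width, height, rows=3, cols=3):
    """Count characters per grid cell; chars is a list of (x, y) centers."""
    grid = [[0] * cols for _ in range(rows)]
    for x, y in chars:
        r = min(int(y / height * rows), rows - 1)  # clamp the bottom edge
        c = min(int(x / width * cols), cols - 1)   # clamp the right edge
        grid[r][c] += 1
    return grid

def char_size_distribution(sizes, bin_width=4):
    """Histogram of character sizes, keyed by the lower bound of each bin."""
    return dict(Counter(int(s // bin_width) * bin_width for s in sizes))
```

Flattening the grid and the histogram into one vector gives a fixed-length input that any of the classifiers mentioned earlier could consume.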

(Modification of title determination processing)
Next, a modification of the title determination processing by the title determination unit 320 will be described.
The title determination unit 320 may also determine the title using a machine learning model. For example, the title determination unit 320 calculates the score by machine learning. The model used is, for example, naive Bayes, logistic regression, an SVM, or a random forest. A plurality of sample documents are prepared, and the learning model is used to build a determiner that judges whether each character string in a document is a title. This title determiner outputs both the judgment of whether the string is a title and a title-likeness score. At extraction time, the title determination unit 320 applies the title determiner to each character string in the document and outputs, from among the strings judged to be titles, the one with the highest title-likeness score as the title.
The model is trained separately for each content classification. Training per content classification means adjusting the weight parameters of the above features for each classification. A plurality of sample documents are prepared for each content classification, and the learning model is trained on whether each string is a title. The result is a title extractor optimized for each content classification.
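The per-classification determiner described here can be sketched with a logistic model whose weights differ by content classification. The hand-set weights below stand in for trained parameters and are purely hypothetical, as are the feature names and the 0.5 decision threshold.

```python
import math

def make_judge(weights, bias):
    """Build a determiner returning (is_title, title_likeness_score)."""
    def judge(feats):
        z = bias + sum(weights.get(k, 0.0) * v for k, v in feats.items())
        p = 1.0 / (1.0 + math.exp(-z))  # logistic function
        return p >= 0.5, p
    return judge

# Hypothetical parameters; in practice they would be learned per classification.
JUDGES = {
    "business": make_judge({"size_ratio": 2.0, "top": 1.0}, bias=-4.0),
    "newsletter": make_judge({"size_ratio": 1.0, "top": 3.0}, bias=-3.5),
}

def extract_title(candidates, content):
    """Among strings judged to be titles, return the highest-scoring one."""
    judge = JUDGES[content]
    best_text, best_score = None, 0.0
    for text, feats in candidates:
        is_title, s = judge(feats)
        if is_title and s > best_score:
            best_text, best_score = text, s
    return best_text

CANDIDATES = [
    ("ACME Monthly", {"size_ratio": 1.5, "top": 0.95}),
    ("Big Sale!", {"size_ratio": 2.5, "top": 0.3}),
    ("body text", {"size_ratio": 1.0, "top": 0.5}),
]
```

The same candidates yield different titles depending on which determiner is used, which is the point of training one model per content classification.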

The title determination unit 320 and the file name assigning unit 330 may switch the extraction and title generation engine for each content type, applying the title extraction logic suited to that content. Specifically, for business documents, a layout-based title extraction engine is used. For invoices, the title determination unit 320 and the file name assigning unit 330 use an item-extraction-based engine that extracts items such as the billing company name and the payment due date, and concatenate the extracted item values to form the file name. For newsletters, they extract the newsletter name on a layout basis, extract the issue date and issue number on an item-extraction basis, and concatenate them to form the file name. For novels and other books, as with newsletters, they extract the book title on a layout basis, extract the author and publisher on an item-extraction basis, and concatenate them to form the file name.
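Engine switching of this kind can be expressed as a dispatch table from content type to a naming function. In this sketch the extracted items are assumed to arrive as a dictionary; the field names, the content type keys, and the underscore separator are hypothetical.

```python
def business_name(fields):
    return fields["title"]  # layout-based title only

def invoice_name(fields):
    return "_".join([fields["billing_company"], fields["payment_due"]])

def newsletter_name(fields):
    return "_".join([fields["title"], fields["issue_date"], fields["issue_number"]])

def book_name(fields):
    return "_".join([fields["title"], fields["author"], fields["publisher"]])

# Dispatch table: content type -> naming engine.
ENGINES = {
    "business": business_name,
    "invoice": invoice_name,
    "newsletter": newsletter_name,
    "book": book_name,
}

def make_name(content, fields):
    """Compose the file name stem with the engine for this content type."""
    return ENGINES[content](fields)
```

Adding support for a new document type then only requires registering one more function in the table.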

(Other modifications)
In the above embodiment, the title determined by the title determination unit 320 is used as part of the file name, but the use of the title is not limited to this; for example, it may be used as a search keyword.

REFERENCE SIGNS LIST
1 ... image processing system
2 ... image processing device
3 ... image processing program
4 ... scanner device
300 ... content extraction unit
310 ... logic selection unit
320 ... title determination unit
330 ... file name assigning unit

Claims

Translated from Japanese

An image processing device comprising:
a content extraction unit that extracts information about the content of an image from optically read image data;
a logic selection unit that selects, according to the information extracted by the content extraction unit, a determination logic for determining a title; and
a title determination unit that determines the title of the image data based on the determination logic selected by the logic selection unit.

The image processing device according to claim 1, wherein
the determination logic is logic for extracting a title from the content of the image, and
the title determination unit extracts title elements from the image data according to the selected determination logic.

The image processing device according to claim 2, wherein
the content extraction unit extracts from the image data at least one of information on words, writing style, character type, information on the appearance of character strings, color distribution, and character size distribution, and
the logic selection unit selects the determination logic based on at least one of the information on words, writing style, character type, information on the appearance of character strings, color distribution, and character size distribution extracted by the content extraction unit.

The image processing device according to claim 3, wherein
the content extraction unit extracts from the image data the frequency with which predetermined words appear or the positions at which predetermined words appear, and
the logic selection unit selects the determination logic based on the appearance frequency or the appearance positions of the predetermined words extracted by the content extraction unit.

The image processing device according to claim 4, wherein the logic selection unit selects the determination logic using a machine learning model whose input data are the frequency with which predetermined words appear and the positions at which predetermined words appear.

The image processing device according to claim 5, further comprising a file name assigning unit that assigns a file name including the title determined by the title determination unit to the data file of the image data.

The image processing device according to claim 6, wherein the determination logics include a determination logic that selects, based on the content of the image, the category to adopt from among a plurality of category names.

An image processing method comprising:
a content extraction step of extracting information about the content of an image from optically read image data;
a logic selection step of selecting, according to the information extracted in the content extraction step, a determination logic for determining a title; and
a title determination step of determining the title of the image data based on the determination logic selected in the logic selection step.

A program causing a computer to execute:
a content extraction step of extracting information about the content of an image from optically read image data;
a logic selection step of selecting, according to the information extracted in the content extraction step, a determination logic for determining a title; and
a title determination step of determining the title of the image data based on the determination logic selected in the logic selection step.