A Word Association Based Approach for Improving Retrieval Performance from Noisy OCRed Text

Authors: Anirban Chakraborty (1); Kripabandhu Ghosh (1) and Utpal Roy (2)

Affiliations: (1) Indian Statistical Institute, India; (2) Visva-Bharati, India

Keyword(s): Erroneous Text, Cooccurrence, Pointwise Mutual Information.

Related Ontology Subjects/Areas/Topics: Artificial Intelligence; Clustering and Classification Methods; Knowledge Discovery and Information Retrieval; Knowledge-Based Systems; Symbolic Systems

Abstract: OCR errors hurt retrieval performance to a great extent. Research has been done on modelling and correction of OCR errors. However, most of the existing systems use language dependent resources or training texts for studying the nature of errors. Not much research has been reported on improving retrieval performance from erroneous text when no training data is available. We propose an algorithm of detecting OCR errors and improving retrieval performance from the erroneous corpus. We present two versions of the algorithm: one based on word cooccurrence and the other based on Pointwise Mutual Information. Our algorithm does not use any training data or any language specific resources like thesaurus. It also does not use any knowledge about the language except that the word delimiter is a blank space. We have tested our algorithm on erroneous Bangla FIRE collection and obtained significant improvements.
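
The abstract names two association measures, word cooccurrence and Pointwise Mutual Information, but this record does not reproduce the algorithm itself. The sketch below is therefore not the authors' method. It only illustrates the standard definition PMI(x, y) = log2( P(x, y) / (P(x) P(y)) ) on text split at blank spaces. The function name `pmi_table`, the choice of document-level cooccurrence, the document-frequency probability estimates, and the toy corpus are all assumptions made for illustration.

```python
import math
from collections import Counter
from itertools import combinations


def pmi_table(documents):
    """Return {(w1, w2): PMI} for word pairs sharing at least one document.

    Probabilities are estimated from document frequencies. That estimator
    is an assumption of this sketch, not something taken from the paper.
    """
    n_docs = len(documents)
    word_df = Counter()  # number of documents containing each word
    pair_df = Counter()  # number of documents containing both words
    for doc in documents:
        # The only language knowledge used: words are separated by blanks.
        vocab = sorted(set(doc.split()))
        word_df.update(vocab)
        pair_df.update(combinations(vocab, 2))  # pairs are in sorted order
    return {
        (a, b): math.log2(
            (n_ab / n_docs) / ((word_df[a] / n_docs) * (word_df[b] / n_docs))
        )
        for (a, b), n_ab in pair_df.items()
    }


# Hypothetical toy corpus; "rivcr" stands in for an OCR-damaged "river".
docs = [
    "river bank flood",
    "river bank water",
    "rivcr bank flood",
    "market price",
]
scores = pmi_table(docs)
```

In this toy corpus the damaged form "rivcr" shares its context words ("bank", "flood") with "river", which is the kind of signal an association-based approach could exploit without training data or a thesaurus. Pair keys are stored in alphabetical order, so look up `scores[("bank", "river")]` rather than `scores[("river", "bank")]`.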

CC BY-NC-ND 4.0

Paper citation in several formats:
Chakraborty, A., Ghosh, K. and Roy, U. (2014). A Word Association Based Approach for Improving Retrieval Performance from Noisy OCRed Text. In Proceedings of the International Conference on Knowledge Discovery and Information Retrieval (IC3K 2014) - KDIR; ISBN 978-989-758-048-2; ISSN 2184-3228, SciTePress, pages 450-456. DOI: 10.5220/0005157304500456

@conference{kdir14,
author={Anirban Chakraborty and Kripabandhu Ghosh and Utpal Roy},
title={A Word Association Based Approach for Improving Retrieval Performance from Noisy OCRed Text},
booktitle={Proceedings of the International Conference on Knowledge Discovery and Information Retrieval (IC3K 2014) - KDIR},
year={2014},
pages={450-456},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0005157304500456},
isbn={978-989-758-048-2},
issn={2184-3228},
}

TY - CONF

JO - Proceedings of the International Conference on Knowledge Discovery and Information Retrieval (IC3K 2014) - KDIR
TI - A Word Association Based Approach for Improving Retrieval Performance from Noisy OCRed Text
SN - 978-989-758-048-2
IS - 2184-3228
AU - Chakraborty, A.
AU - Ghosh, K.
AU - Roy, U.
PY - 2014
SP - 450
EP - 456
DO - 10.5220/0005157304500456
PB - SciTePress
