Information and Media Technologies
Online ISSN : 1881-0896
ISSN-L : 1881-0896
Media (processing) and Interaction
A Web Corpus and Word Sketches for Japanese
Irena Srdanović Erjavec, Tomaz Erjavec, Adam Kilgarriff
Author information
  • Irena Srdanović Erjavec

    Tokyo Institute of Technology

  • Tomaz Erjavec

    Jozef Stefan Institute

  • Adam Kilgarriff

    Lexical Computing Ltd. and Universities of Leeds and Sussex

Keywords: Japanese web corpus, Corpus query tool, Sketch Engine, Word sketches
Journal, free access

2008 Volume 3 Issue 3 Pages 529-551

DOI: https://doi.org/10.11185/imt.3.529

Published: 2008; Received: July 03, 2007; Available on J-STAGE: September 15, 2008
Abstract
Of all the major world languages, Japanese is lagging behind in terms of publicly accessible and searchable corpora. In this paper we describe the development of JpWaC (Japanese Web as Corpus), a large corpus of 400 million words of Japanese web text, and its encoding for the Sketch Engine. The Sketch Engine is a web-based corpus query tool that supports fast concordancing, grammatical processing, ‘word sketching’ (one-page summaries of a word's grammatical and collocational behaviour), a distributional thesaurus, and robot use. We describe the steps taken to gather and process the corpus and to establish its validity in terms of the kinds of language it contains. We then describe the development of a shallow grammar for Japanese to enable word sketching. We believe that the Japanese web corpus, as loaded into the Sketch Engine, will be a useful resource for a wide range of Japanese researchers, learners, and NLP developers.
© 2008 by The Association for Natural Language Processing