A Parallel Greek-Bulgarian Corpus: A Digital Resource of the Shared Cultural Heritage

Part of the book series: Theory and Applications of Natural Language Processing (NLP)


Abstract

There has been a long tradition in the digitization and manual documentation of cultural heritage data, yet the need for indexing and retrieval that goes beyond mere bibliographic information has only recently been recognized. This chapter reports on completed work that aimed to highlight textual cultural resources which, as yet, remain under-exploited, by creating the necessary infrastructure with the support and customization of Language Technologies (LT). The ultimate goal was to promote the study of the cultural heritage of the neighboring areas of Greece and Bulgaria and to raise awareness of their common cultural identity, with a focus on literature, folklore, and language. To this end, a bilingual collection of literary and folklore texts in Greek and Bulgarian was developed, along with a number of accompanying resources. The authors present the methodology adopted for the automatic annotation of the textual data at various levels of linguistic analysis, elaborate on the Greek and Bulgarian text processing tools integrated into the cross-lingual search and retrieval mechanisms, and discuss issues and problems encountered during the project life-cycle.


Similar content being viewed by others

Historical corpora meet the digital humanities: the Jerusalem Corpus of Emergent Modern Hebrew (Article, 04 June 2019)

MegaLite-2: An Extended Bilingual Comparative Literary Corpus

Automatic Translation and Multilingual Cultural Heritage Retrieval: A Case Study with Transcriptions in Europeana


Acknowledgements

The work presented here was conducted in the framework of a project funded under the Community Initiative Programme INTERREG III A / PHARE CBC Greece – Bulgaria. The project was implemented by the Institute for Language and Speech Processing (ILSP, http://www.ilsp.gr) and a group of researchers from the Bulgarian Academy of Sciences (http://www.bultreebank.org/).

Author information

Authors and Affiliations

Institute for Language and Speech Processing Epidavrou 6 & Artemidos, 15125, Athens, Greece
Voula Giouli
Institute of Parallel Processing, Bulgarian Academy of Sciences, Acad. G. Bonchev 25A, 1113, Sofia, Bulgaria
Kiril Simov & Petya Osenova


Corresponding author

Correspondence to Voula Giouli.

Editor information

Editors and Affiliations

Computational Linguistics / MMCI, Saarland University, Saarbrücken, 66041, Germany
Caroline Sporleder
Fac. Humanities, Tilburg University, Tilburg, Netherlands
Antal van den Bosch
Tilburg School for Humanities, Tilburg Center for Cognition and Communi, University of Tilburg, Tilburg, 5000, Netherlands
Kalliopi Zervanou


About this paper

Cite this paper

Giouli, V., Simov, K., Osenova, P. (2011). A Parallel Greek-Bulgarian Corpus: A Digital Resource of the Shared Cultural Heritage. In: Sporleder, C., van den Bosch, A., Zervanou, K. (eds) Language Technology for Cultural Heritage. Theory and Applications of Natural Language Processing. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-20227-8_6


DOI: https://doi.org/10.1007/978-3-642-20227-8_6
Published: 26 April 2011
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-20226-1
Online ISBN: 978-3-642-20227-8
eBook Packages: Computer Science, Computer Science (R0)
