Part of the book series: Theory and Applications of Natural Language Processing (NLP)
Abstract
There is a long tradition of digitizing and manually documenting cultural heritage data, yet the need for indexing and retrieval that goes beyond mere bibliographic information has only recently been recognized. This chapter reports on completed work aimed at highlighting textual cultural resources that remain under-exploited, by creating the necessary infrastructure through the support and customization of Language Technologies (LT). The ultimate goal was to promote the study of the cultural heritage of the neighboring areas of Greece and Bulgaria and to raise awareness of their common cultural identity, with a focus on literature, folklore and language. To this end, a bilingual collection of literary and folklore texts in Greek and Bulgarian was developed, along with a number of accompanying resources. The authors present the methodology adopted for the automatic annotation of the textual data at various levels of linguistic analysis, elaborate on the Greek and Bulgarian text processing tools that are integrated in the cross-lingual search and retrieval mechanisms, and discuss issues and problems encountered in the course of the project life cycle.
Acknowledgements
The work presented here was conducted in the framework of a project funded under the Community Initiative Programme INTERREG III A / PHARE CBC Greece – Bulgaria. The project was implemented by the Institute for Language and Speech Processing (ILSP, http://www.ilsp.gr) and a group of researchers from the Bulgarian Academy of Sciences (http://www.bultreebank.org/).
Author information
Authors and Affiliations
Institute for Language and Speech Processing, Epidavrou 6 & Artemidos, 15125, Athens, Greece
Voula Giouli
Institute of Parallel Processing, Bulgarian Academy of Sciences, Acad. G. Bonchev 25A, 1113, Sofia, Bulgaria
Kiril Simov & Petya Osenova
Corresponding author
Correspondence to Voula Giouli.
Editor information
Editors and Affiliations
Computational Linguistics / MMCI, Saarland University, Saarbrücken, 66041, Germany
Caroline Sporleder
Fac. Humanities, Tilburg University, Tilburg, Netherlands
Antal van den Bosch
Tilburg School for Humanities, Tilburg Center for Cognition and Communi, University of Tilburg, Tilburg, 5000, Netherlands
Kalliopi Zervanou
Copyright information
© 2011 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Giouli, V., Simov, K., Osenova, P. (2011). A Parallel Greek-Bulgarian Corpus: A Digital Resource of the Shared Cultural Heritage. In: Sporleder, C., van den Bosch, A., Zervanou, K. (eds) Language Technology for Cultural Heritage. Theory and Applications of Natural Language Processing. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-20227-8_6
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-20226-1
Online ISBN: 978-3-642-20227-8
eBook Packages: Computer Science, Computer Science (R0)