- Ted Briscoe
- Karl Harrison
- Andrew Naish
- Andy Parker
- Marek Rei
- Advaith Siddharthan
- David Sinclair
- Mark Slater
- …
- Rebecca Watson
Part of the book series: The Information Retrieval Series (INRE, volume 29)
Abstract
We describe a novel search engine for scientific literature. The system supports sentence-level search starting from portable document format (PDF) files and integrates text and image search, so that, for example, information presented in tables and figures can be retrieved using both image and caption content. In addition, the system lets the user construct complex queries intuitively, specifying search terms that must stand in particular grammatical (and thus implicitly semantic) relations to one another. Grid processing techniques are used to parallelise the analysis of large numbers of scientific papers. User evaluations are under way; here we report a preliminary evaluation and a comparison with Google Scholar that demonstrate the potential utility of the novel features. Finally, we discuss future work and the system's potential for, and complementarity with, patent search.
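The idea of querying for terms linked by a grammatical relation can be illustrated with a toy inverted index. The sketch below is hypothetical and is not the chapter's implementation: it assumes sentences have already been parsed into (relation, head, dependent) triples using RASP-style labels such as `ncsubj` and `dobj`, and the class name `GRIndex` and its methods are invented for illustration. Any of the three slots in a query may be left as a wildcard (`None`).

```python
from collections import defaultdict
from itertools import product
from typing import Dict, List, Optional, Set, Tuple

# Hypothetical query pattern: (relation, head, dependent); None means "any".
Pattern = Tuple[Optional[str], Optional[str], Optional[str]]


class GRIndex:
    """Toy inverted index from grammatical-relation triples to sentences."""

    def __init__(self) -> None:
        self.sentences: List[str] = []
        self.postings: Dict[Pattern, Set[int]] = defaultdict(set)

    def add(self, text: str, relations: List[Tuple[str, str, str]]) -> int:
        """Store a sentence and index each of its (already parsed) triples."""
        sid = len(self.sentences)
        self.sentences.append(text)
        for rel in relations:
            norm = tuple(v.lower() for v in rel)
            # Register the triple under all 8 wildcard masks so that a query
            # can leave any of relation, head or dependent unspecified.
            for mask in product((True, False), repeat=3):
                key = tuple(v if keep else None for v, keep in zip(norm, mask))
                self.postings[key].add(sid)
        return sid

    def search(self, *patterns: Pattern) -> List[str]:
        """Return sentences containing a match for every pattern (AND query)."""
        if not patterns:
            return []
        hits: Optional[Set[int]] = None
        for pat in patterns:
            key = tuple(v.lower() if v is not None else None for v in pat)
            ids = self.postings.get(key, set())  # .get avoids creating entries
            hits = ids if hits is None else hits & ids
        return [self.sentences[i] for i in sorted(hits)]


if __name__ == "__main__":
    idx = GRIndex()
    idx.add("Dpp binds Tkv.", [("ncsubj", "bind", "Dpp"), ("dobj", "bind", "Tkv")])
    idx.add("Tkv binds Dpp.", [("ncsubj", "bind", "Tkv"), ("dobj", "bind", "Dpp")])
    idx.add("Dpp signalling is reduced.",
            [("ncsubj", "reduce", "signalling"), ("ncmod", "signalling", "Dpp")])
    print(idx.search(("ncsubj", "bind", "Dpp")))  # ['Dpp binds Tkv.']
    print(idx.search((None, "bind", "Dpp")))      # ['Dpp binds Tkv.', 'Tkv binds Dpp.']
```

A plain keyword search for "Dpp" would return all three example sentences, whereas the `ncsubj` query returns only the one in which Dpp is the subject of "bind". Indexing every triple under all eight wildcard masks trades index size for direct lookups; a production system would need a more compact scheme.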
Acknowledgements
This work was supported in part by a BBSRC e-Science programme grant to the University of Cambridge (FlySlip), and a STFC miniPIPSS grant to the University of Cambridge and iLexIR Ltd (Scalable and Robust Grid-based Text Mining of Scientific Papers). This chapter is an extended version of a paper that appeared in the demonstration session of the annual conference of the North American Chapter of the Association for Computational Linguistics (NAACL) in June 2010.
Author information
Authors and Affiliations
University of Cambridge, Cambridge, UK
Ted Briscoe, Karl Harrison, Andy Parker, Marek Rei & Mark Slater
iLexIR Ltd, Cambridge, UK
Ted Briscoe & Rebecca Watson
Camtology Ltd, Cambridge, UK
Andrew Naish, Andy Parker & David Sinclair
University of Aberdeen, Aberdeen, UK
Advaith Siddharthan
Corresponding author
Correspondence to Ted Briscoe.
Editor information
Editors and Affiliations
Information Retrieval Facility, Donau-City Straße 1, Vienna, 1220, Austria
Mihai Lupu
Information Retrieval Facility, Donau-City Straße 1, Vienna, 1220, Austria
Katja Mayer
Information Retrieval Facility, Donau-City Straße 1, Vienna, 1220, Austria
John Tait
3LP Advisors, Post Rd. 7003, Dublin, 43016, Ohio, USA
Anthony J. Trippe
Copyright information
© 2011 Springer-Verlag Berlin Heidelberg
About this chapter
Cite this chapter
Briscoe, T. et al. (2011). Intelligent Information Access from Scientific Papers. In: Lupu, M., Mayer, K., Tait, J., Trippe, A. (eds) Current Challenges in Patent Information Retrieval. The Information Retrieval Series, vol 29. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-19231-9_16
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-19230-2
Online ISBN: 978-3-642-19231-9
eBook Packages: Computer Science, Computer Science (R0)