Part of the book series: Lecture Notes in Computer Science (LNISA, volume 8505)
Abstract
Many applications, such as Twitter, Yelp, or Facebook, produce documents that are tagged with geolocations. For example, when a user tweets using Twitter, the tweets are tagged with the user’s location (inferred from the user’s IP address or mobile GPS). These locations, however, are computed with inherent uncertainty. In such scenarios, it is desirable to support search queries that take into account both text relevancy and location proximity. In this paper, we study the problem of text retrieval queries on probabilistic spatial data. We consider top-(\(c\),\(k\)) queries to capture the semantics of both textual relevance and probabilistic location proximity. A top-(\(c\),\(k\)) query returns the \(k\) tuples that have the highest probability of being in the top-\(c\) query results under the possible world semantics. We propose a framework to answer such queries. Our framework integrates two components: scoring textual similarity based on the query text and the document text; and calculating top-\(c\) confidence based on the probability of the document falling within the query region. We develop an IRTree-based Incremental Scoring Approach (ISA) that returns an iterator over tuples in decreasing order of text similarity. Our parameterized probabilistic ranking algorithm, \(PRank^c\), consumes the output of ISA interactively and calculates the top-\(c\) confidence of these tuples in linear time. We also provide a heuristic optimization that terminates the \(PRank^c\) algorithm early without compromising result quality. We conduct experiments on real data to show the efficiency of this framework.
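To make the top-(\(c\),\(k\)) semantics concrete, the following is a minimal sketch, not the paper's \(PRank^c\) algorithm. It assumes that tuples are independent, that each tuple carries a single probability of falling within the query region, and that tuples arrive in decreasing order of text similarity. Under those assumptions, a tuple is in the top-\(c\) of a possible world exactly when it exists and fewer than \(c\) higher-scored tuples exist. The function names `top_c_confidence` and `top_c_k` and the tuple layout are hypothetical.

```python
from typing import List, Tuple


def top_c_confidence(probs: List[float], c: int) -> List[float]:
    """Top-c confidence for tuples listed in decreasing text-score order.

    probs[i] is the (assumed independent) probability that tuple i lies
    in the query region. Runs in O(n * c) time, i.e. linear in n for fixed c.
    """
    # count[j] = probability that exactly j of the tuples seen so far exist,
    # tracked only for j = 0 .. c-1 (larger counts cannot help later tuples).
    count = [1.0] + [0.0] * (c - 1)
    conf = []
    for p in probs:
        # Tuple is in the top-c iff it exists and fewer than c predecessors exist.
        conf.append(p * sum(count))
        # Fold this tuple into the running distribution of existing tuples.
        for j in range(c - 1, 0, -1):
            count[j] = count[j] * (1.0 - p) + count[j - 1] * p
        count[0] *= (1.0 - p)
    return conf


def top_c_k(docs: List[Tuple[str, float, float]], c: int, k: int) -> List[Tuple[str, float]]:
    """docs holds (doc_id, text_score, region_probability) triples (hypothetical layout).

    Returns the k documents with the highest top-c confidence.
    """
    ordered = sorted(docs, key=lambda d: d[1], reverse=True)
    conf = top_c_confidence([d[2] for d in ordered], c)
    ranked = sorted(zip((d[0] for d in ordered), conf), key=lambda x: x[1], reverse=True)
    return ranked[:k]
```

For example, with three documents whose region probabilities are all 0.5 and \(c = 2\), the first two documents each have confidence 0.5 and the third has 0.375, because it needs at most one of the two better-scored documents to exist.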
Acknowledgements
The work in this paper was supported by National Science Foundation grants IIS-1017990 and IIS-09168724.
Author information
Feng Gao
Present address: Fudan University, Shanghai, China
Authors and Affiliations
Department of Computer Sciences, Purdue University, West Lafayette, IN, USA
Feng Gao, Rohit Jain, Sunil Prabhakar & Luo Si
Corresponding author
Correspondence to Rohit Jain.
Editor information
Editors and Affiliations
Pohang University of Science and Technology (POSTECH), Pohang, Korea, Republic of (South Korea)
Wook-Shin Han
National University of Singapore, Singapore, Singapore
Mong Li Lee
Udayana University, Badung, Indonesia
Agus Muliantara
Udayana University, Badung, Indonesia
Ngurah Agus Sanjaya
Christian-Albrechts-Universität zu Kiel Institut für Informatik, Kiel, Germany
Bernhard Thalheim
Fudan University, Shanghai, China
Shuigeng Zhou
Copyright information
© 2014 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Gao, F., Jain, R., Prabhakar, S., Si, L. (2014). ProbKS: Keyword Search on Probabilistic Spatial Data. In: Han, WS., Lee, M., Muliantara, A., Sanjaya, N., Thalheim, B., Zhou, S. (eds) Database Systems for Advanced Applications. DASFAA 2014. Lecture Notes in Computer Science, vol 8505. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-662-43984-5_30
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-662-43983-8
Online ISBN: 978-3-662-43984-5
eBook Packages: Computer Science, Computer Science (R0)