Abstract
Massive amounts of data that contain spatial, textual, and temporal information are being generated at a rapid pace. With streams of such data, which includes check-ins and geo-tagged tweets, available, users may be interested in being kept up-to-date on which terms are popular in the streams in a particular region of space. To enable this functionality, we aim at efficiently processing two types of general top-k term subscriptions over streams of spatio-temporal documents: region-based top-k spatial-temporal term (RST) subscriptions and similarity-based top-k spatio-temporal term (SST) subscriptions. RST subscriptions continuously maintain the top-k most popular trending terms within a user-defined region. SST subscriptions free users from defining a region and maintain top-k locally popular terms based on a ranking function that combines term frequency, term recency, and term proximity. To solve the problem, we propose solutions that are capable of supporting real-life location-based publish/subscribe applications that process large numbers of SST and RST subscriptions over a realistic stream of spatio-temporal documents. The performance of our proposed solutions is studied in extensive experiments using two spatio-temporal datasets.
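To make the two subscription types concrete, the following is a minimal Python sketch of how their results could be evaluated by brute force. It is not the authors' method: the abstract does not give the ranking function, so the exponential recency decay (`half_life`), the linear proximity falloff (`max_dist`), the rectangular region, and all names (`Doc`, `rst_top_k`, `sst_top_k`) are assumptions made for illustration only.

```python
import heapq
import math
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Doc:
    terms: tuple  # terms contained in the document
    x: float      # document location
    y: float
    t: float      # creation timestamp in seconds


def recency(doc, now, half_life=3600.0):
    """Assumed decay: a document's weight halves every `half_life` seconds."""
    return 0.5 ** ((now - doc.t) / half_life)


def rst_top_k(docs, region, k, now):
    """RST-style result: recency-weighted term frequency inside a rectangle."""
    x1, y1, x2, y2 = region
    scores = defaultdict(float)
    for d in docs:
        if x1 <= d.x <= x2 and y1 <= d.y <= y2:
            w = recency(d, now)
            for term in set(d.terms):
                scores[term] += w
    return heapq.nlargest(k, scores.items(), key=lambda kv: (kv[1], kv[0]))


def sst_top_k(docs, location, k, now, max_dist=10.0):
    """SST-style result: no region; occurrences are also weighted by proximity."""
    qx, qy = location
    scores = defaultdict(float)
    for d in docs:
        dist = math.hypot(d.x - qx, d.y - qy)
        proximity = max(0.0, 1.0 - dist / max_dist)  # assumed linear falloff
        w = recency(d, now) * proximity
        if w > 0.0:
            for term in set(d.terms):
                scores[term] += w
    return heapq.nlargest(k, scores.items(), key=lambda kv: (kv[1], kv[0]))
```

This sketch rescans every document for every subscription, which is the cost a publish/subscribe system serving many subscriptions over a fast stream would need to avoid. It only shows what an RST or SST result means under the stated assumptions.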
Notes
We say a subscription matches a term if the term is a top-k result of the subscription.
The complexity is O(1) when the SF score is computed based on Euclidean distance or network distance with pre-computation of pair distances.
We do not index \(d_4\) because it does not contain \(w_3\).
Parameter M is set to 16–128 in experiments (cf. Sect. 5).
The performance discrepancy between baseline and TS is negligible when k is small. Thus, we only report the result of baseline when varying k.
Acknowledgements
This work was supported by the National Natural Science Foundation of China (61932004, 61922054, 61872235, 61729202, 61832017, U1636210), the National Key Research and Development Program of China (2018YFC1504504, 2016YFB0700502), and Hong Kong RGC Grant 12201018.
Author information
Authors and Affiliations
University of Electronic Science and Technology of China, Chengdu, China
Lisi Chen & Shuo Shang
Aalborg University, Aalborg, Denmark
Christian S. Jensen
Hong Kong Baptist University, Kowloon Tong, Hong Kong
Jianliang Xu
King Abdullah University of Science and Technology, Thuwal, Saudi Arabia
Panos Kalnis
Shanghai Jiao Tong University, Shanghai, China
Bin Yao
Inception Institute of Artificial Intelligence, Abu Dhabi, UAE
Ling Shao
Corresponding author
Correspondence to Shuo Shang.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
About this article
Cite this article
Chen, L., Shang, S., Jensen, C.S. et al. Top-k term publish/subscribe for geo-textual data streams. The VLDB Journal 29, 1101–1128 (2020). https://doi.org/10.1007/s00778-020-00607-8