Movatterモバイル変換


[0]ホーム

URL:


Skip to main content
Springer Nature Link
Log in

Top-k term publish/subscribe for geo-textual data streams

  • Regular Paper
  • Published:
The VLDB Journal Aims and scope Submit manuscript

Abstract

Massive amounts of data that contain spatial, textual, and temporal information are being generated at a rapid pace. With streams of such data, which includes check-ins and geo-tagged tweets, available, users may be interested in being kept up-to-date on which terms are popular in the streams in a particular region of space. To enable this functionality, we aim at efficiently processing two types of general top-k term subscriptions over streams of spatio-temporal documents: region-based top-k spatial-temporal term (RST) subscriptions and similarity-based top-k spatio-temporal term (SST) subscriptions.RST subscriptions continuously maintain the top-k most popular trending terms within a user-defined region.SST subscriptions free users from defining a region and maintain top-k locally popular terms based on a ranking function that combines term frequency, term recency, and term proximity. To solve the problem, we propose solutions that are capable of supporting real-life location-based publish/subscribe applications that process large numbers ofSST andRST subscriptions over a realistic stream of spatio-temporal documents. The performance of our proposed solutions is studied in extensive experiments using two spatio-temporal datasets.

This is a preview of subscription content,log in via an institution to check access.

Access this article

Log in via an institution

Subscribe and save

Springer+ Basic
¥17,985 /Month
  • Get 10 units per month
  • Download Article/Chapter or eBook
  • 1 Unit = 1 Article or 1 Chapter
  • Cancel anytime
Subscribe now

Buy Now

Price includes VAT (Japan)

Instant access to the full article PDF.

Fig. 1
Fig. 2
Fig. 3
Fig. 4
Fig. 5
Fig. 6
Fig. 7
Fig. 8
Fig. 9
Fig. 10
Fig. 11
Fig. 12
Fig. 13
Fig. 14
Fig. 15
Fig. 16
Fig. 17
Fig. 18
Fig. 19
Fig. 20
Fig. 21
Fig. 22
Fig. 23
Fig. 24
Fig. 25
Fig. 26
Fig. 27
Fig. 28
Fig. 29
Fig. 30

Similar content being viewed by others

Notes

  1. We say a subscription matches a term if the term is a top-k result of the subscription.

  2. The complexity isO(1) when theSF score is computed based on Euclidean distance or network distance with pre-computation of pair distances.

  3. We do not index\(d_4\) because it does not contain\(w_3\).

  4. ParameterM is set to 16–128 in experiments (cf. Sect. 5).

  5. The performance discrepancy between baseline and TS is negligible whenk is small. Thus, we only report the result of baseline when varyingk.

References

  1. Abdelhaq, H., Gertz, M.: On the locality of keywords in twitter streams. In: IWGS, pp. 12–20 (2014)

  2. Abdelhaq, H., Gertz, M., Armiti, A.: Efficient online extraction of keywords for localized events in twitter. GeoInformatica21(2), 365–388 (2017)

    Article  Google Scholar 

  3. Ahmed, P., Hasan, M., Kashyap, A., Hristidis, V., Tsotras, V.J.: Efficient computation of top-k frequent terms over spatio-temporal ranges. In: SIGMOD, pp. 1227–1241 (2017)

  4. Altinel, M., Franklin, M.J.: Efficient filtering of xml documents for selective dissemination of information. In: VLDB, pp. 53–64 (2000)

  5. Amati, G., Amodeo, G., Gaibisso, C.: Survival analysis for freshness in microblogging search. In: CIKM, pp. 2483–2486. ACM, New York (2012)

  6. Anick, P.G.: Using terminological feedback for web search refinement: a log-based study. In: SIGIR, pp. 88–95 (2003)

  7. Babcock, B., Babu, S., Datar, M., Motwani, R., Widom, J.: Models and issues in data stream systems. In: PODS, pp. 1–16 (2002)

  8. Cao, X., Chen, L., Cong, G., Xiao, X.: Keyword-aware optimal route search. PVLDB5(11), 1136–1147 (2012)

    Google Scholar 

  9. Chen, L., Cong, G.: Diversity-aware top-k publish/subscribe for text stream. In: SIGMOD, pp. 347–362 (2015)

  10. Chen, L., Cong, G., Cao, X.: An efficient query indexing mechanism for filtering geo-textual data. In: SIGMOD, pp. 749–760 (2013)

  11. Chen, L., Cong, G., Cao, X., Tan, K.: Temporal spatial-keyword top-k publish/subscribe. In: ICDE, pp. 255–266 (2015)

  12. Chen, L., Shang, S.: Approximate spatio-temporal top-k publish/subscribe. World Wide Web22(5), 2153–2175 (2019)

    Article  Google Scholar 

  13. Chen, L., Shang, S.: Region-based message exploration over spatio-temporal data streams. In: AAAI, pp. 873–880 (2019)

  14. Chen, L., Shang, S., Jensen, C.S., Yao, B., Zhang, Z., Shao, L.: Effective and efficient reuse of past travel behavior for route recommendation. In: KDD, pp. 488–498 (2019)

  15. Chen, L., Shang, S., Yang, C., Li, J.: Spatial keyword search: a survey. GeoInformatica24(1), 85–106 (2020)

    Article  Google Scholar 

  16. Chen, L., Shang, S., Yao, B., Zheng, K.: Spatio-temporal top-k term search over sliding window. World Wide Web22(5), 1953–1970 (2019)

    Article  Google Scholar 

  17. Chen, L., Shang, S., Zhang, Z., Cao, X., Jensen, C.S., Kalnis, P.: Location-aware top-k term publish/subscribe. In: ICDE, pp. 749–760 (2018)

  18. Chen, L., Shang, S., Zheng, K., Kalnis, P.: Cluster-based subscription matching for geo-textual data streams. In: ICDE, pp. 890–901 (2019)

  19. Chen, Z., Cong, G., Zhang, Z., Fuz, T.Z., Chen, L.: Distributed publish/subscribe query processing on the spatio-textual data stream. In: ICDE, pp. 1095–1106 (2017)

  20. Diao, Y., Fischer, P.M., Franklin, M.J., Yfilter, R. To.: Efficient and scalable filtering of XML documents. In: ICDE, pp. 341–342 (2002)

  21. Dijkstra, E.W.: A note on two problems in connexion with graphs. Numer. Math.1, 269–271 (1959)

    Article MathSciNet MATH  Google Scholar 

  22. Efron, M., Golovchinsky, G.: Estimation methods for ranking recent information. In: SIGIR, pp. 495–504. ACM, New York (2011)

  23. Farzindar, A., Khreich, W.: A survey of techniques for event detection in twitter. Comput. Intell.31(1), 132–164 (2015)

    Article MathSciNet  Google Scholar 

  24. Guo, D., Zhu, Y., Xu, W., Shang, S., Ding, Z.: How to find appropriate automobile exhibition halls: towards a personalized recommendation service for auto show. Neurocomputing213, 95–101 (2016)

    Article  Google Scholar 

  25. Guo, L., Zhang, D., Li, G., Tan, K., Bao, Z.: Location-aware pub/sub system: When continuous moving queries meet dynamic event streams. In: SIGMOD, pp. 843–857 (2015)

  26. Haghani, P., Michel, S., Aberer, K.: The gist of everything new: Personalized top-k processing over web 2.0 streams. In: CIKM, pp. 489–498 (2010)

  27. He, Q., Chang, K., Lim, E., Zhang, J.: Bursty feature representation for clustering text streams. In: SDM, pp. 491–496, (2007)

  28. Hu, H., Liu, Y., Li, G., Feng, J., Tan, K.: A location-aware publish/subscribe framework for parameterized spatio-textual subscriptions. In: ICDE, pp. 711–722 (2015)

  29. Hu, J., Cheng, R., Wu, D., Jin, B.: Efficient top-k subscription matching for location-aware publish/subscribe. In: SSTD, pp. 333–351 (2015)

  30. Hu, M., Liu, S., Wei, F., Wu, Y., Stasko, J.T., Ma, K.: Breaking news on twitter. In: CHI Conference on Human Factors in Computing Systems, CHI ’12, Austin, TX, USA–May 05–10, 2012, pp. 2751–2754 (2012)

  31. Jonathan, C., Magdy, A., Mokbel, M.F., Jonathan, A.: GARNET: A holistic system approach for trending queries in microblogs. In: ICDE, pp. 1251–1262 (2016)

  32. Kwak, H., Lee, C., Park, H., Moon, S.B.: What is twitter, a social network or a news media? In: WWW, pp. 591–600 (2010)

  33. Li, G., Wang, Y., Wang, T., Feng, J.: Location-aware publish/subscribe. In: KDD, pp. 802–810 (2013)

  34. Li, X., Croft, W.B.: Time-based language models. In: CIKM, pp. 469–475. ACM, New York (2003)

  35. Liang, H., Xu, Y., Tjondronegoro, D., Christen, P.: Time-aware topic recommendation based on micro-blogs. In: CIKM, pp. 1657–1661 (2012)

  36. Magdy, A., Abdelhafeez, L., Kang, Y., Ong, E., Mokbel, M.F.: Microblogs data management: a survey. VLDB J. pp. 1–40 (2019)

  37. Magdy, A., Aly, A.M., Mokbel, M.F., Elnikety, S., He, Y., Nath, S., Aref. W.G.: Spatial trending queries on real-time microblogs. In: SIGSPATIAL, pp. 7:1–7:10 (2016)

  38. Mahmood, A.R., Aly, A.M., Aref. W.G.: FAST: frequency-aware indexing for spatio-textual data streams. In: ICDE, pp. 305–316 (2018)

  39. Manku, G.S., Motwani, R.: Approximate frequency counts over data streams. In: VLDB, pp. 346–357 (2002)

  40. Mathioudakis, M., Bansal, N., Koudas, N.: Identifying, attributing and describing spatial bursts. PVLDB3(1), 1091–1102 (2010)

    Google Scholar 

  41. Mathioudakis, M., Koudas, N.: Twittermonitor: trend detection over the twitter stream. In: SIGMOD, pp. 1155–1158 (2010)

  42. Metwally, A., Agrawal, D., El Abbadi, A.: Efficient computation of frequent and top-k elements in data streams. In: ICDT, pp. 398–412 (2005)

  43. Mokbel, M.F., Magdy, A.: Microblogs data management systems: querying, analysis, and visualization. In: SIGMOD, pp. 2219–2222 (2016)

  44. Pripuzic, K., Zarko, I.P., Aberer, K.: Top-k/w publish/subscribe: finding k most relevant publications in sliding time window w. In: DEBS, pp. 127–138 (2008)

  45. Samet, H.: The quadtree and related hierarchical data structures. ACM Comput. Surv.16(2), 187–260 (1984)

    Article MathSciNet  Google Scholar 

  46. Shang, S., Chen, L., Jensen, C.S., Wen, J., Kalnis, P.: Searching trajectories by regions of interest. IEEE Trans. Knowl. Data Eng.29(7), 1549–1562 (2017)

    Article  Google Scholar 

  47. Shang, S., Chen, L., Wei, Z., Jensen, C.S., Wen, J., Kalnis, P.: Collective travel planning in spatial networks. IEEE Trans. Knowl. Data Eng.28(5), 1132–1146 (2016)

    Article  Google Scholar 

  48. Shang, S., Chen, L., Wei, Z., Jensen, C.S., Zheng, K., Kalnis, P.: Trajectory similarity join in spatial networks. PVLDB10(11), 1178–1189 (2017)

    Google Scholar 

  49. Shang, S., Chen, L., Wei, Z., Jensen, C.S., Zheng, K., Kalnis, P.: Parallel trajectory similarity joins in spatial networks. VLDB J.27(3), 395–420 (2018)

    Article  Google Scholar 

  50. Shang, S., Chen, L., Zheng, K., Jensen, C.S., Wei, Z., Kalnis, P.: Parallel trajectory-to-location join. IEEE Trans. Knowl. Data Eng.31(6), 1194–1207 (2019)

    Article  Google Scholar 

  51. Shang, S., Ding, R., Zheng, K., Jensen, C.S., Kalnis, P., Zhou, X.: Personalized trajectory matching in spatial networks. VLDB J.23(3), 449–468 (2014)

    Article  Google Scholar 

  52. Shang, S., Liu, J., Zheng, K., Lu, H., Pedersen, T.B., Wen, J.: Planning unobstructed paths in traffic-aware spatial networks. GeoInformatica19(4), 723–746 (2015)

    Article  Google Scholar 

  53. Shang, S., Lu, H., Pedersen, T.B., Xie, X.: Finding traffic-aware fastest paths in spatial networks. In: SSTD, pp. 128–145 (2013)

  54. Shang, S., Lu, H., Pedersen, T.B., Xie, X.: Modeling of traffic-aware travel time in spatial networks. In: MDM, pp. 247–250 (2013)

  55. Shraer, A., Gurevich, M., Fontoura, M., Josifovski, V.: Top-k publish-subscribe for social annotation of news. PVLDB6(6), 385–396 (2013)

    Google Scholar 

  56. Skovsgaard, A., Sidlauskas, D., Jensen, C.S.: Scalable top-k spatio-temporal term querying. In: ICDE, pp. 148–159 (2014)

  57. Sloan, L., Morgan, J.: Who tweets with their location? Understanding the relationship between demographic characteristics and the use of geoservices and geotagging on twitter. PLoS ONE10(11), e0142209 (2015)

    Article  Google Scholar 

  58. Van, L. H., Takasu, A.: Parallelizing top-k frequent spatiotemporal terms computation on key-value stores. In: SIGSPATIAL, pp. 476–479 (2018)

  59. Wang, X., Zhang, Y., Zhang, W., Lin, X.: Efficient identification of local keyword patterns in microblogging platforms. IEEE Trans. Knowl. Data Eng.28(10), 2621–2634 (2016)

    Article  Google Scholar 

  60. Wang, X., Zhang, Y., Zhang, W., Lin, X., Huang, Z.: SKYPE: top-k spatial-keyword publish/subscribe over sliding window. PVLDB9(7), 588–599 (2016)

    Google Scholar 

  61. Wang, X., Zhang, Y., Zhang, W., Lin, X., Wang, W.: Ap-tree: Efficiently support continuous spatial-keyword queries over stream. In: ICDE, pp. 1107–1118 (2015)

  62. Wang, Y., Li, J., Zhong, Y., Zhu, S., Guo, D., Shang, S.: Discovery of accessible locations using region-based geo-social data. World Wide Web22(3), 929–944 (2019)

    Article  Google Scholar 

  63. Xiong, X., Mokbel, M.F., Aref, W.G.: Sea-cnn: Scalable processing of continuous k-nearest neighbor queries in spatio-temporal databases. In: ICDE, pp. 643–654 (2005)

  64. Xu, Y., Chen, L., Yao, B., Shang, S., Zhu, S., Zheng, K., Li, F.: Location-based top-k term querying over sliding window. In: WISE, pp. 299–314 (2017)

  65. Xu, Y., Wang, K., Zhang, B., Chen, Z.: Privacy-enhancing personalized web search. In: WWW, pp. 591–600 (2007)

  66. Yang, C., Chen, L., Shang, S., Zhu, F., Liu, L., Shao, L.: Toward efficient navigation of massive-scale geo-textual streams. In: IJCAI, pp. 4838–4845 (2019)

  67. Yu, M., Li, G., Feng, J.: A cost-based method for location-aware publish/subscribe services. In: CIKM, pp. 693–702 (2015)

  68. Yu, M., Li, G., Wang, T., Feng, J., Gong, Z.: Efficient filtering algorithms for location-aware publish/subscribe. IEEE Trans. Knowl. Data Eng.27(4), 950–963 (2015)

    Article  Google Scholar 

  69. Zhao, K., Chen, L., Cong, G.: Topic exploration in spatio-temporal document collections. In: SIGMOD, pp. 985–998 (2016)

  70. Zhao, K., Liu, Y., Yuan, Q., Chen, L., Chen, Z., Cong, G.: Towards personalized maps: mining user preferences from geo-textual data. PVLDB9(13), 1545–1548 (2016)

    Google Scholar 

  71. Zhao, Y., Shang, S., Wang, Y., Zheng, B., Nguyen, Q.V.H., Zheng, K.: REST: A reference-based framework for spatio-temporal trajectory compression. In: KDD, pp. 2797–2806 (2018)

Download references

Acknowledgements

This work was supported by the National Natural Science Foundation of China (61932004, 61922054, 61872235, 61729202, 61832017, U1636210), the National Key Research and Development Program of China (2018YFC1504504, 2016YFB0700502), and Hong Kong RGC Grant 12201018.

Author information

Authors and Affiliations

  1. University of Electronic Science and Technology of China, Chengdu, China

    Lisi Chen & Shuo Shang

  2. Aalborg University, Aalborg, Denmark

    Christian S. Jensen

  3. Hong Kong Baptist University, Kowloon Tong, Hong Kong

    Jianliang Xu

  4. King Abdullah University of Science and Technology, Thuwal, Saudi Arabia

    Panos Kalnis

  5. Shanghai Jiao Tong University, Shanghai, China

    Bin Yao

  6. Inception Institute of Artificial Intelligence, Abu Dhabi, UAE

    Ling Shao

Authors
  1. Lisi Chen

    You can also search for this author inPubMed Google Scholar

  2. Shuo Shang

    You can also search for this author inPubMed Google Scholar

  3. Christian S. Jensen

    You can also search for this author inPubMed Google Scholar

  4. Jianliang Xu

    You can also search for this author inPubMed Google Scholar

  5. Panos Kalnis

    You can also search for this author inPubMed Google Scholar

  6. Bin Yao

    You can also search for this author inPubMed Google Scholar

  7. Ling Shao

    You can also search for this author inPubMed Google Scholar

Corresponding author

Correspondence toShuo Shang.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

About this article

Check for updates. Verify currency and authenticity via CrossMark

Cite this article

Chen, L., Shang, S., Jensen, C.S.et al. Top-k term publish/subscribe for geo-textual data streams.The VLDB Journal29, 1101–1128 (2020). https://doi.org/10.1007/s00778-020-00607-8

Download citation

Keywords

Access this article

Subscribe and save

Springer+ Basic
¥17,985 /Month
  • Get 10 units per month
  • Download Article/Chapter or eBook
  • 1 Unit = 1 Article or 1 Chapter
  • Cancel anytime
Subscribe now

Buy Now

Price includes VAT (Japan)

Instant access to the full article PDF.

Advertisement


[8]ページ先頭

©2009-2025 Movatter.jp