1473Accesses
Abstract
As the volumes of spatiotemporal trajectory data continue to grow at a rapid pace; a new generation of data management techniques is needed in order to be able to utilize these data to provide a range of data-driven services, including geographic-type services. Key challenges posed by spatiotemporal data include the massive data volumes, the high velocity with which the data are captured, the need for interactive response times, and the inherent inaccuracy of the data. We propose an infrastructure, Elite, that leverages peer-to-peer and parallel computing techniques to address these challenges. The infrastructure offers efficient, parallel update and query processing by organizing the data into a layered index structure that is logically centralized, but physically distributed among computing nodes. The infrastructure is elastic with respect to storage, meaning that it adapts to fluctuations in the storage volume, and with respect to computation, meaning that the degree of parallelism can be adapted to best match the computational requirements. Further, the infrastructure offers advanced functionality, including probabilistic simulations, for contending with the inaccuracy of the underlying data in query processing. Extensive empirical studies offer insight into properties of the infrastructure and indicate that it meets its design goals, thus enabling the effective management of big spatiotemporal data.
This is a preview of subscription content,log in via an institution to check access.
Access this article
Subscribe and save
- Get 10 units per month
- Download Article/Chapter or eBook
- 1 Unit = 1 Article or 1 Chapter
- Cancel anytime
Buy Now
Price includes VAT (Japan)
Instant access to the full article PDF.




























Similar content being viewed by others
Notes
The minimum capacity is below half of the maximum capacity. Otherwise, the condensed node may exceed the maximum capacity.
Details on the *node are covered in Sect. 4.3.1.
From experiments, we obtained\(\alpha =5e^{-4}\),\(\beta =1e^{-6}\), and\(\gamma =1e^{-6}\).
There exist different semantics for top-k queries over uncertain data, such as U-TopK, U-kRanks, Expected scores, and Expected ranks. Among them, the Expected score and Expected rank might be best ones in terms of properties such as Containment and Unique-Rank [39].
References
Ceikute, V., Jensen, C.S.: Vehicle routing with user-generated trajectory data. In: MDM (2015)
Yang, B., Guo, C., Ma, Y., Jensen, C.S.: Toward personalized, context-aware routing. VLDB J.24(2), 297–318 (2015)
Dai, J., Yang, B., Guo, C., Jensen, C.S.: Efficient and accurate path cost estimation using trajectory data. In: CoRR abs/1510.02886 (2015)
Stougiannis, A., Pavlovic, M., Tauheed, F., et al.: Data-driven neuroscience: enabling breakthroughs via innovative data management. In: SIGMOD (2013)
Manyika, J., Chui, M.: Big data: the next frontier for innovation, competition, and productivity. In: McKinsey Global Institute (2011)
Cheng, R., Kalashnikov, D.V., Prabhakar, S.: Evaluating probabilistic queries over imprecise data. In: SIGMOD (2003)
Cheng, R., Kalashnikov, D.V., Prabhakar, S.: Querying imprecise data in moving object environments. TKDE16(9), 1112–1127 (2004)
Trajcevski, G., Tamassia, R., Ding, H., et al.: Continuous probabilistic nearest-neighbor queries for uncertain trajectories. In: EDBT (2009)
Civilis, A., Jensen, C.S., Pakalnis, S.: Techniques for efficient road-network-based tracking of moving objects. TKDE17(5), 698–712 (2005)
Jensen, C.S., Pakalnis, S.: Trax - real-world tracking of moving objects. In: VLDB (2007)
Eldawy, A., Li, Y., Mokbel, M.F., Janardan, R.:\(\text{ CG }\_\text{ Hadoop }\): computational geometry in mapreduce. In: GIS (2013)
Eldawy, A., Mokbel, M.F.: A demonstration of SpatialHadoop: an efficient MapReduce framework for spatial data. In: VLDB (2013)
Aji, A., Wang, F., Vo, H., et al.: Hadoop-GIS: a high performance spatial data warehousing system over mapreduce. In: VLDB (2013)
Nishimura, S., Das, S., Agrawal, D., El Abbadi, A.: MD-HBase: a scalable multi-dimensional data infrastructure for location aware services. In: MDM (2011)
Wang, J., Wu, S., Gao, H., et al.: Indexing multi-dimensional data in a cloud system. In: SIGMOD (2010)
Tsatsanifos, G., Sacharidis, D., Sellis, T.: Index-based query processing on distributed multidimensional data. GeoInformatica17(3), 489–519 (2013)
Ratnasamy, S., Francis, P., Handley, M., et al.: A scalable content-addressable network. In: SIGCOMM (2001)
Wei, L.Y., Zheng, Y., Peng, W.C.: Constructing popular routes from uncertain trajectories. In: KDD (2012)
Pei, T., Zhou, C., Zhu, A.-X, et al.: Windowed nearest neighbour method for mining spatio-temporal clusters in the presence of noise. In: International Journal of Geographical Information Science (2010)
Dalvi, N.N., Suciu, D.: Efficient query evaluation on probabilistic databases. In: VLDB (2004)
Pfoser, D., Jensen, C.S.: Capturing the uncertainty of moving-objects representations. In: SSDBM (1999)
Lian, X, Chen, L.: Monochromatic and bichromatic reverse skyline search over uncertain databases. In: SIGMOD (2008)
Kriegel, H.P., Kunath, P., Renz, M.: Probabilistic nearest-neighbor query on uncertain objects. In: DASFAA (2007)
Pugh, W.: Concurrent maintenance of lists. Technical report, Dept. of Computer Science, University of Maryland (1990)
Gargantini, I.: An effective way to represent octrees. Commun. ACM25(12), 905–910 (1982)
Sidlauskas, D., Saltenis, S., Christiansen, C.W., et al.: Trees or grids?: indexing moving objects in main memory. In: GIS (2009)
Sidlauskas, D., Saltenis, S., Jensen, C.S.: Processing of extreme moving-object update and query workloads in main memory. VLDB J.23(5), 817–841 (2014)
Cheng, R., Chen, J., Mokbel, M., Chow, C.Y.: Probabilistic verifiers: evaluating constrained nearest-neighbor queries over uncertain data. In: ICDE (2008)
You, S., Zhang, J., Gruenwald, L.: Large-scale spatial join query processing in cloud. In: ICDE Workshops (2015)
Trajcevski, G., Wolfson, O., Zhang, F., Chamberlain, S.: The geometry of uncertainty in moving object databases. In: EDBT (2002)
Zheng, K., Trajcevski, G., Zhou, X., Scheuermann, P.: Probabilistic range queries for uncertain trajectories on road networks. In: EDBT (2011)
Zheng, K., Fung, G.P.C., Zhou, X.: K-nearest neighbor search for fuzzy objects. In: SIGMOD (2010)
Xie, X., Yiu, M.L., Cheng, R., Lu, H.: Scalable evaluation of trajectory queries over imprecise location data. TKDE26(8), 2029–2044 (2014)
Tao, Y., Papadias, D.: MV3R-Tree: a spatio-temporal access method for timestamp and interval queries. In: VLDB (2001)
Pfoster, D., Jensen, C.S., Theodoridis, Y.: Novel approaches to the indexing of moving object trajectories. In: VLDB (2000)
Chakka, V.P., Everspaugh, A.C., Patel, J.M., et al.: Indexing large trajectory data sets with SETI. In: The first biennial conference on innovative data systems research (CIDR) (2003).http://www.cidrdb.org/cidr2003/program/p15.pdf
Tsatsanifos, G., Sacharidis, D., Sellis, T.: RIPPLE: a scalable framework for distributed processing of rank queries. In: EDBT (2014)
The apache cassandra project.http://cassandra.apache.org/
Cormode, G., Li, F., Yi, K.: Semantics of ranking queries for probabilistic data and expected ranks. In: ICDE (2009)
Born, M.: On the stability of crystal lattices. IX. Covariant theory of lattice deformations and the stability of some hexagonal lattices. In: Proceedings of the Cambridge Philosophical Society 38 (1942)
Acknowledgments
This work was supported by the 973 program with No 2012CB316205, a grant from the Obel Family Foundation, and National Science Foundation of China under Grant No. 61432006. The work was done in part when some of the authors visited SA Center for Big Data Research at Renmin University of China. The center is partially funded by the Chinese National “111” Project “Attracting International Talents in Data Engineering and Knowledge Engineering Research”.
Author information
Authors and Affiliations
Department of Computer Science, Aalborg University, Aalborg, Denmark
Xike Xie & Christian S. Jensen
School of Information, Renmin University of China, Beijing, China
Benjin Mei
Key Lab of Data Engineering and Knowledge Engineering, Renmin University of China, Beijing, China
Jinchuan Chen & Xiaoyong Du
- Xike Xie
You can also search for this author inPubMed Google Scholar
- Benjin Mei
You can also search for this author inPubMed Google Scholar
- Jinchuan Chen
You can also search for this author inPubMed Google Scholar
- Xiaoyong Du
You can also search for this author inPubMed Google Scholar
- Christian S. Jensen
You can also search for this author inPubMed Google Scholar
Corresponding author
Correspondence toJinchuan Chen.
Appendices
Appendix 1: Algorithms



Appendix 2: Routing cost estimation
Lemma 1
The maximum routing cost of a three-dimensional torus ofN nodes is approximately equal to\(0.91 N^{\frac{1}{3}}\).
Proof
For any torus noden, it takes one hop to reach its six neighbors (first-order neighbors) and two hops to reach its 18 second order neighbors. The number ofi-th order neighbors\(a^i\) can be represented by [40]:
Suppose the furthest node on the torus takesm hops from noden. The total number of nodesN visited equals the summation of the number of 1-th tom-th order neighbors. We have:
Thus, the maximum routing cost equals to the distance ton’sm-th order neighbor,\(0.91N^{\frac{1}{3}}\).\(\square \)
Lemma 2
The average routing cost of a three-dimensional torus ofN nodes is approximately equal to\(0.69 N^{\frac{1}{3}}\) hops.
Proof
Based on Lemma 1, the furthest node fromn requiresm hops. Then, the average number of hops is:
\(\square \)
Appendix 3: Estimation ofh
Let us consider the cost estimation on torus node\(T_i\). After the range search\(Q_i \oplus d_{{ max }} \oplus U_{{ max }}\) (Step 6 in Algorithm 2), we get a set\(C_i\) of candidate trajectories with the average length\(\overline{{\mathcal {T}}_c\cdot \varDelta t}=\frac{1}{|C_i|} \sum _{{\mathcal {T}} \in C_i} {\mathcal {T}}\cdot \varDelta t\). According to Definition 5, the cost of STNNQ depends on the number of trajectories at each snapshot. To estimate that, we first assume the trajectories are uniformly distributed in the spatiotemporal region\(Q_i \oplus d_{{ max }} \oplus U_{{ max }}\).
We define the density\(\rho \), as the number of objects per snapshot divided by the area of the filtering bound\(\pi (d_{{ max }}+U_{{ max }})^2\).
Lemma 3
Assume a two-dimensional regionS in the spatial domain\({\mathfrak {S}}\), where the points are uniformly distributed, and let\(N(S) = m\) represent the fact that there arem points inside regionS. The probability of\(N(S) = m\) is given by:
Proof
The probability thatm points out ofn objects are inS is:
The extreme form of the binomial distribution is a Poisson distribution. Let\(\rho = \frac{n}{|{\mathfrak {S}}|}\). The above equation becomes:
Then, the probability that there is at least one point inS is:
\(\square \)
Then, we can infer that there is a nearest neighbor within the circular regionS to the query point with a probability higher than\(P^*\). In our implementation, we set\(P^*\) to 0.9, which is reasonably large forS to contain the nearest neighbor.
The number of candidate objects per snapshot is estimated as:
Appendix 4: Obtaining\(d_{\mathrm{max}}\)
In our system, we try a series of range queries\(Q_i \oplus d \oplus U_{{ max }}\) to incrementally obtain\(d_{{ max }}\), where\(d=5, 10, 20\,\%, \cdots \) of torus node\(T_i\)’s spatial domain size. Upon collecting the candidate trajectory set\(C_i\) by the range search parameterized withd, we test whether the union of these trajectories’ time spans can cover\(Q_i\)’s\(\varDelta t\), i.e., to decide whether\(\cup _{UT \in C}UT\cdot \varDelta t \supseteq Q_i\cdot \varDelta t\) is true. If true, it means that there always exists at least an object for each timestamp in\(Q_i\cdot \varDelta t\). Therefore, currentd is taken as\(d_{{ max }}\), which is sufficiently large for not missing any possible candidate trajectories. Otherwise, we need to increased incrementally and repeat the aforementioned process.
Rights and permissions
About this article
Cite this article
Xie, X., Mei, B., Chen, J.et al. Elite: an elastic infrastructure for big spatiotemporal trajectories.The VLDB Journal25, 473–493 (2016). https://doi.org/10.1007/s00778-016-0425-6
Received:
Revised:
Accepted:
Published:
Issue Date:
Share this article
Anyone you share the following link with will be able to read this content:
Sorry, a shareable link is not currently available for this article.
Provided by the Springer Nature SharedIt content-sharing initiative