WDBench: A Wikidata Graph Query Benchmark

  • Conference paper
Part of the book series: Lecture Notes in Computer Science (LNCS, volume 13489)

Abstract

We propose WDBench: a query benchmark for knowledge graphs based on Wikidata, featuring real-world queries extracted from the public query logs of the Wikidata SPARQL endpoint. While a number of benchmarks for graph databases (including SPARQL engines) have been proposed in recent years, few are based on real-world data, even fewer use real-world queries, and fewer still allow for comparing SPARQL engines with (non-SPARQL) graph databases. The raw Wikidata query log contains millions of diverse queries; running all of them would be prohibitively costly, and the mix of features they use would make it difficult to draw conclusions. WDBench thus focuses on four main query features that are common to SPARQL and graph databases: (i) basic graph patterns, (ii) optional graph patterns, (iii) path patterns, and (iv) navigational graph patterns. We extract queries from the Wikidata logs specifically to test these patterns, clean them of non-standard features, remove duplicates, classify them into different structural subsets, and present them in two different syntaxes. Using this benchmark, we present and compare performance results for evaluating queries using Blazegraph, Jena/Fuseki, Virtuoso and Neo4j.
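
As an illustration only (this is not the authors' extraction pipeline), the deduplication and classification steps mentioned in the abstract could be sketched in Python as follows. The function names, the whitespace-based notion of a duplicate, the four labels, and the regular-expression heuristics are all assumptions made for this sketch; a real pipeline would more plausibly rely on a SPARQL parser than on string matching.

```python
import re

# Heuristic (assumption): a path operator is *, + or ? written directly after
# a prefixed name, or / or | joining two prefixed names.
PATH_OPERATOR = re.compile(r"\w+:\w+[*+?]|\w+:\w+\s*[/|]\s*\w+:\w+")


def normalize(query: str) -> str:
    """Collapse runs of whitespace so trivially different copies compare equal."""
    return " ".join(query.split())


def deduplicate(queries):
    """Return normalized queries, keeping the first occurrence of each."""
    seen = set()
    unique = []
    for query in queries:
        key = normalize(query)
        if key not in seen:
            seen.add(key)
            unique.append(key)
    return unique


def count_triple_patterns(query: str) -> int:
    """Crudely count '.'-separated patterns inside the outermost braces."""
    body = query[query.find("{") + 1 : query.rfind("}")]
    parts = re.split(r"\.(?:\s+|$)", body.strip())
    return sum(1 for part in parts if part.strip())


def classify(query: str) -> str:
    """Assign one of four hypothetical labels mirroring the abstract's features."""
    if re.search(r"\bOPTIONAL\b", query, flags=re.IGNORECASE):
        return "optional"
    if PATH_OPERATOR.search(query):
        # Assumption: a lone path pattern is a "path" query; a path pattern
        # combined with other triple patterns is "navigational".
        return "path" if count_triple_patterns(query) == 1 else "navigational"
    return "bgp"
```

For example, under these assumptions `classify("SELECT * WHERE { ?x wdt:P279* wd:Q5 }")` yields `"path"`, while adding a second triple pattern to the same query yields `"navigational"`.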

Notes

  1.

  2.

  3. Property paths include negated property sets that fall outside 2RPQs [28], but these are rarely used [13], and can be partially emulated through disjunction (|) [28].

  4. This is done by running the command "sync; echo 3 > /proc/sys/vm/drop_caches" as root.

  5. We also have results for MillenniumDB [45], which we do not include here since the system has been developed by the authors. We restrict the reported results to third-party systems.
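
One possible reading of the partial emulation mentioned in note 3 is sketched below in Python: a negated property set is replaced by a disjunction over every other property in a known, finite vocabulary. The function name and the idea of passing an explicit list of known properties are assumptions of this sketch and are not taken from the paper or from [28]. The emulation is only partial because an edge labelled with a property missing from the list would match the negated set but not the disjunction.

```python
def emulate_negated_property_set(negated, known_properties):
    """Build a disjunction of every known property that is not negated.

    `negated` and `known_properties` are lists of property names given as
    strings (hypothetical inputs for this sketch).
    """
    excluded = set(negated)
    allowed = [prop for prop in known_properties if prop not in excluded]
    if not allowed:
        raise ValueError("no known property remains after negation")
    # Join the remaining properties with the SPARQL alternative operator.
    return "(" + "|".join(allowed) + ")"
```

With the vocabulary `["wdt:P31", "wdt:P279", "wdt:P361"]` and the negated set `["wdt:P31"]`, the function returns `"(wdt:P279|wdt:P361)"`.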

References

  1. Ali, W., Saleem, M., Yao, B., Hogan, A., Ngomo, A.-C.N.: A survey of RDF stores & SPARQL engines for querying knowledge graphs. VLDB J. 1–26 (2021). https://doi.org/10.1007/s00778-021-00711-3

  2. Aluç, G., Hartig, O., Özsu, M.T., Daudjee, K.: Diversified stress testing of RDF data management systems. In: Mika, P., et al. (eds.) ISWC 2014. LNCS, vol. 8796, pp. 197–212. Springer, Cham (2014). https://doi.org/10.1007/978-3-319-11964-9_13

  3. Angles, R., Aranda, C.B., Hogan, A., Rojas, C., Vrgoč, D.: WDBench: a Wikidata graph query benchmark (2022). https://figshare.com/s/50b7544ad6b1f51de060

  4. Angles, R., Aranda, C.B., Hogan, A., Rojas, C., Vrgoč, D.: WDBench: a Wikidata graph query benchmark (2022). https://github.com/MillenniumDB/WDBench

  5. Angles, R., Arenas, M., Barceló, P., Hogan, A., Reutter, J.L., Vrgoč, D.: Foundations of modern query languages for graph databases. ACM Comput. Surv. 50(5), 68:1–68:40 (2017)

  6. Baeza, P.B.: Querying graph databases. In: Proceedings of the 32nd ACM SIGMOD-SIGACT-SIGART Symposium on Principles of Database Systems, PODS 2013, New York, NY, USA, 22–27 June 2013, pp. 175–188 (2013)

  7. Bagan, G., Bonifati, A., Ciucanu, R., Fletcher, G.H.L., Lemay, A., Advokaat, N.: gMark: Schema-driven generation of graphs and queries. IEEE Trans. Knowl. Data Eng. 29(4), 856–869 (2017)

  8. Baier, J.A., Daroch, D., Reutter, J.L., Vrgoč, D.: Evaluating navigational RDF queries over the web. In: Proceedings of the 28th ACM Conference on Hypertext and Social Media, HT 2017, Prague, Czech Republic, 4–7 July 2017, pp. 165–174 (2017)

  9. Bail, S., et al.: FishMark: a linked data application benchmark. In: Fokoue, A., Liebig, T., Goodman, E.L., Weaver, J., Urbani, J., Mizell, D. (eds.) Proceedings of the Joint Workshop on Scalable and High-Performance Semantic Web Systems. CEUR Workshop Proceedings, Boston, 11 November 2012, vol. 943, pp. 1–15. CEUR-WS.org (2012)

  10. Barceló, P., Kröll, M., Pichler, R., Skritek, S.: Efficient evaluation and static analysis for well-designed pattern trees with projection. ACM Trans. Database Syst. 43(2), 8:1–8:44 (2018)

  11. Bast, H., Buchhold, B.: QLever: a query engine for efficient SPARQL+Text search. In: Lim, E., et al. (eds.) Proceedings of the 2017 ACM on Conference on Information and Knowledge Management, CIKM 2017, Singapore, 06–10 November 2017, pp. 647–656. ACM (2017)

  12. Bizer, C., Schultz, A.: The Berlin SPARQL benchmark. Int. J. Semant. Web Inf. Syst. 5(2), 1–24 (2009)

  13. Bonifati, A., Martens, W., Timm, T.: An analytical study of large SPARQL query logs. VLDB J. 655–679 (2019). https://doi.org/10.1007/s00778-019-00558-9

  14. Broekstra, J., Kampman, A., van Harmelen, F.: Sesame: an architecture for storing and querying RDF data and schema information. In: Fensel, D., Hendler, J.A., Lieberman, H., Wahlster, W. (eds.) Spinning the Semantic Web: Bringing the World Wide Web to Its Full Potential [Outcome of a Dagstuhl Seminar], pp. 197–222. MIT Press (2003)

  15. Calvanese, D., Giacomo, G.D., Lenzerini, M., Vardi, M.Y.: Reasoning on regular path queries. SIGMOD Rec. 32(4), 83–92 (2003)

  16. Cyganiak, R., Wood, D., Lanthaler, M.: RDF 1.1 Concepts and Abstract Syntax. W3C Recommendation (2014)

  17. Demartini, G., Enchev, I., Wylot, M., Gapany, J., Cudré-Mauroux, P.: BowlognaBench—benchmarking RDF analytics. In: Aberer, K., Damiani, E., Dillon, T. (eds.) SIMPDA 2011. LNBIP, vol. 116, pp. 82–102. Springer, Heidelberg (2012). https://doi.org/10.1007/978-3-642-34044-4_5

  18. Erling, O.: Virtuoso, a hybrid RDBMS/graph column store. IEEE Data Eng. Bull. 35(1), 3–8 (2012)

  19. Erling, O., et al.: The LDBC social network benchmark: interactive workload. In: Sellis, T.K., Davidson, S.B., Ives, Z.G. (eds.) Proceedings of the 2015 ACM SIGMOD International Conference on Management of Data, Melbourne, Victoria, Australia, 31 May–4 June 2015, pp. 619–630. ACM (2015)

  20. Francis, N., et al.: Cypher: an evolving query language for property graphs. In: Das, G., Jermaine, C.M., Bernstein, P.A. (eds.) Proceedings of the 2018 International Conference on Management of Data, SIGMOD Conference 2018, Houston, TX, USA, 10–15 June 2018, pp. 1433–1445. ACM (2018)

  21. Guo, Y., Pan, Z., Heflin, J.: LUBM: a benchmark for OWL knowledge base systems. J. Web Semant. 3(2–3), 158–182 (2005)

  22. Harris, S., Seaborne, A., Prud’hommeaux, E.: SPARQL 1.1 Query Language. W3C Recommendation (2013)

  23. Hernández, D., Hogan, A., Krötzsch, M.: Reifying RDF: what works well with Wikidata? In: Liebig, T., Fokoue, A. (eds.) Proceedings of the 11th International Workshop on Scalable Semantic Web Knowledge Base Systems Co-located with 14th International Semantic Web Conference (ISWC 2015), Bethlehem, PA, USA, 11 October 2015, vol. 1457. CEUR Workshop Proceedings, pp. 32–47. CEUR-WS.org (2015)

  24. Hernández, D., Hogan, A., Riveros, C., Rojas, C., Zerega, E.: Querying Wikidata: comparing SPARQL, relational and graph databases. In: Groth, P., et al. (eds.) ISWC 2016. LNCS, vol. 9982, pp. 88–103. Springer, Cham (2016). https://doi.org/10.1007/978-3-319-46547-0_10

  25. Hogan, A., et al.: Knowledge graphs. ACM Comput. Surv. 54(4), 71:1–71:37 (2021)

  26. Hogan, A., Riveros, C., Rojas, C., Soto, A.: A worst-case optimal join algorithm for SPARQL. In: Ghidini, C., et al. (eds.) ISWC 2019. LNCS, vol. 11778, pp. 258–275. Springer, Cham (2019). https://doi.org/10.1007/978-3-030-30793-6_15

  27. Jena Team: TDB Documentation (2021)

  28. Kostylev, E.V., Reutter, J.L., Romero, M., Vrgoč, D.: SPARQL with property paths. In: Arenas, M., et al. (eds.) ISWC 2015. LNCS, vol. 9366, pp. 3–18. Springer, Cham (2015). https://doi.org/10.1007/978-3-319-25007-6_1

  29. Lehmann, J., et al.: DBpedia - a large-scale, multilingual knowledge base extracted from Wikipedia. Semant. Web 6(2), 167–195 (2015)

  30. Malyshev, S., Krötzsch, M., González, L., Gonsior, J., Bielefeldt, A.: Getting the most out of Wikidata: semantic technology usage in Wikipedia’s knowledge graph. In: Vrandečić, D., et al. (eds.) ISWC 2018. LNCS, vol. 11137, pp. 376–394. Springer, Cham (2018). https://doi.org/10.1007/978-3-030-00668-6_23

  31. Morsey, M., Lehmann, J., Auer, S., Ngonga Ngomo, A.-C.: DBpedia SPARQL benchmark – performance assessment with real queries on real data. In: Aroyo, L., et al. (eds.) ISWC 2011. LNCS, vol. 7031, pp. 454–469. Springer, Heidelberg (2011). https://doi.org/10.1007/978-3-642-25073-6_29

  32. Neumann, T., Weikum, G.: The RDF-3X engine for scalable management of RDF data. VLDB J. 19(1), 91–113 (2010)

  33. Pérez, J., Arenas, M., Gutiérrez, C.: Semantics and complexity of SPARQL. ACM Trans. Database Syst. 34(3), 16:1–16:45 (2009)

  34. Romero, M.: The tractability frontier of well-designed SPARQL queries. In: Van den Bussche, J., Arenas, M. (eds.) Proceedings of the 37th ACM SIGMOD-SIGACT-SIGAI Symposium on Principles of Database Systems, Houston, TX, USA, 10–15 June 2018, pp. 295–306. ACM (2018)

  35. Saleem, M., Ali, M.I., Hogan, A., Mehmood, Q., Ngomo, A.-C.N.: LSQ: the linked SPARQL queries dataset. In: Arenas, M., et al. (eds.) ISWC 2015. LNCS, vol. 9367, pp. 261–269. Springer, Cham (2015). https://doi.org/10.1007/978-3-319-25010-6_15

  36. Saleem, M., Mehmood, Q., Ngonga Ngomo, A.-C.: FEASIBLE: a feature-based SPARQL benchmark generation framework. In: Arenas, M., et al. (eds.) ISWC 2015. LNCS, vol. 9366, pp. 52–69. Springer, Cham (2015). https://doi.org/10.1007/978-3-319-25007-6_4

  37. Saleem, M., Szárnyas, G., Conrads, F., Bukhari, S.A.C., Mehmood, Q., Ngomo, A.N.: How representative is a SPARQL benchmark? An analysis of RDF triplestore benchmarks. In: The World Wide Web Conference, pp. 1623–1633. ACM (2019)

  38. Schmelzeisen, L., Dima, C., Staab, S.: Wikidated 1.0: an evolving knowledge graph dataset of Wikidata’s revision history. In: Kaffee, L., Razniewski, S., Hogan, A. (eds.) Proceedings of the 2nd Wikidata Workshop (Wikidata 2021) Co-located with the 20th International Semantic Web Conference (ISWC 2021), Virtual Conference, 24 October 2021, vol. 2982. CEUR Workshop Proceedings. CEUR-WS.org (2021)

  39. Schmidt, M., Hornung, T., Lausen, G., Pinkel, C.: SP²Bench: a SPARQL performance benchmark. In: Ioannidis, Y.E., Lee, D.L., Ng, R.T. (eds.) Proceedings of the 25th International Conference on Data Engineering, ICDE 2009, 29 March 2009–2 April 2009, Shanghai, China, pp. 222–233. IEEE Computer Society (2009)

  40. Szárnyas, G., Izsó, B., Ráth, I., Varró, D.: The train benchmark: cross-technology performance evaluation of continuous model queries. Softw. Syst. Model. 17(4), 1365–1393 (2017). https://doi.org/10.1007/s10270-016-0571-8

  41. The Wikimedia Foundation: Wikidata: Database download (2021)

  42. Thompson, B.B., Personick, M., Cutcher, M.: The Bigdata® RDF graph database. In: Harth, A., Hose, K., Schenkel, R. (eds.) Linked Data Management, pp. 193–237. Chapman and Hall/CRC, Boca Raton (2014)

  43. Vandenbussche, P., Umbrich, J., Matteis, L., Hogan, A., Aranda, C.B.: SPARQLES: monitoring public SPARQL endpoints. Semant. Web 8(6), 1049–1065 (2017)

  44. Vrandečić, D., Krötzsch, M.: Wikidata: a free collaborative knowledgebase. Commun. ACM 57(10), 78–85 (2014)

  45. Vrgoč, D., et al.: MillenniumDB: a persistent, open-source, graph database. CoRR, abs/2111.01540 (2021)

  46. Webber, J.: A programmatic introduction to Neo4j. In: Leavens, G.T. (ed.) Conference on Systems, Programming, and Applications: Software for Humanity, SPLASH 2012, Tucson, AZ, USA, 21–25 October 2012, pp. 217–218. ACM (2012)

  47. Wikimedia Foundation: Wikidata SPARQL Logs (2022). https://iccl.inf.tu-dresden.de/web/Wikidata_SPARQL_Logs/en

  48. Wu, H., Fujiwara, T., Yamamoto, Y., Bolleman, J.T., Yamaguchi, A.: BioBenchmark Toyama 2012: an evaluation of the performance of triple stores on biological data. J. Biomed. Semant. 5, 32 (2014)

Author information

Authors and Affiliations

  1. IMFD Chile, Santiago, Chile

    Renzo Angles, Carlos Buil Aranda, Aidan Hogan, Carlos Rojas & Domagoj Vrgoč

  2. DCC, Universidad de Talca, Talca, Chile

    Renzo Angles

  3. Universidad Técnica Federico Santa María, Valparaíso, Chile

    Carlos Buil Aranda

  4. DCC, Universidad de Chile, Santiago, Chile

    Aidan Hogan

  5. PUC Chile, Santiago, Chile

    Domagoj Vrgoč

Corresponding author

Correspondence to Domagoj Vrgoč.

Editor information

Editors and Affiliations

  1. University of Manchester, Manchester, UK

    Ulrike Sattler

  2. University of Chile, Santiago, Chile

    Aidan Hogan

  3. University of Cape Town, Cape Town, South Africa

    Maria Keet

  4. University of Bologna, Bologna, Italy

    Valentina Presutti

  5. Universidade Federal do Espírito Santo, Vitória, Brazil

    João Paulo A. Almeida

  6. National Institute of Informatics, Tokyo, Japan

    Hideaki Takeda

  7. Orange, Belfort, France

    Pierre Monnin

  8. Sapienza University of Rome, Rome, Italy

    Giuseppe Pirrò

  9. University of Bari, Bari, Italy

    Claudia d’Amato

Copyright information

© 2022 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper

Cite this paper

Angles, R., Aranda, C.B., Hogan, A., Rojas, C., Vrgoč, D. (2022). WDBench: A Wikidata Graph Query Benchmark. In: Sattler, U., et al. The Semantic Web – ISWC 2022. ISWC 2022. Lecture Notes in Computer Science, vol 13489. Springer, Cham. https://doi.org/10.1007/978-3-031-19433-7_41
