Uploaded by M. Tamer Özsu

Web Data Management with RDF

The document discusses web data management in the context of RDF (Resource Description Framework) and its significance in the semantic web. It highlights the challenges of web data, including its lack of schema, volatility, and querying difficulty, and explores recent approaches such as RDF, SPARQL, and linked open data. The document notes the rapid growth of RDF data volumes and outlines different strategies for data integration and querying across distributed sources.

Web Data Management in RDF Age
M. Tamer Özsu
University of Waterloo, David R. Cheriton School of Computer Science
PKU/2014-08-28

Acknowledgements
This presentation draws upon collaborative research and discussions with the following colleagues (in alphabetical order):
- Gunes Aluc, University of Waterloo
- Khuzaima Daudjee, University of Waterloo
- Olaf Hartig, University of Waterloo
- Lei Chen, Hong Kong University of Science & Technology
- Lei Zou, Peking University

Web Data Management
A long-term research interest in the DB community
[Figure: four items dated 2000, 2004, 2011, and 2011]

Interest Due to Properties of Web Data
- Lack of a schema
    - Data is at best semi-structured
    - Missing data, additional attributes, similar data but not identical
- Volatility
    - Changes frequently
    - May conform to one schema now, but not later
- Scale
    - Does it make sense to talk about a schema for Web?
    - How do you capture everything?
- Querying difficulty
    - What is the user language?
    - What are the primitives?
    - Aren't search engines or metasearch engines sufficient?

More Recent Approaches to Web Querying
- Fusion Tables
    - Users contribute data in spreadsheet, CSV, KML format
    - Possible joins between multiple data sets
    - Extensive visualization
- XML
    - Data exchange language
    - Primarily tree-based structure

      <list title="MOVIES">
        <film>
          <title>The Shining</title>
          <director>Stanley Kubrick</director>
          <actor>Jack Nicholson</actor>
        </film>
        <film>
          <title>Spartacus</title>
          <director>Stanley Kubrick</director>
        </film>
        <film>
          <title>The Passenger</title>
          <actor>Jack Nicholson</actor>
        </film>
        ...
      </list>

      [Figure: the same list drawn as a tree, with a root node, one film node per movie, and title, director, and actor children]
- RDF (Resource Description Framework) & SPARQL
    - W3C recommendation
    - Simple, self-descriptive model
    - Building block of semantic web & Linked Open Data (LOD)

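To make the "tree-based structure" point concrete, here is a minimal sketch that parses the movie list from the slide with Python's standard `xml.etree.ElementTree` module and walks the tree. The helper name `titles_with` is hypothetical, and only the three films visible on the slide are included (the slide elides the rest).

```python
import xml.etree.ElementTree as ET

# The movie list from the slide; films elided on the slide are not invented here.
MOVIES_XML = """
<list title="MOVIES">
  <film>
    <title>The Shining</title>
    <director>Stanley Kubrick</director>
    <actor>Jack Nicholson</actor>
  </film>
  <film>
    <title>Spartacus</title>
    <director>Stanley Kubrick</director>
  </film>
  <film>
    <title>The Passenger</title>
    <actor>Jack Nicholson</actor>
  </film>
</list>
"""

root = ET.fromstring(MOVIES_XML.strip())


def titles_with(root, tag, value):
    """Return titles of films that have a child element <tag> whose text equals value."""
    return [
        film.findtext("title")
        for film in root.findall("film")
        if value in [child.text for child in film.findall(tag)]
    ]


kubrick_films = titles_with(root, "director", "Stanley Kubrick")
nicholson_films = titles_with(root, "actor", "Jack Nicholson")
print(kubrick_films)
print(nicholson_films)
```

Note that every question has to be phrased as a walk down from the root through `film` elements; the director and the actor are not shared nodes, just repeated text.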
RDF and Semantic Web
- RDF is a language for the conceptual modeling of information about resources (web resources in our context)
- A building block of semantic web
    - Facilitates exchange of information
    - Search engine results can be more focused and structured
    - Facilitates data integration (mashups)
- Machine understandable
    - Understand the information on the web and the interrelationships among them

RDF Uses
- Yago and DBpedia extract facts from Wikipedia & represent as RDF → structural queries
- Communities build RDF data
    - E.g., biologists: Bio2RDF and Uniprot RDF
- Web data integration
    - Linked Open Data Cloud
- ...

RDF Data Volumes ...
- ... are growing, and fast
    - Linked data cloud currently consists of 325 datasets with 25B triples
    - Size almost doubling every year
- Snapshots of the Linking Open Data cloud:
    - March '09: 89 datasets
    - September '10: 203 datasets
    - September '11: 295 datasets, 25B triples
    - April '14: 1091 datasets, ??? triples
[Figures: Linking Open Data cloud diagrams as of March 2009, September 2010, and September 2011, by Richard Cyganiak and Anja Jentzsch, http://lod-cloud.net/. The 2010 and 2011 diagrams group datasets into Media, Geographic, Publications, User-generated content, Government, Cross-domain, and Life sciences. April 2014 figure: Max Schmachtenberg, Christian Bizer, and Heiko Paulheim: Adoption of Linked Data Best Practices in Different Topical Domains. In Proc. ISWC, 2014.]

Closer Look
[Figure: close-up view]

Globally Distributed Network of Data
[Figure]

Three Approaches
- Data warehousing: consolidate data in a repository and query it
- SPARQL federation: leverage query services provided by data publishers
- Live Linked Data querying: navigate through LOD by looking up URIs at query execution time

Outline
1. LOD and RDF Introduction
2. Data Warehousing Approach
    - Relational Approaches
    - Graph-Based Approaches
3. SPARQL Federation Approach
    - Distributed RDF Processing
    - SPARQL Endpoint Federation
4. Live Querying Approach
    - Traversal-based approaches
    - Index-based approaches
    - Hybrid approaches
5. Conclusions

Traditional Hypertext-based Web Access
[Figure: IMDb and World Book pages]
Data exposed to the Web via HTML

Linked Data Publishing Principles
    (http://...linkedmdb.../Shining, releaseDate, 23 May 1980)
    (http://...linkedmdb.../Shining, filmLocation, http://cia.../UK)
    (http://...linkedmdb.../29704, actedIn, http://...linkedmdb.../Shining)
    ...
    (http://cia.../UK, hasPopulation, 63230000)
    ...
[Figure: IMDb and World Book data, with the resources Shining and UK]
- Data model: RDF
- Global identifier: URI
- Access mechanism: HTTP
- Connection: data links

RDF Introduction
- Everything is a uniquely named resource
    http://data.linkedmdb.org/resource/actor/JN29704
- Prefixes can be used to shorten the names
- Properties of resources can be defined
- Relationships with other resources can be defined
- Resource descriptions can be contributed by different people/groups and can be located anywhere in the web
    - Integrated web database

    xmlns:y=http://data.linkedmdb.org/resource/actor/
    y:JN29704
    y:JN29704:hasName      Jack Nicholson
    y:JN29704:BornOnDate   1937-04-22
    y:JN29704:movieActor   y:TS2014
    y:TS2014:title         The Shining
    y:TS2014:releaseDate   1980-05-23

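As a small illustration of how a prefix shortens a resource name, the sketch below expands a prefixed name back into its full URI. The prefix `y` and its namespace come from the slide; the function name `expand` and the dictionary layout are assumptions made for this example.

```python
# Prefix table: the "y" prefix and its namespace are taken from the slide.
PREFIXES = {"y": "http://data.linkedmdb.org/resource/actor/"}


def expand(name, prefixes=PREFIXES):
    """Expand 'prefix:local' into a full URI; return the name unchanged if the prefix is unknown."""
    prefix, sep, local = name.partition(":")
    if sep and prefix in prefixes:
        return prefixes[prefix] + local
    return name


full_uri = expand("y:JN29704")
print(full_uri)
```

A name whose prefix is not in the table (for example, one that is already a full `http:` URI) is passed through untouched.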
RDF Data Model
- Triple: Subject, Predicate (Property), Object $(s, p, o)$
    - Subject: the entity that is described (URI or blank node)
    - Predicate: a feature of the entity (URI)
    - Object: value of the feature (URI, blank node or literal)
- $(s, p, o) \in (U \cup B) \times U \times (U \cup B \cup L)$
    - $U$: set of URIs
    - $B$: set of blank nodes
    - $L$: set of literals
- Set of RDF triples is called an RDF graph
[Figure: a Subject node (U or B) connected by a Predicate edge (U) to an Object node (U, B, or L)]

    Subject                        Predicate            Object
    http://...imdb.../film/2014    rdfs:label           The Shining
    http://...imdb.../film/2014    movie:releaseDate    1980-05-23
    http://...imdb.../29704        movie:actor_name     Jack Nicholson
    ...                            ...                  ...

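The membership condition on triples, subject in U ∪ B, predicate in U, object in U ∪ B ∪ L, can be sketched as a type check. The class names `URI`, `BlankNode`, and `Literal` and the function `is_valid_triple` are hypothetical names chosen for this illustration, not part of any RDF library.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class URI:
    value: str


@dataclass(frozen=True)
class BlankNode:
    label: str


@dataclass(frozen=True)
class Literal:
    value: str


def is_valid_triple(s, p, o):
    """Check (s, p, o) against (U ∪ B) × U × (U ∪ B ∪ L)."""
    return (
        isinstance(s, (URI, BlankNode))
        and isinstance(p, URI)
        and isinstance(o, (URI, BlankNode, Literal))
    )


good = is_valid_triple(URI("mdb:film/2014"), URI("rdfs:label"), Literal("The Shining"))
bad_subject = is_valid_triple(Literal("The Shining"), URI("rdfs:label"), URI("mdb:film/2014"))
bad_predicate = is_valid_triple(URI("mdb:film/2014"), BlankNode("b0"), Literal("The Shining"))
print(good, bad_subject, bad_predicate)
```

A literal may only appear in the object position, and a predicate must always be a URI, which is why the second and third checks fail.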
RDF Example Instance
Prefixes:
    mdb=http://data.linkedmdb.org/resource/
    geo=http://sws.geonames.org/
    bm=http://wifo5-03.informatik.uni-mannheim.de/bookmashup/
    lexvo=http://lexvo.org/id/
    wp=http://en.wikipedia.org/wiki/

    Subject                Predicate                    Object
    mdb:film/2014          rdfs:label                   The Shining
    mdb:film/2014          movie:initial_release_date   1980-05-23
    mdb:film/2014          movie:director               mdb:director/8476
    mdb:film/2014          movie:actor                  mdb:actor/29704
    mdb:film/2014          movie:actor                  mdb:actor/30013
    mdb:film/2014          movie:music_contributor      mdb:music_contributor/4110
    mdb:film/2014          foaf:based_near              geo:2635167
    mdb:film/2014          movie:relatedBook            bm:0743424425
    mdb:film/2014          movie:language               lexvo:iso639-3/eng
    mdb:director/8476      movie:director_name          Stanley Kubrick
    mdb:film/2685          movie:director               mdb:director/8476
    mdb:film/2685          rdfs:label                   A Clockwork Orange
    mdb:film/424           movie:director               mdb:director/8476
    mdb:film/424           rdfs:label                   Spartacus
    mdb:actor/29704        movie:actor_name             Jack Nicholson
    mdb:film/1267          movie:actor                  mdb:actor/29704
    mdb:film/1267          rdfs:label                   The Last Tycoon
    mdb:film/3418          movie:actor                  mdb:actor/29704
    mdb:film/3418          rdfs:label                   The Passenger
    geo:2635167            gn:name                      United Kingdom
    geo:2635167            gn:population                62348447
    geo:2635167            gn:wikipediaArticle          wp:United_Kingdom
    bm:books/0743424425    dc:creator                   bm:persons/Stephen+King
    bm:books/0743424425    rev:rating                   4.7
    bm:books/0743424425    scom:hasOffer                bm:offers/0743424425amazonOffer
    lexvo:iso639-3/eng     rdfs:label                   English
    lexvo:iso639-3/eng     lvont:usedIn                 lexvo:iso3166/CA
    lexvo:iso639-3/eng     lvont:usesScript             lexvo:script/Latn

[Slide annotations: subjects are URIs; objects are either literals or URIs]

RDF Graph
[Figure: the example triples drawn as a graph. Nodes are resources and literals (mdb:film/2014, mdb:film/3418, mdb:film/1267, mdb:film/2685, mdb:film/424, mdb:actor/29704, mdb:actor/30013, mdb:director/8476, geo:2635167, bm:books/0743424425, bm:offers/0743424425amazonOffer, and literal values such as The Shining, Jack Nicholson, Stanley Kubrick, United Kingdom, 62348447, 4.7). Edges are labelled with predicates (rdfs:label, movie:initial_release_date, movie:actor, movie:actor_name, movie:director, movie:director_name, movie:relatedBook, foaf:based_near, gn:name, gn:population, rev:rating, scom:hasOffer).]

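Viewing a set of triples as a graph amounts to treating each triple as a labelled edge from subject to object. The following sketch builds such an edge index from a handful of the slide's triples and follows edges to list the films of one actor. The function names `build_index` and `films_of` and the dictionary-based representation are assumptions for illustration only.

```python
from collections import defaultdict

# A subset of the example triples, written with the slide's prefixes.
TRIPLES = [
    ("mdb:film/2014", "rdfs:label", "The Shining"),
    ("mdb:film/2014", "movie:actor", "mdb:actor/29704"),
    ("mdb:film/2014", "movie:actor", "mdb:actor/30013"),
    ("mdb:film/1267", "rdfs:label", "The Last Tycoon"),
    ("mdb:film/1267", "movie:actor", "mdb:actor/29704"),
    ("mdb:film/3418", "rdfs:label", "The Passenger"),
    ("mdb:film/3418", "movie:actor", "mdb:actor/29704"),
    ("mdb:actor/29704", "movie:actor_name", "Jack Nicholson"),
]


def build_index(triples):
    """Index edges both ways: node -> [(predicate, neighbour)]."""
    outgoing, incoming = defaultdict(list), defaultdict(list)
    for s, p, o in triples:
        outgoing[s].append((p, o))
        incoming[o].append((p, s))
    return outgoing, incoming


def films_of(actor, outgoing, incoming):
    """Follow movie:actor edges backwards, then rdfs:label edges forwards."""
    films = [s for p, s in incoming[actor] if p == "movie:actor"]
    return sorted(o for f in films for p, o in outgoing[f] if p == "rdfs:label")


outgoing, incoming = build_index(TRIPLES)
nicholson_titles = films_of("mdb:actor/29704", outgoing, incoming)
print(nicholson_titles)
```

Unlike the tree-shaped XML list, the actor is a single shared node here, so all of that actor's films are reachable from it in one hop.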
Linked Data Model [Hartig, 2012]
Web Document
    Given a countably infinite set $\mathcal{D}$ (documents), a Web of Linked Data is a tuple $W = (D, \mathit{adoc}, \mathit{data})$ where:
    - $D \subseteq \mathcal{D}$,
    - $\mathit{adoc}$ is a partial mapping from URIs to $D$, and
    - $\mathit{data}$ is a total mapping from $D$ to finite sets of RDF triples.
Web of Linked Data
    A Web of Linked Data $W = (D, \mathit{adoc}, \mathit{data})$ contains a data link from document $d \in D$ to document $d' \in D$ if there exists a URI $u$ such that:
    - $u$ is mentioned in an RDF triple $t \in \mathit{data}(d)$, and
    - $d' = \mathit{adoc}(u)$.

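The data-link definition can be read almost directly as code. In this sketch, `adoc` is a Python dict (missing keys model the partiality of the mapping) and `data` maps each document name to a set of triples. The document names and the `http://example.org/...` URIs are invented for the example, and the function name `data_links` is hypothetical.

```python
def data_links(adoc, data):
    """Return all pairs (d, d2) such that some URI u mentioned in data(d) has adoc(u) = d2."""
    links = set()
    for d, triples in data.items():
        for triple in triples:
            for term in triple:  # a URI may be mentioned as subject, predicate, or object
                target = adoc.get(term)  # None when adoc is undefined for this term
                if target is not None:
                    links.add((d, target))
    return links


# Hypothetical two-document Web: one document describes a film, the other a country.
adoc = {
    "http://example.org/Shining": "doc_film",
    "http://example.org/UK": "doc_country",
}
data = {
    "doc_film": {("http://example.org/Shining", "filmLocation", "http://example.org/UK")},
    "doc_country": {("http://example.org/UK", "hasPopulation", "63230000")},
}

links = data_links(adoc, data)
print(sorted(links))
```

The definition does not exclude d' = d, so a document that mentions its own URI links to itself; the film document links to the country document, but not the other way round.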
RDF Query Model: SPARQL
Query Model - SPARQL Protocol and RDF Query Language
Given $U$ (set of URIs), $L$ (set of literals), and $V$ (set of variables), a SPARQL expression is defined recursively:
- an atomic triple pattern, which is an element of $(U \cup V) \times (U \cup V) \times (U \cup V \cup L)$
    ?x rdfs:label "The Shining"
- P FILTER R, where P is a graph pattern expression and R is a built-in SPARQL condition (i.e., analogous to a SQL predicate)
    ?x rev:rating ?p FILTER(?p > 3.0)
- P1 AND/OPT/UNION P2, where P1 and P2 are graph pattern expressions
Example:
    SELECT ?name
    WHERE {
      ?m rdfs:label ?name . ?m movie:director ?d .
      ?d movie:director_name "Stanley Kubrick" .
      ?m movie:relatedBook ?b . ?b rev:rating ?r .
      FILTER(?r > 4.0)
    }

SPARQL Queries
    SELECT ?name
    WHERE {
      ?m rdfs:label ?name . ?m movie:director ?d .
      ?d movie:director_name "Stanley Kubrick" .
      ?m movie:relatedBook ?b . ?b rev:rating ?r .
      FILTER(?r > 4.0)
    }
[Figure: the query drawn as a query graph. Edges: ?m to ?name (rdfs:label), ?m to ?d (movie:director), ?d to "Stanley Kubrick" (movie:director_name), ?m to ?b (movie:relatedBook), ?b to ?r (rev:rating), with FILTER(?r > 4.0) attached to ?r]

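To show what evaluating such a query means, here is a minimal sketch of matching the five triple patterns against a few of the example triples and then applying the filter. It is a naive nested-loop matcher written for illustration, not how a SPARQL engine is implemented. Two assumptions are made: the filter operator is taken to be `>`, and the book is identified as `bm:books/0743424425` in both triples so that the join on `?b` succeeds (the example table uses `bm:0743424425` in the `movie:relatedBook` triple). The function names `match_pattern` and `evaluate` are hypothetical. The code needs Python 3.8 or later.

```python
TRIPLES = [
    ("mdb:film/2014", "rdfs:label", "The Shining"),
    ("mdb:film/2014", "movie:director", "mdb:director/8476"),
    ("mdb:film/2014", "movie:relatedBook", "bm:books/0743424425"),
    ("mdb:film/424", "rdfs:label", "Spartacus"),
    ("mdb:film/424", "movie:director", "mdb:director/8476"),
    ("mdb:director/8476", "movie:director_name", "Stanley Kubrick"),
    ("bm:books/0743424425", "rev:rating", "4.7"),
]

# Terms starting with "?" are variables; everything else must match exactly.
PATTERNS = [
    ("?m", "rdfs:label", "?name"),
    ("?m", "movie:director", "?d"),
    ("?d", "movie:director_name", "Stanley Kubrick"),
    ("?m", "movie:relatedBook", "?b"),
    ("?b", "rev:rating", "?r"),
]


def match_pattern(pattern, triple, binding):
    """Extend binding so that pattern matches triple; return None on a mismatch."""
    extended = dict(binding)
    for term, value in zip(pattern, triple):
        if term.startswith("?"):
            if extended.setdefault(term, value) != value:
                return None  # variable already bound to a different value
        elif term != value:
            return None  # constant does not match
    return extended


def evaluate(patterns, triples):
    """Join the patterns one at a time, keeping only consistent variable bindings."""
    bindings = [{}]
    for pattern in patterns:
        bindings = [
            extended
            for binding in bindings
            for triple in triples
            if (extended := match_pattern(pattern, triple, binding)) is not None
        ]
    return bindings


solutions = [b for b in evaluate(PATTERNS, TRIPLES) if float(b["?r"]) > 4.0]
names = [b["?name"] for b in solutions]
print(names)
```

Spartacus matches the first three patterns but has no `movie:relatedBook` triple, so it drops out at the fourth pattern.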
Naïve Triple Store Design
    SELECT ?name
    WHERE {
      ?m rdfs:label ?name . ?m movie:director ?d .
      ?d movie:director_name "Stanley Kubrick" .
      ?m movie:relatedBook ?b . ?b rev:rating ?r .
      FILTER(?r > 4.0)
    }

All triples are stored in a single table T:

    Subject                Property                     Object
    mdb:film/2014          rdfs:label                   The Shining
    mdb:film/2014          movie:initial_release_date   1980-05-23
    mdb:film/2014          movie:director               mdb:director/8476
    mdb:film/2014          movie:actor                  mdb:actor/29704
    mdb:film/2014          movie:actor                  mdb:actor/30013
    mdb:film/2014          movie:music_contributor      mdb:music_contributor/4110
    mdb:film/2014          foaf:based_near              geo:2635167
    mdb:film/2014          movie:relatedBook            bm:0743424425
    mdb:film/2014          movie:language               lexvo:iso639-3/eng
    mdb:director/8476      movie:director_name          Stanley Kubrick
    mdb:film/2685          movie:director               mdb:director/8476
    mdb:film/2685          rdfs:label                   A Clockwork Orange
    mdb:film/424           movie:director               mdb:director/8476
    mdb:film/424           rdfs:label                   Spartacus
    mdb:actor/29704        movie:actor_name             Jack Nicholson
    mdb:film/1267          movie:actor                  mdb:actor/29704
    mdb:film/1267          rdfs:label                   The Last Tycoon
    mdb:film/3418          movie:actor                  mdb:actor/29704
    mdb:film/3418          rdfs:label                   The Passenger
    geo:2635167            gn:name                      United Kingdom
    geo:2635167            gn:population                62348447
    geo:2635167            gn:wikipediaArticle          wp:United_Kingdom
    bm:books/0743424425    dc:creator                   bm:persons/Stephen+King
    bm:books/0743424425    rev:rating                   4.7
    bm:books/0743424425    scom:hasOffer                bm:offers/0743424425amazonOffer
    lexvo:iso639-3/eng     rdfs:label                   English
    lexvo:iso639-3/eng     lvont:usedIn                 lexvo:iso3166/CA
    lexvo:iso639-3/eng     lvont:usesScript             lexvo:script/Latn

The SPARQL query becomes a five-way self-join over T:

    SELECT T1.object
    FROM T as T1, T as T2, T as T3, T as T4, T as T5
    WHERE T1.p='rdfs:label'
    AND T2.p='movie:relatedBook'
    AND T3.p='movie:director'
    AND T4.p='rev:rating'
    AND T5.p='movie:director_name'
    AND T1.s=T2.s
    AND T1.s=T3.s
    AND T2.o=T4.s
    AND T3.o=T5.s
    AND T4.o > 4.0
    AND T5.o='Stanley Kubrick'

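The single-table design and its self-join translation can be tried out with Python's built-in `sqlite3` module. This is a sketch under several assumptions: the table is named `T` with columns `s`, `p`, `o` as in the slide's WHERE clause (so the projected column is written `T1.o`), the filter operator is taken to be `>`, the rating is stored as text and cast to a number for the comparison, and the book is identified as `bm:books/0743424425` in both triples so that the join on the book succeeds. Only a subset of the triples is loaded.

```python
import sqlite3

ROWS = [
    ("mdb:film/2014", "rdfs:label", "The Shining"),
    ("mdb:film/2014", "movie:director", "mdb:director/8476"),
    ("mdb:film/2014", "movie:relatedBook", "bm:books/0743424425"),
    ("mdb:film/424", "rdfs:label", "Spartacus"),
    ("mdb:film/424", "movie:director", "mdb:director/8476"),
    ("mdb:director/8476", "movie:director_name", "Stanley Kubrick"),
    ("bm:books/0743424425", "rev:rating", "4.7"),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE T (s TEXT, p TEXT, o TEXT)")  # one row per triple
conn.executemany("INSERT INTO T VALUES (?, ?, ?)", ROWS)

# One alias of T per triple pattern; shared variables become join conditions.
QUERY = """
SELECT T1.o
FROM T AS T1, T AS T2, T AS T3, T AS T4, T AS T5
WHERE T1.p = 'rdfs:label'
  AND T2.p = 'movie:relatedBook'
  AND T3.p = 'movie:director'
  AND T4.p = 'rev:rating'
  AND T5.p = 'movie:director_name'
  AND T1.s = T2.s
  AND T1.s = T3.s
  AND T2.o = T4.s
  AND T3.o = T5.s
  AND CAST(T4.o AS REAL) > 4.0
  AND T5.o = 'Stanley Kubrick'
"""

names = [row[0] for row in conn.execute(QUERY)]
print(names)
conn.close()
```

Each triple pattern costs one more copy of `T` in the FROM clause, which is why the number of self-joins grows with the size of the query.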

Recommended

PDF
Web Data Management in the RDF Age
PPTX
RDF data model
PDF
20110728 datalift-rpi-troy
PDF
Datalift a-catalyser-for-the-web-of-data-fosdem-05-02-2011
PDF
Web Data Management in RDF Age
PDF
VALA Tech Camp 2017: Intro to Wikidata & SPARQL
PPTX
Introduction to RDF Data Model
ZIP
Intro to Linked Open Data in Libraries, Archives & Museums
PPTX
SWT Lecture Session 8 - Rules
PPTX
DBpedia Archive using Memento, Triple Pattern Fragments, and HDT
PPTX
When the Web of Linked Data Arrives
PDF
An Introduction to RDF and the Web of Data
PPT
Riding the wave - Paradigm shifts in information access
PDF
Connections that work: Linked Open Data demystified
ODP
2009 0807 Lod Gmod
PPTX
What is #LODLAM?! Understanding linked open data in libraries, archives [and ...
PDF
OAC Presentation at CNI 09 Fall Forum
PDF
KESW2012 Hackathon St Petersburg
PPTX
Linked Data in Libraries
PPTX
Developing Linked Data and Semantic Web-based Applications (Expotec 2015)
PDF
Linked Data and Knowledge Graphs -- Constructing and Understanding Knowledge ...
PPTX
What is #LODLAM?! (revised January 2015)
PDF
Linked open data and libraries
PDF
Methodological Guidelines for Publishing Linked Data
PDF
Learning Multilingual Semantics from Big Data on the Web
PPT
Open Knowledge Foundation Edinburgh meet-up #3
PDF
RDF, SPARQL and Semantic Repositories
ZIP
SemWeb Fundamentals - Info Linking & Layering in Practice
PDF
Linked Data

More Related Content

PDF
Web Data Management in the RDF Age
PPTX
RDF data model
PDF
20110728 datalift-rpi-troy
PDF
Datalift a-catalyser-for-the-web-of-data-fosdem-05-02-2011
PDF
Web Data Management in RDF Age
PDF
VALA Tech Camp 2017: Intro to Wikidata & SPARQL
PPTX
Introduction to RDF Data Model
Web Data Management in the RDF Age
RDF data model
20110728 datalift-rpi-troy
Datalift a-catalyser-for-the-web-of-data-fosdem-05-02-2011
Web Data Management in RDF Age
VALA Tech Camp 2017: Intro to Wikidata & SPARQL
Introduction to RDF Data Model

What's hot

ZIP
Intro to Linked Open Data in Libraries, Archives & Museums
PPTX
SWT Lecture Session 8 - Rules
PPTX
DBpedia Archive using Memento, Triple Pattern Fragments, and HDT
PPTX
When the Web of Linked Data Arrives
PDF
An Introduction to RDF and the Web of Data
PPT
Riding the wave - Paradigm shifts in information access
PDF
Connections that work: Linked Open Data demystified
ODP
2009 0807 Lod Gmod
PPTX
What is #LODLAM?! Understanding linked open data in libraries, archives [and ...
PDF
OAC Presentation at CNI 09 Fall Forum
PDF
KESW2012 Hackathon St Petersburg
PPTX
Linked Data in Libraries
PPTX
Developing Linked Data and Semantic Web-based Applications (Expotec 2015)
PDF
Linked Data and Knowledge Graphs -- Constructing and Understanding Knowledge ...
PPTX
What is #LODLAM?! (revised January 2015)
PDF
Linked open data and libraries
PDF
Methodological Guidelines for Publishing Linked Data
PDF
Learning Multilingual Semantics from Big Data on the Web
PPT
Open Knowledge Foundation Edinburgh meet-up #3
PDF
RDF, SPARQL and Semantic Repositories
Intro to Linked Open Data in Libraries, Archives & Museums
SWT Lecture Session 8 - Rules
DBpedia Archive using Memento, Triple Pattern Fragments, and HDT
When the Web of Linked Data Arrives
An Introduction to RDF and the Web of Data
Riding the wave - Paradigm shifts in information access
Connections that work: Linked Open Data demystified
2009 0807 Lod Gmod
What is #LODLAM?! Understanding linked open data in libraries, archives [and ...
OAC Presentation at CNI 09 Fall Forum
KESW2012 Hackathon St Petersburg
Linked Data in Libraries
Developing Linked Data and Semantic Web-based Applications (Expotec 2015)
Linked Data and Knowledge Graphs -- Constructing and Understanding Knowledge ...
What is #LODLAM?! (revised January 2015)
Linked open data and libraries
Methodological Guidelines for Publishing Linked Data
Learning Multilingual Semantics from Big Data on the Web
Open Knowledge Foundation Edinburgh meet-up #3
RDF, SPARQL and Semantic Repositories

Similar to Web Data Management with RDF

ZIP
SemWeb Fundamentals - Info Linking & Layering in Practice
PDF
Linked Data
PDF
Query-Driven Management of Linked Data Quality
PPTX
Hack U Barcelona 2011
PPT
Exploring and using the Semantic Web - SSSW09 tutorial
PDF
Linked Data (1st Linked Data Meetup Malmö)
PDF
Culture Geeks Feb talk: Adventures in Linked Data Land
ODP
Linked Data
ODP
State of the Semantic Web
PDF
Linked Open Data
PDF
Connecting the Dots: Constellations in the Linked Data Universe
PPT
Linked Data Tutorial
PDF
Danbri Drupalcon Export
KEY
Transmission6 - Publishing Linked Data
PPTX
The Semantic Data Web, Sören Auer, University of Leipzig
ODP
Linked Data
PPTX
Omitola birmingham cityuniv
PPT
Webofdata
PPTX
Linked Open Data Utrecht University Library
ODP
Linked data and applications
SemWeb Fundamentals - Info Linking & Layering in Practice
Linked Data
Query-Driven Management of Linked Data Quality
Hack U Barcelona 2011
Exploring and using the Semantic Web - SSSW09 tutorial
Linked Data (1st Linked Data Meetup Malmö)
Culture Geeks Feb talk: Adventures in Linked Data Land
Linked Data
Web Data Management with RDF

  • 1.
    Web Data Management in RDF Age
    M. Tamer Özsu, University of Waterloo, David R. Cheriton School of Computer Science
    PKU/2014-08-28
  • 2.
    Acknowledgements
    This presentation draws upon collaborative research and discussions with the following colleagues (in alphabetical order):
      Gunes Aluc, University of Waterloo
      Khuzaima Daudjee, University of Waterloo
      Olaf Hartig, University of Waterloo
      Lei Chen, Hong Kong University of Science and Technology
      Lei Zou, Peking University
  • 3.
    Web Data Management
    A long-term research interest in the DB community
    [Figure: publications dated 2000, 2004, 2011, and 2011]
  • 4.
    Interest Due to Properties of Web Data
      Lack of a schema
        Data is at best "semi-structured"
        Missing data, additional attributes, "similar" data but not identical
      Volatility
        Changes frequently
        May conform to one schema now, but not later
      Scale
        Does it make sense to talk about a schema for the Web?
        How do you capture "everything"?
      Querying difficulty
        What is the user language?
        What are the primitives?
        Aren't search engines or metasearch engines sufficient?
  • 6.
    More Recent Approaches to Web Querying
      Fusion Tables
        Users contribute data in spreadsheet, CSV, KML format
        Possible joins between multiple data sets
        Extensive visualization
      XML
        Data exchange language
        Primarily tree based structure
        <list title="MOVIES">
          <film>
            <title>The Shining</title>
            <director>Stanley Kubrick</director>
            <actor>Jack Nicholson</actor>
          </film>
          <film>
            <title>Spartacus</title>
            <director>Stanley Kubrick</director>
          </film>
          <film>
            <title>The Passenger</title>
            <actor>Jack Nicholson</actor>
          </film>
          ...
        </list>
        [Figure: the same document drawn as a tree, with a root node, film children, and title, director, and actor leaves]
  • 10.
    More Recent Approaches to Web Querying (continued)
      RDF (Resource Description Framework) and SPARQL
        W3C recommendation
        Simple, self-descriptive model
        Building block of semantic web and Linked Open Data (LOD)
  • 11.
    RDF and Semantic Web
      RDF is a language for the conceptual modeling of information about resources (web resources in our context)
      A building block of semantic web
        Facilitates exchange of information
        Search engine results can be more focused and structured
        Facilitates data integration (mashups)
      Machine understandable
        Understand the information on the web and the interrelationships among them
  • 12.
    RDF Uses
      Yago and DBpedia extract facts from Wikipedia and represent them as RDF → structural queries
      Communities build RDF data
        E.g., biologists: Bio2RDF and Uniprot RDF
      Web data integration
        Linked Open Data Cloud
      . . .
  • 13.
    RDF Data Volumes . . . are growing, and fast
      Linked data cloud currently consists of 325 datasets with 25B triples
      Size almost doubling every year
    [Figure: successive snapshots of the Linking Open Data cloud diagram, by Richard Cyganiak and Anja Jentzsch, http://lod-cloud.net/. Datasets are grouped into Media, Geographic, Publications, User-generated content, Government, Cross-domain, and Life sciences.]
      March '09: 89 datasets
      September '10: 203 datasets
      September '11: 295 datasets, 25B triples
      April '14: 1091 datasets, ??? triples (Max Schmachtenberg, Christian Bizer, and Heiko Paulheim: Adoption of Linked Data Best Practices in Different Topical Domains. In Proc. ISWC, 2014.)
  • 19.
    Globally Distributed Network of Data
  • 20.
    Three Approaches
      Data warehousing
        Consolidate data in a repository and query it
      SPARQL federation
        Leverage query services provided by data publishers
      Live Linked Data querying
        Navigate through LOD by looking up URIs at query execution time
  • 21.
    Outline
      1 LOD and RDF Introduction
      2 Data Warehousing Approach
          Relational Approaches
          Graph-Based Approaches
      3 SPARQL Federation Approach
          Distributed RDF Processing
          SPARQL Endpoint Federation
      4 Live Querying Approach
          Traversal-based approaches
          Index-based approaches
          Hybrid approaches
      5 Conclusions
  • 23.
    Traditional Hypertext-based Web Access
    [Figure: IMDb and World Book pages; data exposed to the Web via HTML]
  • 24.
    Linked Data Publishing Principles
      (http://...linkedmdb.../Shining, releaseDate, 23 May 1980)
      (http://...linkedmdb.../Shining, filmLocation, http://cia.../UK)
      (http://...linkedmdb.../29704, actedIn, http://...linkedmdb.../Shining)
      ...
      (http://cia.../UK, hasPopulation, 63230000)
      ...
    [Figure: IMDb and World Book data about "Shining" and "UK", connected by data links]
      Data model: RDF
      Global identifier: URI
      Access mechanism: HTTP
      Connection: data links
  • 27.
    RDF Introduction
      Everything is a uniquely named resource
        http://data.linkedmdb.org/resource/actor/JN29704
      Prefixes can be used to shorten the names
      Properties of resources can be defined
      Relationships with other resources can be defined
      Resource descriptions can be contributed by different people/groups and can be located anywhere in the web
        Integrated web "database"
    Example:
      xmlns:y=http://data.linkedmdb.org/resource/actor/
      y:JN29704
      y:JN29704:hasName "Jack Nicholson"
      y:JN29704:BornOnDate "1937-04-22"
      y:JN29704:movieActor y:TS2014
      y:TS2014:title "The Shining"
      y:TS2014:releaseDate "1980-05-23"
  • 41.
    RDF Data Model
      Triple: Subject, Predicate (Property), Object $(s, p, o)$
        Subject: the entity that is described (URI or blank node)
        Predicate: a feature of the entity (URI)
        Object: value of the feature (URI, blank node or literal)
      $(s, p, o) \in (U \cup B) \times U \times (U \cup B \cup L)$
        U: set of URIs
        B: set of blank nodes
        L: set of literals
      Set of RDF triples is called an RDF graph
    [Figure: a Subject node (U or B) connected by a Predicate edge (U) to an Object node (U, B, or L)]
      Subject | Predicate | Object
      http://...imdb.../film/2014 | rdfs:label | "The Shining"
      http://...imdb.../film/2014 | movie:releaseDate | "1980-05-23"
      http://...imdb.../29704 | movie:actor_name | "Jack Nicholson"
      ... | ... | ...
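The membership condition on triples can be checked mechanically. The following is a minimal Python sketch, not taken from the slides: the classes `URI`, `BlankNode`, and `Literal` and the function `is_valid_triple` are hypothetical names introduced only for illustration, and the film URI assumes the `mdb` prefix expansion given with the example data.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class URI:
    value: str


@dataclass(frozen=True)
class BlankNode:
    label: str


@dataclass(frozen=True)
class Literal:
    value: str


def is_valid_triple(s, p, o) -> bool:
    """True iff (s, p, o) lies in (U ∪ B) × U × (U ∪ B ∪ L)."""
    return (
        isinstance(s, (URI, BlankNode))                # subject: URI or blank node
        and isinstance(p, URI)                         # predicate: URI only
        and isinstance(o, (URI, BlankNode, Literal))   # object: anything
    )


film = URI("http://data.linkedmdb.org/resource/film/2014")
label = URI("http://www.w3.org/2000/01/rdf-schema#label")

# An RDF graph is simply a set of valid triples.
graph = {(film, label, Literal("The Shining"))}
```

A literal in subject position or a blank node in predicate position is rejected, which mirrors the asymmetry in the definition.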
  • 44.
    Prefixes: mdb=http://data.linkedmdb.org/resource/; geo=http://sws.geonames.org/; bm=http://wifo5-03.informatik.uni-mannheim.de/bookmashup/; lexvo=http://lexvo.org/id/; wp=http://en.wikipedia.org/wiki/
      Subject | Predicate | Object
      mdb:film/2014 | rdfs:label | "The Shining"
      mdb:film/2014 | movie:initial_release_date | "1980-05-23"
      ...
      mdb:film/2014 | movie:actor | mdb:actor/30013
      mdb:film/2014 | movie:music_contributor | mdb:music_contributor/4110
      mdb:film/2014 | foaf:based_near | geo:2635167
      ...
      mdb:film/2014 | movie:language | lexvo:iso639-3/eng
      mdb:director/8476 | movie:director_name | "Stanley Kubrick"
      mdb:film/2685 | movie:director | mdb:director/8476
      mdb:film/2685 | rdfs:label | "A Clockwork Orange"
      ...
      mdb:film/424 | rdfs:label | "Spartacus"
      mdb:actor/29704 | movie:actor_name | "Jack Nicholson"
      ...
      mdb:film/1267 | rdfs:label | "The Last Tycoon"
      ...
      mdb:film/3418 | rdfs:label | "The Passenger"
      geo:2635167 | gn:name | "United Kingdom"
      geo:2635167 | gn:population | 62348447
      geo:2635167 | gn:wikipediaArticle | wp:United_Kingdom
      bm:books/0743424425 | dc:creator | bm:persons/Stephen+King
      bm:books/0743424425 | rev:rating | 4.7
      bm:books/0743424425 | scom:hasOffer | bm:offers/0743424425amazonOffer
      lexvo:iso639-3/eng | rdfs:label | "English"
      lexvo:iso639-3/eng | lvont:usedIn | lexvo:iso3166/CA
      lexvo:iso639-3/eng | lvont:usesScript | lexvo:script/Latn
    Subjects and predicates are URIs; objects are URIs or literals.
  • 63.
    RDF Graph
    [Figure: the example triples drawn as a graph. Resources such as mdb:film/2014, mdb:film/2685, mdb:film/424, mdb:film/1267, mdb:actor/29704, mdb:actor/30013, mdb:director/8476, geo:2635167, and bm:books/0743424425 are nodes; edges carry property names such as rdfs:label, movie:actor, movie:director, movie:relatedBook, foaf:based_near, and scom:hasOffer; literals such as "The Shining", "Jack Nicholson", 4.7, and 62348447 are leaves.]
  • 69.
    Linked Data Model [Hartig, 2012]
    Web Document
      Given a countably infinite set $\mathcal{D}$ (documents), a Web of Linked Data is a tuple $W = (D, adoc, data)$ where:
        $D \subseteq \mathcal{D}$,
        $adoc$ is a partial mapping from URIs to $D$, and
        $data$ is a total mapping from $D$ to finite sets of RDF triples.
    Web of Linked Data
      A Web of Linked Data $W = (D, adoc, data)$ contains a data link from document $d \in D$ to document $d' \in D$ if there exists a URI $u$ such that:
        $u$ is mentioned in an RDF triple $t \in data(d)$, and
        $d' = adoc(u)$.
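The data-link definition translates almost directly into code. This is a minimal Python sketch under stated assumptions: documents are identified by plain strings, `adoc` is a dictionary (partial, so most terms are absent), `data` maps each document to a set of triples, and the `http://example.org/...` URIs and document names are hypothetical.

```python
def data_links(D, adoc, data):
    """Return all pairs (d, d2) such that W = (D, adoc, data) has a data link d -> d2."""
    links = set()
    for d in D:
        for triple in data[d]:
            for term in triple:            # any position may mention the URI u
                target = adoc.get(term)    # adoc is partial: None when undefined
                if target is not None and target in D:
                    links.add((d, target))
    return links


SHINING = "http://example.org/Shining"   # hypothetical URIs
UK = "http://example.org/UK"

D = {"doc_shining", "doc_uk"}
adoc = {SHINING: "doc_shining", UK: "doc_uk"}
data = {
    "doc_shining": {(SHINING, "filmLocation", UK)},
    "doc_uk": {(UK, "hasPopulation", "63230000")},
}

links = data_links(D, adoc, data)
```

Note that the definition does not exclude a document linking to itself, so the sketch reports such self-links as well.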
  • 75.
    RDF Query Model: SPARQL
    Query model: SPARQL Protocol and RDF Query Language
    Given U (set of URIs), L (set of literals), and V (set of variables), a SPARQL expression is defined recursively:
      an atomic triple pattern, which is an element of $(U \cup V) \times (U \cup V) \times (U \cup V \cup L)$
        ?x rdfs:label "The Shining"
      P FILTER R, where P is a graph pattern expression and R is a built-in SPARQL condition (i.e., analogous to a SQL predicate)
        ?x rev:rating ?p FILTER(?p > 3.0)
      P1 AND/OPT/UNION P2, where P1 and P2 are graph pattern expressions
    Example:
      SELECT ?name
      WHERE {
        ?m rdfs:label ?name .
        ?m movie:director ?d .
        ?d movie:director_name "Stanley Kubrick" .
        ?m movie:relatedBook ?b .
        ?b rev:rating ?r .
        FILTER(?r > 4.0)
      }
  • 77.
    SPARQL Queries
    [Figure: the example query drawn as a query graph. ?m has edges rdfs:label to ?name, movie:director to ?d, and movie:relatedBook to ?b; ?d has edge movie:director_name to "Stanley Kubrick"; ?b has edge rev:rating to ?r, constrained by FILTER(?r > 4.0).]
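To show how a conjunction of triple patterns is evaluated, here is a minimal Python sketch of basic graph pattern matching by nested iteration. It is an illustration only, not how any system on these slides works. Assumptions: variables are strings starting with `?`, the sample triples are a small subset modeled on the example data (the `movie:director` and `movie:relatedBook` edges of `mdb:film/2014` are assumed from the graph figure), and the filter is taken to be `?r > 4.0`.

```python
TRIPLES = [
    ("mdb:film/2014", "rdfs:label", "The Shining"),
    ("mdb:film/2014", "movie:director", "mdb:director/8476"),
    ("mdb:film/2014", "movie:relatedBook", "bm:books/0743424425"),
    ("mdb:director/8476", "movie:director_name", "Stanley Kubrick"),
    ("mdb:film/2685", "rdfs:label", "A Clockwork Orange"),
    ("mdb:film/2685", "movie:director", "mdb:director/8476"),
    ("bm:books/0743424425", "rev:rating", 4.7),
]


def match_pattern(pattern, triple, binding):
    """Extend `binding` so that `pattern` matches `triple`; return None on mismatch."""
    new = dict(binding)
    for term, value in zip(pattern, triple):
        if isinstance(term, str) and term.startswith("?"):
            # Bind the variable, or check consistency with an earlier binding.
            if new.setdefault(term, value) != value:
                return None
        elif term != value:
            return None
    return new


def evaluate(patterns, triples):
    """Evaluate a conjunction (AND) of triple patterns; return all variable bindings."""
    bindings = [{}]
    for pattern in patterns:
        bindings = [
            extended
            for binding in bindings
            for triple in triples
            if (extended := match_pattern(pattern, triple, binding)) is not None
        ]
    return bindings


QUERY = [
    ("?m", "rdfs:label", "?name"),
    ("?m", "movie:director", "?d"),
    ("?d", "movie:director_name", "Stanley Kubrick"),
    ("?m", "movie:relatedBook", "?b"),
    ("?b", "rev:rating", "?r"),
]

# FILTER is applied to the bindings produced by the graph pattern.
names = [b["?name"] for b in evaluate(QUERY, TRIPLES) if b["?r"] > 4.0]
```

Each pattern extends the bindings produced by the previous ones, which is exactly the join that the storage designs on the following slides try to make cheap.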
  • 79.
    Naïve Triple Store Design
    SPARQL query:
      SELECT ?name
      WHERE {
        ?m rdfs:label ?name .
        ?m movie:director ?d .
        ?d movie:director_name "Stanley Kubrick" .
        ?m movie:relatedBook ?b .
        ?b rev:rating ?r .
        FILTER(?r > 4.0)
      }
    All triples are stored in a single three-column table T (the example triples listed earlier):
      Subject | Property | Object
      mdb:film/2014 | movie:initial_release_date | "1980-05-23"
      mdb:film/2014 | movie:actor | mdb:actor/30013
      mdb:director/8476 | movie:director_name | "Stanley Kubrick"
      bm:books/0743424425 | rev:rating | 4.7
      ... | ... | ...
    Equivalent SQL over T(s, p, o):
      SELECT T1.o
      FROM T AS T1, T AS T2, T AS T3, T AS T4, T AS T5
      WHERE T1.p = "rdfs:label"
        AND T2.p = "movie:relatedBook"
        AND T3.p = "movie:director"
        AND T4.p = "rev:rating"
        AND T5.p = "movie:director_name"
        AND T1.s = T2.s
        AND T1.s = T3.s
        AND T2.o = T4.s
        AND T3.o = T5.s
        AND T4.o > 4.0
        AND T5.o = "Stanley Kubrick"
    Easy to implement but too many self-joins!
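The self-join formulation can be tried out directly. The sketch below uses Python's standard `sqlite3` module with an in-memory database; the table layout `T(s, p, o)`, the choice of SQLite, and the sample rows (a subset modeled on the example data, with the `movie:director` and `movie:relatedBook` rows assumed) are illustration choices, not part of the slides.

```python
import sqlite3

rows = [
    ("mdb:film/2014", "rdfs:label", "The Shining"),
    ("mdb:film/2014", "movie:director", "mdb:director/8476"),
    ("mdb:film/2014", "movie:relatedBook", "bm:books/0743424425"),
    ("mdb:director/8476", "movie:director_name", "Stanley Kubrick"),
    ("mdb:film/2685", "rdfs:label", "A Clockwork Orange"),
    ("mdb:film/2685", "movie:director", "mdb:director/8476"),
    ("bm:books/0743424425", "rev:rating", 4.7),
]

conn = sqlite3.connect(":memory:")
# The object column is left untyped so that it can hold both strings and numbers.
conn.execute("CREATE TABLE T (s TEXT, p TEXT, o)")
conn.executemany("INSERT INTO T VALUES (?, ?, ?)", rows)

# One alias of T per triple pattern: five patterns mean a five-way self-join.
sql = """
SELECT T1.o
FROM T AS T1, T AS T2, T AS T3, T AS T4, T AS T5
WHERE T1.p = 'rdfs:label'
  AND T2.p = 'movie:relatedBook'
  AND T3.p = 'movie:director'
  AND T4.p = 'rev:rating'
  AND T5.p = 'movie:director_name'
  AND T1.s = T2.s
  AND T1.s = T3.s
  AND T2.o = T4.s
  AND T3.o = T5.s
  AND T4.o > 4.0
  AND T5.o = 'Stanley Kubrick'
"""
result = conn.execute(sql).fetchall()
conn.close()
```

Every additional triple pattern in the SPARQL query adds another alias of `T`, which is the cost the slide summarizes as "too many self-joins".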
  • 133.
    Exhaustive Indexing
    RDF-3X [Neumann and Weikum, 2008, 2009], Hexastore [Weiss et al., 2008]
      Strings are mapped to ids using a mapping table
      Triples are indexed in a clustered B+ tree in lexicographic order
      Create indexes for permutations of the three columns: SPO, SOP, PSO, POS, OPS, OSP
    Original triple table
      Subject | Property | Object
      mdb:film/2014 | rdfs:label | "The Shining"
      mdb:film/2014 | movie:initial_release_date | "1980-05-23"
      mdb:director/8476 | movie:director_name | "Stanley Kubrick"
      mdb:film/2685 | movie:director | mdb:director/8476
    Encoded triple table (stored in the B+ tree)
      Subject | Property | Object
      0 | 1 | 2
      0 | 3 | 4
      5 | 6 | 7
      8 | 9 | 5
      ... | ... | ...
    Mapping table
      ID | Value
      0 | mdb:film/2014
      1 | rdfs:label
      2 | The Shining
      3 | movie:initial_release_date
      4 | 1980-05-23
      5 | mdb:director/8476
      6 | movie:director_name
      7 | Stanley Kubrick
      8 | mdb:film/2685
      9 | movie:director
    Easy querying through mapping table
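The two ideas on this slide, dictionary encoding and one sorted index per column permutation, can be sketched in a few lines of Python. This is a simplified illustration and not the RDF-3X or Hexastore implementation: sorted Python lists stand in for clustered B+ trees, and `encode` and `range_scan` are hypothetical helper names.

```python
from bisect import bisect_left
from itertools import permutations

triples = [
    ("mdb:film/2014", "rdfs:label", "The Shining"),
    ("mdb:film/2014", "movie:initial_release_date", "1980-05-23"),
    ("mdb:director/8476", "movie:director_name", "Stanley Kubrick"),
    ("mdb:film/2685", "movie:director", "mdb:director/8476"),
]

# Mapping table: each distinct string gets the next free integer id.
ids = {}


def encode(term):
    return ids.setdefault(term, len(ids))


encoded = [tuple(encode(term) for term in triple) for triple in triples]

# One sorted copy of the encoded triples per permutation of (S, P, O).
indexes = {}
for order in permutations(range(3)):
    name = "".join("SPO"[i] for i in order)
    indexes[name] = sorted(tuple(t[i] for i in order) for t in encoded)


def range_scan(index_name, prefix):
    """Return all index entries starting with `prefix` (a tuple of ids)."""
    index = indexes[index_name]
    lo = bisect_left(index, prefix)
    hi = bisect_left(index, prefix[:-1] + (prefix[-1] + 1,))
    return index[lo:hi]


# All triples about subject 0 (mdb:film/2014), via the SPO index.
about_film = range_scan("SPO", (0,))
# Who has property 9 (movie:director) with object 5 (mdb:director/8476)? Use POS.
directed_by = range_scan("POS", (9, 5))
```

Whatever positions of a triple pattern are bound, some permutation has them as a prefix, so the pattern becomes a contiguous range in one of the six indexes. The price is storing every triple six times.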
  • 141.
    Exhaustive Indexing: Query Execution
      Each triple pattern can be answered by a range query
      Joins between triple patterns computed using merge join
      Join order is easy due to extensive indexing
    (The encoded triple table and the mapping table are the same as on the previous slide.)
    Advantages
      Eliminates some of the joins: they become range queries
      Merge join is easy and fast
    Disadvantages
      Space usage
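Because every index returns its range already sorted, two triple patterns that share a variable can be joined with a single linear merge. The following Python sketch shows a generic merge join over two key-sorted lists; the function name `merge_join` and the sample id pairs are hypothetical (id 10 in particular does not appear in the mapping table).

```python
def merge_join(left, right):
    """Join two lists of (key, value) pairs, each sorted by key.

    Returns (key, left_value, right_value) for every pair of entries with equal keys.
    """
    out = []
    i = j = 0
    while i < len(left) and j < len(right):
        lk, rk = left[i][0], right[j][0]
        if lk < rk:
            i += 1
        elif lk > rk:
            j += 1
        else:
            # Find the run of equal keys on the right side ...
            j_end = j
            while j_end < len(right) and right[j_end][0] == lk:
                j_end += 1
            # ... and pair it with every left entry that has the same key.
            while i < len(left) and left[i][0] == lk:
                for _, rv in right[j:j_end]:
                    out.append((lk, left[i][1], rv))
                i += 1
            j = j_end
    return out


# (subject, object) pairs as two index range scans might return them,
# both sorted by subject id.
labels = [(0, 2), (8, 10)]
directors = [(0, 5), (8, 5)]
joined = merge_join(labels, directors)
```

Neither input needs to be sorted or hashed at query time, which is why the slide calls the join easy and fast.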
  • 150.
    Property Tables
    Grouping by entities; Jena [Wilkinson, 2006], DB2-RDF [Bornea et al., 2013]
      Clustered property table: group together the properties that tend to occur in the same (or similar) subjects
      Property-class table: cluster the subjects with the same type of property into one property table
    Triple table
      Subject | Property | Object
      ... | ... | ...
      mdb:film/2685 | rdfs:label | "A Clockwork Orange"
      mdb:actor/29704 | movie:actor_name | "Jack Nicholson"
      ... | ... | ...
    Property table for films
      Subject | rdfs:label | movie:director
      mdb:film/2014 | "The Shining" | mdb:director/8476
      mdb:film/2685 | "A Clockwork Orange" | mdb:director/8476
    Property table for actors
      Subject | movie:actor_name
      mdb:actor/29704 | "Jack Nicholson"
    Advantages
      Fewer joins
      If the data is structured, we have a relational system, similar to normalized relations
    Disadvantages
      Potentially a lot of NULLs
      Clustering is not trivial
      Multi-valued properties are complicated
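A property table is essentially a pivot of the triple table. The Python sketch below builds one for a fixed, hand-picked set of properties; the function name `build_property_table` is hypothetical, the clustering step (deciding which properties belong together) is assumed to have been done already, and the row for `mdb:film/3418` is included without a director purely to show where NULLs come from.

```python
def build_property_table(triples, properties):
    """Pivot triples into one row per subject with one column per property."""
    rows = {}
    for s, p, o in triples:
        if p in properties:
            # Missing properties stay None, playing the role of SQL NULL.
            row = rows.setdefault(s, dict.fromkeys(properties))
            # A second value for the same (subject, property) would overwrite
            # the first: this is the multi-valued property problem.
            row[p] = o
    return rows


triples = [
    ("mdb:film/2014", "rdfs:label", "The Shining"),
    ("mdb:film/2014", "movie:director", "mdb:director/8476"),
    ("mdb:film/2685", "rdfs:label", "A Clockwork Orange"),
    ("mdb:film/2685", "movie:director", "mdb:director/8476"),
    ("mdb:film/3418", "rdfs:label", "The Passenger"),
    ("mdb:actor/29704", "movie:actor_name", "Jack Nicholson"),
]

film_table = build_property_table(triples, ["rdfs:label", "movie:director"])
```

Fetching the label and director of a film is now a single row lookup instead of a join, at the cost of NULL cells for subjects that lack a property.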
  • 189.
    Binary Tables
    Grouping by properties: for each property, build a two-column table containing both subject and object, ordered by subjects [Abadi et al., 2007, 2009]
    - Also called vertically partitioned tables
    - n two-column tables (n is the number of unique properties in the data)

    Triple table (excerpt)
    Subject          Property          Object
    mdb:film/2685    rdfs:label        "A Clockwork Orange"
    mdb:actor/29704  movie:actor_name  "Jack Nicholson"
    ...              ...               ...

    movie:director
    Subject        Object
    mdb:film/2014  mdb:director/8476
    mdb:film/2685  mdb:director/8476

    rdfs:label
    Subject        Object
    mdb:film/2014  "The Shining"
    mdb:film/2685  "A Clockwork Orange"

    movie:actor_name
    Subject          Object
    mdb:actor/29704  "Jack Nicholson"

    Advantages
    - Supports multi-valued properties
    - No NULLs
    - No clustering
    - Read only needed attributes (i.e., less I/O)
    - Good performance for subject-subject joins
    Disadvantages
    - Not useful for subject-object joins
    - Expensive inserts
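A minimal Python sketch of vertical partitioning follows. The helper names `vertical_partition` and `subject_subject_join` are hypothetical; the sketch only illustrates why subject-sorted binary tables make subject-subject joins cheap (a single merge pass), not how SW-Store is implemented.

```python
from collections import defaultdict

triples = [
    ("mdb:film/2014", "rdfs:label", "The Shining"),
    ("mdb:film/2014", "movie:director", "mdb:director/8476"),
    ("mdb:film/2685", "rdfs:label", "A Clockwork Orange"),
    ("mdb:film/2685", "movie:director", "mdb:director/8476"),
    ("mdb:actor/29704", "movie:actor_name", "Jack Nicholson"),
]


def vertical_partition(triples):
    """Return {property: [(subject, object), ...]}, each table sorted by subject."""
    tables = defaultdict(list)
    for s, p, o in triples:
        tables[p].append((s, o))
    return {p: sorted(rows) for p, rows in tables.items()}


def subject_subject_join(left, right):
    """Merge join of two subject-sorted binary tables on the subject column."""
    out, i, j = [], 0, 0
    while i < len(left) and j < len(right):
        if left[i][0] < right[j][0]:
            i += 1
        elif left[i][0] > right[j][0]:
            j += 1
        else:
            s = left[i][0]
            i_end, j_end = i, j
            while i_end < len(left) and left[i_end][0] == s:
                i_end += 1
            while j_end < len(right) and right[j_end][0] == s:
                j_end += 1
            # Emit every combination, so multi-valued properties are handled.
            out.extend((s, lo, ro) for _, lo in left[i:i_end] for _, ro in right[j:j_end])
            i, j = i_end, j_end
    return out


tables = vertical_partition(triples)
joined = subject_subject_join(tables["rdfs:label"], tables["movie:director"])
```

A subject-object join gets no such benefit, because the second table is not sorted on its object column.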
  • 210.
    Graph-based Approach
    Answering a SPARQL query is equivalent to subgraph matching
    - gStore [Zou et al., 2011, 2014]
    [Figure: query graph with vertices ?m, ?d, ?name, ?b, ?r and "Stanley Kubrick", edges movie:director, rdfs:label, movie:relatedBook, movie:director_name and rev:rating, and FILTER(?r > 4.0), matched as a subgraph of the example RDF data graph]
    Advantages
    - Maintains the graph structure
    - Full set of queries can be handled
    Disadvantages
    - Graph pattern matching is expensive
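To make "query answering as subgraph matching" concrete, here is a naive backtracking matcher in Python. It is a sketch only: gStore does not work this way (it prunes with signatures first), and the data excerpt below assumes edge endpoints that the flattened figure no longer shows.

```python
def match(patterns, triples, binding=None):
    """Yield variable bindings under which every triple pattern occurs in `triples`."""
    binding = binding or {}
    if not patterns:
        yield dict(binding)
        return
    first, rest = patterns[0], patterns[1:]
    for triple in triples:
        new = dict(binding)
        ok = True
        for term, value in zip(first, triple):
            if term.startswith("?"):
                # Bind the variable, or check consistency with an earlier binding.
                if new.setdefault(term, value) != value:
                    ok = False
                    break
            elif term != value:
                ok = False
                break
        if ok:
            yield from match(rest, triples, new)


# Assumed excerpt of the example data graph.
data = [
    ("mdb:film/2014", "rdfs:label", "The Shining"),
    ("mdb:film/2014", "movie:director", "mdb:director/8476"),
    ("mdb:film/2014", "movie:relatedBook", "bm:books/0743424425"),
    ("bm:books/0743424425", "rev:rating", 4.7),
    ("mdb:film/2685", "rdfs:label", "A Clockwork Orange"),
    ("mdb:film/2685", "movie:director", "mdb:director/8476"),
    ("mdb:director/8476", "movie:director_name", "Stanley Kubrick"),
]

query = [
    ("?m", "movie:director", "?d"),
    ("?d", "movie:director_name", "Stanley Kubrick"),
    ("?m", "rdfs:label", "?name"),
    ("?m", "movie:relatedBook", "?b"),
    ("?b", "rev:rating", "?r"),
]

# The FILTER is applied to each complete match.
answers = [(b["?name"], b["?r"]) for b in match(query, data) if b["?r"] > 4.0]
```

Every pattern scans all triples, which is exactly the cost that makes unassisted graph pattern matching expensive.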
  • 216.
    gStore
    General approach:
    - Work directly on the RDF graph and the SPARQL query graph
    - Use a signature-based encoding of each entity and class vertex to speed up matching
    - Filter-and-evaluate: use an algorithm that may admit false positives to prune nodes and obtain a set of candidates; then do a more detailed evaluation on those
    - Use an index (VS-tree) over the data signature graph (has light maintenance load) for efficient pruning
  • 217.
    1. Encode Q and G to Get Signature Graphs
    [Figure: query signature graph Q* and data signature graph G*, with bit-string signatures on vertices and edge labels]
  • 218.
    2. Filter-and-Evaluate
    [Figure: query signature graph Q* and data signature graph G*]
    - Find matches of Q* over signature graph G*
    - Verify each match in RDF graph G
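The filtering idea can be sketched in Python as follows. This is a simplified encoding chosen for illustration, not gStore's actual one: the functions `bit`, `signature` and `may_match`, the 64-bit width and the use of SHA-1 are all assumptions. The property that matters is that a query vertex's bits are always a subset of the bits of any data vertex it can match, so the filter never drops a true match but may keep false ones.

```python
import hashlib


def bit(token, width=64):
    """Map a string to a single set bit in a `width`-bit integer (deterministic)."""
    digest = hashlib.sha1(token.encode("utf-8")).digest()
    return 1 << (int.from_bytes(digest[:4], "big") % width)


def signature(edges, width=64):
    """OR together one bit per edge label and one per (label, neighbour) pair."""
    sig = 0
    for label, neighbour in edges:
        sig |= bit("p:" + label, width)
        if neighbour is not None:  # None stands for a query variable
            sig |= bit("o:" + label + "|" + neighbour, width)
    return sig


def may_match(query_sig, data_sig):
    """Filter step: every bit of the query signature must be set in the data signature."""
    return (query_sig & data_sig) == query_sig


data_sig = signature([
    ("rdfs:label", "The Shining"),
    ("movie:director", "mdb:director/8476"),
])
query_sig = signature([
    ("movie:director", None),         # ?m movie:director ?d
    ("rdfs:label", "The Shining"),    # ?m rdfs:label "The Shining"
])
candidate = may_match(query_sig, data_sig)
```

Candidates that pass `may_match` still have to be verified against the RDF graph itself, because two different tokens can hash to the same bit.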
  • 224.
    How to Generate Candidate List
    Two-step process:
    1. For each node of Q*, get the list of nodes in G* that include that node.
    2. Do a multi-way join to get the candidate list.
    Alternatives:
    - Sequential scan of G*
      - Both steps are inefficient
    - Use S-trees
      - Height-balanced tree over signatures
      - Run an inclusion query for each node of Q* and get lists of nodes in G* that include that node
      - Given query signature $q$ and a set of data signatures $S$, find all data signatures $s_i \in S$ where $q \mathbin{\&} s_i = q$
      - Does not support the second step, which remains expensive
    - VS-tree (and VS*-tree)
      - Multi-resolution summary graph based on S-tree
      - Supports both steps efficiently
      - Grouping by vertices
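The inclusion query over a signature tree can be sketched in Python as below. This is a static, bottom-up toy rather than a real height-balanced S-tree with inserts and splits; `Node`, `build`, `inclusion_query` and the 8-bit signatures are illustrative assumptions. Each internal node stores the OR of its children's signatures, so a subtree can be skipped as soon as the query's bits are not all present.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Node:
    sig: int
    children: List["Node"] = field(default_factory=list)
    vertex: Optional[str] = None  # set only on leaves


def build(leaves, fanout=2):
    """Group `fanout` nodes under a parent whose signature is the OR of theirs."""
    level = [Node(sig, vertex=v) for v, sig in leaves]
    while len(level) > 1:
        parents = []
        for i in range(0, len(level), fanout):
            group = level[i:i + fanout]
            sig = 0
            for n in group:
                sig |= n.sig
            parents.append(Node(sig, children=group))
        level = parents
    return level[0]


def inclusion_query(node, q):
    """Return all leaf vertices whose signature s satisfies q & s == q."""
    if (q & node.sig) != q:
        return []  # no descendant can contain all bits of q
    if node.vertex is not None:
        return [node.vertex]
    out = []
    for child in node.children:
        out.extend(inclusion_query(child, q))
    return out


root = build([
    ("001", 0b00101000),
    ("002", 0b10000001),
    ("003", 0b01001000),
    ("004", 0b00010100),
])
```

`inclusion_query(root, 0b00001000)` visits both subtrees, while `inclusion_query(root, 0b10000000)` prunes the subtree holding 003 and 004 at its parent. The per-vertex lists still have to be joined afterwards, which is the step the S-tree does not help with.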
  • 232.
    S-tree Solution
    [Figure: S-tree over the signatures of data vertices 001 to 011, with internal nodes d1_1, d2_1, d2_2, d3_1 to d3_4 and levels G1 to G3; the candidate lists returned for the query vertices are then joined]
    Possibly large join space!
  • 233.
    VS-tree
    [Figure: the S-tree over data vertices 001 to 011 (levels G1 to G3) extended with signature-labelled super edges between tree nodes]
  • 239.
    Pruning with VS-Tree
    [Figure: the query signature graph matched level by level against the VS-tree; only node pairs connected by compatible super edges (e.g., within d3_1 to d3_4 at level G3) are expanded before the candidate lists are joined]
    Reduced join space!
  • 240.
    Adaptivity to Workload
    - Web applications that are supported by RDF data management systems are far more varied than conventional relational applications
    - Data that are being handled are far more heterogeneous
    - SPARQL is far more flexible in how triple patterns (i.e., the atomic query unit) can be combined

    An experiment [Aluç et al., 2014]
                                                    RDF-3X  VOS (6.1)  VOS (7.1)  MonetDB  4Store
    % queries for which tested system is fastest      20.9        0.0       22.6     56.5     0.0
    Total workload execution time (hours)             27.1       20.9       20.8     38.6    72.2
    Mean (per query) execution time (seconds)          7.8        6.0        6.0     11.1    20.7

    Summary of experiments
    - No single system is the sole winner across all queries
    - No single system is the sole loser across all queries, either
    - There can be 2 to 5 orders of magnitude difference in performance (i.e., query execution time) between the best and the worst system for a given query
    - The winner in one query may time out in another
    - Performance difference widens as dataset size increases
  • 242.
    Group-by-Query Approach
    [Figure: example RDF subgraphs over Tamer, Olaf, Bob, UWaterloo and posts (Post2, Post23, Post23571), connected by hasPost, taggedIn, worksAt, likes and favourites edges]
  • 243.
    [...] fixed size, (b) contain same set of attributes
    1. Workload time analysis
       [Figure: queries labelled Type-A, Type-B or Type-C and marked robust or adaptable]
    2. Updating the physical layout
       [Figure: cache and storage system; entries are evicted from the cache, and the hash function at time t1 adapts into the hash function at time tk]
    3. Partial indexing
       [Figure: an index over the cache and storage system, used by the SPARQL query engine]
  • 249.
    chameleon-db
    - Prototype system [Aluç et al., 2013]
    - 35,000 lines of code in C++ and growing
    [Figure: architecture with Query Engine (Plan Generation, Evaluation), Structural Index, Vertex Index, Spill Index, Cluster Index, Storage System and Storage Advisor]
  • 250.
    Some Open Problems
    - Scalability of the solutions to very large datasets
    - Maintenance of auxiliary data structures in dynamic environments
    - Adaptive systems to handle varying and time-changing workloads
    - Uncertain RDF data processing
    - Keyword search over RDF data
    - Query processing over incomplete RDF data
  • 251.
    Outline
    1 LOD and RDF Introduction
    2 Data Warehousing Approach
      - Relational Approaches
      - Graph-Based Approaches
    3 SPARQL Federation Approach
      - Distributed RDF Processing
      - SPARQL Endpoint Federation
    4 Live Querying Approach
      - Traversal-based approaches
      - Index-based approaches
      - Hybrid approaches
    5 Conclusions
  • 256.
    Remember the Environment
    - Distributed environment
    - Some of the data sites can process SPARQL queries: SPARQL endpoints
    - Not all data sites can process queries
    Alternatives
    - Data re-distribution + query decomposition
    - SPARQL federation: just process at SPARQL endpoints
    - Live querying (see next section)
  • 260.
    Distributed RDF Processing [Kaoudi and Manolescu, 2014]
    Data partitioning approaches
    - RDF data warehouse is partitioned and distributed
      - RDF data $D = \{D_1, \ldots, D_n\}$
      - Allocate each $D_i$ to a site
    - Partitioning alternatives
      - Table-based (e.g., [Husain et al., 2011])
      - Graph-based (e.g., [Huang et al., 2011; Zhang et al., 2013])
      - Unit-based (e.g., [Gurajada et al., 2014; Lee and Liu, 2013])
    - SPARQL query decomposed: $Q = \{Q_1, \ldots, Q_k\}$
    - Distributed execution of $\{Q_1, \ldots, Q_k\}$ over $\{D_1, \ldots, D_n\}$
    Assessment
    - High performance
    - Great for parallelizing centralized RDF data
    - May not be possible to re-partition and re-allocate Web data (i.e., LOD)
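As a rough Python illustration of partition-then-execute, the sketch below hash-partitions triples by subject and evaluates a star-shaped subquery at each site. This scheme is an assumption chosen for brevity, not the method of any system cited above; `partition_by_subject` and `local_star` are hypothetical names.

```python
import zlib

triples = [
    ("mdb:film/2014", "rdfs:label", "The Shining"),
    ("mdb:film/2014", "movie:director", "mdb:director/8476"),
    ("mdb:film/2685", "rdfs:label", "A Clockwork Orange"),
    ("mdb:film/2685", "movie:director", "mdb:director/8476"),
    ("mdb:actor/29704", "movie:actor_name", "Jack Nicholson"),
]


def partition_by_subject(triples, n_sites):
    """D = {D_1, ..., D_n}: all triples of one subject land on the same site."""
    sites = [[] for _ in range(n_sites)]
    for s, p, o in triples:
        sites[zlib.crc32(s.encode("utf-8")) % n_sites].append((s, p, o))
    return sites


def local_star(site, p1, p2):
    """Subquery Q_i: subjects at this site that have both p1 and p2."""
    by_subject = {}
    for s, p, o in site:
        by_subject.setdefault(s, {}).setdefault(p, []).append(o)
    return [(s, a, b)
            for s, props in by_subject.items()
            for a in props.get(p1, [])
            for b in props.get(p2, [])]


sites = partition_by_subject(triples, n_sites=3)
# Each site evaluates the star locally; the coordinator only unions the results.
result = sorted(row for site in sites
                for row in local_star(site, "rdfs:label", "movie:director"))
```

Because subjects are co-located, a star query needs no cross-site join; a path query (e.g., film to director to director name) would, which is where the partitioning alternatives differ.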
  • 263.
    Distributed RDF Processing - 2
    Data summary-based approaches
    - Build summaries (index) for the distributed RDF datasets (e.g., [Atre et al., 2010; Prasser et al., 2012])
    - SPARQL query $Q = \{Q_1, \ldots, Q_k\}$
    - Distributed execution of $\{Q_1, \ldots, Q_k\}$ using the data summary
    Assessment
    - No data re-partitioning and re-allocation
    - Have to scan the data at each site
    - Index over distributed data with maintenance concerns
  • 266.
    SPARQL Endpoint Federation
    - Consider only the SPARQL endpoints for query execution
    - No data re-partitioning/re-distribution
    - Consider $D = D_1 \cup D_2 \cup \ldots \cup D_n$; $D_i$: SPARQL endpoint
    Alternatives
    - SPARQL query decomposed $Q = \{Q_1, \ldots, Q_k\}$ and executed over $\{D_1, \ldots, D_n\}$: DARQ, FedX [Schwarte et al., 2011], SPLENDID [Görlitz and Staab, 2011], ANAPSID [Acosta et al., 2011]
    - Partial query evaluation: Distributed gStore [Peng et al., 2014]
    Partial evaluation
    - Given function $f(s, d)$ and part of its input $s$, perform the part of $f$'s computation that depends only on $s$ to get $f'(d)$
    - Compute $f'(d)$ when $d$ becomes available
    - Applied to, e.g., XML [Buneman et al., 2006]
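The general notion of partial evaluation can be shown with a small Python sketch. It illustrates only the generic $f(s, d) \rightarrow f'(d)$ idea, with a triple pattern as the known input $s$; it is not the Distributed gStore algorithm, and `evaluate` and `specialize` are hypothetical names.

```python
def evaluate(pattern, fragment):
    """f(s, d): all triples in `fragment` that match `pattern` (None = variable)."""
    return [t for t in fragment
            if all(p is None or p == v for p, v in zip(pattern, t))]


def specialize(pattern):
    """Do the work that depends only on the pattern now; return f'(d) for later."""
    constants = [(i, p) for i, p in enumerate(pattern) if p is not None]

    def f_prime(fragment):
        return [t for t in fragment if all(t[i] == c for i, c in constants)]

    return f_prime


pattern = (None, "movie:director", "mdb:director/8476")
f_prime = specialize(pattern)  # computed before any data is available

fragment = [
    ("mdb:film/2014", "rdfs:label", "The Shining"),
    ("mdb:film/2014", "movie:director", "mdb:director/8476"),
    ("mdb:film/2685", "movie:director", "mdb:director/8476"),
]
later = f_prime(fragment)  # d becomes available
```

The specialized function gives the same answer as the original one; only the point in time at which each part of the work happens changes.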
  • 269.
    Distributed SPARQL Using Partial Query Evaluation
    Two steps:
    1. Evaluate a query at each site to find local matches
       - Query is the function and each $D_i$ is the known input
       - Inner match or local partial match
    2. Assemble the partial matches to get the final result
       - Crossing match
       - Centralized assembly
       - Distributed assembly
    [Figure: data graph split into fragments D1 to D4, with a crossing match spanning several fragments]
  • 272.
    Some Open Problems
    - Handling data at non-SPARQL endpoint sites
    - Modification to SPARQL endpoints (for partial query evaluation)
    - Heterogeneous use of vocabularies (use of ontologies)
  • 275.
    Live Query Processing
    - Not all data resides at SPARQL endpoints
    - Freshness of access to data is important
    - Potentially countably infinite data sources
    Live querying
    - On-line execution
    - Only rely on linked data principles
    Alternatives
    - Traversal-based approaches
    - Index-based approaches
    - Hybrid approaches
  • 278.
    SPARQL Query Semantics in Live Querying
    Full-web semantics
    - Scope of evaluating a SPARQL expression is all Linked Data
    - Query result completeness cannot be guaranteed by any (terminating) execution
    Reachability-based query semantics
    - Query consists of a SPARQL expression, a set of seed URIs S, and a reachability condition c
    - Scope: all data along paths of data links that satisfy the condition
    - Computationally feasible
  • 283.
    Traversal Approaches
    - Discover relevant URIs recursively by traversing (specific) data links at query execution runtime [Hartig, 2013; Ladwig and Tran, 2011]
    - Implements reachability-based query semantics
      - Start from a set of seed URIs
      - Recursively follow and discover new URIs
      - Important issue is selection of seed URIs
    - Retrieved data serves to discover new URIs and to construct the result
    Advantages
    - Easy to implement
    - No data structure to maintain
    Disadvantages
    - Possibilities for parallelized data retrieval are limited
    - Repeated data retrieval introduces significant query latency
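A minimal Python sketch of link traversal under a reachability condition follows. The `WEB` dictionary stands in for HTTP lookups, and the example.org URIs, the `traverse` function and the `follow` callback are all assumptions made for illustration; this is not the SQUIN implementation.

```python
from collections import deque

# Hypothetical "web": dereferencing a URI returns the triples of its document.
WEB = {
    "http://example.org/film/2014": [
        ("http://example.org/film/2014", "rdfs:label", "The Shining"),
        ("http://example.org/film/2014", "movie:director", "http://example.org/director/8476"),
    ],
    "http://example.org/director/8476": [
        ("http://example.org/director/8476", "movie:director_name", "Stanley Kubrick"),
        ("http://example.org/director/8476", "foaf:based_near", "http://example.org/geo/2635167"),
    ],
    "http://example.org/geo/2635167": [
        ("http://example.org/geo/2635167", "gn:name", "United Kingdom"),
    ],
}


def traverse(seeds, follow, lookup=WEB.get, max_lookups=100):
    """Collect all triples reachable from `seeds` via links accepted by `follow`."""
    seen, queue, data = set(seeds), deque(seeds), []
    lookups = 0
    while queue and lookups < max_lookups:
        uri = queue.popleft()
        lookups += 1
        for triple in lookup(uri) or []:
            data.append(triple)
            for term in (triple[0], triple[2]):
                is_uri = term.startswith("http://")
                if is_uri and term not in seen and follow(triple, term):
                    seen.add(term)
                    queue.append(term)
    return data


seeds = ["http://example.org/film/2014"]
# Reachability condition c: only follow links found in movie:director triples.
restricted = traverse(seeds, lambda triple, uri: triple[1] == "movie:director")
everything = traverse(seeds, lambda triple, uri: True)
```

The lookups happen one after another as URIs are discovered, which is the source of the latency noted above; the retrieved triples would then be fed to the query's pattern matching.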
  • 286.
    Traversal Optimization
    Dynamic query execution [Hartig and Özsu, 2014]
    [Figure: data retrieval driven by a lookup queue of URIs, producing query output]
    Prioritization of URIs: a number of alternatives
    - Non-adaptive
    - Adaptive, local processing aware
    - Adaptive, local processing agnostic
    - Intermediate solution driven
    - Solution-aware graph-based
    - Hybrid graph-based
    - Purely graph-based
  • 290.
    Index Approaches
    - Use a pre-populated index to determine relevant URIs (and to avoid as many irrelevant ones as possible)
    - Different index keys possible; e.g., triple patterns [Umbrich et al., 2011]
    - Index entries: a set of URIs
      - Indexed URIs may appear multiple times (i.e., associated with multiple index keys)
      - Each URI in such an entry may be paired with a cardinality (utilized for source ranking)
    [Figure: Key: $tp$; Entry: $\{uri_1, uri_2, \ldots, uri_n\}$; GET $uri_i$]
    Advantages
    - Data retrieval can be fully parallelized
    - Reduces the impact of data retrieval on query execution time
    Disadvantages
    - Querying can only start after index construction
    - Depends on what has been selected for the index
    - Freshness may be an issue
    - Index maintenance
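One way to picture such an index is the Python sketch below. It assumes predicate-only triple patterns as keys and per-document triple counts as cardinalities; `build_index`, `rank_sources` and the document URIs are hypothetical and not taken from the cited work.

```python
from collections import Counter, defaultdict


def build_index(documents):
    """documents: {uri: [triples]}. Key: the pattern (?s, p, ?o); entry: {uri: cardinality}."""
    index = defaultdict(Counter)
    for uri, triples in documents.items():
        for _, p, _ in triples:
            index[("?s", p, "?o")][uri] += 1
    return index


def rank_sources(index, patterns):
    """URIs relevant to any pattern, highest total cardinality first."""
    score = Counter()
    for tp in patterns:
        score.update(index.get(tp, {}))
    return [uri for uri, _ in sorted(score.items(), key=lambda kv: (-kv[1], kv[0]))]


documents = {
    "http://example.org/docA": [
        ("mdb:film/2014", "movie:actor", "mdb:actor/29704"),
        ("mdb:film/1267", "movie:actor", "mdb:actor/29704"),
        ("mdb:film/2014", "rdfs:label", "The Shining"),
    ],
    "http://example.org/docB": [
        ("mdb:film/424", "movie:actor", "mdb:actor/30013"),
    ],
    "http://example.org/docC": [
        ("geo:2635167", "gn:name", "United Kingdom"),
    ],
}

index = build_index(documents)
sources = rank_sources(index, [("?s", "movie:actor", "?o")])
```

Since the full list of relevant URIs is known before execution starts, all `GET` requests can be issued in parallel; documents that are missing from the index, or have changed since it was built, are simply not seen.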
  • 291.
    Hybrid Approach
    - Perform a traversal-based execution using a prioritized list of URIs to look up [Ladwig and Tran, 2010]
    - Initial seed from the pre-populated index
    - Non-seed URIs are ranked by a function based on information in the index
    - Newly discovered URIs that are not in the index are ranked according to the number of referring documents
  • 292.
    Some Open Problems
    - Optimize queries by using statistics collected during earlier query executions
    - Heterogeneous use of vocabularies (use of ontologies)
    - Combine with SPARQL federation to leverage SPARQL endpoint functionality
  • 295.
    Conclusions
    - RDF and Linked Open Data seem to have considerable promise for Web data management
    - More work needs to be done
      - Query semantics
      - Adaptive system design
      - Optimizations, both in data warehousing and distributed environments
      - Live querying requires significant thought to reduce latency
  • 297.
    Conclusions
    What I did not talk about:
    - Not much on general distributed/parallel processing
    - Not much on SPARQL semantics
    - Nothing about RDFS: no schema stuff
    - Nothing about entailment regimes > 0, hence no reasoning
  • 298.
    Thank you!
    Research supported by [sponsor logos]
  • 299.
    References I
    Abadi, D. J., Marcus, A., Madden, S., and Hollenbach, K. (2009). SW-Store: a vertically partitioned DBMS for semantic web data management. VLDB J., 18(2):385–406.
    Abadi, D. J., Marcus, A., Madden, S. R., and Hollenbach, K. (2007). Scalable semantic web data management using vertical partitioning. In Proc. 33rd Int. Conf. on Very Large Data Bases, pages 411–422.
    Abiteboul, S., Quass, D., McHugh, J., Widom, J., and Wiener, J. (1997). The Lorel query language for semistructured data. Int. J. Digit. Libr., 1(1):68–88.
    Acosta, M., Vidal, M.-E., Lampo, T., Castillo, J., and Ruckhaus, E. (2011). ANAPSID: an adaptive query processing engine for SPARQL endpoints. In Proc. 10th Int. Semantic Web Conf., pages 18–34.
    Aluç, G., Hartig, O., Özsu, M. T., and Daudjee, K. (2014). Diversified stress testing of RDF data management systems. In Proc. 13th Int. Semantic Web Conf. Forthcoming.
    Aluç, G., Özsu, M. T., Daudjee, K., and Hartig, O. (2013). chameleon-db: a workload-aware robust RDF data management system. Technical Report CS-2013-10, University of Waterloo.
  • 301.
    References II Arocena,G. and Mendelzon, A. (1998). Weboql: Restructuring documents, databases and webs. In Proc. 14th Int. Conf. on Data Engineering, pages 24{33. Atre, M., Chaoji, V., Zaki, M. J., and Hendler, J. A. (2010). Matrix bit loaded: A scalable lightweight join query processor for rdf data. In Proc. 19th Int. World Wide Web Conf., pages 41{50. Bornea, M. A., Dolby, J., Kementsietsidis, A., Srinivas, K., Dantressangle, P., Udrea, O., and Bhattacharjee, B. (2013). Building an ecient RDF store over a relational database. In Proc. ACM SIGMOD Int. Conf. on Management of Data, pages 121{132. Buneman, P., Cong, G., Fan, W., and Kementsietsidis, A. (2006). Using partial evaluation in distributed query evaluation. In Proc. 32nd Int. Conf. on Very Large Data Bases, pages 211{222. Buneman, P., Davidson, S., Hillebrand, G. G., and Suciu, D. (1996). A query language and optimization techniques for unstructured data. In Proc. ACM SIGMOD Int. Conf. on Management of Data, pages 505{516. Fernandez, M., Florescu, D., and Levy, A. (1997). A query language for a web-site management system. ACM SIGMOD Rec., 26(3):4{11. PKU/2014-08-28 65
  • 302.
    References III Gorlitz,O. and Staab, S. (2011). SPLENDID: SPARQL endpoint federation exploiting VOID descriptions. In Proc. 2nd Int. Workshop on Consuming Linked Data. Gurajada, S., Seufert, S., Miliaraki, I., and Theobald, M. (2014). TriAD: A distributed shared-nothing RDF engine based on asynchronous message passing. In Proc. ACM SIGMOD Int. Conf. on Management of Data, pages 289{300. Hartig, O. (2012). SPARQL for a web of linked data: Semantics and computability. In Proc. 9th Extended Semantic Web Conf., pages 8{23. Hartig, O. (2013). SQUIN: a traversal based query execution system for the web of linked data. In Proc. ACM SIGMOD Int. Conf. on Management of Data, pages 1081{1084. Hartig, O. and Ozsu, M. T. (2014). Optimizing response time of traversal-based query optimization. In preparation. Huang, J., Abadi, D. J., and Ren, K. (2011). Scalable SPARQL querying of large RDF graphs. Proc. VLDB Endowment, 4(11):1123{1134. PKU/2014-08-28 66
  • 303.
    References IV Husain,M. F., McGlothlin, J., Masud, M. M., Khan, L. R., and Thuraisingham, B. (2011). Heuristics-based query processing for large RDF graphs using cloud computing. IEEE Trans. Knowl. and Data Eng., 23(9):1312{1327. Kaoudi, Z. and Manolescu, I. (2014). RDF in the clouds: A survey. VLDB J. Forthcoming. Konopnicki, D. and Shmueli, O. (1995). W3QS: A query system for the World Wide Web. In Proc. 21th Int. Conf. on Very Large Data Bases, pages 54{65. Ladwig, G. and Tran, T. (2010). Linked data query processing strategies. In Proc. 9th Int. Semantic Web Conf., pages 453{469. Ladwig, G. and Tran, T. (2011). SIHJoin: Querying remote and local linked data. In Proc. 8th Extended Semantic Web Conf., pages 139{153. Lakshmanan, L. V. S., Sadri, F., and Subramanian, I. N. (1996). A declarative language for querying and restructuring the Web. In Proc. 6th Int. Workshop on Research Issues on Data Eng., pages 12{21. Lee, K. and Liu, L. (2013). Scaling queries over big rdf graphs with semantic hash partitioning. Proc. VLDB Endowment, 6(14):1894{1905. PKU/2014-08-28 67
  • 304.
    References V Mendelzon,A. O., Mihaila, G. A., and Milo, T. (1997). Querying the World Wide Web. Int. J. Digit. Libr., 1(1):54{67. Neumann, T. and Weikum, G. (2008). RDF-3X: a RISC-style engine for RDF. Proc. VLDB Endowment, 1(1):647{659. Neumann, T. and Weikum, G. (2009). The RDF-3X engine for scalable management of RDF data. VLDB J., 19(1):91{113. Papakonstantinou, Y., Garcia-Molina, H., and Widom, J. (1995). Object exchange across heterogeneous information sources. In Proc. 11th Int. Conf. on Data Engineering, pages 251{260. Peng, P., Zou, L., Ozsu, M. T., Chen, L., and Zhao, D. (2014). Processing SPARQL queries over linked data { a distributed graph-based approach. In submitted for publication. Prasser, F., Kemper, A., and Kuhn, K. A. (2012). Ecient distributed query processing for autonomous rdf databases. In Proc. 15th Int. Conf. on Extending Database Technology, pages 372{383. Schwarte, A., Haase, P., Hose, K., Schenkel, R., and Schmidt, M. (2011). Fedx: A federation layer for distributed query processing on linked open data. In Proc. 8th Extended Semantic Web Conf., pages 481{486. PKU/2014-08-28 68
  • 305.
    References VI Umbrich,J., Hose, K., Karnstedt, M., Harth, A., and Polleres, A. (2011). Comparing data summaries for processing live queries over linked data. World Wide Web J., 14(5-6):495{544. Weiss, C., Karras, P., and Bernstein, A. (2008). Hexastore: sextuple indexing for semantic web data management. Proc. VLDB Endowment, 1(1):1008{1019. Wilkinson, K. (2006). Jena property table implementation. Technical Report HPL-2006-140, HP Laboratories Palo Alto. Zhang, X., Chen, L., Tong, Y., and Wang, M. (2013). EAGRE: Towards scalable I/O ecient SPARQL query evaluation on the cloud. In Proc. 29th Int. Conf. on Data Engineering, pages 565{576. Zou, L., Mo, J., Chen, L., Ozsu, M. T., and Zhao, D. (2011). gStore: answering SPARQL queries via subgraph matching. Proc. VLDB Endowment, 4(8):482{493. Zou, L., Ozsu, M. T., Chen, L., Shen, X., Huang, R., and Zhao, D. (2014). gStore: A graph-based SPARQL query engine. VLDB J., 23(4):565{590. PKU/2014-08-28 69
