Uploaded by Ankur Biswas

An Introduction to Semantic Web Technology

The document provides an overview of the semantic web and some of its key challenges. It discusses: 1) the evolution of the World Wide Web from a web of documents to a web of linked data through technologies like RDF, OWL, and SPARQL that add semantic meaning; 2) the vision for the semantic web, which is to publish machine-readable data using common formats so that information can be automatically processed by agents and integrated across sources; 3) some challenges in realizing this vision, including dealing with implicit knowledge, heterogeneous data distributions, and maintaining links and correctness over time as data changes.

SEMANTIC WEB
UNDERSTANDING IN BRIEF
INTRODUCTION
WEB OF DOCUMENTS VS. WEB OF DATA
4/4/2016, Ankur Biswas
A Walk Through a Brief History of the World Wide Web
• 1969: ARPANET (Advanced Research Projects Agency Network) launched.
• In 1980, Tim Berners-Lee built ENQUIRE as a personal database of people and software models and as a way to play with hypertext; each new page of information in ENQUIRE had to be linked to an existing page.
• In 1990, Berners-Lee built all the tools necessary for a working Web: HTTP 0.9, HTML, the first Web browser (which was also a Web editor), the first HTTP server software (CERN httpd), the first web server (http://info.cern.ch), and the first Web pages, which described the project itself.
[Figure: the WWW's historical logo, designed by Robert Cailliau]
[Figure: the NeXTcube used by Tim Berners-Lee at CERN, which became the first Web server]
How big is the Web?
• According to http://www.worldwidewebsize.com/, the indexed Web contains at least 4.84 billion pages (Thursday, 25 February 2016).
• Early estimates suggested that the deep web is 400 to 550 times larger than the surface web.
• Since more information and sites are always being added, it can be assumed that the deep web keeps growing rapidly, at a rate that cannot be quantified.
Understanding Information in the WWW
• What is important and how do you know?
• What is information, what is advertisement?
• What does information mean?
• How credible or trustworthy is the information?
• What is redundant?
Understanding the Importance of Meaning
• SEMANTICS: the part of linguistics focused on the sense and meaning of language or of the symbols of language.
• It is the study of the interpretation of signs or symbols as used by agents or communities within particular circumstances and contexts.
• Semantics asks how the sense and meaning of complex concepts can be derived from simple concepts based on the rules of syntax.
• The semantics of a message depends on its context and pragmatics†.
† Pragmatic: dealing with things sensibly and realistically, in a way that is based on practical rather than theoretical considerations.
Understanding the Importance of Meaning
• SYNTAX: in grammar, the study of the principles and processes by which sentences are constructed in a particular language.
• In formal languages, syntax is just a set of rules by which well-formed expressions can be created from a fundamental set of symbols (an alphabet).
• In computer science, syntax defines the normative structure of data.
Understanding the Importance of Meaning
• CONTEXT: the expressions (concepts) surrounding an expression; it represents the expression's relationship with those surrounding expressions and with further related elements.
• Context denotes all elements of any sort of communication that define the interpretation of the communicated content, e.g.
  – General contexts: place, time, interrelation of actions in a message.
  – Personal or social contexts: the relation between sender and receiver of a message.
• PRAGMATICS: reflects the intention with which language is used to communicate a message.
• In linguistics, pragmatics denotes the study of applying language in different situations. It also denotes the intended purpose of the speaker. Pragmatics studies the ways in which context contributes to meaning.
The limits of the Web
• Traditional keyword-based search leads to many irrelevant results.
• Example: from the simple term "Jaguar" it is not clear whether the user means the car, the animal, or the OS (Mac OS X Jaguar).
• POLYSEMY: one name carries several meanings, so a search returns the results you wanted together with other results that share the same or a similar name but mean something different.
Problem 1: Information Retrieval
• Jaguar (animal): Panthera onca
• Traditional keyword-based search doesn't find all results.
• Synonyms and metaphors are not always handled properly, which leads to undesired results.
Web of documents:
• Primary objects: documents
• Degree of structure in data: fairly low
• Implicit semantics of contents
• Designed for: human consumption
[Figure: documents A, B, C (HTML) and D (API/XML) connected by untyped links]
Problem 2: Information Extraction
• Identifying content written in other languages, e.g. Japanese or Bengali.
• A picture does not tell a search engine what it shows.
• Example: Google identifies the caption or name embedded with the picture and uses it as a reference keyword.
Problem 2: Information Extraction (Cont.)
[Figure: documents A, B, C (HTML) and D (API/XML) connected by untyped links, each referring to "Things". Are two documents talking about the same "Thing"?]
Problem 2: Information Extraction (Cont.)
• Can only be solved correctly by a human agent.
• Heterogeneous distribution and order of information.
• A software agent does not have sufficient
  – knowledge of contexts,
  – world knowledge, and
  – experience
  to solve the problem. Hence it will not be able to solve the problem unless explicit semantics are available.
• Implicit knowledge: information that is not specified explicitly but must be derived via logical deduction from the available information.
Problem 3: Maintenance
The more complex and voluminous a website is, the more complicated it is to maintain its only weakly structured data.
Problems:
• Syntactic consistency error: you have linked your web page to another page with related content, but that page has since moved and your link still points to the old address (HTTP 404 error: file/page not found).
• Semantic (link) consistency error: this is even more dangerous; the content at the hyperlinked destination keeps changing.
• Correctness: it is hard to maintain correctness over time in an automated manner.
• Timeliness: tracking changes over time is really hard.
Problem 4: Personalization
• Adaptation of the presented information content to personal requirements: users normally password-protect their details, so it becomes hard to access any such information.
• Problems:
  – Where do we get the required (personal) information from?
  – Personalization vs. data security
INTRODUCTION TO SEMANTIC WEB TECHNOLOGIES
THE VISION OF THE SEMANTIC WEB
The vision of the Semantic Web
Precondition:
• Content can be read and interpreted correctly (understood) by machines.
Natural Language Processing
• Technologies of traditional information retrieval (search engines)
Semantic Web
• Natural language web content will be explicitly annotated with semantic metadata.
• Semantic metadata encode the meaning (semantics) of the content and can be read and interpreted correctly by machines.
The Semantic Web concept was first introduced in the 1990s by Tim Berners-Lee, who is also the inventor of the World Wide Web.
How Can We Achieve the Semantic Web? The Original Vision
• Instead of publishing information to be consumed by humans, publish machine-processable data and metadata using terms/languages that can be understood by machines.
• Build machines (agents) that will search for, query, integrate, etc. this data.
• Make sure all agents understand your terms/languages.
The Semantic Web and Linked Data Vision Today
• The Semantic Web is a web of data. There is lots of data we all use every day, and it is not part of the Web.
• The Semantic Web is about two things:
  – It is about common formats for integration and combination of data drawn from diverse sources, whereas the original Web mainly concentrated on the interchange of documents.
  – It is also about language for recording how the data relates to real-world objects.
• That allows a person, or a machine, to start off in one database and then move through an unending set of databases which are connected not by wires but by being about the same thing.
Semantic Web Technology Stack
• Most apps use only a subset of the stack.
• Querying allows fine-grained data access.
• Standardized information exchange is key.
• Formats are necessary but not too important.
• The Semantic Web is based on the Web.
Basic Layer of Semantic Web Technology Stack
• The foundation of the stack is the World Wide Web. Hence we rely on all World Wide Web technologies.
• The semantic version of Wikipedia is DBpedia.
• Because Wikipedia uses templates, its data is somewhat structured.
• DBpedia extracts data from Wikipedia infoboxes.
• DBpedia uses the machine-readable language RDF.
• DBpedia stores and publishes the result in RDF and a few other formats.
• It also hosts a community effort to define extractors for the data that can be used well beyond Wikipedia.
• It provides a number of services around the extracted data, such as DBpedia Mobile, a SPARQL endpoint, a faceted browser, a number of mappings to external ontologies, an ontology itself, etc.
Semantic Web Technologies
• A set of technologies and frameworks that enable the Web of Data:
  – Resource Description Framework (RDF)
  – A variety of data interchange formats (e.g. RDF/XML, N3, Turtle, N-Triples)
  – Notations such as RDF Schema (RDFS) and the Web Ontology Language (OWL)
• All are intended to provide a formal description of concepts, terms, and relationships within a given knowledge domain.
• A specialized query language (SPARQL) plays a role similar to SQL, but queries can be more complicated because they are based on matching patterns against a graph.
Application in Web of Data
• Linked Data
• Linked Open Data (LOD) denotes publicly available (RDF) data on the Web, identified via URIs and accessible via HTTP.
Web of Data (Oct 2011):
• >31 billion facts
• >500 million links
What is so special about the BBC Music website?
• Information is dynamically aggregated from external, publicly available data (Wikipedia, MusicBrainz, …).
• No screen scraping
• No specialized API
• Data is available as Linked Open Data.
• Data access via simple HTTP requests
• Data is always up to date without manual interaction.
How to build such a site 1.
• Site editors roam the Web for new facts
  – they may discover further links while roaming
• They update the site manually
• And the site soon gets out of date
How to build such a site 2.
• Editors roam the Web for new data published on Web sites
• "Scrape" the sites with a program to extract the information
  – i.e., write some code to incorporate the new data
• Easily get out of date again…
How to build such a site 3.
• Editors roam the Web for new data via APIs
• Understand those…
  – input and output arguments, datatypes used, etc.
• Write some code to incorporate the new data
• Easily get out of date again…
The choice of the BBC
• Use external, public datasets
  – Wikipedia, MusicBrainz, …
• They are available as data
  – not APIs or hidden on a Web site
  – data can be extracted using, e.g., HTTP requests or standard queries
It's all documented
Search Engines – Document Retrieval
• General problems:
  – Correct interpretation of the query string
    – Somehow the context of the user has to be considered, e.g. what the user queried just before a specific query, or their usual preferences.
  – Correct identification of entities
    – Automatic disambiguation
  – Usability
    – Personalization
Intelligent Agents in Semantic Web
[Figure: two panels. World Wide Web: the user reaches WWW documents through a presentation service (e.g. Firefox) and a retrieval service (e.g. Google). Semantic Web: the user reaches WWW documents through a personal assistant and intelligent infrastructure services.]
3 Generations of Web Documents
• 1st generation: static web pages (HTML / CSS)
• 2nd generation: dynamic web pages (database access, template-based generation) and interactive web pages (JavaScript / applets)
• 3rd generation: virtual web pages (netbots, information extraction, presentation planning) and adaptive web pages (user model, machine learning, online layout)
Toolbox for the Semantic Web
• Standardized languages to express the semantics of information content on the Web (XML/XSD, RDF(S), OWL, RIF)
• Tools for semantic information on the Web (RDFa, GRDDL, …)
• Various fields of computer science:
  – Artificial Intelligence
  – Linguistics
  – Cryptography
  – Databases
  – Theoretical Computer Science
  – Computer Architecture
  – Software Engineering
  – Systems Theory
  – Computer Networks
Basic Architecture of Semantic Web - I
• Uniform: different types of resource identifiers are all constructed according to a uniform schema.
• Resource: whatever may be identified by a URI.
• Identifier: distinguishes one resource from another.
Uniform Resource Identifier (URI)
• A Uniform Resource Identifier (URI) defines a simple and extensible schema for worldwide unique identification of abstract or physical resources.
• A resource can be any object with a clear identity (according to the context of the application), e.g. web pages, books, locations, persons, relations among objects, abstract concepts, etc.
• The concept of the URI is already established in various domains, e.g.
  – The Web: URL (Uniform Resource Locator), URN (Uniform Resource Name), PURL (Persistent Uniform Resource Locator)
  – Books and publications (ISBN, ISSN)
  – Digital Object Identifier (DOI)
Uniform Resource Identifier (URI)
• A URI combines
  – Address (locator)
    – Uniform Resource Locator (URL, RFC 1738)
    – Denotes where a resource can be found on the Web by stating its primary access mechanism
    – Might change during its lifetime
  – Identity (name)
    – Uniform Resource Name (URN, RFC 2141)
    – Persistent identifier for a web resource
    – Remains unchanged during its life cycle
• URI generic syntax
  URI = scheme "://" [userinfo "@"] host [":" port] [path] ["?" query] ["#" fragment]
  – Scheme: e.g. http, ftp, mailto
  – Userinfo: e.g. username and password
  – Host: e.g. domain name, IPv4/IPv6 address
  – Port: e.g. :80 stands for the HTTP port
  – Path: e.g. path in the file system of the WWW server
  – Query: e.g. parameters to be passed to applications
  – Fragment: e.g. determines a specific fragment of a document
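The generic syntax above can be tried out with Python's standard library. This is a minimal sketch: the URI itself is a made-up example (not from the slides), chosen only so that every component is present.

```python
from urllib.parse import urlsplit

# Hypothetical URI, invented for illustration so that all components appear.
uri = "http://alice:secret@example.org:8080/books/index.html?isbn=000651409X#reviews"

parts = urlsplit(uri)
components = {
    "scheme": parts.scheme,
    "userinfo": f"{parts.username}:{parts.password}",
    "host": parts.hostname,
    "port": parts.port,
    "path": parts.path,
    "query": parts.query,
    "fragment": parts.fragment,
}

for name, value in components.items():
    print(f"{name:9} {value}")
```

Each dictionary key corresponds to one bracketed part of the generic syntax line.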
Data on the Web is not enough…
• We need a proper infrastructure for a real Web of Data
  – data is available on the Web
    – accessible via standard Web technologies
  – data are interlinked over the Web
    – i.e., data can be integrated over the Web
• This is where Semantic Web technologies come in
• We will use a simplistic example to introduce the main Semantic Web concepts
The rough structure of data integration
• Map the various data onto an abstract data representation
  – make the data independent of its internal representation…
• Merge the resulting representations
• Start making queries on the whole!
  – queries not possible on the individual data sets
We start with a book...
A simplified bookstore data (dataset "A")
ID | Author | Title | Publisher | Year
ISBN 0-00-651409-X | id_xyz | The Glass Palace | id_qpr | 2000

ID | Name | Homepage
id_xyz | Ghosh, Amitav | http://www.amitavghosh.com

ID | Publisher's name | City
id_qpr | Harper Collins | London
1st: we export our data as a set of relations
[Figure: graph rooted at http://…isbn/000651409X, with literal nodes "The Glass Palace", "2000", "London" and "Harper Collins", and an a:author edge to a node that has a:name "Ghosh, Amitav" and a:homepage http://www.amitavghosh.com]
Some notes on exporting the data
• Relations form a graph
  – the nodes refer to the "real" data or contain some literal
  – how the graph is represented in the machine is immaterial for now
• Data export does not necessarily mean physical conversion of the data
  – relations can be generated on the fly at query time
    – via SQL "bridges"
    – scraping HTML pages
    – extracting data from Excel sheets
    – etc.
• One can export part of the data
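The export step can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the table contents come from dataset "A", but the predicate names other than a:author, a:name and a:homepage (a:title, a:year, a:publisher, a:p_name, a:city) are assumed labels, the "_:" node identifiers are placeholders, and the host part of the book URI stays elided as on the slides.

```python
# Dataset "A" as three small tables (values taken from the slide).
books = [{"id": "ISBN 0-00-651409-X", "author": "id_xyz",
          "title": "The Glass Palace", "publisher": "id_qpr", "year": "2000"}]
authors = {"id_xyz": {"name": "Ghosh, Amitav",
                      "homepage": "http://www.amitavghosh.com"}}
publishers = {"id_qpr": {"p_name": "Harper Collins", "city": "London"}}


def book_uri(isbn):
    # The slides elide the host ("http://…isbn/"); it is kept elided here.
    return "http://…isbn/" + isbn.replace("ISBN ", "").replace("-", "")


def export_relations():
    """Generate (subject, predicate, object) relations on the fly."""
    triples = set()
    for row in books:
        subject = book_uri(row["id"])
        author_node = "_:" + row["author"]        # placeholder node ids
        publisher_node = "_:" + row["publisher"]
        triples.add((subject, "a:title", row["title"]))
        triples.add((subject, "a:year", row["year"]))
        triples.add((subject, "a:author", author_node))
        triples.add((subject, "a:publisher", publisher_node))
        for key, value in authors[row["author"]].items():
            triples.add((author_node, "a:" + key, value))
        for key, value in publishers[row["publisher"]].items():
            triples.add((publisher_node, "a:" + key, value))
    return triples


relations = export_relations()
for triple in sorted(relations):
    print(triple)
```

Nothing is converted physically: the tables stay as they are, and the relations are computed when asked for, which is the "on the fly" option mentioned above.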
Same book in French…
Another bookstore data (dataset "F")
   | A | B | C | D
1  | ID | Titre | Traducteur | Original
2  | ISBN 2020386682 | Le Palais des Miroirs | $A12$ | ISBN 0-00-651409-X
6  | ID | Auteur
7  | ISBN 0-00-651409-X | $A11$
10 | Nom
11 | Ghosh, Amitav
12 | Besse, Christianne
2nd: export your second set of data
[Figure: graph in which http://…isbn/2020386682 has the literal "Le palais des miroirs" and an f:traducteur edge to a node with f:nom "Besse, Christianne"; http://…isbn/000651409X has an f:auteur edge to a node with f:nom "Ghosh, Amitav"]
3rd: start merging your data
[Figure: the graphs of datasets "F" and "A" side by side; both contain the node http://…isbn/000651409X. Same URI!]
3rd: start merging your data
[Figure: the merged graph. http://…isbn/2020386682 is linked by f:original to http://…isbn/000651409X, which now carries both the f:auteur edge from dataset "F" and the a:author, a:name and a:homepage data from dataset "A"]
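Merging can be sketched as a plain set union of triples: the two graphs join wherever they use the same URI. The snippet below is a simplified illustration, not the slides' implementation; a:title and f:titre are assumed predicate names, and the "_:" identifiers are placeholder node names.

```python
BOOK = "http://…isbn/000651409X"    # URIs as written on the slides (host elided)
LIVRE = "http://…isbn/2020386682"

dataset_a = {
    (BOOK, "a:title", "The Glass Palace"),        # a:title is an assumed label
    (BOOK, "a:author", "_:a1"),
    ("_:a1", "a:name", "Ghosh, Amitav"),
    ("_:a1", "a:homepage", "http://www.amitavghosh.com"),
}
dataset_f = {
    (LIVRE, "f:titre", "Le palais des miroirs"),  # f:titre is an assumed label
    (LIVRE, "f:original", BOOK),
    (LIVRE, "f:traducteur", "_:f2"),
    ("_:f2", "f:nom", "Besse, Christianne"),
    (BOOK, "f:auteur", "_:f1"),
    ("_:f1", "f:nom", "Ghosh, Amitav"),
}


def uris(graph):
    """All subjects and objects that are URIs (here: strings starting with http)."""
    return {term for s, _, o in graph for term in (s, o) if term.startswith("http")}


# Merging two sets of triples is just their union.
merged = dataset_a | dataset_f

# URIs used by both datasets are the points where the graphs connect.
shared = uris(dataset_a) & uris(dataset_f)
print(shared)
```

Note that the two author nodes (`_:a1` and `_:f1`) stay separate after this step; only the book URI is shared.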
Start making queries…
• A user of dataset "F" can now ask queries like:
  – "give me the title of the original"
    – well, … « donnes-moi le titre de l'original »
• This information is not in dataset "F"…
• …but can be retrieved by merging with dataset "A"!
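A hedged sketch of that query over a cut-down merged graph follows. The helper names (`objects`, `title_of_original`) and the a:title / f:titre labels are assumptions made for the illustration.

```python
BOOK = "http://…isbn/000651409X"    # host elided as on the slides
LIVRE = "http://…isbn/2020386682"

dataset_f = {
    (LIVRE, "f:original", BOOK),
    (LIVRE, "f:titre", "Le palais des miroirs"),   # assumed label
}
dataset_a = {
    (BOOK, "a:title", "The Glass Palace"),         # assumed label
}


def objects(graph, subject, predicate):
    """All objects o such that (subject, predicate, o) is in the graph."""
    return [o for s, p, o in graph if s == subject and p == predicate]


def title_of_original(graph, book):
    """Follow f:original, then read the a:title of whatever it points to."""
    return [title
            for original in objects(graph, book, "f:original")
            for title in objects(graph, original, "a:title")]


only_f = title_of_original(dataset_f, LIVRE)              # dataset "F" alone
merged = title_of_original(dataset_f | dataset_a, LIVRE)  # after merging
print(only_f, merged)
```

On dataset "F" alone the query finds nothing; on the merged graph it reaches the English title.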
However, more can be achieved…
• We "feel" that a:author and f:auteur should be the same
• But an automatic merge does not know that!
• Let us add some extra information to the merged data:
  – a:author is the same as f:auteur
  – both identify a "Person"
  – a term that a community may have already defined:
    – a "Person" is uniquely identified by his/her name and, say, homepage
    – it can be used as a "category" for certain types of resources
3rd revisited: use the extra knowledge
[Figure: the merged graph with the extra statements applied. The author nodes are typed (r:type) as http://…foaf/Person, and the single node for "Ghosh, Amitav" is now reachable through both a:author and f:auteur and carries a:name and a:homepage]
Start making richer queries!
• A user of dataset "F" can now query:
  – "donnes-moi la page d'accueil de l'auteur de l'original"
    – well… "give me the home page of the original's 'auteur'"
• The information is not in datasets "F" or "A"…
• …but was made available by:
  – merging datasets "A" and "F"
  – adding three simple extra statements as extra "glue"
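How the extra "glue" makes this query answerable can be sketched as follows. This is a simplified illustration, not a real reasoner: it identifies a Person by name only (the slides say "name and, say, homepage"), and it additionally assumes that f:nom and a:name both carry the person's name. Node identifiers such as `_:a1` are placeholders.

```python
BOOK = "http://…isbn/000651409X"    # host elided as on the slides
LIVRE = "http://…isbn/2020386682"

merged = {
    (LIVRE, "f:original", BOOK),
    (BOOK, "f:auteur", "_:f1"),
    ("_:f1", "f:nom", "Ghosh, Amitav"),
    (BOOK, "a:author", "_:a1"),
    ("_:a1", "a:name", "Ghosh, Amitav"),
    ("_:a1", "a:homepage", "http://www.amitavghosh.com"),
}

AUTHOR_PROPS = {"a:author", "f:auteur"}   # glue 1: a:author same as f:auteur
NAME_PROPS = {"a:name", "f:nom"}          # assumption: both hold the person's name


def apply_glue(graph):
    # Glue 2: whatever a:author / f:auteur points to is a Person.
    persons = {o for _, p, o in graph if p in AUTHOR_PROPS}
    typed = graph | {(n, "r:type", "foaf:Person") for n in persons}
    # Glue 3 (simplified): Persons with the same name are one and the same node.
    canonical, rename = {}, {}
    for s, p, o in sorted(typed):
        if s in persons and p in NAME_PROPS:
            rename[s] = canonical.setdefault(o, s)
    return {(rename.get(s, s), p, rename.get(o, o)) for s, p, o in typed}


def objects(graph, subject, predicate):
    return [o for s, p, o in graph if s == subject and p == predicate]


def homepage_of_original_auteur(graph, book):
    """The dataset "F" user's query, phrased with F's own f:auteur term."""
    return [page
            for original in objects(graph, book, "f:original")
            for person in objects(graph, original, "f:auteur")
            for page in objects(graph, person, "a:homepage")]


before = homepage_of_original_auteur(merged, LIVRE)
after = homepage_of_original_auteur(apply_glue(merged), LIVRE)
print(before, after)
```

Before the glue is applied, f:auteur leads to a node that has no homepage; afterwards the two author nodes have been unified and the homepage from dataset "A" is reachable.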
Combine with different datasets
• Using, e.g., the "Person", the dataset can be combined with other sources
• For example, data in Wikipedia can be extracted using dedicated tools
  – e.g., the "dbpedia" project can extract the "infobox" information from Wikipedia already…
Merge with Wikipedia data
[Figure: the merged bookstore graph linked to DBpedia. The author node is connected to http://dbpedia.org/../Amitav_Ghosh (foaf:name, w:reference), which is w:author_of http://dbpedia.org/../The_Hungry_Tide, http://dbpedia.org/../The_Calcutta_Chromosome and http://dbpedia.org/../The_Glass_Palace (matched via w:isbn), and w:born_in http://dbpedia.org/../Kolkata, which has w:long and w:lat]
Search Engines – Fact Retrieval
Query string: International Space Station - 17th March 2016
• What is the International Space Station?
• Is it orbiting on 17th March 2016?
• How to compute the position of the satellite on the said date
• External data to be considered:
  – Constellation data
  – Planet data
  – Satellite data
RDF
RDF stands for
• Resource: pages, dogs, ideas... everything that can have a URI
• Description: attributes, features, and relations of the resources
• Framework: model, languages and syntaxes for these descriptions
• RDF is a triple model, i.e. every piece of knowledge is broken down into ( subject , predicate , object )
RDF
• doc.html has for author Ankur and has for theme Research
• doc.html has for author Ankur
  doc.html has for theme Research
• ( doc.html , author , Ankur )
  ( doc.html , theme , Research )
  ( subject , predicate , object )
RDF is also a graph model to link the descriptions of resources
• RDF triples can be seen as arcs of a graph (vertex, edge, vertex)
[Figure: Doc.html with an Author arc to Ankur and a Theme arc to Research]
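Both views of the same data can be written down directly. A minimal sketch in Python, using the two triples from the slide; the adjacency-list representation is just one possible way to hold the graph, not something the slides prescribe.

```python
from collections import defaultdict

# The two statements as (subject, predicate, object) triples.
triples = [
    ("doc.html", "author", "Ankur"),
    ("doc.html", "theme", "Research"),
]

# Graph view: each triple is an arc (vertex, edge label, vertex).
graph = defaultdict(list)
for subject, predicate, obj in triples:
    graph[subject].append((predicate, obj))

for subject, arcs in graph.items():
    for predicate, obj in arcs:
        print(f"{subject} --{predicate}--> {obj}")
```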
Resource Description Framework (RDF)
• Another triple model:
Subject | Predicate | Object
Renee Miller | Teaches | CSC433
Renee Miller | Lives in | Toronto
<URI> | <URI> | <URI> or "Literal"
<http://cs.toronto.edu/~miller> | <http://xmlns.com/foaf/spec/#term_based_near> | <http://dbpedia.org/resource/Toronto>
<http://cs.toronto.edu/~miller> | <http://xmlns.com/foaf/spec/#term_based_near> | "Toronto"
[Figure: graph in which bb:renee-j-miller has rdf:type foaf:Person, foaf:name "Renee J. Miller" and foaf:based_near dbpedia:Toronto]
foaf: Friend of a Friend
A Simple RDF Example (in RDF/XML)
<?xml version="1.0"?>
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:foaf="http://xmlns.com/foaf/spec/#"
         xmlns:bb="http://data.bibbase.org/ontology/">
  <rdf:Description rdf:about="http://.../author/renee-j-miller/">
    <rdf:type rdf:resource="http://xmlns.com/foaf/spec/#term_Person"/>
    <foaf:name xml:lang="en">Renée J. Miller</foaf:name>
    <foaf:based_near rdf:resource="http://dbpedia.org/resource/Toronto"/>
  </rdf:Description>
</rdf:RDF>
[Figure: graph in which bb:renee-j-miller has rdf:type foaf:Person, foaf:name "Renee J. Miller" and foaf:based_near dbpedia:Toronto]
A Simple RDF Example (in Turtle)
@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix foaf: <http://xmlns.com/foaf/spec/#> .
@prefix bb: <http://data.bibbase.org/ontology/> .
<http://data.bibbase.org/author/renee-j-miller/>
    rdf:type foaf:Person ;
    foaf:name "Renée J. Miller"@en ;
    foaf:based_near <http://dbpedia.org/resource/Toronto> .
[Figure: graph in which bb:renee-j-miller has rdf:type foaf:Person, foaf:name "Renee J. Miller" and foaf:based_near dbpedia:Toronto]
A Simple RDF Example (in RDFa)
…
<p about="http://.../author/renee-j-miller">
  The author "<span property="foaf:name" lang="en">Renée J. Miller</span>"
  lives in the city "<span rel="foaf:based_near" resource="http://…/Toronto">Toronto</span>"
</p>
…
[Figure: graph in which bb:renee-j-miller has rdf:type foaf:Person, foaf:name "Renee J. Miller" and foaf:based_near dbpedia:Toronto]
SPARQL
• SPARQL stands for "SPARQL Protocol and RDF Query Language".
• It is the standard query language for RDF data proposed by the W3C.
• It is based on matching graph patterns against RDF graphs.
• The simplest kind of graph pattern is a triple pattern.
  – A triple pattern is like an RDF triple, but with the option of a variable in the subject, predicate or object position.
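To make the idea of a triple pattern concrete, here is a minimal sketch in plain Python (not a SPARQL engine). Terms are plain strings, a leading "?" marks a variable, and the function name `match` is an assumption made for the illustration.

```python
def match(pattern, triple):
    """Return the variable bindings if the triple fits the pattern, else None."""
    bindings = {}
    for pattern_term, term in zip(pattern, triple):
        if pattern_term.startswith("?"):
            # A variable matches anything, but must stay consistent if repeated.
            if bindings.setdefault(pattern_term, term) != term:
                return None
        elif pattern_term != term:
            return None
    return bindings


data = [
    ("bb:renee-j-miller", "rdf:type", "foaf:Person"),
    ("bb:renee-j-miller", "foaf:name", "Renée J. Miller"),
    ("bb:renee-j-miller", "foaf:based_near", "dbpedia:Toronto"),
]

pattern = ("?x", "foaf:name", "?name")
results = [b for t in data if (b := match(pattern, t)) is not None]
print(results)
```

Only the foaf:name triple fits the pattern, and the result is one set of bindings for `?x` and `?name`.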
Example Dataset
@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix foaf: <http://xmlns.com/foaf/spec/#> .
@prefix bb: <http://data.bibbase.org/ontology/> .
<http://data.bibbase.org/author/renee-j-miller/>
    rdf:type foaf:Person ;
    foaf:name "Renée J. Miller"@en ;
    foaf:based_near [ rdf:type foaf:Place ;
                      foaf:name "Toronto" ] .
Example SPARQL Query
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX foaf: <http://xmlns.com/foaf/spec/#>
SELECT ?name
WHERE {
  ?x foaf:name ?name .
  ?x rdf:type foaf:Person .
  ?x foaf:based_near ?y .
  ?y foaf:name "Toronto" .
}
• Result
?name
"Renée J. Miller"
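The way such a query is answered can be sketched as a chain of triple-pattern matches that share variable bindings. The Python below is a simplified illustration, not a SPARQL implementation: terms are plain strings, language tags are ignored, and `_:place` is a placeholder name for the blank node in the example dataset.

```python
def match(pattern, triple, bindings):
    """Extend the bindings so that the pattern equals the triple, or return None."""
    extended = dict(bindings)
    for pattern_term, term in zip(pattern, triple):
        if pattern_term.startswith("?"):
            if extended.setdefault(pattern_term, term) != term:
                return None
        elif pattern_term != term:
            return None
    return extended


def evaluate(patterns, data):
    """Match the patterns one after another, carrying bindings along (a join)."""
    solutions = [{}]
    for pattern in patterns:
        solutions = [m for b in solutions for t in data
                     if (m := match(pattern, t, b)) is not None]
    return solutions


data = [
    ("bb:renee-j-miller", "rdf:type", "foaf:Person"),
    ("bb:renee-j-miller", "foaf:name", "Renée J. Miller"),
    ("bb:renee-j-miller", "foaf:based_near", "_:place"),
    ("_:place", "rdf:type", "foaf:Place"),
    ("_:place", "foaf:name", "Toronto"),
]
query = [
    ("?x", "foaf:name", "?name"),
    ("?x", "rdf:type", "foaf:Person"),
    ("?x", "foaf:based_near", "?y"),
    ("?y", "foaf:name", "Toronto"),
]

names = [solution["?name"] for solution in evaluate(query, data)]
print(names)
```

The first pattern alone also matches the place named "Toronto"; the second pattern removes that candidate because the place is not a foaf:Person.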
Example SPARQL Query
[Figure: example query; content not recoverable from the slide text]
SPARQL 1.0 allows
• Extraction of data as
  – URIs, blank nodes, typed and untyped literals
  – RDF subgraphs
• Exploration of data via queries for unknown relations
• Execution of complex join operations over heterogeneous databases in a single query
• Transformation of RDF data from one vocabulary to another
• Construction of new RDF graphs based on RDF query subgraphs
SPARQL 1.1 (a W3C Recommendation since March 2013) allows
• Additional query features
  – Aggregate functions, subqueries, negation, project expressions, property paths
• Logical entailment for
  – RDF, RDFS, OWL Direct and RDF-Based Semantics entailment, and RIF Core entailment
• Update of RDF graphs as a full data manipulation language
• Discovery of information about the SPARQL service
• Federated queries distributed over different SPARQL endpoints
SPARQL usage in practice
• SPARQL is usually used over the network
  – Separate documents define the protocol and the result format
    – SPARQL Protocol for RDF with HTTP and SOAP bindings
    – SPARQL results in XML or JSON formats
• Big datasets often offer "SPARQL endpoints" using this protocol
• Typical example: the SPARQL endpoint to DBpedia
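As a rough sketch of what "over the network" means, the snippet below builds (but does not send) an HTTP GET request for DBpedia's public endpoint using only the standard library. The query text and the helper name `build_request` are illustrative assumptions; the `query` parameter and the JSON results media type come from the SPARQL Protocol and results-format documents mentioned above.

```python
from urllib.parse import urlencode
from urllib.request import Request

ENDPOINT = "https://dbpedia.org/sparql"   # DBpedia's public SPARQL endpoint

# Illustrative query: labels of the DBpedia resource for Toronto.
query = """
SELECT ?label WHERE {
  <http://dbpedia.org/resource/Toronto>
      <http://www.w3.org/2000/01/rdf-schema#label> ?label .
} LIMIT 5
"""


def build_request(endpoint, sparql):
    # The SPARQL Protocol passes the query text in the "query" parameter.
    url = endpoint + "?" + urlencode({"query": sparql})
    # Ask for the SPARQL JSON results format via the Accept header.
    return Request(url, headers={"Accept": "application/sparql-results+json"})


request = build_request(ENDPOINT, query)
print(request.full_url)
# To actually run it: urllib.request.urlopen(request), then json.load() the response.
```

Sending the request is left out on purpose so the sketch runs without network access.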
SPARQL as a unifying point
[Figure: applications query a SPARQL processor, which unifies access to several sources: an RDF graph, a triple store and a relational database (through an SQL⇔RDF bridge) exposed via SPARQL endpoints, HTML and XML/XHTML documents, and unstructured text processed with NLP techniques]
Based on the presentation by Ivan Herman, available at http://www.w3.org/2010/Talks/0622-SemTech-IH/
Other Semantic Web Technologies
• Web Ontology Language (OWL)
  – A family of knowledge representation languages for authoring ontologies for the Web
• RDF Schema (RDFS)
  – RDF Vocabulary Description Language
  – http://www.w3.org/TR/rdf-schema/
  – How to use RDF to describe RDF vocabularies
• Other RDF vocabularies
  – Simple Knowledge Organization System (SKOS)
    – Designed for the representation of thesauri, classification schemes, taxonomies, subject-heading systems, or any other type of structured controlled vocabulary
  – FOAF (Friend of a Friend)
    – A machine-readable ontology describing persons, their activities and their relations to other people and objects
ONTOLOGIES
EXISTING OF BEING
Ontologies
• An ontology is a formal, explicit, shared specification of a conceptualization of a domain (Gruber, 1993).
• Conceptualization: the objects, concepts, and other entities that are assumed to exist in some area of interest and the relationships that hold among them. A conceptualization is an abstract, simplified view of the world that we wish to represent for some purpose.
• The term ontology is borrowed from philosophy, where ontology is a systematic account of existence (what things exist, how they can be differentiated from each other, etc.).
• Today the word ontology is often used as a synonym for a shared knowledge base.
Ontologies – Components & Models
• Classes, relations and instances
• Classes represent concepts
• Classes are described by attributes
• Attributes are name-value pairs
Informal description:
  The address contains the name, title and place of address of a person.
Semi-informal description:
  Address
    First name <string>
    Family name <string>
    Street <string>
    PIN Code <int>
    City <string>
    …
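The class/attribute/instance vocabulary maps naturally onto a typed record. A minimal sketch in Python, assuming a dataclass is an acceptable stand-in for an ontology class; the instance values are invented for illustration, and the elided attributes ("…") on the slide are not filled in.

```python
from dataclasses import dataclass, fields


@dataclass
class Address:
    """Class 'Address': each attribute is a name-value pair with a datatype."""
    first_name: str
    family_name: str
    street: str
    pin_code: int
    city: str


# An instance of the class; these values are made up for the example.
home = Address("Ada", "Example", "1 Sample Street", 700001, "Kolkata")

# The class description itself: attribute names and their datatypes.
schema = {f.name: f.type.__name__ for f in fields(Address)}
print(schema)
```

A real ontology language such as OWL adds what this sketch lacks: relations between classes, global identifiers, and formal semantics that support inference.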
Learning Ontologies
Very Large Ontologies
• Recently there has been a lot of work on developing very large ontologies that capture various areas of human knowledge and on deploying this knowledge in applications such as search engines or question answering.
• Example: Watson, IBM's question answering system that beat humans in the quiz show Jeopardy! (http://www-03.ibm.com/innovation/us/watson/index.html).
5-Star Open Data – by Tim Berners-Lee
• Tim Berners-Lee, the inventor of the Web and initiator of Linked Data, suggested a 5-star deployment scheme for Open Data. Here, we give examples for each step of the stars and explain the costs and benefits that come along with it.
BY EXAMPLE …
★ make your stuff available on the Web (whatever format) under an open license
★★ make it available as structured data (e.g., Excel instead of an image scan of a table)
★★★ make it available in a non-proprietary open format (e.g., CSV instead of Excel)
★★★★ use URIs to denote things, so that people can point at your stuff
★★★★★ link your data to other data to provide context
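The scheme is cumulative: each star presupposes the ones before it. A small sketch of that rule in Python follows; the `Dataset` fields and the example values are assumptions made for the illustration, not part of the 5-star scheme's own wording.

```python
from dataclasses import dataclass


@dataclass
class Dataset:
    open_licence: bool   # 1 star: on the Web under an open licence
    structured: bool     # 2 stars: machine-readable structured data
    open_format: bool    # 3 stars: non-proprietary format such as CSV
    uses_uris: bool      # 4 stars: URIs denote things
    linked: bool         # 5 stars: links to other data


def stars(dataset):
    """Count stars; a level only counts if all previous levels are met."""
    count = 0
    for level_met in (dataset.open_licence, dataset.structured,
                      dataset.open_format, dataset.uses_uris, dataset.linked):
        if not level_met:
            break
        count += 1
    return count


scanned_table = Dataset(True, False, False, False, False)
csv_file = Dataset(True, True, True, False, False)
linked_data = Dataset(True, True, True, True, True)
print(stars(scanned_table), stars(csv_file), stars(linked_data))
```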
What are the costs & benefits of ★ Web data?
• As a consumer …
  – You can look at it.
  – You can print it.
  – You can store it locally (on your hard drive or on a USB stick).
  – You can enter the data into any other system.
  – You can change the data as you wish.
  – You can share the data with anyone you like.
• As a publisher …
  – It's simple to publish.
  – You do not have to explain repeatedly to others that they can use your data.
What are the costs & benefits of ★★ Web data?
• As a consumer, you can do everything you can do with ★ Web data and additionally:
  – You can directly process it with proprietary software to aggregate it, perform calculations, visualize it, etc.
  – You can export it into another (structured) format.
• As a publisher …
  – It's still simple to publish.
What are the costs & benefits of ★★★ Web data?
• As a consumer, you can do everything you can do with ★★ Web data and additionally:
  – You can manipulate the data in any way you like, without the need to own any proprietary software package.
• As a publisher …
  – You might need converters or plug-ins to export the data from the proprietary format.
  – It's still rather simple to publish.
What are the costs & benefits of ★★★★ Web data?
• As a consumer, you can do everything you can do with ★★★ Web data and additionally:
  – You can link to it from any other place (on the Web or locally).
  – You can bookmark it.
  – You can reuse parts of the data.
  – You may be able to reuse existing tools and libraries, even if they only understand parts of the pattern the publisher used.
  – Understanding the structure of an RDF "graph" of data can take more effort than tabular (Excel/CSV) or tree (XML/JSON) data.
  – You can combine the data safely with other data. URIs are a global scheme, so if two things have the same URI then it's intentional, and if so that's well on its way to being 5-star data!
• As a publisher …
  – You have fine-granular control over the data items and can optimize their access (load balancing, caching, etc.).
  – Other data publishers can now link into your data, promoting it to 5 stars!
  – You typically invest some time slicing and dicing your data.
  – You'll need to assign URIs to data items and think about how to represent the data.
  – You need to either find existing patterns to reuse or create your own.
What are the costs & benefits of ★★★★★ Web data?
• As a consumer, you can do everything you can do with ★★★★ Web data and additionally:
  – You can discover more (related) data while consuming the data.
  – You can directly learn about the data schema.
  – You now have to deal with broken data links, just like 404 errors in web pages.
  – Presenting data from an arbitrary link as fact is as risky as letting people include content from any website in your pages. Caution, trust and common sense are all still necessary.
• As a publisher …
  – You make your data discoverable.
  – You increase the value of your data.
  – Your own organization will gain the same benefits from the links as the consumers.
  – You'll need to invest resources to link your data to other data on the Web.
  – You may need to repair broken or incorrect links.
Applications
• Data integration (e.g., see project Optique, http://www.optique-project.eu/)
• E-government (e.g., open data)
• E-commerce
• Tourism
• Medicine
• Biology
• Earth observation (see the work of my group in projects TELEIOS, http://www.earthobservatory.eu/, and LEO, http://www.linkedeodata.eu/)
• …
References:
• Books:
  • Antoniou, Grigoris, and Frank van Harmelen. A Semantic Web Primer. Cambridge, MA: The MIT Press (2004).
  • Segaran, Toby, Colin Evans, and Jamie Taylor. Programming the Semantic Web. O'Reilly Media, Inc. (2009).
  • Davies, John, Dieter Fensel, and Frank van Harmelen. Towards the Semantic Web: Ontology-Driven Knowledge Management. Chichester (2003).
• Scientific Papers:
  • Maedche, Alexander. Ontology Learning for the Semantic Web. Vol. 665. Springer Science & Business Media (2012).
  • Schmachtenberg, Max, Christian Bizer, and Heiko Paulheim. "Adoption of the linked data best practices in different topical domains." The Semantic Web: ISWC 2014. Springer International Publishing (2014), 245-260.
• Video Lectures & Slides:
  • Video lectures on Semantic Web by Dr. Harald Sack, Hasso Plattner Institute, University of Potsdam, Germany
  • www.cs.toronto.edu/~oktie/slides/web-of-data-intro.pdf
  • https://www.w3.org/2010/Talks/0622-SemTech-IH/
• Websites:
  • http://dbpedia.org/snorql/
  • http://5stardata.info/en/
Thank You

An Introduction to Semantic Web Technology

  • 1.
  • 2.
    INTRODUCTION: WEB OF DOCUMENTS VS. WEB OF DATA
  • 3.
    A Walk Through a Brief History of the World Wide Web
    • 1969: ARPANET (Advanced Research Projects Agency Network) launched.
    • In 1980, Tim Berners-Lee built ENQUIRE, a personal database of people and software models and a way to play with hypertext; each new page of information in ENQUIRE had to be linked to an existing page.
    • In 1990, Berners-Lee built all the tools necessary for a working Web: HTTP 0.9, HTML, the first Web browser (which was also a Web editor), the first HTTP server software (CERN httpd), the first web server (http://info.cern.ch), and the first Web pages, which described the project itself.
    [Figure: WWW's historical logo, designed by Robert Cailliau]
    [Figure: The NeXTcube used by Tim Berners-Lee at CERN became the first Web server.]
  • 4.
    How big is the web?
    • As per http://www.worldwidewebsize.com/, the Indexed Web contains at least 4.84 billion pages (Thursday, 25 February, 2016).
    • Early estimates suggested that the deep web is 400 to 550 times larger than the surface web.
    • Since more information and sites are always being added, it can be assumed that the deep web is growing exponentially at a rate that cannot be quantified.
  • 5.
    Understanding Information in the WWW
    • What is important and how do you know?
    • What is information, what is advertisement?
    • What does information mean?
    • How credible or trustworthy is the information?
    • What is redundant?
  • 6.
    Understanding the Importance of Meaning
    • SEMANTICS: the part of linguistics that focuses on the sense and meaning of language or of the symbols of a language.
    • It is the study of how signs or symbols are interpreted by agents or communities within particular circumstances and contexts.
    • Semantics asks how the sense and meaning of complex concepts can be derived from simple concepts based on the rules of syntax.
    • The semantics of a message depends on its context and pragmatics†.
    † Dealing with things sensibly and realistically, in a way that is based on practical rather than theoretical considerations.
  • 7.
    Understanding the Importance of Meaning
    • SYNTAX: in grammar, the study of the principles and processes by which sentences are constructed in a particular language.
    • In formal languages, syntax is just a set of rules by which well-formed expressions can be created from a fundamental set of symbols (alphabet).
    • In computer science, syntax defines the normative structure of data.
  • 8.
    Understanding the Importance of Meaning
    • CONTEXT: the context of an expression denotes the expressions (concepts) that surround it, together with its relationship to them and to further related elements.
    • Context denotes all elements of any sort of communication that define the interpretation of the communicated content, e.g.:
      • General contexts: place, time, interrelation of actions in a message.
      • Personal or social contexts: relation between sender and receiver of a message.
    • PRAGMATICS: reflects the intention with which language is used to communicate a message.
    • In linguistics, pragmatics denotes the study of applying language in different situations. It also denotes the intended purpose of the speaker. Pragmatics studies the ways in which context contributes to meaning.
  • 9.
    The limits of the web
    • Traditional keyword-based search leads to many irrelevant results.
      • Example: from the simple term "Jaguar" it is not clear whether the user means the car, the animal, or the OS (Mac OS X Jaguar).
    • POLYSEMY: one term has several meanings, so a search also returns results that share the same or a similar name but mean something different.
  • 10.
    Problem 1: Information Retrieval
    • Jaguar (animal) = Panthera onca
    • Traditional keyword-based search doesn't find all results.
      • Synonyms and metaphors are not always handled properly, which leads to undesired results.
    • Primary objects: documents
    • Degree of structure in data: fairly low
    • Implicit semantics of contents
    • Designed for: human consumption
    [Figure: documents A, B, C, D (HTML, API/XML) connected by untyped links]
  • 11.
    Problem 2: Information Extraction
    • Identifying content written in other languages, e.g. Japanese or Bengali.
    • Pictures don't tell search engines anything about what they show.
    • Example: Google takes the caption or name embedded in a picture and uses it as a reference keyword.
  • 12.
    Problem 2: Information Extraction (Cont.)
    [Figure: documents A, B, C, D (HTML, API/XML) connected by untyped links, with "Things" mentioned inside the documents. Are two documents talking about the same "Thing"?]
  • 13.
    Problem 2: Information Extraction (Cont.)
    • Can only be solved correctly by a human agent:
      • Heterogeneous distribution and order of information.
      • A software agent does not have sufficient knowledge of contexts, world knowledge, or experience to solve the problem.
    • Hence a software agent cannot solve the problem unless explicit semantics are available.
    • Implicit knowledge: information that is not specified explicitly but must be derived via logical deduction from the available information.
  • 14.
    Problem 3: Maintenance
    The more complex and voluminous a website is, the more complicated the maintenance of its only weakly structured data becomes.
    Problems:
    • Syntactic consistency error: you have linked your webpage to another webpage with related content, but that webpage has since moved and your link still points to the old address.
    • Semantic (link) consistency error: even more dangerous; the content at the hyperlinked destination keeps changing.
    • Correctness: it is hard to maintain correctness over time in an automated manner.
    • Timeliness: tracking changes over time is hard.
    [Figure: HTTP 404 error: file/page not found]
  • 15.
    Problem 4: Personalization
    • Adaptation of the presented information content to personal requirements: users normally password-protect their details, so this kind of information is hard to access.
    • Problems:
      • From where do we get the required (personal) information?
      • Personalization vs. data security
  • 16.
    INTRODUCTION TO SEMANTIC WEB TECHNOLOGIES: THE VISION OF THE SEMANTIC WEB
  • 17.
    The vision of the Semantic Web
    Precondition:
    • Content can be read and interpreted correctly (understood) by machines.
    Natural Language Processing
    • Technologies of traditional information retrieval (search engines)
    Semantic Web
    • Natural-language web content will be explicitly annotated with semantic metadata.
    • Semantic metadata encode the meaning (semantics) of the content and can be read and interpreted correctly by machines.
    The Semantic Web concept was first introduced in the 1990s by Tim Berners-Lee, who also invented the World Wide Web.
  • 18.
    How Can We Achieve the Semantic Web? The Original Vision
    • Instead of publishing information to be consumed by humans, publish machine-processable data and metadata using terms/languages that can be understood by machines.
    • Build machines (agents) that will search for, query, integrate etc. this data.
    • Make sure all agents understand your terms/languages.
  • 19.
    The Semantic Web and Linked Data Vision Today
    • The Semantic Web is a web of data. There is lots of data we all use every day, and it is not part of the web.
    • The Semantic Web is about two things:
      • It is about common formats for integration and combination of data drawn from diverse sources, whereas the original Web mainly concentrated on the interchange of documents.
      • It is also about language for recording how the data relates to real-world objects.
    • That allows a person, or a machine, to start off in one database and then move through an unending set of databases which are connected not by wires but by being about the same thing.
  • 20.
    Semantic Web Technology Stack
    • Most apps use only a subset of the stack.
    • Querying allows fine-grained data access.
    • Standardized information exchange is key.
    • Formats are necessary but not too important.
    • The Semantic Web is based on the Web.
  • 21.
    Basic Layer of the Semantic Web Technology Stack
    • The foundation layer is the World Wide Web, so we rely on all the technologies of the World Wide Web.
    • DBpedia is the semantic version of Wikipedia.
      • Because Wikipedia uses templates, its data is somewhat structured.
      • DBpedia extracts data from Wikipedia infoboxes.
      • DBpedia uses a machine-readable language: RDF.
      • DBpedia stores and publishes the result in RDF and a few other formats.
      • It also hosts a community effort to define extractors for the data, which can be used well beyond Wikipedia.
      • It provides a number of services around the extracted data, like DBpedia Mobile, a SPARQL endpoint, a faceted browser, a number of mappings to external ontologies, an ontology itself, etc.
  • 22.
    Semantic Web Technologies
    • A set of technologies and frameworks that enable the Web of Data:
      • Resource Description Framework (RDF)
      • A variety of data interchange formats (e.g. RDF/XML, N3, Turtle, N-Triples)
      • Notations such as RDF Schema (RDFS) and the Web Ontology Language (OWL)
    • All are intended to provide a formal description of concepts, terms, and relationships within a given knowledge domain.
    • A specialized query language (SPARQL): similar in style to SQL, but queries can be more complicated because they are based on matching graph patterns.
  • 23.
    Application in the Web of Data
    • Linked Data
      • Linked Open Data (LOD) denotes publicly available (RDF) data on the Web, identified via URIs and accessible via HTTP.
    • Web of Data: >31 billion facts, >500 million links (Oct 2011)
  • 24.
    What is so special about the BBC Music website?
    • Information is dynamically aggregated from external, publicly available data (Wikipedia, MusicBrainz, …).
    • No screen scraping
    • No specialized API
    • Data is available as Linked Open Data.
    • Data access via simple HTTP requests
    • Data is always up to date without manual interaction.
  • 25.
    How to build such a site 1.
    • Site editors roam the Web for new facts
      • may discover further links while roaming
    • They update the site manually
    • And the site soon gets out of date
  • 26.
    How to build such a site 2.
    • Editors roam the Web for new data published on Web sites
    • "Scrape" the sites with a program to extract the information
      • i.e., write some code to incorporate the new data
    • Easily get out of date again…
  • 27.
    How to build such a site 3.
    • Editors roam the Web for new data via APIs
    • Understand those…
      • input, output arguments, datatypes used, etc.
    • Write some code to incorporate the new data
    • Easily get out of date again…
  • 28.
    The choice of the BBC
    • Use external, public datasets
      • Wikipedia, MusicBrainz, …
    • They are available as data
      • not APIs or hidden on a Web site
      • data can be extracted using, e.g., HTTP requests or standard queries
  • 29.
  • 30.
    Search Engines: Document Retrieval
    • General problems:
      • Correct interpretation of the query string
        • Somehow the context of the user has to be considered,
        • e.g. what the user queried just before a specific query, or their usual preferences.
      • Correct identification of entities
        • Automatic disambiguation
      • Usability
        • Personalization
  • 31.
    Intelligent Agents in the Semantic Web
    [Figure: two panels. WORLD WIDE WEB: the user reaches WWW documents through a presentation service (e.g. Firefox) and a retrieval service (e.g. Google). SEMANTIC WEB: the user reaches WWW documents through a personal assistant and intelligent infrastructure services.]
  • 32.
    3 Generations of Web Documents
    [Figure: 1st generation: static web pages (HTML/CSS), followed by a 2nd and a 3rd generation. Remaining labels: interactive web pages, JavaScript/applets, dynamic web pages, database access, template-based generation, adaptive web pages, user model, machine learning, online layout, virtual web pages, netbots, information extraction, presentation planning.]
  • 33.
    Toolbox for the Semantic Web
    • Standardized languages to express the semantics of information content on the web (XML/XSD, RDF(S), OWL, RIF)
    • Tools for semantic information on the web (RDFa, GRDDL, …)
    • Various fields of computer science:
      • Artificial Intelligence
      • Linguistics
      • Cryptography
      • Databases
      • Theoretical Computer Science
      • Computer Architecture
      • Software Engineering
      • Systems Theory
      • Computer Networks
  • 34.
    Basic Architecture of Semantic Web - I
    • Uniform: different types of resource identifiers are all constructed according to a uniform schema.
    • Resource: whatever may be identified by a URI.
    • Identifier: distinguishes one resource from another.
  • 35.
    Uniform Resource Identifier (URI)
    • A Uniform Resource Identifier (URI) defines a simple and extensible schema for worldwide unique identification of abstract or physical resources.
      • A resource can be any object with a clear identity (according to the context of the application),
        • e.g. webpages, books, locations, persons, relations among objects, abstract concepts, etc.
    • The concept of a URI is already established in various domains, e.g.:
      • The Web: URL (Uniform Resource Locator), URN (Uniform Resource Name), PURL (Persistent Uniform Resource Locator)
      • Books & publications (ISBN, ISSN)
      • Digital Object Identifier (DOI)
  • 36.
    Uniform Resource Identifier (URI)
    • A URI combines:
      • Address (locator)
        • Uniform Resource Locator (URL, RFC 1738)
        • Denotes where a resource can be found on the web by stating its primary access mechanism
        • Might change during its lifetime
      • Identity (name)
        • Uniform Resource Name (URN, RFC 2141)
        • Persistent identifier for a web resource
        • Remains unchanged during its life cycle
    • URI generic syntax:
      • Scheme: e.g. http, ftp, mailto
      • Userinfo: e.g. username:password
      • Host: e.g. domain name, IPv4/IPv6 address
      • Port: e.g. :80 stands for the HTTP port
      • Path: e.g. path in the file system of the WWW server
      • Query: e.g. parameters to be passed over to applications
      • Fragment: e.g. determines a specific fragment of a document
    URI = scheme "://" [userinfo "@"] host [":" port] [path] ["?" query] ["#" fragment]
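The generic syntax can be tried out with Python's standard library. The sketch below splits one URI into the components listed on the slide; the URI itself is a made-up example chosen only so that every component is present.

```python
from urllib.parse import urlsplit

# Hypothetical URI (not from the slides) that exercises every component.
uri = "http://alice:secret@example.org:8080/books/glass-palace?lang=en#chapter-2"

parts = urlsplit(uri)

# Map the standard-library attributes onto the slide's component names.
components = {
    "scheme": parts.scheme,
    "userinfo": f"{parts.username}:{parts.password}",
    "host": parts.hostname,
    "port": parts.port,
    "path": parts.path,
    "query": parts.query,
    "fragment": parts.fragment,
}

for name, value in components.items():
    print(f"{name:9} {value}")
```

Note that `urlsplit` only separates the parts; it does not check that the resource exists or that the scheme is one the application understands.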
  • 37.
    Data on the Web is not enough…
    • We need a proper infrastructure for a real Web of Data
      • data is available on the Web
        • accessible via standard Web technologies
      • data are interlinked over the Web
        • i.e., data can be integrated over the Web
    • This is where Semantic Web technologies come in
    • We will use a simplistic example to introduce the main Semantic Web concepts
  • 38.
    The rough structure of data integration
    • Map the various data onto an abstract data representation
      • make the data independent of its internal representation…
    • Merge the resulting representations
    • Start making queries on the whole!
      • queries not possible on the individual data sets
  • 39.
    We start with a book...
  • 40.
    A simplified bookstore data (dataset "A")
    ID                 | Author | Title            | Publisher | Year
    ISBN 0-00-651409-X | id_xyz | The Glass Palace | id_qpr    | 2000

    ID     | Name          | Homepage
    id_xyz | Ghosh, Amitav | http://www.amitavghosh.com

    ID     | Publisher's name | City
    id_qpr | Harper Collins   | London
  • 41.
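    1st: we export our data as a set of relations
    [Figure: graph for dataset "A". The book node http://…isbn/000651409X is linked via a:author to an author node, which has a:name "Ghosh, Amitav" and a:homepage http://www.amitavghosh.com. Further nodes: The Glass Palace, 2000, London, Harper Collins.]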
  • 42.
    Some notes on exporting the data
    • Relations form a graph
      • the nodes refer to the "real" data or contain some literal
      • how the graph is represented in the machine is immaterial for now
    • Data export does not necessarily mean physical conversion of the data
      • relations can be generated on the fly at query time
        • via SQL "bridges"
        • scraping HTML pages
        • extracting data from Excel sheets
        • etc.
    • One can export part of the data
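As a rough illustration of such an export, the sketch below turns the rows of dataset "A" into (subject, predicate, object) relations on the fly. It rests on assumptions: the base URI is a placeholder, because the slides elide the real one, and apart from `a:author`, `a:name` and `a:homepage` the predicate names are made up for this example.

```python
# Hypothetical base URI; the slides elide the real one ("http://…isbn/000651409X").
BASE = "http://example.org/isbn/"

# Dataset "A" as plain Python tables (values copied from the bookstore slide).
books = [
    {"isbn": "000651409X", "author": "id_xyz", "title": "The Glass Palace",
     "publisher": "id_qpr", "year": "2000"},
]
authors = {"id_xyz": {"name": "Ghosh, Amitav", "homepage": "http://www.amitavghosh.com"}}
publishers = {"id_qpr": {"name": "Harper Collins", "city": "London"}}


def export_relations(books, authors, publishers):
    """Turn table rows into a set of (subject, predicate, object) relations."""
    triples = set()
    for row in books:
        book = BASE + row["isbn"]
        author = "_:" + row["author"]        # internal IDs become local node labels
        publisher = "_:" + row["publisher"]
        triples |= {
            (book, "a:title", row["title"]),            # assumed predicate name
            (book, "a:year", row["year"]),              # assumed predicate name
            (book, "a:author", author),
            (book, "a:publisher", publisher),           # assumed predicate name
            (author, "a:name", authors[row["author"]]["name"]),
            (author, "a:homepage", authors[row["author"]]["homepage"]),
            (publisher, "a:p_name", publishers[row["publisher"]]["name"]),  # assumed
            (publisher, "a:city", publishers[row["publisher"]]["city"]),    # assumed
        }
    return triples


relations = export_relations(books, authors, publishers)
for triple in sorted(relations):
    print(triple)
```

The tables themselves are left untouched: the relations are computed when they are asked for, which is the "no physical conversion" point made above.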
  • 43.
    Same book in French…
  • 44.
    Another bookstore data (dataset "F")
       | A                  | B                     | C          | D
    1  | ID                 | Titre                 | Traducteur | Original
    2  | ISBN 2020286682    | Le Palais des Miroirs | $A12$      | ISBN 0-00-651409-X
    6  | ID                 | Auteur                |            |
    7  | ISBN 0-00-651409-X | $A11$                 |            |
    10 | Nom                |                       |            |
    11 | Ghosh, Amitav      |                       |            |
    12 | Besse, Christianne |                       |            |
  • 45.
    2nd: export your second set of data
    [Figure: graph for dataset "F". Nodes: http://…isbn/2020386682, http://…isbn/000651409X, Le palais des miroirs, Ghosh, Amitav, Besse, Christianne. Edge labels: f:auteur, f:traducteur, f:nom.]
  • 46.
    3rd: start merging your data
    [Figure: the graphs of datasets "A" and "F" side by side. Both contain the node http://…isbn/000651409X: same URI!]
  • 47.
    3rd: start merging your data
    [Figure: the merged graph. The two datasets are joined at the shared node http://…isbn/000651409X, which http://…isbn/2020386682 points to via f:original.]
  • 48.
    Start making queries…
    • A user of dataset "F" can now ask queries like:
      • "give me the title of the original"
        • well, … « donnes-moi le titre de l'original »
    • This information is not in dataset "F"…
    • …but can be retrieved by merging with dataset "A"!
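A minimal sketch of the merge and the query, under the same assumptions as before: relations are modelled as Python tuples, the two book URIs are placeholders for the elided ones, and `a:title`, `f:titre` and the blank-node labels are names invented for this example. Because both datasets use the same URI for the original book, merging is nothing more than a set union.

```python
# Placeholder URIs standing in for the elided "http://…isbn/…" identifiers.
BOOK = "http://example.org/isbn/000651409X"
LIVRE = "http://example.org/isbn/2020386682"

dataset_a = {
    (BOOK, "a:title", "The Glass Palace"),        # a:title is an assumed name
    (BOOK, "a:author", "_:a_author"),
    ("_:a_author", "a:name", "Ghosh, Amitav"),
    ("_:a_author", "a:homepage", "http://www.amitavghosh.com"),
}
dataset_f = {
    (LIVRE, "f:titre", "Le palais des miroirs"),  # f:titre is an assumed name
    (LIVRE, "f:original", BOOK),
    (LIVRE, "f:traducteur", "_:f_traducteur"),
    ("_:f_traducteur", "f:nom", "Besse, Christianne"),
    (BOOK, "f:auteur", "_:f_auteur"),
    ("_:f_auteur", "f:nom", "Ghosh, Amitav"),
}


def objects(graph, subject, predicate):
    """All objects o such that (subject, predicate, o) is in the graph."""
    return {o for s, p, o in graph if s == subject and p == predicate}


def title_of_original(graph, book):
    """Follow f:original from the book, then read a:title of the target."""
    return {
        title
        for original in objects(graph, book, "f:original")
        for title in objects(graph, original, "a:title")
    }


merged = dataset_a | dataset_f  # same URI for the book, so the graphs join there

print(title_of_original(dataset_f, LIVRE))  # dataset "F" alone cannot answer
print(title_of_original(merged, LIVRE))     # the merged data can
```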
  • 49.
    However, more can be achieved…
    • We "feel" that a:author and f:auteur should be the same
    • But an automatic merge does not know that!
    • Let us add some extra information to the merged data:
      • a:author same as f:auteur
      • both identify a "Person"
      • a term that a community may have already defined:
        • a "Person" is uniquely identified by his/her name and, say, homepage
        • it can be used as a "category" for certain types of resources
  • 50.
    3rd revisited: use the extra knowledge
    [Figure: the merged graph with the extra knowledge added. The author nodes are typed (r:type) as http://…foaf/Person, and a:author and f:auteur are treated as the same relation.]
  • 51.
    Start making richer queries!
    • A user of dataset "F" can now query:
      • "donnes-moi la page d'accueil de l'auteur de l'original"
        • well… "give me the home page of the original's 'auteur'"
    • The information is not in datasets "F" or "A"…
    • …but was made available by:
      • merging datasets "A" and "F"
      • adding three simple extra statements as extra "glue"
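The effect of the "glue" can be sketched in a few lines. This is a hand-rolled stand-in for what a reasoner would do, and it models only one of the extra statements (a:author same as f:auteur); the URIs and blank-node labels are placeholders invented for the example.

```python
# Placeholder URIs standing in for the elided "http://…isbn/…" identifiers.
BOOK = "http://example.org/isbn/000651409X"
LIVRE = "http://example.org/isbn/2020386682"

# A slice of the merged data: each dataset describes the author with its own node.
merged = {
    (LIVRE, "f:original", BOOK),
    (BOOK, "f:auteur", "_:f_auteur"),
    ("_:f_auteur", "f:nom", "Ghosh, Amitav"),
    (BOOK, "a:author", "_:a_author"),
    ("_:a_author", "a:name", "Ghosh, Amitav"),
    ("_:a_author", "a:homepage", "http://www.amitavghosh.com"),
}


def objects(graph, subject, predicate):
    """All objects o such that (subject, predicate, o) is in the graph."""
    return {o for s, p, o in graph if s == subject and p == predicate}


def apply_equivalence(graph, prop_a, prop_b):
    """Return the graph plus every relation implied by 'prop_a same as prop_b'."""
    inferred = set(graph)
    for s, p, o in graph:
        if p == prop_a:
            inferred.add((s, prop_b, o))
        elif p == prop_b:
            inferred.add((s, prop_a, o))
    return inferred


def homepage_of_original_auteur(graph, book):
    """f:original, then f:auteur, then a:homepage."""
    result = set()
    for original in objects(graph, book, "f:original"):
        for person in objects(graph, original, "f:auteur"):
            result |= objects(graph, person, "a:homepage")
    return result


with_glue = apply_equivalence(merged, "a:author", "f:auteur")

print(homepage_of_original_auteur(merged, LIVRE))     # no answer without the glue
print(homepage_of_original_auteur(with_glue, LIVRE))  # answer found with the glue
```

A fuller treatment would also use the "Person" statements to recognise that the two author nodes describe the same individual; that step is left out here.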
  • 52.
    Combine with different datasets
    • Using, e.g., the "Person", the dataset can be combined with other sources
    • For example, data in Wikipedia can be extracted using dedicated tools
      • e.g., the DBpedia project can extract the "infobox" information from Wikipedia already…
  • 53.
    Merge with Wikipedia data
    [Figure: the merged graph linked to DBpedia resources: http://dbpedia.org/../Amitav_Ghosh, http://dbpedia.org/../The_Glass_Palace, http://dbpedia.org/../The_Hungry_Tide, http://dbpedia.org/../The_Calcutta_Chromosome, http://dbpedia.org/../Kolkata. Additional edge labels: foaf:name, w:reference, w:author_of, w:isbn, w:born_in, w:long, w:lat.]
  • 54.
    Search Engines: Fact Retrieval
    Query string: International Space Station - 17th March 2016
    • What is the International Space Station?
    • Is it orbiting on 17th March 2016?
    • How to compute the position of the satellite on the said date?
    • External data to be considered:
      • Constellation data
      • Planet data
      • Satellite data
  • 55.
    RDF
    RDF stands for:
    • Resource: pages, dogs, ideas... everything that can have a URI
    • Description: attributes, features, and relations of the resources
    • Framework: model, languages and syntaxes for these descriptions
    • RDF is a triple model, i.e. every piece of knowledge is broken down into ( subject , predicate , object )
  • 56.
    RDF
    • doc.html has the author Ankur and the theme Research
    • doc.html has the author Ankur; doc.html has the theme Research
    • ( doc.html , author , Ankur )
      ( doc.html , theme , Research )
      ( subject , predicate , object )
  • 57.
    RDF is also a graph model to link the descriptions of resources
    RDF triples can be seen as arcs of a graph (vertex, edge, vertex)
    [Figure: Doc.html --Author--> Ankur; Doc.html --Theme--> Research]
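The two views of the same data, a list of triples and a graph of labelled arcs, can be sketched as follows. Plain strings stand in for URIs here, which is a simplification: real RDF would identify doc.html and the two predicates by full URIs.

```python
from collections import defaultdict

# The two statements from the slide as (subject, predicate, object) triples.
triples = [
    ("doc.html", "author", "Ankur"),
    ("doc.html", "theme", "Research"),
]

# Graph view: each triple is an arc from the subject vertex to the object
# vertex, labelled with the predicate.
graph = defaultdict(list)
for subject, predicate, obj in triples:
    graph[subject].append((predicate, obj))

for vertex, arcs in graph.items():
    for label, target in arcs:
        print(f"{vertex} --{label}--> {target}")
```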
  • 58.
    Resource Description Framework (RDF)
    • Another Triple Model:
    Subject                         | Predicate                                     | Object
    Renee Miller                    | Teaches                                       | CSC433
    Renee Miller                    | Lives in                                      | Toronto
    <URI>                           | <URI>                                         | <URI> or "Literal"
    <http://cs.toronto.edu/~miller> | <http://xmlns.com/foaf/spec/#term_based_near> | <http://dbpedia.org/resource/Toronto>
    <http://cs.toronto.edu/~miller> | <http://xmlns.com/foaf/spec/#term_based_near> | "Toronto"
    foaf: Friend of a Friend
    [Figure: node bb:renee-j-miller with rdf:type foaf:Person, foaf:name "Renee J. Miller" and foaf:based_near dbpedia:Toronto]
  • 59.
    A Simple RDF Example (in RDF/XML)
    <?xml version="1.0"?>
    <rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
             xmlns:foaf="http://xmlns.com/foaf/spec/#"
             xmlns:bb="http://data.bibbase.org/ontology/">
      <rdf:Description rdf:about="http://.../author/renee-j-miller/">
        <rdf:type rdf:resource="http://xmlns.com/foaf/spec/#term_Person"/>
        <foaf:name xml:lang="en">Renée J. Miller</foaf:name>
        <foaf:based_near rdf:resource="http://dbpedia.org/resource/Toronto"/>
      </rdf:Description>
    </rdf:RDF>
    [Figure: node bb:renee-j-miller with rdf:type foaf:Person, foaf:name "Renee J. Miller" and foaf:based_near dbpedia:Toronto]
  • 60.
    A Simple RDF Example (in Turtle)
    @prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
    @prefix foaf: <http://xmlns.com/foaf/spec/#> .
    @prefix bb: <http://data.bibbase.org/ontology/> .
    <http://data.bibbase.org/author/renee-j-miller/>
        rdf:type foaf:Person ;
        foaf:name "Renée J. Miller"@en ;
        foaf:based_near <http://dbpedia.org/resource/Toronto> .
    [Figure: node bb:renee-j-miller with rdf:type foaf:Person, foaf:name "Renee J. Miller" and foaf:based_near dbpedia:Toronto]
  • 61.
    A Simple RDF Example (in RDFa)
    …
    <p about="http://.../author/renee-j-miller">
      The author “<span property="foaf:name" lang="en">Renée J. Miller</span>”
      lives in the city “<span rel="foaf:based_near" resource="http://…/Toronto">Toronto</span>”.
    </p>
    …
    [Figure: node bb:renee-j-miller with rdf:type foaf:Person, foaf:name "Renee J. Miller" and foaf:based_near dbpedia:Toronto]
  • 62.
    SPARQL
    • SPARQL stands for "SPARQL Protocol and RDF Query Language".
    • It is the standard query language for RDF data proposed by the W3C.
    • It is based on matching graph patterns against RDF graphs.
    • The simplest kind of graph pattern is a triple pattern.
      • A triple pattern is like an RDF triple, but with the option of a variable in the subject, predicate or object positions.
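To make the idea of a triple pattern concrete, here is a small sketch of matching one pattern against one triple. It is not how a SPARQL engine is implemented; terms are plain strings, anything starting with "?" is treated as a variable, and the subject label `bb:renee-j-miller` is used only as a convenient short name.

```python
def match(pattern, triple, binding=None):
    """Match one triple pattern against one triple.

    Returns the variable bindings (extending `binding`) on success,
    or None if the triple does not fit the pattern.
    """
    binding = dict(binding or {})
    for term, value in zip(pattern, triple):
        if term.startswith("?"):
            # A variable: bind it, or check it against an earlier binding.
            if binding.setdefault(term, value) != value:
                return None
        elif term != value:
            # A constant: must be identical to the triple's term.
            return None
    return binding


triple = ("bb:renee-j-miller", "foaf:name", "Renée J. Miller")

print(match(("?x", "foaf:name", "?name"), triple))   # variables get bound
print(match(("?x", "rdf:type", "?class"), triple))   # predicate differs
```

A variable that occurs twice in a pattern must take the same value both times, which is what the `setdefault` check enforces.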
  • 63.
    Example Dataset
    @prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
    @prefix foaf: <http://xmlns.com/foaf/spec/#> .
    @prefix bb: <http://data.bibbase.org/ontology/> .
    <http://data.bibbase.org/author/renee-j-miller/>
        rdf:type foaf:Person ;
        foaf:name "Renée J. Miller"@en ;
        foaf:based_near [ rdf:type foaf:Place ;
                          foaf:name "Toronto" ] .
  • 64.
    Example SPARQL Query
    PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
    PREFIX foaf: <http://xmlns.com/foaf/spec/#>
    SELECT ?name
    WHERE {
      ?x rdf:type foaf:Person .
      ?x foaf:name ?name .
      ?x foaf:based_near ?y .
      ?y foaf:name "Toronto" .
    }
    • Result:
      ?name
      "Renée J. Miller"
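How the four triple patterns combine into one answer can be sketched without a SPARQL engine. The code below is a naive evaluator written for this example only: the dataset is copied in as prefixed-name strings, the blank node is given the made-up label `_:place`, language tags are ignored, and each pattern simply filters and extends the bindings produced by the previous one.

```python
AUTHOR = "http://data.bibbase.org/author/renee-j-miller/"

# The example dataset as (subject, predicate, object) tuples.
graph = [
    (AUTHOR, "rdf:type", "foaf:Person"),
    (AUTHOR, "foaf:name", "Renée J. Miller"),
    (AUTHOR, "foaf:based_near", "_:place"),
    ("_:place", "rdf:type", "foaf:Place"),
    ("_:place", "foaf:name", "Toronto"),
]

# The WHERE clause of the example query, one tuple per triple pattern.
where = [
    ("?x", "foaf:name", "?name"),
    ("?x", "rdf:type", "foaf:Person"),
    ("?x", "foaf:based_near", "?y"),
    ("?y", "foaf:name", "Toronto"),
]


def match(pattern, triple, binding):
    """Extend `binding` so that `pattern` equals `triple`, or return None."""
    binding = dict(binding)
    for term, value in zip(pattern, triple):
        if term.startswith("?"):
            if binding.setdefault(term, value) != value:
                return None
        elif term != value:
            return None
    return binding


def evaluate(patterns, graph):
    """Join the patterns: every solution must satisfy all of them at once."""
    solutions = [{}]
    for pattern in patterns:
        extended = []
        for binding in solutions:
            for triple in graph:
                new_binding = match(pattern, triple, binding)
                if new_binding is not None:
                    extended.append(new_binding)
        solutions = extended
    return solutions


names = [solution["?name"] for solution in evaluate(where, graph)]
print(names)
```

The first pattern alone also matches the place's name; that candidate is discarded as soon as the second pattern requires `?x` to be a `foaf:Person`.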
  • 65.
  • 66.
  • 67.
    SPARQL 1.0 allows
    • Extraction of data as
      • URIs, blank nodes, typed and untyped literals
      • RDF subgraphs
    • Exploration of data via queries for unknown relations
    • Execution of complex join operations across heterogeneous databases in a single query
    • Transformation of RDF data from one vocabulary to another
    • Construction of new RDF graphs based on RDF query subgraphs
  • 68.
    SPARQL 1.1 (a W3C Recommendation since March 2013) allows
    • Additional query features
      • Aggregate functions, subqueries, negation, project expressions, property paths
    • Logical entailment for
      • RDF, RDFS, OWL Direct and RDF-Based Semantics entailment, and RIF Core entailment
    • Update of RDF graphs as a full data manipulation language
    • Discovery of information about the SPARQL service
    • Federated queries distributed over different SPARQL endpoints
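Of the query features listed, property paths are perhaps the easiest to illustrate. A path such as `foaf:knows+` asks for everything reachable over one or more foaf:knows edges. The sketch below mimics that with a breadth-first search; the function name `one_or_more` and the `ex:` sample data are hypothetical, and real engines evaluate paths in their own ways.

```python
from collections import deque


def one_or_more(graph, start, predicate):
    """Return all nodes reachable from `start` by following `predicate`
    one or more times (the meaning of the path operator '+')."""
    reached = set()
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for s, p, o in graph:
            if s == node and p == predicate and o not in reached:
                reached.add(o)
                queue.append(o)
    return reached


# Hypothetical sample data.
SOCIAL_GRAPH = [
    ("ex:alice", "foaf:knows", "ex:bob"),
    ("ex:bob", "foaf:knows", "ex:carol"),
    ("ex:dave", "foaf:knows", "ex:alice"),
    ("ex:alice", "foaf:name", '"Alice"'),
]

print(one_or_more(SOCIAL_GRAPH, "ex:alice", "foaf:knows"))
```

In SPARQL 1.0 the same question would need one query per path length, which is why property paths were a notable addition.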
  • 69.
    SPARQL usage in practice
    • SPARQL is usually used over the network
    • Separate documents define the protocol and the result format
      • SPARQL Protocol for RDF with HTTP and SOAP bindings
      • SPARQL results in XML or JSON formats
    • Big datasets often offer “SPARQL endpoints” using this protocol
    • Typical example: the SPARQL endpoint of DBpedia
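A rough Python sketch of the protocol side follows: the query travels as the `query` parameter of an HTTP GET request, and the JSON result format lists one object per solution under `results.bindings`. The endpoint URL `https://dbpedia.org/sparql` is assumed here as DBpedia's public endpoint, the helper names are hypothetical, and `SAMPLE_RESPONSE` is a hand-written document in the JSON results format, not real endpoint output. The request is only constructed, never sent.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request

# Assumed address of DBpedia's public SPARQL endpoint.
ENDPOINT = "https://dbpedia.org/sparql"


def build_request(endpoint: str, query: str) -> Request:
    """Build (but do not send) a SPARQL Protocol GET request."""
    url = endpoint + "?" + urlencode({"query": query})
    # Ask for the SPARQL JSON results format via content negotiation.
    return Request(url, headers={"Accept": "application/sparql-results+json"})


def extract_values(results_json: str, variable: str) -> list:
    """Pull the values bound to one variable out of a JSON results document."""
    document = json.loads(results_json)
    return [
        row[variable]["value"]
        for row in document["results"]["bindings"]
        if variable in row
    ]


# Hand-written example of the JSON results format (not real endpoint output).
SAMPLE_RESPONSE = json.dumps({
    "head": {"vars": ["name"]},
    "results": {"bindings": [
        {"name": {"type": "literal", "xml:lang": "en", "value": "Renée J. Miller"}}
    ]},
})

request = build_request(ENDPOINT, "SELECT ?s WHERE { ?s ?p ?o } LIMIT 1")
print(request.full_url)
print(extract_values(SAMPLE_RESPONSE, "name"))
```

Passing the request to `urllib.request.urlopen` would actually send it; that step is left out so the sketch runs without network access.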
  • 70.
    SPARQL as a unifying point
    [Figure: architecture diagram with the labels Applications, SPARQL Processor, SPARQL Endpoint (twice), RDF Graph, Triple Store, Database, Relational Database with SQL⇔RDF, HTML, XML/XHTML, Unstructured Text, and NLP Technique]
    Based on a presentation by Ivan Herman, available at http://www.w3.org/2010/Talks/0622-SemTech-IH/
  • 71.
    Other Semantic Web Technologies
    • Web Ontology Language (OWL)
      • A family of knowledge representation languages for authoring ontologies for the Web
    • RDF Schema (RDFS)
      • RDF Vocabulary Description Language
      • http://www.w3.org/TR/rdf-schema/
      • How to use RDF to describe RDF vocabularies
    • Other RDF vocabularies
      • Simple Knowledge Organization System (SKOS)
        • Designed for representation of thesauri, classification schemes, taxonomies, subject-heading systems, or any other type of structured controlled vocabulary
      • FOAF (Friend of a Friend)
        • A machine-readable ontology describing persons, their activities and their relations to other people and objects
  • 72.
  • 73.
    Ontologies
    • An ontology is a formal, explicit, shared specification of a conceptualization of a domain (Gruber, 1993).
    • Conceptualization: the objects, concepts, and other entities that are assumed to exist in some area of interest and the relationships that hold among them. A conceptualization is an abstract, simplified view of the world that we wish to represent for some purpose.
    • The term ontology is borrowed from philosophy, where ontology is a systematic account of existence (what things exist, how they can be differentiated from each other, etc.).
    • Today the word ontology is a synonym for a shared knowledge base.
  • 74.
    Ontologies – Components & Models
    • Classes, Relations & Instances
    • Classes represent concepts
    • Classes are described by attributes
    • Attributes are name-value pairs
    Informal description: The address contains the name, title and place of address of a person.
    Semi-informal description:
      Address
        First name <string>
        Family name <string>
        Street <string>
        PIN Code <int>
        City <string>
        …
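The step from class and attributes to instance can be illustrated with a Python dataclass. This is only an analogy, since an ontology language such as OWL expresses far more than a record type. The field names follow the Address listing on the slide; the instance values are made up for illustration.

```python
from dataclasses import asdict, dataclass


@dataclass
class Address:
    """The class: a concept described by named, typed attributes."""
    first_name: str
    family_name: str
    street: str
    pin_code: int
    city: str


# An instance of the class; all values here are hypothetical.
example_address = Address(
    first_name="Ada",
    family_name="Lovelace",
    street="12 Example Street",
    pin_code=700001,
    city="Kolkata",
)

# The attributes as name-value pairs.
for name, value in asdict(example_address).items():
    print(name, "=", value)
```

The class plays the role of the concept, the annotated fields are its attributes, and `example_address` is one instance with concrete name-value pairs.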
  • 75.
  • 76.
    Very Large Ontologies
    • Recently there has been a lot of work on developing very large ontologies that capture various areas of human knowledge and deploying this knowledge in applications such as search engines or question answering.
    • Example: Watson, IBM’s question answering system that beat humans in the quiz show Jeopardy (http://www-03.ibm.com/innovation/us/watson/index.html).
  • 77.
    5 ★ Open Data – by Tim Berners-Lee
    • Tim Berners-Lee, the inventor of the Web and initiator of Linked Data, suggested a 5-star deployment scheme for Open Data. Here, we give examples for each star level and explain the costs and benefits that come with it.
  • 78.
    BY EXAMPLE …
    ★ make your stuff available on the Web (whatever format) under an open license
    ★★ make it available as structured data (e.g., Excel instead of image scan of a table)
    ★★★ make it available in a non-proprietary open format (e.g., CSV instead of Excel)
    ★★★★ use URIs to denote things, so that people can point at your stuff
    ★★★★★ link your data to other data to provide context
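The five steps are cumulative: each level presupposes the ones before it. That reading can be sketched as a small Python function. The `Dataset` record and its field names are hypothetical, introduced only to make the rule concrete.

```python
from dataclasses import dataclass


@dataclass
class Dataset:
    """Hypothetical checklist, one flag per step of the scheme."""
    on_web_with_open_license: bool
    structured: bool
    non_proprietary_format: bool
    uses_uris: bool
    linked_to_other_data: bool


def star_rating(dataset: Dataset) -> int:
    """Count stars cumulatively: stop at the first step that is not met."""
    steps = [
        dataset.on_web_with_open_license,
        dataset.structured,
        dataset.non_proprietary_format,
        dataset.uses_uris,
        dataset.linked_to_other_data,
    ]
    stars = 0
    for step_met in steps:
        if not step_met:
            break
        stars += 1
    return stars


# An Excel file published under an open license: structured but proprietary.
excel_file = Dataset(True, True, False, False, False)
# A CSV file published under an open license.
csv_file = Dataset(True, True, True, False, False)

print(star_rating(excel_file), star_rating(csv_file))
```

Stopping at the first unmet step means a dataset that uses URIs but lacks an open license still earns no stars, which matches the cumulative reading assumed here.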
  • 79.
    What are the costs & benefits of ★ Web data?
    • As a consumer …
      • You can look at it.
      • You can print it.
      • You can store it locally (on your hard drive or on a USB stick).
      • You can enter the data into any other system.
      • You can change the data as you wish.
      • You can share the data with anyone you like.
    • As a publisher …
      • It’s simple to publish.
      • You do not have to explain repeatedly to others that they can use your data.
  • 80.
    What are the costs & benefits of ★★ Web data?
    • As a consumer, you can do everything you can do with ★ Web data and additionally:
      • You can directly process it with proprietary software to aggregate it, perform calculations, visualize it, etc.
      • You can export it into another (structured) format.
    • As a publisher …
      • It’s still simple to publish.
  • 81.
    What are the costs & benefits of ★★★ Web data?
    • As a consumer, you can do everything you can do with ★★ Web data and additionally:
      • You can manipulate the data in any way you like, without the need to own any proprietary software package.
    • As a publisher …
      • You might need converters or plug-ins to export the data from the proprietary format.
      • It’s still rather simple to publish.
  • 82.
    What are the costs & benefits of ★★★★ Web data?
    • As a consumer, you can do everything you can do with ★★★ Web data and additionally:
      • You can link to it from any other place (on the Web or locally).
      • You can bookmark it.
      • You can reuse parts of the data.
      • You may be able to reuse existing tools and libraries, even if they only understand parts of the pattern the publisher used.
      • Understanding the structure of an RDF “Graph” of data can take more effort than tabular (Excel/CSV) or tree (XML/JSON) data.
      • You can combine the data safely with other data. URIs are a global scheme, so if two things have the same URI then it’s intentional, and if so that’s well on its way to being 5-star data!
    • As a publisher …
      • You have fine-granular control over the data items and can optimize their access (load balancing, caching, etc.).
      • Other data publishers can now link into your data, promoting it to 5 star!
      • You typically invest some time slicing and dicing your data.
      • You’ll need to assign URIs to data items and think about how to represent the data.
      • You need to either find existing patterns to reuse or create your own.
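The point about combining data safely can be made concrete with a small Python sketch: because both sources name Toronto with the same URI, merging is just a set union and the statements line up on the shared node. The two source graphs, the rdfs:label triple and the helper names are hypothetical.

```python
TORONTO = "http://dbpedia.org/resource/Toronto"
AUTHOR = "http://data.bibbase.org/author/renee-j-miller/"

# Two hypothetical sources that describe overlapping things.
BIBLIOGRAPHY_SOURCE = {
    (AUTHOR, "foaf:based_near", TORONTO),
}
GEOGRAPHY_SOURCE = {
    (TORONTO, "rdfs:label", '"Toronto"@en'),
}


def merge(*graphs):
    """Merge graphs by set union; identical URIs denote the same node."""
    merged = set()
    for graph in graphs:
        merged |= set(graph)
    return merged


def describe(graph, subject):
    """Return all (predicate, object) pairs stated about a subject."""
    return {(p, o) for (s, p, o) in graph if s == subject}


COMBINED = merge(BIBLIOGRAPHY_SOURCE, GEOGRAPHY_SOURCE)

# Follow the link from the author to the place, then read the place's label.
for _, place in describe(COMBINED, AUTHOR):
    print(describe(COMBINED, place))
```

No key matching or schema alignment is needed for this join; the shared URI does the work, which is the benefit the slide attributes to 4-star data.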
  • 83.
    What are the costs & benefits of ★★★★★ Web data?
    • As a consumer, you can do everything you can do with ★★★★ Web data and additionally:
      • You can discover more (related) data while consuming the data.
      • You can directly learn about the data schema.
      • You now have to deal with broken data links, just like 404 errors in web pages.
      • Presenting data from an arbitrary link as fact is as risky as letting people include content from any website in your pages. Caution, trust and common sense are all still necessary.
    • As a publisher …
      • You make your data discoverable.
      • You increase the value of your data.
      • Your own organization will gain the same benefits from the links as the consumers.
      • You’ll need to invest resources to link your data to other data on the Web.
      • You may need to repair broken or incorrect links.
  • 84.
    Applications
    • Data integration (e.g., see project Optique http://www.optique-project.eu/)
    • E-government (e.g., open data)
    • E-commerce
    • Tourism
    • Medicine
    • Biology
    • Earth Observation (see the work of my group in projects TELEIOS http://www.earthobservatory.eu/ and LEO http://www.linkedeodata.eu/).
    • …
  • 85.
    References:
    • Books:
      • Antoniou, Grigoris, and Frank van Harmelen. A Semantic Web Primer. Cambridge, MA: The MIT Press (2004).
      • Segaran, Toby, Colin Evans, and Jamie Taylor. Programming the Semantic Web. O'Reilly Media, Inc., 2009.
      • Davies, John, Dieter Fensel, and Frank van Harmelen. Towards the Semantic Web: Ontology-Driven Knowledge Management. Chichester (2003).
    • Scientific Papers:
      • Maedche, Alexander. Ontology Learning for the Semantic Web. Vol. 665. Springer Science & Business Media, 2012.
      • Schmachtenberg, Max, Christian Bizer, and Heiko Paulheim. "Adoption of the linked data best practices in different topical domains." The Semantic Web – ISWC 2014. Springer International Publishing, 2014. 245-260.
    • Video Lectures & Slides
      • Video lectures on Semantic Web by Dr. Harald Sack, Hasso Plattner Institute, University of Potsdam, Germany
      • www.cs.toronto.edu/~oktie/slides/web-of-data-intro.pdf
      • https://www.w3.org/2010/Talks/0622-SemTech-IH/
    • Websites
      • http://dbpedia.org/snorql/
      • http://5stardata.info/en/
  • 86.
