
SEMANTIC WEB: UNDERSTANDING IN BRIEF

INTRODUCTION: WEB OF DOCUMENTS VS. WEB OF DATA
Ankur Biswas, 4/4/2016

A Brief Walk Through the History of the World Wide Web
• 1969: ARPANET (Advanced Research Projects Agency Network) launched.
• In 1980, Tim Berners-Lee built ENQUIRE, a personal database of people and software models and a way to play with hypertext; each new page of information in ENQUIRE had to be linked to an existing page.
• In 1990, Berners-Lee built all the tools necessary for a working Web: HTTP 0.9, HTML, the first Web browser (which was also a Web editor), the first HTTP server software (CERN httpd), the first Web server (http://info.cern.ch), and the first Web pages, which described the project itself.
[Figure: the WWW's historical logo, designed by Robert Cailliau]
[Figure: the NeXTcube used by Tim Berners-Lee at CERN, which became the first Web server]

How Big Is the Web?
• According to http://www.worldwidewebsize.com/, the indexed Web contained at least 4.84 billion pages (Thursday, 25 February 2016).
• Early estimates suggested that the deep Web is 400 to 550 times larger than the surface Web.
• Because information and sites are constantly being added, the deep Web is assumed to keep growing, at a rate that is hard to quantify.

Understanding Information in the WWW
• What is important, and how do you know?
• What is information, and what is advertisement?
• What does the information mean?
• How credible or trustworthy is the information?
• What is redundant?

Understanding the Importance of Meaning
• SEMANTICS: the branch of linguistics concerned with the sense and meaning of language or of the symbols of a language.
• It is the study of how signs or symbols are interpreted by agents or communities within particular circumstances and contexts.
• Semantics asks how the sense and meaning of complex concepts can be derived from simple concepts based on the rules of syntax.
• The semantics of a message depends on its context and pragmatics†.
† Pragmatic: dealing with things sensibly and realistically, in a way that is based on practical rather than theoretical considerations.

Understanding the Importance of Meaning
• SYNTAX: in grammar, the study of the principles and processes by which sentences are constructed in a particular language.
• In formal languages, syntax is just a set of rules by which well-formed expressions can be created from a fundamental set of symbols (the alphabet).
• In computer science, syntax defines the normative structure of data.

Understanding the Importance of Meaning
• CONTEXT: the expressions (concepts) that surround an expression, together with its relationships to them and to further related elements.
• Context covers all elements of a communication that define how the communicated content is interpreted, e.g.:
  • General contexts: place, time, interrelation of actions in the message.
  • Personal or social contexts: the relation between sender and receiver of a message.
• PRAGMATICS: reflects the intention with which language is used to communicate a message.
• In linguistics, pragmatics is the study of how language is applied in different situations. It also covers the intended purpose of the speaker. Pragmatics studies the ways in which context contributes to meaning.

The Limits of the Web
• Traditional keyword-based search returns many irrelevant results.
• Example: for the simple term "Jaguar", it is not clear whether the user means the car, the animal, or the operating system (Mac OS X Jaguar).
• POLYSEMY: one word or name has several different meanings, so a search for it also returns results about meanings the user did not intend.

Problem 1: Information Retrieval
• Traditional keyword-based search does not find all results. Example: a search for "Jaguar" (the animal) misses pages that use only the scientific name, Panthera onca.
• Synonyms and metaphors are not always handled properly, which leads to missing or undesired results.
The Web of documents:
• Primary objects: documents
• Degree of structure in data: fairly low
• Implicit semantics of contents
• Designed for: human consumption
[Figure: sources A to D (HTML, HTML, HTML, API/XML) connected by untyped links]

Problem 2: Information Extraction
• Identifying content written in other languages, e.g. Japanese or Bengali.
• A picture by itself gives a search engine no information about what it shows.
• Example: Google relies on the caption or file name attached to the picture and uses it as a reference keyword.

Problem 2: Information Extraction (Cont.)
[Figure: sources A to D (HTML, HTML, HTML, API/XML) connected by untyped links, each referring to "things". Open question: are two documents talking about the same "thing"?]

Problem 2: Information Extraction (Cont.)
• Can only be solved correctly by a human agent.
• Information is distributed and ordered heterogeneously.
• A software agent does not have sufficient
  • knowledge of contexts,
  • world knowledge, and
  • experience
  to solve the problem.
• Hence it cannot solve the problem unless explicit semantics are available.
• Implicit knowledge: information that is not stated explicitly but must be derived by logical deduction from the available information.

Problem 3: Maintenance
The more complex and voluminous a website is, the more complicated it is to maintain its only weakly structured data.
Problems:
• Syntactic consistency error: you have linked your page to another page with related content, but that page has since moved and your link still points to the old address (HTTP 404 error: file/page not found).
• Semantic (link) consistency error: even more dangerous; the link still resolves, but the content at the hyperlinked destination keeps changing.
• Correctness: it is hard to maintain correctness over time in an automated manner.
• Timeliness: tracking changes over time is hard.

Problem 4: Personalization
• Adaptation of the presented information content to personal requirements: users normally password-protect their details, so this kind of information is hard to access.
• Problems:
  • Where do we get the required (personal) information from?
  • Personalization vs. data security

INTRODUCTION TO SEMANTIC WEB TECHNOLOGIES: THE VISION OF THE SEMANTIC WEB

The Vision of the Semantic Web
Precondition:
• Content can be read and interpreted correctly (understood) by machines.
Two approaches:
• Natural language processing: technologies of traditional information retrieval (search engines).
• Semantic Web:
  • Natural-language Web content is explicitly annotated with semantic metadata.
  • Semantic metadata encode the meaning (semantics) of the content and can be read and interpreted correctly by machines.
The Semantic Web concept was first introduced in the 1990s by Tim Berners-Lee, the inventor of the World Wide Web.

How Can We Achieve the Semantic Web? The Original Vision
• Instead of publishing information to be consumed by humans, publish machine-processable data and metadata using terms/languages that can be understood by machines.
• Build machines (agents) that will search for, query, integrate, etc. this data.
• Make sure all agents understand your terms/languages.

The Semantic Web and Linked Data Vision Today
• The Semantic Web is a web of data. There is lots of data we all use every day, and it is not part of the Web.
• The Semantic Web is about two things:
  • It is about common formats for integrating and combining data drawn from diverse sources, whereas the original Web mainly concentrated on the interchange of documents.
  • It is also about a language for recording how the data relates to real-world objects.
• That allows a person, or a machine, to start off in one database and then move through an unending set of databases which are connected not by wires but by being about the same thing.

Semantic Web Technology Stack
• Most apps use only a subset of the stack
• Querying allows fine-grained data access
• Standardized information exchange is key
• Formats are necessary but not too important
• The Semantic Web is based on the Web

Basic Layer of the Semantic Web Technology Stack
• The foundation of the stack is the World Wide Web, so we rely on all the technologies of the World Wide Web.
• DBpedia is the semantic version of Wikipedia.
• Because Wikipedia uses templates, its data is somewhat structured.
• DBpedia extracts data from Wikipedia infoboxes.
• DBpedia uses a machine-readable language: RDF.
• DBpedia stores and publishes the result in RDF and a few other formats.
• It also hosts a community effort to define extractors for the data, which can be used well beyond Wikipedia.
• It provides a number of services around the extracted data, such as DBpedia Mobile, a SPARQL endpoint, a faceted browser, a number of mappings to external ontologies, an ontology itself, etc.

Semantic Web Technologies
• A set of technologies and frameworks that enable the Web of Data:
  • Resource Description Framework (RDF)
  • A variety of data interchange formats (e.g. RDF/XML, N3, Turtle, N-Triples)
  • Notations such as RDF Schema (RDFS) and the Web Ontology Language (OWL)
• All are intended to provide a formal description of concepts, terms, and relationships within a given knowledge domain.
• A specialized query language (SPARQL) plays a role similar to SQL, but its queries match patterns against a graph rather than rows in tables, and they can become more complicated.

Application in the Web of Data
• Linked Data
• Linked Open Data (LOD) denotes publicly available (RDF) data on the Web, identified via URIs and accessible via HTTP.
Web of Data (Oct 2011):
• >31 billion facts
• >500 million links

What is so special about the BBC Music website?
• Information is dynamically aggregated from external, publicly available data (Wikipedia, MusicBrainz, …)
• No screen scraping
• No specialized API
• Data is available as Linked Open Data.
• Data is accessed via simple HTTP requests.
• Data is always up to date without manual interaction.

How to build such a site 1.
• Site editors roam the Web for new facts
  • may discover further links while roaming
• They update the site manually
• And the site soon gets out of date

How to build such a site 2.
• Editors roam the Web for new data published on Web sites
• "Scrape" the sites with a program to extract the information
  • i.e., write some code to incorporate the new data
• Easily get out of date again…

How to build such a site 3.
• Editors roam the Web for new data via APIs
• Understand those…
  • input, output arguments, datatypes used, etc.
• Write some code to incorporate the new data
• Easily get out of date again…

The choice of the BBC
• Use external, public datasets
  • Wikipedia, MusicBrainz, …
• They are available as data
  • not APIs or hidden on a Web site
  • data can be extracted using, e.g., HTTP requests or standard queries

It's all documented

Search Engines: Document Retrieval
• General problems:
  • Correct interpretation of the query string
    • Somehow the context of the user has to be considered
    • e.g. what the user asked just before a specific query, or their usual preferences
  • Correct identification of entities
    • Automatic disambiguation
  • Usability
    • Personalization

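Intelligent Agents in the Semantic Web
[Figure: World Wide Web vs. Semantic Web. In the World Wide Web, the user reaches WWW documents through a presentation service (e.g. Firefox) and a retrieval service (e.g. Google). In the Semantic Web, the user works through a personal assistant that relies on intelligent infrastructure services to reach WWW documents.]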

3 Generations of Web Documents
[Figure: three generations of Web documents. 1st generation: static Web pages (HTML/CSS). Page types shown for the 2nd and 3rd generations: interactive Web pages (JavaScript/applets), dynamic Web pages (database access, template-based generation), virtual Web pages (netbots, information extraction, presentation planning), and adaptive Web pages (user model, machine learning, online layout).]

Toolbox for the Semantic Web
• Standardized languages to express the semantics of information content on the Web (XML/XSD, RDF(S), OWL, RIF)
• Tools for handling semantic information on the Web (RDFa, GRDDL, …)
• Contributions from various fields, mostly within computer science:
  • Artificial Intelligence
  • Linguistics
  • Cryptography
  • Databases
  • Theoretical Computer Science
  • Computer Architecture
  • Software Engineering
  • Systems Theory
  • Computer Networks

Basic Architecture of the Semantic Web - I
• Uniform: different types of resource identifiers are all constructed according to a uniform schema.
• Resource: whatever may be identified by a URI.
• Identifier: distinguishes one resource from another.

Uniform Resource Identifier (URI)
• A Uniform Resource Identifier (URI) defines a simple and extensible schema for the worldwide unique identification of abstract or physical resources.
• A resource can be any object with a clear identity (according to the context of the application),
  • e.g. webpages, books, locations, persons, relations among objects, abstract concepts, etc.
• The concept of a URI is already established in various domains, e.g.
  • The Web: URL (Uniform Resource Locator), URN (Uniform Resource Name), PURL (Persistent Uniform Resource Locator)
  • Books & publications (ISBN, ISSN)
  • Digital Object Identifier (DOI)

Uniform Resource Identifier (URI)
• A URI combines
  • Address (locator)
    • Uniform Resource Locator (URL, RFC 1738)
    • Denotes where a resource can be found on the Web by stating its primary access mechanism
    • Might change during its lifetime
  • Identity (name)
    • Uniform Resource Name (URN, RFC 2141)
    • Persistent identifier for a Web resource
    • Remains unchanged during its life cycle
• URI generic syntax
  • Scheme: e.g. http, ftp, mailto
  • Userinfo: e.g. username:password
  • Host: e.g. domain name, IPv4/IPv6 address
  • Port: e.g. :80 stands for the HTTP port
  • Path: e.g. path in the file system of the WWW server
  • Query: e.g. parameters to be passed over to applications
  • Fragment: e.g. determines a specific fragment of a document
URI = scheme "://" [userinfo "@"] host [":" port] [path] ["?" query] ["#" fragment]
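The generic syntax above can be tried out with Python's standard library. This is only a sketch: the URI below is a made-up example (host, credentials and path are assumptions chosen so that every component is present), and `urlsplit` is just one parser that follows this syntax.

```python
from urllib.parse import urlsplit

# Hypothetical URI, invented so that all seven components appear.
uri = "http://alice:secret@example.org:8080/books/index.html?isbn=000651409X#reviews"

parts = urlsplit(uri)
components = {
    "scheme": parts.scheme,
    "userinfo": f"{parts.username}:{parts.password}",
    "host": parts.hostname,
    "port": parts.port,
    "path": parts.path,
    "query": parts.query,
    "fragment": parts.fragment,
}

# Print one component per line, in the order of the generic syntax.
for name, value in components.items():
    print(f"{name:9} {value}")
```

Each printed line corresponds to one bracketed part of the syntax rule; optional parts that are absent from a URI come back as `None` or an empty string.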

Data on the Web is not enough…
• We need a proper infrastructure for a real Web of Data
  • data is available on the Web
    • accessible via standard Web technologies
  • data are interlinked over the Web
    • i.e., data can be integrated over the Web
• This is where Semantic Web technologies come in
• We will use a simplistic example to introduce the main Semantic Web concepts

The rough structure of data integration
• Map the various data onto an abstract data representation
  • make the data independent of its internal representation…
• Merge the resulting representations
• Start making queries on the whole!
  • queries not possible on the individual data sets

We start with a book...

A simplified bookstore data (dataset "A")

Books:
ID | Author | Title | Publisher | Year
ISBN 0-00-651409-X | id_xyz | The Glass Palace | id_qpr | 2000

Authors:
ID | Name | Homepage
id_xyz | Ghosh, Amitav | http://www.amitavghosh.com

Publishers:
ID | Publisher's name | City
id_qpr | Harper Collins | London

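1st: we export our data as a set of relations
[Figure: dataset "A" as a graph. The book node http://…isbn/000651409X is connected to the literals "The Glass Palace" and 2000, to a publisher node ("Harper Collins", "London"), and via a:author to an author node with a:name "Ghosh, Amitav" and a:homepage http://www.amitavghosh.com.]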

Some notes on exporting the data
• Relations form a graph
  • the nodes refer to the "real" data or contain some literal
  • how the graph is represented in the machine is immaterial for now
• Data export does not necessarily mean physical conversion of the data
  • relations can be generated on the fly at query time
    • via SQL "bridges"
    • scraping HTML pages
    • extracting data from Excel sheets
    • etc.
• One can export part of the data
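A minimal sketch of such an export, assuming the three tables of dataset "A" are held as Python dictionaries. The labels a:author, a:name and a:homepage come from the slides; a:title, a:year, a:publisher, a:p_name and a:city are assumed names for illustration, and the "_:" prefix for internal IDs is likewise a convention chosen here. The book URI is elided in the slides and is kept that way.

```python
BOOK = "http://…isbn/000651409X"  # elided in the slides; kept verbatim

# The three tables of dataset "A", keyed by row ID.
books = {BOOK: {"a:author": "id_xyz", "a:title": "The Glass Palace",
                "a:publisher": "id_qpr", "a:year": 2000}}
authors = {"id_xyz": {"a:name": "Ghosh, Amitav",
                      "a:homepage": "http://www.amitavghosh.com"}}
publishers = {"id_qpr": {"a:p_name": "Harper Collins", "a:city": "London"}}


def export_triples():
    """Turn every table cell into one (subject, predicate, object) relation."""
    triples = set()
    for book_uri, row in books.items():
        for predicate, value in row.items():
            # A cell holding a row ID of another table becomes a graph node.
            if value in authors or value in publishers:
                value = "_:" + value
            triples.add((book_uri, predicate, value))
    for table in (authors, publishers):
        for node_id, row in table.items():
            for predicate, value in row.items():
                triples.add(("_:" + node_id, predicate, value))
    return triples


graph_a = export_triples()
for triple in sorted(graph_a, key=str):
    print(triple)
```

The tables themselves are untouched: the relations are generated on the fly, which is the point the slide makes about export not requiring physical conversion.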

Same book in French…

Another bookstore data (dataset "F")
Spreadsheet with columns A to D (empty rows omitted):

Row | A | B | C | D
1 | ID | Titre | Traducteur | Original
2 | ISBN 2020386682 | Le Palais des Miroirs | $A12$ | ISBN 0-00-651409-X
6 | ID | Auteur | |
7 | ISBN 0-00-651409-X | $A11$ | |
10 | Nom | | |
11 | Ghosh, Amitav | | |
12 | Besse, Christianne | | |

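2nd: export your second set of data
[Figure: dataset "F" as a graph. The node http://…isbn/2020386682 carries the title "Le palais des miroirs" and an f:traducteur edge to a node with f:nom "Besse, Christianne"; the node http://…isbn/000651409X has an f:auteur edge to a node with f:nom "Ghosh, Amitav".]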

3rd: start merging your data
[Figure: the graphs of datasets "F" and "A" side by side. Both contain a node with the same URI, http://…isbn/000651409X.]

3rd: start merging your data
[Figure: the merged graph. The two nodes for http://…isbn/000651409X are fused into one, which is the target of f:original from http://…isbn/2020386682 and carries both the a:author and the f:auteur edges.]

Start making queries…
• A user of dataset "F" can now ask queries like:
  • "give me the title of the original"
    • well, … « donne-moi le titre de l'original »
• This information is not in dataset "F"…
• …but can be retrieved by merging with dataset "A"!
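The merge and the query can be sketched with plain Python sets of triples. The predicate labels f:original, f:traducteur, f:auteur, f:nom, a:author, a:name and a:homepage appear in the slides; a:title and f:titre are assumed labels, the blank-node names are arbitrary, and the helper `title_of_original` is hypothetical. Only part of dataset "A" is included to keep the sketch short.

```python
EN_BOOK = "http://…isbn/000651409X"  # URIs are elided in the slides
FR_BOOK = "http://…isbn/2020386682"

dataset_a = {
    (EN_BOOK, "a:title", "The Glass Palace"),      # a:title is an assumed label
    (EN_BOOK, "a:author", "_:a1"),
    ("_:a1", "a:name", "Ghosh, Amitav"),
    ("_:a1", "a:homepage", "http://www.amitavghosh.com"),
}
dataset_f = {
    (FR_BOOK, "f:titre", "Le palais des miroirs"),  # f:titre is an assumed label
    (FR_BOOK, "f:original", EN_BOOK),
    (FR_BOOK, "f:traducteur", "_:f1"),
    ("_:f1", "f:nom", "Besse, Christianne"),
    (EN_BOOK, "f:auteur", "_:f2"),
    ("_:f2", "f:nom", "Ghosh, Amitav"),
}


def title_of_original(graph, book):
    """Follow f:original from `book`, then read a:title of the target node."""
    for s, p, original in graph:
        if s == book and p == "f:original":
            for s2, p2, title in graph:
                if s2 == original and p2 == "a:title":
                    return title
    return None


# Merging is a set union: nodes with the same URI fuse automatically.
merged = dataset_a | dataset_f

print(title_of_original(dataset_f, FR_BOOK))  # dataset "F" alone cannot answer
print(title_of_original(merged, FR_BOOK))     # the merged graph can
```

The union works only because both datasets use the same URI for the English edition; that shared identifier is what joins the two graphs.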

However, more can be achieved…
• We "feel" that a:author and f:auteur should be the same
• But an automatic merge does not know that!
• Let us add some extra information to the merged data:
  • a:author same as f:auteur
  • both identify a "Person"
  • a term that a community may have already defined:
    • a "Person" is uniquely identified by his/her name and, say, homepage
    • it can be used as a "category" for certain types of resources

3rd revisited: use the extra knowledge
[Figure: the merged graph with the extra knowledge applied. Person nodes are typed (r:type) as http://…foaf/Person, and the author node of http://…isbn/000651409X is now reached by both a:author and f:auteur and carries a:name and a:homepage.]

Start making richer queries!
• A user of dataset "F" can now query:
  • "donne-moi la page d'accueil de l'auteur de l'original"
    • well… "give me the home page of the original's 'auteur'"
• The information is not in dataset "F" or "A"…
• …but was made available by:
  • merging dataset "A" and dataset "F"
  • adding three simple extra statements as extra "glue"
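A rough sketch of how one piece of "glue" changes the answer. It implements only the statement "a:author same as f:auteur" as a symmetric predicate equivalence; the "Person" typing from the slides is not modelled. The helper names (`apply_glue`, `objects`, `homepage_of_original_author`) and the blank-node labels are hypothetical, and the URIs stay elided as in the slides.

```python
EN_BOOK = "http://…isbn/000651409X"
FR_BOOK = "http://…isbn/2020386682"

# Merged graph of datasets "A" and "F" (only the triples needed here).
merged = {
    (EN_BOOK, "a:author", "_:a1"),
    ("_:a1", "a:name", "Ghosh, Amitav"),
    ("_:a1", "a:homepage", "http://www.amitavghosh.com"),
    (FR_BOOK, "f:original", EN_BOOK),
    (EN_BOOK, "f:auteur", "_:f2"),
    ("_:f2", "f:nom", "Ghosh, Amitav"),
}

# The glue statement: these two predicates mean the same thing.
EQUIVALENT = {("a:author", "f:auteur")}


def apply_glue(graph):
    """Copy every edge of an equivalent predicate under its twin's name."""
    result = set(graph)
    for s, p, o in graph:
        for left, right in EQUIVALENT:
            if p == left:
                result.add((s, right, o))
            elif p == right:
                result.add((s, left, o))
    return result


def objects(graph, subject, predicate):
    return {o for s, p, o in graph if s == subject and p == predicate}


def homepage_of_original_author(graph, book):
    """f:original, then f:auteur, then a:homepage."""
    pages = set()
    for original in objects(graph, book, "f:original"):
        for person in objects(graph, original, "f:auteur"):
            pages |= objects(graph, person, "a:homepage")
    return pages


print(homepage_of_original_author(merged, FR_BOOK))              # empty set
print(homepage_of_original_author(apply_glue(merged), FR_BOOK))  # homepage found
```

Without the glue, f:auteur leads to a node that has only an f:nom; with it, the same query also reaches the node from dataset "A" that carries the homepage.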

Combine with different datasets
• Using, e.g., the "Person", the dataset can be combined with other sources
• For example, data in Wikipedia can be extracted using dedicated tools
  • e.g., the "dbpedia" project can extract the "infobox" information from Wikipedia already…

Merge with Wikipedia data
[Figure: the merged graph linked to DBpedia. The author is connected to http://dbpedia.org/../Amitav_Ghosh (foaf:name, w:reference), which has w:author_of edges to http://dbpedia.org/../The_Hungry_Tide, http://dbpedia.org/../The_Calcutta_Chromosome and http://dbpedia.org/../The_Glass_Palace (w:isbn) and a w:born_in edge to http://dbpedia.org/../Kolkata, which has w:long and w:lat.]

Search Engines: Fact Retrieval
Query string: International Space Station - 17th March 2016
• What is the International Space Station?
• Is it orbiting on 17th March 2016?
• How to compute the position of the satellite on the said date
• External data to be considered:
  • Constellation data
  • Planet data
  • Satellite data

RDF
RDF stands for
• Resource: pages, dogs, ideas... everything that can have a URI
• Description: attributes, features, and relations of the resources
• Framework: model, languages and syntaxes for these descriptions
• RDF is a triple model, i.e. every piece of knowledge is broken down into (subject, predicate, object)

RDF
• doc.html has for author Ankur and has for theme Research
• doc.html has for author Ankur
  doc.html has for theme Research
• (doc.html, author, Ankur)
  (doc.html, theme, Research)
  (subject, predicate, object)

RDF is also a graph model to link the descriptions of resources
• RDF triples can be seen as arcs of a graph (vertex, edge, vertex)
[Figure: node doc.html with an arc "author" to node Ankur and an arc "theme" to node Research]
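The two views of the same data, triples and graph, can be sketched in a few lines. The adjacency-list layout below is just one possible in-memory representation, chosen here for illustration; RDF itself does not prescribe it.

```python
from collections import defaultdict

# The two triples from the slides: (subject, predicate, object).
triples = [
    ("doc.html", "author", "Ankur"),
    ("doc.html", "theme", "Research"),
]

# Graph view: each subject vertex maps to its outgoing (edge, vertex) arcs.
graph = defaultdict(list)
for subject, predicate, obj in triples:
    graph[subject].append((predicate, obj))

# Every subject and object is a vertex of the graph.
vertices = {s for s, _, _ in triples} | {o for _, _, o in triples}

for subject, arcs in graph.items():
    for predicate, obj in arcs:
        print(f"{subject} --{predicate}--> {obj}")
```

Nothing is lost or added in the conversion: each triple becomes exactly one labelled arc.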

Resource Description Framework (RDF)
• Another triple model:

Subject | Predicate | Object
Renee Miller | Teaches | CSC433
Renee Miller | Lives in | Toronto
<URI> | <URI> | <URI> or "Literal"
<http://cs.toronto.edu/~miller> | <http://xmlns.com/foaf/0.1/based_near> | <http://dbpedia.org/resource/Toronto>
<http://cs.toronto.edu/~miller> | <http://xmlns.com/foaf/0.1/based_near> | "Toronto"

[Figure: node bb:renee-j-miller with arcs rdf:type to foaf:Person, foaf:name to "Renee J. Miller", and foaf:based_near to dbpedia:Toronto]
foaf: Friend of a Friend

A Simple RDF Example (in RDF/XML)

    <?xml version="1.0"?>
    <rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
             xmlns:foaf="http://xmlns.com/foaf/0.1/"
             xmlns:bb="http://data.bibbase.org/ontology/">
      <rdf:Description rdf:about="http://.../author/renee-j-miller/">
        <rdf:type rdf:resource="http://xmlns.com/foaf/0.1/Person"/>
        <foaf:name xml:lang="en">Renée J. Miller</foaf:name>
        <foaf:based_near rdf:resource="http://dbpedia.org/resource/Toronto"/>
      </rdf:Description>
    </rdf:RDF>

[Figure: node bb:renee-j-miller with arcs rdf:type to foaf:Person, foaf:name to "Renee J. Miller", and foaf:based_near to dbpedia:Toronto]

A Simple RDF Example (in Turtle)

    @prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
    @prefix foaf: <http://xmlns.com/foaf/0.1/> .
    @prefix bb: <http://data.bibbase.org/ontology/> .

    <http://data.bibbase.org/author/renee-j-miller/>
        rdf:type foaf:Person ;
        foaf:name "Renée J. Miller"@en ;
        foaf:based_near <http://dbpedia.org/resource/Toronto> .

[Figure: node bb:renee-j-miller with arcs rdf:type to foaf:Person, foaf:name to "Renee J. Miller", and foaf:based_near to dbpedia:Toronto]

A Simple RDF Example (in RDFa)

    … The author "Renée J. Miller" lives in the city "Toronto". …

[Figure: node bb:renee-j-miller with arcs rdf:type to foaf:Person, foaf:name to "Renee J. Miller", and foaf:based_near to dbpedia:Toronto]

• SPARQL stands for "SPARQL Protocol and RDF Query Language".
• It is the standard query language for RDF data, recommended by the W3C.
• It is based on matching graph patterns against RDF graphs.
• The simplest kind of graph pattern is a triple pattern.
  • A triple pattern is like an RDF triple, but with the option of a variable in the subject, predicate or object position.
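To make the idea of a triple pattern concrete, here is a small sketch of matching one pattern against a list of triples. It is not a SPARQL engine: variables are simply strings that start with "?", and the function names are hypothetical.

```python
def is_variable(term):
    return isinstance(term, str) and term.startswith("?")


def match(pattern, triple):
    """Return the variable bindings if `triple` fits `pattern`, else None."""
    bindings = {}
    for p_term, t_term in zip(pattern, triple):
        if is_variable(p_term):
            # A variable used twice must bind to the same value both times.
            if bindings.setdefault(p_term, t_term) != t_term:
                return None
        elif p_term != t_term:
            return None
    return bindings


data = [
    ("doc.html", "author", "Ankur"),
    ("doc.html", "theme", "Research"),
]

# Pattern with variables in the predicate and object positions.
pattern = ("doc.html", "?p", "?o")
results = [b for t in data if (b := match(pattern, t)) is not None]

for row in results:
    print(row)
```

Each result row is one way of replacing the variables so that the pattern becomes a triple that actually occurs in the data.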

Example Dataset

    @prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
    @prefix foaf: <http://xmlns.com/foaf/0.1/> .
    @prefix bb: <http://data.bibbase.org/ontology/> .

    <http://data.bibbase.org/author/renee-j-miller/>
        rdf:type foaf:Person ;
        foaf:name "Renée J. Miller"@en ;
        foaf:based_near [ rdf:type foaf:Place ;
                          foaf:name "Toronto" ] .

Example SPARQL Query
(the rdf: and foaf: prefixes are declared as in the example dataset)

    SELECT ?name
    WHERE {
      ?x foaf:name ?name .
      ?x rdf:type foaf:Person .
      ?x foaf:based_near ?y .
      ?y foaf:name "Toronto" .
    }

• Result

    ?name
    "Renée J. Miller"
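How the four triple patterns of this query combine can be sketched as a chain of joins over a set of triples. This is a simplification under stated assumptions: prefixed names are treated as plain strings, language tags are ignored, the blank node is given the arbitrary label "_:place", and `extend` and `evaluate` are hypothetical helper names, not part of any SPARQL library.

```python
# The example dataset as plain (subject, predicate, object) tuples.
data = {
    ("bb:renee-j-miller", "rdf:type", "foaf:Person"),
    ("bb:renee-j-miller", "foaf:name", "Renée J. Miller"),
    ("bb:renee-j-miller", "foaf:based_near", "_:place"),
    ("_:place", "rdf:type", "foaf:Place"),
    ("_:place", "foaf:name", "Toronto"),
}

# The WHERE clause: four triple patterns sharing the variables ?x and ?y.
query = [
    ("?x", "foaf:name", "?name"),
    ("?x", "rdf:type", "foaf:Person"),
    ("?x", "foaf:based_near", "?y"),
    ("?y", "foaf:name", "Toronto"),
]


def extend(pattern, triple, bindings):
    """Try to extend `bindings` so that `pattern` matches `triple`."""
    new = dict(bindings)
    for p_term, t_term in zip(pattern, triple):
        if p_term.startswith("?"):
            if new.setdefault(p_term, t_term) != t_term:
                return None  # clashes with an earlier binding
        elif p_term != t_term:
            return None
    return new


def evaluate(patterns, triples):
    """Keep only the bindings that satisfy every pattern in turn."""
    solutions = [{}]
    for pattern in patterns:
        solutions = [new for b in solutions for t in triples
                     if (new := extend(pattern, t, b)) is not None]
    return solutions


# SELECT ?name: project the solutions onto the ?name variable.
names = sorted({s["?name"] for s in evaluate(query, data)})
print(names)
```

The first pattern alone also binds ?name to "Toronto" (the place has a foaf:name too); the second pattern removes that candidate because the place is not a foaf:Person.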

Example SPARQL Query

SPARQL 1.0 allows
• Extraction of data as
  • URIs, blank nodes, typed and untyped literals
  • RDF subgraphs
• Exploration of data by querying for unknown relations
• Execution of complex join operations over heterogeneous databases in a single query
• Transformation of RDF data from one vocabulary to another
• Construction of new RDF graphs based on RDF query subgraphs

SPARQL 1.1 (W3C Recommendation since 2013) allows
• Additional query features
  • Aggregate functions, subqueries, negation, project expressions, property paths
• Logical entailment for
  • RDF, RDFS, OWL Direct and RDF-Based Semantics entailment, and RIF Core entailment
• Update of RDF graphs, as a full data manipulation language
• Discovery of information about the SPARQL service
• Federated queries distributed over different SPARQL endpoints

SPARQL usage in practice
• SPARQL is usually used over the network
  • Separate documents define the protocol and the result format
    • SPARQL Protocol for RDF with HTTP and SOAP bindings
    • SPARQL results in XML or JSON formats
• Big datasets often offer "SPARQL endpoints" using this protocol
• Typical example: SPARQL endpoint to DBpedia
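As a sketch of the HTTP binding, the snippet below builds a GET request for a SPARQL endpoint and reads a result document in the SPARQL JSON results format, using only Python's standard library. Assumptions: the endpoint URL is DBpedia's public one, the query is an arbitrary illustration, and `sample_response` is a hand-written document in the standard result layout, not real output from the endpoint. No network call is made; passing `request` to `urllib.request.urlopen` would send it.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request

ENDPOINT = "https://dbpedia.org/sparql"  # assumed: DBpedia's public endpoint

query = """
SELECT ?label WHERE {
  <http://dbpedia.org/resource/Toronto>
      <http://www.w3.org/2000/01/rdf-schema#label> ?label .
} LIMIT 5
"""


def build_request(endpoint, sparql):
    """SPARQL Protocol, query via GET: the query text goes in `query=`."""
    url = endpoint + "?" + urlencode({"query": sparql})
    return Request(url, headers={"Accept": "application/sparql-results+json"})


def extract_values(results_json, variable):
    """Collect the values bound to `variable` in a JSON result document."""
    doc = json.loads(results_json)
    return [row[variable]["value"]
            for row in doc["results"]["bindings"] if variable in row]


# Hand-written sample in the JSON results layout (not real endpoint output).
sample_response = """
{"head": {"vars": ["label"]},
 "results": {"bindings": [
   {"label": {"type": "literal", "xml:lang": "en", "value": "Toronto"}}
 ]}}
"""

request = build_request(ENDPOINT, query)
labels = extract_values(sample_response, "label")
print(request.get_method(), request.full_url[:40] + "...")
print(labels)
```

Keeping request construction and result parsing separate mirrors the split the slide mentions: one specification for the protocol, another for the result format.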

SPARQL as a unifying point
[Figure: applications query a SPARQL processor, either directly over an RDF graph or through SPARQL endpoints. The data behind them comes from a triple store, a relational database (via an SQL⇔RDF bridge), HTML and XML/XHTML documents, and unstructured text (via NLP techniques).]
Based on a presentation by Ivan Herman, available at http://www.w3.org/2010/Talks/0622-SemTech-IH/

Other Semantic Web Technologies
• Web Ontology Language (OWL)
  • A family of knowledge representation languages for authoring ontologies for the Web
• RDF Schema (RDFS)
  • RDF Vocabulary Description Language
  • http://www.w3.org/TR/rdf-schema/
  • How to use RDF to describe RDF vocabularies
• Other RDF vocabularies
  • Simple Knowledge Organization System (SKOS)
    • Designed for representation of thesauri, classification schemes, taxonomies, subject-heading systems, or any other type of structured controlled vocabulary
  • FOAF (Friend of a Friend)
    • A machine-readable ontology describing persons, their activities and their relations to other people and objects

ONTOLOGIES: EXISTENCE OF BEING

Ontologies
• An ontology is a formal, explicit, shared specification of a conceptualization of a domain (Gruber, 1993).
• Conceptualization: the objects, concepts, and other entities that are assumed to exist in some area of interest and the relationships that hold among them. A conceptualization is an abstract, simplified view of the world that we wish to represent for some purpose.
• The term ontology is borrowed from philosophy, where ontology is a systematic account of existence (what things exist, how they can be differentiated from each other, etc.).
• Today the word ontology is often used as a synonym for a shared knowledge base.

Ontologies: Components & Models
• Classes, relations and instances
• Classes represent concepts
• Classes are described by attributes
• Attributes are name-value pairs

Informal description:
  The address contains the name, title and place of residence of a person.

Semi-informal description:
  Address
    First name <string>
    Family name <string>
    Street <string>
    PIN Code <int>
    City <string>
    …
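The step from the semi-informal description to something a machine can check can be illustrated with a Python dataclass. This is an analogy, not an ontology language: the class plays the role of the concept, the typed fields are its attributes, and an object is an instance. The field names follow the slide; the instance values are invented.

```python
from dataclasses import dataclass, fields


@dataclass
class Address:
    """The class is the concept; each typed field is one attribute."""
    first_name: str
    family_name: str
    street: str
    pin_code: int
    city: str


# An instance: every attribute becomes a name-value pair (values are made up).
home = Address(first_name="Asha", family_name="Sen",
               street="12 Park Street", pin_code=700016, city="Kolkata")

# The schema itself can be inspected by a program.
schema = {f.name: f.type.__name__ for f in fields(Address)}
print(schema)
print(home)
```

An ontology language such as RDFS or OWL goes further than this sketch, for example by describing relations between classes and by being shared across applications.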

Learning Ontologies

Very Large Ontologies
• Recently there has been a lot of work on developing very large ontologies that capture various areas of human knowledge and on deploying this knowledge in applications such as search engines or question answering.
• Example: Watson, IBM's question answering system that beat humans in the quiz show Jeopardy! (http://www-03.ibm.com/innovation/us/watson/index.html).

5 ★ Open Data, by Tim Berners-Lee
• Tim Berners-Lee, the inventor of the Web and initiator of Linked Data, suggested a 5-star deployment scheme for Open Data. Here, we give examples for each step of the stars and explain the costs and benefits that come along with it.

BY EXAMPLE …
★ make your stuff available on the Web (whatever format) under an open license
★★ make it available as structured data (e.g., Excel instead of an image scan of a table)
★★★ make it available in a non-proprietary open format (e.g., CSV instead of Excel)
★★★★ use URIs to denote things, so that people can point at your stuff
★★★★★ link your data to other data to provide context

What are the costs & benefits of ★ Web data?
• As a consumer …
  • You can look at it.
  • You can print it.
  • You can store it locally (on your hard drive or on a USB stick).
  • You can enter the data into any other system.
  • You can change the data as you wish.
  • You can share the data with anyone you like.
• As a publisher …
  • It's simple to publish.
  • You do not have to explain repeatedly to others that they can use your data.

What are the costs & benefits of ★★ Web data?
• As a consumer, you can do everything you can do with ★ Web data and additionally:
  • You can directly process it with proprietary software to aggregate it, perform calculations, visualize it, etc.
  • You can export it into another (structured) format.
• As a publisher …
  • It's still simple to publish.

What are the costs & benefits of ★★★ Web data?
• As a consumer, you can do everything you can do with ★★ Web data and additionally:
  • You can manipulate the data in any way you like, without the need to own any proprietary software package.
• As a publisher …
  • You might need converters or plug-ins to export the data from the proprietary format.
  • It's still rather simple to publish.

What are the costs & benefits of ★★★★ Web data?
• As a consumer, you can do everything you can do with ★★★ Web data and additionally:
  • You can link to it from any other place (on the Web or locally).
  • You can bookmark it.
  • You can reuse parts of the data.
  • You may be able to reuse existing tools and libraries, even if they only understand parts of the pattern the publisher used.
  • Understanding the structure of an RDF "graph" of data can take more effort than tabular (Excel/CSV) or tree (XML/JSON) data.
  • You can combine the data safely with other data. URIs are a global scheme, so if two things have the same URI then it's intentional, and if so that's well on its way to being 5-star data!
• As a publisher …
  • You have fine-grained control over the data items and can optimize their access (load balancing, caching, etc.)
  • Other data publishers can now link into your data, promoting it to 5 star!
  • You typically invest some time slicing and dicing your data.
  • You'll need to assign URIs to data items and think about how to represent the data.
  • You need to either find existing patterns to reuse or create your own.

What are the costs & benefits of ★★★★★ Web data?
• As a consumer, you can do everything you can do with ★★★★ Web data and additionally:
  • You can discover more (related) data while consuming the data.
  • You can directly learn about the data schema.
  • You now have to deal with broken data links, just like 404 errors in web pages.
  • Presenting data from an arbitrary link as fact is as risky as letting people include content from any website in your pages. Caution, trust and common sense are all still necessary.
• As a publisher …
  • You make your data discoverable.
  • You increase the value of your data.
  • Your own organization will gain the same benefits from the links as the consumers.
  • You'll need to invest resources to link your data to other data on the Web.
  • You may need to repair broken or incorrect links.

Applications
• Data integration (e.g., see project Optique, http://www.optique-project.eu/)
• E-government (e.g., open data)
• E-commerce
• Tourism
• Medicine
• Biology
• Earth observation (e.g., the projects TELEIOS, http://www.earthobservatory.eu/, and LEO, http://www.linkedeodata.eu/)
• …

References:
• Books:
  • Antoniou, Grigoris, and Frank van Harmelen. A Semantic Web Primer. Cambridge, MA: The MIT Press, 2004.
  • Segaran, Toby, Colin Evans, and Jamie Taylor. Programming the Semantic Web. O'Reilly Media, Inc., 2009.
  • Davies, John, Dieter Fensel, and Frank van Harmelen. Towards the Semantic Web: Ontology-Driven Knowledge Management. Chichester, 2003.
• Scientific Papers:
  • Maedche, Alexander. Ontology Learning for the Semantic Web. Vol. 665. Springer Science & Business Media, 2012.
  • Schmachtenberg, Max, Christian Bizer, and Heiko Paulheim. "Adoption of the Linked Data Best Practices in Different Topical Domains." The Semantic Web: ISWC 2014. Springer International Publishing, 2014. 245-260.
• Video Lectures & Slides:
  • Video lectures on the Semantic Web by Dr. Harald Sack, Hasso Plattner Institute, University of Potsdam, Germany
  • www.cs.toronto.edu/~oktie/slides/web-of-data-intro.pdf
  • https://www.w3.org/2010/Talks/0622-SemTech-IH/
• Websites:
  • http://dbpedia.org/snorql/
  • http://5stardata.info/en/

Thank You

An Introduction to Semantic Web Technology