This note discusses introducing URIs as names in a system in order to scale it to the web.
The web is extended in two ways: by adding new bits of technology to the existing stuff, and by "webizing" existing applications and systems. Webizing is really important, not only as a way of bootstrapping the web using a large amount of legacy information, but because the existing systems have been researched and designed over the years, and it is really important that we do not lose the knowledge accrued during that process.
The essential process in webizing is to take a system which is designed as a closed world, and then ask what happens when it is considered as part of an open world. In practice, the effect on a computer language is to replace the names/tokens/identifiers with URIs. Thus, where before reference could only be made to something in the same document/program/module, one can with equal ease make reference to something in a different one, somewhere in that abstract space which is the Web.
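As a rough illustration of replacing identifiers with URIs, here is a minimal sketch, assuming a hypothetical convention in which a bare local name becomes a fragment of the module's own URI and anything else is treated as a URI reference resolved the way a link would be. The function name `webize_name` and the example.org URIs are made up for illustration.

```python
from urllib.parse import urljoin


def webize_name(name: str, module_uri: str) -> str:
    """Map an identifier in a module to a URI (hypothetical convention)."""
    if name.isidentifier():
        # A plain local name like "area" names something in this module,
        # so it becomes a fragment of the module's own URI.
        return urljoin(module_uri, "#" + name)
    # Anything else is treated as a URI reference and resolved against the
    # module's URI, so another module is as easy to reach as this one.
    return urljoin(module_uri, name)


base = "http://example.org/lib/geometry"
print(webize_name("area", base))         # a name local to this module
print(webize_name("units#metre", base))  # a name in a sibling module
```

The point of the design is that the closed-world case (a local name) and the open-world case (a name elsewhere) go through the same resolution rule.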
In a clean case, this will be done so that the URI for an object is rather naturally related to its representation in the original language. For example, the element with ID "foo" in bar.xml is bar.xml#foo. However, doing the same for an attribute defined in a DTD or schema is more difficult, because of the complex nature of the spaces and subspaces for element and attribute names in XML. It is great when the webized language is very similar to the original language, and ideal when it actually compiles. Dan Connolly's webization of KIF (Connolly, 2000/8) uses URIs for identifiers, but, to be accurate, because URIs are case sensitive and KIF tokens are not, lower case letters had to be escaped with backslashes in the translation, which made the result less readable. Changing the underlying language in small ways can make the translation much less cumbersome!
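The two cases in the paragraph above can be sketched side by side. The first function follows the bar.xml#foo rule directly. The second shows the case-sensitivity mismatch with KIF, but handles it by folding case before minting the URI; that is an assumed alternative chosen for brevity, not the backslash-escaping scheme used in the KIF webization. The namespace URIs are hypothetical.

```python
from urllib.parse import urljoin


def element_uri(doc_uri: str, element_id: str) -> str:
    # The element with ID "foo" in bar.xml is bar.xml#foo.
    return urljoin(doc_uri, "#" + element_id)


def kif_token_uri(namespace: str, token: str) -> str:
    # KIF tokens are case-insensitive but URIs are case-sensitive, so
    # "Parent" and "PARENT" must not become two different URIs.
    # Assumption: fold to lower case (not the escaping scheme in the text).
    return namespace + token.lower()


print(element_uri("http://example.org/bar.xml", "foo"))
print(kif_token_uri("http://example.org/kif#", "PARENT"))
```

Case folding keeps URIs short, but it loses the author's preferred spelling; escaping preserves it at the cost of readability, which is the trade-off the paragraph describes.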
Here is a slightly flippant view of the webize() function, each row of which probably needs an essay of explanation, but which is provided here without any.
x | webize(x) |
Hypertext | WWW |
Data | Linked data |
Top-down structured design | Bottom-up ontology design |
Data Hiding | Data Re-use |
Goto Considered Harmful | Goto drives the economy |
unix file system | ACL'd r/w linked data |
Large-scale structure: Hierarchy | Large-scale structure: Scale-free |
"Tired" | "Wired" |
Imagine that a database is to be made available on the web in RDF. Suppose the database itself will have the URI http://weather.org/current . An SQL database is essentially a closed world, in that the various things in it were not designed to be linked to from outside. An SQL statement
SELECT temp, zip FROM weather WHERE temp > 30
makes reference to terms which have meaning within the database. There is no reference in that statement to the database: that is simply part of the context.
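To make the closed-world point concrete, the statement above can be run against a throwaway in-memory SQLite database. The table and column names follow the statement; the rows are invented sample data. Note that nothing in the query string identifies the database: the connection object supplies that context implicitly.

```python
import sqlite3

# An in-memory stand-in for the weather database (sample rows are made up).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE weather (zip TEXT, temp REAL)")
conn.executemany(
    "INSERT INTO weather VALUES (?, ?)",
    [("02139", 31.5), ("94103", 18.0), ("73301", 35.2)],
)

# "weather", "temp" and "zip" only mean something relative to `conn`.
rows = conn.execute("SELECT temp, zip FROM weather WHERE temp > 30").fetchall()
print(rows)
```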
Now suppose we determine what the URI will be for the pieces of the database, perhaps current/weather for a table, and current/weather.temp for a column in a table. We could then extend the syntax (excuse my SQL; I am making this up):
USING c FOR http://weather.org/current
USING u FOR http://places.org/usa
SELECT c:readings.temp, u:location.lat, u:location.long
FROM c:readings, u:location
WHERE c:readings.zip = u:location.zip AND c:readings.temp > 30;
This is an (incorrect, I expect) SQL statement which links out of the local database to combine it with information from a remote one. I am sure this syntax won't work in practice, but it should illustrate the principle. The namespace prefixes c and u are introduced for two reasons: for brevity, as repeating the full URIs in the code would have been too cumbersome; and for syntactic reasons, as URIs tend to contain characters which would be ambiguous with other SQL syntax if they were allowed in column names.
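The prefix mechanism amounts to a small expansion step. Here is a minimal sketch, assuming the local part is appended to the prefix URI after a "/", so that c:readings.temp expands to http://weather.org/current/readings.temp; the invented USING syntax does not actually pin down that joining rule, so treat it as an assumption. The prefix table mirrors the two USING clauses.

```python
# Prefix table mirroring the USING clauses (joining rule is assumed).
PREFIXES = {
    "c": "http://weather.org/current",
    "u": "http://places.org/usa",
}


def expand(qname: str, prefixes: dict = PREFIXES) -> str:
    """Expand a prefixed name such as "c:readings.temp" to a full URI."""
    prefix, sep, local = qname.partition(":")
    if not sep or prefix not in prefixes:
        raise ValueError(f"unknown prefix in {qname!r}")
    # Full URIs contain ":", "/" and "." which would clash with SQL syntax;
    # the short form keeps only one ":" separating prefix and local name.
    return prefixes[prefix].rstrip("/") + "/" + local


print(expand("c:readings.temp"))
print(expand("u:location.zip"))
```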
Of course, whether running SQL over a set of scattered databases is actually valuable may be questionable: it may not optimize as well as some other query languages. However, suddenly the things defined by the database are available to the outside world. For example, the concept of temperature reading as used by weather.org in its database of current conditions,
http://weather.org/current/readings.temp
is now a concept, an RDF property in fact, which is available for all the world to refer to. These references need not all be in SQL. Because the schema for the database will declare it to be an RDF property or something equivalent, many different systems can use the information and refer to the concept.
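Once a column has a URI, a row can be written out as RDF statements that any system can consume, not just SQL. A minimal sketch, assuming each column's property URI is the table URI followed by "." and the column name (matching http://weather.org/current/readings.temp), and assuming a hypothetical row URI. Values are emitted as plain string literals in N-Triples for simplicity; a real export would probably use typed literals.

```python
def row_to_ntriples(row_uri: str, row: dict, table_uri: str) -> list:
    """Serialize one database row as N-Triples lines (sketch)."""
    lines = []
    for column, value in row.items():
        prop = f"{table_uri}.{column}"  # e.g. .../current/readings.temp
        # Escape backslashes and quotes as N-Triples string literals require.
        text = str(value).replace("\\", "\\\\").replace('"', '\\"')
        lines.append(f'<{row_uri}> <{prop}> "{text}" .')
    return lines


for line in row_to_ntriples(
    "http://weather.org/current/readings/1",  # hypothetical row URI
    {"zip": "02139", "temp": 31.5},
    "http://weather.org/current/readings",
):
    print(line)
```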
I note, before we leave this example, that there are two concepts important to a table. One is the type of thing described by a row. A row in the readings table, for example, defines a weather reading, something which has a location and temperature and humidity and place. The other concept is the set of objects which are actually in the table. In the classic SQL example of the employees table, there is an RDF class employee, a subclass of person, and also the fact that someone works for the company iff they are in the table.
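The two concepts can be kept apart in code. In this sketch, assuming hypothetical URIs for the employees example, the first function produces the class statements (each row is an Employee, and Employee is a subclass of Person), while the second treats the table's extent as the complete list of people who work for the company. The RDF and RDFS namespace URIs are the standard ones.

```python
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
RDFS_SUBCLASS_OF = "http://www.w3.org/2000/01/rdf-schema#subClassOf"

# Hypothetical URIs for the classic employees example.
EMPLOYEE = "http://company.example/db/employees#Employee"
PERSON = "http://company.example/terms#Person"


def row_type_triples(row_uris):
    """Concept 1: the class of thing each row describes."""
    triples = {(EMPLOYEE, RDFS_SUBCLASS_OF, PERSON)}
    triples.update((r, RDF_TYPE, EMPLOYEE) for r in row_uris)
    return triples


def works_for_company(person_uri, row_uris):
    """Concept 2: the extent of the table itself.

    Someone works for the company iff they have a row; absence means
    "no" only because the table is taken to be the complete list.
    """
    return person_uri in set(row_uris)
```

The first concept exports naturally as open-world statements; the second depends on knowing the table is complete, which is exactly the closed-world assumption that webizing has to make explicit.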
A second note on exporting databases. When you really put something on the web, there is often, for flexibility and security, a layer between what you expose and the internal storage. Web pages are not files, though they are often closely related to files and have the same form: a string of bytes and a MIME type. Exposed remote operations are not local procedures, though they are closely related to them and have the same form: a service URI, a method name and parameters. Similarly, in many cases one would probably export a derived view of a database, one which would itself have the form of a database. This allows different engineering decisions to be made for the external manifestation (persistent, and what the customer wants) and the internal form (efficient and convenient for you).
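An SQL view is one ordinary way to build such a layer. In this sketch the internal table, its columns (including temperatures stored in tenths of a degree and an internal batch column) and its rows are all hypothetical; the view `readings` is the stable, database-shaped form that would be exposed, and the internal schema can change behind it.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Internal form: whatever is efficient and convenient (hypothetical schema).
conn.execute(
    "CREATE TABLE raw_obs (station_id INTEGER, zip TEXT,"
    " temp_decidegrees INTEGER, ingest_batch INTEGER)"
)
conn.executemany(
    "INSERT INTO raw_obs VALUES (?, ?, ?, ?)",
    [(7, "02139", 315, 1), (9, "94103", 180, 1)],
)

# External form: a derived view with the shape promised to the outside world.
# Internal-only columns (station_id, ingest_batch) are not exposed.
conn.execute(
    "CREATE VIEW readings AS"
    " SELECT zip, temp_decidegrees / 10.0 AS temp FROM raw_obs"
)

rows = conn.execute("SELECT zip, temp FROM readings WHERE temp > 30").fetchall()
print(rows)
```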
Sometimes this is easy and sometimes it is hard. It is hard, for example, when the language uses nested scoping to great effect. In this case there is a very large amount of context which is completely different between the beginning and end of such a link. The go to instruction is considered harmful [GTCH] by Dijkstra because it "as it stands is just too primitive; it is too much an invitation to make a mess of one's program." This of course is true of the hypertext link too, in a way. Both allow an open webbed world which typically, if used with no restraint, removes rules which give sanity and analysability to a language and allow optimization of the compiled code. So, just as some languages prevent one from jumping into or out of an inner loop of a program, so it may make no sense to allow a link to be made into something within a nested structure, because the referenced thing just does not have any meaning when taken out of context.
When dealing with languages which have nested context, it may be necessary either to define how something inside is represented independently of its context, or to make such references impossible.
Be careful, though, before jumping to this conclusion. In many cases, it is important to webize nested objects completely. For example, in a 3D scene language, an object may be within a scene within an object within a scene and still have an identity which it is important to be able to refer to. In a hypertext document, there is a nested context which, for example, affects the style, and the reference is made to the destination anchor not as an isolated piece of hypertext, but in the context of the whole document.
The principle that on the Web anything must be able to say anything about anything means that these innermost nested objects must have URIs.
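One simple way to give every nested object a URI is to build the fragment identifier from the path of names leading to it. This is a sketch under stated assumptions: the `Node` type, the scene document URI and the path-based fragment scheme are all hypothetical, and the scheme assumes names are unique among siblings.

```python
from dataclasses import dataclass, field
from urllib.parse import quote


@dataclass
class Node:
    """A node in a hypothetical nested 3D scene."""
    name: str
    children: list = field(default_factory=list)


def assign_uris(node, doc_uri, path=()):
    """Yield (uri, node) for every node, however deeply nested.

    Assumed scheme: the fragment is the "/"-joined path of names from the
    root, so an inner object can be referred to from outside its scene.
    """
    path = path + (node.name,)
    yield doc_uri + "#" + "/".join(quote(p, safe="") for p in path), node
    for child in node.children:
        yield from assign_uris(child, doc_uri, path)


scene = Node("room", [Node("table", [Node("tray", [Node("cup")])])])
for uri, _ in assign_uris(scene, "http://example.org/house.scene"):
    print(uri)
```

A path-based scheme is easy to compute but changes if an object is moved to a different parent; an identifier that stays fixed when the structure is rearranged would give more persistent URIs.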
It may also be the case that an attempt to webize a language reveals bad points in the design which really need to be ironed out anyway for the sake of good software engineering. If a name in some module in fact has quite different meanings when used in different contexts, then it isn't suitable for webizing as it is, and maybe two separate derived URIs should be made in the mapping. Maybe the language should actually be cleaned up so that the concepts are distinct.
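Making two separate derived URIs can be as simple as keying the mapping on the name together with the context in which it is used. In this sketch the name "report", its two senses, the sense table and the base URI are all hypothetical; an unrecognized use is rejected rather than silently collapsed onto one URI.

```python
# Hypothetical: "report" is used both for one particular document and for
# the whole series of its revisions, so the mapping mints two URIs.
SENSES = {
    ("report", "document"): "report/current",
    ("report", "series"): "report/series",
}


def webize_in_context(name, context, base_uri, senses=SENSES):
    """Map a (name, context) pair to a distinct URI per meaning."""
    try:
        return base_uri.rstrip("/") + "/" + senses[(name, context)]
    except KeyError:
        raise ValueError(f"no URI defined for {name!r} used as {context!r}") from None


print(webize_in_context("report", "document", "http://example.org/docs"))
print(webize_in_context("report", "series", "http://example.org/docs"))
```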
A very simple case is in a documentation control system, when humans use the same document name ("the pipe size draft") to refer to a particular document and also to the set of documents from
An exercise for the reader is to contemplate a language and determine whether it is webized, and if not, what it would take, and what would be the cleanest way of doing it. Try looking at XML schemas (what is the URI of an element type?).
When stuck, have recourse to common sense. Ask what the construct actually represents in a global context, if anything. This might mean clarifying the language itself.
Webizing a language involves turning a system which assumes a closed world into one which will operate as part of the open web. Some cases are easier than others. Webizing one application gives one a good idea of what sorts of design decisions force a closed-world assumption and make webizing difficult, and what, by contrast, makes a weblike application which immediately benefits from the rest of everything out there.
GTCH: Edsger W. Dijkstra, "Go To Statement Considered Harmful", Communications of the ACM, Vol. 11, No. 3, March 1968, pp. 147-148.
Connolly, Dan, "Knowledge Interchange Format (KIF) as an RDF Schema", 2000/8.
2000/8/31