
Commit 5950a98


1. I've now produced an updated version (and called it 0.2) of my XML parser interface code. It now uses libxml2 instead of expat (though I've left the old code in the tarball). This means *proper* XPath support, and the provided function allows you to wrap your result set in XML tags to produce a new XML document.

John Gray

1 parent 44ae35c, commit 5950a98

4 files changed in contrib/xml: +114 -77 lines changed

`contrib/xml/Makefile`

Lines changed: 8 additions & 6 deletions

Original file line number	Diff line number	Diff line change
`@@ -8,24 +8,22 @@ subdir = contrib/xml`
`8`	`8`	`top_builddir = ../..`
`9`	`9`	`include $(top_builddir)/src/Makefile.global`
`10`	`10`
`11`		`-override CFLAGS += $(CFLAGS_SL)`
	`11`	`+override CFLAGS += $(CFLAGS_SL) -g`
`12`	`12`
`13`	`13`
`14`	`14`	`#`
`15`	`15`	`# DLOBJS is the dynamically-loaded object files. The "funcs" queries`
`16`	`16`	`# include CREATE FUNCTIONs that load routines from these files.`
`17`	`17`	`#`
`18`		`-DLOBJS=pgxml$(DLSUFFIX)`
	`18`	`+DLOBJS=pgxml_dom$(DLSUFFIX)`
`19`	`19`
`20`	`20`
`21`		`-QUERIES=pgxml.sql`
	`21`	`+QUERIES=pgxml_dom.sql`
`22`	`22`
`23`	`23`	`all: $(DLOBJS) $(QUERIES)`
`24`	`24`
`25`		`-# Requires the expat library`
`26`		`-`
`27`	`25`	`%.so: %.o`
`28`		`-$(CC) -shared -lexpat -o $@ $<`
	`26`	`+$(CC) -shared -lxml2 -o $@ $<`
`29`	`27`
`30`	`28`
`31`	`29`	`%.sql: %.source`
`@@ -41,3 +39,7 @@ all: $(DLOBJS) $(QUERIES)`
`41`	`39`
`42`	`40`	`clean:`
`43`	`41`	`rm -f $(DLOBJS) $(QUERIES)`
	`42`	`+`
	`43`	`+`
	`44`	`+`
	`45`	`+`

`contrib/xml/README`

Lines changed: 65 additions & 25 deletions

Original file line number	Diff line number	Diff line change
`@@ -1,18 +1,35 @@`
`1`		`-This package contains a couple of simple routines for hooking the`
`2`		`-expat XML parser up to PostgreSQL. This is a work-in-progress and all`
`3`		`-very basic at the moment (see the file TODO for some outline of what`
`4`		`-remains to be done).`
	`1`	`+This package contains some simple routines for manipulating XML`
	`2`	`+documents stored in PostgreSQL. This is a work-in-progress and`
	`3`	`+somewhat basic at the moment (see the file TODO for some outline of`
	`4`	`+what remains to be done).`
`5`	`5`
`6`		`-At present, two functions are defined, one which checks`
`7`		`-well-formedness, and the other which performs very simple XPath-type`
`8`		`-queries.`
	`6`	`+At present, two modules (based on different XML handling libraries)`
	`7`	`+are provided.`
`9`	`8`
`10`	`9`	`Prerequisite:`
`11`	`10`
	`11`	`+pgxml.c:`
`12`	`12`	`expat parser 1.95.0 or newer (http://expat.sourceforge.net)`
`13`	`13`
`14`		`-I used a shared library version - I'm sure you could use a static`
`15`		`-library if you wished though. I had no problems compiling from source.`
	`14`	`+or`
	`15`	`+`
	`16`	`+pgxml_dom.c:`
	`17`	`+libxml2 (http://xmlsoft.org)`
	`18`	`+`
	`19`	`+The libxml2 version provides more complete XPath functionality, and`
	`20`	`+seems like a good way to go. I've left the old versions in there for`
	`21`	`+comparison.`
	`22`	`+`
	`23`	`+Compiling and loading:`
	`24`	`+----------------------`
	`25`	`+`
	`26`	`+The Makefile only builds the libxml2 version.`
	`27`	`+`
	`28`	`+To compile, just type make.`
	`29`	`+`
	`30`	`+Then you can use psql to load the two function definitions:`
	`31`	`+\i pgxml_dom.sql`
	`32`	`+`
`16`	`33`
`17`	`34`	`Function documentation and usage:`
`18`	`35`	`---------------------------------`
`@@ -22,10 +39,21 @@ pgxml_parse(text) returns bool`
`22`	`39`	`well-formed or not. It returns NULL if the parser couldn't be`
`23`	`40`	`created for any reason.`
`24`	`41`
	`42`	`+pgxml_xpath (XQuery functions) - differs between the versions:`
	`43`	`+`
	`44`	`+pgxml.c (expat version) has:`
	`45`	`+`
`25`	`46`	`pgxml_xpath(text doc, text xpath, int n) returns text`
`26`	`47`	`parses doc and returns the cdata of the nth occurence of`
`27`		`-the "XPath" listed. See below for details on the syntax.`
	`48`	`+the "simple path" entry.`
`28`	`49`
	`50`	`+However, the remainder of this document will cover the pgxml_dom.c version.`
	`51`	`+`
	`52`	`+pgxml_xpath(text doc, text xpath, text toptag, text septag) returns text`
	`53`	`+ evaluates xpath on doc, and returns the result wrapped in`
	`54`	`+<toptag>...</toptag> and each result node wrapped in`
	`55`	`+<septag></septag>. toptag and septag may be empty strings, in which`
	`56`	`+case the respective tag will be omitted.`
`29`	`57`
`30`	`58`	`Example:`
`31`	`59`
`@@ -49,30 +77,42 @@ descriptions, in case anyone is wondering):`
`49`	`77`	`one can type:`
`50`	`78`
`51`	`79`	`select docid,`
`52`		`-pgxml_xpath(document,'/site/name',1) as sitename,`
`53`		`-pgxml_xpath(document,'/site/location',1) as location`
	`80`	`+pgxml_xpath(document,'//site/name/text()','','') as sitename,`
	`81`	`+pgxml_xpath(document,'//site/location/text()','','') as location`
`54`	`82`	`from docstore;`
`55`	`83`
`56`	`84`	`and get as output:`
`57`	`85`
`58`		`- docid |          sitename           |  location`
`59`		`--------+-----------------------------+------------`
`60`		`-     1 | Church Farm, Ashton Keynes  | SU04209424`
`61`		`-     2 | Glebe Farm, Long Itchington | SP41506500`
`62`		`-(2 rows)`
	`86`	`+ docid |               sitename               |  location`
	`87`	`+-------+--------------------------------------+------------`
	`88`	`+     1 | Church Farm, Ashton Keynes           | SU04209424`
	`89`	`+     2 | Glebe Farm, Long Itchington          | SP41506500`
	`90`	`+     3 | The Bungalow, Thames Lane, Cricklade | SU10229362`
	`91`	`+(3 rows)`
	`92`	`+`
	`93`	`+or, to illustrate the use of the extra tags:`
`63`	`94`
	`95`	`+select docid as id,`
	`96`	`+pgxml_xpath(document,'//find/type/text()','set','findtype')`
	`97`	`+from docstore;`
`64`	`98`
`65`		`-"XPath" syntax supported`
`66`		`-------------------------`
	`99`	`+ id |                               pgxml_xpath`
	`100`	`+----+-------------------------------------------------------------------------`
	`101`	`+  1 | <set></set>`
	`102`	`+  2 | <set><findtype>Urn</findtype></set>`
	`103`	`+  3 | <set><findtype>Pottery</findtype><findtype>Animal bone</findtype></set>`
	`104`	`+(3 rows)`
`67`	`105`
`68`		`-At present it only supports paths of the form:`
`69`		`-'tag1/tag2' or '/tag1/tag2'`
	`106`	`+Which produces a new, well-formed document. Note that document 1 had`
	`107`	`+no matching instances, so the set returned contains no`
	`108`	`+elements. document 2 has 1 matching element and document 3 has 2.`
`70`	`109`
`71`		`-The first case will find any <tag2> within a <tag1>, the second will`
`72`		`-find any <tag2> within a <tag1> at the top level of the document.`
	`110`	`+This is just scratching the surface because XPath allows all sorts of`
	`111`	`+operations.`
`73`	`112`
`74`		`-The real XPath is much more complex (see TODO file).`
	`113`	`+Note: I've only implemented the return of nodeset and string values so`
	`114`	`+far. This covers (I think) many types of queries, however.`
`75`	`115`
	`116`	`+John Gray <jgray@azuli.co.uk> 16 August 2001`
`76`	`117`
`77`		`-John Gray <jgray@azuli.co.uk> 26 July 2001`
`78`	`118`
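
The wrapping behaviour the new README describes for `pgxml_xpath(doc, xpath, toptag, septag)` can be sketched outside the database. This is only an illustration in Python, not the C code from this commit: `xpath_wrap` and the sample document are hypothetical, and because the standard library's ElementTree supports only a subset of XPath (no `text()` node test), the path selects elements and their text content is used instead. Escaping the text is also my own assumption.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape


def xpath_wrap(doc: str, path: str, toptag: str = "", septag: str = "") -> str:
    """Emulate the toptag/septag wrapping described in the README.

    Hypothetical helper: `path` uses ElementTree's limited XPath subset,
    so it selects elements rather than text() nodes.
    """
    root = ET.fromstring(doc)
    parts = []
    for node in root.findall(path):
        text = escape(node.text or "")
        # An empty septag means the per-result tag is omitted.
        parts.append(f"<{septag}>{text}</{septag}>" if septag else text)
    body = "".join(parts)
    # An empty toptag means the enclosing tag is omitted.
    return f"<{toptag}>{body}</{toptag}>" if toptag else body


# Hypothetical document; the README's real sample data is not shown in this diff.
SAMPLE = (
    "<site><name>Glebe Farm, Long Itchington</name>"
    "<location>SP41506500</location>"
    "<find><type>Urn</type></find></site>"
)

print(xpath_wrap(SAMPLE, "./name"))
print(xpath_wrap(SAMPLE, ".//find/type", "set", "findtype"))
```

With both tags empty the function returns the bare text, which mirrors the first README query; with `'set'` and `'findtype'` it returns a small well-formed document, as in the second.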

`contrib/xml/TODO`

Lines changed: 40 additions & 45 deletions

Original file line number	Diff line number	Diff line change
`@@ -1,67 +1,57 @@`
`1`	`1`	`PGXML TODO List`
`2`	`2`	`===============`
`3`	`3`
`4`		`-Some of these items still require much more thought! The data model`
`5`		`-for XML documents and the parsing model of expat don't really fit so`
`6`		`-well with a standard SQL model.`
	`4`	`+Some of these items still require much more thought! Since the first`
	`5`	`+release, the XPath support has improved (because I'm no longer using a`
	`6`	`+homemade algorithm!).`
`7`	`7`
`8`		`-1. Generalised XML parsing support`
	`8`	`+1. Performance considerations`
`9`	`9`
`10`		`-Allow a user to specify handlers (in any PL) to be used by the parser.`
`11`		`-This must permit distinct sets of parser settings - user may want some`
`12`		`-documents in a database to parsed with one set of handlers, others`
`13`		`-with a different set.`
	`10`	`+At present each document is parsed to produce the DOM tree on every query.`
`14`	`11`
`15`		`-i.e. the pgxml_parse function would take as parameters (document,`
`16`		`-parsername) where parsername was the identifier for a collection of`
`17`		`-handler etc. settings.`
	`12`	`+Pros:`
	`13`	`+Easy`
	`14`	`+No persistent memory or storage allocation for parsed trees`
	`15`	`+(libxml docs suggest representation of a document might`
	`16`	`+ be 4 times the size of the text)`
`18`	`17`
`19`		`-"Stub" handlers in the pgxml code would invoke the functions through`
`20`		`-the standard fmgr interface. The parser interface would define the`
`21`		`-prototype for these functions. How does the handler function know`
`22`		`-which document/context has resulted it in being called?`
	`18`	`+Cons:`
	`19`	`+Slow/ CPU intensive to parse.`
	`20`	`+Makes it difficult for PLs to apply libxml manipulations to create`
	`21`	`+new documents or amend existing ones.`
`23`	`22`
`24`		`-Mechanism for defining collection of parser settings (in a table? - but`
`25`		`-maybe copied for efficiency into a structure when first required by a`
`26`		`-query?)`
`27`	`23`
`28`		`-2. Support for other parsers`
	`24`	`+2. XQuery`
`29`	`25`
`30`		`-Expat may not be the best choice as a parser because a new parser`
`31`		`-instance is needed for each document i.e. all the handlers must be set`
`32`		`-again for each document. Another parser may have a more efficient way`
`33`		`-of parsing a set of documents identically.`
	`26`	`+I'm not sure if the addition of XQuery would be best as a function or`
	`27`	`+as a new front-end parser. This is one to think about, but with a`
	`28`	`+decent implementation of XPath, one of the prerequisites is covered.`
`34`	`29`
`35`		`-3. XPath support`
	`30`	`+3. DOM Interfaces`
`36`	`31`
`37`		`-Proper XPath support. I really need to sit down and plough`
`38`		`-through the specification...`
	`32`	`+Expose more aspects of the DOM to user functions/ PLs. This would`
	`33`	`+allow a procedure in a PL to run some queries and then use exposed`
	`34`	`+interfaces to libxml to create an XML document out of the query`
	`35`	`+results. I accept the argument that this might be more properly`
	`36`	`+performed on the client side.`
`39`	`37`
`40`		`-The very simple text comparison system currently used is too`
`41`		`-basic. Need to convert the path to an ordered list of nodes. Each node`
`42`		`-is an element qualifier, and may have a list of attribute`
`43`		`-qualifications attached. This probably requires lexx/yacc combination.`
`44`		`-(James Clark has written a yacc grammar for XPath). Not all the`
`45`		`-features of XPath are necessarily relevant.`
	`38`	`+4. Returning sets of documents from XPath queries.`
`46`	`39`
`47`		`-An option to return subdocuments (i.e. subelements AND cdata, not just`
`48`		`-cdata). This should maybe be the default.`
`49`		`-`
`50`		`-4. Multiple occurences of elements.`
`51`		`-`
`52`		`-This section is all very sketchy, and has various weaknesses.`
	`40`	`+Although the current implementation allows you to amalgamate the`
	`41`	`+returned results into a single document, it's quite possible that`
	`42`	`+you'd like to use the returned set of nodes as a source for FROM.`
`53`	`43`
`54`	`44`	`Is there a good way to optimise/index the results of certain XPath`
`55`	`45`	`operations to make them faster?:`
`56`	`46`
`57`		`-select docid, pgxml_xpath(document,'/site/location',1) as location`
`58`		`-where pgxml_xpath(document,'/site/name',1) = 'Church Farm';`
	`47`	`+select docid, pgxml_xpath(document,'//site/location/text()','','') as location`
	`48`	`+where pgxml_xpath(document,'//site/name/text()','','') = 'Church Farm';`
`59`	`49`
`60`	`50`	`and with multiple element occurences in a document?`
`61`	`51`
`62`		`-select d.docid, pgxml_xpath(d.document,'/site/location',1)`
	`52`	`+select d.docid, pgxml_xpath(d.document,'//site/location/text()','','')`
`63`	`53`	`from docstore d,`
`64`		`-pgxml_xpaths('docstore','document','feature/type','docid') ft`
	`54`	`+pgxml_xpaths('docstore','document','//feature/type/text()','docid') ft`
`65`	`55`	`where ft.key = d.docid and ft.value ='Limekiln';`
`66`	`56`
`67`	`57`	`pgxml_xpaths params are relname, attrname, xpath, returnkey. It would`
`@@ -71,10 +61,15 @@ defined by relname and attrname.`
`71`	`61`
`72`	`62`	`The pgxml_xpaths function could be the basis of a functional index,`
`73`	`63`	`which could speed up the above query very substantially, working`
`74`		`-through the normal query planner mechanism. Syntax above is fragile`
`75`		`-through using names rather than OID.`
	`64`	`+through the normal query planner mechanism.`
	`65`	`+`
	`66`	`+5. Return type support.`
	`67`	`+`
	`68`	`+Better support for returning e.g. numeric or boolean values. I need to`
	`69`	`+get to grips with the returned data from libxml first.`
	`70`	`+`
`76`	`71`
`77`		`-John Gray <jgray@azuli.co.uk>`
	`72`	`+John Gray <jgray@azuli.co.uk> 16 August 2001`
`78`	`73`
`79`	`74`
`80`	`75`
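
The TODO's proposed `pgxml_xpaths`, which would return one (key, value) row per matching node so the result can be joined like a table, can be sketched as a generator. This is a rough Python illustration under stated assumptions, not an implementation of the proposal: `xpath_rows`, the in-memory `DOCSTORE`, and its documents are hypothetical stand-ins for the `docstore` table, and ElementTree's XPath subset is used in place of libxml2.

```python
import xml.etree.ElementTree as ET
from typing import Iterable, Iterator, Tuple


def xpath_rows(rows: Iterable[Tuple[int, str]], path: str) -> Iterator[Tuple[int, str]]:
    """Yield one (key, value) pair per matching node, across all documents."""
    for key, document in rows:
        # Each document is parsed on every call, as item 1 of the TODO notes.
        root = ET.fromstring(document)
        for node in root.findall(path):
            yield key, (node.text or "")


# Hypothetical stand-in for the docstore table: (docid, document) pairs.
DOCSTORE = [
    (1, "<site><name>Church Farm</name></site>"),
    (2, "<site><name>Glebe Farm</name>"
        "<feature><type>Limekiln</type></feature></site>"),
    (3, "<site><name>The Bungalow</name>"
        "<feature><type>Ditch</type></feature>"
        "<feature><type>Limekiln</type></feature></site>"),
]

# Rough equivalent of: where ft.key = d.docid and ft.value = 'Limekiln'
limekiln_ids = sorted(
    {key for key, value in xpath_rows(DOCSTORE, ".//feature/type") if value == "Limekiln"}
)
print(limekiln_ids)  # prints [2, 3]
```

A document with several matching nodes contributes several rows, which is the case the single-value `pgxml_xpath` call cannot express on its own.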

`contrib/xml/pgxml.source`

Lines changed: 1 addition & 1 deletion

Original file line number	Diff line number	Diff line change
`@@ -3,5 +3,5 @@`
`3`	`3`	`CREATE FUNCTION pgxml_parse(text) RETURNS bool`
`4`	`4`	`AS '_OBJWD_/pgxml_DLSUFFIX_' LANGUAGE 'c' WITH (isStrict);`
`5`	`5`
`6`		`-CREATE FUNCTION pgxml_xpath(text,text,int) RETURNS text`
	`6`	`+CREATE FUNCTION pgxml_xpath(text,text,text,text) RETURNS text`
`7`	`7`	`AS '_OBJWD_/pgxml_DLSUFFIX_' LANGUAGE 'c' WITH (isStrict);`
