PGXML TODO List
===============

- Some of these items still require much more thought! The data model
- for XML documents and the parsing model of expat don't really fit so
- well with a standard SQL model.
+ Some of these items still require much more thought! Since the first
+ release, the XPath support has improved (because I'm no longer using a
+ homemade algorithm!).

- 1. Generalised XML parsing support
+ 1. Performance considerations

- Allow a user to specify handlers (in any PL) to be used by the parser.
- This must permit distinct sets of parser settings - user may want some
- documents in a database to be parsed with one set of handlers, others
- with a different set.
+ At present each document is parsed to produce the DOM tree on every query.

- i.e. the pgxml_parse function would take as parameters (document,
- parsername) where parsername was the identifier for a collection of
- handler etc. settings.
+ Pros:
+     Easy
+     No persistent memory or storage allocation for parsed trees
+     (libxml docs suggest representation of a document might
+     be 4 times the size of the text)

- "Stub" handlers in the pgxml code would invoke the functions through
- the standard fmgr interface. The parser interface would define the
- prototype for these functions. How does the handler function know
- which document/context has resulted in it being called?
+ Cons:
+     Slow/CPU intensive to parse.
+     Makes it difficult for PLs to apply libxml manipulations to create
+     new documents or amend existing ones.

- Mechanism for defining collection of parser settings (in a table? - but
- maybe copied for efficiency into a structure when first required by a
- query?)

- 2. Support for other parsers
+ 2. XQuery

- Expat may not be the best choice as a parser because a new parser
- instance is needed for each document i.e. all the handlers must be set
- again for each document. Another parser may have a more efficient way
- of parsing a set of documents identically.
+ I'm not sure if the addition of XQuery would be best as a function or
+ as a new front-end parser. This is one to think about, but with a
+ decent implementation of XPath, one of the prerequisites is covered.

- 3. XPath support
+ 3. DOM Interfaces

- Proper XPath support. I really need to sit down and plough
- through the specification...
+ Expose more aspects of the DOM to user functions/PLs. This would
+ allow a procedure in a PL to run some queries and then use exposed
+ interfaces to libxml to create an XML document out of the query
+ results. I accept the argument that this might be more properly
+ performed on the client side.

- The very simple text comparison system currently used is too
- basic. Need to convert the path to an ordered list of nodes. Each node
- is an element qualifier, and may have a list of attribute
- qualifications attached. This probably requires a lex/yacc combination.
- (James Clark has written a yacc grammar for XPath). Not all the
- features of XPath are necessarily relevant.
+ 4. Returning sets of documents from XPath queries.

- An option to return subdocuments (i.e. subelements AND cdata, not just
- cdata). This should maybe be the default.
-
- 4. Multiple occurrences of elements.
-
- This section is all very sketchy, and has various weaknesses.
+ Although the current implementation allows you to amalgamate the
+ returned results into a single document, it's quite possible that
+ you'd like to use the returned set of nodes as a source for FROM.

Is there a good way to optimise/index the results of certain XPath
operations to make them faster?:

- select docid, pgxml_xpath(document,'/site/location',1) as location
- where pgxml_xpath(document,'/site/name',1) = 'Church Farm';
+ select docid, pgxml_xpath(document,'//site/location/text()','','') as location
+ where pgxml_xpath(document,'//site/name/text()','','') = 'Church Farm';

and with multiple element occurrences in a document?

- select d.docid, pgxml_xpath(d.document,'/site/location',1)
+ select d.docid, pgxml_xpath(d.document,'//site/location/text()','','')
from docstore d,
- pgxml_xpaths('docstore','document','feature/type','docid') ft
+ pgxml_xpaths('docstore','document','//feature/type/text()','docid') ft
where ft.key = d.docid and ft.value = 'Limekiln';

pgxml_xpaths params are relname, attrname, xpath, returnkey. It would
@@ -71,10 +61,15 @@ defined by relname and attrname.

The pgxml_xpaths function could be the basis of a functional index,
which could speed up the above query very substantially, working
- through the normal query planner mechanism. Syntax above is fragile
- through using names rather than OID.
+ through the normal query planner mechanism.
+
+ 5. Return type support.
+
+ Better support for returning e.g. numeric or boolean values. I need to
+ get to grips with the returned data from libxml first.
+

- John Gray <jgray@azuli.co.uk>
+ John Gray <jgray@azuli.co.uk> 16 August 2001



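Note on item 4: the (key, value) rows that the TODO expects from pgxml_xpaths, and the lookup a functional index built on it might provide, can be sketched outside the database. This is only an illustration in Python, not the pgxml implementation. The DOCSTORE rows, the xpaths and build_index helpers, and the sample documents are all hypothetical. The standard library's ElementTree supports only a subset of XPath, so the sketch uses './/feature/type' and reads the node text instead of the '//feature/type/text()' form used in the queries above.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

# Hypothetical stand-in for the docstore table: one dict per row.
DOCSTORE = [
    {"docid": 1, "document": "<site><name>Church Farm</name>"
                             "<feature><type>Limekiln</type></feature>"
                             "<feature><type>Barn</type></feature></site>"},
    {"docid": 2, "document": "<site><name>Glebe House</name>"
                             "<feature><type>Limekiln</type></feature></site>"},
]

def xpaths(rows, attrname, path, returnkey):
    """Yield one (key, value) pair per node matched in each row's document."""
    for row in rows:
        # The document is parsed afresh on every call, as in item 1.
        root = ET.fromstring(row[attrname])
        for node in root.findall(path):
            yield row[returnkey], node.text

def build_index(rows, attrname, path, returnkey):
    """Precompute value -> [keys], roughly what an index could store."""
    index = defaultdict(list)
    for key, value in xpaths(rows, attrname, path, returnkey):
        index[value].append(key)
    return index

pairs = list(xpaths(DOCSTORE, "document", ".//feature/type", "docid"))
index = build_index(DOCSTORE, "document", ".//feature/type", "docid")
```

With the index built once, a lookup such as index["Limekiln"] avoids re-parsing every document, which is the kind of saving the functional index idea is after. Whether the planner could use such a structure is left open here, as it is in the TODO.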