Movatterモバイル変換


[0]ホーム

URL:


Skip to content

Navigation Menu

Sign in
Appearance settings

Search code, repositories, users, issues, pull requests...

Provide feedback

We read every piece of feedback, and take your input very seriously.

Saved searches

Use saved searches to filter your results more quickly

Sign up
Appearance settings

Commitadca025

Browse files
committed
Thanks to the generous support of Torchbox (http://www.torchbox.com), I
have been able to significantly improve the contrib/xml XPathintegration code.New features:* XPath set-returning function allows multiple results from an severalXPath queries to be used as a virtual table.* Using libxslt, XSLT transformations (with and without parameters) aresupported. (Caution: This support allows generic URL fetching fromwithin the backend as well).I've removed the old code so that it is all libxml based. Rather thanattach as a patch, I've put the tar.gz (10k!) athttp://www.azuli.co.uk/pgxml-1.0.tar.gz(all files in archive are xml/....).I think this is worth replacing the contrib version with, even thoughthe function names have changed (though the same functionality isthere), because it includes a SRF and some SPI usage, in addition tolinking to an external library. And it isn't a big module! Obviously, Iunderstand that people might prefer to move it elsewhere, or might havereservations about replacing an existing contrib module with anincompatible one. I'm open to suggestions.John Gray
1 parent1973971 commitadca025

File tree

10 files changed

+1288
-845
lines changed

10 files changed

+1288
-845
lines changed

‎contrib/xml/Makefile

Lines changed: 13 additions & 8 deletions
Original file line numberDiff line numberDiff line change
@@ -1,13 +1,18 @@
1-
# $PostgreSQL: pgsql/contrib/xml/Makefile,v 1.4 2003/11/29 19:51:36 pgsql Exp $
2-
1+
# This makefile will build the new XML and XSLT routines.
32
subdir = contrib/xml
4-
top_builddir = ../..
3+
top_builddir = ../../
54
include$(top_builddir)/src/Makefile.global
65

7-
MODULE_big = pgxml_dom
8-
OBJS = pgxml_dom.o
9-
SHLIB_LINK = -lxml2
10-
DATA_built = pgxml_dom.sql
6+
MODULE_big = pgxml
7+
8+
# Remove xslt_proc.o from the following line if you don't have libxslt
9+
OBJS = xpath.o xslt_proc.o
10+
11+
# Remove -lxslt from the following line if you don't have libxslt.
12+
SHLIB_LINK = -lxml2 -lxslt
13+
14+
DATA_built = pgxml.sql
1115
DOCS = README.pgxml
1216

13-
include$(top_srcdir)/contrib/contrib-global.mk
17+
include$(top_builddir)contrib/contrib-global.mk
18+

‎contrib/xml/README.pgxml

Lines changed: 144 additions & 83 deletions
Original file line numberDiff line numberDiff line change
@@ -1,118 +1,179 @@
1-
This package contains some simple routines for manipulating XML
2-
documents stored in PostgreSQL. This is a work-in-progress and
3-
somewhat basic at the moment (see the file TODO for some outline of
4-
what remains to be done).
1+
XML-handling functions for PostgreSQL
2+
=====================================
53

6-
At present, two modules (based on different XML handling libraries)
7-
are provided.
4+
Development of this module was sponsored by Torchbox Ltd. (www.torchbox.com)
85

9-
Prerequisite:
6+
This version of the XML functions provides both XPath querying and
7+
XSLT functionality. There is also a new table function which allows
8+
the straightforward return of multiple XML results. Note that the current code
9+
doesn't take any particular care over character sets - this is
10+
something that should be fixed at some point!
1011

11-
pgxml.c:
12-
expat parser 1.95.0 or newer (http://expat.sourceforge.net)
12+
Installation
13+
------------
1314

14-
or
15+
The current build process will only work if the files are in
16+
contrib/xml in a PostgreSQL 7.3 or 7.4 source tree which has been
17+
configured and built (If you alter the subdir value in the Makefile
18+
you can place it in a different directory in a PostgreSQL tree).
1519

16-
pgxml_dom.c:
17-
libxml2 (http://xmlsoft.org)
20+
Before you begin, just check the Makefile, and then just 'make' and
21+
'make install'.
1822

19-
The libxml2 version provides more complete XPath functionality, and
20-
seems like a good way to go. I've left the old versions in there for
21-
comparison.
23+
This code requires libxml to be previously installed.
2224

23-
Compiling and loading:
24-
----------------------
25+
Description of functions
26+
------------------------
2527

26-
TheMakefile only builds the libxml2 version.
28+
Thefirst set of functions are straightforward XML parsing and XPath queries:
2729

28-
To compile, just type make.
30+
pgxml_parse(document) RETURNS bool
2931

30-
Then you can use psql to load the two function definitions:
31-
\i pgxml_dom.sql
32+
This parses the document text in its parameter and returns true if the
33+
document is well-formed XML.
3234

35+
xpath_string(document,query) RETURNS text
36+
xpath_number(document,query) RETURNS float4
37+
xpath_bool(document,query) RETURNS bool
3338

34-
Function documentation and usage:
35-
---------------------------------
39+
These functions evaluate the XPath query on the supplied document, and
40+
cast the result to the specified type.
3641

37-
pgxml_parse(text) returns bool
38-
parses the provided text and returns true or false if it is
39-
well-formed or not. It returns NULL if the parser couldn't be
40-
created for any reason.
4142

42-
pgxml_xpath (XQuery functions) - differs between the versions:
43+
xpath_nodeset(document,query,toptag,itemtag) RETURNS text
4344

44-
pgxml.c (expat version) has:
45+
This evaluates query on document and wraps the result in XML tags. If
46+
the result is multivalued, the output will look like:
4547

46-
pgxml_xpath(text doc, text xpath, int n) returns text
47-
parses doc and returns the cdata of the nth occurence of
48-
the "simple path" entry.
48+
<toptag>
49+
<itemtag>Value 1 which could be an XML fragment</itemtag>
50+
<itemtag>Value 2....</itemtag>
51+
</toptag>
4952

50-
However, the remainder of this document will cover the pgxml_dom.c version.
53+
If either toptag or itemtag is an empty string, the relevant tag is omitted.
54+
There are also wrapper functions for this operation:
5155

52-
pgxml_xpath(text doc, text xpath, text toptag, text septag) returns text
53-
evaluates xpath on doc, and returns the result wrapped in
54-
<toptag>...</toptag> and each result node wrapped in
55-
<septag></septag>. toptag and septag may be empty strings, in which
56-
case the respective tag will be omitted.
56+
xpath_nodeset(document,query) RETURNS text omits both tags.
57+
xpath_nodeset(document,query,itemtag) RETURNS text omits toptag.
5758

58-
Example:
5959

60-
Given a table docstore:
60+
xpath_list(document,query,seperator) RETURNS text
6161

62-
Attribute | Type | Modifier
63-
-----------+---------+----------
64-
docid | integer |
65-
document | text |
62+
This function returns multiple values seperated by the specified
63+
seperator, e.g. Value 1,Value 2,Value 3 if seperator=','.
6664

67-
containing documents such as (these are archaeological site
68-
descriptions, in case anyone is wondering):
65+
xpath_list(document,query) RETURNS text
6966

70-
<?XML version="1.0"?>
71-
<site provider="Foundations" sitecode="ak97" version="1">
72-
<name>Church Farm, Ashton Keynes</name>
73-
<invtype>watching brief</invtype>
74-
<location scheme="osgb">SU04209424</location>
75-
</site>
67+
This is a wrapper for the above function that uses ',' as the seperator.
7668

77-
one can type:
7869

79-
select docid,
80-
pgxml_xpath(document,'//site/name/text()','','') as sitename,
81-
pgxml_xpath(document,'//site/location/text()','','') as location
82-
from docstore;
83-
84-
and get as output:
70+
xpath_table
71+
-----------
8572

86-
docid | sitename | location
87-
-------+--------------------------------------+------------
88-
1 | Church Farm, Ashton Keynes | SU04209424
89-
2 | Glebe Farm, Long Itchington | SP41506500
90-
3 | The Bungalow, Thames Lane, Cricklade | SU10229362
91-
(3 rows)
73+
This is a table function which evaluates a set of XPath queries on
74+
each of a set of documents and returns the results as a table. The
75+
primary key field from the original document table is returned as the
76+
first column of the result so that the resultset from xpath_table can
77+
be readily used in joins.
9278

93-
or, to illustrate the use of the extra tags:
79+
The function itself takes 5 arguments, all text.
9480

95-
select docid as id,
96-
pgxml_xpath(document,'//find/type/text()','set','findtype')
97-
from docstore;
81+
xpath_table(key,document,relation,xpaths,criteria)
9882

99-
id | pgxml_xpath
100-
----+-------------------------------------------------------------------------
101-
1 | <set></set>
102-
2 | <set><findtype>Urn</findtype></set>
103-
3 | <set><findtype>Pottery</findtype><findtype>Animal bone</findtype></set>
104-
(3 rows)
83+
key - the name of the "key" field - this is just a field to be used as
84+
the first column of the output table i.e. it identifies the record from
85+
which each output row came.
10586

106-
Which produces a new, well-formed document. Note that document 1 had
107-
no matching instances, so the set returned contains no
108-
elements. document 2 has 1 matching element and document 3 has 2.
87+
document - the name of the field containing the XML document
10988

110-
This is just scratching the surface because XPath allows all sorts of
111-
operations.
89+
relation - the name of the table or view containing the documents
11290

113-
Note: I've only implemented the return of nodeset and string values so
114-
far. This covers (I think) many types of queries, however.
91+
xpaths - multiple xpath expressions separated by |
11592

116-
John Gray <jgray@azuli.co.uk> 16 August 2001
93+
criteria - The contents of the where clause. This needs to be specified,
94+
so use "true" or "1=1" here if you want to process all the rows in the
95+
relation.
11796

97+
NB These parameters (except the XPath strings) are just substituted
98+
into a plain SQL SELECT statement, so you have some flexibility - the
99+
statement is
118100

101+
SELECT <key>,<document> FROM <relation> WHERE <criteria>
102+
103+
so those parameters can be *anything* valid in those particular
104+
locations. The result from this SELECT needs to return exactly two
105+
columns (which it will unless you try to list multiple fields for key
106+
or document). Beware that this simplistic approach requires that you
107+
validate any user-supplied values to avoid SQL injection attacks.
108+
109+
Using the function
110+
111+
The function has to be used in a FROM expression. This gives the following
112+
form:
113+
114+
SELECT * FROM
115+
xpath_table('article_id',
116+
'article_xml',
117+
'articles',
118+
'/article/author|/article/pages|/article/title',
119+
'date_entered > ''2003-01-01'' ')
120+
AS t(article_id integer, author text, page_count integer, title text);
121+
122+
The AS clause defines the names and types of the columns in the
123+
virtual table. If there are more XPath queries than result columns,
124+
the extra queries will be ignored. If there are more result columns
125+
than XPath queries, the extra columns will be NULL.
126+
127+
Note that I've said in this example that pages is an integer. The
128+
function deals internally with string representations, so when you say
129+
you want an integer in the output, it will take the string
130+
representation of the XPath result and use PostgreSQL input functions
131+
to transform it into an integer (or whatever type the AS clause
132+
requests). An error will result if it can't do this - for example if
133+
the result is empty - so you may wish to just stick to 'text' as the
134+
column type if you think your data has any problems.
135+
136+
The select statement doesn't need to use * alone - it can reference the
137+
columns by name or join them to other tables. The function produces a
138+
virtual table with which you can perform any operation you wish (e.g.
139+
aggregation, joining, sorting etc). So we could also have:
140+
141+
SELECT t.title, p.fullname, p.email
142+
FROM xpath_table('article_id','article_xml','articles',
143+
'/article/title|/article/author/@id',
144+
'xpath_string(article_xml,''/article/@date'') > ''2003-03-20'' ')
145+
AS t(article_id integer, title text, author_id integer),
146+
tblPeopleInfo AS p
147+
WHERE t.author_id = p.person_id;
148+
149+
as a more complicated example. Of course, you could wrap all
150+
of this in a view for convenience.
151+
152+
XSLT functions
153+
--------------
154+
155+
The following functions are available if libxslt is installed (this is
156+
not currently detected automatically, so you will have to amend the
157+
Makefile)
158+
159+
xslt_process(document,stylesheet,paramlist) RETURNS text
160+
161+
This function appplies the XSL stylesheet to the document and returns
162+
the transformed result. The paramlist is a list of parameter
163+
assignments to be used in the transformation, specified in the form
164+
'a=1,b=2'. Note that this is also proof-of-concept code and the
165+
parameter parsing is very simple-minded (e.g. parameter values cannot
166+
contain commas!)
167+
168+
Also note that if either the document or stylesheet values do not
169+
begin with a < then they will be treated as URLs and libxslt will
170+
fetch them. It thus follows that you can use xslt_process as a means
171+
to fetch the contents of URLs - you should be aware of the security
172+
implications of this.
173+
174+
There is also a two-parameter version of xslt_process which does not
175+
pass any parameters to the transformation.
176+
177+
If you have any comments or suggestions, please do contact me at
178+
jgray@azuli.co.uk. Unfortunately, this isn't my main job, so I can't
179+
guarantee a rapid response to your query!

‎contrib/xml/TODO

Lines changed: 0 additions & 78 deletions
This file was deleted.

0 commit comments

Comments
 (0)

[8]ページ先頭

©2009-2025 Movatter.jp