XQuery Optimization
Document processing may require a lot of time and resources, but in many cases, it can be made faster and more efficient thanks to document projection and streaming mode. In this section, we will describe these two techniques.
Document Projection
Document projection is a special mechanism that analyzes the query, determines which parts of the document it may need, and then builds a tree to represent it using only these parts.
As a result, each child element is independently added, and there is no need to load the entire document in memory. And so the memory footprint is reduced.
XQuery Streaming
Even faster and more efficient document processing can be achieved with the so-called streaming mode in which the XQuery is executed directly on the parsing events - even relatively large XML documents can be processed in this way.
However, there are certain limitations: some operations (such as sorting) cannot be performed in the streaming mode, because the independent subtrees are processed without loading the entire document, and the data are processed in order of arrival. And generally, the more complex the query, the less likely the possibility of using the streaming mode on it.
Syntax
SELECT
XMLQUERY
(
'/*:root/*:child'
PASSING doc
)
Rules
The streaming mode has the following pre-requisites:
- Document projection can be applied to the given document;
XQuery path expressions meet certain conditions.
The essential principle (and the main limitation) of XQuery streaming is that for a given node, only its child elements can be read once and once by one. When working with XQuery streaming, keep in mind the following points:
- Path expressions must only use downward selection;
- Even one expression that could potentially use many nodes (e.g. //x rather than /a/b/x) will prevent XQuery streaming optimization;
- Predicates can reference attributes, but not child elements of the node being filtered;
- A streamed node may not be bound to a variable (this rules out many uses of FLWOR expressions);
- A streamed node must not be passed as an argument to a function call (except for built-in function calls);
- Global variables in the query must not reference the context item;
- XML processing in streaming mode may work differently from the non-streaming mode in dealing with whitespace-only text nodes.
Eligible and Ineligible Queries
Document projection can only be used on optimized XQuery with a path expression such as "[/][ns:]root/[ns1:]elem/...", where a namespace prefix or element name can also be the * wild card. As with normal XQuery processing, if namespace prefixes are used in the XQuery expression, they should be declared using the XMLNAMESPACES
clause.
XML Queries
Eligible XML Query
For the following query, we can use document projection and XQuery streaming optimization, because it is dealing with parent and child elements defined exactly in the path:
SELECT XMLQUERY
(
'/*:book/*:volume'
PASSING
XMLPARSE
(
document
'<book xmlns:ns1="uri">
<volume>The Fellowship of the Ring</volume>
<volume>The Two Towers</volume>
<volume>The Return of the King</volume>
</book>'
)
)
Ineligible XML Query
This query returns the same result as the query above, but in this case, XQuery optimization cannot be applied because the descendant axis //*:volume
is used:
SELECT XMLQUERY
(
'//*:volume'
PASSING
XMLPARSE
(
document
'<book xmlns:ns1="uri">
<volume>The Fellowship of the Ring</volume>
<volume>The Two Towers</volume>
<volume>The Return of the King</volume>
</book>'
)
)
Eligible XMLTABLE Queries
When using XMLTABLE
, the COLUMN PATH XQUERY EXPRESSION
has additional restrictions:
- They are allowed to reference any part of the element subtree formed by the
CONTEXT XQUERY EXPRESSION
; - They may use any attribute value from their direct parentage;
- Any
COLUMN PATH XQUERY EXPRESSION
referencing a non-direct ancestor or sibling of the current context item prevents XQuery streaming optimization from being used.
Eligible XMLTABLE Query
In this case, the context XQuery expression and the column path XQuery expressions allow XQuery streaming optimization:
SELECT
x.fullchild,
x.parent_attr,
x.child_val
FROM XMLTABLE
(
'/*:book/*:volume'
PASSING
XMLPARSE
(
document
'
<book attr="The Lord of the Rings">
<volume attr="Part 1">The Fellowship of the Ring</volume>
<volume attr="Part 2">The Two Towers</volume>
<volume attr="Part 3">The Return of the King</volume>
<Author attr="Context">J. R. R. Tolkien</Author>
</book>'
)
COLUMNS
fullchild XML PATH '.',
parent_attr string PATH '../@attr',
child_val string PATH '.'
) x
Ineligible XMLTABLE Query
In the following example, the reference of an element outside of the child subtree in the sibling_attr
path prevents XQuery streaming optimization from being used:
SELECT x.sibling_attr
FROM XMLTABLE
(
'/*:book/*:volume'
PASSING
XMLPARSE
(
document
'
<book attr="The Lord of the Rings">
<volume attr="Part 1">The Fellowship of the Ring</volume>
<volume attr="Part 2">The Two Towers</volume>
<volume attr="Part 3">The Return of the King</volume>
<Author attr="Context">J. R. R. Tolkien</Author>
</book>'
)
COLUMNS
sibling_attr string PATH '../Author/@attr'
) x