Whitespace-only Text Nodes
You are looking at an older version of the documentation. The latest version is found here.
A special case in XML processing is text nodes consisting of only whitespace. XML processing in streaming mode may work differently from the non-streaming mode in dealing with whitespace-only text nodes (i.e. elements with only whitespaces as content).
Whitespace-only text nodes are processed in non-streaming mode, but not in streaming mode by default. You can configure this manually for queries eligible for XQuery optimization using the ALLOW_XML_TEXT_NODE_WHITESPACE
command option. It is a boolean function, so that it may have one of the two values:
Value | Default for | Description |
---|---|---|
FALSE | XML processing with XQuery streaming optimization | Content of element is removed, and it becomes an empty element |
TRUE | XML processing without XQuery streaming optimization | Content of element is left unchanged |
In the first example below, for the streaming mode, the XML parser does not remove whitespaces at the beginning or the end, and multiple whitespaces inside the string are also not removed. The parser removes the text node consisting of whitespaces only:
SELECT xt.name, xt.surname, LENGTH(xt.surname)
FROM XMLTABLE('/root/band_members/band_member' PASSING XMLPARSE(DOCUMENT
'<?xml version="1.0" encoding="UTF-8"?>
<root>
<band_members>
<band_member>
<name>George does not stream</name>
<surname> starts with a space, ends with a space </surname>
</band_member>
<band_member>
<name>Paul does not stream</name>
<surname>we have three - - spaces here inside the string</surname>
</band_member>
<band_member>
<name>Ringo does not stream.</name>
<surname> </surname>
</band_member>
</band_members>
</root>')
COLUMNS
name STRING PATH 'name',
surname STRING PATH 'surname'
) xt
/**
* Outcome:
* "George does not stream", " starts with a space, ends with a space ", 40
* "Paul does not stream", "we have three - - spaces here inside the string", 49
* "Ringo does not stream.", "", 0
*/
Rerunning the same query with the ALLOW_XML_TEXT_NODE_WHITESPACE
command option and setting to FALSE returns a different result for the third entry
SELECT xt.name, xt.surname, LENGTH(xt.surname)
FROM XMLTABLE('/root/band_members/band_member' PASSING XMLPARSE(DOCUMENT
'<?xml version="1.0" encoding="UTF-8"?>
<root>
<band_members>
<band_member>
<name>George does not stream</name>
<surname> starts with a space, ends with a space </surname>
</band_member>
<band_member>
<name>Paul does not stream</name>
<surname>we have three - - spaces here inside the string</surname>
</band_member>
<band_member>
<name>Ringo does not stream.</name>
<surname> </surname>
</band_member>
</band_members>
</root>')
COLUMNS
name STRING PATH 'name',
surname STRING PATH 'surname'
) xt
OPTION $ALLOW_XML_TEXT_NODE_WHITESPACE TRUE
/**
* Outcome:
* "George does not stream", " starts with a space, ends with a space ", 40
* "Paul does not stream", "we have three - - spaces here inside the string", 49
* "Ringo does not stream.", " ", 3
*/
If the non-streaming mode is used, the surname becomes an empty element in all cases and passing either TRUE
or FALSE
for ALLOW_XML_TEXT_NODE_WHITESPACE
does not have any effect:
SELECT xt.name, xt.surname, LENGTH(xt.surname)
FROM XMLTABLE('/root/band_members/band_member[3]/surname' PASSING XMLPARSE(DOCUMENT
'<?xml version="1.0" encoding="UTF-8"?>
<root>
<band_members>
<band_member>
<name>George does not stream</name>
<surname> starts with a space, ends with a space </surname>
</band_member>
<band_member>
<name>Paul does not stream</name>
<surname>we have three- -spaces here inside the string</surname>
</band_member>
<band_member>
<name>Ringo does not stream.</name>
<surname> </surname>
</band_member>
</band_members>
</root>')
COLUMNS
name STRING PATH '../name',
surname STRING PATH '.'
) xt
/**
* Outcome:
* "Ringo does not stream.", " ", 3
*/