9 Full Text Search Over XML

This chapter describes Full Text search over XML using Oracle.

First we motivate the topic by introducing Full Text search and XML. Then we give an overview and comparison of the CONTAINS SQL function and the ora:contains XPath function, the two functions used by Oracle to do Full Text search over XML. Then we examine each of these functions in detail. The detailed descriptions have similar headings, so you can compare the two approaches easily.

To get the most out of this chapter you should be familiar with XML, XML DB and Oracle Text. This chapter includes a review of some Oracle Text features.

See Also:

Oracle Text Reference and Oracle Text Application Developer's Guide for more information on Oracle Text

This chapter contains these topics:

Full Text Search and XML
About the Examples in this Chapter
Overview of CONTAINS and ora:contains
CONTAINS SQL Function
ora:contains XPath Function
Text Path BNF
Example Support

Full Text Search and XML

Oracle supports Full Text search on documents that are managed by the Oracle Database. If your documents are XML, then you can use the XML structure of the document to restrict the Full Text search. For example, you may want to find all purchase orders that contain the word "electric" using Full Text search. If the purchase orders are XML, then you can restrict the search by finding all purchase orders that contain the word "electric" in a comment, or by finding all purchase orders that contain the word "electric" in a comment under items. If your XML documents are of type XMLType, then you can project the results of your query using the XML structure of the document. For example, after finding all purchase orders that contain the word "electric" in a comment, you may want to return just the comments, or just the comments that contain the word "electric".

Comparison of Full Text Search and Other Search Types

Full Text search differs from structured search or substring search in the following ways:

A Full Text search searches for words rather than substrings. A substring search for comments that contain the string "law" will return a comment that contains "my lawn is going wild". A Full Text search for the word "law" will not.
A Full Text search will support some language-based and word-based searches which substring searches cannot. You can use a language-based search, for example, to find all the comments that contain a word with the same linguistic stem as "mouse", and Oracle Text will find "mouse" and "mice". You can use a word-based search, for example, to find all the comments that contain the word "lawn" within 5 words of "wild".
A Full Text search generally involves some notion of relevance. When you do a Full Text search for all the comments that contain the word "lawn", for example, some results are more relevant than others. Relevance is often related to the number of times the search word (or similar words) occur in the document.

XML search

XML search is different from unstructured document search. In unstructured document search you generally search across a set of documents to return the documents that satisfy your text predicate. In XML search you often want to use the structure of the XML document to restrict the search. And you often want to return just the part of the document that satisfies the search.

Search using Full Text and XML Structure

There are two ways to do a search that includes Full Text search and XML structure:

Include the structure inside the Full Text predicate, using the CONTAINS SQL function:
```
... WHERE CONTAINS( DOC, 'electric INPATH (/purchaseOrder/items/item/comment)' )>0 ...
```
The CONTAINS SQL function is an extension to SQL and can be used in any query. CONTAINS requires a CONTEXT Full Text index.
Include the Full Text predicate inside the structure, using the ora:contains XPath function:
```
... '/purchaseOrder/items/item/comment[ora:contains(text(), "electric")>0]' ...
```
The ora:contains XPath function is an extension to XPath and can be used in any call to existsNode, extract or extractValue.

About the Examples in this Chapter

This section describes details about the examples included in this chapter.

Roles and Privileges

To run the examples you will need the CTXAPP role, as well as CONNECT and RESOURCE. You must also have EXECUTE privilege on the ctxsys package CTX_DDL.

Examples Schema and Data

Examples in this chapter are based on "The Purchase Order Schema", w3c XML Schema Part 0: Primer.

See Also:

http://www.w3.org/TR/xmlschema-0/#POSchema

The data in the examples is "Purchase Order po001.xml". Some of the performance examples are based on a bigger table (PURCHASE_ORDERS_xmltype_big), which is included in the downloadable version only.

See Also:

http://www.w3.org/TR/xmlschema-0/#po.xml

Some examples use VARCHAR2, others use XMLType. All the examples that use VARCHAR2 will also work on XMLType.

See Also:

Oracle Technology Network (http://otn.oracle.com) for the example data, the example schema, and a script to run all the examples

Overview of CONTAINS and ora:contains

This section contains these topics:

Overview of the CONTAINS SQL Function
Overview of the ora:contains XPath Function
Comparison of CONTAINS and ora:contains

Overview of the CONTAINS SQL Function

CONTAINS returns a positive number for rows where [schema.]column matches text_query, and zero otherwise. CONTAINS is a user-defined function, a standard extension method in SQL. CONTAINS requires an index of type CONTEXT. If there is no CONTEXT index on the column being searched, then CONTAINS throws an error.

Syntax

CONTAINS signatureCONTAINS(
    [schema.]column,
    text_query    VARCHAR2
    [,label       NUMBER]
    )
RETURN NUMBER

Example 9-1 Simple CONTAINS Query

A typical query looks like this:

SELECT ID
    FROM PURCHASE_ORDERS
    WHERE CONTAINS( DOC, 'lawn' )>0 ;

This query uses table PURCHASE_ORDERS and index po_index. It returns the ID for each row in table PURCHASE_ORDERS where the DOC column contains the word "lawn".

Example 9-2 CONTAINS with a Structured Predicate

CONTAINS can be used in any SQL query. Here is an example using table PURCHASE_ORDERS and index po_index:

SELECT ID
    FROM PURCHASE_ORDERS
    WHERE CONTAINS( DOC, 'lawn' )>0  AND id<25 ;

Example 9-3 CONTAINS Using XML Structure to Restrict the Query

Suppose DOC is a column that contains a set of XML documents. You can do Full Text search over DOC, using its XML structure to restrict the query. This query uses table PURCHASE_ORDERS and index po_index-path-section:

SELECT ID
    FROM PURCHASE_ORDERS
    WHERE CONTAINS( DOC, 'lawn WITHIN comment' )>0 ;

Example 9-4 CONTAINS with Structure Inside Full Text Predicate

More complex structure restrictions can be applied with the INPATH operator and an XPath expression. This query uses table PURCHASE_ORDERS and index po_index-path-section:

SELECT ID
FROM PURCHASE_ORDERS
WHERE CONTAINS( DOC, 'electric INPATH (/purchaseOrder/items/item/comment)' )>0 ;

Overview of the ora:contains XPath Function

Function ora:contains can be used in an XPath expression in a call to existsNode, extract, or extractValue to further restrict the structural search with a Full Text predicate. Function ora:contains returns a positive integer when the input_text matches text_query, and zero otherwise.

In this version, input_text must evaluate to a single text node or an attribute. The syntax and semantics of text_query in ora:contains are the same as text_query in CONTAINS, except that in ora:contains the text_query cannot include any structure operators (WITHIN, INPATH, or HASPATH). Function ora:contains extends XPath through a standard mechanism: it is a user-defined function in the Oracle XML DB namespace.

Syntax

ora:contains(
         input_text      node*,
         text_query      string
         [,policy_name   string]
         [,policy_owner  string]
         )
RETURN NUMBER

Example 9-5 shows a call to ora:contains in the XPath parameter to existsNode. Note that the third parameter (the XML DB namespace) is required. This example uses table PURCHASE_ORDERS_xmltype.

Example 9-5 ora:contains with an Arbitrarily Complex Text Query

SELECT ID
  FROM PURCHASE_ORDERS_xmltype
  WHERE existsNode( DOC, 
           '/purchaseOrder/comment[ora:contains(text(), 
           "($lawns AND wild) OR flamingo")>0]', 
           'xmlns:ora="http://xmlns.oracle.com/xdb"'
         ) = 1 ;

See Also:

"ora:contains XPath Function" for more on the ora:contains XPath function

Comparison of CONTAINS and ora:contains

The CONTAINS SQL function:

Needs a CONTEXT index to run

If there is no index, then you get an error.
Does an indexed search and is generally very fast
Returns a score (through the score operator)
Can restrict a search using both Full Text and XML structure
Restricts a search based on documents (rows in a table) rather than nodes
Cannot be used for XML structure-based projection (pulling out parts of an XML document)

The ora:contains XPath function:

Does not need an index to run, so it is very flexible
Separates application logic from storing and indexing considerations
Might do an unindexed search, so it might be resource-intensive
Does not return a score
Can restrict a search using Full Text in an XPath expression
Can be used for XML structure-based projection (pulling out parts of an XML document)

Use CONTAINS when you want a fast, index-based Full Text search over XML documents, possibly with simple XML structure constraints. Use ora:contains when you need the flexibility of Full Text search in XPath (possibly without an index), or when you need to do projection, and you do not need a score.

CONTAINS SQL Function

This section contains these topics:

Full Text Search
Score
Structure: Restricting the Scope of the Search
Structure: Projecting the Result
Indexing

Full Text Search

The second argument to CONTAINS, text_query, is a string that specifies the Full Text search. text_query has its own language, based on the SQL/MM Full-Text standard. The operators in the text_query language are documented in [Oracle Text Reference].

See Also:

ISO/IEC 13249-2:2000, Information technology - Database languages - SQL Multimedia and Application Packages - Part 2: Full-Text, International Organization For Standardization, 2000
Oracle Text Reference for more information on the operators in the text_query language

The examples in the rest of this section show some of the power of Full Text search. They use just a few of the available operators: Booleans (AND, OR, NOT) and stemming. The example queries search over a VARCHAR2 column (PURCHASE_ORDERS.doc) with a text index (indextype CTXSYS.CONTEXT).

Boolean Operators: AND, OR, NOT

The text_query language supports arbitrary combinations of AND, OR and NOT. Precedence can be controlled using parentheses. The Boolean operators can be written as:

AND, OR, NOT
and, or, not
&, |, ~

Example 9-6 CONTAINS Query with Simple Boolean

SELECT ID
  FROM PURCHASE_ORDERS
  WHERE CONTAINS( DOC, 'lawn AND wild' )>0 ;

This example uses table PURCHASE_ORDERS and index po_index.

Example 9-7 CONTAINS Query with Complex Boolean

SELECT ID
  FROM PURCHASE_ORDERS
  WHERE CONTAINS( DOC, '( (lawn OR garden) AND (wild OR flooded) )
        NOT(flamingo)' )>0 ;

This example uses table PURCHASE_ORDERS and index po_index.

See Also:

Oracle Text Reference for a full list of the operators you can use in CONTAINS and ora:contains

Stemming: $

The text_query language supports stemmed search. Example 9-8 returns all documents that contain some word with the same linguistic stem as "lawns", so it will find "lawn" or "lawns". The stem operator is written as a dollar sign ($). There is no operator STEM or stem.

Example 9-8 CONTAINS Query with Stemming

SELECT ID
  FROM PURCHASE_ORDERS
  WHERE CONTAINS( DOC, '$(lawns)' )>0 ;

This example uses table PURCHASE_ORDERS and index po_index.

Combining Boolean and Stemming Operators

operators in the text_query language can be arbitrarily combined, as shown in Example 9-9.

Example 9-9 CONTAINS Query with Complex Query Expression

SELECT ID
  FROM PURCHASE_ORDERS
  WHERE CONTAINS( DOC, '($lawns AND wild) OR flamingo' )>0 ;

This example uses table PURCHASE_ORDERS and index po_index.

See Also:

Oracle Text Reference for a full list of text_query operators

Score

The CONTAINS function has an ancillary operator SCORE that can be used anywhere in the query. It is a measure of relevance, and it is especially useful when doing Full Text searches across large document sets. SCORE is typically returned as part of the query result, used in the ORDER BY clause, or both.

Syntax

SCORE( label       NUMBER )
RETURN NUMBER

In Example 9-10, SCORE(10) returns the score for each row in the result set. SCORE is the relevance of a row in the result set with respect to a particular CONTAINS call. A call to SCORE is linked to a call to CONTAINS by a LABEL (in this case the number 10).

Example 9-10 Simple CONTAINS Query with SCORE

SELECT SCORE(10), ID
  FROM PURCHASE_ORDERS
  WHERE CONTAINS( DOC, 'lawn', 10 )>0 
  AND SCORE(10)>2
  ORDER BY SCORE(10) DESC ;

This example uses table PURCHASE_ORDERS and index po_index.

SCORE always returns 0 if, for the corresponding CONTAINS, text_query does not match the input_text, according to the matching rules dictated by the text index. If the CONTAINS text_query does match the input_text, then SCORE will return a number greater than 0 and less than or equal to 100. This number indicates the relevance of the text_query to the input_text. A higher number means a better match.

If the CONTAINS text_query consists of only the HASPATH operator and a Text Path, the score will be either 0 or 100, because HASPATH tests for an exact match.

See Also:

Oracle Text Reference for details on how the score is calculated

Structure: Restricting the Scope of the Search

CONTAINS does a Full Text search across the whole document by default. In our examples, a search for "lawn" with no structure restriction will find all purchase orders with the word "lawn" anywhere in the purchase order.

Oracle offers three ways to restrict CONTAINS queries using XML structure:

WITHIN
INPATH
HASPATH

Note:

For the purposes of this discussion, consider section to be the same as an XML node.

WITHIN

The WITHIN operator restricts a query to some section within an XML document. A search for purchase orders that contain the word "lawn" somewhere inside a comment section might use WITHIN. Section names are case-sensitive.

Example 9-11 WITHIN

SELECT ID
  FROM PURCHASE_ORDERS
  WHERE CONTAINS( DOC, 'lawn WITHIN comment' )>0 ;

This example uses table PURCHASE_ORDERS and index po_index-path-section.

Nested WITHIN

You can restrict the query further by nesting WITHIN. Example 9-12 finds all documents that contain the word "lawn" within a section "comment", where that occurrence of "lawn" is also within a section "item".

Example 9-12 Nested WITHIN

SELECT ID
  FROM PURCHASE_ORDERS
  WHERE CONTAINS( DOC, '(lawn WITHIN comment) WITHIN item' )>0 ;

This example uses table PURCHASE_ORDERS and index po_index-path-section.

Example 9-12 returns no rows. Our sample purchase order does contain the word "lawn" within a comment. But the only comment within an item is "Confirm this is electric". So the nested WITHIN query will return no rows.

WITHIN Attributes

You can also search within attributes. Example 9-13 finds all purchase orders that contain the word "10" in the orderDate attribute of a purchaseOrder element.

Example 9-13 WITHIN an Attribute

SELECT ID
  FROM PURCHASE_ORDERS
  WHERE CONTAINS( DOC, '10 WITHIN purchaseOrder@orderDate' )>0 ;

This example uses table PURCHASE_ORDERS and index po_index-path-section.

Note that by default the minus sign ("-") is treated as a word-separator: "1999-10-20" is treated as the three words "1999", "10" and "20". So this query returns 1 row.

Text in an attribute is not a part of the main searchable document. If you search for "10" without qualifying the text_query with WITHIN purchaseOrder@orderDate, then you will get no rows.

You cannot search attributes in a nested WITHIN.

WITHIN and AND

Suppose you want to find purchase orders that contain two words within a comment section: "lawn" and "electric". There can be more than one comment section in a purchaseOrder. So there are two ways to write this query, with two distinct results.

If you want to find purchase orders that contain both words, where each word occurs in some comment section, you would write a query like Example 9-14.

Example 9-14 WITHIN and AND: Two Words in Some Comment Section

SELECT ID
  FROM PURCHASE_ORDERS
  WHERE CONTAINS( DOC, '(lawn WITHIN comment) AND (electric WITHIN comment) '
 )>0 ;

This example uses table PURCHASE_ORDERS and index po_index-path-section.

If you run this query against the purchaseOrder data, then it returns 1 row. Note that the parentheses are not needed in this example, but they make the query more readable.

If you want to find purchase orders that contain both words, where both words occur in the same comment, you would write a query like Example 9-15.

Example 9-15 WITHIN and AND: Two Words in the Same Comment

SELECT ID
  FROM PURCHASE_ORDERS
  WHERE CONTAINS( DOC, '(lawn AND electric) WITHIN comment' )>0 ;

This example uses table PURCHASE_ORDERS and index po_index-path-section.

Example 9-15 will return no rows. Example 9-16, which omits the parentheses around lawn AND electric, on the other hand, will return 1 row.

Example 9-16 WITHIN and AND: No Parentheses

SELECT ID
  FROM PURCHASE_ORDERS
  WHERE CONTAINS( DOC, 'lawn AND electric WITHIN comment' )>0 ;

This example uses table PURCHASE_ORDERS and index po_index-path-section.

WITHIN has a higher operator precedence than AND, so Example 9-16 is parsed as Example 9-17.

Example 9-17 WITHIN and AND: Parentheses Illustrating Operator Precedence

SELECT ID
  FROM PURCHASE_ORDERS
  WHERE CONTAINS( DOC, ' lawn AND (electric WITHIN comment) ' )>0 ;

This example uses table PURCHASE_ORDERS and index po_index-path-section.

Definition of Section

The foregoing examples have used the WITHIN operator to search within a section. A section can be a:

PATH or ZONE section

This is a concatenation, in document order, of all text nodes that are descendants of a node, with whitespace separating the text nodes. To convert from a node to a ZONE section, you must serialize the node and replace all tags with whitespace. PATH sections have the same scope and behavior as ZONE sections, except that PATH sections support queries with INPATH and HASPATH operators.
FIELD section

This is the same as a ZONE section, except that repeating nodes in a document are concatenated into a single section, with whitespace as a separator.
Attribute section
Special section (sentence or paragraph)

See Also:
Oracle Text Reference for more information on special sections

INPATH

The WITHIN operator provides an easy and intuitive way to express simple structure restrictions in the text_query. For queries that use abundant XML structure, you can use the INPATH operator plus a Text Path instead of nested WITHIN operators.

The INPATH operator takes a text_query on the left and a Text Path, enclosed in parentheses, on the right. Example 9-18 finds purchaseOrders that contain the word "electric" in the path /purchaseOrder/items/item/comment.

Example 9-18 Structure Inside Full Text Predicate: INPATH

SELECT ID
  FROM PURCHASE_ORDERS
  WHERE CONTAINS(DOC, 'electric INPATH (/purchaseOrder/items/item/comment)')>0;

This example uses table PURCHASE_ORDERS and index po_index-path-section.

The scope of the search is the section indicated by the Text Path. If you choose a broader path, such as /purchaseOrder/items, you will still get 1 row returned, as shown in Example 9-19.

Example 9-19 Structure Inside Full Text Predicate: INPATH

SELECT ID
  FROM PURCHASE_ORDERS
  WHERE CONTAINS( DOC, 'electric INPATH (/purchaseOrder/items)' )>0 ;

This example uses table PURCHASE_ORDERS and index po_index-path-section.

The Text Path

The syntax and semantics of the Text Path are based on the w3c XPath 1.0 recommendation. Simple path expressions are supported (abbreviated syntax only), but functions are not. The following examples are meant to give the general flavor.

See Also:

http://www.w3.org/TR/xpath for information on the w3c XPath 1.0 recommendation
"Text Path BNF" for the Text Path grammar

Example 9-20 finds all purchase orders that contain the word "electric" in a "comment" which is the direct child of an "item" with an attribute partNum equal to "872-AA", which in turn is the direct child of an "items", which is any number of levels down from the root node.

Example 9-20 INPATH with Complex Path Expression (1)

SELECT ID
  FROM PURCHASE_ORDERS
  WHERE CONTAINS( DOC, 'electric INPATH
      (//items/item[@partNum="872-AA"]/comment)' )>0 ;

This example uses table PURCHASE_ORDERS and index po_index-path-section.

Example 9-21 finds all purchase orders that contain the word "lawnmower" in a third-level "item" (or any of its descendants) that has a "comment" descendant at any level. This query returns 1 row. Note that the scope of the query is not a "comment", but the set of "items" that have a "comment" as a descendant.

Example 9-21 INPATH with Complex Path Expression (2)

SELECT ID
  FROM PURCHASE_ORDERS
  WHERE CONTAINS( DOC, 'lawnmower INPATH (/*/*/item[.//comment])' )>0 ;

This example uses table PURCHASE_ORDERS and index po_index-path-section.

Text Path Compared to XPath

The Text Path language differs from the XPath language in the following ways:

Not all XPath operators are included in the Text Path language.
XPath built-in functions are not included in the Text Path language.
Text Path language operators are case-insensitive.
If you use "=" inside a filter (inside square brackets), matching follows text-matching rules.

Rules for case-sensitivity, normalization, stopwords and whitespace depend on the text index definition. To emphasize this difference, this kind of equality is referred to here as text-equals.
Namespace support is not included in the Text Path language.

The name of an element, including a namespace prefix if it exists, is treated as a string. So two namespace prefixes that map to the same namespace URI will not be treated as equivalent in the Text Path language.
In a Text Path the context is always the root node of the document.

So in the purchaseOrder data purchaseOrder/items/item, /purchaseOrder/items/item and ./purchaseOrder/items/item are equivalent.
If you want to search within an attribute value, then the direct parent of the attribute must be specified (wildcards cannot be used).
A Text Path may not end in a wildcard (*).

See Also:

"Text Path BNF" for the Text Path grammar

Nested INPATH

You can nest INPATH expressions. The context for the Text Path is always the root node. It is not changed by a nested INPATH.

Example 9-22 finds purchase orders that contain the word "electric" in a "comment" section at any level, where the occurrence of that word is also in an "items" section that is the direct child of the top-level purchaseOrder.

Example 9-22 Nested INPATH

SELECT ID
  FROM PURCHASE_ORDERS
  WHERE CONTAINS( DOC, '(electric INPATH (//comment)) INPATH
     (/purchaseOrder/items)' )>0 ;

This example uses table PURCHASE_ORDERS and index po_index-path-section.

This nested INPATH query could be written more concisely as shown in Example 9-23.

Example 9-23 Nested INPATH Rewritten

SELECT ID
  FROM PURCHASE_ORDERS
  WHERE CONTAINS( DOC, 'electric INPATH (/purchaseOrder/items//comment)' )>0 ;

This example uses table PURCHASE_ORDERS and index po_index-path-section.

HASPATH

The HASPATH operator takes only one operand: a Text Path, enclosed in parentheses, on the right. Use HASPATH when you want to find documents that contain a particular section in a particular path, possibly with an "=" predicate. Note that this is a path search rather than a Full Text search. You can check for existence of a section, or you can match the contents of a section, but you cannot do word searches. If your data is of type XMLType, then consider using existsNode instead of HASPATH.

Example 9-24 finds purchaseOrders that have some item that has a USPrice.

Example 9-24 Simple HASPATH

SELECT ID
  FROM PURCHASE_ORDERS
  WHERE CONTAINS( DOC, 'HASPATH (/purchaseOrder//item/USPrice)' )>0 ;

This example uses table PURCHASE_ORDERS and index po_index-path-section.

Example 9-25 finds purchaseOrders that have some item that has a USPrice that text-equals "148.95".

See Also:

"Text Path Compared to XPath" for an explanation of text-equals

Example 9-25 HASPATH Equality

SELECT ID
  FROM PURCHASE_ORDERS
  WHERE CONTAINS( DOC, 'HASPATH (/purchaseOrder//item/USPrice="148.95")' )>0 ;

This example uses table PURCHASE_ORDERS and index po_index-path-section.

HASPATH can be combined with other CONTAINS operators such as INPATH. Example 9-26 finds purchaseOrders that contain the word "electric" anywhere in the document AND have some "item" that has a USPrice that text-equals "148.95" AND contain "10" in the purchaseOrder attribute orderDate.

Example 9-26 HASPATH with Other Operators

SELECT ID
  FROM PURCHASE_ORDERS
  WHERE CONTAINS( DOC, 'electric AND HASPATH
     (/purchaseOrder//item/USPrice="148.95") AND 10 INPATH
     (/purchaseOrder/@orderDate)' )>0 ;

This example uses table PURCHASE_ORDERS and index po_index-path-section.

Structure: Projecting the Result

The result of a SQL query with a CONTAINS predicate in the WHERE clause is always a set of rows (and possibly SCORE information), or a projection over the rows that match the query. If you want to return only a part of each XML document that satisfies the CONTAINS predicate, then use the SQL/XML extensions extract and extractValue. Note that extract and extractValue operate on the XMLType type, so the following examples use the table PURCHASE_ORDERS_xmltype.

See Also:

Oracle XML DB Developer's Guide for more information on extract and extractValue

Example 9-27 finds purchaseOrders that contain the word "electric" in a "comment" that is a descendant of the top-level purchaseOrder. Instead of returning the ID of the row for each result, extract is used to return only the "comment".

Example 9-27 Using Extract to Scope the Results of a CONTAINS Query

SELECT 
  extract( DOC, 
           '/purchaseOrder//comment', 
           'xmlns:ora="http://xmlns.oracle.com/xdb"'
         ) "Item Comment"
  FROM PURCHASE_ORDERS_xmltype
  WHERE CONTAINS( DOC, 'electric INPATH (/purchaseOrder//comment) ' )>0 ;

This example uses table PURCHASE_ORDERS_xmltype and index po_index_xmltype.

Note that the result of Example 9-27 is two instances of "comment". CONTAINS tells us which rows contain the word "electric" in a "comment" (the row with ID=1), and extract extracts all the instances of "comment" in the document at that row. There are two instances of "comment" in our purchaseOrder, and the query returns both of them.

This might not be what you want. If you want the query to return only the instances of "comment" that satisfy the CONTAINS predicate, then you must repeat that predicate in the extract. You do that with ora:contains, which is an XPath function.

Example 9-28 returns only the "comment" that matches the CONTAINS predicate.

Example 9-28 Using Extract Plus ora:contains to Project the Results of a CONTAINS Query

SELECT 
  extract( DOC, 
       '/purchaseOrder/items/item/comment[ora:contains(text(), "electric")>0]', 
       'xmlns:ora="http://xmlns.oracle.com/xdb"') "Item Comment"
  FROM PURCHASE_ORDERS_xmltype
  WHERE CONTAINS( DOC, 'electric 
       INPATH (/purchaseOrder/items/item/comment) ' )>0 ;

This example uses table PURCHASE_ORDERS and index po_index-path-section.

Indexing

This section contains these topics:

Introduction to the CONTEXT Index
Effect of the CONTEXT Index on CONTAINS
The CONTEXT Index: Preferences
Introduction to Section Groups

Introduction to the CONTEXT Index

The Oracle general purpose Full Text indextype is the CONTEXT indextype, owned by the database user CTXSYS. To create a default Full Text index, use the regular SQL CREATE INDEX command, and add the clause INDEXTYPE IS CTXSYS.CONTEXT, as shown in Example 9-29.

Example 9-29 Simple CONTEXT Index on PURCHASE_ORDERS Table

CREATE INDEX po_index
  ON PURCHASE_ORDERS( DOC )
  INDEXTYPE IS ctxsys.CONTEXT ;

This example uses table PURCHASE_ORDERS.

You have many choices available when building a Full Text index. These choices are expressed as indexing preferences. To use an indexing preference, add the PARAMETERS clause to CREATE INDEX, as shown in Example 9-30.

See Also:

"The CONTEXT Index: Preferences"

Example 9-30 Simple CONTEXT Index on PURCHASE_ORDERS Table with Path Section Group

CREATE INDEX po_index
  ON PURCHASE_ORDERS( DOC )
  INDEXTYPE IS ctxsys.CONTEXT 
  PARAMETERS ( 'section group ctxsys.PATH_SECTION_GROUP' ) ;

This example uses table PURCHASE_ORDERS.

Oracle Text provides other indextypes, such as CTXCAT and CTXRULE, which are outside the scope of this chapter.

See Also:

Oracle Text Reference for more information on CONTEXT indexes

CONTEXT Index on XMLType Table

You can build a CONTEXT index on any data that contains text. Example 9-29 creates a CONTEXT index on a VARCHAR2 column. The syntax to create a CONTEXT index on a column of type CHAR, VARCHAR, VARCHAR2, BLOB, CLOB, BFILE, XMLType, or URIType is the same. Example 9-31 creates a CONTEXT index on a column of type XMLType.

Example 9-31 Simple CONTEXT Index on PURCHASE_ORDERS_xmltype Table (Defaults to PATH_SECTION_GROUP)

CREATE INDEX po_index_xmltype
  ON PURCHASE_ORDERS_xmltype( DOC )
  INDEXTYPE IS ctxsys.CONTEXT ;

This example uses table PURCHASE_ORDERS_xmltype.

If you have a table of type XMLType, then you need to use object syntax to create the CONTEXT index as shown in Example 9-32.

Example 9-32 Simple CONTEXT Index on XMLType Table

CREATE INDEX po_index_xmltype_table
  ON PURCHASE_ORDERS_xmltype_table T ( value(T) )
  INDEXTYPE IS ctxsys.CONTEXT ;

This example uses table PURCHASE_ORDERS_xmltype.

You can then query the table using the syntax in Example 9-33.

Example 9-33 CONTAINS Query on XMLType Table

SELECT 
  extract( value(T), '/purchaseOrder/@orderDate' ) "Order Date"
  FROM 
    PURCHASE_ORDERS_xmltype_table T
    WHERE CONTAINS( value(T), 'electric INPATH (/purchaseOrder//comment) ' )>0 ;

This example uses table PURCHASE_ORDERS_xmltype_table and index po_index_xmltype_table.

Maintaining the CONTEXT Index

The CONTEXT index, like most Full Text indexes, is asynchronous. When indexed data is changed, the CONTEXT index might not change until you take some action, such as calling a procedure to synchronize the index. There are a number of ways to manage changes to the CONTEXT index, including some options that are new for this release.

The CONTEXT index might get fragmented over time. A fragmented index uses more space, and it leads to slower queries. There are a number of ways to optimize (defragment) the CONTEXT index, including some options that are new for this release.

See Also:

Oracle Text Reference for more information on CONTEXT index maintenance

Roles and Privileges

You do not need any special privileges to create a CONTEXT index. You need the CTXAPP role to create and delete preferences and to use the Oracle Text PL/SQL packages. You must also have EXECUTE privilege on the ctxsys package CTX_DDL.

Effect of the CONTEXT Index on CONTAINS

You must create an index of type CONTEXT in order to use the CONTAINS function. If you call the CONTAINS function, and the column given in the first argument does not have an index of type CONTEXT, then you will get an error.

The syntax and semantics of text_query depend on the choices you make when you build the CONTEXT index. For example:

What counts as a word?
Are very common words processed?
What is a common word?
Is the text search case-sensitive?
Can the text search include themes (concepts) as well as keywords?

The CONTEXT Index: Preferences

A preference can be considered a collection of indexing choices. Preferences include section group, datastore, filter, wordlist, stoplist and storage. This section shows how to set up a lexer preference to make searches case-sensitive.

You can use the procedure CTX_DDL.CREATE_PREFERENCE (CTX_DDL.CREATE_STOPLIST) to create a preference. Override default choices in that preference group by setting attributes of the new preference, using the CTX_DDL.SET_ATTRIBUTE procedure. Then use the preference in a CONTEXT index by including <preference type> <preference_name> in the PARAMETERS string of CREATE INDEX.

Once a preference has been created, you can use it to build any number of indexes.

Making Search Case-Sensitive

Full Text searches with CONTAINS are case-insensitive by default. That is, when matching words in text_query against words in the document, case is not considered. Section names and attribute names, however, are always case-sensitive.

If you want Full Text searches to be case-sensitive, then you need to make that choice when building the CONTEXT index. Example 9-34 returns 1 row, because "HURRY" in text_query matches "Hurry" in the purchaseOrder with the default case-insensitive index.

Example 9-34 CONTAINS: Default Case Matching

SELECT ID
  FROM PURCHASE_ORDERS
  WHERE CONTAINS( DOC, 'HURRY INPATH (/purchaseOrder/comment)' )>0 ;

This example uses table PURCHASE_ORDERS and index po_index-path-section.

Example 9-35 creates a new lexer preference my_lexer, with the attribute mixed_case set to TRUE. It also sets printjoin characters to "-" and "!" and ",". You can use the same preferences for building CONTEXT indexes and for building policies.

See Also:

Oracle Text Reference for a full list of lexer attributes

Example 9-35 Create a Preference for Mixed Case

BEGIN
  Ctx_Ddl.Create_Preference (
    preference_name  =>  'my_lexer',
    object_name      =>  'BASIC_LEXER'
    ) ;
    
  Ctx_Ddl.Set_Attribute (
    preference_name  =>  'my_lexer', 
    attribute_name   =>  'mixed_case', 
    attribute_value  =>  'TRUE'
    ) ;
    
  Ctx_Ddl.Set_Attribute (
    preference_name  =>  'my_lexer', 
    attribute_name   =>  'printjoins', 
    attribute_value  =>  '-,!'
    ) ;

END ;
/

Example 9-36 builds a CONTEXT index using the new my_lexer lexer preference.

Example 9-36 CONTEXT Index on PURCHASE_ORDERS Table, Mixed Case

CREATE INDEX po_index
  ON PURCHASE_ORDERS( DOC )
  INDEXTYPE IS ctxsys.context
  PARAMETERS( 'lexer my_lexer section group ctxsys.PATH_SECTION_GROUP' ) ;

This example uses table PURCHASE_ORDERS and preference: preference-case-mixed.

Example 9-36 returns no rows, because "HURRY" in text_query no longer matches "Hurry" in the purchaseOrder. Example 9-37 returns 1 row, because the text_query term "Hurry" exactly matches the word "Hurry" in the purchaseOrder.

Example 9-37 CONTAINS: Mixed (Exact) Case Matching

SELECT ID
  FROM PURCHASE_ORDERS
  WHERE CONTAINS( DOC, 'Hurry INPATH (/purchaseOrder/comment)' )>0 ;

This example uses table PURCHASE_ORDERS and index po_index-case-mixed.

Introduction to Section Groups

One of the choices you make when creating a CONTEXT index is section group. A section group instance is based on a section group type. The section group type specifies the kind of structure in your documents, and how to index (and therefore search) that structure. The section group instance may specify which structure elements are indexed. Most users will either take the default section group or use a pre-defined section group.

Choosing a Section Group Type

The section group types that are useful in XML searching are:

PATH_SECTION_GROUP

Choose this when you want to use WITHIN, INPATH and HASPATH in queries, and you want to be able to consider all sections to scope the query.
XML_SECTION_GROUP

Choose this when you want to use WITHIN, but not INPATH and HASPATH, in queries, and you want to be able to consider only explicitly-defined sections to scope the query. XML_SECTION_GROUP section group type supports FIELD sections in addition to ZONE sections. In some cases FIELD sections offer significantly better query performance.
AUTO_SECTION_GROUP

Choose this when you want to use WITHIN, but not INPATH and HASPATH, in queries, and you want to be able to consider most sections to scope the query. By default all sections are indexed (available for query restriction). You can specify that some sections are not indexed (by defining STOP sections).
NULL_SECTION_GROUP

Choose this when defining no XML sections.

Other section group types include:

BASIC_SECTION_GROUP
HTML_SECTION_GROUP
NEWS_SECTION_GROUP

Oracle recommends that most users with XML Full Text search requirements use PATH_SECTION_GROUP. Some users may prefer XML_SECTION_GROUP with FIELD sections. This choice will generally give better query performance and a smaller index, but it is limited to documents with fielded structure (searchable nodes are all non-repeating leaf nodes).

See Also:

Oracle Text Reference for a detailed description of the XML_SECTION_GROUP section group type

Choosing a Section Group

When choosing a section group to use with your index, you can choose a supplied section group, take the default, or create a new section group based on the section group type you have chosen.

There are supplied section groups for section group types PATH_SECTION_GROUP, AUTO_SECTION_GROUP, and NULL_SECTION_GROUP. The supplied section groups are owned by CTXSYS and have the same name as their section group types. For example, the supplied section group of section group type PATH_SECTION_GROUP is CTXSYS.PATH_SECTION_GROUP.

There is no supplied section group for section group type XML_SECTION_GROUP, because a default XML_SECTION_GROUP would be empty and therefore meaningless. If you want to use section group type XML_SECTION_GROUP, then you must create a new section group and specify each node that you want to include as a section.

When you create a CONTEXT index on data of type XMLType, the default section group is the supplied section group CTXSYS.PATH_SECTION_GROUP. If the data is VARCHAR or CLOB, then the default section group is CTXSYS.NULL_SECTION_GROUP.

See Also:

Oracle Text Reference for instructions on creating your own section group

To associate a section group with an index, add section group <section group name> to the PARAMETERS string, as in Example 9-38.

Example 9-38 Simple CONTEXT Index on PURCHASE_ORDERS Table with Path Section Group

CREATE INDEX po_index
  ON PURCHASE_ORDERS( DOC )
  INDEXTYPE IS ctxsys.CONTEXT 
  PARAMETERS ( 'section group ctxsys.PATH_SECTION_GROUP' ) ;

This example uses table PURCHASE_ORDERS.

ora:contains XPath Function

Function ora:contains is an Oracle-defined XPath function for use in the XPath argument to the SQL/XML functions existsNode, extract, and extractValue.

Note:

These functions are not yet a part of the SQL/XML standard. But these functions or very similar functions are expected to be part of a future version of SQL/XML.

The ora:contains function name consists of a name (contains) plus a namespace prefix (ora:). When you use ora:contains in existsNode, extract or extractValue you must also supply a namespace mapping parameter, xmlns:ora="http://xmlns.oracle.com/xdb".

Full Text Search

The ora:contains argument text_query is a string that specifies the Full Text search. The ora:contains text_query is the same as the CONTAINS text_query, with the following restrictions:

ora:contains text_query must not include the structure operators WITHIN, INPATH, or HASPATH
ora:contains text_query may include the score weighting operator weight(*), but weights will be ignored

If you include any of the following in the ora:contains text_query, the query cannot use a CONTEXT index:

Score-based operators MINUS(-) or threshold(>)
Selective, corpus-based expansion operators FUZZY(?) or soundex(!)

See Also:

"Query-Rewrite and the CONTEXT Index"

Example 9-39 shows a Full Text search using an arbitrary combination of Boolean operators and $ (stemming).

Example 9-39 ora:contains with an Arbitrarily Complex Text Query

SELECT ID
  FROM PURCHASE_ORDERS_xmltype
  WHERE existsNode( DOC, 
           '/purchaseOrder/comment[ora:contains(text(), 
           "($lawns AND wild) OR flamingo")>0]', 
           'xmlns:ora="http://xmlns.oracle.com/xdb"'
         ) = 1 ;

This example uses table PURCHASE_ORDERS_xmltype.

See Also:

"Full Text Search" for a description of Full Text operators
Oracle Text Reference for a full list of the operators you can use in CONTAINS and ora:contains

Matching rules are defined by the policy <policy_owner>.<policy_name>. If policy_owner is absent, then the policy owner defaults to the current user. If both policy_name and policy_owner are absent, then the policy defaults to CTXSYS.DEFAULT_POLICY_ORACONTAINS.

Score

ora:contains is an XPath function that returns a number. It returns a positive number if the text_query matches the input_text. Otherwise it returns zero. ora:contains does not return a score.

Structure: Restricting the Scope of the Query

When you use ora:contains in an XPath expression, the scope is defined by input_text. This argument is evaluated in the current XPath context. If the result is a single text node or an attribute, then that node is the target of the ora:contains search. If input_text does not evaluate to a single text node or an attribute, an error is raised.

The policy determines the matching rules for ora:contains. The section group associated with the default policy for ora:contains is of type NULL_SECTION_GROUP.

ora:contains can be used anywhere in an XPath expression, and its input_text argument can be any XPath expression that evaluates to a single text node or an attribute.

Structure: Projecting the Result

If you want to return only a part of each XML document, then use extract to project a node sequence or extractValue to project the value of a node.

Example 9-40 returns the orderDate for each purchaseOrder that has some comment that contains the word "electric".

Example 9-40 ora:contains in existsNode Plus Extract

SELECT extract( DOC, '/purchaseOrder/@orderDate' ) "Order date"
  FROM PURCHASE_ORDERS_xmltype
  WHERE existsNode( DOC, 
           '/purchaseOrder/comment[ora:contains(text(), "lawn")>0]', 
           'xmlns:ora="http://xmlns.oracle.com/xdb"' ) = 1 ;

This example uses table PURCHASE_ORDERS_xmltype.

In Example 9-40 existsNode restricts the result to rows (documents) where the purchaseOrder includes some comment that contains the word "electric". extract then returns the PurchaseOrder attribute orderDate from those purchaseOrders. Note that if we extracted //comment we would get both comments from the sample document, not just the comment that matched the WHERE clause.

Policies

The CONTEXT index on a column determines the semantics of CONTAINS queries on that column. Because ora:contains does not rely on a supporting index, some other means must be found to provide many of the same choices when doing ora:contains queries. A policy is a collection of preferences that can be associated with an ora:contains query to give the same sort of semantic control as the indexing choices give to the CONTAINS user.

Introduction to Policies

When using CONTAINS, indexing preferences affect the semantics of the query. You create a preference, using the package CTX_DDL.CREATE_PREFERENCE (or CTX_DDL.CREATE_STOPLIST). You override default choices by setting attributes of the new preference, using the CTX_DDL.SET_ATTRIBUTE procedure. Then you use the preference in a CONTEXT index by including preference_type preference_name in the PARAMETERS string of CREATE INDEX.

See Also:

"The CONTEXT Index: Preferences"

Because ora:contains does not have a supporting index, a different mechanism is needed to apply preferences to a query. That mechanism is called a policy, consisting of a collection of preferences, and it is used as a parameter to ora:contains.

Policy Example: Supplied Stoplist

Example 9-41 creates a policy with an empty stopwords list.

Example 9-41 Create a Policy to Use with ora:contains

BEGIN
  Ctx_Ddl.Create_Policy (
    policy_name  =>  'my_nostopwords_policy',
    stoplist     =>  'ctxsys.EMPTY_STOPLIST'
    ) ;
END ;
/

For simplicity, this policy consists of an empty stoplist, which is owned by the user ctxsys. You could create a new stoplist to include in this policy, or you could reuse a stoplist (or lexer) definition that you created for a CONTEXT index.

Refer to this policy in any ora:contains to search for all words, including the most common ones (stopwords). Example 9-42 returns 0 comments, because "is" is a stopword by default and cannot be queried.

Example 9-42 Query on a Common Word with ora:contains

SELECT ID
  FROM PURCHASE_ORDERS_xmltype
  WHERE existsNode( DOC, 
           '/purchaseOrder/comment[ora:contains(text(), "is")>0]', 
           'xmlns:ora="http://xmlns.oracle.com/xdb"'
         ) = 1 ;

This example uses table PURCHASE_ORDERS_xmltype.

Example 9-43 uses the policy created in Example 9-41 to specify an empty stopword list. This query finds "is" and returns 1 comment.

Example 9-43 Query on a Common Word with ora:contains and Policy my_nostopwords_policy

SELECT ID
  FROM PURCHASE_ORDERS_xmltype
  WHERE existsNode( DOC, 
           '/purchaseOrder/comment[ora:contains(text(), 
           "is", "my_nostopwords_policy")>0]', 
           'xmlns:ora="http://xmlns.oracle.com/xdb"' ) = 1 ;

This example uses table PURCHASE_ORDERS_xmltype and policy my_nostopwords_policy.

Effect of Policies on ora:contains

The ora:contains policy affects the matching semantics of text_query. The ora:contains policy may include a lexer, stoplist, wordlist preference, or any combination of these. Other preferences that can be used to build a CONTEXT index are not applicable to ora:contains. The effects of the preferences are as follows:

The wordlist preference tweaks the semantics of the stem operator.
The stoplist preference defines which words are too common to be indexed (searchable).
The lexer preference defines how words are tokenized and matched. For example, it defines which characters count as part of a word and whether matching is case-sensitive.

See Also:

"Policy Example: Supplied Stoplist" for an example of building a policy with a predefined stoplist
"Policy Example: User-Defined Lexer" for an example of a case-sensitive policy

Policy Example: User-Defined Lexer

When you search for a document that contains a particular word, you usually want the search to be case-insensitive. If you do a search that is case-sensitive, then you will often miss some expected results. For example, if you search for purchaseOrders that contain the phrase "baby monitor", then you would not expect to miss our example document just because the phrase is written "Baby Monitor".

Full Text searches with ora:contains are case-insensitive by default. Section names and attribute names, however, are always case-sensitive.

If you want Full Text searches to be case-sensitive, then you need to make that choice when you create a policy. You can use this procedure:

Create a preference using the procedure CTX_DDL.CREATE_PREFERENCE (or CTX_DDL.CREATE_STOPLIST).
Override default choices in that preference object by setting attributes of the new preference, using the CTX_DDL.SET_ATTRIBUTE procedure.
Use the preference as a parameter to CTX_DDL.CREATE_POLICY.
Use the policy name as the third argument to ora:contains in a query.

Once you have created a preference, you can reuse it in other policies or in CONTEXT index definitions. You can use any policy with any ora:contains query.

Example 9-44 returns 1 row, because "HURRY" in text_query matches "Hurry" in the purchaseOrder with the default case-insensitive index.

Example 9-44 ora:contains, Default Case-Sensitivity

SELECT ID
  FROM PURCHASE_ORDERS_xmltype
  WHERE existsNode( DOC, 
           '/purchaseOrder/comment[ora:contains(text(), "HURRY")>0]', 
           'xmlns:ora="http://xmlns.oracle.com/xdb"'
         ) = 1 ;

This example uses table PURCHASE_ORDERS_xmltype.

Example 9-45 creates a new lexer preference my_lexer, with the attribute mixed_case set to TRUE. It also sets printjoin characters to "-" and "!" and ",". You can use the same preferences for building CONTEXT indexes and for building policies.

See Also:

Oracle Text Reference for a full list of lexer attributes

Example 9-45 Create a Preference for Mixed Case

BEGIN

  Ctx_Ddl.Create_Preference (
    preference_name  =>  'my_lexer',
    object_name      =>  'BASIC_LEXER'
    ) ;
    
  Ctx_Ddl.Set_Attribute (
    preference_name  =>  'my_lexer', 
    attribute_name   =>  'mixed_case', 
    attribute_value  =>  'TRUE'
    ) ;
    
  Ctx_Ddl.Set_Attribute (
    preference_name  =>  'my_lexer', 
    attribute_name   =>  'printjoins', 
    attribute_value  =>  '-,!'
    ) ;
END ;
/

Example 9-46 creates a new policy my_policy and specifies only the lexer. All other preferences are defaulted.

Example 9-46 Create a Policy with Mixed Case (Case-Insensitive)

BEGIN
  Ctx_Ddl.Create_Policy 
  (
    policy_name  => 'my_policy',
    lexer        => 'my_lexer'
 ) ;
  
END ;
/

This example uses preference-case-mixed.

Example 9-47 uses the new policy in a query. It returns no rows, because "HURRY" in text_query no longer matches "Hurry" in the purchaseOrder.

Example 9-47 ora:contains, Case-Sensitive (1)

SELECT ID
  FROM PURCHASE_ORDERS_xmltype
  WHERE existsNode( DOC, 
           '/purchaseOrder/comment[ora:contains(text(), 
           "HURRY", "my_policy")>0]', 
           'xmlns:ora="http://xmlns.oracle.com/xdb"'
         ) = 1 ;

This example uses table PURCHASE_ORDERS_xmltype.

Example 9-48 returns 1 row, because the text_query term "Hurry" exactly matches the word "Hurry" in the purchaseOrder.

Example 9-48 ora:contains, Case-Sensitive (2)

SELECT ID
  FROM PURCHASE_ORDERS_xmltype
  WHERE existsNode( DOC, 
           '/purchaseOrder/comment[ora:contains(text(), "is going wild")>0]', 
           'xmlns:ora="http://xmlns.oracle.com/xdb"'
         ) = 1 ;

This example uses table PURCHASE_ORDERS_xmltype.

Policy Defaults

The policy argument to ora:contains is optional. If it is omitted, then the query uses the default policy CTXSYS.DEFAULT_POLICY_ORACONTAINS.

When you create a policy for use with ora:contains, you do not need to specify every preference. In Example 9-46, for example, only the lexer preference was specified. For the preferences that are not specified, CREATE_POLICY uses the default preferences:

CTXSYS.DEFAULT_LEXER
CTXSYS.DEFAULT_STOPLIST
CTXSYS.DEFAULT_ WORDLIST

Creating a policy follows copy semantics for preferences and their attributes, just as creating a CONTEXT index follows copy semantics for index metadata.

ora:contains Performance

The ora:contains XPath function does not depend on a supporting index. ora:contains is very flexible. But if you use it to search across large amounts of data without an index, then it can also be resource-intensive. In this section we discuss how to get the best performance from queries that include XPath expressions with ora:contains.

Note:

Function-based indexes can also be very effective in speeding up XML queries, but they are not generally applicable to Text queries.

The examples in this section use table PURCHASE_ORDERS_xmltype_big. This has the same table structure and XML Schema as PURCHASE_ORDERS_xmltype, but it has around 1,000 rows. Each row has a unique ID (in the "id" column), and some different text in /purchaseOrder/items/item/comment. Where an execution plan is shown, it was produced using the SQL*Plus AUTOTRACE. Execution plans can also be produced using SQL trace and tkprof. A description of AUTOTRACE, SQL trace and tkprof is outside the scope of this chapter.

This section contains these topics:

Use a Primary Filter in the Query
Use a CTXXPath Index
Query-Rewrite and the CONTEXT Index

Use a Primary Filter in the Query

Because ora:contains is relatively expensive to process, Oracle recommends that you write queries that include a primary filter wherever possible. This will minimize the number of rows actually processed by ora:contains.

Example 9-49 examines every row in the table (does a full table scan), as we can see from the Plan in Example 9-50. In this example, ora:contains is evaluated for every row.

Example 9-49 ora:contains in existsNode, Big Table

SELECT ID
  FROM PURCHASE_ORDERS_xmltype_big
  WHERE existsNode( DOC, 
           '/purchaseOrder/items/item/comment[ora:contains(text(),
           "constitution")>0]', 
           'xmlns:ora="http://xmlns.oracle.com/xdb"'
         ) = 1 ;

Example 9-50 Explain Plan: existsNode

Execution Plan
----------------------------------------------------------
0      SELECT STATEMENT Optimizer=CHOOSE
1     0   TABLE ACCESS (FULL) OF 'PURCHASE_ORDERS_XMLTYPE_BIG' (TABLE)

If you create an index on the ID column, as shown in Example 9-51, and add a selective ID predicate to the query, as shown in Example 9-52, then it is apparent from Example 9-53 that Oracle will drive off the ID index. ora:contains will be executed only for the rows where the ID predicate is true (where ID is less than 5).

Example 9-51 B-Tree Index on ID

CREATE INDEX id_index 
  ON PURCHASE_ORDERS_xmltype_big( ID ) ;

This example uses table PURCHASE_ORDERS.

Example 9-52 ora:contains in existsNode, Mixed Query

SELECT ID
  FROM PURCHASE_ORDERS_xmltype_big
  WHERE existsNode( DOC, 
           '/purchaseOrder/items/item/comment[ora:contains(text(),
           "constitution")>0]', 
           'xmlns:ora="http://xmlns.oracle.com/xdb"'
         ) = 1
         AND id>5 ;

Example 9-53 Explain Plan: existsNode

Execution Plan
----------------------------------------------------------
   0      SELECT STATEMENT Optimizer=CHOOSE
   1    0    TABLE ACCESS (BY INDEX ROWSELECT ID) OF 'PURCHASE_ORDERS_XMLTYPE_BIG' (TABLE)
   2    1         INDEX (RANGE SCAN) OF 'SELECT ID_INDEX' (INDEX)

Use a CTXXPath Index

The CTXXPATH index can be used as a primary filter for existsNode. CTXXPATH is not related to ora:contains. CTXXPATH can be a primary filter for any existsNode query.

The CTXXPATH index stores enough information about a document to produce a superset of the results of an XPath expression. For an existsNode query it is often helpful to interrogate the CTXXPATH index and then apply existsNode to that superset, rather than applying existsNode to each document in turn.

Example 9-54 produces the execution plan shown in Example 9-55.

Example 9-54 ora:contains in existsNode, Big Table

SELECT ID
  FROM PURCHASE_ORDERS_xmltype_big
  WHERE existsNode( DOC, 
           '/purchaseOrder/items/item/comment[ora:contains(text(),
           "constitution")>0]', 
           'xmlns:ora="http://xmlns.oracle.com/xdb"'
         ) = 1 ;

Example 9-55 Explain Plan: existsNode

Execution Plan
----------------------------------------------------------
   0      SELECT STATEMENT Optimizer=CHOOSE
   1    0    TABLE ACCESS (FULL) OF 'PURCHASE_ORDERS_XMLTYPE_BIG' (TABLE)

Now create a CTXXPATH index on the DOC column, as shown in Example 9-56. You can create a CTXXPATH index and a CONTEXT index on the same column.

Example 9-56 Create a CTXXPATH Index on PURCHASE_ORDERS_xmltype_big(DOC)

CREATE INDEX doc_xpath_index 
  ON PURCHASE_ORDERS_xmltype_big( DOC )
  INDEXTYPE IS ctxsys.CTXXPATH ;

Run Example 9-54 again and you will see from the plan, shown in Example 9-57, that the query now uses the CTXXPATH index.

Example 9-57 Explain Plan: existsNode with CTXXPATH Index

Execution Plan
----------------------------------------------------------
   0      SELECT STATEMENT Optimizer=CHOOSE (Cost=2 Card=1 Bytes=2044)
   1    0     TABLE ACCESS (BY INDEX ROWSELECT ID) OF 'PURCHASE_ORDERS_XMLTYPE_BIG' (TABLE) 
                (Cost=2 Card=1 Bytes=2044)

   2    1          DOMAIN INDEX OF 'DOC_XPATH_INDEX' (INDEX (DOMAIN)) 
                     (Cost=0)

When to Use CTXXPATH

CTXXPATH processes only a part of the XPath expression, to give a guaranteed superset (a first-pass estimate) of the results of XPath evaluation.

CTXXPATH does not process:

Functions, including ora:contains
Range operators: <=, <, >=, >
'..', "|"
Attribute following '.', '*' or '//'
Predicate following '.' or '*'
'.' or '*' at the end of a path
Any node with unabbreviated XPath syntax

So in Example 9-54, the CTXXPATH index cannot return results for /purchaseOrder/items/item/comment[ora:contains(., "constitution")>0], because it cannot process the function ora:contains. But the CTXXPATH index can act as a primary filter by returning all documents that contain the path /purchaseOrder/items/item/comment. By calculating this superset of results, CTXXPATH can significantly reduce the number of documents considered by existsNode in this case.

There are two situations where a CTXXPATH index will give a significant performance boost:

If the document collection is heterogeneous, then knowing which documents contain the path (some purchaseOrder with some items child with some item child with some comment child) is enough to significantly reduce the documents considered by existsNode.
If many of the queries include XPath expressions with equality predicates rather than range predicates or functions (such as Example 9-58), then CTXXPATH will process those predicates and therefore will be a useful primary filter. CTXXPATH handles both string and number equality predicates.

Example 9-58 Equality Predicate in XPath, Big Table

SELECT count(*)
  FROM PURCHASE_ORDERS_xmltype_big
  WHERE existsNode( DOC, 
           '/purchaseOrder/items/item[USPrice=148.9500]', 
           'xmlns:ora="http://xmlns.oracle.com/xdb"'
         ) = 1 ;

If you are not sure that CTXXPATH will be useful, then create a CTXXPATH index and gather statistics on it, as shown in Example 9-59. With these statistics in place, the Oracle Cost Based Optimizer can make an informed choice about whether to use the CTXXPATH index or to ignore it.

Example 9-59 Gathering Index Statistics

BEGIN
      DBMS_STATS.GATHER_INDEX_STATS (
        ownname  =>  'test',
        indname  =>  'doc_xpath_index'
        ) ;  
END;
/

This example uses index-ctxxpath-1.

Maintaining the CTXXPATH Index

The CTXXPATH index, like the CONTEXT index, is asynchronous. When indexed data is changed, the CTXXPATH index might not change until you take some action, such as calling a procedure to synchronize the index. There are a number of ways to manage changes to the CTXXPATH index, including some options that are new for this release.

If the CTXXPATH index is not kept in synch with the data, then the index gradually becomes less efficient. The CTXXPATH index still calculates a superset of the true result, by adding all unsynchronized (unindexed) rows to the result set. So existsNode must process all the rows identified by the CTXXPATH index plus all unsynchronized (unindexed) rows.

The CTXXPATH index may get fragmented over time. A fragmented index uses more space and leads to slower queries. There are a number of ways to optimize (defragment) the CTXXPATH index, including some options that are new for this release.

See Also:

Oracle Text Reference for information on CTXXPATH index maintenance

Query-Rewrite and the CONTEXT Index

ora:contains does not rely on a supporting index. But under some circumstances an ora:contains may use an existing CONTEXT index for better performance.

Introducing Query-Rewrite

Oracle will, in some circumstances, rewrite a SQL/XML query into an object-relational query. This is done as part of query optimization and is transparent to the user. Two of the benefits of query-rewrite are:

The re-written query can directly access the underlying object-relational tables instead of processing the whole XML document.
The re-written query can make use of any available indexes.

Query-rewrite is a performance optimization. Query-rewrite only makes sense if the XML data is stored object-relationally, which in turn requires the XML data to be Schema-based.

See Also:

Chapter 1, " Introducing Oracle XML DB" for a full description of the query-rewrite process

From Documents to Nodes

Consider Example 9-60, a simple ora:contains query. To naively process the XPath expression in this query, each cell in the DOC column must be considered, and each cell must be tested to see if it matches /purchaseOrder/items/item/comment[ora:contains(text(), "electric")>0].

Example 9-60 ora:contains in existsNode

SELECT ID
  FROM PURCHASE_ORDERS_xmltype
  WHERE existsNode( DOC, 
           '/purchaseOrder/items/item/comment[ora:contains(text(),
           "electric")>0]', 
           'xmlns:ora="http://xmlns.oracle.com/xdb"'
         ) = 1 ;

This example uses table PURCHASE_ORDERS_xmltype.

But if DOC is schema-based, and the purchaseOrder documents are physically stored in object-relational tables, then it makes sense to go straight to the /purchaseOrder/items/item/comment column (if one exists) and test each cell there to see if it matches "electric".

This is the first query-rewrite step. If the first argument to ora:contains (text_input) maps to a single relational column, then ora:contains executes against that column. Even if there are no indexes involved, this can significantly improve query performance.

From ora:contains to CONTAINS

As noted in "From Documents to Nodes", Oracle may rewrite a query so that an XPath expression in existsNode may be resolved by applying ora:contains to some underlying column instead of applying the whole XPath to the whole XML document. In this section it will be shown how that query might make use of a CONTEXT index on the underlying column.

If you are running ora:contains against a text node or attribute that maps to a column with a CONTEXT index on it, why would you not use that index? One powerful reason is that a re-written query should give the same results as the original query. To ensure consistent results, the following conditions must be true before a CONTEXT index can be used.

First, the ora:contains target (input_text) must be either a single text node whose parent node maps to a column or an attribute that maps to a column. The column must be a single relational column (possibly in a nested table).

Second, as noted in "Policies", the indexing choices (for CONTAINS) and policy choices (for ora:contains) affect the semantics of queries. A simple mismatch might be that the index-based CONTAINS would do a case-sensitive search, while ora:contains specifies a case-insensitive search. To ensure that the ora:contains and the rewritten CONTAINS have the same semantics, the ora:contains policy must exactly match the index choices of the CONTEXT index.

Both the ora:contains policy and the CONTEXT index must also use the NULL_SECTION_GROUP section group type. The default section group for an ora:contains policy is ctxsys.NULL_SECTION_GROUP.

Third, the CONTEXT index is generally asynchronous. If you add a new document that contains the word "dog", but do not synchronize the CONTEXT index, then a CONTAINS query for "dog" will not return that document. But an ora:contains query against the same data will. To ensure that the ora:contains and the rewritten CONTAINS will always return the same results, the CONTEXT index must be built with the TRANSACTIONAL keyword in the PARAMETERS string (see [Oracle Text Reference]).

See Also:

Oracle Text Reference

Query-Rewrite: Summary

A query with existsNode, extract or extractValue, where the XPath includes ora:contains, may be considered for query-rewrite if:

The XML is schema-based
The first argument to ora:contains (text_input) is either a single text node whose parent node maps to a column, or an attribute that maps to a column. The column must be a single relational column (possibly in a nested table).

The rewritten query will use a CONTEXT index if:

There is a CONTEXT index on the column that the parent node (or attribute node) of text_input maps to.
The ora:contains policy exactly matches the index choices of the CONTEXT index.
The CONTEXT index was built with the TRANSACTIONAL keyword in the PARAMETERS string.

Query-rewrite can speed up queries significantly, especially if there is a suitable CONTEXT index.

Text Path BNF

HasPathArg           ::=    LocationPath
                         |  EqualityExpr  
InPathArg            ::=    LocationPath 
LocationPath         ::=    RelativeLocationPath
                         |  AbsoluteLocationPath 
AbsoluteLocationPath ::=    ("/" RelativeLocationPath)
                         |  ("//" RelativeLocationPath) 
RelativeLocationPath ::=    Step
                         |  (RelativeLocationPath "/" Step)
                         |  (RelativeLocationPath "//" Step) 
Step                 ::=    ("@" NCName)
                         |  NCName
                         |  (NCName Predicate)
                         |  Dot
                         |  "*" 
Predicate            ::=    ("[" OrExp "]")
                         |  ("[" Digit+ "]") 
OrExpr               ::=    AndExpr
                         |  (OrExpr "or" AndExpr) 
AndExpr              ::=    BooleanExpr
                         |  (AndExpr "and" BooleanExpr) 
BooleanExpr          ::=    RelativeLocationPath
                         |  EqualityExpr
                         |  ("(" OrExpr ")")
                         |  ("not" "(" OrExpr ")") 
EqualityExpr         ::=    (RelativeLocationPath "=" Literal)
                         |  (Literal "=" RelativeLocationPath)
                         |  (RelativeLocationPath "=" Literal)
                         |  (Literal "!=" RelativeLocationPath)
                         |  (RelativeLocationPath "=" Literal)
                         |  (Literal "!=" RelativeLocationPath) 
Literal              ::=    (DoubleQuote [~"]* DoubleQuote)
                         |  (SingleQuote [~']* SingleQuote) 
NCName               ::=    (Letter |  Underscore) NCNameChar* 
NCNameChar           ::=    Letter
                         |  Digit
                         |  Dot
                         |  Dash
                         |  Underscore 
Letter               ::=    ([a-z] | [A-Z]) 
Digit                ::=    [0-9] 
Dot                  ::=    "." 
Dash                 ::=    "-" 
Underscore           ::=    "_"

Example Support

This section contains these topics:

Purchase Order po001.xml
Create Table Statements
An XML Schema for the Sample Data

Purchase Order po001.xml

<?xml version="1.0" encoding="UTF-8"?>
<purchaseOrder xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" 
xsi:noNamespaceSchemaLocation="xmlschema/po.xsd" orderDate="1999-10-20">
    <shipTo country="US">
        <name>Alice Smith</name>
        <street>123 Maple Street</street>
        <city>Mill Valley</city>
        <state>CA</state>
        <zip>90952</zip>
    </shipTo>
    <billTo country="US">
        <name>Robert Smith</name>
        <street>8 Oak Avenue</street>
        <city>Old Town</city>
        <state>PA</state>
        <zip>95819</zip>
    </billTo>
    <comment>Hurry, my lawn is going wild!</comment>
    <items>
        <item partNum="872-AA">
            <productName>Lawnmower</productName>
            <quantity>1</quantity>
            <USPrice>148.95</USPrice>
            <comment>Confirm this is electric</comment>
        </item>
        <item partNum="926-AA">
            <productName>Baby Monitor</productName>
            <quantity>1</quantity>
            <USPrice>39.98</USPrice>
            <shipDate>1999-05-21</shipDate>
        </item>
    </items>
</purchaseOrder>

Create Table Statements

Example 9-61 CREATE TABLE PURCHASE_ORDERS

CREATE TABLE PURCHASE_ORDERS (ID   NUMBER,
                              DOC  VARCHAR2(4000));
INSERT INTO PURCHASE_ORDERS  (ID,  DOC)
  VALUES (1,
          '<?xml version="1.0" encoding="UTF-8"?>
           <purchaseOrder xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
           xsi:noNamespaceSchemaLocation="xmlschema/po.xsd"
           orderDate="1999-10-20">
               <shipTo country="US">
                   <name>Alice Smith</name>
                   <street>123 Maple Street</street>
                   <city>Mill Valley</city>
                   <state>CA</state>
                   <zip>90952</zip>
               </shipTo>
               <billTo country="US">
                   <name>Robert Smith</name>
                   <street>8 Oak Avenue</street>
                   <city>Old Town</city>
                   <state>PA</state>
                   <zip>95819</zip>
               </billTo>
               <comment>Hurry, my lawn is going wild!</comment>
               <items>
                   <item partNum="872-AA">
                       <productName>Lawnmower</productName>
                       <quantity>1</quantity>
                       <USPrice>148.95</USPrice>
                       <comment>Confirm this is electric</comment>
                   </item>
                   <item partNum="926-AA">
                       <productName>Baby Monitor</productName>
                       <quantity>1</quantity>
                       <USPrice>39.98</USPrice>
                       <shipDate>1999-05-21</shipDate>
                   </item>
               </items>           </purchaseOrder>') ;COMMIT ;

Example 9-62 CREATE TABLE PURCHASE_ORDERS_xmltype

CREATE TABLE PURCHASE_ORDERS_xmltype (ID   NUMBER ,
                                      DOC  XMLType);
INSERT INTO PURCHASE_ORDERS_xmltype  (ID,  DOC)
  VALUES (1,
          XMLTYPE ('<?xml version="1.0" encoding="UTF-8"?>
                   <purchaseOrder
                   xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
                   xsi:noNamespaceSchemaLocation="po.xsd"
                   orderDate="1999-10-20">
                       <shipTo country="US">
                           <name>Alice Smith</name>
                           <street>123 Maple Street</street>
                           <city>Mill Valley</city>
                           <state>CA</state>
                           <zip>90952</zip>
                       </shipTo>
                       <billTo country="US">
                           <name>Robert Smith</name>
                           <street>8 Oak Avenue</street>
                           <city>Old Town</city>
                           <state>PA</state>
                           <zip>95819</zip>
                       </billTo>
                       <comment>Hurry, my lawn is going wild!</comment>
                       <items>
                           <item partNum="872-AA">
                               <productName>Lawnmower</productName>
                               <quantity>1</quantity>
                               <USPrice>148.95</USPrice>
                               <comment>Confirm this is electric</comment>
                           </item>
                           <item partNum="926-AA">
                               <productName>Baby Monitor</productName>
                               <quantity>1</quantity>
                               <USPrice>39.98</USPrice>
                               <shipDate>1999-05-21</shipDate>
                           </item>
                       </items>
                   </purchaseOrder>'));
COMMIT ;

Example 9-63 CREATE TABLE PURCHASE_ORDERS_xmltype_table

CREATE TABLE PURCHASE_ORDERS_xmltype_table OF XMLType;
INSERT INTO PURCHASE_ORDERS_xmltype_table
  VALUES (
    XMLTYPE ('<?xml version="1.0" encoding="UTF-8"?>             <purchaseOrder 
             xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
             xsi:noNamespaceSchemaLocation="xmlschema/po.xsd"
             orderDate="1999-10-20">
                 <shipTo country="US">
                     <name>Alice Smith</name>
                     <street>123 Maple Street</street>
                     <city>Mill Valley</city>
                     <state>CA</state>
                     <zip>90952</zip>
                 </shipTo>
                 <billTo country="US">
                     <name>Robert Smith</name>
                     <street>8 Oak Avenue</street>
                     <city>Old Town</city>
                     <state>PA</state>
                     <zip>95819</zip>
                 </billTo>
                 <comment>Hurry, my lawn is going wild!</comment>
                 <items>
                     <item partNum="872-AA">
                         <productName>Lawnmower</productName>
                         <quantity>1</quantity>
                         <USPrice>148.95</USPrice>
                         <comment>Confirm this is electric</comment>
                     </item>
                     <item partNum="926-AA">
                         <productName>Baby Monitor</productName>
                         <quantity>1</quantity>
                         <USPrice>39.98</USPrice>
                         <shipDate>1999-05-21</shipDate>
                     </item>
                 </items>
             </purchaseOrder>' ));
COMMIT ;

An XML Schema for the Sample Data

<?xml version="1.0" encoding="UTF-8"?>
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema">
   <xsd:annotation>
      <xsd:documentation xml:lang="en">
       Purchase order schema for Example.com.
       Copyright 2000 Example.com. All rights reserved.
      </xsd:documentation>
   </xsd:annotation>
   <xsd:element name="purchaseOrder" type="PurchaseOrderType"/>
   <xsd:element name="comment" type="xsd:string"/>
   <xsd:complexType name="PurchaseOrderType">
      <xsd:sequence>
         <xsd:element name="shipTo" type="USAddress"/>
         <xsd:element name="billTo" type="USAddress"/>
         <xsd:element ref="comment" minOccurs="0"/>
         <xsd:element name="items" type="Items"/>
      </xsd:sequence>
      <xsd:attribute name="orderDate" type="xsd:date"/>
   </xsd:complexType>
   <xsd:complexType name="USAddress">
      <xsd:sequence>
         <xsd:element name="name" type="xsd:string"/>
         <xsd:element name="street" type="xsd:string"/>
         <xsd:element name="city" type="xsd:string"/>
         <xsd:element name="state" type="xsd:string"/>
         <xsd:element name="zip" type="xsd:decimal"/>
      </xsd:sequence>
      <xsd:attribute name="country" type="xsd:NMTOKEN" fixed="US"/>
   </xsd:complexType>
   <xsd:complexType name="Items">
      <xsd:sequence>
         <xsd:element name="item" minOccurs="0" maxOccurs="unbounded">
            <xsd:complexType>
               <xsd:sequence>
                  <xsd:element name="productName" type="xsd:string"/>
                  <xsd:element name="quantity">
                     <xsd:simpleType>
                        <xsd:restriction base="xsd:positiveInteger">
                           <xsd:maxExclusive value="100"/>
                        </xsd:restriction>
                     </xsd:simpleType>
                  </xsd:element>
                  <xsd:element name="USPrice" type="xsd:decimal"/>
                  <xsd:element ref="comment" minOccurs="0"/>
                  <xsd:element name="shipDate" type="xsd:date" minOccurs="0"/>
               </xsd:sequence>
               <xsd:attribute name="partNum" type="SKU" use="required"/>
            </xsd:complexType>
         </xsd:element>
      </xsd:sequence>
   </xsd:complexType>
   <!-- Stock Keeping Unit, a code for identifying products -->
   <xsd:simpleType name="SKU">
      <xsd:restriction base="xsd:string">
         <xsd:pattern value="\d{3}-[A-Z]{2}"/>
      </xsd:restriction>
   </xsd:simpleType>
</xsd:schema>