Oracle® Text Reference 11g Release 1 (11.1) Part Number B28304-01
This chapter describes the CTX_DOC PL/SQL package for requesting document services, such as highlighting extracted text or generating a list of themes for a document.
Many of these procedures exist in two versions: those that make use of indexes, and those that do not. Those that do not are called "policy-based" procedures. They are offered because there are times when you may want to use document services on a single document without creating a Context index in advance. Policy-based procedures enable you to do this.
The policy_* procedures mirror the conventional in-memory document services. They take policy_name in place of index_name, and a document of type VARCHAR2, CLOB, BLOB, or BFILE in place of textkey. Thus, you need not create an index to obtain document services output with these procedures.
For the procedures that generate character offsets and lengths, such as HIGHLIGHT and TOKENS, Oracle Text follows UCS-2 codepoint semantics.
The CTX_DOC package includes the following procedures and functions:
Name | Description |
---|---|
FILTER | Generates a plain text or HTML version of a document. |
GIST | Generates a Gist or theme summaries for a document. |
HIGHLIGHT | Generates plain text or HTML highlighting offset information for a document. |
IFILTER | Generates a plain text version of binary data. Can be called from a USER_DATASTORE procedure. |
MARKUP | Generates a plain text or HTML version of a document with query terms highlighted. |
PKENCODE | Encodes a composite textkey string (value) for use in other CTX_DOC procedures. |
POLICY_FILTER | Generates a plain text or HTML version of a document, without requiring an index. |
POLICY_GIST | Generates a Gist or theme summaries for a document, without requiring an index. |
POLICY_HIGHLIGHT | Generates plain text or HTML highlighting offset information for a document, without requiring an index. |
POLICY_MARKUP | Generates a plain text or HTML version of a document with query terms highlighted, without requiring an index. |
POLICY_SNIPPET | Generates a concordance for a document, based on query terms, without requiring an index. |
POLICY_THEMES | Generates a list of themes for a document, without requiring an index. |
POLICY_TOKENS | Generates all index tokens for a document, without requiring an index. |
SET_KEY_TYPE | Sets CTX_DOC procedures to accept rowid or primary key document identifiers. |
SNIPPET | Generates a concordance for a document, based on query terms. |
THEMES | Generates a list of themes for a document. |
TOKENS | Generates all index tokens for a document. |
Use the CTX_DOC.FILTER procedure to generate either a plain text or HTML version of a document. You can store the rendered document in either a result table or in memory. This procedure is generally called after a query, from which you identify the document to be filtered.
Note:
The resultant HTML document does not include graphics.
Syntax 1: In-memory Result Storage
exec CTX_DOC.FILTER(
  index_name IN VARCHAR2,
  textkey IN VARCHAR2,
  restab IN OUT NOCOPY CLOB,
  plaintext IN BOOLEAN DEFAULT FALSE);
Syntax 2: Result Table Storage
exec CTX_DOC.FILTER(
  index_name IN VARCHAR2,
  textkey IN VARCHAR2,
  restab IN VARCHAR2,
  query_id IN NUMBER DEFAULT 0,
  plaintext IN BOOLEAN DEFAULT FALSE);
Specify the name of the index associated with the text column containing the document identified by textkey.
Specify the unique identifier (usually the primary key) for the document.
The textkey parameter can be as follows:
a single column primary key value
an encoded specification for a composite (multiple column) primary key. Use CTX_DOC.PKENCODE to encode the composite textkey.
the rowid of the row containing the document
Toggle between primary key and rowid identification using CTX_DOC.SET_KEY_TYPE.
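As a minimal sketch of toggling the key type, the following block switches to rowid identification, filters one document to plain text, and then switches back. The table name news and its id column are hypothetical names introduced for illustration; newsindex is assumed to be a CONTEXT index on that table.

```sql
declare
  v_rowid varchar2(18);
  v_text  clob;
begin
  -- NEWS and its ID column are hypothetical names used for illustration
  select rowidtochar(rowid) into v_rowid from news where id = 20;

  -- accept rowids as document identifiers for subsequent CTX_DOC calls
  ctx_doc.set_key_type('ROWID');
  ctx_doc.filter('newsindex', v_rowid, v_text, TRUE);
  dbms_output.put_line(dbms_lob.substr(v_text, 40, 1));
  dbms_lob.freetemporary(v_text);

  -- switch back to primary key identification
  ctx_doc.set_key_type('PRIMARY_KEY');
end;
/
```

Restoring the key type at the end is a cautious choice, so that later calls in the same session are not affected by the rowid setting.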
You can specify that this procedure store the filtered document to either a table or to an in-memory CLOB.
To store results to a table, specify the name of the table. The table to which you want to store results must exist before you make this call.
See Also:
"Filter Table" in Appendix A, "Oracle Text Result Tables" for more information about the structure of the filter result table
To store results in memory, specify the name of the CLOB locator. If restab is NULL, then a temporary CLOB is allocated and returned. After using the locator, you must de-allocate it with DBMS_LOB.FREETEMPORARY().
If restab is not NULL, then the CLOB is truncated before the operation.
Specify an identifier to use to identify the row inserted into restab.
When query_id is not specified or set to NULL, it defaults to 0. You must manually truncate the table specified in restab.
Specify TRUE to generate a plaintext version of the document. Specify FALSE to generate an HTML version of the document if you are using the AUTO_FILTER filter or indexing HTML documents.
Example
The following code shows how to filter a document to HTML in memory.
declare
  mklob clob;
  amt number := 40;
  line varchar2(80);
begin
  ctx_doc.filter('myindex','1', mklob, FALSE);
  -- mklob is NULL when passed-in, so ctx_doc.filter will allocate a temporary
  -- CLOB for us and place the results there.
  dbms_lob.read(mklob, amt, 1, line);
  dbms_output.put_line('FIRST 40 CHARS ARE:'||line);
  -- have to de-allocate the temp lob
  dbms_lob.freetemporary(mklob);
end;
Create the filter result table to store the filtered document as follows:
create table filtertab (query_id number, document clob);
To obtain a plaintext version of document with textkey 20, enter the following statement:
begin ctx_doc.filter('newsindex', '20', 'filtertab', '0', TRUE); end;
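Because the procedure does not clear the result table, it can help to read the row back and then remove it before the same query_id is reused. The following statements are a sketch against the filtertab table created above; the 200-character preview length is an arbitrary choice.

```sql
-- show the first 200 characters of the filtered document stored under query_id 0
select dbms_lob.substr(document, 200, 1) as first_chars
  from filtertab
 where query_id = 0;

-- earlier rows are not removed automatically, so delete them before reusing the id
delete from filtertab where query_id = 0;
commit;
```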
Use the CTX_DOC.GIST procedure to generate gist and theme summaries for a document. You can generate paragraph-level or sentence-level gists or theme summaries.
Note:
CTX_DOC.GIST requires an installed knowledge base. A knowledge base may or may not have been installed with Oracle Text. For more information on knowledge bases, see the Oracle Text Application Developer's Guide.
Syntax 1: In-Memory Storage
CTX_DOC.GIST(
  index_name IN VARCHAR2,
  textkey IN VARCHAR2,
  restab IN OUT CLOB,
  glevel IN VARCHAR2 DEFAULT 'P',
  pov IN VARCHAR2 DEFAULT 'GENERIC',
  numParagraphs IN NUMBER DEFAULT 16,
  maxPercent IN NUMBER DEFAULT 10,
  num_themes IN NUMBER DEFAULT 50);
Syntax 2: Result Table Storage
CTX_DOC.GIST(
  index_name IN VARCHAR2,
  textkey IN VARCHAR2,
  restab IN VARCHAR2,
  query_id IN NUMBER DEFAULT 0,
  glevel IN VARCHAR2 DEFAULT 'P',
  pov IN VARCHAR2 DEFAULT NULL,
  numParagraphs IN NUMBER DEFAULT 16,
  maxPercent IN NUMBER DEFAULT 10,
  num_themes IN NUMBER DEFAULT 50);
Specify the name of the index associated with the text column containing the document identified by textkey.
Specify the unique identifier (usually the primary key) for the document.
The textkey parameter can be as follows:
a single column primary key value
an encoded specification for a composite (multiple column) primary key. To encode a composite textkey, use the CTX_DOC.PKENCODE procedure
the rowid of the row containing the document
Toggle between primary key and rowid identification using CTX_DOC.SET_KEY_TYPE.
Specify that this procedure store the gist and theme summaries to either a table or to an in-memory CLOB.
To store results to a table, specify the name of an existing table.
To store results in memory, specify the name of the CLOB locator. If restab is NULL, then a temporary CLOB is allocated and returned. You must de-allocate the locator after using it.
If restab is not NULL, then the CLOB is truncated before the operation.
Specify an identifier to use to identify the row(s) inserted into restab.
Specify the type of gist or theme summary to produce. The possible values are:
P for paragraph
S for sentence
The default is P.
Specify whether a gist or a single theme summary is generated. The type of gist or theme summary generated (sentence-level or paragraph-level) depends on the value specified for glevel.
To generate a gist for the entire document, specify a value of 'GENERIC' for pov. To generate a theme summary for a single theme in a document, specify the theme as the value for pov.
When using result table storage, if you do not specify a value for pov, then this procedure returns the generic gist plus up to 50 theme summaries for the document.
When using in-memory result storage to a CLOB, specify a pov. If you do not, then this procedure generates only a generic gist for the document.
Note:
The pov parameter is case sensitive. To return a gist for a document, specify 'GENERIC' in all uppercase. To return a theme summary, specify the theme exactly as it is generated for the document.
Only the themes generated by THEMES for a document can be used as input for pov.
Specify the maximum number of document paragraphs (or sentences) selected for the document gist or theme summaries. The default is 16.
Note:
The numParagraphs parameter is used only when it yields a smaller gist or theme summary size than the size yielded by the maxPercent parameter.
This means that the system always returns the smaller of the two gist or theme summary sizes.
Specify the maximum number of document paragraphs (or sentences) selected for the document gist or theme summaries as a percentage of the total paragraphs (or sentences) in the document. The default is 10.
Note:
The maxPercent parameter is used only when it yields a smaller gist or theme summary size than the size yielded by the numParagraphs parameter.
This means that the system always returns the smaller of the two gist or theme summary sizes.
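The interaction between numParagraphs and maxPercent can be worked through with simple arithmetic. The block below is only an illustration of the "smaller limit wins" rule for a hypothetical 200-paragraph document with the default settings; how Oracle Text rounds the percentage is not stated here, so rounding down is an assumption.

```sql
set serveroutput on
declare
  total_paragraphs number := 200;  -- hypothetical document size
  num_paragraphs   number := 16;   -- default numParagraphs
  max_percent      number := 10;   -- default maxPercent
  by_percent       number;
begin
  -- paragraphs allowed by maxPercent (rounding down is an assumption)
  by_percent := floor(total_paragraphs * max_percent / 100);
  dbms_output.put_line('limit from maxPercent: ' || by_percent);
  dbms_output.put_line('effective limit: ' || least(num_paragraphs, by_percent));
end;
/
```

For this document, maxPercent allows 20 paragraphs, so the numParagraphs limit of 16 governs. For a 100-paragraph document the percentage limit would be 10, and maxPercent would govern instead.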
Specify the number of theme summaries to produce when you do not specify a value for pov. For example, if you specify 10, this procedure returns the top 10 theme summaries. The default is 50.
If you specify 0 or NULL, then this procedure returns all themes in a document. If the document contains more than 50 themes, only the top 50 themes show conceptual hierarchy.
Examples
In-Memory Gist
The following example generates a non-default size generic gist of at most 10 paragraphs. The result is stored in memory in a CLOB locator. The code then de-allocates the returned CLOB locator after using it.
set serveroutput on;
declare
  gklob clob;
  amt number := 40;
  line varchar2(80);
begin
  ctx_doc.gist('newsindex','34',gklob, pov => 'GENERIC',numParagraphs => 10);
  -- gklob is NULL when passed-in, so ctx_doc.gist will allocate a temporary
  -- CLOB for us and place the results there.
  dbms_lob.read(gklob, amt, 1, line);
  dbms_output.put_line('FIRST 40 CHARS ARE:'||line);
  -- have to de-allocate the temp lob
  dbms_lob.freetemporary(gklob);
end;
Result Table Gists
The following example creates a gist table called CTX_GIST:
create table CTX_GIST (query_id number, pov varchar2(80), gist CLOB);
The following example returns a default sized paragraph-level gist for document 34 as well as the top 10 theme summaries in the document:
begin ctx_doc.gist('newsindex','34','CTX_GIST', 1, num_themes=>10); end;
The following example generates a non-default size gist of at most 10 paragraphs:
begin ctx_doc.gist('newsindex','34','CTX_GIST',1,pov =>'GENERIC',numParagraphs=>10); end;
The following example generates a gist whose number of paragraphs is at most 10 percent of the total paragraphs in document:
begin ctx_doc.gist('newsindex','34','CTX_GIST',1,pov => 'GENERIC', maxPercent => 10); end;
Theme Summary
The following example returns a paragraph-level theme summary for insects for document 34. The default theme summary size is returned.
begin ctx_doc.gist('newsindex','34','CTX_GIST',1, pov => 'insects'); end;
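To inspect what the calls above stored, you can query the gist table by query_id. This is a sketch against the CTX_GIST table defined earlier; the 80-character preview length is arbitrary, and the exact pov values returned depend on the themes generated for the document.

```sql
-- one row per gist or theme summary stored under query_id 1
select pov,
       dbms_lob.substr(gist, 80, 1) as gist_start
  from ctx_gist
 where query_id = 1
 order by pov;
```

Because every example above uses query_id 1 and the table is not cleared automatically, rows from earlier calls remain until you delete them.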
Use the CTX_DOC.HIGHLIGHT procedure to generate highlight offsets for a document. The offset information is generated for the terms in the document that satisfy the query you specify. These highlighted terms are either the words that satisfy a word query or the themes that satisfy an ABOUT query.
You can generate highlight offsets for either plaintext or HTML versions of the document. The table returned by CTX_DOC.HIGHLIGHT does not include any graphics found in the original document. Apply the offset information to the same documents filtered with CTX_DOC.FILTER.
You usually call this procedure after a query, from which you identify the document to be processed.
You can store the highlight offsets to either an in-memory PL/SQL table or a result table.
See CTX_DOC.POLICY_HIGHLIGHT for a version of this procedure that does not require an index.
Syntax 1: In-Memory Result Storage
exec CTX_DOC.HIGHLIGHT(
  index_name IN VARCHAR2,
  textkey IN VARCHAR2,
  text_query IN VARCHAR2,
  restab IN OUT NOCOPY HIGHLIGHT_TAB,
  plaintext IN BOOLEAN DEFAULT FALSE);
exec CTX_DOC.HIGHLIGHT_CLOB_QUERY(
  index_name IN VARCHAR2,
  textkey IN VARCHAR2,
  text_query IN CLOB,
  restab IN OUT NOCOPY HIGHLIGHT_TAB,
  plaintext IN BOOLEAN DEFAULT FALSE);
Syntax 2: Result Table Storage
exec CTX_DOC.HIGHLIGHT(
  index_name IN VARCHAR2,
  textkey IN VARCHAR2,
  text_query IN VARCHAR2,
  restab IN VARCHAR2,
  query_id IN NUMBER DEFAULT 0,
  plaintext IN BOOLEAN DEFAULT FALSE);
exec CTX_DOC.HIGHLIGHT_CLOB_QUERY(
  index_name IN VARCHAR2,
  textkey IN VARCHAR2,
  text_query IN CLOB,
  restab IN VARCHAR2,
  query_id IN NUMBER DEFAULT 0,
  plaintext IN BOOLEAN DEFAULT FALSE);
Specify the name of the index associated with the text column containing the document identified by textkey.
Specify the unique identifier (usually the primary key) for the document.
The textkey parameter can be as follows:
a single column primary key value
encoded specification for a composite (multiple column) primary key. Use the CTX_DOC.PKENCODE procedure.
the rowid of the row containing the document
Toggle between primary key and rowid identification using CTX_DOC.SET_KEY_TYPE.
Specify the original query expression used to retrieve the document. If NULL, no highlights are generated.
If text_query includes wildcards, stemming, or fuzzy matching that results in stopwords being returned, HIGHLIGHT does not highlight the stopwords.
If text_query contains the threshold operator, the operator is ignored. The HIGHLIGHT procedure always returns highlight information for the entire result set.
You can specify that this procedure store highlight offsets to either a table or to an in-memory PL/SQL table.
To store results to a table, specify the name of the table. The table must exist before you call this procedure.
See Also:
"Highlight Table" in Appendix A, "Oracle Text Result Tables" for more information about the structure of the highlight result table.
To store results to an in-memory table, specify the name of the in-memory table of type CTX_DOC.HIGHLIGHT_TAB. The HIGHLIGHT_TAB datatype is defined as follows:
type highlight_rec is record (
  offset number,
  length number
);
type highlight_tab is table of highlight_rec index by binary_integer;
CTX_DOC.HIGHLIGHT clears HIGHLIGHT_TAB before the operation.
Specify the identifier used to identify the row inserted into restab.
When query_id is not specified or set to NULL, it defaults to 0. You must manually truncate the table specified in restab.
Specify TRUE to generate plaintext offsets of the document.
Specify FALSE to generate HTML offsets of the document if you are using the AUTO_FILTER filter or indexing HTML documents.
Examples
Create the highlight table to store the highlight offset information:
create table hightab(query_id number, offset number, length number);
Word Highlight Offsets
To obtain HTML highlight offset information for document 20 for the word dog:
begin ctx_doc.highlight('newsindex', '20', 'dog', 'hightab', 0, FALSE); end;
Assuming the index newsindex has a theme component, obtain HTML highlight offset information for the theme query of politics by issuing the following query:
begin ctx_doc.highlight('newsindex', '20', 'about(politics)', 'hightab', 0, FALSE); end;
The output of this statement is the set of offsets to highlighted words and phrases that represent the theme of politics in the document.
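The examples above use result table storage. As a sketch of the in-memory form, the block below stores offsets in a CTX_DOC.HIGHLIGHT_TAB variable and applies them to the plaintext version of the same document produced by CTX_DOC.FILTER. It assumes that offsets are 1-based character positions, which is what DBMS_LOB.SUBSTR expects; the variable names are illustrative only.

```sql
set serveroutput on
declare
  v_text clob;
  hl     ctx_doc.highlight_tab;
  i      binary_integer;
begin
  -- plaintext version of the document and matching plaintext offsets
  ctx_doc.filter('newsindex', '20', v_text, TRUE);
  ctx_doc.highlight('newsindex', '20', 'dog', hl, TRUE);

  -- walk the index-by table without assuming its first subscript
  i := hl.first;
  while i is not null loop
    dbms_output.put_line(
      'offset ' || hl(i).offset || ', length ' || hl(i).length || ': ' ||
      -- assumes offsets are 1-based character positions
      dbms_lob.substr(v_text, hl(i).length, hl(i).offset));
    i := hl.next(i);
  end loop;

  dbms_lob.freetemporary(v_text);
end;
/
```

Both calls pass TRUE for plaintext so that the offsets line up with the filtered text; mixing plaintext offsets with an HTML rendering would point at the wrong characters.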
Restrictions
CTX_DOC.HIGHLIGHT does not support the use of query templates.
Use the CTX_DOC.IFILTER procedure to filter binary data to text.
This procedure takes binary data (BLOB IN), filters the data with the AUTO_FILTER filter, and writes the text version to a CLOB. (Any graphics in the original document are ignored.) CTX_DOC.IFILTER employs the safe callout and, unlike CTX_DOC.FILTER, does not require an index.
Note:
This procedure will not be supported in future releases. Applications should use CTX_DOC.POLICY_FILTER instead.
Requirements
Because CTX_DOC.IFILTER employs the safe callout mechanism, the SQL*Net listener must be running and configured for extproc agent startup.
Syntax
CTX_DOC.IFILTER(data IN BLOB, text IN OUT NOCOPY CLOB);
Specify the binary data to be filtered.
Specify the destination CLOB. The filtered data is placed here. This parameter must be a valid CLOB locator that is writable. Passing NULL or a non-writable CLOB will result in an error. Filtered text will be appended to the end of existing content, if any.
Example
The document text used in a MATCHES query can be VARCHAR2 or CLOB. It does not accept BLOB input, so you cannot match filtered documents directly. Instead, you must filter the binary content to CLOB using the AUTO_FILTER filter. Assuming the document data is in bind variable :doc_blob:
declare
  doc_text clob;
begin
  -- create a temporary CLOB to hold the document text
  -- (createtemporary is a procedure, so it is called rather than assigned)
  dbms_lob.createtemporary(doc_text, TRUE, DBMS_LOB.SESSION);
  -- call ctx_doc.ifilter to filter the BLOB to CLOB data
  ctx_doc.ifilter(:doc_blob, doc_text);
  -- now do the matches query using the CLOB version
  for c1 in (select * from queries where matches(query_string, doc_text)>0)
  loop
    -- do what you need to do here
    null;
  end loop;
  dbms_lob.freetemporary(doc_text);
end;
The CTX_DOC.MARKUP procedure takes a query specification and a document textkey and returns a version of the document in which the query terms are marked up. These marked-up terms are either the words that satisfy a word query or the themes that satisfy an ABOUT query.
You can set the marked-up output to be either plaintext or HTML. The marked-up document returned by CTX_DOC.MARKUP does not include any graphics found in the original document.
You can use one of the pre-defined tag sets for marking highlighted terms, including a tag sequence that enables HTML navigation.
You usually call CTX_DOC.MARKUP after a query, from which you identify the document to be processed.
You can store the marked-up document either in memory or in a result table.
See CTX_DOC.POLICY_MARKUP for a version of this procedure that does not require an index.
Note:
Oracle Text does not guarantee well-formed output from CTX_DOC.MARKUP, especially for terms that are already marked up with HTML or XML. In particular, unexpected nesting of markup tags may occasionally result.
Syntax 1: In-Memory Result Storage
exec CTX_DOC.MARKUP(
  index_name IN VARCHAR2,
  textkey IN VARCHAR2,
  text_query IN VARCHAR2,
  restab IN OUT NOCOPY CLOB,
  plaintext IN BOOLEAN DEFAULT FALSE,
  tagset IN VARCHAR2 DEFAULT 'TEXT_DEFAULT',
  starttag IN VARCHAR2 DEFAULT NULL,
  endtag IN VARCHAR2 DEFAULT NULL,
  prevtag IN VARCHAR2 DEFAULT NULL,
  nexttag IN VARCHAR2 DEFAULT NULL);
exec CTX_DOC.MARKUP_CLOB_QUERY(
  index_name IN VARCHAR2,
  textkey IN VARCHAR2,
  text_query IN CLOB,
  restab IN OUT NOCOPY CLOB,
  plaintext IN BOOLEAN DEFAULT FALSE,
  tagset IN VARCHAR2 DEFAULT 'TEXT_DEFAULT',
  starttag IN VARCHAR2 DEFAULT NULL,
  endtag IN VARCHAR2 DEFAULT NULL,
  prevtag IN VARCHAR2 DEFAULT NULL,
  nexttag IN VARCHAR2 DEFAULT NULL);
Syntax 2: Result Table Storage
exec CTX_DOC.MARKUP(
  index_name IN VARCHAR2,
  textkey IN VARCHAR2,
  text_query IN VARCHAR2,
  restab IN VARCHAR2,
  query_id IN NUMBER DEFAULT 0,
  plaintext IN BOOLEAN DEFAULT FALSE,
  tagset IN VARCHAR2 DEFAULT 'TEXT_DEFAULT',
  starttag IN VARCHAR2 DEFAULT NULL,
  endtag IN VARCHAR2 DEFAULT NULL,
  prevtag IN VARCHAR2 DEFAULT NULL,
  nexttag IN VARCHAR2 DEFAULT NULL);
exec CTX_DOC.MARKUP_CLOB_QUERY(
  index_name IN VARCHAR2,
  textkey IN VARCHAR2,
  text_query IN CLOB,
  restab IN VARCHAR2,
  query_id IN NUMBER DEFAULT 0,
  plaintext IN BOOLEAN DEFAULT FALSE,
  tagset IN VARCHAR2 DEFAULT 'TEXT_DEFAULT',
  starttag IN VARCHAR2 DEFAULT NULL,
  endtag IN VARCHAR2 DEFAULT NULL,
  prevtag IN VARCHAR2 DEFAULT NULL,
  nexttag IN VARCHAR2 DEFAULT NULL);
Specify the name of the index associated with the text column containing the document identified by textkey.
Specify the unique identifier (usually the primary key) for the document.
The textkey parameter can be as follows:
A single column primary key value
Encoded specification for a composite (multiple column) primary key. Use the CTX_DOC.PKENCODE procedure.
The rowid of the row containing the document
Toggle between primary key and rowid identification using CTX_DOC.SET_KEY_TYPE.
Specify the original query expression used to retrieve the document.
If text_query includes wildcards, stemming, or fuzzy matching that results in stopwords being returned, MARKUP does not highlight the stopwords.
If text_query contains the threshold operator, the operator is ignored. The MARKUP procedure always returns highlight information for the entire result set.
You can specify that this procedure store the marked-up text to either a table or to an in-memory CLOB.
To store results to a table, specify the name of the table. The result table must exist before you call this procedure.
See Also:
For more information about the structure of the markup result table, see "Markup Table" in Appendix A, "Oracle Text Result Tables".
To store results in memory, specify the name of the CLOB locator. If restab is NULL, a temporary CLOB is allocated and returned. You must de-allocate the locator after using it.
If restab is not NULL, the CLOB is truncated before the operation.
Specify the identifier used to identify the row inserted into restab.
When query_id is not specified or set to NULL, it defaults to 0. You must manually truncate the table specified in restab.
Specify TRUE to generate a plaintext marked-up document. Specify FALSE to generate a marked-up HTML version of the document if you are using the AUTO_FILTER filter or indexing HTML documents.
Specify one of the following pre-defined tag sets. The second and third columns show how the four different tags are defined for each tagset:
Tagset | Tag | Tag Value |
---|---|---|
TEXT_DEFAULT | starttag | <<< |
TEXT_DEFAULT | endtag | >>> |
TEXT_DEFAULT | prevtag | |
TEXT_DEFAULT | nexttag | |
HTML_DEFAULT | starttag | <B> |
HTML_DEFAULT | endtag | </B> |
HTML_DEFAULT | prevtag | |
HTML_DEFAULT | nexttag | |
HTML_NAVIGATE | starttag | <A NAME=ctx%CURNUM><B> |
HTML_NAVIGATE | endtag | </B></A> |
HTML_NAVIGATE | prevtag | <A HREF=#ctx%PREVNUM><</A> |
HTML_NAVIGATE | nexttag | <A HREF=#ctx%NEXTNUM>></A> |
Specify the character(s) inserted by MARKUP to indicate the start of a highlighted term.
The sequence of starttag, endtag, prevtag and nexttag with respect to the highlighted word is as follows:
... prevtag starttag word endtag nexttag...
Specify the character(s) inserted by MARKUP to indicate the end of a highlighted term.
Specify the markup sequence that defines the tag that navigates the user to the previous highlight.
In the markup sequences prevtag and nexttag, you can specify the following offset variables which are set dynamically:
Offset Variable | Value |
---|---|
%CURNUM | the current offset number |
%PREVNUM | the previous offset number |
%NEXTNUM | the next offset number |
See the description of the HTML_NAVIGATE tag set for an example.
Specify the markup sequence that defines the tag that navigates the user to the next highlight tag.
Within the markup sequence, you can use the same offset variables you use for prevtag. See the explanation for prevtag and the HTML_NAVIGATE tag set for an example.
Examples
In-Memory Markup
The following code takes a document (the dog chases the cat), performs the assigned markup on it, and stores the result in memory.
set serveroutput on
drop table mark_tab;
create table mark_tab (id number primary key, text varchar2(80) );
insert into mark_tab values ('1', 'The dog chases the cat.');
create index mark_tab_idx on mark_tab(text)
  indextype is ctxsys.context
  parameters ('filter ctxsys.null_filter');
declare
  mklob clob;
  amt number := 40;
  line varchar2(80);
begin
  ctx_doc.markup('mark_tab_idx','1','dog AND cat', mklob);
  -- mklob is NULL when passed-in, so ctx_doc.markup will
  -- allocate a temporary CLOB for us and place the results there.
  dbms_lob.read(mklob, amt, 1, line);
  dbms_output.put_line('FIRST 40 CHARS ARE:'||line);
  -- have to de-allocate the temp lob
  dbms_lob.freetemporary(mklob);
end;
/
The output from this example shows what the marked-up document looks like:
FIRST 40 CHARS ARE: The <<<dog>>> chases the <<<cat>>>.
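The same document can be marked up with your own delimiters by passing starttag and endtag. The block below is a sketch that reuses the mark_tab_idx index from the example above; the [[ and ]] delimiters are arbitrary, and it assumes that explicitly supplied tags take precedence over the values defined by the tagset.

```sql
set serveroutput on
declare
  mklob clob;
begin
  -- named notation makes it clear which optional tag parameters are set
  ctx_doc.markup(index_name => 'mark_tab_idx',
                 textkey    => '1',
                 text_query => 'dog AND cat',
                 restab     => mklob,
                 plaintext  => TRUE,
                 starttag   => '[[',
                 endtag     => ']]');
  dbms_output.put_line(dbms_lob.substr(mklob, 80, 1));
  -- mklob was allocated by ctx_doc.markup, so free it here
  dbms_lob.freetemporary(mklob);
end;
/
```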
Markup Table
Create the highlight markup table to store the marked-up document as follows:
create table markuptab (query_id number, document clob);
You can also store your MARKUP results in a table. To create HTML highlight markup for the words dog or cat for document 23, enter the following statement:
begin ctx_doc.markup(index_name => 'my_index', textkey => '23', text_query => 'dog|cat', restab => 'markuptab', query_id => '1', tagset => 'HTML_DEFAULT'); end;
To create HTML highlight markup for the theme of politics for document 23, enter the following statement:
begin ctx_doc.markup(index_name => 'my_index', textkey => '23', text_query => 'about(politics)', restab => 'markuptab', query_id => '1', tagset => 'HTML_DEFAULT'); end;
Restrictions
CTX_DOC.MARKUP does not support the use of query templates.
The CTX_DOC.PKENCODE function converts a composite textkey list into a single string and returns the string.
The string created by PKENCODE can be used as the primary key parameter textkey in other CTX_DOC procedures, such as CTX_DOC.THEMES and CTX_DOC.GIST.
Syntax
CTX_DOC.PKENCODE(
  pk1 IN VARCHAR2,
  pk2 IN VARCHAR2 DEFAULT NULL,
  pk3 IN VARCHAR2 DEFAULT NULL,
  pk4 IN VARCHAR2 DEFAULT NULL,
  pk5 IN VARCHAR2 DEFAULT NULL,
  pk6 IN VARCHAR2 DEFAULT NULL,
  pk7 IN VARCHAR2 DEFAULT NULL,
  pk8 IN VARCHAR2 DEFAULT NULL,
  pk9 IN VARCHAR2 DEFAULT NULL,
  pk10 IN VARCHAR2 DEFAULT NULL,
  pk11 IN VARCHAR2 DEFAULT NULL,
  pk12 IN VARCHAR2 DEFAULT NULL,
  pk13 IN VARCHAR2 DEFAULT NULL,
  pk14 IN VARCHAR2 DEFAULT NULL,
  pk15 IN VARCHAR2 DEFAULT NULL,
  pk16 IN VARCHAR2 DEFAULT NULL)
RETURN VARCHAR2;
Each PK argument specifies a column element in the composite textkey list. You can encode at most 16 column elements.
Returns
String that represents the encoded value of the composite textkey.
Example
begin ctx_doc.gist('newsindex',CTX_DOC.PKENCODE('smith', 14), 'CTX_GIST'); end;
In this example, smith and 14 constitute the composite textkey value for the document.
The CTX_DOC.POLICY_FILTER procedure generates a plain text or an HTML version of a document. With this procedure, no CONTEXT index is required.
This procedure uses a trusted callout.
Syntax
ctx_doc.policy_filter(
  policy_name in VARCHAR2,
  document in [VARCHAR2|CLOB|BLOB|BFILE],
  restab in out nocopy CLOB,
  plaintext in BOOLEAN default FALSE,
  language in VARCHAR2 default NULL,
  format in VARCHAR2 default NULL,
  charset in VARCHAR2 default NULL);
Specify the policy name created with CTX_DDL.CREATE_POLICY.
Specify the document to filter.
Specify the name of the CLOB locator.
Specify TRUE to generate a plaintext version of the document. Specify FALSE to generate an HTML version of the document if you are using the AUTO_FILTER filter or indexing HTML documents.
Specify the language of the document. Use an Oracle Text supported language value as you would in the language column of the base table. See BASIC_LEXER in Chapter 2, "Oracle Text Indexing Elements".
Specify the format of the document. Use an Oracle Text supported format value, either TEXT, BINARY or IGNORE as you would specify in the format column of the base table. For more information, see the format column description in CREATE INDEX in Chapter 1, "Oracle Text SQL Statements and Operators".
Specify the character set of the document. Use an Oracle Text supported value as you would specify in the charset column of the base table. See Indexing Mixed-Character Set Columns in Chapter 2, "Oracle Text Indexing Elements".
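As a minimal sketch of index-free filtering, the following creates a policy and filters a short VARCHAR2 document to plain text. The policy name my_policy and the sample text are hypothetical. The result CLOB is allocated explicitly here as a cautious choice, rather than relying on how the procedure treats a NULL locator.

```sql
set serveroutput on
begin
  -- MY_POLICY is a hypothetical policy name; NULL_FILTER suits plain text input
  ctx_ddl.create_policy(policy_name => 'my_policy',
                        filter      => 'ctxsys.null_filter');
end;
/

declare
  v_doc varchar2(200) := 'The dog chases the cat.';
  v_out clob;
begin
  -- allocate the result CLOB explicitly rather than relying on a NULL locator
  dbms_lob.createtemporary(v_out, TRUE, dbms_lob.session);
  ctx_doc.policy_filter('my_policy', v_doc, v_out, TRUE);
  dbms_output.put_line(dbms_lob.substr(v_out, 80, 1));
  dbms_lob.freetemporary(v_out);
end;
/
```

For binary documents, the document argument would instead be a BLOB or BFILE and the policy would typically use the AUTO_FILTER filter.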
The CTX_DOC.POLICY_GIST procedure generates a Gist or theme summary for a document. You can generate paragraph-level or sentence-level gists or theme summaries. With this procedure, no CONTEXT index is required.
Note:
CTX_DOC.POLICY_GIST requires an installed knowledge base. A knowledge base may or may not have been installed with Oracle Text. For more information on knowledge bases, see the Oracle Text Application Developer's Guide.
Syntax
ctx_doc.policy_gist(
  policy_name in VARCHAR2,
  document in [VARCHAR2|CLOB|BLOB|BFILE],
  restab in out nocopy CLOB,
  glevel in VARCHAR2 default 'P',
  pov in VARCHAR2 default 'GENERIC',
  numParagraphs in VARCHAR2 default NULL,
  maxPercent in NUMBER default NULL,
  num_themes in NUMBER default 50,
  language in VARCHAR2 default NULL,
  format in VARCHAR2 default NULL,
  charset in VARCHAR2 default NULL);
Specify the policy name created with CTX_DDL.CREATE_POLICY.
Specify the document for which to generate the Gist or theme summary.
Specify the name of the CLOB locator.
Specify the type of gist or theme summary to produce. The possible values are:
P for paragraph
S for sentence
The default is P.
Specify whether a gist or a single theme summary is generated. The type of gist or theme summary generated (sentence-level or paragraph-level) depends on the value specified for glevel.
To generate a gist for the entire document, specify a value of 'GENERIC' for pov. To generate a theme summary for a single theme in a document, specify the theme as the value for pov.
When using result table storage and you do not specify a value for pov, this procedure returns the generic gist plus up to 50 theme summaries for the document.
Note:
The pov parameter is case sensitive. To return a gist for a document, specify 'GENERIC' in all uppercase. To return a theme summary, specify the theme exactly as it is generated for the document.
Only the themes generated by THEMES for a document can be used as input for pov.
Specify the maximum number of document paragraphs (or sentences) selected for the document gist or theme summaries. The default is 16.
Note:
The numParagraphs parameter is used only when it yields a smaller gist or theme summary size than the size yielded by the maxPercent parameter.
This means that the system always returns the smaller of the two gist or theme summary sizes.
Specify the maximum number of document paragraphs (or sentences) selected for the document gist or theme summaries as a percentage of the total paragraphs (or sentences) in the document. The default is 10.
Note:
The maxPercent parameter is used only when it yields a smaller gist or theme summary size than the size yielded by the numParagraphs parameter.
This means that the system always returns the smaller of the two gist or theme summary sizes.
Specify the number of theme summaries to produce when you do not specify a value for pov. For example, if you specify 10, this procedure returns the top 10 theme summaries. The default is 50.
If you specify 0 or NULL, this procedure returns all themes in a document. If the document contains more than 50 themes, only the top 50 themes show conceptual hierarchy.
Specify the language of the document. Use an Oracle Text supported language value as you would in the language column of the base table. See MULTI_LEXER.
Specify the format of the document. Use an Oracle Text supported format value, either TEXT, BINARY or IGNORE as you would specify in the format column of the base table. For more information, see the format column description in CREATE INDEX.
Specify the character set of the document. Use an Oracle Text supported value as you would specify in the charset column of the base table.
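The following sketch generates a sentence-level generic gist without an index. It assumes a policy named my_policy already exists, that a knowledge base is installed, and that the document comes from a hypothetical news table with a CLOB column named text; none of these names come from this chapter.

```sql
set serveroutput on
declare
  v_doc  clob;
  v_gist clob;
begin
  -- NEWS, its TEXT and ID columns, and MY_POLICY are hypothetical names
  select text into v_doc from news where id = 34;

  -- allocate the result CLOB explicitly rather than relying on a NULL locator
  dbms_lob.createtemporary(v_gist, TRUE, dbms_lob.session);
  ctx_doc.policy_gist(policy_name   => 'my_policy',
                      document      => v_doc,
                      restab        => v_gist,
                      glevel        => 'S',        -- sentence-level gist
                      pov           => 'GENERIC',  -- gist for the whole document
                      numParagraphs => 5);         -- at most 5 sentences
  dbms_output.put_line(dbms_lob.substr(v_gist, 200, 1));
  dbms_lob.freetemporary(v_gist);
end;
/
```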
POLICY_HIGHLIGHT
Generates plain text or HTML highlighting offset information for a document. With this procedure, no CONTEXT index is required.
The offset information is generated for the terms in the document that satisfy the query you specify. These highlighted terms are either the words that satisfy a word query or the themes that satisfy an ABOUT query.
You can generate highlight offsets for either plaintext or HTML versions of the document. You can apply the offset information to the same documents filtered with CTX_DOC.FILTER.
Syntax
exec ctx_doc.policy_highlight(
  policy_name in VARCHAR2,
  document    in [VARCHAR2|CLOB|BLOB|BFILE],
  text_query  in VARCHAR2,
  restab      in out nocopy highlight_tab,
  plaintext   in boolean default FALSE,
  language    in VARCHAR2 default NULL,
  format      in VARCHAR2 default NULL,
  charset     in VARCHAR2 default NULL
);

exec ctx_doc.policy_highlight_clob_query(
  policy_name in VARCHAR2,
  document    in [VARCHAR2|CLOB|BLOB|BFILE],
  text_query  in CLOB,
  restab      in out nocopy highlight_tab,
  plaintext   in boolean default FALSE,
  language    in VARCHAR2 default NULL,
  format      in VARCHAR2 default NULL,
  charset     in VARCHAR2 default NULL
);
policy_name
Specify the policy name created with CTX_DDL.CREATE_POLICY.
document
Specify the document for which to generate highlighting offset information.
text_query
Specify the original query expression used to retrieve the document. If NULL, no highlights are generated.
If text_query includes wildcards, stemming, or fuzzy matching that result in stopwords being returned, this procedure does not highlight the stopwords.
If text_query contains the threshold operator, the operator is ignored. This procedure always returns highlight information for the entire result set.
restab
Specify the name of the highlight_tab PL/SQL index-by-table type.
plaintext
Specify TRUE to generate plaintext offsets of the document.
Specify FALSE to generate HTML offsets of the document if you are using the AUTO_FILTER filter or indexing HTML documents.
language
Specify the language of the document. Use an Oracle Text supported language value as you would in the language column of the base table. See MULTI_LEXER in Chapter 2, "Oracle Text Indexing Elements".
format
Specify the format of the document. Use an Oracle Text supported format value (TEXT, BINARY, or IGNORE) as you would specify in the format column of the base table. For more information, see the format column description under CREATE INDEX.
charset
Specify the character set of the document. Use an Oracle Text supported value as you would specify in the charset column of the base table.
Restrictions
CTX_DOC.POLICY_HIGHLIGHT does not support the use of query templates.
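As an illustration, the following minimal sketch calls CTX_DOC.POLICY_HIGHLIGHT on an in-line string and prints each returned offset and length. It assumes a policy named mypolicy already exists (created with CTX_DDL.CREATE_POLICY), and it assumes the highlight_tab records expose offset and length fields, as the CTX_DOC package declares them for HIGHLIGHT. The sample text and the query term are arbitrary.

```sql
set serveroutput on

declare
  rtab ctx_doc.highlight_tab;
begin
  -- 'mypolicy' is assumed to exist already.
  ctx_doc.policy_highlight(
    'mypolicy',
    'To define true madness, What is''t but to be nothing but mad?',
    'madness',
    rtab,
    plaintext => TRUE);

  -- Each record describes one highlighted term in the document.
  for i in 1..rtab.count loop
    dbms_output.put_line(rtab(i).offset || ':' || rtab(i).length);
  end loop;
end;
/
```

Because plaintext is TRUE, the offsets apply to the plain text version of the document.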
POLICY_MARKUP
Generates a plain text or HTML version of a document with query terms highlighted. With this procedure, no CONTEXT index is required.
The CTX_DOC.POLICY_MARKUP procedure takes a query specification and a document and returns a version of the document in which the query terms are marked up. These marked-up terms are either the words that satisfy a word query or the themes that satisfy an ABOUT query.
You can set the marked-up output to be either plaintext or HTML.
You can use one of the pre-defined tag sets for marking highlighted terms, including a tag sequence that enables HTML navigation.
Syntax
ctx_doc.policy_markup(
  policy_name in VARCHAR2,
  document    in [VARCHAR2|CLOB|BLOB|BFILE],
  text_query  in VARCHAR2,
  restab      in out nocopy CLOB,
  plaintext   in BOOLEAN default FALSE,
  tagset      in VARCHAR2 default 'TEXT_DEFAULT',
  starttag    in VARCHAR2 default NULL,
  endtag      in VARCHAR2 default NULL,
  prevtag     in VARCHAR2 default NULL,
  nexttag     in VARCHAR2 default NULL,
  language    in VARCHAR2 default NULL,
  format      in VARCHAR2 default NULL,
  charset     in VARCHAR2 default NULL
);

ctx_doc.policy_markup_clob_query(
  policy_name in VARCHAR2,
  document    in [VARCHAR2|CLOB|BLOB|BFILE],
  text_query  in CLOB,
  restab      in out nocopy CLOB,
  plaintext   in BOOLEAN default FALSE,
  tagset      in VARCHAR2 default 'TEXT_DEFAULT',
  starttag    in VARCHAR2 default NULL,
  endtag      in VARCHAR2 default NULL,
  prevtag     in VARCHAR2 default NULL,
  nexttag     in VARCHAR2 default NULL,
  language    in VARCHAR2 default NULL,
  format      in VARCHAR2 default NULL,
  charset     in VARCHAR2 default NULL
);
policy_name
Specify the policy name created with CTX_DDL.CREATE_POLICY.
document
Specify the document for which to generate marked-up output.
text_query
Specify the original query expression used to retrieve the document.
If text_query is NULL, this procedure fails and generates errors.
If text_query includes wildcards, stemming, or fuzzy matching that result in stopwords being returned, this procedure does not highlight the stopwords.
If text_query contains the threshold operator, the operator is ignored. This procedure always returns highlight information for the entire result set.
restab
Specify the name of the CLOB locator.
plaintext
Specify TRUE to generate a plaintext marked-up document. Specify FALSE to generate a marked-up HTML version of the document if you are using the AUTO_FILTER filter or indexing HTML documents.
tagset
Specify one of the following pre-defined tag sets. The second and third columns show how the four different tags are defined for each tagset:
Tagset | Tag | Tag Value |
---|---|---|
TEXT_DEFAULT | starttag | <<< |
 | endtag | >>> |
 | prevtag | |
 | nexttag | |
HTML_DEFAULT | starttag | <B> |
 | endtag | </B> |
 | prevtag | |
 | nexttag | |
HTML_NAVIGATE | starttag | <A NAME=ctx%CURNUM><B> |
 | endtag | </B></A> |
 | prevtag | <A HREF=#ctx%PREVNUM><</A> |
 | nexttag | <A HREF=#ctx%NEXTNUM>></A> |
starttag
Specify the character(s) inserted by MARKUP to indicate the start of a highlighted term.
The sequence of starttag, endtag, prevtag and nexttag with regard to the highlighted word is as follows:
... prevtag starttag word endtag nexttag...
endtag
Specify the character(s) inserted by MARKUP to indicate the end of a highlighted term.
prevtag
Specify the markup sequence that defines the tag that navigates the user to the previous highlight.
In the markup sequences prevtag and nexttag, you can specify the following offset variables, which are set dynamically:
Offset Variable | Value |
---|---|
%CURNUM | the current offset number |
%PREVNUM | the previous offset number |
%NEXTNUM | the next offset number |
See the description of the HTML_NAVIGATE tagset for an example.
nexttag
Specify the markup sequence that defines the tag that navigates the user to the next highlight tag.
Within the markup sequence, you can use the same offset variables you use for prevtag. See the explanation for prevtag and the HTML_NAVIGATE tagset for an example.
language
Specify the language of the document. Use an Oracle Text supported language value as you would in the language column of the base table. See MULTI_LEXER in Chapter 2, "Oracle Text Indexing Elements".
format
Specify the format of the document. Use an Oracle Text supported format value (TEXT, BINARY, or IGNORE) as you would specify in the format column of the base table. For more information, see the format column description in CREATE INDEX.
charset
Specify the character set of the document. Use an Oracle Text supported value as you would specify in the charset column of the base table. See Indexing Mixed-Character Set Columns in Chapter 2, "Oracle Text Indexing Elements".
Restrictions
CTX_DOC.POLICY_MARKUP does not support the use of query templates.
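The following minimal sketch shows one way to call CTX_DOC.POLICY_MARKUP and read back the marked-up text. It assumes a policy named mypolicy already exists. Allocating the temporary CLOB explicitly with DBMS_LOB.CREATETEMPORARY is a defensive choice made for this sketch, not a requirement stated in this section. The sample text and query term are arbitrary.

```sql
set serveroutput on

declare
  mklob clob;
begin
  -- Allocate a temporary CLOB to receive the marked-up document.
  dbms_lob.createtemporary(mklob, TRUE);

  -- 'mypolicy' is assumed to exist already.
  ctx_doc.policy_markup(
    'mypolicy',
    'To define true madness, What is''t but to be nothing but mad?',
    'madness',
    mklob,
    plaintext => TRUE,
    tagset    => 'TEXT_DEFAULT');

  -- Print the first 200 characters of the result.
  dbms_output.put_line(dbms_lob.substr(mklob, 200, 1));

  -- Release the temporary CLOB when finished.
  dbms_lob.freetemporary(mklob);
end;
/
```

With the TEXT_DEFAULT tagset, matched terms are wrapped in <<< and >>>, as listed in the tagset table.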
POLICY_SNIPPET
Displays marked-up keywords in context. The returned text contains either the words that satisfy a word query or the themes that satisfy an ABOUT query. This version of the CTX_DOC.SNIPPET procedure does not require an index.
Syntax
Syntax 1
exec CTX_DOC.POLICY_SNIPPET(
  policy_name        IN VARCHAR2,
  document           IN [VARCHAR2|CLOB|BLOB|BFILE],
  text_query         IN VARCHAR2,
  language           IN VARCHAR2 default NULL,
  format             IN VARCHAR2 default NULL,
  charset            IN VARCHAR2 default NULL,
  starttag           IN VARCHAR2 DEFAULT '<b>',
  endtag             IN VARCHAR2 DEFAULT '</b>',
  entity_translation IN BOOLEAN DEFAULT TRUE,
  separator          IN VARCHAR2 DEFAULT '<b>...</b>'
) return varchar2;
Syntax 2
exec CTX_DOC.POLICY_SNIPPET_CLOB_QUERY(
  policy_name        IN VARCHAR2,
  document           IN [VARCHAR2|CLOB|BLOB|BFILE],
  text_query         IN CLOB,
  language           IN VARCHAR2 default NULL,
  format             IN VARCHAR2 default NULL,
  charset            IN VARCHAR2 default NULL,
  starttag           IN VARCHAR2 DEFAULT '<b>',
  endtag             IN VARCHAR2 DEFAULT '</b>',
  entity_translation IN BOOLEAN DEFAULT TRUE,
  separator          IN VARCHAR2 DEFAULT '<b>...</b>'
) return varchar2;
policy_name
Specify the name of a policy created with CTX_DDL.CREATE_POLICY.
document
Specify the document in which to search for keywords.
text_query
Specify the original query expression used to retrieve the document. If NULL, no highlights are generated.
If text_query includes wildcards, stemming, or fuzzy matching that result in stopwords being returned, POLICY_SNIPPET does not highlight the stopwords.
If text_query contains the threshold operator, the operator is ignored.
language
Specify the language of the document. Use an Oracle Text supported language value as you would in the language column of the base table. See MULTI_LEXER in Chapter 2, "Oracle Text Indexing Elements".
format
Specify the format of the document. Use an Oracle Text supported format value (TEXT, BINARY, or IGNORE) as you would specify in the format column of the base table. For more information, see the format column description in CREATE INDEX.
charset
Specify the character set of the document. Use an Oracle Text supported value as you would specify in the charset column of the base table. See Indexing Mixed-Character Set Columns in Chapter 2, "Oracle Text Indexing Elements".
starttag
Specify the start tag for marking up the query keywords. Default is '<b>'.
endtag
Specify the end tag for marking up the query keywords. Default is '</b>'.
entity_translation
Specify if you want HTML entities to be translated. The default is TRUE, which means the special characters (<, >, and &) are translated into their entity forms ('&lt;', '&gt;', and '&amp;') when output by the procedure. However, special characters in the markup tags generated by CTX_DOC.POLICY_SNIPPET are not translated.
separator
Specify the string separating different returned fragments. Default is '<b>...</b>'.
Limitations
CTX_DOC.POLICY_SNIPPET does not support the use of query templates.
CTX_DOC.POLICY_SNIPPET displays marked-up keywords in context when used with NULL_SECTION_GROUP. However, there are limitations when using this procedure with XML documents. When used with XML_SECTION_GROUP or AUTO_SECTION_GROUP, the XML structure is ignored and user-specified tags are stripped out, which results in parts of the surrounding text being included in the returned snippet.
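A minimal sketch of a CTX_DOC.POLICY_SNIPPET call follows. It assumes a policy named mypolicy already exists. The sample text, the query term, and the replacement tags and separator are arbitrary choices for illustration.

```sql
set serveroutput on

declare
  snip varchar2(4000);
begin
  -- 'mypolicy' is assumed to exist already.
  -- POLICY_SNIPPET is a function, so its result is assigned to a variable.
  snip := ctx_doc.policy_snippet(
            'mypolicy',
            'To define true madness, What is''t but to be nothing but mad?',
            'madness',
            starttag  => '<i>',
            endtag    => '</i>',
            separator => ' ... ');

  dbms_output.put_line(snip);
end;
/
```

Named notation is used for the optional arguments so that language, format, charset, and entity_translation keep their defaults.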
POLICY_THEMES
Generates a list of themes for a document. With this procedure, no CONTEXT index is required.
Note:
CTX_DOC.POLICY_THEMES requires an installed knowledge base. A knowledge base may or may not have been installed with Oracle Text. For more information on knowledge bases, see the Oracle Text Application Developer's Guide.
Syntax
ctx_doc.policy_themes(
  policy_name in VARCHAR2,
  document    in [VARCHAR2|CLOB|BLOB|BFILE],
  restab      in out nocopy theme_tab,
  full_themes in BOOLEAN default FALSE,
  num_themes  in number default 50,
  language    in VARCHAR2 default NULL,
  format      in VARCHAR2 default NULL,
  charset     in VARCHAR2 default NULL
);
policy_name
Specify the policy you create with CTX_DDL.CREATE_POLICY.
document
Specify the document for which to generate a list of themes.
restab
Specify the name of the theme_tab PL/SQL index-by-table type.
full_themes
Specify whether this procedure generates a single theme or a hierarchical list of parent themes (full themes) for each document theme.
Specify TRUE for this procedure to write full themes to the THEME column of the result table.
Specify FALSE for this procedure to write single theme information to the THEME column of the result table. This is the default.
num_themes
Specify the maximum number of themes to retrieve. For example, if you specify 10, up to the first 10 themes are returned for the document. The default is 50.
If you specify 0 or NULL, this procedure returns all themes in a document. If the document contains more than 50 themes, only the first 50 themes show conceptual hierarchy.
language
Specify the language of the document. Use an Oracle Text supported language value as you would in the language column of the base table. See MULTI_LEXER in Chapter 2, "Oracle Text Indexing Elements".
format
Specify the format of the document. Use an Oracle Text supported format value (TEXT, BINARY, or IGNORE) as you would specify in the format column of the base table. For more information, see the format column description in CREATE INDEX in Chapter 1, "Oracle Text SQL Statements and Operators".
charset
Specify the character set of the document. Use an Oracle Text supported value as you would specify in the charset column of the base table. See Indexing Mixed-Character Set Columns in Chapter 2, "Oracle Text Indexing Elements".
Example
Create a policy:
exec ctx_ddl.create_policy('mypolicy');
Run themes:
declare
  rtab ctx_doc.theme_tab;
begin
  -- Generate themes for an in-line document using the policy created above.
  ctx_doc.policy_themes('mypolicy',
    'To define true madness, What is''t but to be nothing but mad?', rtab);
  -- Print each theme with its weight.
  for i in 1..rtab.count loop
    dbms_output.put_line(rtab(i).theme||':'||rtab(i).weight);
  end loop;
end;
POLICY_TOKENS
Generates all index tokens for a document. With this procedure, no CONTEXT index is required.
Syntax
ctx_doc.policy_tokens(
  policy_name in VARCHAR2,
  document    in [VARCHAR2|CLOB|BLOB|BFILE],
  restab      in out nocopy token_tab,
  language    in VARCHAR2 default NULL,
  format      in VARCHAR2 default NULL,
  charset     in VARCHAR2 default NULL
);
policy_name
Specify the policy name created with CTX_DDL.CREATE_POLICY.
document
Specify the document for which to generate tokens.
restab
Specify the name of the token_tab PL/SQL index-by-table type.
The tokens returned are those tokens that are inserted into the index for the document. Stopwords are not returned. Section tags are not returned because they are not text tokens.
language
Specify the language of the document. Use an Oracle Text supported language value as you would in the language column of the base table. See MULTI_LEXER in Chapter 2, "Oracle Text Indexing Elements".
format
Specify the format of the document. Use an Oracle Text supported format value (TEXT, BINARY, or IGNORE) as you would specify in the format column of the base table. For more information, see the format column description in CREATE INDEX.
charset
Specify the character set of the document. Use an Oracle Text supported value as you would specify in the charset column of the base table. See Indexing Mixed-Character Set Columns in Chapter 2, "Oracle Text Indexing Elements".
Example
Get tokens:
declare
  rtab ctx_doc.token_tab;
begin
  -- Tokenize an in-line document using the policy.
  ctx_doc.policy_tokens('mypolicy',
    'To define true madness, What is''t but to be nothing but mad?', rtab);
  -- Print each token with its offset.
  for i in 1..rtab.count loop
    dbms_output.put_line(rtab(i).offset||':'||rtab(i).token);
  end loop;
end;
SET_KEY_TYPE
Use this procedure to set the CTX_DOC procedures to accept either the ROWID or the PRIMARY_KEY document identifiers. This setting affects the invoking session only.
Syntax
ctx_doc.set_key_type(key_type in varchar2);
key_type
Specify either ROWID or PRIMARY_KEY as the input key type (document identifier) for CTX_DOC procedures.
This parameter defaults to the value of the CTX_DOC_KEY_TYPE system parameter.
Note:
When your base table has no primary key, setting key_type to PRIMARY_KEY is ignored. The textkey parameter that you specify for any CTX_DOC procedure is interpreted as a ROWID.
Example
The following example sets CTX_DOC procedures to accept primary key document identifiers.
begin
  ctx_doc.set_key_type('PRIMARY_KEY');
end;
SNIPPET
Use the CTX_DOC.SNIPPET procedure to produce a concordance for a document. A concordance is a text fragment that contains a query term with some of its surrounding text. This is also sometimes known as Key Word in Context or KWIC, because it returns query keywords marked up in their surrounding text, which enables the user to evaluate them in context. The returned text can also contain themes that satisfy an ABOUT query.
For example, a search on brillig and slithey might return one relevant fragment of a document as follows:
'Twas <b>brillig</b>, and the <b>slithey</b> toves did gyre and
CTX_DOC.SNIPPET returns one or more of the most relevant fragments for a document that contains the query term. Because CTX_DOC.SNIPPET returns surrounding text, you can immediately evaluate how useful the returned term is.
See Also:
CTX_DOC.POLICY_SNIPPET in this chapter for a policy-based version of this procedure
Syntax
Syntax 1
exec CTX_DOC.SNIPPET(
  index_name         IN VARCHAR2,
  textkey            IN VARCHAR2,
  text_query         IN VARCHAR2,
  starttag           IN VARCHAR2 DEFAULT '<b>',
  endtag             IN VARCHAR2 DEFAULT '</b>',
  entity_translation IN BOOLEAN DEFAULT TRUE,
  separator          IN VARCHAR2 DEFAULT '<b>...</b>'
) return varchar2;
Syntax 2
exec CTX_DOC.SNIPPET_CLOB_QUERY(
  index_name         IN VARCHAR2,
  textkey            IN VARCHAR2,
  text_query         IN CLOB,
  starttag           IN VARCHAR2 DEFAULT '<b>',
  endtag             IN VARCHAR2 DEFAULT '</b>',
  entity_translation IN BOOLEAN DEFAULT TRUE,
  separator          IN VARCHAR2 DEFAULT '<b>...</b>'
) return varchar2;
index_name
Specify the name of the index for the text column.
textkey
Specify the unique identifier (usually the primary key) for the document.
The textkey parameter can be as follows:
A single column primary key value
An encoded specification for a composite (multiple column) primary key. When textkey is a composite key, you must encode the composite textkey string using the CTX_DOC.PKENCODE procedure.
The rowid of the row containing the document
Use CTX_DOC.SET_KEY_TYPE to toggle between primary key and rowid identification.
text_query
Specify the original query expression used to retrieve the document. If NULL, no highlights are generated.
If text_query includes wildcards, stemming, or fuzzy matching that result in stopwords being returned, SNIPPET does not highlight the stopwords.
If text_query contains the threshold operator, the operator is ignored.
starttag
Specify the start tag for marking up the query keywords. Default is '<b>'.
endtag
Specify the end tag for marking up the query keywords. Default is '</b>'.
entity_translation
Specify if you want HTML entities to be translated. The default is TRUE, which means that the special characters (<, >, and &) are translated into their entity forms ('&lt;', '&gt;', and '&amp;') when output by the procedure. However, special characters in the markup tags that are generated by CTX_DOC.SNIPPET are not translated.
separator
Specify the string separating different returned fragments. Default is '<b>...</b>'.
Example
create table tdrbhk01 (id number primary key, text varchar2(4000));

insert into tdrbhk01 values (1, 'Oracle Text adds powerful search <title>withintitle</title> and intelligent text management to the Oracle database. Complete. You can search and manage documents, web pages, catalog entries in more than 150 formats in any language. Provides a complete text query language and complete character support. Simple. You can index and search text using SQL. Oracle Text Management can be done using Oracle Enterprise Manager - a GUI tool. Fast. You can search millions of documents, document,web pages, catalog entries using the power and scalability of the database. Intelligent. Oracle Text''s unique knowledge-base enables you to search, classify, manage documents, clusters and summarize text based on its meaning as well as its content. ');

exec ctx_ddl.create_section_group('my_sectioner','BASIC_SECTION_GROUP');
exec ctx_ddl.add_field_section('my_sectioner','title','title', false);

create index tdrbhk01x on tdrbhk01(text)
  indextype is ctxsys.context
  parameters ('filter CTXSYS.NULL_FILTER section group my_sectioner nopopulate');

select ctx_doc.snippet('tdrbhk01x','1', 'search | classify') from dual;
The result looks something like this:
CTX_DOC.SNIPPET('TDRBHK01X','1','SEARCH|CLASSIFY')
------------------------------------------------------------------------
Text's unique knowledge-base enables you to <b>search</b>, <b>classify</b>, manage documents, clusters and summarize
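The optional arguments can be overridden in the same call. The sketch below reuses the tdrbhk01x index and document 1 from the example above and swaps in different highlight tags and a plain separator. The tag and separator values are arbitrary and chosen only for illustration.

```sql
set serveroutput on

declare
  snip varchar2(4000);
begin
  -- Same index, key, and query as the example above, with custom markup.
  snip := ctx_doc.snippet('tdrbhk01x', '1', 'search | classify',
                          starttag  => '<i>',
                          endtag    => '</i>',
                          separator => ' ... ');
  dbms_output.put_line(snip);
end;
/
```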
Limitations
CTX_DOC.SNIPPET does not support the use of query templates.
CTX_DOC.SNIPPET displays marked-up keywords in context when used with NULL_SECTION_GROUP. However, there are limitations when using this procedure with XML documents. When used with XML_SECTION_GROUP or AUTO_SECTION_GROUP, the XML structure is ignored and user-specified tags are stripped out, which results in parts of the surrounding text being included in the returned snippet.
THEMES
Use the CTX_DOC.THEMES procedure to generate a list of themes for a document. You can store each theme as a row in either a result table or an in-memory PL/SQL table that you specify.
Note:
CTX_DOC.THEMES requires an installed knowledge base. A knowledge base may or may not have been installed with Oracle Text. For more information on knowledge bases, see the Oracle Text Application Developer's Guide.
Syntax 1: In-Memory Table Storage
CTX_DOC.THEMES(
index_name IN VARCHAR2, textkey IN VARCHAR2, restab IN OUT NOCOPY THEME_TAB, full_themes IN BOOLEAN DEFAULT FALSE, num_themes IN NUMBER DEFAULT 50);
Syntax 2: Result Table Storage
CTX_DOC.THEMES(
index_name IN VARCHAR2, textkey IN VARCHAR2, restab IN VARCHAR2, query_id IN NUMBER DEFAULT 0, full_themes IN BOOLEAN DEFAULT FALSE, num_themes IN NUMBER DEFAULT 50);
index_name
Specify the name of the index for the text column.
textkey
Specify the unique identifier (usually the primary key) for the document.
The textkey parameter can be as follows:
A single column primary key value
An encoded specification for a composite (multiple column) primary key. When textkey is a composite key, you must encode the composite textkey string using the CTX_DOC.PKENCODE procedure.
The rowid of the row containing the document
Toggle between primary key and rowid identification using CTX_DOC.SET_KEY_TYPE.
restab
You can specify that this procedure store results in either a table or an in-memory PL/SQL table.
To store results in a table, specify the name of the table.
To store results in an in-memory table, specify the name of the in-memory table of type THEME_TAB. The THEME_TAB datatype is defined as follows:
type theme_rec is record (
  theme  varchar2(2000),
  weight number
);
type theme_tab is table of theme_rec index by binary_integer;
CTX_DOC.THEMES clears the THEME_TAB you specify before the operation.
query_id
Specify the identifier used to identify the row(s) inserted into restab.
full_themes
Specify whether this procedure generates a single theme or a hierarchical list of parent themes (full themes) for each document theme.
Specify TRUE for this procedure to write full themes to the THEME column of the result table.
Specify FALSE for this procedure to write single theme information to the THEME column of the result table. This is the default.
num_themes
Specify the maximum number of themes to retrieve. For example, if you specify 10, then up to the first 10 themes are returned for the document. The default is 50.
If you specify 0 or NULL, then this procedure returns all themes in a document. If the document contains more than 50 themes, then only the first 50 themes show conceptual hierarchy.
Examples
In-Memory Themes
The following example generates the first 10 themes for document 1 and stores them in an in-memory table called the_themes. The example then loops through the table to display the document themes.
declare
  the_themes ctx_doc.theme_tab;
begin
  -- Retrieve at most 10 themes for the document with textkey '1'.
  ctx_doc.themes('myindex','1',the_themes, num_themes=>10);
  for i in 1..the_themes.count loop
    dbms_output.put_line(the_themes(i).theme||':'||the_themes(i).weight);
  end loop;
end;
Theme Table
The following example creates a theme table called CTX_THEMES:
create table CTX_THEMES (query_id number, theme varchar2(2000), weight number);
Single Themes
To obtain a list of up to the first 20 themes, where each element in the list is a single theme, enter a statement like the following example:
begin
ctx_doc.themes('newsindex','34','CTX_THEMES',1,full_themes => FALSE, num_themes=> 20);
end;
Full Themes
To obtain a list of the top 20 themes, where each element in the list is a hierarchical list of parent themes, enter a statement like the following example:
begin
ctx_doc.themes('newsindex','34','CTX_THEMES',1,full_themes => TRUE, num_themes=>20);
end;
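Once either call has run, the stored themes can be read back from the result table. The query below is a minimal sketch that assumes the CTX_THEMES table created above and the query_id value of 1 passed in the preceding examples.

```sql
-- List the themes written by the call that used query_id 1,
-- highest weight first.
select theme, weight
  from ctx_themes
 where query_id = 1
 order by weight desc;
```

Using a distinct query_id for each call lets several result sets share one table.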
TOKENS
Use this procedure to identify all text tokens in a document. The tokens returned are those tokens that are inserted into the index. This feature is useful for implementing document classification, routing, or clustering.
Stopwords are not returned. Section tags are not returned because they are not text tokens.
Syntax 1: In-Memory Table Storage
CTX_DOC.TOKENS(index_name IN VARCHAR2, textkey IN VARCHAR2, restab IN OUT NOCOPY TOKEN_TAB);
Syntax 2: Result Table Storage
CTX_DOC.TOKENS(index_name IN VARCHAR2, textkey IN VARCHAR2, restab IN VARCHAR2, query_id IN NUMBER DEFAULT 0);
index_name
Specify the name of the index for the text column.
textkey
Specify the unique identifier (usually the primary key) for the document.
The textkey parameter can be as follows:
A single column primary key value
An encoded specification for a composite (multiple column) primary key. To encode a composite textkey, use the CTX_DOC.PKENCODE procedure.
The rowid of the row containing the document
Toggle between primary key and rowid identification using CTX_DOC.SET_KEY_TYPE.
restab
You can specify that this procedure store results in either a table or an in-memory PL/SQL table.
The tokens returned are those tokens that are inserted into the index for the document (or row) named with textkey. Stopwords are not returned. Section tags are not returned because they are not text tokens.
Specifying a Token Table
To store results to a table, specify the name of the table. Token tables can be named anything, but must include the columns shown in the following table, with names and datatypes as specified.
Table 8-1 Required Columns for Token Tables
Column Name | Type | Description |
---|---|---|
QUERY_ID | NUMBER | The identifier for the results generated by a particular call to CTX_DOC.TOKENS. |
TOKEN | VARCHAR2(64) | The token string in the text. |
OFFSET | NUMBER | The position of the token in the document, relative to the start of the document, which has a position of 1. |
LENGTH | NUMBER | The character length of the token. |
Specifying an In-Memory Table
To store results to an in-memory table, specify the name of the in-memory table of type TOKEN_TAB. The TOKEN_TAB datatype is defined as follows:
type token_rec is record (
  token  varchar2(64),
  offset number,
  length number
);
type token_tab is table of token_rec index by binary_integer;
CTX_DOC.TOKENS clears the TOKEN_TAB you specify before the operation.
query_id
Specify the identifier used to identify the row(s) inserted into restab.
Example
In-Memory Tokens
The following example generates the tokens for document 1 and stores them in an in-memory table, declared as the_tokens. The example then loops through the table to display the document tokens.
declare
  the_tokens ctx_doc.token_tab;
begin
  ctx_doc.tokens('myindex','1',the_tokens);
  for i in 1..the_tokens.count loop
    dbms_output.put_line(the_tokens(i).token);
  end loop;
end;
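Result table storage (Syntax 2) can be sketched in the same way. The table name CTX_TOKENS and the query_id value of 1 are arbitrary choices for this sketch. The column names and types are assumed to follow the TOKEN_TAB record fields plus a QUERY_ID column, and myindex is the index used in the in-memory example.

```sql
-- A token table with the columns assumed above.
create table CTX_TOKENS (
  query_id number,
  token    varchar2(64),
  offset   number,
  length   number
);

-- Store the tokens for document 1, tagged with query_id 1.
begin
  ctx_doc.tokens('myindex', '1', 'CTX_TOKENS', 1);
end;
/

-- Read the tokens back in document order.
select t.token, t.offset, t.length
  from ctx_tokens t
 where t.query_id = 1
 order by t.offset;
```

The table alias qualifies the OFFSET and LENGTH columns so that they are not confused with SQL keywords or functions of the same name.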