Oracle9i Data Cartridge Developer's Guide Release 2 (9.2) Part Number A96595-01 |
|
This chapter describes various design considerations, including:
Structured data is data that can be represented to Oracle in the form of an object type. Unstructured data--that is, data of type RAW
or BLOB
--cannot be interpreted by Oracle. The choice whether to model cartridge data as structured or unstructured depends on the following considerations:
Unlike nested tables, VARRAY
s are an ordered set of items. Physically, VARRAY
s are stored as RAW
or LOB
columns, whereas nested tables are stored in tables. Following are some considerations to weigh in choosing which sort of collection better suits your purpose.
NESTED_TABLE_ID
values.NESTED_TABLE_ID
column enhances retrieval of nested tables.VARRAY
s is better suited for retrieval as a value since that is s how it is stored.VARRAY
s is not available.VARRAY
s is sub-optimal since rows have to materialized from collection value.VARRAY
s stored as LOB
s is permitted when the parent table is partitioned.Based on the preceding implications, if the ability to query of update individual collection elements is important, then nested tables are a better choice to model your collection data. On the other hand, if your application is requires fetching the entire collection as a whole and then operating on it, modeling the collection data as a VARRAY
will yield better retrieval performance.
When writing methods for object types, you have the choice of implementing them in PL/SQL, C/C++, or Java. PL/SQL and Java methods run in the address space of the server. C/C++ methods are dispatched as external procedures and run outside the address space of the server.
The best implementation choice varies with the situation. Here are some guidelines:
Until release 8.1.5, stored procedures and SQL methods could only execute with the privileges of the definer. Such definer-rights routines are bound to the schema in which they reside, and this remains the default. Under this condition, a routine executes with the rights of the definer of the function, not the user invoking it. However, this is a limitation if the function statically or dynamically issues SQL statements.
For example, if the function had a static cursor that performs a SELECT
from USER_TABLES
, the USER_TABLES
it would retrieve would be that of the definer irrespective of which user was using the function. For the function to be used against data not owned by the definer, explicit GRANT
s had to be issued from the owner to the definer, or the function needed to be defined in the same schema where the data resided. The former course creates security and administration problems; the latter forces the function to be redefined in each schema that needs to use it.
The invoker-rights mechanism permits a function to execute with the privileges of the invoker. This permits cartridges to live within a schema dedicated to the cartridge and to be used by other schemas without requiring privileges be granted to operate on objects in the schema where the cartridge resides.
You should consider utilizing callouts in the following circumstances:
You should consider utilizing callbacks in the following circumstances:
Consider making a single callout which does multiple callbacks rather than multiple callouts (for example, instead of a factorial callout which takes a single number and computes a the factorial for it, consider making a callout which takes a VARRAY
and repeatedly calls back to get next number to compute the factorial for. You always do performance testing to see at what at point the multi-call back approach out-performs the multi-callout approach.
LOB
type (BFILE
/BLOB
).BFILE
so that no BFILE
-specific logic is required in your callout (other than error recovery from OCILob calls that do not operate on BFILEs
).LOB
s in Oracle8i release 8.1.5, you need to be aware of the deep copy that can occur when assignments and calls are done with temporary LOB
s. Use "NOCOPY
" (BY REFERENCE
) on BLOB
parameters as appropriate.External procedures under Oracle 8.0 have a "state-less" model. All Statement handles opened during the invocation of an external procedure are closed implicitly at the end of the call.
In Oracle9i, we allow "state" (OCI Statement handles and so forth, and associated state in the DBMS) to be saved and used across invocations of external procedures in a session. B y default cartridges are still stateless, however, OCIMemory
services and OCIContext
services can be used with OCI_DURATION_SESSION
or other appropriate duration to save state. Statement handles created in one external procedure invocation can get re-used in another. The Data Cartridge developer needs to explicitly free these handles. It is recommended that this is done as soon as the statement handle is no longer needed. All state maintained for the statement in the OCI handles and in the DBMS would get freed as a result. This should help in improving the scalability of the Data Cartridge.
It is wrong to assume that creating domain index is always the best course. If, after careful consideration, you determine that you need to create domain index, you should keep the following factors in mind. For one, if the domain index is complex, the functional implementation will work better
Judicious use of the extensible optimizer can lead to good performance.
Naming of internal components can be an issue. Naming of internal data objects for a domain index implementation and are typically based on names you provide for table and indexes. The problem is that the derived names for the internal objects should not conflict with any other user defined object or system object. You may have to develop some policy that restricts names, or implement some metadata management scheme to avoid errors during DROP
, CREATE
, and so on.
You can create only one index on IOTs in 8.0.x releases. However, if most of your data is in the index, using an IOT is more efficient than storing your data in both a table and an additional index.
You can create secondary indexes on IOTs in Oracle9i. These offer a big advantage if you are accessing the data multiple ways.
Index structures can be stored in LOBs
but take care to tune the LOB
for best performance. If you are accessing a particular LOB
frequently, create your table with the CACHE
option and place the LOB
index in a separate tablespace. If you are updating a LOB
frequently, TURN
OFF
LOGGING
and read/write in multiples of CHUNK
size. If you are accessing a particular portion of a LOB
frequently, buffer your reads/writes using LOB
buffering or your own buffering scheme.
With the extensible indexing framework, the meaning and representation of a user-defined index is left to the cartridge developer. We do provide basic index implementations such as IOTs. In certain cases, binary or character LOB
s can also be used to store complex index structures. IOTs, BLOB
s and CLOB
s all live within the database. In addition to them, you may also store a user-defined index as a structure external to the database, say in a BFILE
.
The external index structure gives you the most flexibility in representing your index. An external index structure is particularly useful if you have already invested in the development of in-memory indexing structures. For example, an operating system file may store index data, which is read into a memory mapped file at run time. Such a case can be handled as a BFILE
in the external index routines.
External index structures may also provide superior performance, although, this gain comes at some cost. Index structures external to the database do not participate in the transaction semantics of the database, which, in the case of index structures inside the database, make data and concomitant index updates atomic. This means that if an update to the data causes an update for the external index to be invoked through the extensible indexing interface, any failures may cause the data updates to be rolled back but not the index updates. The database can only roll back what is internal to it: external index structures cannot be rolled back in synchronization with a database rollback.
External index structures are perhaps most useful for read-only access. Their semantics become complex if updates to data are involved.
ODCIIndexFetch(self IN [OUT] <impltype>, nrows IN NUMBER, rids OUT ODCIRidList) RETURN NUMBER
When the ODCIIndexFetch
routine is called, the ROWID
s of all the rows that satisfy the operator predicate are returned. The maximum number of rows that can be returned by the ODCIIndexFetch
routine is nrows
(nrows
being an argument to the ODCIIndexFetch
routine). The value of nrows
is decided by Oracle based on some internal factors. If you have a better idea of the number of rows that ought to be returned to achieve optimal query performance, you can determine that this number of rows is returned in the ODCIRidList
VARRAY
instead of nrows
. Note that the number of values in the ODCIRidList
has to be less than or equal to nrows
.
You, as cartridge designer, are in the best position to make a judgement regarding the number of rows to be returned. For example, if in the index the number of (say 1500) rowids are stored together and nrows
= 2000, then it may be optimal to return 1500 rows in lieu of 2000 rows. Otherwise the user would have to retrieve 3000 rowids, return 2000 if them and note which 1000 rowids were not returned.
If you not have any specific optimization in mind, you can use the value of nrows
to determine the number of rows to be returned. Currently the value of nrows
has been set to 2000.
Anyone implementing indexes which use callouts should use multirow fetch to fetch the largest number of rows back to the server. This offsets the cost of making the callout.
All indexes should contain an indexed and functional implementation of the operator, in case the optimizer chooses not to use the indexed implementation. You can, however, use the indexing structures to produce the functional result.
In Oracle9i only the CPU and I/O costs are considered.
The cost of executing a C function can be determined using common profilers or tools. For SQL queries, an explain plan of the query would give a rough estimate of the cost of the query. In addition the tkprof
utility can be used to gather information about the CPU and the I/O cost involved in the operation. The cost of executing a callout could also be determined by using it in a SQL query which "selects from dual" and then estimating its cost from the tkprof
utility.
The cost of the index is a function of the selectivity of the predicate (which is passed as an argument to the cost function) * the total number of data blocks in the index structures. Hence the index cost function should be one which increases with the increase in selectivity of the predicate. With a selectivity of 100%, the cost of accessing the index should be the cost of accessing all the data in all the structures that comprise the domain index.
The total cost of accessing the index is the cost of performing the ODCIIndexStart
, N * ODCIIndexFetch
and ODCIIndexClose
operators, where N is the number of times the ODCIIndexFetch
routine will be called based on the selectivity of the predicate. The cost of ODCIIndexStart
, ODCIIndexFetch
and ODCIIndexClose
functions can be determined as discussed in the previous section.
The selectivity of a predicate is the percentage of rows returned by the predicate divided by the total number of rows in the table(s).
The selectivity function should use the statistics collected for the table to determine what percentage of rows of the table will be returned by the predicate with the given list of arguments. For example, to compute the selectivity of a predicate IMAGE_GREATER_THAN
(Image
SelectedImage
) which determines the images that are greater than the Image
SelectedImage
, a histogram of the sizes of the images in the database can be a useful statistics to compute the selectivity.
Statistics can affect the calculation of selectivity for predicates and also the cost of domain indexes.
The statistics collected for a table can affect the computation of selectivity of a predicate. So statistics that can help the user make a better judgement about the selectivity of a predicate should be collected for a table/column. Knowing the predicates that would operate on the data will be helpful to determine what statistics would be good to collect.
Some example of statistics that can be useful in spatial domain for example could be the average/min/max number of elements in a VARRAY
that contains the nodes of the spatial objects.
Note that standard statistics are collected in addition to the user defined statistics when the ANALYZE
command is invoked.
When a domain index is analyzed statistics for the underlying objects which constitute the domain index should be analyzed. For example if the domain index is comprised of tables, the statistics collection function should ANALYZE
the tables when the domain index is analyzed. The cost of accessing the domain index can be influenced by the statistics that have been collected for the index. For example the cost of accessing a domain index could be approximated to the selectivity * the total number of data blocks (in the various tables) being accessed when the domain index is accessed.
Note: Oracle Corporation recommends that you use the See Oracle9i Supplied PL/SQL Packages and Types Reference for information about |
To accurately define cost, selectivity and statistics functions, a good understanding of the domain is required. The preceding guidelines are meant to help you understand some of the issues you need to take into account while working on the cost, selectivity and statistics functions. In general it may be a good idea to start of by using the default cost and selectivity and observe how the queries of interest behave.
OIDs
in all of your object types so users can import/export data easily across databases.README
with your cartridge to tell users how to install the cartridgesqlplus @imginst
listener
if you are using callouts.extproc
. Most users have never heard of extproc
and many users have never set up a listener. This is the primary problem when deploying cartridges.'instantiate_file'
action. This is a great feature that enables you to substitute variables in your files when they are installed and it leaves your user with scripts and files that are customized for their machine.You should:
You should avoid:
int
<-> size_t
implicit casts on a 64 bit platform
|
Copyright © 1996, 2002 Oracle Corporation. All Rights Reserved. |
|