Oracle® Database Semantic Technologies Developer's Guide 11g Release 1 (11.1) Part Number B28397-01 |
This chapter describes concepts related to the support for a subset of the Web Ontology Language (OWL). It builds on the information in Chapter 1, and it assumes that you are familiar with the major concepts associated with OWL, such as ontologies, properties, and relationships. For detailed information about OWL, see the OWL Web Ontology Language Reference at http://www.w3.org/TR/owl-ref/.
An ontology is a shared conceptualization of knowledge in a particular domain. It consists of a collection of classes, properties, and optionally instances. Classes are typically related by a class hierarchy (subclass/superclass relationship). Similarly, properties can be related by a property hierarchy (subproperty/superproperty relationship). Properties can be symmetric or transitive, or both. Properties can also have domains, ranges, and cardinality constraints specified for them.
RDFS-based ontologies only allow specification of class hierarchies, property hierarchies, instanceOf relationships, and a domain and a range for properties.
OWL ontologies build on RDFS-based ontologies by additionally allowing specification of property characteristics. OWL ontologies can be further classified as OWL-Lite, OWL-DL, and OWL Full. OWL-Lite restricts the cardinality minimum and maximum values to 0 or 1. OWL-DL relaxes this restriction by allowing arbitrary values for the minimum and maximum. OWL Full allows an instance also to be defined as a class, which is not allowed in OWL-DL and OWL-Lite ontologies.
Section 2.1.2 describes OWL capabilities that are supported and not supported with semantic data.
Figure 2-1 shows part of a cancer ontology, which describes the classes and properties related to cancer. One requirement is to have a PATIENTS data table with a column named DIAGNOSIS, which must contain a value from the Diseases_and_Disorders class hierarchy.
In the cancer ontology shown in Figure 2-1, the diagnosis Immune_System_Disorder includes two subclasses, Autoimmune_Disease and Immunodeficiency_Syndrome. The Autoimmune_Disease diagnosis includes the subclass Rheumatoid_Arthritis; and the Immunodeficiency_Syndrome diagnosis includes the subclass T_Cell_Immunodeficiency, which includes the subclass AIDS.
The data in the PATIENTS table might include the PATIENT_ID and DIAGNOSIS column values shown in Table 2-1.
Table 2-1 PATIENTS Table Example Data
PATIENT_ID | DIAGNOSIS |
---|---|
1234 | Rheumatoid_Arthritis |
2345 | Immunodeficiency_Syndrome |
3456 | AIDS |
To query ontologies, you can use the SEM_MATCH table function (described in Section 1.6) or the SEM_RELATED operator and its ancillary operators (described in Section 2.3).
This section describes OWL vocabulary subsets that are supported.
Oracle Database supports the RDFS++, OWLSIF, and OWLPrime vocabularies, which have increasing expressivity. Each supported vocabulary has a corresponding rulebase; however, these rulebases do not need to be populated because the underlying entailment rules of these three vocabularies are internally implemented. The supported vocabularies are as follows:
RDFS++: A minimal extension to RDFS, consisting of RDFS plus owl:sameAs and owl:InverseFunctionalProperty.
OWLSIF: OWL with IF Semantic, with the vocabulary and semantics proposed for pD* semantics in Completeness, decidability and complexity of entailment for RDF Schema and a semantic extension involving the OWL vocabulary, by H.J. Horst, Journal of Web Semantics 3, 2 (2005), 79–115.
OWLPrime: The following OWL capabilities:
Basics: class, subclass, property, subproperty, domain, range, type
Property characteristics: transitive, symmetric, functional, inverse functional, inverse
Class comparisons: equivalence, disjointness
Property comparisons: equivalence
Individual comparisons: same, different
Class expressions: complement
Property restrictions: hasValue, someValuesFrom, allValuesFrom
As with pD*, the supported semantics for these value restrictions are only intensional (IF semantics).
The following OWL capabilities are not yet supported in any Oracle-supported OWL subset:
Property restrictions: cardinality
Class expressions: set operations (union, intersection), enumeration
You can use entailment rules to perform native OWL inferencing. This section creates a simple ontology, performs native inferencing, and illustrates some more advanced features.
Example 2-1 creates a simple OWL ontology, inserts one statement that two URIs refer to the same entity, and performs a query using the SEM_MATCH table function.
Example 2-1 Creating a Simple OWL Ontology
SQL> CREATE TABLE owltst(id number, triple sdo_rdf_triple_s);
Table created.
SQL> EXECUTE sem_apis.create_sem_model('owltst','owltst','triple');
PL/SQL procedure successfully completed.
SQL> INSERT INTO owltst VALUES (1, sdo_rdf_triple_s('owltst',
  'http://foo.com/name/John', 'http://www.w3.org/2002/07/owl#sameAs',
  'http://foo.com/name/JohnQ'));
1 row created.
SQL> commit;
SQL> -- Use SEM_MATCH to perform a simple query.
SQL> select s,p,o from table(SEM_MATCH('(?s ?p ?o)', SEM_Models('OWLTST'),
  null, null, null ));
Example 2-2 calls the SEM_APIS.CREATE_ENTAILMENT procedure. You do not need to create the rulebase and add rules to it, because the OWL rules are already built into the Oracle semantic technologies inferencing engine.
Example 2-2 Performing Native OWL Inferencing
SQL> -- Invoke the following command to run native OWL inferencing that
SQL> -- understands the vocabulary defined in the preceding section.
SQL>
SQL> EXECUTE sem_apis.create_entailment('owltst_idx', sem_models('owltst'), sem_rulebases('OWLPRIME'));
PL/SQL procedure successfully completed.
SQL> -- The following view is generated to represent the entailed graph (rules index).
SQL> desc mdsys.semi_owltst_idx;
SQL> -- Run the preceding query with an additional rulebase parameter to list
SQL> -- the original graph plus the inferred triples.
SQL> SELECT s,p,o FROM table(SEM_MATCH('(?s ?p ?o)', SEM_MODELS('OWLTST'),
  SEM_RULEBASES('OWLPRIME'), null, null ));
Example 2-3 creates a user-defined rulebase, inserts a deliberately oversimplified uncleOf rule (stating that the brother of one's father is one's uncle), and calls the SEM_APIS.CREATE_ENTAILMENT procedure.
Example 2-3 Performing OWL and User-Defined Rules Inferencing
SQL> -- First, insert the following assertions.
SQL> INSERT INTO owltst VALUES (1, sdo_rdf_triple_s('owltst',
  'http://foo.com/name/John', 'http://foo.com/rel/fatherOf',
  'http://foo.com/name/Mary'));
SQL> INSERT INTO owltst VALUES (1, sdo_rdf_triple_s('owltst',
  'http://foo.com/name/Jack', 'http://foo.com/rel/brotherOf',
  'http://foo.com/name/John'));
SQL> -- Create a user-defined rulebase.
SQL> EXECUTE sem_apis.create_rulebase('user_rulebase');
SQL> -- Insert a simple "uncle" rule.
SQL> INSERT INTO mdsys.semr_user_rulebase VALUES ('uncle_rule',
  '(?x <http://foo.com/rel/brotherOf> ?y)(?y <http://foo.com/rel/fatherOf> ?z)',
  NULL, '(?x <http://foo.com/rel/uncleOf> ?z)', null);
SQL> -- In the following statement, 'USER_RULES=T' is required, to
SQL> -- include the original graph plus the inferred triples.
SQL> EXECUTE sem_apis.create_entailment('owltst2_idx', sem_models('owltst'), sem_rulebases('OWLPRIME','USER_RULEBASE'), SEM_APIS.REACH_CLOSURE, null, 'USER_RULES=T');
SQL> -- In the result of the following query, :Jack :uncleOf :Mary is inferred.
SQL> SELECT s,p,o FROM table(SEM_MATCH('(?s ?p ?o)', SEM_MODELS('OWLTST'),
  SEM_RULEBASES('OWLPRIME','USER_RULEBASE'), null, null ));
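To make the meaning of the uncle rule in Example 2-3 concrete, the following Python sketch applies the same two-antecedent pattern to a small in-memory set of triples. It is only a model of what the rule states, not a description of how the Oracle inference engine evaluates it. The function name apply_uncle_rule and the shortened identifiers (John, fatherOf, and so on, without their http://foo.com/ URIs) are assumptions made for this illustration.

```python
# Illustrative model only: names and data layout are assumptions, not Oracle APIs.
def apply_uncle_rule(triples):
    """Infer (?x uncleOf ?z) from (?x brotherOf ?y) and (?y fatherOf ?z)."""
    inferred = set()
    for (x, p1, y) in triples:
        if p1 != "brotherOf":
            continue
        for (y2, p2, z) in triples:
            # The shared variable ?y must bind to the same node in both antecedents.
            if p2 == "fatherOf" and y2 == y:
                inferred.add((x, "uncleOf", z))
    return inferred


asserted = {
    ("John", "fatherOf", "Mary"),
    ("Jack", "brotherOf", "John"),
}
print(apply_uncle_rule(asserted))
```

With the two assertions from Example 2-3, the only inferred triple is Jack uncleOf Mary, which matches the result described for the final query in that example.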
OWL inferencing can be complex, depending on the size of the ontology, the actual vocabulary (set of language constructs) used, and the interactions among those language constructs. The question arises, how can we trust inferred results? The answer involves using proof generation during inference. (Proof generation does require additional CPU time and disk resources.)
To generate the information required for proof, specify PROOF=T in the call to the SEM_APIS.CREATE_ENTAILMENT procedure, as shown in the following example:
EXECUTE sem_apis.create_entailment('owltst_idx', sem_models('owltst'), sem_rulebases('owlprime'), SEM_APIS.REACH_CLOSURE, 'SAM', 'PROOF=T');
Specifying PROOF=T causes a view to be created containing proof for each inferred triple. The view name is the entailment name prefixed by MDSYS.SEMI_. Two relevant columns in this view are LINK_ID and EXPLAIN (the proof). The following example displays the LINK_ID value and proof of each generated triple (with LINK_ID values shortened for simplicity):
SELECT link_id || ' generated by ' || explain as triple_and_its_proof
  FROM mdsys.semi_owltst_idx;

TRIPLE_AND_ITS_PROOF
--------------------------------------------------------------------
8_5_5_4 generated by 4_D_5_5 : SYMM_SAMH_SYMM
8_4_5_4 generated by 8_5_5_4 4_D_5_5 : SAM_SAMH
. . .
A proof consists of one or more triple (link) ID values and the name of the rule that is applied on those triples:
link-id1 [link-id2 ... link-idn] : rule-name
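If you process proofs outside the database, the format above is simple to split. The following Python sketch is one way to do it, assuming that neither link IDs nor rule names contain a colon (true of the sample values shown here, but not something this chapter states as a rule). The function name parse_proof is hypothetical.

```python
def parse_proof(explain):
    """Split 'link-id1 [link-id2 ... link-idn] : rule-name' into (link_ids, rule_name)."""
    ids_part, sep, rule = explain.rpartition(":")
    if not sep:
        raise ValueError("no ':' separator in proof: %r" % explain)
    link_ids = ids_part.split()   # one or more whitespace-separated link IDs
    rule_name = rule.strip()
    if not link_ids or not rule_name:
        raise ValueError("malformed proof: %r" % explain)
    return link_ids, rule_name


print(parse_proof("4_D_5_5 : SYMM_SAMH_SYMM"))
print(parse_proof("8_5_5_4 4_D_5_5 : SAM_SAMH"))
```

The returned link IDs can then be looked up in the model view and the entailment view to recover the full triples.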
To get the full subject, predicate, and object URIs for proofs, you can query the model view and the entailment (rules index) view. Example 2-4 displays the LINK_ID value and associated triple contents using the model view MDSYS.SEMM_OWLTST and the entailment view MDSYS.SEMI_OWLTST_IDX.
Example 2-4 Displaying Proof Information
SELECT to_char(x.triple.rdf_m_id, 'FMXXXXXXXXXXXXXXXX') ||'_'||
       to_char(x.triple.rdf_s_id, 'FMXXXXXXXXXXXXXXXX') ||'_'||
       to_char(x.triple.rdf_p_id, 'FMXXXXXXXXXXXXXXXX') ||'_'||
       to_char(x.triple.rdf_c_id, 'FMXXXXXXXXXXXXXXXX'),
       x.triple.get_triple()
  FROM (
    SELECT sdo_rdf_triple_s(
             t.canon_end_node_id,
             t.model_id,
             t.start_node_id,
             t.p_value_id,
             t.end_node_id) triple
      FROM (select * from mdsys.semm_owltst
            union all
            select * from mdsys.semi_owltst_idx
           ) t
     WHERE t.link_id IN ('4_D_5_5','8_5_5_4')
  ) x;

LINK_ID    X.TRIPLE.GET_TRIPLE()(SUBJECT, PROPERTY, OBJECT)
---------- --------------------------------------------------------------
4_D_5_5    SDO_RDF_TRIPLE('<http://foo.com/name/John>', '<http://www.w3.org/2002/07/owl#sameAs>', '<http://foo.com/name/JohnQ>')
8_5_5_4    SDO_RDF_TRIPLE('<http://foo.com/name/JohnQ>', '<http://www.w3.org/2002/07/owl#sameAs>', '<http://foo.com/name/John>')
Using the triples displayed in Example 2-4, the proof entry 8_5_5_4 generated by 4_D_5_5 : SYMM_SAMH_SYMM can be read as follows: the triple with LINK_ID = 8_5_5_4 is inferred from the triple with LINK_ID = 4_D_5_5 using the symmetry of owl:sameAs.
An OWL ontology may contain errors, such as unsatisfiable classes, instances belonging to unsatisfiable classes, and two individuals asserted to be both the same and different. You can use the SEM_APIS.VALIDATE_MODEL and SEM_APIS.VALIDATE_ENTAILMENT functions to detect inconsistencies in the original data model and in the entailment, respectively.
Example 2-5 uses the SEM_APIS.VALIDATE_ENTAILMENT function, which returns a null value if no errors are detected or a VARRAY of strings if any errors are detected.
Example 2-5 Validating an Entailment
SQL> -- Insert an offending triple.
SQL> insert into owltst values (1, sdo_rdf_triple_s('owltst',
'urn:C1', 'http://www.w3.org/2000/01/rdf-schema#subClassOf', 'http://www.w3.org/2002/07/owl#Nothing'));
SQL> -- Drop entailment first.
SQL> exec sem_apis.drop_entailment('owltst_idx');
PL/SQL procedure successfully completed.
SQL> -- Perform OWL inferencing.
SQL> exec sem_apis.create_entailment('owltst_idx', sem_models('OWLTST'), sem_rulebases('OWLPRIME'));
PL/SQL procedure successfully completed.
SQL> set serveroutput on;
SQL> -- Now invoke validation API: sem_apis.validate_entailment
SQL>
declare
lva mdsys.rdf_longVarcharArray;
idx int;
begin
lva := sem_apis.validate_entailment(sem_models('OWLTST'), sem_rulebases('OWLPRIME')) ;
if (lva is null) then
dbms_output.put_line('No errors found.');
else
for idx in 1..lva.count loop
dbms_output.put_line('Offending entry := ' || lva(idx)) ;
end loop ;
end if;
end ;
/
SQL> -- NOTE: The LINK_ID value and the numbers in the following
SQL> -- line are shortened for simplicity in this example. --
Offending entry := 1 10001 (4_2_4_8 2 4 8) Unsatisfiable class.
Each item in the validation report array includes the following information:
Number of triples that cause this error (1 in Example 2-5)
Error code (10001 in Example 2-5)
One or more triples (shown in parentheses in the output; (4_2_4_8 2 4 8) in Example 2-5). These numbers are the LINK_ID value and the ID values of the subject, predicate, and object.
Descriptive error message (Unsatisfiable class. in Example 2-5)
The output in Example 2-5 indicates that the error is caused by one triple that asserts that a class is a subclass of the empty class owl:Nothing.
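The four parts of a validation report item can be pulled apart with a regular expression. The following Python sketch does this for the sample line from Example 2-5. The layout it assumes (two integers, one or more parenthesized groups of four tokens, then free text) is inferred from that single sample, so treat it as a starting point rather than a specification of the report format. The names ENTRY_RE and parse_validation_entry are hypothetical.

```python
import re

# Assumed layout: <count> <error code> (<link_id> <s> <p> <o>)... <message>
ENTRY_RE = re.compile(r"^\s*(\d+)\s+(\d+)\s+((?:\([^)]*\)\s*)+)(.*)$")


def parse_validation_entry(entry):
    """Return the count, error code, triples, and message of one report item."""
    match = ENTRY_RE.match(entry)
    if match is None:
        raise ValueError("unrecognized validation entry: %r" % entry)
    count, code, triples_text, message = match.groups()
    triples = []
    for inner in re.findall(r"\(([^)]*)\)", triples_text):
        # Each group holds the LINK_ID followed by subject, predicate, object IDs.
        link_id, subj, pred, obj = inner.split()
        triples.append({"link_id": link_id, "s": subj, "p": pred, "o": obj})
    return {
        "count": int(count),
        "code": int(code),
        "triples": triples,
        "message": message.strip(),
    }


report = parse_validation_entry("1 10001 (4_2_4_8 2 4 8) Unsatisfiable class.")
print(report["code"], report["triples"][0]["link_id"], report["message"])
```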
In addition to accepting OWL vocabularies, the SEM_APIS.CREATE_ENTAILMENT procedure accepts RDFS rulebases. The following example shows RDFS inference (all standard RDFS rules are defined in http://www.w3.org/TR/rdf-mt/):
EXECUTE sem_apis.create_entailment('rdfstst_idx', sem_models('my_model'), sem_rulebases('RDFS'));
Because rules RDFS4A, RDFS4B, RDFS6, RDFS8, RDFS10, and RDFS13 might not generate meaningful inference for your applications, you can deselect those components for faster inference. The following example deselects these rules:
EXECUTE sem_apis.create_entailment('rdfstst_idx', sem_models('my_model'), sem_rulebases('RDFS'), SEM_APIS.REACH_CLOSURE, 'RDFS4A-, RDFS4B-, RDFS6-, RDFS8-, RDFS10-, RDFS13-');
This section describes suggestions for improving the performance of inference operations.
Collect statistics before inferencing. After you load a large RDF/OWL data model, you should execute the SEM_PERF.GATHER_STATS procedure. See the Usage Notes for that procedure (in Chapter 4) for important usage information.
Allocate sufficient temporary tablespace for inference operations. OWL inference support in Oracle relies heavily on table joins, and therefore uses significant temporary tablespace.
You can try the following statement before running the SEM_APIS.CREATE_ENTAILMENT procedure, to avoid sort merge joins that might affect inference performance:
ALTER SESSION SET "_optimizer_sortmerge_join_enabled" = false;
If you do this, be sure to reset the value to true after calling the SEM_APIS.CREATE_ENTAILMENT procedure.
To improve inference performance with user-defined rules, enter the following statement:
ALTER SESSION SET "_with_subquery"=INLINE;
This setting instructs the optimizer to inline a WITH subquery instead of materializing it.
Selective inferencing is component-based inferencing, in which you limit the inferencing to specific OWL components that you are interested in. To perform selective inferencing, use the inf_components_in parameter to the SEM_APIS.CREATE_ENTAILMENT procedure to specify a comma-delimited list of components. The final inferencing is determined by the union of rulebases specified and the components specified.
Example 2-6 limits the inferencing to the class hierarchy from subclass (SCOH) relationship and the property hierarchy from subproperty (SPOH) relationship. This example creates an empty rulebase and then specifies the two components ('SCOH,SPOH') in the call to the SEM_APIS.CREATE_ENTAILMENT procedure.
Example 2-6 Performing Selective Inferencing
EXECUTE sem_apis.create_rulebase('my_rulebase');
EXECUTE sem_apis.create_entailment('owltst_idx', sem_models('owltst'), sem_rulebases('my_rulebase'), SEM_APIS.REACH_CLOSURE, 'SCOH,SPOH');
The following component codes are available: SCOH, COMPH, DISJH, SYMMH, INVH, SPIH, MBRH, SPOH, DOMH, RANH, EQCH, EQPH, FPH, IFPH, DOM, RAN, SCO, DISJ, COMP, INV, SPO, FP, IFP, SYMM, TRANS, DIF, SAM, RDFP1, RDFP2, RDFP3, RDFP4, RDFP6, RDFP7, RDFP8AX, RDFP8BX, RDFP9, RDFP10, RDFP11, RDFP12A, RDFP12B, RDFP12C, RDFP13A, RDFP13B, RDFP13C, RDFP14A, RDFP14BX, RDFP15, RDFP16, RDFS2, RDFS3, RDFS4a, RDFS4b, RDFS5, RDFS6, RDFS7, RDFS8, RDFS9, RDFS10, RDFS11, RDFS12, RDFS13.
The rules corresponding to components with a prefix of RDFP can be found in Completeness, decidability and complexity of entailment for RDF Schema and a semantic extension involving the OWL vocabulary, by H.J. Horst.
The syntax for deselecting a component is component_name followed by a minus (-) sign. For example, the following statement performs OWLPrime inference without calculating the subClassOf hierarchy:
EXECUTE sem_apis.create_entailment('owltst_idx', sem_models('owltst'), sem_rulebases('OWLPRIME'), SEM_APIS.REACH_CLOSURE, 'SCOH-');
By default, the OWLPrime rulebase implements the transitive semantics of owl:sameAs. OWLPrime does not include the following rules (semantics):
U owl:sameAs V . U p X . ==> V p X .
U owl:sameAs V . X p U . ==> X p V .
The reason for not including these rules is that they tend to generate many assertions. If you need to include these assertions, you can include the SAM component code in the call to the SEM_APIS.CREATE_ENTAILMENT procedure.
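To see why the two owl:sameAs rules above tend to generate many assertions, the following Python sketch applies them to a small set of triples until no new triples appear. It models only those two rules as written (not the symmetric or transitive semantics of owl:sameAs, and not the Oracle implementation). The function name propagate_same_as and the shortened identifiers are assumptions for this illustration.

```python
def propagate_same_as(triples, same_as="owl:sameAs"):
    """Apply  U sameAs V . U p X . ==> V p X .  and  U sameAs V . X p U . ==> X p V ."""
    result = set(triples)
    changed = True
    while changed:
        changed = False
        pairs = [(s, o) for (s, p, o) in result if p == same_as]
        new = set()
        for (u, v) in pairs:
            for (s, p, o) in result:
                if s == u:          # first rule: copy the triple with V as subject
                    new.add((v, p, o))
                if o == u:          # second rule: copy the triple with V as object
                    new.add((s, p, v))
        if not new <= result:
            result |= new
            changed = True
    return result


asserted = {
    ("John", "owl:sameAs", "JohnQ"),
    ("John", "fatherOf", "Mary"),
    ("Jack", "brotherOf", "John"),
}
closure = propagate_same_as(asserted)
print(sorted(closure - asserted))
```

Every triple that mentions John is duplicated for JohnQ, so the number of copied assertions grows with both the number of owl:sameAs links and the number of triples that mention each linked resource.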
You can use semantic operators to query relational data in an ontology-assisted manner, based on the semantic relationship between the data in a table column and terms in an ontology. The SEM_RELATED semantic operator retrieves rows based on semantic relatedness. The SEM_DISTANCE semantic operator returns distance measures for the semantic relatedness, so that rows returned by the SEM_RELATED operator can be ordered or restricted using the distance measure. The index type MDSYS.SEM_INDEXTYPE allows efficient execution of such queries, enabling scalable performance over large data sets.
Referring to the cancer ontology example in Section 2.1.1, consider the following query that requires semantic matching: Find all patients whose diagnosis is of the type 'Immune_System_Disorder'. A typical database query of the PATIENTS table (described in Section 2.1.1) involving syntactic match will not return any rows, because no rows have a DIAGNOSIS column containing the exact value Immune_System_Disorder. For example, the following query will not return any rows:
SELECT diagnosis FROM patients WHERE diagnosis = 'Immune_System_Disorder';
However, many rows in the patient data table are relevant, because their diagnoses fall under this class. Example 2-7 uses the SEM_RELATED operator (instead of lexical equality) to retrieve all the relevant rows from the patient data table. (In this example, the term Immune_System_Disorder is prefixed with a namespace, and the default assumption is that the values in the table column also have a namespace prefix. However, that might not always be the case, as explained in Section 2.3.5.)
Example 2-7 SEM_RELATED Operator
SELECT diagnosis FROM patients WHERE SEM_RELATED (diagnosis, '<http://www.w3.org/2000/01/rdf-schema#subClassOf>', '<http://www.example.org/medical_terms/Immune_System_Disorder>', sem_models('medical_ontology'), sem_rulebases('owlprime')) = 1;
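The difference between the lexical match and the semantic match can be modeled in a few lines of Python. The sketch below transcribes the part of the cancer ontology described in Section 2.1.1 into a child-to-parent dictionary (a simplification that assumes each class has a single parent) and uses the rows of Table 2-1. It illustrates the idea behind the two queries only; the names SUBCLASS_OF, PATIENTS, and is_subclass_of are hypothetical, and the namespace prefixes are omitted.

```python
# Child class -> parent class, transcribed from the description of Figure 2-1.
SUBCLASS_OF = {
    "Autoimmune_Disease": "Immune_System_Disorder",
    "Immunodeficiency_Syndrome": "Immune_System_Disorder",
    "Rheumatoid_Arthritis": "Autoimmune_Disease",
    "T_Cell_Immunodeficiency": "Immunodeficiency_Syndrome",
    "AIDS": "T_Cell_Immunodeficiency",
}

# (PATIENT_ID, DIAGNOSIS) rows from Table 2-1.
PATIENTS = [
    (1234, "Rheumatoid_Arthritis"),
    (2345, "Immunodeficiency_Syndrome"),
    (3456, "AIDS"),
]


def is_subclass_of(term, ancestor):
    """Walk up the hierarchy from term and report whether ancestor is reached."""
    current = SUBCLASS_OF.get(term)
    while current is not None:
        if current == ancestor:
            return True
        current = SUBCLASS_OF.get(current)
    return False


exact = [d for (_, d) in PATIENTS if d == "Immune_System_Disorder"]
related = [d for (_, d) in PATIENTS if is_subclass_of(d, "Immune_System_Disorder")]
print(exact)    # lexical equality finds nothing
print(related)  # hierarchy-aware match finds every row in Table 2-1
```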
The SEM_RELATED operator has the following attributes:
SEM_RELATED(
  sub           VARCHAR2,
  predExpr      VARCHAR2,
  obj           VARCHAR2,
  ontologyName  SEM_MODELS,
  ruleBases     SEM_RULEBASES,
  index_status  VARCHAR2,
  lower_bound   INTEGER,
  upper_bound   INTEGER
) RETURN INTEGER;
The sub attribute is the name of the table column that is being searched. The terms in the table column are typically the subject in a <subject, predicate, object> triple pattern.
The predExpr attribute represents the predicate that can appear as a label of the edge on the path from the subject node to the object node.
The obj attribute represents the term in the ontology for which related terms (related by the predExpr attribute) have to be found in the table (in the column specified by the sub attribute). This term is typically the object in a <subject, predicate, object> triple pattern. (In a query with the equality operator, this would be the query term.)
The ontologyName attribute is the name of the ontology that contains the relationships between terms.
The ruleBases attribute identifies one or more rulebases whose rules have been applied to the ontology to infer new relationships. When this attribute is specified, the query is answered based on both the relationships in the ontology and the inferred relationships.
The index_status optional attribute lets you query the data even when the relevant rules index (created when the specified rulebase was applied to the ontology) does not have a valid status. If this attribute is null, the query returns an error if the rules index does not have a valid status. If this attribute is not null, it must be the string VALID, INCOMPLETE, or INVALID, to specify the minimum status of the rules index for the query to succeed. Because OWL does not guarantee monotonicity, the value INCOMPLETE should not be used when an OWL rulebase is specified.
The lower_bound and upper_bound optional attributes let you specify a bound on the distance measure of the relationship between terms that are related. See Section 2.3.2 for the description of the distance measure.
The SEM_RELATED operator returns 1 if the two input terms are related with respect to the specified predExpr relationship within the ontology, and it returns 0 if the two input terms are not related. If the lower and upper bounds are specified, it returns 1 if the two input terms are related with a distance measure that is greater than or equal to lower_bound and less than or equal to upper_bound.
The SEM_DISTANCE ancillary operator computes the distance measure for the rows filtered using the SEM_RELATED operator. The SEM_DISTANCE operator has the following format:
SEM_DISTANCE (number) RETURN NUMBER;
The number attribute can be any number, as long as it matches the number that is the last attribute specified in the call to the SEM_RELATED operator (see Example 2-8). The number is used to match the invocation of the ancillary operator SEM_DISTANCE with a specific SEM_RELATED (primary operator) invocation, because a query can have multiple invocations of primary and ancillary operators.
Example 2-8 expands Example 2-7 to show several statements that include the SEM_DISTANCE ancillary operator, which gives a measure of how closely the two terms (here, a patient's diagnosis and the term Immune_System_Disorder) are related by measuring the distance between the terms. Using the cancer ontology described in Section 2.1.1, the distance between AIDS and Immune_System_Disorder is 3.
Example 2-8 SEM_DISTANCE Ancillary Operator
SELECT diagnosis, SEM_DISTANCE(123) FROM patients
  WHERE SEM_RELATED (diagnosis,
    '<http://www.w3.org/2000/01/rdf-schema#subClassOf>',
    '<http://www.example.org/medical_terms/Immune_System_Disorder>',
    sem_models('medical_ontology'), sem_rulebases('owlprime'), 123) = 1;

SELECT diagnosis FROM patients
  WHERE SEM_RELATED (diagnosis,
    '<http://www.w3.org/2000/01/rdf-schema#subClassOf>',
    '<http://www.example.org/medical_terms/Immune_System_Disorder>',
    sem_models('medical_ontology'), sem_rulebases('owlprime'), 123) = 1
  ORDER BY SEM_DISTANCE(123);

SELECT diagnosis, SEM_DISTANCE(123) FROM patients
  WHERE SEM_RELATED (diagnosis,
    '<http://www.w3.org/2000/01/rdf-schema#subClassOf>',
    '<http://www.example.org/medical_terms/Immune_System_Disorder>',
    sem_models('medical_ontology'), sem_rulebases('owlprime'), 123) = 1
  AND SEM_DISTANCE(123) <= 3;
Example 2-9 uses distance information to restrict the number of rows returned by the primary operator. All rows with a term related to the object attribute specified in the SEM_RELATED invocation, but with a distance of greater than or equal to 2 and less than or equal to 4, are retrieved.
Example 2-9 Using SEM_DISTANCE to Restrict the Number of Rows Returned
SELECT diagnosis FROM patients WHERE SEM_RELATED (diagnosis, '<http://www.w3.org/2000/01/rdf-schema#subClassOf>', '<http://www.example.org/medical_terms/Immune_System_Disorder>', sem_models('medical_ontology'), sem_rulebases('owlprime'), 2, 4) = 1;
In Example 2-9, the lower and upper bounds are specified using the lower_bound and upper_bound parameters in the SEM_RELATED operator instead of using the SEM_DISTANCE operator. The SEM_DISTANCE operator can also be used for restricting the rows returned, as shown in the last SELECT statement in Example 2-8.
Distances are generated for the following properties during inference (entailment): OWL properties defined as transitive properties, and RDFS subClassOf and RDFS subPropertyOf properties. The distance between two terms linked through these properties is computed as the shortest distance between them in a hierarchical class structure. Distances of two terms linked through other properties are undefined and therefore set to null.
Each transitive property link in the original model (viewed as a hierarchical class structure) has a distance of 1, and the distance of an inferred triple is generated according to the number of links between the two terms. Consider the following hypothetical sample scenarios:
If the original graph contains C1 rdfs:subClassOf C2 and C2 rdfs:subClassOf C3, then C1 rdfs:subClassOf C3 will be derived. In this case:
C1 rdfs:subClassOf C2: distance = 1, because it exists in the model.
C2 rdfs:subClassOf C3: distance = 1, because it exists in the model.
C1 rdfs:subClassOf C3: distance = 2, because it is generated during inference.
If the original graph contains P1 rdfs:subPropertyOf P2 and P2 rdfs:subPropertyOf P3, then P1 rdfs:subPropertyOf P3 will be derived. In this case:
P1 rdfs:subPropertyOf P2: distance = 1, because it exists in the model.
P2 rdfs:subPropertyOf P3: distance = 1, because it exists in the model.
P1 rdfs:subPropertyOf P3: distance = 2, because it is generated during inference.
If the original graph contains C1 owl:equivalentClass C2 and C2 owl:equivalentClass C3, then C1 owl:equivalentClass C3 will be derived. In this case:
C1 owl:equivalentClass C2: distance = 1, because it exists in the model.
C2 owl:equivalentClass C3: distance = 1, because it exists in the model.
C1 owl:equivalentClass C3: distance = 2, because it is generated during inference.
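The distance rule in these scenarios (each asserted link counts as 1, and an inferred link gets the length of the shortest chain of asserted links) can be modeled with a breadth-first search. The Python sketch below is an illustration of that rule under the stated assumption that edges are given as simple (subject, object) pairs for one transitive property. It is not the algorithm used by the inference engine, and the function name shortest_distances is hypothetical.

```python
from collections import deque


def shortest_distances(edges, start):
    """Return {term: number of asserted links on the shortest path from start}."""
    graph = {}
    for (subj, obj) in edges:
        graph.setdefault(subj, []).append(obj)
    dist = {}
    seen = {start}
    queue = deque([(start, 0)])
    while queue:
        node, d = queue.popleft()
        for nxt in graph.get(node, []):
            if nxt not in seen:       # first visit in BFS order is the shortest path
                seen.add(nxt)
                dist[nxt] = d + 1
                queue.append((nxt, d + 1))
    return dist


# C1 subClassOf C2, C2 subClassOf C3 (the first scenario above).
print(shortest_distances([("C1", "C2"), ("C2", "C3")], "C1"))  # {'C2': 1, 'C3': 2}

# Terms that are not reachable have no entry, mirroring a null distance.
print(shortest_distances([("C1", "C2"), ("C2", "C3")], "C3").get("C1"))  # None
```

Applied to the subclass chain from AIDS up to Immune_System_Disorder described in Section 2.1.1, the same search yields 3, which agrees with the distance stated for Example 2-8.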
The SEM_RELATED operator works with user-defined rulebases. However, using the SEM_DISTANCE operator with a user-defined rulebase is not yet supported, and will raise an error.
When using the SEM_RELATED operator, you can create a semantic index of type MDSYS.SEM_INDEXTYPE on the column that contains the ontology terms. Creating such an index will result in more efficient execution of the queries. The CREATE INDEX statement must contain the INDEXTYPE IS MDSYS.SEM_INDEXTYPE clause, to specify the type of index being created.
Example 2-10 creates a semantic index named DIAGNOSIS_SEM_IDX on the DIAGNOSIS column of the PATIENTS table using the Cancer_Ontology ontology.
Example 2-10 Creating a Semantic Index
CREATE INDEX diagnosis_sem_idx ON patients (diagnosis) INDEXTYPE IS MDSYS.SEM_INDEXTYPE;
The column on which the index is built (DIAGNOSIS in Example 2-10) must be the first parameter to the SEM_RELATED operator, in order for the index to be used. If it is not the first parameter, the index is not used during the execution of the query.
To improve the performance of certain semantic queries, you can cause statistical information to be generated for the semantic index by specifying one or more models and rulebases when you create the index. Example 2-11 creates an index that will also generate statistics information for the specified model and rulebase. The index can be used with other models and rulebases during query, but the statistical information will be used only if the model and rulebase specified during the creation of the index are the same model and rulebase specified in the query.
Example 2-11 Creating a Semantic Index Specifying a Model and Rulebase
CREATE INDEX diagnosis_sem_idx ON patients (diagnosis) INDEXTYPE IS MDSYS.SEM_INDEXTYPE PARAMETERS('ONTOLOGY_MODEL(medical_ontology), RULEBASE(OWLPrime)');
The statistical information is useful for queries that return top-k results sorted by semantic distance. Example 2-12 shows such a query.
Example 2-12 Query Benefitting from Generation of Statistical Information
SELECT /*+ FIRST_ROWS */ diagnosis FROM patients WHERE SEM_RELATED (diagnosis, '<http://www.w3.org/2000/01/rdf-schema#subClassOf>', '<http://www.example.org/medical_terms/Immune_System_Disorder>', sem_models('medical_ontology'), sem_rulebases('owlprime'), 123) = 1 ORDER BY SEM_DISTANCE(123);
If an index of type MDSYS.SEM_INDEXTYPE has been created on a table column that is the first parameter to the SEM_RELATED operator, the index will be used. For example, the following query retrieves all rows that have a value in the DIAGNOSIS column that is a subclass of (rdfs:subClassOf) Immune_System_Disorder.
SELECT diagnosis FROM patients WHERE SEM_RELATED (diagnosis, '<http://www.w3.org/2000/01/rdf-schema#subClassOf>', '<http://www.example.org/medical_terms/Immune_System_Disorder>', sem_models('medical_ontology'), sem_rulebases('owlprime')) = 1;
Assume, however, that this query instead needs to retrieve all rows that have a value in the DIAGNOSIS column for which Immune_System_Disorder is a subclass. You could rewrite the query as follows:
SELECT diagnosis FROM patients WHERE SEM_RELATED ('<http://www.example.org/medical_terms/Immune_System_Disorder>', '<http://www.w3.org/2000/01/rdf-schema#subClassOf>', diagnosis, sem_models('medical_ontology'), sem_rulebases('owlprime')) = 1;
However, in this case a semantic index on the DIAGNOSIS column will not be used, because it is not the first parameter to the SEM_RELATED operator. To cause the index to be used, you can change the preceding query to use the inverseOf keyword, as follows:
SELECT diagnosis FROM patients WHERE SEM_RELATED (diagnosis, 'inverseOf(http://www.w3.org/2000/01/rdf-schema#subClassOf)', '<http://www.example.org/medical_terms/Immune_System_Disorder>', sem_models('medical_ontology'), sem_rulebases('owlprime')) = 1;
This form causes the table column (on which the index is built) to be the first parameter to the SEM_RELATED operator, and it retrieves all rows that have a value in the DIAGNOSIS column for which Immune_System_Disorder is a subclass.
By default, the semantic operator support assumes that the values stored in the table are URIs. These URIs can be from different namespaces. However, if the values in the table are not URIs, you can use the URIPREFIX keyword to specify a URI when you create the semantic index. In this case, the specified URI is prefixed to the value in the table and stored in the index structure. (One implication is that multiple URIs cannot be used.)
Example 2-13 creates a semantic index that uses a URI prefix.
Example 2-13 Specifying a URI Prefix During Semantic Index Creation
CREATE INDEX diagnosis_sem_idx ON patients (diagnosis) INDEXTYPE IS MDSYS.SEM_INDEXTYPE PARAMETERS('URIPREFIX(<http://www.example.org/medical/>)');
Note that the slash (/) character at the end of the URI is important, because the URI is prefixed to the table value (in the index structure) without any parsing.
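The effect of the trailing slash is easy to see if the prefixing is modeled as plain string concatenation, which is what "without any parsing" implies. The following Python sketch shows the idea. The function name indexed_term is hypothetical, and the sketch leaves out the angle brackets and any other detail of how the index actually stores terms.

```python
def indexed_term(uri_prefix, column_value):
    """Model of prefixing: the URI prefix and the table value are simply joined."""
    return uri_prefix + column_value


# With the trailing slash, the result is a well-formed term URI.
print(indexed_term("http://www.example.org/medical/", "AIDS"))
# Without it, the last path segment and the value run together.
print(indexed_term("http://www.example.org/medical", "AIDS"))
```

The second result, http://www.example.org/medicalAIDS, does not correspond to any term in the ontology, so queries against it would not find the intended matches.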