Oracle9i Database Utilities Release 2 (9.2) Part Number A96652-01
This chapter describes the field-list portion of the SQL*Loader control file.
The field-list portion of a SQL*Loader control file provides information about fields being loaded, such as position, datatype, conditions, and delimiters.
Example 6-1 shows the field list section of the sample control file that was introduced in Chapter 5.
. . .
1  (hiredate  SYSDATE,
2   deptno    POSITION(1:2)  INTEGER EXTERNAL(2)
              NULLIF deptno=BLANKS,
3   job       POSITION(7:14)  CHAR  TERMINATED BY WHITESPACE
              NULLIF job=BLANKS  "UPPER(:job)",
    mgr       POSITION(28:31)  INTEGER EXTERNAL
              TERMINATED BY WHITESPACE  NULLIF mgr=BLANKS,
    ename     POSITION(34:41)  CHAR
              TERMINATED BY WHITESPACE  "UPPER(:ename)",
    empno     POSITION(45)  INTEGER EXTERNAL
              TERMINATED BY WHITESPACE,
    sal       POSITION(51)  CHAR  TERMINATED BY WHITESPACE
              "TO_NUMBER(:sal,'$99,999.99')",
4   comm      INTEGER EXTERNAL  ENCLOSED BY '(' AND '%'
              ":comm * 100"
   )
In this sample control file, the numbers that appear to the left would not appear in a real control file. They are keyed in this sample to the explanatory notes in the following list:
1. SYSDATE sets the column to the current system date. See Setting a Column to the Current Date.
2. POSITION specifies the position of a data field. See Specifying the Position of a Data Field.
   INTEGER EXTERNAL is the datatype for the field. See Specifying the Datatype of a Data Field and Numeric EXTERNAL.
   The NULLIF clause is one of the clauses that can be used to specify field conditions. See Using the WHEN, NULLIF, and DEFAULTIF Clauses.
   In this sample, the field is being compared to blanks, using the BLANKS parameter. See Comparing Fields to BLANKS.
3. The TERMINATED BY WHITESPACE clause is one of the delimiters it is possible to specify for a field. See TERMINATED Fields.
4. The ENCLOSED BY clause is another possible field delimiter. See Enclosed Fields.
To load data from the datafile, SQL*Loader must know the length and location of the field. To specify the position of a field in the logical record, use the POSITION clause in the column specification. The position may either be stated explicitly or relative to the preceding field. Arguments to POSITION must be enclosed in parentheses. The start, end, and integer values are always in bytes, even if character-length semantics are used for a datafile.
The syntax for the position specification (pos_spec) clause is as follows:
[Syntax diagram: pos_spec.gif]
Table 6-1 describes the parameters for the position specification clause.
You may omit POSITION entirely. If you do, the position specification for the data field is the same as if POSITION(*) had been used.
When you are determining field positions, be alert for tabs in the datafile. Position errors are highly likely when you use the SQL*Loader advanced SQL string capabilities to load data from a formatted report.
These kinds of errors occur when the data contains tabs. When printed, each tab expands to consume several columns on the paper. In the datafile, however, each tab is still only one character. As a result, when SQL*Loader reads the datafile, the POSITION specifications are wrong.
To fix the problem, inspect the datafile for tabs and adjust the POSITION specifications, or else use delimited fields.
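The mismatch between printed columns and datafile positions can be illustrated with a short Python sketch. The record contents and the 8-column tab stop are illustrative assumptions, not values taken from a real report.

```python
# Hypothetical record: two fields separated by a single tab character.
record = "SMITH\t7369"

# On a printout the tab expands to the next tab stop
# (assumed here to fall every 8 columns).
printed = record.expandtabs(8)

# 1-based column where the second field appears on the printout.
printed_start = printed.index("7369") + 1
# 1-based position where the second field really starts in the datafile.
file_start = record.index("7369") + 1

print(printed_start, file_start)
```

A POSITION value measured from the printout would therefore point past the real start of the field in the datafile.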
In a multiple table load, you specify multiple INTO TABLE clauses. When you specify POSITION(*) for the first column of the first table, the position is calculated relative to the beginning of the logical record. When you specify POSITION(*) for the first column of subsequent tables, the position is calculated relative to the last column of the last table loaded.
Thus, when a subsequent INTO TABLE clause begins, the position is not set to the beginning of the logical record automatically. This allows multiple INTO TABLE clauses to process different parts of the same physical record. For an example, see Extracting Multiple Logical Records.
A logical record might contain data for one of two tables, but not both. In this case, you would reset POSITION. Instead of omitting the position specification or using POSITION(*+n) for the first field in the INTO TABLE clause, use POSITION(1) or POSITION(n).
siteid  POSITION (*) SMALLINT
siteloc POSITION (*) INTEGER
If these were the first two column specifications, siteid would begin in column 1, and siteloc would begin in the column immediately following.
ename  POSITION (1:20)  CHAR
empno  POSITION (22-26) INTEGER EXTERNAL
allow  POSITION (*+2)   INTEGER EXTERNAL TERMINATED BY "/"
Column ename is character data in positions 1 through 20, followed by column empno, which is presumably numeric data in columns 22 through 26. Column allow is offset from the next position (27) after the end of empno by +2, so it starts in column 29 and continues until a slash is encountered.
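The arithmetic behind relative positions can be sketched in Python. The function name and the tuple layout are hypothetical, and the sketch is a simplification: it only tracks fields whose end position is given explicitly and does not model delimiters or datatype lengths.

```python
def resolve_positions(specs):
    """Resolve field specs to absolute 1-based start positions.

    Each spec is (name, start, end). start is an int, "*", or "*+n";
    end is an int or None when the field has no explicit end.
    """
    resolved = {}
    next_pos = 1  # position immediately after the previous field
    for name, start, end in specs:
        if isinstance(start, str):  # relative form: "*" or "*+n"
            offset = int(start[2:]) if start.startswith("*+") else 0
            start = next_pos + offset
        resolved[name] = start
        if end is not None:
            next_pos = end + 1
    return resolved


starts = resolve_positions([
    ("ename", 1, 20),
    ("empno", 22, 26),
    ("allow", "*+2", None),
])
print(starts["allow"])
```

The result for allow matches the explanation above: the next position after empno is 27, and the +2 offset moves the start to column 29.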
You may load any number of a table's columns. Columns defined in the database, but not specified in the control file, are assigned null values.
A column specification is the name of the column, followed by a specification for the value to be put in that column. The list of columns is enclosed by parentheses and separated with commas as follows:
(columnspec, columnspec, ...)
Each column name must correspond to a column of the table named in the INTO TABLE clause. A column name must be enclosed in quotation marks if it is a SQL or SQL*Loader reserved word, contains special characters, or is case sensitive.
If the value is to be generated by SQL*Loader, the specification includes the RECNUM, SEQUENCE, or CONSTANT parameter. See Using SQL*Loader to Generate Data for Input.
If the column's value is read from the datafile, the data field that contains the column's value is specified. In this case, the column specification includes a column name that identifies a column in the database table, and a field specification that describes a field in a data record. The field specification includes position, datatype, null restrictions, and defaults.
It is not necessary to specify all attributes when loading column objects. Any missing attributes will be set to NULL.
A filler field, specified by FILLER, is a datafile mapped field that does not correspond to a database column. Filler fields are assigned values from the data fields to which they are mapped.
Keep the following in mind with regard to filler fields:
- [...] FILLER.
- [...] init_specs (for example, NULLIF and DEFAULTIF).
- [...] SID, OID, REF, BFILE).
- [...] NULLIF, DEFAULTIF, and WHEN clauses. However, they cannot be used in SQL strings.
- [...] NULLIF or DEFAULTIF clause.
- [...] NULL if TRAILING NULLCOLS is specified and applicable. If another field references a nullified filler field, an error is generated.
- [...] VARRAY.
A sample filler field specification looks as follows:
field_1_count FILLER char,
field_1 varray count(field_1_count)
(
    filler_field1  char(2),
    field_1  column object
    (
        attr1  char(2),
        filler_field2  char(2),
        attr2  char(2),
    )
    filler_field3  char(3),
)
filler_field4 char(6)
The datatype specification of a field tells SQL*Loader how to interpret the data in the field. For example, a datatype of INTEGER specifies binary data, while INTEGER EXTERNAL specifies character data that represents a number. A CHAR field can contain any character data.
Only one datatype can be specified for each field; if a datatype is not specified, CHAR is assumed.
SQL*Loader Datatypes describes how SQL*Loader datatypes are converted into Oracle datatypes and gives detailed information on each SQL*Loader datatype.
Before you specify the datatype, you must specify the position of the field.
SQL*Loader datatypes can be grouped into portable and nonportable datatypes. Within each of these two groups, the datatypes are subgrouped into value datatypes and length-value datatypes.
Portable versus nonportable refers to whether or not the datatype is platform dependent. Platform dependency can exist for a number of reasons, including differences in the byte ordering schemes of different platforms (big endian versus little endian), differences in the number of bits in a platform (16-bit, 32-bit, 64-bit), differences in signed number representation schemes (2's complement versus 1's complement), and so on. In some cases, such as with byte ordering schemes and platform word length, SQL*Loader provides mechanisms to help overcome platform dependencies. These mechanisms are discussed in the descriptions of the appropriate datatypes.
Both portable and nonportable datatypes can be values or length-values. Value datatypes assume that a data field has a single part. Length-value datatypes require that the data field consist of two subfields where the length subfield specifies how long the value subfield can be.
Nonportable datatypes are grouped into value datatypes and length-value datatypes. The nonportable value datatypes are INTEGER(n), SMALLINT, FLOAT, DOUBLE, BYTEINT, ZONED, and (packed) DECIMAL.
The nonportable length-value datatypes are VARGRAPHIC, VARCHAR, VARRAW, and LONG VARRAW.
The syntax for the nonportable datatypes is shown in the syntax diagram for datatype_spec.
For INTEGER(n), the data is a full-word binary integer, where n is an optionally supplied length of 1, 2, 4, or 8. If no length specification is given, then the length, in bytes, is based on the size of a LONG INT in the C programming language on your particular platform.
INTEGER values are not portable because their byte size, their byte order, and the representation of signed values may be different between systems. However, if the representation of signed values is the same between systems, SQL*Loader may be able to access INTEGER data with correct results. If INTEGER is specified with a length specification (n), and the appropriate technique is used (if necessary) to indicate the byte order of the data, then SQL*Loader can access the data with correct results between systems. If INTEGER is specified without a length specification, then SQL*Loader can access the data with correct results only if the size of a LONG INT in the C programming language is the same length in bytes on both systems. In that case, the appropriate technique must still be used (if necessary) to indicate the byte order of the data.
Specifying an explicit length for binary integers is useful in situations where the input data was created on a platform whose word length differs from that on which SQL*Loader is running. For instance, input data containing binary integers might be created on a 64-bit platform and loaded into a database using SQL*Loader on a 32-bit platform. In this case, use INTEGER(8) to instruct SQL*Loader to process the integers as 8-byte quantities, not as 4-byte quantities.
By default, INTEGER is treated as a SIGNED quantity. If you want SQL*Loader to treat it as an unsigned quantity, specify UNSIGNED. To return to the default behavior, specify SIGNED.
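Why length, byte order, and signedness all matter for binary integers can be seen in a minimal Python sketch. It decodes the same two bytes in three ways; the byte values are an arbitrary illustration and the code does not model SQL*Loader itself.

```python
# Hypothetical 2-byte binary integer field as it appears in a datafile.
raw = bytes([0xFF, 0xFE])

# Same bytes, three interpretations.
big_signed = int.from_bytes(raw, byteorder="big", signed=True)
big_unsigned = int.from_bytes(raw, byteorder="big", signed=False)
little_unsigned = int.from_bytes(raw, byteorder="little", signed=False)

print(big_signed, big_unsigned, little_unsigned)
```

The three results differ, which is why the datafile's integer size, byte order, and signedness must agree with what the control file declares.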
For SMALLINT, the data is a half-word binary integer. The length of the field is the length of a half-word integer on your system. By default, it is treated as a SIGNED quantity. If you want SQL*Loader to treat it as an unsigned quantity, specify UNSIGNED. To return to the default behavior, specify SIGNED.
SMALLINT can be loaded with correct results only between systems where a SHORT INT has the same length in bytes. If the byte order is different between the systems, use the appropriate technique to indicate the byte order of the data. See Byte Ordering.
For FLOAT, the data is a single-precision, floating-point, binary number. If you specify end in the POSITION clause, end is ignored. The length of the field is the length of a single-precision, floating-point binary number on your system. (The datatype is FLOAT in C.) This length cannot be overridden in the control file.
FLOAT can be loaded with correct results only between systems where the representation of a FLOAT is compatible and of the same length. If the byte order is different between the two systems, use the appropriate technique to indicate the byte order of the data. See Byte Ordering.
For DOUBLE, the data is a double-precision, floating-point binary number. If you specify end in the POSITION clause, end is ignored. The length of the field is the length of a double-precision, floating-point binary number on your system. (The datatype is DOUBLE or LONG FLOAT in C.) This length cannot be overridden in the control file.
DOUBLE can be loaded with correct results only between systems where the representation of a DOUBLE is compatible and of the same length. If the byte order is different between the two systems, use the appropriate technique to indicate the byte order of the data. See Byte Ordering.
For BYTEINT, the decimal value of the binary representation of the byte is loaded. For example, the input character x"1C" is loaded as 28. The length of a BYTEINT field is always 1 byte. If POSITION(start:end) is specified, end is ignored. (The datatype is UNSIGNED CHAR in C.)
An example of the syntax for this datatype is:
(column1 position(1) BYTEINT,
 column2 BYTEINT,
 ...
)
ZONED data is in zoned decimal format: a string of decimal digits, one per byte, with the sign included in the last byte. (In COBOL, this is a SIGN TRAILING field.) The length of this field is equal to the precision (number of digits) that you specify.
The syntax for the ZONED datatype is:
[Syntax diagram: zoned.gif]
In this syntax, precision is the number of digits in the number, and scale (if given) is the number of digits to the right of the (implied) decimal point. The following example specifies an 8-digit integer starting at position 32:
sal POSITION(32) ZONED(8),
The Oracle database server uses the VAX/VMS zoned decimal format when the zoned data is generated on an ASCII-based platform. It is also possible to load zoned decimal data that is generated on an EBCDIC-based platform. In this case, Oracle uses the IBM format as specified in the ESA/390 Principles of Operations, version 8.1 manual. The format that is used depends on the character set encoding of the input datafile. See CHARACTERSET Parameter for more information.
DECIMAL data is in packed decimal format: two digits per byte, except for the last byte, which contains a digit and sign. DECIMAL fields allow the specification of an implied decimal point, so fractional values can be represented.
The syntax for the DECIMAL datatype is:
[Syntax diagram: decimal.gif]
The precision parameter is the number of digits in a value. The length of the field in bytes, as computed from digits, is (N+1)/2 rounded up.
The scale parameter is the scaling factor, or number of digits to the right of the decimal point. The default is zero (indicating an integer). The scaling factor can be greater than the number of digits but cannot be negative.
An example is:
sal DECIMAL (7,2)
This example would load a number equivalent to +12345.67. In the data record, this field would take up 4 bytes. (The byte length of a DECIMAL field is equivalent to (N+1)/2, rounded up, where N is the number of digits in the value, and 1 is added for the sign.)
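The packed layout and the (N+1)/2 length rule can be checked with a small Python sketch. The function names are hypothetical, and the sign convention (a final nibble of 0xD means negative, anything else positive) is an assumption borrowed from common IBM packed decimal usage rather than something stated in this chapter.

```python
from decimal import Decimal


def packed_length(precision: int) -> int:
    """Bytes needed for a packed field: (N+1)/2 rounded up."""
    return (precision + 2) // 2


def unpack_decimal(data: bytes, scale: int = 0) -> Decimal:
    """Decode packed decimal: two digits per byte, last nibble is the sign."""
    nibbles = []
    for byte in data:
        nibbles.append(byte >> 4)
        nibbles.append(byte & 0x0F)
    sign_nibble = nibbles.pop()  # last half-byte holds the sign
    digits = int("".join(str(n) for n in nibbles))
    sign = -1 if sign_nibble == 0x0D else 1  # assumed convention
    return Decimal(sign * digits).scaleb(-scale)


# Seven digits plus a sign nibble occupy four bytes, as for DECIMAL (7,2).
field = bytes.fromhex("1234567C")
value = unpack_decimal(field, scale=2)
print(packed_length(7), value)
```

With scale 2, the seven digits 1234567 are read as 12345.67, and the field occupies the 4 bytes the text describes.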
For VARGRAPHIC, the data is a varying-length, double-byte character string. It consists of a length subfield followed by a string of double-byte characters (DBCS). The Oracle database server does not support DBCS; however, SQL*Loader reads DBCS as single bytes and loads it as RAW data. Like RAW data, VARGRAPHIC fields are stored without modification in whichever column you specify.
Note: The size of the length subfield is the size of the SQL*Loader [...]
VARGRAPHIC data can be loaded with correct results only between systems where a SHORT INT has the same length in bytes. If the byte order is different between the systems, use the appropriate technique to indicate the byte order of the length subfield. See Byte Ordering.
The syntax for the VARGRAPHIC datatype is:
[Syntax diagram: vargraph.gif]
The length of the current field is given in the first 2 bytes. A maximum length specified for the VARGRAPHIC datatype does not include the size of the length subfield. The maximum length specifies the number of graphic (double-byte) characters. It is multiplied by 2 to determine the maximum length of the field in bytes.
The default maximum field length is 2 KB graphic characters, or 4 KB (2 * 2 KB). To minimize memory requirements, specify a maximum length for such fields whenever possible.
If a position specification is specified (using pos_spec) before the VARGRAPHIC statement, it provides the location of the length subfield, not of the first graphic character. If you specify pos_spec(start:end), the end location determines a maximum length for the field. Both start and end identify single-character (byte) positions in the file. Start is subtracted from (end + 1) to give the length of the field in bytes. If a maximum length is specified, it overrides any maximum length calculated from the position specification.
If a VARGRAPHIC field is truncated by the end of the logical record before its full length is read, a warning is issued. Because the length of a VARGRAPHIC field is embedded in every occurrence of the input data for that field, it is assumed to be accurate.
VARGRAPHIC data cannot be delimited.
A VARCHAR field is a length-value datatype. It consists of a binary length subfield followed by a character string of the specified length. The length is in bytes unless character-length semantics are used for the datafile. In that case, the length is in characters. See Character-Length Semantics.
VARCHAR fields can be loaded with correct results only between systems where a SHORT INT has the same length in bytes. If the byte order is different between the systems, or if the VARCHAR field contains data in the UTF16 character set, use the appropriate technique to indicate the byte order of the length subfield and of the data. The byte order of the data is only an issue for the UTF16 character set. See Byte Ordering.
Note: The size of the length subfield is the size of the SQL*Loader [...]
The syntax for the VARCHAR datatype is:
[Syntax diagram: varchar.gif]
A maximum length specified in the control file does not include the size of the length subfield. If you specify the optional maximum length for a VARCHAR datatype, then a buffer of that size, in bytes, is allocated for these fields. However, if character-length semantics are used for the datafile, the buffer size in bytes is the max_length times the size in bytes of the largest possible character in the character set. See Character-Length Semantics.
The default maximum size is 4 KB. Specifying the smallest maximum length that is needed to load your data can minimize SQL*Loader's memory requirements, especially if you have many VARCHAR fields.
The POSITION clause, if used, gives the location, in bytes, of the length subfield, not of the first text character. If you specify POSITION(start:end), the end location determines a maximum length for the field. Start is subtracted from (end + 1) to give the length of the field in bytes. If a maximum length is specified, it overrides any length calculated from POSITION.
If a VARCHAR field is truncated by the end of the logical record before its full length is read, a warning is issued. Because the length of a VARCHAR field is embedded in every occurrence of the input data for that field, it is assumed to be accurate.
VARCHAR data cannot be delimited.
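The length-value layout of a VARCHAR field can be sketched in Python. This is an illustration under stated assumptions: a 2-byte unsigned length subfield, little-endian byte order by default, and byte-length semantics. The function name and sample record are hypothetical.

```python
import struct


def read_varchar(record: bytes, start: int, byteorder: str = "<"):
    """Read one length-value field.

    start is the 1-based byte position of the length subfield, which is
    what POSITION points at for this datatype.
    """
    offset = start - 1
    # Assumed: the length subfield is a 2-byte unsigned integer.
    (length,) = struct.unpack_from(byteorder + "H", record, offset)
    value = record[offset + 2 : offset + 2 + length]
    return length, value


# Hypothetical record: 2-byte length (5), then the text, then other data.
record = struct.pack("<H", 5) + b"SMITH" + b"rest"
length, value = read_varchar(record, start=1)
print(length, value)
```

Reading the same record with the wrong byte order would yield a length of 1280 instead of 5, which is why the byte order of the length subfield has to be indicated when systems differ.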
VARRAW is made up of a 2-byte binary length subfield followed by a RAW string value subfield.
VARRAW results in a VARRAW with a 2-byte length subfield and a maximum size of 4 KB (that is, the default). VARRAW(65000) results in a VARRAW with a length subfield of 2 bytes and a maximum size of 65000 bytes.
VARRAW fields can be loaded between systems with different byte orders if the appropriate technique is used to indicate the byte order of the length subfield. See Byte Ordering.
LONG VARRAW is a VARRAW with a 4-byte length subfield instead of a 2-byte length subfield.
LONG VARRAW results in a VARRAW with a 4-byte length subfield and a maximum size of 4 KB (that is, the default). LONG VARRAW(300000) results in a VARRAW with a length subfield of 4 bytes and a maximum size of 300000 bytes.
LONG VARRAW fields can be loaded between systems with different byte orders if the appropriate technique is used to indicate the byte order of the length subfield. See Byte Ordering.
The portable datatypes are grouped into value datatypes and length-value datatypes. The portable value datatypes are as follows:
- CHAR
- Datetime and interval datatypes
- GRAPHIC
- GRAPHIC EXTERNAL
- Numeric EXTERNAL (INTEGER, FLOAT, DECIMAL, ZONED)
- RAW
The portable length-value datatypes are VARCHARC and VARRAWC.
The syntax for these datatypes is shown in the diagram for datatype_spec.
The character datatypes are CHAR, DATE, and the numeric EXTERNAL datatypes. These fields can be delimited and can have lengths (or maximum lengths) specified in the control file.
For CHAR, the data field contains character data. The length, which is optional, is a maximum length. Note the following with regard to length:
- [...] POSITION specification.
- [...] POSITION specification.
- [...] CHAR data is assumed to have a length of 1, unless the field is delimited:
  - For a delimited CHAR field, if a length is specified, that length is used as a maximum.
  - For a delimited CHAR field for which no length is specified, the default is 255 bytes.
  - For a delimited CHAR field that is greater than 255 bytes, you must specify a maximum length. Otherwise you will receive an error stating that the field in the datafile exceeds maximum length.
The syntax for the CHAR datatype is:
[Syntax diagram: char.gif]
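The datetime datatypes are DATE, TIME, TIMESTAMP, TIME WITH TIME ZONE, and TIMESTAMP WITH TIME ZONE.
Values of datetime datatypes are sometimes called datetimes.
The interval datatypes are INTERVAL YEAR TO MONTH and INTERVAL DAY TO SECOND.
Values of interval datatypes are sometimes called intervals.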
Both datetimes and intervals are made up of fields. The values of these fields determine the value of the datatype.
See Also: Oracle9i SQL Reference for more detailed information about datetime and interval datatypes
The DATE field contains character data that should be converted to an Oracle date using the specified date mask. The syntax for the DATE field is:
[Syntax diagram: date.gif]
For example:
LOAD DATA
INTO TABLE dates (col_a POSITION (1:15) DATE "DD-Mon-YYYY")
BEGINDATA
1-Jan-1991
1-Apr-1991
28-Feb-1991
Whitespace is ignored and dates are parsed from left to right unless delimiters are present. (A DATE field that consists entirely of whitespace is loaded as a NULL field.)
The length specification is optional, unless a varying-length date mask is specified. The length is in bytes unless character-length semantics are used for the datafile. In that case, the length is in characters. See Character-Length Semantics.
In the preceding example, the date mask, "DD-Mon-YYYY", contains 11 bytes, with byte-length semantics. Therefore, SQL*Loader expects a maximum of 11 bytes in the field, so the specification works properly. But, suppose a specification such as the following is given:
DATE "Month dd, YYYY"
In this case, the date mask contains 14 bytes. If a value with a length longer than 14 bytes is specified, such as "September 30, 1991", a length must be specified.
Similarly, a length is required for any Julian dates (date mask "J"). A field length is required any time the length of the date string could exceed the length of the mask (that is, the count of bytes in the mask).
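The byte counts quoted above are easy to confirm. The following Python sketch compares the length of each mask with the length of a sample value, assuming single-byte (ASCII) data and byte-length semantics; the variable names are illustrative.

```python
short_mask = "DD-Mon-YYYY"
long_mask = "Month dd, YYYY"
sample_value = "September 30, 1991"

# Lengths in bytes under the single-byte assumption.
short_mask_len = len(short_mask.encode("ascii"))
long_mask_len = len(long_mask.encode("ascii"))
value_len = len(sample_value.encode("ascii"))

# A length is needed whenever the data can be longer than the mask.
needs_explicit_length = value_len > long_mask_len

print(short_mask_len, long_mask_len, value_len, needs_explicit_length)
```

Because the sample value is longer than its mask, a field length derived from the mask alone would be too short for it.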
If an explicit length is not specified, it can be derived from the POSITION clause. It is a good idea to specify the length whenever you use a mask, unless you are absolutely sure that the length of the data is less than, or equal to, the length of the mask.
An explicit length specification, if present, overrides the length in the POSITION clause. Either of these overrides the length derived from the mask. The mask may be any valid Oracle date mask. If you omit the mask, the default Oracle date mask of "dd-mon-yy" is used.
The length must be enclosed in parentheses and the mask in quotation marks. Case Study 3: Loading a Delimited, Free-Format File provides an example of the DATE datatype.
A field of datatype DATE may also be specified with delimiters. For more information, see Specifying Delimiters.
The TIME datatype stores hour, minute, and second values. For example:
09:26:50
The TIMESTAMP datatype is an extension of the DATE datatype. It stores the year, month, and day of the DATE datatype, plus the hour, minute, and second values of the TIME datatype. An example TIMESTAMP is as follows:
TIMESTAMP '1999-01-31 09:26:50'
If you specify a date value without a time component, the default time is 12:00:00 AM (midnight).
The TIME WITH TIME ZONE datatype is a variant of TIME that includes a time zone displacement in its value. The time zone displacement is the difference (in hours and minutes) between local time and UTC (coordinated universal time, formerly Greenwich mean time).
If the LOCAL option is specified, then data stored in the database is normalized to the database time zone, and time zone displacement is not stored as part of the column data. When the data is retrieved, it is returned in the user's local session time zone.
The TIMESTAMP WITH TIME ZONE datatype is a variant of TIMESTAMP that includes a time zone displacement in its value. The time zone displacement is the difference (in hours and minutes) between local time and UTC (coordinated universal time, formerly Greenwich mean time).
If the LOCAL option is specified, then data stored in the database is normalized to the database time zone, and time zone displacement is not stored as part of the column data. When the data is retrieved, it is returned in the user's local session time zone.
The INTERVAL YEAR TO MONTH datatype stores a period of time using the YEAR and MONTH datetime fields.
The INTERVAL DAY TO SECOND datatype stores a period of time using the DAY and SECOND datetime fields.
For GRAPHIC, the data is a string of double-byte characters (DBCS). The Oracle database server does not support DBCS; however, SQL*Loader reads DBCS as single bytes. Like RAW data, GRAPHIC fields are stored without modification in whichever column you specify.
The syntax for the GRAPHIC datatype is:
[Syntax diagram: graphic.gif]
For GRAPHIC and GRAPHIC EXTERNAL, specifying POSITION(start:end) gives the exact location of the field in the logical record.
If you specify a length for the GRAPHIC (EXTERNAL) datatype, however, then you give the number of double-byte graphic characters. That value is multiplied by 2 to find the length of the field in bytes. If the number of graphic characters is specified, then any length derived from POSITION is ignored. No delimited data field specification is allowed with GRAPHIC datatype specification.
If the DBCS field is surrounded by shift-in and shift-out characters, use GRAPHIC EXTERNAL. This is identical to GRAPHIC, except that the first and last characters (the shift-in and shift-out) are not loaded.
The syntax for the GRAPHIC EXTERNAL datatype is:
[Syntax diagram: graf_ext.gif]
GRAPHIC indicates that the data is double-byte characters. EXTERNAL indicates that the first and last characters are ignored. The graphic_char_length value specifies the length in DBCS (see GRAPHIC).
For example, let [ ] represent shift-in and shift-out characters, and let ## represent any double-byte character (one # per byte).
To describe ####, use POSITION(1:4) GRAPHIC or POSITION(1) GRAPHIC(2).
To describe [####], use POSITION(1:6) GRAPHIC EXTERNAL or POSITION(1) GRAPHIC EXTERNAL(2).
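The byte arithmetic in the two examples above can be sketched in Python. The function name is hypothetical, and the sketch assumes two bytes per graphic character plus one byte each for the shift-in and shift-out characters, which is inferred from the POSITION(1:6) example rather than stated outright.

```python
def graphic_byte_length(graphic_chars: int, external: bool = False) -> int:
    """Bytes spanned by a GRAPHIC or GRAPHIC EXTERNAL field."""
    data_bytes = graphic_chars * 2          # two bytes per graphic character
    shift_bytes = 2 if external else 0      # assumed 1-byte shift-in/out
    return data_bytes + shift_bytes


plain = graphic_byte_length(2)                    # GRAPHIC(2)
wrapped = graphic_byte_length(2, external=True)   # GRAPHIC EXTERNAL(2)
print(plain, wrapped)
```

The two results correspond to the spans POSITION(1:4) and POSITION(1:6) used in the examples.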
The numeric EXTERNAL datatypes are the numeric datatypes (INTEGER, FLOAT, DECIMAL, and ZONED) specified as EXTERNAL, with optional length and delimiter specifications. The length is in bytes unless character-length semantics are used for the datafile. In that case, the length is in characters. See Character-Length Semantics.
These datatypes are the human-readable, character form of numeric data. The same rules that apply to CHAR data with regard to length, position, and delimiters apply to numeric EXTERNAL data. See CHAR for a complete description of these rules.
The syntax for the numeric EXTERNAL datatypes is shown as part of datatype_spec.
Note: The data is a number in character form, not binary representation. Therefore, these datatypes are identical to [...]
FLOAT EXTERNAL data can be given in either scientific or regular notation. Both "5.33" and "533E-2" are valid representations of the same value.
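That the two notations denote the same number can be confirmed with a short Python check. It uses Python's own decimal parsing as a stand-in and says nothing about how SQL*Loader converts the text internally.

```python
from decimal import Decimal

regular = Decimal("5.33")
scientific = Decimal("533E-2")

# Both character strings denote the same numeric value.
same_value = regular == scientific
print(same_value)
```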
When raw, binary data is loaded "as is" into a RAW database column, it is not converted by the Oracle database server. If it is loaded into a CHAR column, the Oracle database server converts it to hexadecimal. It cannot be loaded into a DATE or number column.
The syntax for the RAW datatype is as follows:
[Syntax diagram: raw.gif]
The length of this field is the number of bytes specified in the control file. This length is limited only by the length of the target column in the database and by memory resources. The length is always in bytes, even if character-length semantics are used for the datafile. RAW data fields cannot be delimited.
The datatype VARCHARC consists of a character length subfield followed by a character string value-subfield.
The declaration for VARCHARC specifies the length of the length subfield, optionally followed by the maximum size of any string. If byte-length semantics are in use for the datafile, then the length and the maximum size are both in bytes. If character-length semantics are in use for the datafile, then the length and maximum size are in characters. If a maximum size is not specified, 4 KB is the default regardless of whether byte-length semantics or character-length semantics are in use.
For example:
- VARCHARC results in an error because you must at least specify a value for the length subfield.
- VARCHARC(7) results in a VARCHARC whose length subfield is 7 bytes long and whose maximum size is 4 KB (the default) if byte-length semantics are used for the datafile. If character-length semantics are used, it results in a VARCHARC with a length subfield that is 7 characters long and a maximum size of 4 KB (the default). Remember that when a maximum size is not specified, the default of 4 KB is always used, regardless of whether byte-length or character-length semantics are in use.
- VARCHARC(3,500) results in a VARCHARC whose length subfield is 3 bytes long and whose maximum size is 500 bytes if byte-length semantics are used for the datafile. If character-length semantics are used, it results in a VARCHARC with a length subfield that is 3 characters long and a maximum size of 500 characters.
See Character-Length Semantics.
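A character length subfield can be illustrated with a minimal Python sketch. The function name and sample record are hypothetical. Because the sketch works on a Python str, its counts are in characters; for single-byte data this coincides with byte-length semantics.

```python
def read_varcharc(record: str, length_size: int, max_size: int = 4096) -> str:
    """Read a field whose length is given as leading character digits.

    length_size is the size of the length subfield, as in VARCHARC(3,500);
    max_size defaults to 4 KB when no maximum is declared.
    """
    length = int(record[:length_size])
    if length > max_size:
        raise ValueError("field exceeds maximum length")
    return record[length_size : length_size + length]


# Hypothetical record laid out for VARCHARC(3,500): "005" then the value.
value = read_varcharc("005SMITHxyz", length_size=3, max_size=500)
print(value)
```

Unlike VARCHAR, the length here is stored as readable digits, so byte order does not come into play.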
The datatype VARRAWC consists of a length subfield followed by a RAW string value subfield.
For example:
- VARRAWC results in an error.
- VARRAWC(7) results in a VARRAWC whose length subfield is 7 bytes long and whose maximum size is 4 KB (that is, the default).
- VARRAWC(3,500) results in a VARRAWC whose length subfield is 3 bytes long and whose maximum size is 500 bytes.
There are several ways to specify a length for a field. If multiple lengths are specified and they conflict, then one of the lengths takes precedence. A warning is issued when a conflict exists. The following rules determine which field length is used:
- The size of SMALLINT, FLOAT, and DOUBLE data is fixed, regardless of the number of bytes specified in the POSITION clause.
- If the length (or precision) specified for a DECIMAL, INTEGER, ZONED, GRAPHIC, GRAPHIC EXTERNAL, or RAW field conflicts with the size calculated from a POSITION(start:end) specification, then the specified length (or precision) is used.
- If the maximum size specified for a VARGRAPHIC field conflicts with the size calculated from a POSITION(start:end) specification, then the specified maximum is used.
For example, assume that the native datatype INTEGER is 4 bytes long and the following field specification is given:
column1 POSITION(1:6) INTEGER
In this case, a warning is issued, and the proper length (4) is used. The log file shows the actual length used under the heading "Len" in the column table:
Column Name             Position   Len  Term Encl Datatype
----------------------- --------- ----- ---- ---- ---------
COLUMN1                       1:6     4           INTEGER
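The precedence applied in this example can be sketched as a small Python function. The name and signature are hypothetical, and the sketch covers only the case shown: a POSITION(start:end) range compared against a fixed or explicitly specified datatype length.

```python
def effective_length(start: int, end: int, datatype_length=None):
    """Return (length_used, warning_issued) for POSITION(start:end).

    datatype_length is the fixed or explicitly specified length of the
    datatype (for example 4 for a 4-byte INTEGER), or None if there is none.
    """
    position_length = end - start + 1
    if datatype_length is None:
        return position_length, False
    # The datatype length wins; a mismatch is reported as a warning.
    return datatype_length, datatype_length != position_length


length_used, warned = effective_length(1, 6, datatype_length=4)
print(length_used, warned)
```

For POSITION(1:6) with a 4-byte INTEGER, the length used is 4 and a warning is raised, matching the Len column in the log excerpt.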
A control file can specify a maximum length for the following length-value datatypes: VARCHAR, VARCHARC, VARGRAPHIC, VARRAW, and VARRAWC. The specified maximum length is in bytes if byte-length semantics are used for the field, and in characters if character-length semantics are used for the field. If no length is specified, the maximum length defaults to 4096 bytes. If the length of the field exceeds the maximum length, the record is rejected with the following error:
Variable length field exceed maximum length
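As a rough illustration of the maximum-length check, the following Python sketch reads a VARCHARC-style field whose length subfield is stored as character digits. It is a model written for this discussion, not SQL*Loader code: the function name read_varcharc and the use of a Python exception to stand for a rejected record are assumptions, and byte-length semantics are assumed throughout.

```python
# Illustrative model only; names are hypothetical and this is not SQL*Loader code.
DEFAULT_MAX = 4096  # default maximum size (4 KB) when none is specified


def read_varcharc(data, length_size, max_size=DEFAULT_MAX):
    """Read one VARCHARC-style field from `data` (bytes).

    The first `length_size` bytes hold the value length as character digits;
    the value itself follows. Returns (value, bytes_consumed).
    """
    declared = int(data[:length_size].decode("ascii"))
    if declared > max_size:
        # Stands in for SQL*Loader rejecting the record.
        raise ValueError("Variable length field exceed maximum length")
    value = data[length_size:length_size + declared]
    return value, length_size + declared


# Modeled on VARCHARC(3,500): 3-byte length subfield, maximum size 500 bytes.
print(read_varcharc(b"005SMITHrest", 3, 500))

try:
    read_varcharc(b"501" + b"x" * 501, 3, 500)
except ValueError as err:
    print("rejected:", err)
```

The sketch checks the declared length before reading the value, so an oversized field is refused without copying its contents.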
The datatype specifications in the control file tell SQL*Loader how to interpret the information in the datafile. The server defines the datatypes for the columns in the database. The link between these two is the column name specified in the control file.
SQL*Loader extracts data from a field in the input file, guided by the datatype specification in the control file. SQL*Loader then sends the field to the server to be stored in the appropriate column (as part of an array of row inserts).
SQL*Loader or the server does any necessary data conversion to store the data in the proper internal format. This includes converting data from the datafile character set to the database character set when they differ.
The datatype of the data in the file does not need to be the same as the datatype of the column in the Oracle table. The Oracle database server automatically performs conversions, but you need to ensure that the conversion makes sense and does not generate errors. For instance, when a datafile field with datatype CHAR
is loaded into a database column with datatype NUMBER
, you must make sure that the contents of the character field represent a valid number.
Table 6-2 shows which conversions between Oracle database datatypes and SQL*Loader control file datetime and interval datatypes are supported and which are not.
In the table, the abbreviations for the Oracle Database Datatypes are as follows:
N = NUMBER
C = CHAR or VARCHAR2
D = DATE
T = TIME and TIME WITH TIME ZONE
TS = TIMESTAMP and TIMESTAMP WITH TIME ZONE
YM = INTERVAL YEAR TO MONTH
DS = INTERVAL DAY TO SECOND
For the SQL*Loader datatypes, the definitions for the abbreviations in the table are the same for D, T, TS, YM, and DS. However, as noted in the previous section, SQL*Loader does not contain datatype specifications for Oracle internal datatypes such as NUMBER, CHAR
, and VARCHAR2
. However, any data that the Oracle database server is capable of converting can be loaded into these or other database columns.
For an example of how to read this table, look at the row for the SQL*Loader datatype DATE (abbreviated as D). Reading across the row, you can see that datatype conversion is supported for the Oracle database datatypes CHAR, VARCHAR2, DATE, TIMESTAMP, and TIMESTAMP WITH TIME ZONE. However, conversion is not supported for the Oracle database datatypes NUMBER, TIME, TIME WITH TIME ZONE, INTERVAL YEAR TO MONTH, or INTERVAL DAY TO SECOND.
The boundaries of CHAR
, datetime, interval, or numeric EXTERNAL
fields may also be marked by specific delimiter characters contained in the input data record. The RAW
datatype may also be marked by delimiters, but only if it is in an input LOBFILE, and only if the delimiter is TERMINATED BY EOF
(end of file). You indicate how the field is delimited by using a delimiter specification after specifying the datatype.
Delimited data can be terminated or enclosed, as shown in the following syntax:
Text description of the illustration delim.gif
You can specify a TERMINATED BY
clause, an ENCLOSED BY
clause, or both. If both are used, the TERMINATED BY
clause must come first.
TERMINATED
fields are read from the starting position of the field up to, but not including, the first occurrence of the delimiter character. If the terminator delimiter is found in the first column position, the field is null.
If TERMINATED BY WHITESPACE
is specified, data is read until the first occurrence of a whitespace character (spaces, tabs, blanks, line feeds, form feeds, or carriage returns). Then the current position is advanced until no more adjacent whitespace characters are found. This allows field values to be delimited by varying amounts of whitespace. For more information about the syntax, see Syntax for Termination and Enclosure Specification.
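The scanning rule for TERMINATED BY WHITESPACE can be sketched in Python as follows. This is an illustrative model, not SQL*Loader code; the function name read_whitespace_terminated, the sample record, and the exact set of whitespace characters are assumptions made for the example.

```python
# Hypothetical model of TERMINATED BY WHITESPACE scanning; not SQL*Loader code.
WHITESPACE = " \t\n\f\r"  # assumed set: space, tab, line feed, form feed, carriage return


def read_whitespace_terminated(record, start):
    """Return (field_value, next_position) for a field starting at index `start`."""
    end = start
    # Read up to, but not including, the first whitespace character.
    while end < len(record) and record[end] not in WHITESPACE:
        end += 1
    value = record[start:end]
    # Advance past all adjacent whitespace so the next field starts on data.
    nxt = end
    while nxt < len(record) and record[nxt] in WHITESPACE:
        nxt += 1
    return value, nxt


record = "7782   CLARK \t MANAGER"
pos = 0
fields = []
while pos < len(record):
    value, pos = read_whitespace_terminated(record, pos)
    fields.append(value)
print(fields)
```

Because the position is advanced over the whole run of whitespace, fields separated by one blank or by several blanks and tabs are split the same way.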
ENCLOSED
fields are read by skipping whitespace until a nonwhitespace character is encountered. If that character is the delimiter, then data is read up to the second delimiter. Any other character causes an error.
If two delimiter characters are encountered next to each other, a single occurrence of the delimiter character is used in the data value. For example, 'DON''T' is stored as DON'T. However, if the field consists of just two delimiter characters, its value is null. For more information about the syntax, see Syntax for Termination and Enclosure Specification.
Table 6-3 describes the syntax for the termination and enclosure specification.
Here are some examples, with samples of the data they describe:
TERMINATED BY ','                        a data string,
ENCLOSED BY '"'                          "a data string"
TERMINATED BY ',' ENCLOSED BY '"'        "a data string",
ENCLOSED BY '(' AND ')'                  (a data string)
Sometimes the punctuation mark that is a delimiter must also be included in the data. To make that possible, two adjacent delimiter characters are interpreted as a single occurrence of the character, and this character is included in the data. For example, this data:
(The delimiters are left parentheses, (, and right parentheses, )).)
with this field specification:
ENCLOSED BY "(" AND ")"
puts the following string into the database:
The delimiters are left parentheses, (, and right parentheses, ).
For this reason, problems can arise when adjacent fields use the same delimiters. For example, with the following specification:
field1 TERMINATED BY "/"
field2 ENCLOSED BY "/"
the following data will be interpreted properly:
This is the first string/ /This is the second string/
But if field1
and field2
were adjacent, then the results would be incorrect, because
This is the first string//This is the second string/
would be interpreted as a single character string with a "/" in the middle, and that string would belong to field1
.
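The delimiter-doubling rule and the adjacent-field problem described above can be reproduced with a small Python model. This is a sketch under stated assumptions, not SQL*Loader code: the function names read_terminated and read_enclosed are hypothetical, and an empty string stands for a null field.

```python
# Hypothetical model of delimiter handling; not SQL*Loader code.
def read_terminated(record, start, delim):
    """Read a TERMINATED BY field. Returns (value, next_position)."""
    chars, pos = [], start
    while pos < len(record):
        ch = record[pos]
        if ch == delim:
            if record[pos + 1:pos + 2] == delim:
                # Two adjacent delimiters stand for one literal delimiter.
                chars.append(delim)
                pos += 2
                continue
            return "".join(chars), pos + 1
        chars.append(ch)
        pos += 1
    # Delimiter absent: the end of the record terminates the field.
    return "".join(chars), pos


def read_enclosed(record, start, open_delim, close_delim=None):
    """Read an ENCLOSED BY field. Returns (value, next_position)."""
    close_delim = close_delim or open_delim
    pos = start
    while pos < len(record) and record[pos].isspace():
        pos += 1  # skip whitespace before the opening delimiter
    if pos >= len(record) or record[pos] != open_delim:
        raise ValueError("field does not begin with the enclosure delimiter")
    pos += 1
    chars = []
    while pos < len(record):
        ch = record[pos]
        if ch == close_delim:
            if record[pos + 1:pos + 2] == close_delim:
                chars.append(close_delim)
                pos += 2
                continue
            return "".join(chars), pos + 1
        chars.append(ch)
        pos += 1
    raise ValueError("closing enclosure delimiter not found")


print(read_enclosed("'DON''T'", 0, "'")[0])

separated = "This is the first string/ /This is the second string/"
field1, nxt = read_terminated(separated, 0, "/")
field2, _ = read_enclosed(separated, nxt, "/")
print([field1, field2])

adjacent = "This is the first string//This is the second string/"
print(read_terminated(adjacent, 0, "/")[0])
```

With the adjacent data, the doubled "/" is consumed as a literal character, so everything up to the final "/" ends up in the first field and nothing is left for the second.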
The default maximum length of delimited data is 255 bytes. Therefore, delimited fields can require significant amounts of storage for the bind array. A good policy is to specify the smallest possible maximum value if the fields are shorter than 255 bytes. If the fields are longer than 255 bytes, then you must specify a maximum length for the field, either with a length specifier or with the POSITION
clause.
Trailing blanks are not loaded with nondelimited datatypes unless you specify PRESERVE
BLANKS
. If a data field is 9 characters long and contains the value DANIEL
bbb
, where bbb
is three blanks, it is loaded into the Oracle database as "DANIEL"
if declared as CHAR(9)
.
If you want the trailing blanks, you could declare it as CHAR(9)
TERMINATED
BY
':'
, and add a colon to the datafile so that the field is DANIEL
bbb
:
. This field is loaded as "DANIEL   "
, with the trailing blanks. You could also specify PRESERVE
BLANKS
without the TERMINATED
BY
clause and obtain the same results.
A control file can specify multiple lengths for the character-data fields CHAR
, DATE
, and numeric EXTERNAL
. If conflicting lengths are specified, one of the lengths takes precedence. A warning is also issued when a conflict exists. This section explains which length is used.
If you specify a starting position and ending position for one of these fields, then the length of the field is determined by these specifications. If you specify a length as part of the datatype and do not give an ending position, the field has the given length. If starting position, ending position, and length are all specified, and the lengths differ, then the length given as part of the datatype specification is used for the length of the field, as follows:
POSITION(1:10) CHAR(15)
In this example, the length of the field is 15.
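The precedence just described for nondelimited character fields can be written as a few lines of Python. This is only a sketch of the rule as stated here; the function name char_field_length and its return shape are assumptions, not part of SQL*Loader.

```python
# Hypothetical helper modeling the length precedence rule; not SQL*Loader code.
def char_field_length(start=None, end=None, datatype_length=None):
    """Return (length, conflict) for a nondelimited character field.

    `start` and `end` come from POSITION(start:end); `datatype_length` comes
    from a specification such as CHAR(15). `conflict` is True when both are
    given and disagree, which is when a warning would be issued.
    """
    position_length = None
    if start is not None and end is not None:
        position_length = end - start + 1
    if datatype_length is not None:
        conflict = position_length is not None and position_length != datatype_length
        return datatype_length, conflict
    return position_length, False


print(char_field_length(1, 10, 15))   # POSITION(1:10) CHAR(15)
print(char_field_length(1, 10))       # POSITION(1:10) CHAR
print(char_field_length(datatype_length=15))
```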
If a delimited field is specified with a length, or if a length can be calculated from the starting and ending positions, then that length is the maximum length of the field. The specified maximum length is in bytes if byte-length semantics are used for the field, and in characters if character-length semantics are used for the field. If no length is specified or can be calculated from the start and end positions, the maximum length defaults to 255 bytes. The actual length can vary up to that maximum, based on the presence of the delimiter.
If starting and ending positions are specified for the field, as well as delimiters, then only the position specification has any effect. Any enclosure or termination delimiters are ignored.
If the expected delimiter is absent, then the end of record terminates the field. If TRAILING NULLCOLS
is specified, remaining fields are null. If either the delimiter or the end of record produces a field that is longer than the maximum, SQL*Loader rejects the record and returns an error.
The length of a date field depends on the mask, if a mask is specified. The mask provides a format pattern, telling SQL*Loader how to interpret the data in the record. For example, assume the mask is specified as follows:
"Month dd, yyyy"
Then "May 3, 1991" would occupy 11 bytes in the record (with byte-length semantics), while "January 31, 1992" would occupy 16.
If starting and ending positions are specified, however, then the length calculated from the position specification overrides a length derived from the mask. A specified length such as DATE(12)
overrides either of those. If the date field is also specified with terminating or enclosing delimiters, then the length specified in the control file is interpreted as a maximum length for the field.
See Also:
Datetime and Interval Datatypes for more information on the |
A field condition is a statement about a field in a logical record that evaluates as true or false. It is used in the NULLIF
and DEFAULTIF
clauses, as well as in the WHEN
clause.
A field condition is similar to the condition in the CONTINUEIF
clause, with two important differences. First, positions in the field condition refer to the logical record, not to the physical record. Second, you can specify either a position in the logical record or the name of a field in the datafile (including filler fields).
The syntax for the field_condition
clause is as follows:
Text description of the illustration fld_cond.gif
The syntax for the pos_spec
clause is as follows:
Text description of the illustration pos_spec.gif
Table 6-4 describes the parameters used for the field condition clause. For a full description of the position specification parameters, see Table 6-1.
The BLANKS
parameter makes it possible to determine if a field of unknown length is blank.
For example, use the following clause to load a blank field as null:
full_fieldname ... NULLIF column_name=BLANKS
The BLANKS
parameter recognizes only blanks, not tabs. It can be used in place of a literal string in any field comparison. The condition is true whenever the column is entirely blank.
The BLANKS
parameter also works for fixed-length fields. Using it is the same as specifying an appropriately sized literal string of blanks. For example, the following specifications are equivalent:
fixed_field CHAR(2) NULLIF fixed_field=BLANKS
fixed_field CHAR(2) NULLIF fixed_field="  "
There can be more than one blank in a multibyte character set. It is a good idea to use the BLANKS
parameter with these character sets instead of specifying a string of blank characters.
The character string will match only a specific sequence of blank characters, while the BLANKS
parameter will match combinations of different blank characters. For more information on multibyte character sets, see Multibyte (Asian) Character Sets.
When a data field is compared to a literal string that is shorter than the data field, the string is padded. Character strings are padded with blanks, for example:
NULLIF (1:4)=" "
This example compares the data in position 1:4 with 4 blanks. If position 1:4 contains 4 blanks, then the clause evaluates as true.
Hexadecimal strings are padded with hexadecimal zeros, as in the following clause:
NULLIF (1:4)=X'FF'
This clause compares position 1:4 to hexadecimal 'FF000000'.
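The two padding rules can be modeled in Python as a plain byte comparison. This is a sketch that assumes byte-length semantics and a single-byte blank; the function name matches_literal is hypothetical, and the handling of a literal longer than the field is an assumption, since that case is not described here.

```python
# Hypothetical model of literal padding in field comparisons; not SQL*Loader code.
def matches_literal(field_bytes, literal, hexadecimal=False):
    """Compare a data field with a shorter literal after padding the literal.

    Character literals are padded with blanks, hexadecimal literals with
    zero bytes. A literal longer than the field simply fails to match
    (an assumption made for this sketch).
    """
    pad = b"\x00" if hexadecimal else b" "
    padded = literal + pad * (len(field_bytes) - len(literal))
    return field_bytes == padded


# NULLIF (1:4)=" " : a one-blank literal is padded to four blanks.
print(matches_literal(b"    ", b" "))

# NULLIF (1:4)=X'FF' : the literal is padded to FF000000.
print(matches_literal(bytes.fromhex("FF000000"), bytes.fromhex("FF"), hexadecimal=True))
print(matches_literal(bytes.fromhex("FFFFFFFF"), bytes.fromhex("FF"), hexadecimal=True))
```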
The following information applies to scalar fields. For nonscalar fields (column objects, LOBs, and collections), the WHEN
, NULLIF
, and DEFAULTIF
clauses are processed differently because nonscalar fields are more complex.
The results of a WHEN
, NULLIF
, or DEFAULTIF
clause can be different depending on whether the clause specifies a field name or a position.
If the WHEN
, NULLIF
, or DEFAULTIF
clause specifies a field name, SQL*Loader compares the clause to the evaluated value of the field. The evaluated value takes trimmed whitespace into consideration. See Trimming Whitespace for information about trimming blanks and tabs.
If the WHEN
, NULLIF
, or DEFAULTIF
clause specifies a position, SQL*Loader compares the clause to the original logical record in the datafile. No whitespace trimming is done on the logical record in that case.
Different results are more likely if the field has whitespace that is trimmed, or if the WHEN
, NULLIF
, or DEFAULTIF
clause contains blanks or tabs or uses the BLANKS
parameter. If you require the same results for a field specified by name and for the same field specified by position, use the PRESERVE
BLANKS
option. The PRESERVE
BLANKS
option instructs SQL*Loader not to trim whitespace when it evaluates the values of the fields.
The results of a WHEN
, NULLIF
, or DEFAULTIF
clause are also affected by the order in which SQL*Loader operates, as described in the following steps. SQL*Loader performs these steps in order, but it does not always perform all of them. Once a field is set, any remaining steps in the process are ignored. For example, if the field is set in step 5, SQL*Loader does not move on to step 6.
1. SQL*Loader evaluates the value of each field for the input record and trims any whitespace that should be trimmed.
2. For each record, SQL*Loader evaluates any WHEN clauses for the table.
3. If the record satisfies the WHEN clauses for the table, or no WHEN clauses are specified, SQL*Loader checks each field for a NULLIF clause.
4. If a NULLIF clause exists, SQL*Loader evaluates it.
5. If the NULLIF clause is satisfied, SQL*Loader sets the field to NULL.
6. If the NULLIF clause is not satisfied, or if there is no NULLIF clause, SQL*Loader checks the length of the field from field evaluation. If the field has a length of 0 from field evaluation (for example, it was a null field, or whitespace trimming resulted in a null field), SQL*Loader sets the field to NULL. In this case, any DEFAULTIF clause specified for the field is not evaluated.
7. If any specified NULLIF clause is false or there is no NULLIF clause, and if the field does not have a length of 0 from field evaluation, SQL*Loader checks the field for a DEFAULTIF clause.
8. If a DEFAULTIF clause exists, SQL*Loader evaluates it.
9. If the DEFAULTIF clause is satisfied, then the field is set to 0 if the field in the datafile is a numeric field. It is set to NULL if the field is not a numeric field. The following fields are numeric fields and will be set to 0 if they satisfy the DEFAULTIF clause:
10. If the DEFAULTIF clause is not satisfied, or if there is no DEFAULTIF clause, SQL*Loader sets the field with the evaluated value from step 1.
The order in which SQL*Loader operates could cause results that you do not expect. For example, the DEFAULTIF
clause may look like it is setting a numeric field to NULL
rather than to 0.
Example 6-2 through Example 6-5 clarify the results for different situations. In the examples, a blank or space is indicated with a period (.). Assume that col1
and col2
are VARCHAR2(5)
columns in the database.
The control file specifies:
(col1 POSITION (1:5),
 col2 POSITION (6:8) INTEGER EXTERNAL DEFAULTIF col1 = 'aname')
The datafile contains:
aname...
In Example 6-2, col1
for the row evaluates to aname
. col2
evaluates to NULL
with a length of 0 (it is "..." but the trailing blanks are trimmed for a positional field).
When SQL*Loader determines the final loaded value for col2
, it finds no WHEN
clause and no NULLIF
clause. It then checks the length of the field, which is 0 from field evaluation. Therefore, SQL*Loader sets the final value for col2
to NULL
. The DEFAULTIF
clause is not evaluated, and the row is loaded as aname
for col1
and NULL
for col2
.
The control file specifies:
. . .
PRESERVE BLANKS
. . .
(col1 POSITION (1:5),
 col2 POSITION (6:8) INTEGER EXTERNAL DEFAULTIF col1 = 'aname')
The datafile contains:
aname...
In Example 6-3, col1
for the row again evaluates to 'aname
'. col2
evaluates to '...
' because trailing blanks are not trimmed when PRESERVE BLANKS
is specified.
When SQL*Loader determines the final loaded value for col2
, it finds no WHEN
clause and no NULLIF
clause. It then checks the length of the field from field evaluation, which is 3, not 0.
Then SQL*Loader evaluates the DEFAULTIF clause, which evaluates to true because col1 is 'aname', which matches the literal 'aname' in the clause.
Because col2
is a numeric field, SQL*Loader sets the final value for col2
to '0
'. The row is loaded as 'aname
' for col1
and as '0
' for col2
.
The control file specifies:
(col1 POSITION (1:5),
 col2 POSITION (6:8) INTEGER EXTERNAL DEFAULTIF (1:5) = BLANKS)
The datafile contains:
.....123
In Example 6-4, col1
for the row evaluates to NULL
with a length of 0 (it is .....
but the trailing blanks are trimmed). col2
evaluates to 123
.
When SQL*Loader sets the final loaded value for col2
, it finds no WHEN
clause and no NULLIF
clause. It then checks the length of the field from field evaluation, which is 3, not 0.
Then SQL*Loader evaluates the DEFAULTIF
clause. It compares (1:5)
which is .....
to BLANKS
, which evaluates to true. Therefore, because col2
is a numeric field (integer EXTERNAL
is numeric), SQL*Loader sets the final value for col2
to 0
. The row is loaded as NULL
for col1
and 0
for col2
.
The control file specifies:
(col1 POSITION (1:5),
 col2 POSITION(6:8) INTEGER EXTERNAL DEFAULTIF col1 = BLANKS)
The datafile contains:
.....123
In Example 6-5, col1
for the row evaluates to NULL
with a length of 0
(it is .....
, but the trailing blanks are trimmed). col2
evaluates to 123
.
When SQL*Loader determines the final value for col2
, it finds no WHEN
clause and no NULLIF
clause. It then checks the length of the field from field evaluation, which is 3, not 0.
Then SQL*Loader evaluates the DEFAULTIF
clause. As part of the evaluation, it checks to see that col1
is NULL
from field evaluation. It is NULL
, so the DEFAULTIF
clause evaluates to false. Therefore, SQL*Loader sets the final value for col2
to 123
, its original value from field evaluation. The row is loaded as NULL
for col1
and 123
for col2
.
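Example 6-2 through Example 6-5 can be replayed with a short Python model of the processing order. This is a sketch written to mirror the description in this section, not SQL*Loader code: every function name is hypothetical, None stands for NULL, blanks are real spaces rather than periods, and the record is assumed to pass any WHEN clause.

```python
# Illustrative model of the processing order; not SQL*Loader code.
NULL = None


def evaluate(raw, preserve_blanks=False):
    """Field evaluation: trim trailing blanks unless PRESERVE BLANKS is in
    effect. A zero-length result is NULL."""
    value = raw if preserve_blanks else raw.rstrip(" ")
    return value if value else NULL


def name_equals(evaluated, literal):
    """Condition on a field name: uses the evaluated value. A field that
    evaluated to NULL never satisfies the condition."""
    return evaluated is not NULL and evaluated == literal


def name_is_blanks(evaluated):
    """fieldname = BLANKS, evaluated against the evaluated value."""
    return evaluated is not NULL and evaluated.strip(" ") == ""


def position_is_blanks(record, start, end):
    """(start:end) = BLANKS, evaluated against the untrimmed logical record."""
    return record[start - 1:end].strip(" ") == ""


def final_value(evaluated, numeric, nullif=None, defaultif=None):
    """Decide the loaded value. `nullif` and `defaultif` are None when the
    clause is absent, otherwise the truth value of the clause."""
    if nullif:
        return NULL                    # NULLIF satisfied
    if evaluated is NULL:
        return NULL                    # length 0: DEFAULTIF is not evaluated
    if defaultif:
        return 0 if numeric else NULL  # DEFAULTIF satisfied
    return evaluated                   # keep the evaluated value


def load(record, defaultif, preserve_blanks=False):
    """Load col1 from (1:5) and col2 from (6:8); col2 is INTEGER EXTERNAL."""
    col1 = evaluate(record[0:5], preserve_blanks)
    col2 = evaluate(record[5:8], preserve_blanks)
    clause = defaultif(record, col1)
    return final_value(col1, False), final_value(col2, True, defaultif=clause)


ex_6_2 = load("aname   ", lambda rec, c1: name_equals(c1, "aname"))
ex_6_3 = load("aname   ", lambda rec, c1: name_equals(c1, "aname"), preserve_blanks=True)
ex_6_4 = load("     123", lambda rec, c1: position_is_blanks(rec, 1, 5))
ex_6_5 = load("     123", lambda rec, c1: name_is_blanks(c1))
for result in (ex_6_2, ex_6_3, ex_6_4, ex_6_5):
    print(result)
```

The only difference between the last two calls is whether the condition names a position or a field, which is enough to change the loaded value of col2 from 0 to 123.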
When a datafile created on one platform is to be loaded on a different platform, the data must be written in a form that the target system can read. For example, if the source system has a native, floating-point representation that uses 16 bytes, and the target system's floating-point numbers are 12 bytes, the target system cannot directly read data generated on the source system.
The best solution is to load data across an Oracle Net database link, taking advantage of the automatic conversion of datatypes. This is the recommended approach, whenever feasible, and means that SQL*Loader must be run on the source system.
Problems with interplatform loads typically occur with native datatypes. In some situations, it is possible to avoid problems by lengthening a field by padding it with zeros, or to read only part of the field to shorten it (for example, when an 8-byte integer is to be read on a system that uses 4-byte integers, or the reverse). Note, however, that incompatible datatype implementation may prevent this.
If you cannot use an Oracle Net database link and the datafile must be accessed by SQL*Loader running on the target system, it is advisable to use only the portable SQL*Loader datatypes (for example, CHAR
, DATE
, VARCHARC
, and numeric EXTERNAL
). Datafiles written using these datatypes may be longer than those written with native datatypes. They may take more time to load, but they transport more readily across platforms.
If you know in advance that the byte ordering schemes or native integer lengths differ between the platform on which the input data will be created and the platform on which SQL*Loader will be run, then investigate the possible use of the appropriate technique to indicate the byte order of the data or the length of the native integer. Possible techniques for indicating the byte order are to use the BYTEORDER
parameter or to place a byte-order mark (BOM) in the file. Both methods are described in Byte Ordering. It may then be possible to eliminate the incompatibilities and achieve a successful cross-platform data load. If the byte order is different from the SQL*Loader default, then you must indicate a byte order.
SQL*Loader can load data from a datafile that was created on a system whose byte ordering is different from the byte ordering on the system where SQL*Loader is running, even if the datafile contains certain nonportable datatypes.
By default, SQL*Loader uses the byte order of the system where it is running as the byte order for all datafiles. For example, on a Sun Solaris system, SQL*Loader uses big endian byte order. On an Intel or an Intel-compatible PC, SQL*Loader uses little endian byte order.
Byte order affects the results when data is written and read an even number of bytes at a time (typically 2 bytes, 4 bytes, or 8 bytes). The following are some examples of this:
Byte order also affects character data in the UTF16 character set if it is written and read as 2-byte entities. For example, the character 'a' (0x61 in ASCII) is written as 0x0061 in UTF16 on a big endian system, but as 0x6100 on a little endian system.
All Oracle-supported character sets, except UTF16, are written one byte at a time. So, even for multibyte character sets such as UTF8, the characters are written and read the same way on all systems, regardless of the byte order of the system. Therefore, data in the UTF16 character set is nonportable because it is byte-order dependent. Data in all other Oracle-supported character sets is portable.
Byte order in a datafile is only an issue if the datafile that contains the byte-order-dependent data is created on a system that has a different byte order from the system on which SQL*Loader is running. If SQL*Loader knows the byte order of the data, it swaps the bytes as necessary to ensure that the data is loaded correctly in the target database. Byte swapping means that data in big endian format is converted to little endian format, or the reverse.
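The effect of byte order, and what byte swapping does, can be seen with Python's standard struct module. This is a general illustration of the concept rather than a description of SQL*Loader internals; the helper name swap_bytes is hypothetical.

```python
import struct

# The 2-byte integer 1 written in each byte order.
big = struct.pack(">H", 1)      # big endian
little = struct.pack("<H", 1)   # little endian
print(big.hex(), little.hex())

# Reading big endian data as if it were little endian gives the wrong value.
print(struct.unpack("<H", big)[0])


def swap_bytes(data, width):
    """Reverse the bytes of each `width`-byte item in `data`."""
    if len(data) % width:
        raise ValueError("data length is not a multiple of the item width")
    return b"".join(data[i:i + width][::-1] for i in range(0, len(data), width))


# Swapping converts one representation into the other.
print(swap_bytes(struct.pack(">HH", 1, 2), 2) == struct.pack("<HH", 1, 2))

# UTF-16 code units depend on byte order; UTF-8 bytes do not.
print("a".encode("utf-16-be").hex(), "a".encode("utf-16-le").hex(), "a".encode("utf-8").hex())
```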
To indicate byte order of the data to SQL*Loader, you can use the BYTEORDER
parameter, or you can place a byte-order mark (BOM) in the file. If you do not use one of these techniques, SQL*Loader will not correctly load the data into the database.
See Also:
Case Study 11: Loading Data in the Unicode Character Set for an example of how SQL*Loader handles byte swapping
To specify the byte order of data in the input datafiles, use the following syntax in the SQL*Loader control file:
Text description of the illustration byteorder.gif
The BYTEORDER
parameter has the following characteristics:
- BYTEORDER is placed after the LENGTH parameter in the SQL*Loader control file.
- The BYTEORDER specification before the INFILE parameters applies to the entire list of primary datafiles.
- The BYTEORDER specification for the primary datafiles is also used as the default for LOBFILEs and SDFs. To override this default, specify BYTEORDER with the LOBFILE or SDF specification.
- The BYTEORDER parameter is not applicable to data contained within the control file itself.
- The BYTEORDER parameter applies to the following:
  - INTEGER and SMALLINT data
  - Length subfields of varying-length fields (the VARCHAR, VARGRAPHIC, VARRAW, and LONG VARRAW datatypes)
  - FLOAT and DOUBLE datatypes, if the system where the data was written has a floating-point representation compatible with that of the system where SQL*Loader is running
- The BYTEORDER parameter does not apply to any of the following:
Datafiles that use a Unicode encoding (UTF-16 or UTF-8) may contain a byte-order mark (BOM) in the first few bytes of the file. For a datafile that uses the character set UTF16, the value 0xFEFF in the first two bytes of the file is the BOM indicating that the file contains big endian data. A value of 0xFFFE is the BOM indicating that the file contains little endian data.
If the first primary datafile uses the UTF16 character set and it also begins with a BOM, that mark is read and interpreted to determine the byte order for all primary datafiles. SQL*Loader reads and interprets the BOM, skips it, and begins processing data with the byte immediately after the BOM. The BOM setting overrides any BYTEORDER
specification for the first primary datafile. BOMs in datafiles other than the first primary datafile are read and used for checking for byte-order conflicts only. They do not change the byte-order setting that SQL*Loader uses in processing the datafile.
In summary, the precedence of the byte-order indicators for the first primary datafile is as follows:
1. The BOM in the first primary datafile, if that datafile uses the UTF16 character set and begins with a BOM
2. The BYTEORDER parameter value, if specified before the INFILE parameters
3. The byte order of the system where SQL*Loader is running
For a datafile that uses a UTF8 character set, a BOM of 0xEFBBBF in the first 3 bytes indicates that the file contains UTF8 data. It does not indicate the byte order of the data, because data in UTF8 is not byte-order dependent. If SQL*Loader detects a UTF8 BOM, it skips it but does not change any byte-order settings for processing the datafiles.
SQL*Loader first establishes a byte-order setting for the first primary datafile using the precedence order just defined. This byte-order setting is used for all primary datafiles. If another primary datafile uses the character set UTF16 and also contains a BOM, the BOM value is compared to the byte-order setting established for the first primary datafile. If the BOM value matches the byte-order setting of the first primary datafile, SQL*Loader skips the BOM, and uses that byte-order setting to begin processing data with the byte immediately after the BOM. If the BOM value does not match the byte-order setting established for the first primary datafile, then SQL*Loader issues an error message and stops processing.
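The rules for establishing and checking the byte order of primary datafiles can be summarized in a Python sketch. It models only what is described in this section and is not SQL*Loader code; the function names, the string values "big" and "little", and the use of an exception to represent the error are assumptions.

```python
# Illustrative model only; function names are hypothetical, not SQL*Loader code.
UTF16_BOMS = {b"\xfe\xff": "big", b"\xff\xfe": "little"}
UTF8_BOM = b"\xef\xbb\xbf"


def read_bom(head, charset):
    """Return (byte_order_or_None, bytes_to_skip) for the first bytes of a file."""
    if charset == "UTF16" and head[:2] in UTF16_BOMS:
        return UTF16_BOMS[head[:2]], 2
    if charset == "UTF8" and head.startswith(UTF8_BOM):
        return None, 3  # skipped, but carries no byte-order information
    return None, 0


def first_datafile_order(head, charset, byteorder_param, system_order):
    """Byte order for the first primary datafile: a BOM wins, then the
    BYTEORDER parameter, then the byte order of the running system."""
    bom_order, skip = read_bom(head, charset)
    return (bom_order or byteorder_param or system_order), skip


def later_datafile_skip(head, charset, established_order):
    """For any other primary datafile, a BOM is only checked for conflicts."""
    bom_order, skip = read_bom(head, charset)
    if bom_order is not None and bom_order != established_order:
        raise ValueError("BOM conflicts with the established byte order")
    return skip


order, skip = first_datafile_order(b"\xff\xfe\x61\x00", "UTF16", "big", "big")
print(order, skip)
print(later_datafile_skip(b"\xff\xfe\x62\x00", "UTF16", order))
try:
    later_datafile_skip(b"\xfe\xff\x00\x62", "UTF16", order)
except ValueError as err:
    print("stopped:", err)
```

In the usage lines, the BOM of the first file overrides a BYTEORDER value of big endian, and a later file whose BOM disagrees with the established setting stops processing.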
If any LOBFILEs or secondary datafiles are specified in the control file, SQL*Loader establishes a byte-order setting for each LOBFILE and secondary datafile (SDF) when it is ready to process the file. The default byte-order setting for LOBFILEs and SDFs is the byte-order setting established for the first primary datafile. This is overridden if the BYTEORDER
parameter is specified with a LOBFILE or SDF. In either case, if the LOBFILE or SDF uses the UTF16 character set and contains a BOM, the BOM value is compared to the byte-order setting for the file. If the BOM value matches the byte-order setting for the file, SQL*Loader skips the BOM, and uses that byte-order setting to begin processing data with the byte immediately after the BOM. If the BOM value does not match, then SQL*Loader issues an error message and stops processing.
In summary, the precedence of the byte-order indicators for LOBFILEs and SDFs is as follows:
1. The BYTEORDER parameter value specified with the LOBFILE or SDF
2. The byte-order setting established for the first primary datafile
A datafile in a Unicode character set may contain binary data that matches the BOM in the first bytes of the file. For example, the integer(2) value 0xFEFF = 65279 decimal matches the big endian BOM in UTF16. In that case, you can tell SQL*Loader to read the first bytes of the datafile as data and not check for a BOM by specifying the BYTEORDERMARK
parameter with the value NOCHECK
. The syntax for the BYTEORDERMARK
parameter is:
Text description of the illustration byteordermark.gif
BYTEORDERMARK
NOCHECK
indicates that SQL*Loader should not check for a BOM and should read all the data in the datafile as data.
BYTEORDERMARK
CHECK
tells SQL*Loader to check for a BOM. This is the default behavior for a datafile in a Unicode character set. But this specification may be used in the control file for clarification. It is an error to specify BYTEORDERMARK
CHECK
for a datafile that uses a non-Unicode character set.
The BYTEORDERMARK
parameter has the following characteristics:
- It is placed after the BYTEORDER parameter in the SQL*Loader control file.
- It is possible to specify a different BYTEORDERMARK value for different datafiles; however, the BYTEORDERMARK specification before the INFILE parameters applies to the entire list of primary datafiles.
- The BYTEORDERMARK specification for the primary datafiles is also used as the default for LOBFILEs and SDFs, except that the value CHECK is ignored in this case if the LOBFILE or SDF uses a non-Unicode character set. This default setting for LOBFILEs and secondary datafiles can be overridden by specifying BYTEORDERMARK with the LOBFILE or SDF specification.
Fields that are numeric and totally blank cause the record to be rejected. To load one of these fields as NULL
, use the NULLIF
clause with the BLANKS
parameter.
If an all-blank CHAR
field is surrounded by enclosure delimiters, then the blanks within the enclosures are loaded. Otherwise, the field is loaded as NULL
.
A DATE
field that consists entirely of blanks is loaded as a NULL
field.
Blanks, tabs, and other nonprinting characters (such as carriage returns and line feeds) constitute whitespace. Leading whitespace occurs at the beginning of a field. Trailing whitespace occurs at the end of a field. Depending on how the field is specified, whitespace may or may not be included when the field is inserted into the database. This is illustrated in Figure 6-1 where two CHAR
fields are defined for a data record.
The field specifications are contained in the control file. The control file CHAR
specification is not the same as the database CHAR
specification. A data field defined as CHAR
in the control file merely tells SQL*Loader how to create the row insert. The data could then be inserted into a CHAR
, VARCHAR2
, NCHAR
, NVARCHAR2
, or even a NUMBER
or DATE
column in the database, with the Oracle database server handling any necessary conversions.
By default, SQL*Loader removes trailing spaces from CHAR
data before passing it to the database. So, in Figure 6-1, both field 1 and field 2 are passed to the database as 3-byte fields. However, when the data is inserted into the table, there is a difference.
Text description of the illustration sut81003.gif
Column 1 is defined in the database as a fixed-length CHAR
column of length 5
. So the data (aaa
) is left-justified in that column, which remains 5 bytes wide. The extra space on the right is padded with blanks. Column 2, however, is defined as a varying-length field with a maximum length of 5 bytes. The data for that column (bbb
) is left-justified as well, but the length remains 3 bytes.
Table 6-5 summarizes when and how whitespace is removed from input data fields when PRESERVE
BLANKS
is not specified. See Preserving Whitespace for details on how to prevent whitespace trimming.
Specification | Data | Result | Leading Whitespace Present [1] | Trailing Whitespace Present [1] |
---|---|---|---|---|
Predetermined size | __aa__ | __aa | Yes | No |
Terminated | __aa__, | __aa__ | Yes | Yes [2] |
Enclosed | "__aa__" | __aa__ | Yes | Yes |
Terminated and enclosed | "__aa__", | __aa__ | Yes | Yes |
Optional enclosure (present) | "__aa__", | __aa__ | Yes | Yes |
Optional enclosure (absent) | __aa__, | aa__ | No | Yes |
Previous field terminated by whitespace | __aa__ | aa [3] | No | |
The rest of this section discusses the following topics with regard to trimming whitespace:
The information in this section applies only to fields specified with one of the character-data datatypes:
There are two ways to specify field length. If a field has a constant length that is defined in the control file with a position specification or the datatype and length, then it has a predetermined size. If a field's length is not known in advance, but depends on indicators in the record, then the field is delimited, using either enclosure or termination delimiters.
If a position specification with start and end values is defined for a field that also has enclosure or termination delimiters defined, only the position specification has any effect. The enclosure and termination delimiters are ignored.
Fields that have a predetermined size are specified with a starting position and ending position, or with a length, as in the following examples:
loc POSITION(19:31)
loc CHAR(14)
In the second case, even though the exact position of the field is not specified, the length of the field is predetermined.
Delimiters are characters that demarcate field boundaries.
Enclosure delimiters surround a field, like the quotation marks in the following example, where "__" represents blanks or tabs:
"__aa__"
Termination delimiters signal the end of a field, like the comma in the following example:
__aa__,
Delimiters are specified with the control clauses TERMINATED BY and ENCLOSED BY, as shown in the following example:
loc TERMINATED BY "." OPTIONALLY ENCLOSED BY '|'
This section describes how SQL*Loader determines the starting position of a field in the following situations:
When a starting position is not specified for a field, it begins immediately after the end of the previous field. Figure 6-2 illustrates this situation when the previous field (Field 1) has a predetermined size.
If the previous field (Field 1) is terminated by a delimiter, then the next field begins immediately after the delimiter, as shown in Figure 6-3.
When a field is specified with both enclosure delimiters and a termination delimiter, then the next field starts after the termination delimiter, as shown in Figure 6-4. If a nonwhitespace character is found after the enclosure delimiter, but before the terminator, then SQL*Loader generates an error.
In Figure 6-4, both fields are stored with leading whitespace. Fields do not include leading whitespace in the following cases:
These cases are illustrated in the following sections.
If the previous field is TERMINATED BY WHITESPACE, then all whitespace after the field acts as the delimiter. The next field starts at the next nonwhitespace character. Figure 6-5 illustrates this case.
This situation occurs when the previous field is explicitly specified with the TERMINATED BY WHITESPACE clause, as shown in the example. It also occurs when you use the global FIELDS TERMINATED BY WHITESPACE clause.
Leading whitespace is also removed from a field when optional enclosure delimiters are specified but not present.
Whenever optional enclosure delimiters are specified, SQL*Loader scans forward, looking for the first enclosure delimiter. If an enclosure delimiter is not found, SQL*Loader skips over whitespace, eliminating it from the field. The first nonwhitespace character signals the start of the field. This situation is shown in Field 2 in Figure 6-6. (In Field 1 the whitespace is included because SQL*Loader found enclosure delimiters for the field.)
Unlike the case when the previous field is TERMINATED BY WHITESPACE, this specification removes leading whitespace even when a starting position is specified for the current field.
Note: If enclosure delimiters are present, leading whitespace after the initial enclosure delimiter is kept, but whitespace before this delimiter is discarded. See the first quotation mark in Field 1, Figure 6-6.
Trailing whitespace is always trimmed from character-data fields that have a predetermined size. These are the only fields for which trailing whitespace is always trimmed.
If a field is enclosed, or terminated and enclosed, like the first field shown in Figure 6-6, then any whitespace outside the enclosure delimiters is not part of the field. Any whitespace between the enclosure delimiters belongs to the field, whether it is leading or trailing whitespace.
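The trimming rules for predetermined-size, terminated, and optionally enclosed fields can be summarized in a small Python model. This is a sketch under simplifying assumptions, not SQL*Loader's implementation: the function names are hypothetical, the terminator is a single nonwhitespace character, and details such as doubled enclosure delimiters, TERMINATED BY WHITESPACE, and error handling are left out. Two blanks stand in for each "__" in Table 6-5.

```python
WHITESPACE = " \t"


def predetermined(record: str, start: int, end: int) -> str:
    """POSITION(start:end), 1-based and inclusive: trailing whitespace is trimmed."""
    return record[start - 1:end].rstrip(WHITESPACE)


def terminated(record: str, terminator: str = ",") -> str:
    """TERMINATED BY a nonwhitespace character: everything before it is kept."""
    return record.split(terminator, 1)[0]


def optionally_enclosed(record: str, terminator: str = ",", enclosure: str = '"') -> str:
    """TERMINATED BY ... OPTIONALLY ENCLOSED BY ..."""
    rest = record.lstrip(WHITESPACE)             # scan forward past whitespace
    if rest.startswith(enclosure):
        closing = rest.index(enclosure, len(enclosure))
        return rest[len(enclosure):closing]      # whitespace inside the enclosure is kept
    return rest.split(terminator, 1)[0]          # leading whitespace was skipped


print(repr(predetermined("  aa  ", 1, 6)))
print(repr(terminated("  aa  ,")))
print(repr(optionally_enclosed('"  aa  ",')))
print(repr(optionally_enclosed("  aa  ,")))
```

The four calls correspond to the "Predetermined size", "Terminated", "Optional enclosure (present)", and "Optional enclosure (absent)" rows of Table 6-5.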
To prevent whitespace trimming in all CHAR, DATE, and numeric EXTERNAL fields, you specify PRESERVE BLANKS in the control file. Whitespace trimming is described in Trimming Whitespace.
The PRESERVE BLANKS option:
For example, consider the following field, where underscores represent blanks:
__aa__,
If this field is loaded with the following control clause, then both the leading whitespace and the trailing whitespace are retained if PRESERVE BLANKS is specified. Otherwise, the leading whitespace is trimmed.
TERMINATED BY ',' OPTIONALLY ENCLOSED BY '"'
When the previous field is terminated by whitespace, then PRESERVE BLANKS does not preserve the space at the beginning of the next field, unless that field is specified with a POSITION clause that includes some of the whitespace. Otherwise, SQL*Loader scans past all whitespace at the end of the previous field until it finds a nonblank, nontab character.
A wide variety of SQL operators can be applied to field data with the SQL string. This string can contain any combination of SQL expressions that are recognized by the Oracle database server as valid for the VALUES clause of an INSERT statement. In general, any SQL function that returns a single value that is compatible with the target column's datatype can be used. SQL strings can be applied to simple scalar column types as well as to user-defined complex types such as column objects and collections. See the information about expressions in the Oracle9i SQL Reference.
The column name and the name of the column in the SQL string must match exactly, including the quotation marks, as in the following example of specifying the control file:
LOAD DATA
INFILE *
APPEND INTO TABLE XXX
( "Last"  position(1:7)   char  "UPPER(:\"Last\")",
  FIRST   position(8:15)  char  "UPPER(:FIRST)"
)
BEGINDATA
Phil Locke
Jason Durbin
The following requirements and restrictions apply when you are using SQL strings:
In the preceding example, Last is enclosed in double quotation marks to preserve the mixed case, and the double quotation marks necessitate the use of the backslash (escape) character.
... NULLIF or DEFAULTIF clauses, but before a date mask.
... EXPRESSION parameter in a field specification.
... OID, SID, REF, or BFILE. Also, it cannot reference filler fields.
... VARRAY, nested table, or LOB column. This also includes a VARRAY, nested table, or LOB column that is an attribute of a column object.
... RECNUM, SEQUENCE, CONSTANT, or SYSDATE fields.
... BFILEs, XML columns, or a file that is an element of a collection.
To refer to fields in the record, precede the field name with a colon (:). Field values from the current record are substituted. A field name preceded by a colon (:) in a SQL string is also referred to as a bind variable. The following example illustrates how a reference is made to both the current field and to other fields in the control file:
LOAD DATA
INFILE *
APPEND INTO TABLE YYY
(
 field1  POSITION(1:6) CHAR "LOWER(:field1)",
 field2  CHAR TERMINATED BY ','
         NULLIF ((1) = 'a') DEFAULTIF ((1)= 'b')
         "RTRIM(:field2)",
 field3  CHAR(7) "TRANSLATE(:field3, ':field1', ':1')",
 field4  COLUMN OBJECT
 (
  attr1  CHAR(3)  "UPPER(:\"FIELD4.ATTR3\")",
  attr2  CHAR(2),
  attr3  CHAR(3)  ":\"FIELD4.ATTR1\" + 1"
 ),
 field5  EXPRESSION "MYFUNC(:FIELD4, SYSDATE)"
)
BEGINDATA
ABCDEF1234511 ,:field1500YYabc
abcDEF67890 ,:field2600ZZghl
Note the following about the preceding example:
:field1 that is not in single quotation marks is interpreted as a bind variable; ':field1' and ':1' are text literals that are passed unchanged to the TRANSLATE function. For more information on the use of quotation marks inside quoted strings, see Specifying Filenames and Object Names.
ABCDEF in the first record is mapped to the first field :field1. This value is then passed as an argument to the LOWER function.
... :\"FIELD4.ATTR1\" and :\"FIELD4.ATTR3\".
... FIELD4.ATTR1 references field FIELD4.ATTR3. The field FIELD4.ATTR1 is still mapped to the values 500 and 600 in the input records, but the final values stored in its corresponding columns are ABC and GHL.
field5 is not mapped to any field in the input record. The value that is stored in the target column is the result of executing the MYFUNC PL/SQL function, which takes two arguments. The use of the EXPRESSION parameter requires that a SQL string be used to compute the final value of the column because no input data is mapped to the field.
SQL operators are commonly used for the following tasks:
field1 POSITION(1:9) DECIMAL EXTERNAL(8) ":field1/1000"
field1 CHAR TERMINATED BY "," "SUBSTR(:field1, 1, 10)"
Multiple operators can also be combined, as in the following examples:
field1 POSITION(*+3) INTEGER EXTERNAL "TRUNC(RPAD(:field1,6,'0'), -2)"
field1 POSITION(1:8) INTEGER EXTERNAL "TRANSLATE(RTRIM(:field1),'N/A', '0')"
field1 CHAR(10) "NVL( LTRIM(RTRIM(:field1)), 'unknown' )"
When a SQL string is used with a date mask, the date mask is evaluated after the SQL string. Consider a field specified as follows:
field1 DATE "dd-mon-yy" "RTRIM(:field1)"
SQL*Loader internally generates and inserts the following:
TO_DATE(RTRIM(<field1_value>), 'dd-mon-yy')
Note that when using the DATE field datatype, it is not possible to have a SQL string without a date mask. This is because SQL*Loader assumes that the first quoted string it finds after the DATE parameter is a date mask. For instance, the following field specification would result in an error (ORA-01821: date format not recognized):
field1 DATE "RTRIM(TO_DATE(:field1, 'dd-mon-yyyy'))"
In this case, a simple workaround is to use the CHAR datatype.
It is possible to use the TO_CHAR operator to store formatted dates and numbers. For example:
field1 ... "TO_CHAR(:field1, '$09999.99')"
This example could store numeric input data in formatted form, where field1 is a character column in the database. This field would be stored with the formatting characters (dollar sign, period, and so on) already in place.
You have even more flexibility, however, if you store such values as numeric quantities or dates. You can then apply arithmetic functions to the values in the database, and still select formatted values for your reports.
The SQL string is used in Case Study 7: Extracting Data from a Formatted Report to load data from a formatted report.
The parameters described in this section provide the means for SQL*Loader to generate the data stored in the database record, rather than reading it from a datafile. The following parameters are described:
It is possible to use SQL*Loader to generate data by specifying only sequences, record numbers, system dates, constants, and SQL string expressions as field specifications.
SQL*Loader inserts as many records as are specified by the LOAD statement. The SKIP parameter is not permitted in this situation.
SQL*Loader is optimized for this case. Whenever SQL*Loader detects that only generated specifications are used, it ignores any specified datafile, so no read I/O is performed.
In addition, no memory is required for a bind array. If there are any WHEN clauses in the control file, SQL*Loader assumes that data evaluation is necessary, and input records are read.
A constant value is the simplest form of generated data. It does not vary during the load or between loads.
To set a column to a constant value, use CONSTANT followed by a value:
CONSTANT value
CONSTANT data is interpreted by SQL*Loader as character input. It is converted, as necessary, to the database column type.
You may enclose the value within quotation marks, and you must do so if it contains whitespace or reserved words. Be sure to specify a legal value for the target column. If the value is bad, every record is rejected.
Numeric values larger than 2^32 - 1 (4,294,967,295) must be enclosed in quotation marks.
Use the EXPRESSION parameter after a column name to set that column to the value returned by a SQL operator or specially written PL/SQL function. The operator or function is indicated in a SQL string that follows the EXPRESSION parameter. Any arbitrary expression may be used in this context provided that any parameters required for the operator or function are correctly specified and that the result returned by the operator or function is compatible with the datatype of the column being loaded.
The combination of column name, EXPRESSION parameter, and a SQL string is a complete field specification.
column_name EXPRESSION "SQL string"
Use the RECNUM parameter after a column name to set that column to the number of the logical record from which that record was loaded. Records are counted sequentially from the beginning of the first datafile, starting with record 1. RECNUM is incremented as each logical record is assembled. Thus it increments for records that are discarded, skipped, rejected, or loaded. If you use the option SKIP=10, the first record loaded has a RECNUM of 11.
The combination of column name and RECNUM is a complete column specification.
column_name RECNUM
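The counting rule for RECNUM can be illustrated with a small Python model. This is a sketch of the behavior described above, not SQL*Loader code; `assign_recnum` and its `skip` argument are hypothetical names, with `skip` standing in for the SKIP option.

```python
def assign_recnum(records, skip=0):
    """Pair each loaded record with its logical record number.

    Every logical record is counted, starting at 1, including the ones
    that are skipped, so the numbering of loaded records has a gap.
    """
    loaded = []
    for recnum, record in enumerate(records, start=1):
        if recnum <= skip:
            continue  # skipped records still consume a record number
        loaded.append((recnum, record))
    return loaded


records = [f"record {n}" for n in range(1, 13)]  # twelve logical records
loaded = assign_recnum(records, skip=10)
print(loaded[0])  # (11, 'record 11')
```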
A column specified with SYSDATE gets the current system date, as defined by the SQL language SYSDATE parameter. See the section on the DATE datatype in Oracle9i SQL Reference.
The combination of column name and the SYSDATE parameter is a complete column specification.
column_name SYSDATE
The database column must be of type CHAR or DATE. If the column is of type CHAR, then the date is loaded in the form 'dd-mon-yy'. After the load, it can be accessed only in that form. If the system date is loaded into a DATE column, then it can be accessed in a variety of forms that include the time and the date.
A new system date/time is used for each array of records inserted in a conventional path load and for each block of records loaded during a direct path load.
The SEQUENCE parameter ensures a unique value for a particular column. SEQUENCE increments for each record that is loaded or rejected. It does not increment for records that are discarded or skipped.
The combination of column name and the SEQUENCE parameter is a complete column specification.
Table 6-6 describes the parameters used for column specification.
If a record is rejected (that is, it has a format error or causes an Oracle error), the generated sequence numbers are not reshuffled to mask this. If four rows are assigned sequence numbers 10, 12, 14, and 16 in a particular column, and the row with 12 is rejected, the three rows inserted are numbered 10, 14, and 16, not 10, 12, and 14. This allows the sequence of inserts to be preserved despite data errors. When you correct the rejected data and reinsert it, you can manually set the columns to agree with the sequence.
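The numbering in this example can be reproduced with a short Python model. It is only a sketch of the stated rule (the counter advances for loaded and rejected records, not for discarded or skipped ones); the function name and the status labels are hypothetical, and a start of 10 with an increment of 2 is assumed to match the numbers 10, 12, 14, and 16.

```python
def assign_sequence(statuses, start, increment):
    """Pair each record status with the sequence value it consumes.

    The counter advances for records that are loaded or rejected and
    stays put for records that are discarded or skipped.
    """
    value = start
    assigned = []
    for status in statuses:
        if status in ("loaded", "rejected"):
            assigned.append((value, status))
            value += increment
        else:
            assigned.append((None, status))
    return assigned


rows = assign_sequence(["loaded", "rejected", "loaded", "loaded"], start=10, increment=2)
inserted = [value for value, status in rows if status == "loaded"]
print(inserted)  # [10, 14, 16]
```

The rejected row keeps the value 12, which is why that number is available when the corrected record is reinserted by hand.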
Case Study 3: Loading a Delimited, Free-Format File provides an example of the SEQUENCE parameter.
Because a unique sequence number is generated for each logical input record, rather than for each table insert, the same sequence number can be used when inserting data into multiple tables. This is frequently useful.
Sometimes, however, you might want to generate different sequence numbers for each INTO TABLE clause. For example, your data format might define three logical records in every input record. In that case, you can use three INTO TABLE clauses, each of which inserts a different part of the record into the same table. When you use SEQUENCE(MAX), SQL*Loader will use the maximum from each table, which can lead to inconsistencies in sequence numbers.
To generate sequence numbers for these records, you must generate unique numbers for each of the three inserts. Use the number of table-inserts per record as the sequence increment and start the sequence numbers for each insert with successive numbers.
Suppose you want to load the following department names into the dept table. Each input record contains three department names, and you want to generate the department numbers automatically.
Accounting     Personnel      Manufacturing
Shipping       Purchasing     Maintenance
...
You could use the following control file entries to generate unique department numbers:
INTO TABLE dept
(deptno SEQUENCE(1, 3),
 dname POSITION(1:14) CHAR)
INTO TABLE dept
(deptno SEQUENCE(2, 3),
 dname POSITION(16:29) CHAR)
INTO TABLE dept
(deptno SEQUENCE(3, 3),
 dname POSITION(31:44) CHAR)
The first INTO TABLE clause generates department number 1, the second number 2, and the third number 3. They all use 3 as the sequence increment (the number of department names in each record). This control file loads Accounting as department number 1, Personnel as 2, and Manufacturing as 3.
The sequence numbers are then incremented for the next record, so Shipping loads as 4, Purchasing as 5, and so on.
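The interleaving of the three sequences can be checked with a small Python model. It is a sketch, not SQL*Loader code: it assumes every record loads without error, pads each department name to 15 characters so that the names fall at the positions used in the control file entries, and uses hypothetical names (`CLAUSES`, `make_record`, `load`).

```python
from itertools import count

# One entry per INTO TABLE clause: the SEQUENCE start and the dname POSITION.
CLAUSES = [
    {"seq_start": 1, "position": (1, 14)},
    {"seq_start": 2, "position": (16, 29)},
    {"seq_start": 3, "position": (31, 44)},
]
INCREMENT = len(CLAUSES)  # number of table inserts per input record


def make_record(names):
    """Lay out three names in fixed columns starting at positions 1, 16, and 31."""
    return "".join(name.ljust(15) for name in names).rstrip()


def load(records):
    """Return (deptno, dname) rows in the order they would be inserted."""
    sequences = [count(clause["seq_start"], INCREMENT) for clause in CLAUSES]
    rows = []
    for record in records:
        for clause, sequence in zip(CLAUSES, sequences):
            start, end = clause["position"]
            dname = record[start - 1:end].rstrip()  # POSITION is 1-based, inclusive
            rows.append((next(sequence), dname))
    return rows


records = [
    make_record(["Accounting", "Personnel", "Manufacturing"]),
    make_record(["Shipping", "Purchasing", "Maintenance"]),
]
rows = load(records)
for deptno, dname in rows:
    print(deptno, dname)
```

Because each clause starts one higher than the previous one and all three step by 3, the generated numbers never collide across the three inserts.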
Copyright © 1996, 2002 Oracle Corporation. All Rights Reserved.