Oracle Internet Directory Administrator's Guide Release 9.2 Part Number A96574-01
Oracle Internet Directory uses Globalization Support to store, process and retrieve data in native languages. It ensures that Oracle Internet Directory utilities and error messages automatically adapt to the native language and locale.
This chapter discusses Globalization Support as used by Oracle Internet Directory and tells you the NLS_LANG environment variable settings required for the various components and tools in an Oracle Internet Directory environment.
See Also:
"Globalization Support" prior to configuring Globalization Support
The NLS_LANG parameter has three components (language, territory, and charset) in the form:
NLS_LANG = language_territory.charset
Each component controls the operation of a subset of Globalization Support features.
Component | Description |
---|---|
language | Specifies conventions such as the language used for Oracle messages, day names, and month names. Each supported language has a unique name, for example, American English, French, or German. The language argument specifies default values for the territory and character set arguments, so either (or both) of these arguments can be omitted. If language is not specified, the value defaults to American English. See Also: Oracle9i Database Globalization Support Guide in the Oracle Database Documentation Library for a complete list of languages |
territory | Specifies conventions such as the default calendar, collation, date, monetary, and numeric formats. Each supported territory has a unique name, for example, America, France, or Canada. If territory is not specified, the value defaults to America. See Also: Oracle9i Database Globalization Support Guide in the Oracle Database Documentation Library for a complete list of territories |
charset | Specifies the character set used by the client application (normally that of the user's terminal). Each supported character set has a unique acronym, for example, US7ASCII, WE8ISO8859P1, WE8DEC, WE8EBCDIC500, or JA16EUC. Each language has a default character set associated with it. Default values for the languages available on your system are listed in your operating system installation guide or administrator's guide. See Also: Oracle9i Database Globalization Support Guide in the Oracle Database Documentation Library for a complete list of character sets |
You can set NLS_LANG as an environment variable at the command line. The following are examples of legal values for NLS_LANG:
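The minimal Python sketch below shows how a value in the language_territory.charset form breaks down into its three components. It parses values such as AMERICAN_AMERICA.AL32UTF8, SIMPLIFIED CHINESE_CHINA.ZHS16GBK, or the charset-only form .ZHS16GBK used later in this chapter. The names parse_nls_lang and NlsLang are hypothetical and are not part of any Oracle tool. The sketch only splits the string. It does not apply Oracle's defaults, which come from the rules in the table above.

```python
from typing import NamedTuple, Optional


class NlsLang(NamedTuple):
    """Components of an NLS_LANG value; None means the component was omitted."""
    language: Optional[str]
    territory: Optional[str]
    charset: Optional[str]


def parse_nls_lang(value: str) -> NlsLang:
    """Split a value of the form language_territory.charset.

    Any component may be omitted; for example, ".ZHS16GBK" names only a
    character set. Defaults are deliberately not filled in here.
    """
    head, _, charset = value.partition(".")      # charset follows the dot
    language, _, territory = head.partition("_")  # territory follows the underscore
    return NlsLang(language or None, territory or None, charset or None)


for v in ("AMERICAN_AMERICA.AL32UTF8",
          "SIMPLIFIED CHINESE_CHINA.ZHS16GBK",
          ".ZHS16GBK"):
    print(f"{v!r} -> {parse_nls_lang(v)}")
```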
You can run the Oracle directory server and database tools on a database whose character set is not UTF-8 (that is, neither UTF8 nor AL32UTF8). If you do, be sure that every character in the client character set is also included in the database character set (with the same or different codes). Otherwise, you can lose data during ldapadd, ldapdelete, ldapmodify, or ldapmodifydn operations. For example, suppose that you perform an ldapadd operation using a multibyte character set on an underlying database that uses only single-byte characters. You will lose data, because the database cannot store characters that its single-byte character set does not contain.
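You can imitate this kind of data loss with Python's own codecs. In this sketch, Python's latin-1 codec stands in for a single-byte database character set such as WE8ISO8859P1, and the gbk codec stands in for ZHS16GBK. That mapping is an assumption made for illustration, and the helper names are hypothetical. The sketch replaces unconvertible characters with "?", which only approximates what a real database conversion does.

```python
def fits_character_set(text: str, codec: str) -> bool:
    """Return True if every character of text exists in the given codec."""
    try:
        text.encode(codec)
    except UnicodeEncodeError:
        return False
    return True


def store_lossily(text: str, codec: str) -> str:
    """Imitate storing text in a smaller character set: characters the
    codec cannot represent are replaced with '?'."""
    return text.encode(codec, errors="replace").decode(codec)


value = "\u4e2d\u6587"  # two simplified Chinese characters

print(fits_character_set(value, "gbk"))      # True: GBK contains both characters
print(fits_character_set(value, "latin-1"))  # False: single-byte set lacks them
print(store_lossily(value, "latin-1"))       # ?? : the original characters are lost
```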
Attribute type names are always ASCII strings; Oracle Internet Directory does not support multibyte characters in them. Attribute values, however, can contain multibyte characters, such as those in the simplified Chinese (.ZHS16GBK) character set.
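A client could check attribute type names before sending them. The Python sketch below accepts only the two forms that RFC 4512 defines for attribute type names. The first is a keystring: an ASCII letter followed by ASCII letters, digits, or hyphens. The second is a numeric OID. Multibyte names are therefore rejected, and the check never looks at attribute values. The function name is hypothetical, and Oracle Internet Directory performs its own validation on the server.

```python
import re

# RFC 4512 keystring: an ASCII letter followed by ASCII letters, digits, or hyphens.
KEYSTRING = re.compile(r"[A-Za-z][A-Za-z0-9-]*")
# RFC 4512 numericoid: dot-separated decimal numbers without leading zeros.
NUMERICOID = re.compile(r"(?:0|[1-9][0-9]*)(?:\.(?:0|[1-9][0-9]*))+")


def is_valid_attribute_type(name: str) -> bool:
    """Return True if name is an ASCII attribute type name or a numeric OID."""
    return bool(KEYSTRING.fullmatch(name) or NUMERICOID.fullmatch(name))


print(is_valid_attribute_type("cn"))            # True
print(is_valid_attribute_type("2.5.4.3"))       # True
print(is_valid_attribute_type("\u59d3\u540d"))  # False: multibyte name
```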
Attribute values can be encoded in different ways, and Oracle Internet Directory tools must interpret each encoding properly. There are two scenarios:
An LDIF File Containing Only ASCII Strings
In this scenario, character strings for attribute values are also in ASCII.
Because all tools use the UTF-8 character set by default, and ASCII is a proper subset of UTF-8, all tools can interpret these files. The same is true of keyboard input of values that are simply ASCII strings.
An LDIF File Containing UTF-8 Encoded Strings
In this scenario, character strings for attribute values are also in UTF-8.
Because, by default, all tools use the UTF-8 character set (the Oracle character set name is AL32UTF8), all tools can interpret these files. The same is true of keyboard input of values which are UTF-8 strings.
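You can check directly why neither kind of file needs conversion. Every ASCII byte decodes to the same character under UTF-8, while characters outside ASCII take more than one byte in UTF-8. A short Python check:

```python
# Every ASCII byte (0-127) decodes to the same character in ASCII and UTF-8,
# so an ASCII-only LDIF file is already a valid UTF-8 file.
ascii_is_utf8_subset = all(
    bytes([b]).decode("ascii") == bytes([b]).decode("utf-8") for b in range(128)
)
print(ascii_is_utf8_subset)  # True

# Characters outside ASCII are multibyte in UTF-8.
print(len("cn".encode("utf-8")))            # 2: one byte per ASCII character
print(len("\u4e2d\u6587".encode("utf-8")))  # 6: three bytes per Chinese character
```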
In an LDIF file, some characters may be multibyte. Multibyte character strings can be present in LDIF files as attribute values or given as keyboard input. They can be encoded in their native character set or in UTF-8. They can also be BASE64-encoded representations of either the native or the UTF-8 string.
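Consider the following cases:
1. A simple native string
2. A simple UTF-8 string
3. A BASE64-encoded UTF-8 string
4. A BASE64-encoded native string
Because the directory server understands and expects only UTF-8 encoded strings, cases 1, 3, and 4 must be converted to UTF-8 strings (for case 3, simply BASE64-decoded) before they can be sent to the LDAP server.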
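To make these encodings concrete, the following Python sketch produces each form for one simplified Chinese value: a native string, a UTF-8 string, and a BASE64 representation of each. Python's gbk codec stands in for ZHS16GBK. This is an assumption, since the two are closely related but not guaranteed to be identical for every character. Only the plain UTF-8 bytes are in the form the server expects.

```python
import base64

value = "\u4e2d\u6587"  # a simplified Chinese attribute value

native = value.encode("gbk")    # string in the native character set
utf8 = value.encode("utf-8")    # UTF-8 string, the form the server expects
b64_native = base64.b64encode(native).decode("ascii")  # BASE64 of native bytes
b64_utf8 = base64.b64encode(utf8).decode("ascii")      # BASE64 of UTF-8 bytes

for label, form in [("native", native), ("UTF-8", utf8),
                    ("BASE64 of UTF-8", b64_utf8),
                    ("BASE64 of native", b64_native)]:
    print(f"{label:>17}: {form!r}")
```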
For a simple native string, use the -E argument in the command-line tools, ldifwrite, and bulkmodify. Use the -encode argument in the bulkload and bulkdelete tools.
This example converts simplified Chinese native strings to UTF-8. The baseDN can be a simplified Chinese string:
ldapsearch -h my_host -p 389 -E ".ZHS16GBK" -b base_DN -s base "objectclass=*"
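Conceptually, -E ".ZHS16GBK" tells the tool to decode its input as ZHS16GBK and re-encode it as UTF-8 before sending it. The Python sketch below models only that conversion step. The function name convert_input and the small table that maps Oracle character set names to Python codecs are hypothetical. The real tools perform this step internally.

```python
# Hypothetical mapping from a few Oracle character set names to Python codecs.
ORACLE_TO_PYTHON_CODEC = {
    "AL32UTF8": "utf-8",
    "ZHS16GBK": "gbk",
    "WE8ISO8859P1": "latin-1",
    "US7ASCII": "ascii",
}


def convert_input(data: bytes, e_argument: str = ".AL32UTF8") -> bytes:
    """Model of the -E step: re-encode input bytes from the named character
    set to UTF-8. Without -E, input is treated as AL32UTF8 (UTF-8)."""
    charset = e_argument.lstrip(".").upper()  # -E takes the form ".character_set"
    codec = ORACLE_TO_PYTHON_CODEC[charset]   # KeyError for unmapped names
    return data.decode(codec).encode("utf-8")


# A base DN typed on a ZHS16GBK terminal; the ou value means "manufacturing".
base_dn = "ou=\u5236\u9020,o=acme,c=us".encode("gbk")
print(convert_input(base_dn, ".ZHS16GBK").decode("utf-8"))
```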
For a simple UTF-8 string, no conversion is required.
For a BASE64-encoded UTF-8 string, you need neither the -E argument in the command-line tools, ldifwrite, and bulkmodify, nor the -encode argument in bulkload and bulkdelete. Oracle Internet Directory tools automatically decode BASE64-encoded UTF-8 strings to UTF-8 strings.
For a BASE64-encoded native string, use the -E argument in the command-line tools, ldifwrite, and bulkmodify. Use the -encode argument in the bulkload and bulkdelete tools.
Oracle Internet Directory tools automatically decode BASE64 encoded native strings to simple native strings. The native strings are then converted to the equivalent UTF-8 strings.
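One function can summarize the decoding rules for the two BASE64 cases. It first decodes the BASE64 text. It then converts from a native character set only when one is named, which is the role that -E or -encode plays. This Python sketch is hypothetical, and Python's gbk codec stands in for ZHS16GBK. Without a named character set, it treats the decoded bytes as UTF-8. As a result, a BASE64-encoded native string fails to decode instead of being silently misread.

```python
import base64
from typing import Optional


def decode_ldif_value(b64_text: str, native_codec: Optional[str] = None) -> bytes:
    """Decode a BASE64 LDIF value and return UTF-8 bytes for the server.

    native_codec=None models running the tool without -E/-encode: the
    decoded bytes must already be UTF-8. Passing a codec (for example
    "gbk" for ZHS16GBK) models -E/-encode: the decoded bytes are
    converted from that character set to UTF-8.
    """
    raw = base64.b64decode(b64_text, validate=True)
    return raw.decode(native_codec or "utf-8").encode("utf-8")


value = "\u4e2d\u6587"
from_utf8 = base64.b64encode(value.encode("utf-8")).decode("ascii")
from_native = base64.b64encode(value.encode("gbk")).decode("ascii")

print(decode_ldif_value(from_utf8) == value.encode("utf-8"))           # True: no -E
print(decode_ldif_value(from_native, "gbk") == value.encode("utf-8"))  # True: with -E
```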
The Oracle Internet Directory command-line tools read input either from the keyboard or from an LDIF file.
If that input is not in the UTF-8 character set, then the command-line tools must convert it to UTF-8 before sending it to the LDAP server.
You enable the command-line tools to convert the input into UTF-8 by specifying the -E argument when using each tool.
The client tools always assume UTF-8 (the Oracle character set name is AL32UTF8) to be the input character set unless the -E argument specifies another one. BASE64-encoded values are decoded, and then, if the -E argument is specified, the decoded buffer is converted to UTF-8. For example, if you specify -E ".ZHS16GBK", then the decoded buffer is converted from simplified Chinese to UTF-8 before being sent to the LDAP server.
Specifying the -E argument in the form -E ".character_set" ensures that input is correctly converted from the character set you specify to the UTF-8 character set.
The command-line tools interpret their input in the character set specified by the -E argument. They display their output in the character set specified by the NLS_LANG environment variable.
For example, to add entries from an LDIF file encoded in the simplified Chinese character set (.ZHS16GBK) by using ldapadd, type:
ldapadd -h myhost -p 389 -E ".ZHS16GBK" -f my_ldif_file
In this example, the ldapadd tool converts the characters from ".ZHS16GBK" (simplified Chinese character set) to ".AL32UTF8" (UTF-8 character set) before they are sent across the wire to the LDAP server.
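The two directions of conversion can be sketched in Python as a pair of steps around the server. Input is converted from the -E character set to UTF-8 before it goes on the wire. UTF-8 results coming back are converted to the NLS_LANG character set for display. The function names are hypothetical, and Python's gbk codec stands in for ZHS16GBK. The server itself is left out: the sketch simply feeds what would be sent back into the display step.

```python
def send_to_server(user_input: bytes, input_codec: str) -> bytes:
    """-E step: convert input from its character set to UTF-8 for the wire."""
    return user_input.decode(input_codec).encode("utf-8")


def display(server_output: bytes, output_codec: str) -> bytes:
    """NLS_LANG step: convert UTF-8 server output to the client's character set."""
    return server_output.decode("utf-8").encode(output_codec)


typed = "\u5f20\u4e09".encode("gbk")        # keyboard input in ZHS16GBK
on_the_wire = send_to_server(typed, "gbk")  # what the LDAP server receives
shown = display(on_the_wire, "gbk")         # echoed back for a GBK terminal

print(on_the_wire == "\u5f20\u4e09".encode("utf-8"))  # True
print(shown == typed)                                 # True
```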
The following table provides additional examples of how to use the -E argument correctly for each command-line tool. In each example, the command converts data from simplified Chinese, as specified by the value ".ZHS16GBK", to UTF-8. For example, in each command, the values for the -D and -w options are assumed to be in simplified Chinese, and specifying the -E argument converts them to UTF-8.
Note that the examples in the following table do not show any actual characters belonging to the .ZHS16GBK character set. These examples would, therefore, work without the -E argument. However, if the argument values contained actual characters in the .ZHS16GBK character set, then you would need to use the -E argument.
See Also:
Appendix A, "Syntax for LDIF and Command-Line Tools" for syntax and usage notes for each of the command-line tools
If the output required by the client is UTF-8, then you do not need to set the NLS_LANG environment variable. In this case, the NLS_LANG environment variable defaults to .AL32UTF8, and neither the input path from client to server nor the output path from server to client requires any character set conversion.
If the output required by the client is not UTF-8, then you must set the NLS_LANG environment variable. This ensures that proper character set conversion can occur from the UTF-8 character set to the character set required by the client.
For example, if the NLS_LANG environment variable is set to the simplified Chinese character set, then the command-line tool displays output in that character set. If NLS_LANG is not set, the output defaults to the UTF-8 character set.
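The rule for the output character set can be sketched as a lookup on the NLS_LANG environment variable. The charset part after the dot wins, and UTF-8 (AL32UTF8) applies when the variable is not set. The function name is hypothetical. When NLS_LANG is set but names no character set, the sketch returns None rather than guessing. Oracle derives that default from the language, and the sketch does not model it.

```python
import os
from typing import Mapping, Optional


def output_charset(environ: Optional[Mapping[str, str]] = None) -> Optional[str]:
    """Return the Oracle character set name that tool output is displayed in."""
    env = os.environ if environ is None else environ
    nls_lang = env.get("NLS_LANG")
    if not nls_lang:
        return "AL32UTF8"  # NLS_LANG not set: output defaults to UTF-8
    charset = nls_lang.partition(".")[2]  # text after the dot, if any
    # None: no charset named; Oracle derives one from the language (not modeled).
    return charset.upper() or None


print(output_charset({}))                                                 # AL32UTF8
print(output_charset({"NLS_LANG": "SIMPLIFIED CHINESE_CHINA.ZHS16GBK"}))  # ZHS16GBK
```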
Note: If you are using Windows NT, then, to use the command-line tools after server startup, you must reset NLS_LANG in an MS-DOS window. Set it to the character set that matches the code page of your MS-DOS session. UTF-8 cannot be used. See the Oracle9i Database Installation Guide for Windows in the Oracle Database Documentation Library for more information on which character set to use for command-line tools in an MS-DOS session. If you are using a pre-installed Oracle9i release 9.2 database with Oracle Internet Directory, then you must also set the database character set to UTF-8. See the Oracle9i Database Globalization Support Guide in the Oracle Database Documentation Library and Oracle9i Database Installation Guide for Windows for more information. Be careful not to change the NLS_LANG parameter value in the registry.
Oracle Internet Directory ensures that the reading and writing of text data from and to LDIF files are done in UTF-8 encoding as specified by the LDAP standard.
This section provides an example of the argument to use with each of the following tools: bulkload, ldifwrite, bulkdelete, and bulkmodify.
See Also:
"Bulk Operations Command-Line Tools" for a list of arguments for each bulk tool
For bulkload, add the argument -encode ".character_set" to the command, where character_set is the character set in which the input LDIF file is encoded.
For example:
bulkload.sh -connect net_service_name -encode ".ZHS16GBK" my_ldif_file
Note: To run shell script tools on the Windows operating system, you need one of the following UNIX emulation utilities:
The ldifwrite utility always writes BASE64 encoded values for multibyte strings.
The BASE64 encoding could be of the UTF-8 strings as they are stored in the directory server, or of native strings as specified by the NLS_LANG environment variable setting when running ldifwrite.
For example:
ldifwrite -c net_service_name -b baseDN -f output_file
In this example, if the NLS_LANG environment variable is not set, or is set to language_territory.AL32UTF8, then the output LDIF file will contain BASE64-encoded UTF-8 strings for any multibyte characters.
To reload this LDIF file into the directory by using ldapaddmt, use the following syntax:
ldapaddmt -h my_host -p port_number -f output_file
In this case, the -E argument is not required, because the decoded BASE64 strings are already UTF-8-encoded and can be sent to the server as they are.
If the NLS_LANG environment variable is set to a character set other than UTF-8 (for example, ".ZHS16GBK"), then the output LDIF file will contain BASE64-encoded values of simplified Chinese (.ZHS16GBK) strings.
To reload this LDIF file into the directory using ldapaddmt, use the following syntax:
ldapaddmt -h host -p port -E ".ZHS16GBK" -f output_file
In this case, the -E argument is required, because the decoded BASE64 strings are in simplified Chinese and must be converted to UTF-8 strings before being sent to the server.
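The following is a minimal Python sketch of the ldifwrite behavior described above, under stated assumptions. It writes an attribute as attr: value when the value is a safe ASCII string, and as attr:: followed by BASE64 text otherwise, which is the "::" form defined by RFC 2849. The bytes are encoded either as UTF-8 or in a native character set that stands in for the NLS_LANG setting, with Python's gbk codec standing in for ZHS16GBK. The function names are hypothetical. The real ldifwrite may differ in detail, for example in exactly when it chooses BASE64.

```python
import base64

_UNSAFE_INIT = {" ", ":", "<"}     # may not start a SAFE-STRING (RFC 2849)
_UNSAFE_ANY = {"\0", "\n", "\r"}   # may not appear anywhere in a SAFE-STRING


def is_safe_string(value: str) -> bool:
    """RFC 2849 SAFE-STRING check: ASCII only, no NUL/LF/CR, and no leading
    space, colon, or less-than sign."""
    if value and value[0] in _UNSAFE_INIT:
        return False
    return all(ord(ch) < 128 and ch not in _UNSAFE_ANY for ch in value)


def ldif_line(attr: str, value: str, codec: str = "utf-8") -> str:
    """Format one LDIF attribute line.

    codec="utf-8" models NLS_LANG unset or set to AL32UTF8; another codec
    (for example "gbk" for ZHS16GBK) models a native NLS_LANG setting.
    """
    if is_safe_string(value):
        return f"{attr}: {value}"
    encoded = base64.b64encode(value.encode(codec)).decode("ascii")
    return f"{attr}:: {encoded}"


print(ldif_line("title", "Foreman"))
print(ldif_line("title", "\u5de5\u5934"))         # BASE64 of UTF-8 bytes
print(ldif_line("title", "\u5de5\u5934", "gbk"))  # BASE64 of ZHS16GBK bytes
```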
For bulkdelete, add -encode ".character_set" to the command.
For example:
bulkdelete.sh -connect net_service_name -encode ".ZHS16GBK" -base "ou=manufacturing,o=acme,c=us"
In this case, the value for the -base option could be in the ZHS16GBK native character set, that is, simplified Chinese.
For bulkmodify, add the argument -E ".character_set" to the command.
For example:
bulkmodify.sh -c my_service_name -E ".ZHS16GBK" -b "ou=manufacturing,o=acme,c=us" -r title -v Foreman -f "objectclass=*"
In this example, values for the -b, -v, and -f arguments can be specified using the simplified Chinese character set.
Copyright © 1999, 2002 Oracle Corporation. All Rights Reserved.