
Oracle9i Database Globalization Support Guide
Release 2 (9.2)

Part Number A96529-01

B
Unicode Character Code Assignments

This appendix offers an introduction to how Unicode assigns characters. It contains the following sections: Unicode Code Ranges, UTF-16 Encoding, and UTF-8 Encoding.

Unicode Code Ranges

Table B-1 contains code ranges that have been allocated in Unicode for UTF-16 character codes.

Table B-1 Unicode Character Code Ranges for UTF-16 Character Codes

Types of Characters                         First 16 Bits   Second 16 Bits

ASCII                                       0000 - 007F     -

European (except ASCII), Arabic, Hebrew     0080 - 07FF     -

Indic, Thai, certain symbols (such as the   0800 - 0FFF     -
  euro symbol), Chinese, Japanese, Korean   1000 - CFFF     -
                                            D000 - D7FF     -
                                            F900 - FFFF     -

Private Use Area #1                         E000 - EFFF     -
                                            F000 - F8FF     -

Supplementary characters: additional        D800 - D8BF     DC00 - DFFF
  Chinese, Japanese, and Korean             D8C0 - DABF     DC00 - DFFF
  characters; historic characters;          DAC0 - DB7F     DC00 - DFFF
  musical symbols; mathematical symbols

Private Use Area #2                         DB80 - DBBF     DC00 - DFFF
                                            DBC0 - DBFF     DC00 - DFFF

Table B-2 contains code ranges that have been allocated in Unicode for UTF-8 character codes.

Table B-2 Unicode Character Code Ranges for UTF-8 Character Codes

Types of Characters                         First Byte   Second Byte   Third Byte   Fourth Byte

ASCII                                       00 - 7F      -             -            -

European (except ASCII), Arabic, Hebrew     C2 - DF      80 - BF       -            -

Indic, Thai, certain symbols (such as the   E0           A0 - BF       80 - BF      -
  euro symbol), Chinese, Japanese, Korean   E1 - EC      80 - BF       80 - BF      -
                                            ED           80 - 9F       80 - BF      -
                                            EF           A4 - BF       80 - BF      -

Private Use Area #1                         EE           80 - BF       80 - BF      -
                                            EF           80 - A3       80 - BF      -

Supplementary characters: additional        F0           90 - BF       80 - BF      80 - BF
  Chinese, Japanese, and Korean             F1 - F2      80 - BF       80 - BF      80 - BF
  characters; historic characters;          F3           80 - AF       80 - BF      80 - BF
  musical symbols; mathematical symbols

Private Use Area #2                         F3           B0 - BF       80 - BF      80 - BF
                                            F4           80 - 8F       80 - BF      80 - BF


Note:

A hyphen (-) represents a non-applicable code assignment. Character codes are shown in hexadecimal representation.
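The ranges in the First 16 Bits column of Table B-1 can be read as a simple lookup. The following Python sketch is illustrative only: the function name classify_first_unit and the shortened labels are assumptions made for this example and are not part of any Oracle interface.

```python
# Ranges transcribed from the "First 16 Bits" column of Table B-1.
# Adjacent rows of the same character type are merged into one range.
# The labels are shortened and the names are hypothetical.
CJK_LABEL = "Indic, Thai, certain symbols, Chinese, Japanese, Korean"

TABLE_B1_FIRST_UNIT = [
    (0x0000, 0x007F, "ASCII"),
    (0x0080, 0x07FF, "European (except ASCII), Arabic, Hebrew"),
    (0x0800, 0xD7FF, CJK_LABEL),
    (0xD800, 0xDB7F, "Supplementary characters"),
    (0xDB80, 0xDBFF, "Private Use Area #2"),
    (0xE000, 0xF8FF, "Private Use Area #1"),
    (0xF900, 0xFFFF, CJK_LABEL),
]


def classify_first_unit(unit):
    """Return the Table B-1 character type for a 16-bit UTF-16 value."""
    if not 0 <= unit <= 0xFFFF:
        raise ValueError("expected a 16-bit value")
    # DC00 - DFFF appears only in the "Second 16 Bits" column.
    if 0xDC00 <= unit <= 0xDFFF:
        return "Second 16 bits of a two-unit character"
    for low, high, label in TABLE_B1_FIRST_UNIT:
        if low <= unit <= high:
            return label
    raise AssertionError("unreachable: the ranges cover all 16-bit values")


if __name__ == "__main__":
    for value in (0x0041, 0x20AC, 0xD834, 0xDBC0, 0xE000):
        print(f"{value:04X}: {classify_first_unit(value)}")
```

A value in D800 - DBFF only announces that a second 16-bit value follows. The character itself is determined by both values together.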


UTF-16 Encoding

As shown in Table B-1, UTF-16 represents some characters (such as additional Chinese, Japanese, and Korean characters and the characters in Private Use Area #2) with two 16-bit units. These are supplementary characters. The first 16-bit value of a supplementary character is in the range 0xD800 to 0xDBFF, and the second 16-bit value is in the range 0xDC00 to 0xDFFF. With supplementary characters, UTF-16 can represent more than one million characters. Without supplementary characters, only 65,536 characters can be represented. Oracle's AL16UTF16 character set supports supplementary characters.
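The two-unit representation can be sketched in a few lines of Python. This is a minimal illustration of the standard UTF-16 arithmetic, not Oracle code, and the function name to_utf16_units is an assumption made for this example. Each of the two 16-bit values carries 10 bits of the character code, which gives 1,024 x 1,024 = 1,048,576 supplementary code points.

```python
def to_utf16_units(code_point):
    """Return the UTF-16 code units for a Unicode code point as a tuple."""
    if not 0 <= code_point <= 0x10FFFF:
        raise ValueError("code point out of range")
    if 0xD800 <= code_point <= 0xDFFF:
        raise ValueError("D800 - DFFF is reserved for the two-unit form")
    if code_point <= 0xFFFF:
        # One 16-bit unit is enough.
        return (code_point,)
    # Supplementary character: split the 20-bit offset into two 10-bit halves.
    offset = code_point - 0x10000
    first = 0xD800 + (offset >> 10)      # range D800 - DBFF
    second = 0xDC00 + (offset & 0x3FF)   # range DC00 - DFFF
    return (first, second)


if __name__ == "__main__":
    # U+1D11E is a musical symbol, one of the supplementary characters.
    print([f"{unit:04X}" for unit in to_utf16_units(0x1D11E)])
```

For U+1D11E the sketch yields D834 followed by DD1E, which falls in the D800 - D8BF and DC00 - DFFF ranges listed for supplementary characters in Table B-1.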

See Also:

"Supplementary Characters"

UTF-8 Encoding

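The UTF-8 character codes in Table B-2 show that the following conditions are true:

ASCII characters use 1 byte.
European (except ASCII), Arabic, and Hebrew characters require 2 bytes.
Indic, Thai, Chinese, Japanese, and Korean characters, as well as certain symbols such as the euro symbol, require 3 bytes.
Characters in Private Use Area #1 require 3 bytes.
Supplementary characters require 4 bytes.
Characters in Private Use Area #2 require 4 bytes.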

Oracle's AL32UTF8 character set supports 1-byte, 2-byte, 3-byte, and 4-byte values. Oracle's UTF8 character set supports 1-byte, 2-byte, and 3-byte values, but not 4-byte values.
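The byte counts implied by Table B-2 can be checked with a short Python sketch. It is illustrative only: the helper names utf8_length and has_four_byte_characters are assumptions made for this example, and the sketch describes standard UTF-8 rather than the internal behavior of any Oracle character set.

```python
def utf8_length(code_point):
    """Return the number of bytes standard UTF-8 uses for a code point."""
    if not 0 <= code_point <= 0x10FFFF:
        raise ValueError("code point out of range")
    if code_point <= 0x7F:
        return 1   # ASCII
    if code_point <= 0x7FF:
        return 2   # European (except ASCII), Arabic, Hebrew
    if code_point <= 0xFFFF:
        return 3   # Indic, Thai, symbols, CJK, Private Use Area #1
    return 4       # supplementary characters, Private Use Area #2


def has_four_byte_characters(text):
    """Return True if any character in text needs a 4-byte UTF-8 value."""
    return any(utf8_length(ord(ch)) == 4 for ch in text)


if __name__ == "__main__":
    for ch in ("A", "\u00e9", "\u20ac", "\U0001D11E"):
        print(f"U+{ord(ch):04X}: {utf8_length(ord(ch))} byte(s)")
```

A check such as has_four_byte_characters could be used to flag text that contains 4-byte values, which the paragraph above says AL32UTF8 supports and UTF8 does not.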


Copyright © 1996, 2002 Oracle Corporation.

All Rights Reserved.