Oracle® Database Performance Tuning Guide 10g Release 1 (10.1) Part Number B10752-01 |
|
|
View PDF |
This chapter describes different connection models and introduces networking issues that affect tuning.
This chapter contains the following sections:
The techniques used to determine the source of problems vary depending on the configuration. You can have a shared server configuration or a dedicated server configuration.
LSNRCTL
services lists dispatchers
.LSNRCTL
services lists dedicated
servers
.It is possible to connect to dedicated server with a database configured for shared servers by placing the parameter (SERVER
= DEDICATED
) in the connect descriptor.
This section discusses the setups for the shared server configuration.
The LSNRCTL
control utility's services
statement lists every dispatcher registered with it. This list includes the dispatchers process ID. You can check the alert log to confirm that the dispatchers have been started successfully.
LSNRCTL> services Connecting to (DESCRIPTION=(ADDRESS=(PROTOCOL=tcp)(HOST=helios)(PORT=1521))) Services Summary... Service "sales.us.acme.com" has 1 instance(s). Instance "sales", status READY, has 3 handler(s) for this service... Handler(s): "DEDICATED" established:0 refused:0 state:ready LOCAL SERVER "D000" established:0 refused:0 current:0 max:10000 state:ready DISPATCHER <machine: helios, pid: 1689> (ADDRESS=(PROTOCOL=tcp)(HOST=helios)(PORT=52414)) "D001" established:0 refused:0 current:0 max:10000 state:ready DISPATCHER <machine: helios, pid: 1691> (ADDRESS=(PROTOCOL=tcp)(HOST=helios)(PORT=52415)) The command completed successfully.
See Also:
Oracle Net Services Administrator's Guide for information on setting the output mode |
The following list provides information on configuring initialization parameters for shared servers.
DISPATCHERS
line is correctly set. For example:
DISPATCHERS = "(DESCRIPTION=(ADDRESS=(PROTOCOL=TCP) (HOST=hostname)(PORT=1492)(queuesize=32))) (DISPATCHERS = 1) (LISTENER = alias) (SERVICE = servicename) (SESSIONS = 1000) (CONNECTIONS = 1000) (MULTIPLEX = ON) (POOL = ON) (TICK = 5)"
One, and only one, of the following attributes is required:
ADDRESS
and DESCRIPTION
provide support for the specification of additional network attributes beyond PROTOCOL
. In the previous example, the entire DISPATCHERS
line can be (PROTOCOL=TCP
). The attributes DISPATCHERS
, LISTENER
, SERVICE
, SESSIONS
, CONNECTIONS
, MULTIPLEX
, POOL
, and TICKS
are all optional.
MAX_DISPATCHERS
line is correctly set. For example:
MAX_DISPATCHERS = 4
This line should reflect the total number of dispatchers you want to start.
MAX_SHARED_SERVERS
line is correctly set. For example:
MAX_SHARED_SERVERS = 5
This line sets the upper bound on the total number of shared servers PMON can create, based on the peak load of the system. This should be set high enough so that all requests can be serviced, but not so high that the system swaps if they are reached. The purpose of this parameter is to prevent the server from swapping. Run the following script to see what the highwater mark is for the number of servers running, and then set MAX_SHARED_SERVERS
to more then this.
SELECT maximum_connections "MAX CONN", servers_started "STARTED", servers_ terminated "TERMINATED", servers_highwater "HIGHWATER" FROM V$SHARED_SERVER_ MONITOR;
SHARED_SERVERS
line is correctly set. For example:
SHARED_SERVERS = 5
This is the total number of shared servers started when the database is started. It also represents the total number of shared servers PMON tries to keep. It should be the total number of servers expected to be used when the database is active. MAX_SHARED_SERVERS
is intended to handle peak load.
Use the LSNRCTL
control utility's services
command to see if there are excessive connection refusals. Check the listener's log file to see if this is a connection problem. For example:
LSNRCTL> services Connecting to (DESCRIPTION=(ADDRESS=(PROTOCOL=tcp)(HOST=helios)(PORT=1521))) Services Summary... Service "sales.us.acme.com" has 1 instance(s). Instance "sales", status READY, has 2 handler(s) for this service... Handler(s): "DEDICATED" established:11 refused:0 state:ready LOCAL SERVER "D000" established:565 refused:4 current:155 max:10000 state:ready DISPATCHER <machine: helios, pid: 5673> (ADDRESS=(PROTOCOL=tcp)(HOST=helios)(PORT=38411)) The command completed successfully.
Under normal conditions, the number refused should be zero. Shut down the listener and restart it to erase these statistics. If the refused count is increasing after the listener restarts, then the connections are being refused. If the refused count stays at zero, and if the problem you are troubleshooting is occurring, then your problem is not with the connections being refused.
Connection refusals can occur for many reasons. Examine the listener log to see what the connect rate is. Run the listener log analyzer script to check.
The listener is a queue-based process. It receives connect requests from the lower level protocol stack. It has a limited queue stack which is configurable to the operating system maximum. It can only process one connection at a time, and there is a limit to the number of connections a second the process can handle.
If the rate at which the connect requests arrive exceeds that limit, then the requests are queued. The queue stack is also limited, but you can configure it. If there are more listener processes, then the requests made against each individual process are fewer and are handled more quickly.
Increasing the listener queue is done in the listener
.ora
file. The listener
.ora
file can contain many listeners, each by a different name. It is assumed that only one of those listed is having a problem. If not, then apply this method to all applicable listeners. To increase the listener queue, add (queuesize
= number
) to the listener
.ora
file. For example:
listener = (address = (protocol = tcp) (host = sales-pc) (port = 1521) (queuesize = 20) )
Stop and restart the listener to initialize this new parameter. If you are not currently running a shared server configuration, then consider doing so. It is faster for the listener to handle a client request in a shared server configuration than it is in a dedicated server configuration.
This section encompasses local area network (LAN) and wide area network (WAN) troubleshooting methods.
Networks entail overhead that adds a certain amount of delay to processing. To optimize performance, you must ensure that your network throughput is fast, and you should try to reduce the number of messages that must be sent over the network. It can be difficult to measure the delay the network adds.
Three dynamic performance views are useful for measuring the network delay:
In V$SESSION_EVENT
, the AVERAGE_WAIT
column indicates the amount of time that Oracle waits between messages. You can use this statistic as a yardstick to evaluate the effectiveness of the network.
In V$SESSION_WAIT
, the EVENT
column lists the events for which active sessions are waiting. The "sqlnet
message
from
client
" wait event indicates that the shared or foreground process is waiting for a message from a client. If this wait event has occurred, then you can check to see whether the message has been sent by the user or received by Oracle.
You can investigate hang-ups by looking at V$SESSION_WAIT
to see what the sessions are waiting for. If a client has sent a message, then you can determine whether Oracle is responding to it or is still waiting for it.
In V$SESSTAT
you can see the number of bytes that have been received from the client, the number of bytes sent to the client, and the number of calls the client has made.
The most critical aspects of a network that contribute to performance are latency and bandwidth.
Local Area Network Topologies:
Wide Area Network Topologies:
Table 11-1 lists the most common ratings for various topologies.
This section describes several techniques for enhancing performance and solving network problems.
The first step in solving network problem is to understand the overall topology. Gather as much information about the network that you can. This kind of information usually manifests itself as a network diagram. Your diagram should contain the types of network technology used in the Local Area Network and the Wide Area Network. It should also contain addresses of the various network segments.
Examine this information. Obvious network bottlenecks include the following:
There are many problems that can cause a performance breakdown. Follow this checklist:
If nothing is revealed, then find the network route from the client to the data server. Understanding the travel times on a network gives you an idea as to the time a transaction will take. Client-server communication requires many small packets. High latency on a network slows the transaction down due to the time interval between sending a request and getting the response.
Use trace route (traceroute
or equivalent) from the client to the server to get address information for each device in the path.
For example:
traceroute usmail05 Tracing route to usmail05.us.oracle.com [144.25.88.200]over a maximum of 30 hops: 1 <10 ms <10 ms 10 ms whq1davis-rtr-749-f1-0-a.us.oracle.com [144.25.216.1] 2 <10 ms <10 ms <10 ms whq4op3-rtr-723-f0-0.us.oracle.com [144.25.252.23] 3 220 ms 210 ms 231 ms usmail05.us.oracle.com [144.25.88.200] Trace complete.
Ping each device in turn to get the timings. Use large packets to get the slowest times. Make sure you set the "don't fragment bit" so that routers do not spend time disassembling and reassembling the packet. Also note that the packet size is 1472. This is for Ethernet. Ethernet packets are 1536 octets (actual 8 bit bytes) in size. ICPM packets (this is what ping is designed to use) have 64 octets of header. Evaluate the area where the slowness seems to occur.
For example:
ping -l 1472 -n 1 -f 144.25.216.1 Pinging 144.25.216.1 with 1472 bytes of data: Reply from 144.25.216.1: bytes=1472 time<10ms TTL=255 ping -l 1472 -n 1 -f 144.25.252.23 Pinging 144.25.252.23 with 1472 bytes of data: Reply from 144.25.252.23: bytes=1472 time=10ms TTL=254 ping -l 1472 -n 1 -f 144.25.88.200 Pinging 144.25.88.200 with 1472 bytes of data: Reply from 144.25.88.200: bytes=1472 time=271ms TTL=253
The previous example validates trace route. Ideally, you ping from the workstation to 144.25.216.1, from 144.25.216.1 to 144.25.252.23, then from 144.25.252.23 to 144.25.88.200. This would show the exact latency on each segment traveled.
This section helps you determine the problem with your network bottleneck.
Oracle Net tracing reveals whether an error is Oracle-specific or due to conditions that the operating system is passing to the Transparent Network Substrate (Oracle TNS layer).
Enable Oracle Net tracing at the Oracle server, the listener, and at a client suspected of having the problem you are trying to resolve.
To enable tracing at the server, find the sqlnet
.ora
file for the server and create the following lines in it:
TRACE_TIMESTAMP_SERVER = ON TRACE_LEVEL_SERVER = 16 TRACE_UNIQUE_SERVER = ON
To enable tracing at the client, find the sqlnet
.ora
file for the client and create the following lines in it:
TRACE_TIMESTAMP_CLIENT = ON TRACE_LEVEL_CLIENT = 16 TRACE_UNIQUE_CLIENT = ON
To enable tracing at the listener, find the listener
.ora
file and create the following line in it:
TRACE_TIMESTAMP_listener_name = ON TRACE_LEVEL_listener_name = 16
Note: The |
Reproduce the problem, so that you generate traces on the client and server. Now analyze the traces generated.
See Also:
|
If the problem is with the network and not Oracle Net, then you must determine the following:
For example, perhaps the system is fine in the building where the Data Center is located, but it is slow in other buildings that are several miles away.
Not all Oracle error codes represent pure Oracle troubles. ORA-3113
is the most common error that points to an underlying network problem.
If you are getting an Oracle error message, then look into the trace file to find the error. For troubleshooting bugs, Oracle Net trace analysis takes some time to fully find the problem. However, high-level simple trace analysis is rather simple.
If the problem is with Oracle Net, then use Oracle Net tracing to show you where the problem lies. If there are errors in the trace files, then do they appear in only the client traces, only in the server traces, or in both?
The problem is on the client. However, if you are getting ORA-3113
or ORA-3114
errors, then the problem is on the server.
The problem is on the server. However, if you are getting ORA-3113
or ORA-3114
errors, then the problem is on the client.
If you are getting ORA-3113
or ORA-3114
errors, then the problem is on the Network. Troubleshoot the server first. If it is fine, then the client is at fault.
The shared server architecture can be more complex to troubleshoot. Check the initialization parameter file for any shared server parameters. Look at the operating system to see if any of the shared server processes are present.
Check for dispatchers by looking for names such as ora_d000
, ora_d001
, and so on. For example:
ps -ef | grep ora_d
Check for shared servers by looking for names such as ora_s000
, ora_s001
, and so on. For example:
ps -ef | grep ora_s
See Also:
|
Reduce network calls by using array interfaces. Instead of fetching one row at a time, it is more efficient to fetch 10 rows with a single network round trip.
See Also:
Oracle Call Interface Programmer's Guide for more information on array interfaces |
Before sending data across the network, Oracle Net buffers data into the Session Data Unit (SDU). It sends the data stored in this buffer when the buffer is full or when an application tries to read the data. When large amounts of data are being retrieved and when packet size is consistently the same, it might speed retrieval to adjust the default SDU size.
Optimal SDU size depends on the normal transport size. Use a sniffer to find out the frame size, or set tracing on to its highest level to check the number of packets sent and received and to determine whether they are fragmented. Tune your system to limit the amount of fragmentation.
Use Oracle Net Configuration Assistant to configure a change to the default SDU size on both the client and the server; SDU size is generally the same on both.
When a session is established, Oracle Net packages and sends data between server and client using packets. The TCP
.NODELAY
parameter, which causes packets to be flushed on to the network more frequently, is enabled by default. Although Oracle Net supports many networking protocols, TCP tends to have the best scalability.
In Oracle Net, you can use the Connection Manager to conserve system resources by multiplexing. Multiplexing means funneling many client sessions through a single transport connection to a server destination. This way, you can increase the number of sessions that a process can handle. This applies only to shared server configurations. Alternately, you can use Connection Manager to control client access to dedicated servers. Connection Manager provides multiple protocol support allowing a client and server with different networking protocols to communicate.
See Also:
Oracle Net Services Administrator's Guide for more information on Connection Manager |