23
Tuning Networks

This chapter describes different connection models and introduces networking issues that affect tuning.

This chapter contains the following sections:

Understanding Connection Models
Detecting Network Problems
Solving Network Problems

Understanding Connection Models

The techniques used to determine the source of a problem depend on the configuration, which can be either a shared server configuration or a dedicated server configuration.

If you have a shared server configuration, then the LSNRCTL services command lists dispatchers.
If you have a dedicated server configuration, then the LSNRCTL services command lists dedicated servers.

You can connect to a dedicated server process on a database configured for shared servers by placing the parameter (SERVER = DEDICATED) in the CONNECT_DATA section of the connect descriptor.
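As a sketch of where the clause goes, the following Python helper assembles a connect descriptor string with SERVER=DEDICATED inside its CONNECT_DATA section. The helper name is hypothetical, it only builds text (it does not connect), and the host, port, and service name are placeholders modeled on the listener output in this chapter.

```python
def dedicated_descriptor(host: str, port: int, service_name: str) -> str:
    """Return a connect descriptor that requests a dedicated server process.

    SERVER=DEDICATED belongs in CONNECT_DATA, next to the service name,
    even when the database is configured for shared servers.
    """
    address = f"(ADDRESS=(PROTOCOL=tcp)(HOST={host})(PORT={port}))"
    connect_data = f"(CONNECT_DATA=(SERVICE_NAME={service_name})(SERVER=DEDICATED))"
    return f"(DESCRIPTION={address}{connect_data})"


if __name__ == "__main__":
    # Hypothetical values modeled on the LSNRCTL output in this chapter.
    print(dedicated_descriptor("helios", 1521, "sales.us.acme.com"))
```

The resulting string is what you would place after the alias in a tnsnames.ora entry or pass directly as a connect string.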

Shared Server Configuration

This section discusses the setup of the shared server configuration.

Registering the Dispatchers

The services command of the LSNRCTL control utility lists every dispatcher registered with the listener, including each dispatcher's process ID. You can check the alert log to confirm that the dispatchers have started successfully.

Note:

Remember that PMON can take a minute to register the dispatcher with the listener.

LSNRCTL> services
Connecting to
(DESCRIPTION=(ADDRESS=(PROTOCOL=tcp)(HOST=helios)(PORT=1521)))
Services Summary...
Service "sales.us.acme.com" has 1 instance(s).
  Instance "sales", status READY, has 3 handler(s) for this service...
    Handler(s):
      "DEDICATED" established:0 refused:0 state:ready
         LOCAL SERVER
      "D000" established:0 refused:0 current:0 max:10000 state:ready
         DISPATCHER <machine: helios, pid: 1689>
         (ADDRESS=(PROTOCOL=tcp)(HOST=helios)(PORT=52414))
      "D001" established:0 refused:0 current:0 max:10000 state:ready
         DISPATCHER <machine: helios, pid: 1691>
         (ADDRESS=(PROTOCOL=tcp)(HOST=helios)(PORT=52415))
The command completed successfully.
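When you check many listeners, it can help to extract this information programmatically. The following Python sketch assumes the output format shown above (a handler line with established and refused counters, followed by a DISPATCHER or LOCAL SERVER line); the format may differ between releases, so treat the regular expressions and the function name as assumptions. It reports each handler's type and, for dispatchers, the process ID, which you can compare with the alert log.

```python
import re
from dataclasses import dataclass
from typing import List, Optional

# Handler header lines look like: "D000" established:0 refused:0 current:0 ...
HANDLER_RE = re.compile(
    r'^\s*"(?P<name>[^"]+)"\s+established:(?P<est>\d+)\s+refused:(?P<ref>\d+)')
# Dispatcher detail lines look like: DISPATCHER <machine: helios, pid: 1689>
DISPATCHER_RE = re.compile(r'DISPATCHER <machine: [^,]+, pid: (?P<pid>\d+)>')


@dataclass
class Handler:
    name: str
    established: int
    refused: int
    kind: str = "UNKNOWN"        # "DISPATCHER" or "LOCAL SERVER"
    pid: Optional[int] = None    # set only for dispatchers


def parse_handlers(output: str) -> List[Handler]:
    """Collect the handlers listed in LSNRCTL services output."""
    handlers: List[Handler] = []
    for line in output.splitlines():
        header = HANDLER_RE.match(line)
        if header:
            handlers.append(Handler(header["name"], int(header["est"]), int(header["ref"])))
            continue
        if not handlers:
            continue  # lines before the first handler (service summary, etc.)
        detail = DISPATCHER_RE.search(line)
        if detail:
            handlers[-1].kind = "DISPATCHER"
            handlers[-1].pid = int(detail["pid"])
        elif line.strip() == "LOCAL SERVER":
            handlers[-1].kind = "LOCAL SERVER"
    return handlers


SAMPLE = '''
      "DEDICATED" established:0 refused:0 state:ready
         LOCAL SERVER
      "D000" established:0 refused:0 current:0 max:10000 state:ready
         DISPATCHER <machine: helios, pid: 1689>
      "D001" established:0 refused:0 current:0 max:10000 state:ready
         DISPATCHER <machine: helios, pid: 1691>
'''

if __name__ == "__main__":
    for h in parse_handlers(SAMPLE):
        print(h.name, h.kind, h.pid)
```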

See Also:

Oracle9i Net Services Administrator's Guide for information on setting the output mode

Configuring the Initialization Parameter File

Make sure that the DISPATCHERS line is correctly set. For example:

DISPATCHERS = "(DESCRIPTION=(ADDRESS=(PROTOCOL=TCP)
               (HOST=hostname)(PORT=1492)(queuesize=32))) 
          (DISPATCHERS = 1) 
          (LISTENER = alias) 
          (SERVICE = servicename) 
          (SESSIONS = 1000) 
          (CONNECTIONS = 1000) 
          (MULTIPLEX = ON) 
          (POOL = ON) 
          (TICK = 5)"

One, and only one, of the following attributes is required:

PROTOCOL
ADDRESS
DESCRIPTION

ADDRESS and DESCRIPTION let you specify network attributes in addition to PROTOCOL. In the previous example, the entire DISPATCHERS value could be just (PROTOCOL=TCP). The attributes DISPATCHERS, LISTENER, SERVICE, SESSIONS, CONNECTIONS, MULTIPLEX, POOL, and TICKS are all optional.
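The rule that exactly one of PROTOCOL, ADDRESS, or DESCRIPTION must appear can be checked mechanically. This Python sketch, with hypothetical helper names, looks only at top-level attributes, because a DESCRIPTION normally contains a nested ADDRESS and PROTOCOL. It expects the text inside the quotes of the DISPATCHERS value and, as a simplifying assumption, recognizes only the full attribute names, not abbreviations.

```python
from typing import List

NETWORK_ATTRIBUTES = {"PROTOCOL", "ADDRESS", "DESCRIPTION"}


def top_level_attributes(value: str) -> List[str]:
    """Names of the attributes at parenthesis depth 1, e.g. "(SESSIONS = 1000)"."""
    names: List[str] = []
    depth = 0
    for i, ch in enumerate(value):
        if ch == "(":
            depth += 1
            if depth == 1:
                equals = value.find("=", i)
                if equals == -1:
                    raise ValueError(f"attribute without '=' at position {i}")
                names.append(value[i + 1:equals].strip().upper())
        elif ch == ")":
            depth -= 1
            if depth < 0:
                raise ValueError("unbalanced ')'")
    if depth != 0:
        raise ValueError("unbalanced '('")
    return names


def network_attribute(value: str) -> str:
    """Return the single network attribute, or raise if there are zero or several."""
    found = [n for n in top_level_attributes(value) if n in NETWORK_ATTRIBUTES]
    if len(found) != 1:
        raise ValueError(f"expected exactly one of PROTOCOL, ADDRESS, DESCRIPTION; got {found}")
    return found[0]
```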

Make sure that the optional MAX_DISPATCHERS line is correctly set. For example:
```
MAX_DISPATCHERS = 4
```
This value should be the maximum number of dispatchers you want to run at the same time.

Make sure that the optional MAX_SHARED_SERVERS line is correctly set. For example:
```
MAX_SHARED_SERVERS = 5
```
This line sets the upper bound on the total number of shared servers PMON can create, based on the peak load of the system. Set it high enough that all requests can be serviced, but not so high that the system swaps if the limit is reached; preventing swapping is the purpose of this parameter. Run the following query to find the high-water mark for the number of servers running, and then set MAX_SHARED_SERVERS higher than this value.
```
SELECT maximum_connections "MAX CONN",
       servers_started     "STARTED",
       servers_terminated  "TERMINATED",
       servers_highwater   "HIGHWATER"
  FROM V$SHARED_SERVER_MONITOR;
```

Make sure that the optional SHARED_SERVERS line is correctly set. For example:
```
SHARED_SERVERS = 5
```
This is the number of shared servers started when the database is started. It is also the number of shared servers that PMON tries to keep running. Set it to the number of servers you expect to be in use when the database is active; MAX_SHARED_SERVERS is intended to handle peak load.
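The relationships described above (SHARED_SERVERS as the steady-state count, MAX_SHARED_SERVERS as the peak limit that should exceed the observed high-water mark) can be captured in a small check. This is a hypothetical Python helper; you supply the parameter values and the SERVERS_HIGHWATER value returned by the V$SHARED_SERVER_MONITOR query yourself.

```python
from typing import List


def shared_server_warnings(shared_servers: int, max_shared_servers: int,
                           servers_highwater: int) -> List[str]:
    """Check the shared server parameters against the guidance in this section.

    servers_highwater is the SERVERS_HIGHWATER value from V$SHARED_SERVER_MONITOR.
    """
    warnings: List[str] = []
    if shared_servers > max_shared_servers:
        warnings.append("SHARED_SERVERS is larger than MAX_SHARED_SERVERS")
    if max_shared_servers <= servers_highwater:
        warnings.append("MAX_SHARED_SERVERS is not above the observed high-water mark")
    return warnings


if __name__ == "__main__":
    # Values from the parameter examples above, with a hypothetical high-water mark.
    print(shared_server_warnings(5, 5, servers_highwater=5))
```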

Checking the Connections

Use the LSNRCTL control utility's services command to see if there are excessive connection refusals. Check the listener's log file to see if this is a connection problem. For example:

LSNRCTL> services
Connecting to
(DESCRIPTION=(ADDRESS=(PROTOCOL=tcp)(HOST=helios)(PORT=1521)))
Services Summary...
Service "sales.us.acme.com" has 1 instance(s).
  Instance "sales", status READY, has 2 handler(s) for this service...
    Handler(s):
      "DEDICATED" established:11 refused:0 state:ready
         LOCAL SERVER
      "D000" established:565 refused:4 current:155 max:10000 state:ready
         DISPATCHER <machine: helios, pid: 5673>
         (ADDRESS=(PROTOCOL=tcp)(HOST=helios)(PORT=38411))
The command completed successfully.

Under normal conditions, the refused count should be zero. Shut down the listener and restart it to reset these statistics. If the refused count increases after the listener restarts, then new connection requests are being refused. If the refused count stays at zero while the problem you are troubleshooting continues to occur, then the problem is not caused by refused connections.
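To follow the refused count over time, capture the services output right after restarting the listener and again later while the problem occurs. The Python sketch below compares two such captures; it assumes the handler line format shown above, and the function names are hypothetical.

```python
import re
from typing import Dict, List

# Matches handler lines such as: "D000" established:565 refused:4 current:155 ...
REFUSED_RE = re.compile(r'"(?P<name>[^"]+)"\s+established:\d+\s+refused:(?P<refused>\d+)')


def refused_counts(services_output: str) -> Dict[str, int]:
    """Map each handler name to its refused count."""
    return {m["name"]: int(m["refused"]) for m in REFUSED_RE.finditer(services_output)}


def handlers_with_new_refusals(first: str, second: str) -> List[str]:
    """Handlers whose refused count grew between two LSNRCTL services snapshots."""
    before = refused_counts(first)
    after = refused_counts(second)
    return sorted(name for name, count in after.items() if count > before.get(name, 0))
```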

Checking the Connect/Second Rate

Connection refusals can occur for many reasons. Examine the listener log to determine the connect rate; a listener log analyzer script can help with this check.
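If no analyzer script is at hand, a minimal Python sketch such as the following can estimate the connect rate. The listener.log line layout and timestamp format in the comments are assumptions based on a typical log, so check them against your own file; the function name is hypothetical.

```python
from collections import Counter
from datetime import datetime
from typing import Iterable, Tuple

# Assumed listener.log layout (check your own log before relying on it):
# 23-JAN-2003 10:15:32 * (CONNECT_DATA=...) * (ADDRESS=...) * establish * sales * 0
TIMESTAMP_FORMAT = "%d-%b-%Y %H:%M:%S"


def connect_rates(lines: Iterable[str]) -> Tuple[int, float]:
    """Return (peak connects in any single second, average connects per second)."""
    per_second: Counter = Counter()
    for line in lines:
        fields = [field.strip() for field in line.split("*")]
        if len(fields) < 4 or fields[3] != "establish":
            continue  # not a connect request (e.g., service_update entries)
        try:
            stamp = datetime.strptime(fields[0], TIMESTAMP_FORMAT)
        except ValueError:
            continue  # malformed or unexpected line
        per_second[stamp] += 1
    if not per_second:
        return 0, 0.0
    span = (max(per_second) - min(per_second)).total_seconds() + 1
    return max(per_second.values()), sum(per_second.values()) / span
```

Usage: pass the lines of the log file, for example `connect_rates(open("listener.log"))`, and compare the peak with the rate at which refusals begin.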

The listener is a queue-based process. It receives connect requests from the lower-level protocol stack and holds them in a queue whose size is limited but configurable, up to the operating system maximum. The listener processes only one connection at a time, so there is a limit to the number of connections per second it can handle.

If connect requests arrive faster than that limit, then they wait in the queue, which can also fill up. With more listener processes, each process receives fewer requests and handles them more quickly.
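The queueing behavior described above can be illustrated with a toy model. This Python sketch is not how the listener is implemented; the per-second capacity, queue size, and arrival counts are hypothetical numbers chosen to show why a larger queue absorbs bursts and why spreading requests over more listeners reduces queueing.

```python
from typing import List, Tuple


def simulate_listener(arrivals: List[int], capacity_per_second: int,
                      queue_size: int) -> Tuple[int, int, int]:
    """Toy model of one listener: returns (handled, overflowed, still_queued).

    Each second, the listener handles up to capacity_per_second pending
    requests; whatever is left waits in a queue of at most queue_size
    entries, and anything beyond that overflows.
    """
    handled = overflowed = queued = 0
    for arriving in arrivals:
        pending = queued + arriving
        served = min(pending, capacity_per_second)
        handled += served
        leftover = pending - served
        queued = min(leftover, queue_size)
        overflowed += leftover - queued
    return handled, overflowed, queued


if __name__ == "__main__":
    burst = [30, 30, 30]  # hypothetical connect requests per second
    print(simulate_listener(burst, capacity_per_second=20, queue_size=20))  # (60, 10, 20)
    print(simulate_listener(burst, capacity_per_second=20, queue_size=40))  # (60, 0, 30)
    # The same load split evenly across two listeners:
    print(simulate_listener([15, 15, 15], capacity_per_second=20, queue_size=20))  # (45, 0, 0)
```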

You increase the listener queue size in the listener.ora file. That file can define several listeners, each with a different name. This section assumes that only one of them has a problem; if more than one does, apply this method to each affected listener. To increase the queue size, add (queuesize = number) to the listener's address in listener.ora. For example:

listener = 
     (address = 
          (protocol = tcp)
          (host = sales-pc) 
          (port = 1521) 
          (queuesize = 20)
     )

Stop and restart the listener to initialize this new parameter. If you are not currently running a shared server configuration, then consider doing so. It is faster for the listener to handle a client request in a shared server configuration than it is in a dedicated server configuration.

Note:

Shared server dispatchers also receive connect requests and can also benefit from tuning the queue size.

The queue size cannot exceed the maximum that the particular operating system allows.

Detecting Network Problems

This section describes local area network (LAN) and wide area network (WAN) troubleshooting methods.

Using Dynamic Performance Views for Network Performance

Networks entail overhead that adds a certain amount of delay to processing. To optimize performance, you must ensure that your network throughput is fast, and you should try to reduce the number of messages that must be sent over the network. It can be difficult to measure the delay the network adds.

Three dynamic performance views are useful for measuring the network delay:

V$SESSION_EVENT
V$SESSION_WAIT
V$SESSTAT

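In V$SESSION_EVENT, the AVERAGE_WAIT column (in hundredths of a second) indicates the average time that Oracle waits for each event; for the SQL*Net message events, this is the time Oracle waits between messages. You can use this statistic as a yardstick to evaluate the effectiveness of the network.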

In V$SESSION_WAIT, the EVENT column lists the events for which active sessions are waiting. The "SQL*Net message from client" wait event indicates that the shared or foreground process is waiting for a message from a client. If this wait event has occurred, then you can check whether the message has been sent by the user or received by Oracle.

You can investigate hangs by looking at V$SESSION_WAIT to see what the sessions are waiting for. If a client has sent a message, then you can determine whether Oracle is responding to it or is still waiting for it.

In V$SESSTAT you can see the number of bytes that have been received from the client, the number of bytes sent to the client, and the number of calls the client has made.
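Once you have queried these views, a little arithmetic turns the raw values into comparable figures. The Python sketch below assumes that TIME_WAITED in V$SESSION_EVENT is reported in hundredths of a second and that the byte and round-trip statistics carry the names shown in the code, as listed in V$STATNAME; check both against your release. The function names are hypothetical. A small number of bytes per round trip suggests a chatty application whose response time depends mostly on latency.

```python
from typing import Dict

# Statistic names as they appear in V$STATNAME (assumed; verify on your release).
SENT = "bytes sent via SQL*Net to client"
RECEIVED = "bytes received via SQL*Net from client"
ROUND_TRIPS = "SQL*Net roundtrips to/from client"


def average_wait_ms(time_waited_cs: int, total_waits: int) -> float:
    """Average wait in milliseconds, from TIME_WAITED (hundredths of a second)
    and TOTAL_WAITS in V$SESSION_EVENT."""
    if total_waits == 0:
        return 0.0
    return time_waited_cs * 10.0 / total_waits


def bytes_per_round_trip(session_stats: Dict[str, int]) -> float:
    """Average bytes moved per client round trip for one session."""
    trips = session_stats.get(ROUND_TRIPS, 0)
    if trips == 0:
        return 0.0
    return (session_stats.get(SENT, 0) + session_stats.get(RECEIVED, 0)) / trips


if __name__ == "__main__":
    # Hypothetical values, as if read from V$SESSION_EVENT and V$SESSTAT.
    print(average_wait_ms(time_waited_cs=250, total_waits=100))  # 25.0
    print(bytes_per_round_trip({SENT: 1000, RECEIVED: 200, ROUND_TRIPS: 4}))  # 300.0
```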

Understanding Latency and Bandwidth

The most critical aspects of a network that contribute to performance are latency and bandwidth.

Latency refers to a time delay; for example, the gap between the time a device requests access to a network and the time it receives permission to transmit.
Bandwidth is the throughput capacity of a network medium or protocol. Variations in the network signals can cause degradation on the network. Sources of degradation include cables that are too long or of the wrong type. External noise sources, such as elevators, air handlers, or fluorescent lights, can also cause problems.

Common Network Topologies

Local Area Network Topologies:

Ethernet
Fast Ethernet
1 Gigabit Ethernet
Token Ring
FDDI
ATM

Wide Area Network Topologies:

DSL
ISDN
Frame Relay
T-1, T-3, E-1, E-3
ATM
SONET

Table 23-1 lists the most common ratings for various topologies.

Table 23-1 Bandwidth Ratings

Topology or Carrier	Bandwidth
Ethernet	10 Megabits/second
Fast Ethernet	100 Megabits/second
1 Gigabit Ethernet	1 Gigabit/second
Token Ring	16 Megabits/second
FDDI	100 Megabits/second
ATM	155 Megabits/second (OC3), 622 Megabits/second (OC12)
T-1 (US only)	1.544 Megabits/second
T-3 (US only)	44.736 Megabits/second
E-1 (non-US)	2.048 Megabits/second
E-3 (non-US)	34.368 Megabits/second
Frame Relay	Committed Information Rate (CIR); can be up to the carrier speed, but usually is lower
DSL	Up to the carrier speed
ISDN	Up to the carrier speed; usually used with slower modems
Dial Up Modems	56 Kilobits/second; usually combined with data compression for faster throughput
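To see what these ratings mean in practice, convert them to transfer times. The Python sketch below uses the nominal figures from Table 23-1 and the conversion of 8 bits per byte, with 1 Megabit taken as 1,000,000 bits; real throughput is lower because of protocol overhead and latency, so the results are best-case lower bounds. The payload size is hypothetical.

```python
# Nominal ratings from Table 23-1, in bits per second (1 Megabit = 1,000,000 bits).
RATINGS_BPS = {
    "Ethernet": 10e6,
    "Fast Ethernet": 100e6,
    "1 Gigabit Ethernet": 1e9,
    "Token Ring": 16e6,
    "FDDI": 100e6,
    "ATM (OC3)": 155e6,
    "T-1": 1.544e6,
    "E-1": 2.048e6,
    "Dial Up Modem": 56e3,
}


def transfer_seconds(payload_bytes: int, bandwidth_bps: float) -> float:
    """Best-case time to push a payload through a link at its rated speed.

    Ignores latency, protocol overhead, compression, and competing traffic.
    """
    return payload_bytes * 8 / bandwidth_bps


if __name__ == "__main__":
    payload = 1_000_000  # hypothetical 1 MB result set
    for name, bps in RATINGS_BPS.items():
        print(f"{name:20s} {transfer_seconds(payload, bps):10.3f} s")
```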

Solving Network Problems

This section describes several techniques for enhancing performance and solving network problems.

Finding Network Bottlenecks

The first step in solving network problems is to understand the overall topology. Gather as much information about the network as you can. This information usually takes the form of a network diagram, which should show the types of network technology used in the local area network and the wide area network, as well as the addresses of the various network segments.

Examine this information. Obvious network bottlenecks include the following:

Using a dial-up modem (normal modem or ISDN) to access time-critical data.
A frame relay link runs on a T-1 but has a 9.6 Kilobits/second CIR, so it reliably transmits only 9.6 Kilobits/second; if it uses the rest of the T-1 bandwidth, some data may be lost.
Data from high-speed networks is channeled through low-speed networks.
There are too many network hops. Each router constitutes one hop.
A 10 Megabits/second network serving a Web site.

There are many problems that can cause a performance breakdown. Follow this checklist:

Get a network sniffer trace.
Check the following:
- Is the bandwidth being exceeded on the network, the client, or the server?
- Ethernet collisions.
- Token ring or FDDI ring beacons.
- Are there many runt frames?
- The stability of the WAN links.
Get a bandwidth utilization chart for frame relay, and see if CIR is being exceeded.
Is any quality of service or packet prioritizing going on?
Is a firewall in the way somewhere?

If nothing is revealed, then find the network route from the client to the data server. Knowing the travel times on the network gives you an idea of how long a transaction will take. Client-server communication requires many small packets, so high latency slows a transaction down because of the interval between sending each request and receiving its response.
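A back-of-the-envelope model shows why latency can matter more than bandwidth for chatty client-server traffic. In this Python sketch, the round-trip count, payload, and link figures are hypothetical; the 220 ms round-trip time echoes the slow hop in the trace route example that follows. The model ignores server processing time and any overlap between requests.

```python
def transaction_seconds(round_trips: int, rtt_seconds: float,
                        payload_bytes: int, bandwidth_bps: float) -> float:
    """Rough transaction time: the waits for round trips plus the time to
    transmit the payload. Ignores server processing time and overlap."""
    return round_trips * rtt_seconds + payload_bytes * 8 / bandwidth_bps


if __name__ == "__main__":
    # Hypothetical transaction: 200 round trips moving 50,000 bytes in total.
    lan = transaction_seconds(200, 0.001, 50_000, 100e6)    # Fast Ethernet, 1 ms RTT
    wan = transaction_seconds(200, 0.220, 50_000, 2.048e6)  # E-1 link, 220 ms RTT
    print(f"LAN: {lan:.3f} s, WAN: {wan:.3f} s")  # LAN: 0.204 s, WAN: 44.195 s
```

On the WAN link, less than a quarter of a second of the roughly 44 seconds is spent transmitting data; the rest is waiting on round trips.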

Use a trace route utility (trcroute, tracert, traceroute, or an equivalent) from the client to the server to get address information for each device in the path. For example:

tracert usmail05
Tracing route to usmail05.us.oracle.com [144.25.88.200] over a maximum of 30 hops:
  1   <10 ms   <10 ms    10 ms  whq1davis-rtr-749-f1-0-a.us.oracle.com [144.25.216.1]
  2   <10 ms   <10 ms   <10 ms  whq4op3-rtr-723-f0-0.us.oracle.com [144.25.252.23]
  3   220 ms   210 ms   231 ms  usmail05.us.oracle.com [144.25.88.200]

Trace complete.

Ping each device in turn to get the timings. Use large packets to get the slowest times, and set the "don't fragment" bit so that the packets are not broken up and reassembled along the way. The examples use a 1472-byte payload, the largest that fits in a standard Ethernet frame without fragmentation: the Ethernet MTU is 1500 octets (8-bit bytes), of which the IP header takes 20 and the ICMP header takes 8. (ICMP is the protocol that ping uses.) In the Windows ping command shown, -l sets the payload size, -n the number of echo requests, and -f the don't fragment bit. Evaluate the area where the slowness seems to occur. For example:

ping -l 1472 -n 1 -f 144.25.216.1 
Pinging 144.25.216.1 with 1472 bytes of data: 
Reply from 144.25.216.1: bytes=1472 time<10ms TTL=255 

ping -l 1472 -n 1 -f 144.25.252.23 
Pinging 144.25.252.23 with 1472 bytes of data: 
Reply from 144.25.252.23: bytes=1472 time=10ms TTL=254 

ping -l 1472 -n 1 -f 144.25.88.200 
Pinging 144.25.88.200 with 1472 bytes of data: 
Reply from 144.25.88.200: bytes=1472 time=271ms TTL=253

The previous example confirms the trace route results: the large delay occurs between 144.25.252.23 and 144.25.88.200. Ideally, you would ping from the workstation to 144.25.216.1, then from 144.25.216.1 to 144.25.252.23, and then from 144.25.252.23 to 144.25.88.200. This would show the exact latency on each segment traveled.
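When you cannot ping from each router, subtracting the round-trip times measured from the workstation gives a rough per-segment estimate. This Python sketch, with a hypothetical helper name, applies that subtraction to the ping results above. It is only an approximation, because each measurement includes the whole path back to the workstation and varies from ping to ping; "<10ms" is taken as its 10 ms upper bound.

```python
from typing import List, Tuple


def segment_latencies(hops: List[Tuple[str, float]]) -> List[Tuple[str, str, float]]:
    """Estimate per-segment latency from round-trip times measured from the
    workstation to each successive hop (in trace route order)."""
    segments = []
    previous_name, previous_ms = "workstation", 0.0
    for name, rtt_ms in hops:
        # Clamp at zero: measurement noise can make a later hop look faster.
        segments.append((previous_name, name, max(rtt_ms - previous_ms, 0.0)))
        previous_name, previous_ms = name, rtt_ms
    return segments


if __name__ == "__main__":
    hops = [("144.25.216.1", 10.0), ("144.25.252.23", 10.0), ("144.25.88.200", 271.0)]
    slowest = max(segment_latencies(hops), key=lambda seg: seg[2])
    print(slowest)  # ('144.25.252.23', '144.25.88.200', 261.0)
```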

Dissecting Network Bottlenecks

This section helps you determine the cause of a network bottleneck.

Determining if the Problem is with Oracle Net or the Network

Oracle Net tracing reveals whether an error is Oracle-specific or due to conditions that the operating system passes to the Transparent Network Substrate (TNS) layer of Oracle.

Enable Oracle Net tracing at the Oracle server, the listener, and at a client suspected of having the problem you are trying to resolve.

To enable tracing at the server, find the sqlnet.ora file for the server and create the following lines in it:

TRACE_TIMESTAMP_SERVER = ON
TRACE_LEVEL_SERVER = 16 
TRACE_UNIQUE_SERVER = ON

To enable tracing at the client, find the sqlnet.ora file for the client and create the following lines in it:

TRACE_TIMESTAMP_CLIENT = ON
TRACE_LEVEL_CLIENT = 16 
TRACE_UNIQUE_CLIENT = ON

To enable tracing at the listener, find the listener.ora file and add the following lines to it:

TRACE_TIMESTAMP_listener_name = ON
TRACE_LEVEL_listener_name = 16

Note:

The TRACE_TIMESTAMP_x parameters are optional, but including them makes debugging easier.
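If you enable tracing in several places, generating the lines keeps the parameter names consistent. The Python helper below is hypothetical; it only produces the text shown in the sqlnet.ora and listener.ora examples above, for pasting into the files, and "LISTENER" is used purely as an example listener name.

```python
from typing import Dict, Optional


def trace_parameters(component: str, level: int = 16,
                     listener_name: Optional[str] = None) -> Dict[str, str]:
    """Tracing parameters described above for SERVER, CLIENT, or LISTENER."""
    component = component.upper()
    if component in ("SERVER", "CLIENT"):
        suffix = component
    elif component == "LISTENER":
        if not listener_name:
            raise ValueError("listener tracing needs the listener's name")
        suffix = listener_name
    else:
        raise ValueError(f"unknown component: {component}")
    params = {
        f"TRACE_TIMESTAMP_{suffix}": "ON",  # optional, but helps when debugging
        f"TRACE_LEVEL_{suffix}": str(level),
    }
    if component != "LISTENER":
        params[f"TRACE_UNIQUE_{suffix}"] = "ON"  # as in the sqlnet.ora examples
    return params


def render(params: Dict[str, str]) -> str:
    """Format parameters one per line, as they appear in sqlnet.ora or listener.ora."""
    return "\n".join(f"{name} = {value}" for name, value in params.items())


if __name__ == "__main__":
    print(render(trace_parameters("server")))
    print(render(trace_parameters("listener", listener_name="LISTENER")))
```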

Reproduce the problem to generate traces on the client and server, and then analyze the traces.

See Also:

Oracle9i Net Services Administrator's Guide for detailed directions on enabling Oracle Net tracing
Oracle9i Database Error Messages for definitions of Oracle Net errors noted in the trace file

If the problem is with the network and not Oracle Net, then you must determine the following:

Does the problem only occur in one location on the local network?
Does the problem only occur in one area on the WAN?

For example, perhaps the system is fine in the building where the Data Center is located, but it is slow in other buildings that are several miles away.

Not all Oracle errors indicate purely Oracle problems. ORA-3113 is the most common error that points to an underlying network problem.

Note:

Enabling tracing on the server can generate a large number of trace files. To prevent this, set up a separate environment in which only your test connections are traced. This configuration works for dedicated connections. First, log in to the server's operating system as the Oracle software owner. Create a temporary directory to hold the configuration files and the trace files that will be created. Copy the sqlnet.ora, listener.ora, and tnsnames.ora files to that directory, and then edit the copy of sqlnet.ora to enable tracing. Add the following line to it:

TRACE_DIRECTORY_SERVER = temporary_directory_just_created

Next, modify the listener.ora file in the temporary directory and change the listening port to an unused port (this applies to TCP; for other protocols, use a similar technique). Make a similar change to the client's tnsnames.ora file for the connect string you will use for this test.

Set the TNS_ADMIN environment variable to point to the temporary directory, and start the listener. Server traces for all new connections to this listener are now written to the temporary directory. Reproduce the problem.
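The steps in this note can be scripted. The Python sketch below is a hypothetical helper that assumes the configuration files live in source_dir and that you choose an unused test port. It rewrites ports with a simple regular expression, so check the resulting listener.ora before starting the listener, and it does not touch the client's tnsnames.ora, which you still edit separately.

```python
import os
import re
import shutil
import tempfile
from typing import Dict

CONFIG_FILES = ("sqlnet.ora", "listener.ora", "tnsnames.ora")


def make_trace_environment(source_dir: str, test_port: int) -> Dict[str, str]:
    """Build a throwaway Oracle Net configuration for tracing test connections.

    Copies the three configuration files into a new temporary directory, turns on
    server tracing into that directory, moves the listener to test_port, and
    returns an environment whose TNS_ADMIN points at the copy.
    """
    trace_dir = tempfile.mkdtemp(prefix="oranet_trace_")
    for name in CONFIG_FILES:
        shutil.copy(os.path.join(source_dir, name), trace_dir)

    with open(os.path.join(trace_dir, "sqlnet.ora"), "a") as sqlnet:
        sqlnet.write("\nTRACE_LEVEL_SERVER = 16\n")
        sqlnet.write(f"TRACE_DIRECTORY_SERVER = {trace_dir}\n")

    listener_path = os.path.join(trace_dir, "listener.ora")
    with open(listener_path) as listener:
        text = listener.read()
    # Naive edit: rewrites every (PORT = n) clause; review the file afterward.
    text = re.sub(r"\(\s*PORT\s*=\s*\d+\s*\)", f"(PORT = {test_port})", text,
                  flags=re.IGNORECASE)
    with open(listener_path, "w") as listener:
        listener.write(text)

    env = dict(os.environ)
    env["TNS_ADMIN"] = trace_dir
    return env
```

You could then start the listener with this environment, for example `subprocess.run(["lsnrctl", "start"], env=env)`, and point the client's test connect string at the same port.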

If you are getting an Oracle error message, then look in the trace file for the error. When you are troubleshooting bugs, a complete Oracle Net trace analysis can take some time to pinpoint the problem; a high-level analysis, however, is relatively simple.

Determining if the Problem is on the Client or the Server (on Oracle Net)

If the problem is with Oracle Net, then use Oracle Net tracing to find where it lies. If the trace files contain errors, determine whether they appear only in the client traces, only in the server traces, or in both.

Errors Only in the Client Trace

The problem is on the client. However, if you are getting ORA-3113 or ORA-3114 errors, then the problem is on the server.

Errors Only in the Server Trace or Listener Trace

The problem is on the server. However, if you are getting ORA-3113 or ORA-3114 errors, then the problem is on the client.

Errors in All: Client, Server, and Listener Trace

If you are getting ORA-3113 or ORA-3114 errors, then the problem is on the network. Troubleshoot the server first; if it is fine, then the client is at fault.
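The three cases can be written as a small decision function. This Python sketch is a hypothetical encoding of the rules in this section; combinations the section does not cover (for example, errors in both the client and server traces but not the listener trace) are reported as undetermined rather than guessed.

```python
def likely_location(client: bool, server: bool, listener: bool,
                    ora_3113_or_3114: bool) -> str:
    """Apply the rules of thumb above to where Oracle Net errors were traced.

    client, server, and listener say whether that trace contains errors;
    ora_3113_or_3114 says whether ORA-3113 or ORA-3114 is among them.
    """
    if client and server and listener:
        if ora_3113_or_3114:
            return "network (troubleshoot the server first, then the client)"
        return "undetermined"
    if client and not (server or listener):
        return "server" if ora_3113_or_3114 else "client"
    if (server or listener) and not client:
        return "client" if ora_3113_or_3114 else "server"
    if not (client or server or listener):
        return "no errors traced"
    return "undetermined"
```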

Checking if the Server is Configured for Shared Servers

The shared server architecture can be more complex to troubleshoot. Check the initialization parameter file for any shared server parameters. Look at the operating system to see if any of the shared server processes are present.

Check for dispatchers by looking for names such as ora_d000, ora_d001, and so on. For example:

ps -ef | grep 'ora_d[0-9][0-9][0-9]'

Check for shared servers by looking for names such as ora_s000, ora_s001, and so on. For example:

ps -ef | grep 'ora_s[0-9][0-9][0-9]'
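A bare pattern such as ora_s also matches background processes like ora_smon, and ora_d matches ora_dbw0, so a stricter pattern is safer. The Python sketch below assumes process names of the form ora_ followed by d or s, three digits, an underscore, and the instance SID (the SID suffix is an assumption), and groups the matches; the function name is hypothetical.

```python
import re
from typing import Dict, List

# Dispatchers: ora_dNNN_<SID>; shared servers: ora_sNNN_<SID>.
# Requiring three digits keeps ora_dbw0_<SID> and ora_smon_<SID> from matching.
SHARED_PROCESS_RE = re.compile(r"\bora_(?P<kind>[ds])\d{3}_\w+")


def shared_server_processes(ps_output: str) -> Dict[str, List[str]]:
    """Group dispatcher and shared server process names found in ps output."""
    found: Dict[str, List[str]] = {"dispatchers": [], "shared_servers": []}
    for match in SHARED_PROCESS_RE.finditer(ps_output):
        key = "dispatchers" if match["kind"] == "d" else "shared_servers"
        found[key].append(match.group(0))
    return found
```

You can feed it the output of `ps -ef`, for example `subprocess.run(["ps", "-ef"], capture_output=True, text=True).stdout`.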

See Also:

"Shared Server Configuration" for more information on tuning the shared server
Oracle9i Database Concepts and Oracle9i Net Services Administrator's Guide for more information on shared server concepts and parameters

Using Array Interfaces

Reduce network calls by using array interfaces. Instead of fetching one row at a time, it is more efficient to fetch 10 rows with a single network round trip.
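The saving is easy to quantify. In this Python sketch, the row count, array sizes, and 50 ms round-trip time are hypothetical, and the model counts only fetch calls; it shows that the time spent waiting on the network shrinks roughly in proportion to the array size.

```python
def fetch_round_trips(rows: int, array_size: int) -> int:
    """Network round trips needed to fetch `rows` rows, array_size rows per fetch.

    Simplified: ignores the parse and execute calls and any final empty fetch.
    """
    if array_size < 1:
        raise ValueError("array_size must be at least 1")
    return (rows + array_size - 1) // array_size  # ceiling division


if __name__ == "__main__":
    rtt = 0.05  # hypothetical 50 ms round-trip time
    for size in (1, 10, 100):
        trips = fetch_round_trips(1000, size)
        print(f"array size {size:3d}: {trips:4d} round trips, ~{trips * rtt:.1f} s waiting")
```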

See Also:

Oracle Call Interface Programmer's Guide for more information on array interfaces

Adjusting Session Data Unit Buffer Size

Before sending data across the network, Oracle Net buffers data into the Session Data Unit (SDU). It sends the data stored in this buffer when the buffer is full or when an application tries to read the data. When large amounts of data are being retrieved and when packet size is consistently the same, it might speed retrieval to adjust the default SDU size.

Optimal SDU size depends on the normal transport size. Use a sniffer to find out the frame size, or set tracing to its highest level to check the number of packets sent and received and to determine whether they are fragmented. Tune your system to limit the amount of fragmentation.

Use Oracle Net Configuration Assistant to configure a change to the default SDU size on both the client and the server; SDU size is generally the same on both.
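To reason about how an SDU size maps onto network frames, a simplified calculation helps. This Python sketch ignores Oracle Net and TCP/IP header overhead inside the SDU and assumes a standard Ethernet maximum segment size of 1460 bytes; the candidate SDU sizes are hypothetical. It shows how an SDU that is not a multiple of the segment size leaves a partly filled packet at the end of each buffer.

```python
ETHERNET_MSS = 1460  # 1500-byte MTU minus 20-byte IP and 20-byte TCP headers


def sdu_buffers(payload_bytes: int, sdu_bytes: int) -> int:
    """Number of SDU-sized buffers needed to send a payload."""
    return (payload_bytes + sdu_bytes - 1) // sdu_bytes


def segments_per_sdu(sdu_bytes: int, mss_bytes: int = ETHERNET_MSS) -> int:
    """TCP segments needed to carry one full SDU buffer."""
    return (sdu_bytes + mss_bytes - 1) // mss_bytes


def last_segment_bytes(sdu_bytes: int, mss_bytes: int = ETHERNET_MSS) -> int:
    """Payload in the final segment of each SDU (below mss_bytes = partly filled)."""
    remainder = sdu_bytes % mss_bytes
    return remainder if remainder else mss_bytes


if __name__ == "__main__":
    for sdu in (2048, 2920):  # hypothetical candidate SDU sizes
        print(sdu, segments_per_sdu(sdu), last_segment_bytes(sdu))
```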

Using TCP.NODELAY

When a session is established, Oracle Net packages and sends data between server and client using packets. The TCP.NODELAY parameter, which causes packets to be flushed onto the network more frequently, is enabled by default. Although Oracle Net supports many networking protocols, TCP tends to have the best scalability.
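At the socket level, this behavior corresponds to the standard TCP_NODELAY option, which disables Nagle's algorithm. The Python sketch below only illustrates that socket option; it is an assumption that Oracle Net's TCP.NODELAY maps onto it in the same way on every platform, so consult the platform documentation for the Oracle setting itself.

```python
import socket


def nodelay_socket() -> socket.socket:
    """Create a TCP socket with Nagle's algorithm disabled.

    With TCP_NODELAY set, small writes go out immediately instead of being
    held back to be combined with later data.
    """
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 1)
    return sock


if __name__ == "__main__":
    with nodelay_socket() as s:
        print(s.getsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY) != 0)  # True
```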

See Also:

Your platform-specific Oracle documentation for more information on TCP.NODELAY

Using Connection Manager

In Oracle Net, you can use Connection Manager to conserve system resources by multiplexing, that is, funneling many client sessions through a single transport connection to a server destination. This increases the number of sessions that a process can handle. Multiplexing applies only to shared server configurations. Alternatively, you can use Connection Manager to control client access to dedicated servers. Connection Manager also provides multiple protocol support, allowing a client and server that use different networking protocols to communicate.

See Also:

Oracle9i Net Services Administrator's Guide for more information on Connection Manager
