Fading Coder

One Final Commit for the Last Sprint

Home > Tech > Content

Eliminating Orphaned Oracle Sessions with SQLNET.EXPIRE_TIME

Tech 3

SQLNET.EXPIRE_TIME enables server-side dead connection detection (DCD) by sending periodic probes to verify cliant/server connections are still alive. When the server cannot reach the client, it raises an error on the connection and the server process exits, allowing PMON to reclaim database resources and release locks. A non-zero value activates the probe; a value of 0 disables it.

Key constraints and behavior:

  • Intended for server-side use where many sessions are handled concurrently.
  • Not supported for bequeath (BEQ) connections.
  • Probes add small network overhead and may marginally impact performence.
  • Some platforms require additional processing to distinguish probe events, which can slightly degrade performance.
  • A typical starting value is 10 minutes; use smaller values only where quicker cleanup is essential.

Dead Connections vs. Idle Sestions

  • Dead connections

    • The client process or network path disappears unexpectedly (power loss, OS reboot, cable unplug, VPN drop) while the session’s server process remains.
    • With DCD enabled, the server periodically sends a 10-byte probe. If no response arrives, the connection is marked dead and PMON cleans up database state and OS resources.
    • Optional: SQLNET.RECV_TIMEOUT on the server can bound how long a server waits for data on a connection.
  • Idle (INACTIVE) sessions

    • Sessions that are connected but not currently executing work (v$session.status = 'INACTIVE'). They are not necessarily dead and should not be killed automatically.
    • Profiles with RESOURCE_LIMIT and IDLE_TIME can mark such sessions SNIPED and subject them to server-side cleanup according to policy.

Server-Side Configuration

Edit $ORACLE_HOME/network/admin/sqlnet.ora to include a non-zero expire time. The following example sets a 1-minute probe interval for demonstration; use a higher value (for example, 10) in production.

# $ORACLE_HOME/network/admin/sqlnet.ora
SQLNET.EXPIRE_TIME = 1

# Optional: enable server network tracing (remove in production)
TRACE_LEVEL_SERVER = USER
TRACE_FILE_SERVER = dcd_demo
TRACE_DIRECTORY_SERVER = /u01/app/oracle/network/trace
TRACE_UNIQUE_SERVER = ON
DIAG_ADR_ENABLED = OFF

Restart not required for the database instance; new server processes will honor the updated sqlnet.ora automatically.

Reproducing and Observing Cleanup

The scenario below demonstrates a client that loses connectivity while holding a row lock. DCD later detects the dead connection and PMON releases resources.

1) Client activity (Windows)

C:\> sqlplus scott/tiger@ora11g

SQL*Plus: Release 11.2.0.1.0 - Production

Connected to:
Oracle Database 11g Enterprise Edition Release 11.2.0.1.0 - Production

-- Take a lock without committing
SQL> UPDATE emp SET sal = sal * 1.1 WHERE deptno = 20;
5 rows updated.

-- Simulate a network failure (e.g., disable the NIC or unplug the cable)
-- Issue another statement to reveal the broken pipe
SQL> SELECT * FROM dual;
SELECT * FROM dual
              *
ERROR at line 1:
ORA-03113: end-of-file on communication channel

2) Server-side checks (Linux)

Identify the session and its OS process:

-- Map session to OS process
SELECT s.sid,
       s.serial#,
       s.username,
       s.status,
       s.machine,
       p.spid AS server_pid
FROM   v$session s
JOIN   v$process p ON p.addr = s.paddr
WHERE  s.username = 'SCOTT';

Example output (trimmed):

 SID SERIAL# USERNAME STATUS   MACHINE        SERVER_PID
---- ------- -------- -------- -------------- ----------
 15     447  SCOTT    INACTIVE TRADESZ\PC39   29522

Locate held locks:

SELECT o.object_name,
       lo.locked_mode,
       s.sid,
       s.serial#
FROM   v$locked_object lo
JOIN   dba_objects o ON o.object_id = lo.object_id
JOIN   v$session s    ON s.sid = lo.session_id
WHERE  s.username = 'SCOTT';

Sample output:

OBJECT_NAME LOCKED_MODE SID SERIAL#
----------- ----------- --- -------
EMP                  3  15     447

Confirm the server process exists:

$ ps -ef | grep 29522 | grep -v grep
oracle  29522     1  0 09:58 ?        00:00:00 oracleora11g (LOCAL=NO)

Connectivity to the client is lost:

$ ping 192.168.7.133
From 192.168.7.40 icmp_seq=2 Destination Host Unreachable
From 192.168.7.40 icmp_seq=3 Destination Host Unreachable

Track process count for reference:

SELECT COUNT(*) AS proc_count FROM v$process;

3) Cleanup after EXPIRE_TIME

After the DCD interval elapses and the probe continues to fail, the server marks the connection as dead and PMON releases resources:

  • The session row disappears from v$session for SCOTT.
  • Locks vanish from v$locked_object.
  • The server process (SPID 29522 in this example) exits, and v$process count drops.

Quick verification:

# Process reclaimed
$ ps -ef | grep 29522 | grep -v grep
# (no output)
-- No remaining locks from this session
SELECT * FROM v$locked_object WHERE session_id = 15;
-- no rows

Inspecting Server Traces

If server tracing was enabled, confirm DCD activity in the generated file under $ORACLE_HOME/network/trace.

$ grep -n "dead connection detection" /u01/app/oracle/network/trace/dcd_demo_*.trc
78:[25-JUN-2013 09:58:02:853] niotns: Enabling dead connection detection (1 min)

Timer lifecycle entries typically appear as well:

$ grep -nE "nstim(start|armed|clear)" /u01/app/oracle/network/trace/dcd_demo_*.trc
447:[25-JUN-2013 09:58:03:050] nstimstart: starting timer
451:[25-JUN-2013 09:58:03:051] nsconbrok: timer created for connection
4092:[25-JUN-2013 10:18:26:173] nstimarmed: timer is armed, with value 3833
4096:[25-JUN-2013 10:18:26:173] nstimclear: normal exit

Notes and operational guidance

  • DCD clears dead connections, not idle-but-healthy ones. Both dead and idle sessions can appear as INACTIVE in v$session before cleanup policies apply.
  • Profiles (RESOURCE_LIMIT + IDLE_TIME) can mark long-idle sessions SNIPED. Combine this with SQLNET.EXPIRE_TIME to address both idle and dead cases.
  • A non-zero SQLNET.EXPIRE_TIME introduces periodic probe traffic. Choose a value that balances cleanup speed and overhead; 10 minutes is a common baseline.
  • Actual cleanup can take longer than the configured interval due to TCP stack behavior, retransmits, platform timers, and PMON scheduling. Short intervals do not guarantee immediate release.
  • DCD does not apply to local/bequeath connections.
  • Consider SQLNET.RECV_TIMEOUT to prevent a server process from waiting indefinitely for client data.

Related Articles

Understanding Strong and Weak References in Java

Strong References Strong reference are the most prevalent type of object referencing in Java. When an object has a strong reference pointing to it, the garbage collector will not reclaim its memory. F...

Comprehensive Guide to SSTI Explained with Payload Bypass Techniques

Introduction Server-Side Template Injection (SSTI) is a vulnerability in web applications where user input is improper handled within the template engine and executed on the server. This exploit can r...

Implement Image Upload Functionality for Django Integrated TinyMCE Editor

Django’s Admin panel is highly user-friendly, and pairing it with TinyMCE, an effective rich text editor, simplifies content management significantly. Combining the two is particular useful for bloggi...

Leave a Comment

Anonymous

◎Feel free to join the discussion and share your thoughts.