Thursday, 4 February 2010

Oracle Advanced Replication Technology

This section discusses critical issues when considering replication techniques for Oracle caused by the increase in complexity and management overhead with growth in both the size and the number of the objects to replicate as well as the number of destinations
By Ahmed HAZZAF ahmed.hazzaf@one-smartway.com

INTRODUCTION

This section discusses critical issues when considering replication techniques for Oracle caused by the increase in complexity and management overhead with growth in both the size and the number of the objects to replicate as well as the number of destinations

Scope & Application
To be used by DBAs to employ basic troubleshooting techniques for advanced replication propagation

Type of replication
Updateable materialized views: The snapshot sites are able to modify the data and send their changes back to the master. The Master site determines the frequency of the refreshes and the frequency with which updates are sent back to the master. There is a second type where the site just receives data and don't modify it

Type of Refresh
Existing types of refresh:
Fast refresh: Apply only modifications since last refresh

Complete refresh: First, Truncate concerned Materialized View and insert all data from the master into the slave.

Conflicts Management
When Oracle propagates an update to destination tables, it expects the current data for the row at the destination to match the data at the originating site prior to the update. If the data is not the same, an update conflict results. Similarly, if an insert fails because of a primary key violation (i.e., a unique constraint violation) the result is a uniqueness conflict or violation. And, if the target row of a delete does not exist at the destination site, a delete conflict results.

REPLICATION MONITORING PROCESS
The purpose of this section is to provide basic steps for monitoring advanced replication propagation process and the underlying mechanism it uses.

Check for not refreshed views

In this step, we need to connect to each slave DB (Site) and check for not refreshed views with the following query:

SELECT OWNER,
NAME,
To_Char (LAST_REFRESH, 'DD-MM-RRRR HH24:MI:SS'),
REFRESH_GROUP
FROM DBA_SNAPSHOTS
WHERE LAST_REFRESH < SYSDATE-1

Result:

No rows returned

Action:

If this operation returns one or more Materialized View, you will need to refresh manually the displayed snapshots. Even if the problem persists, please continue with step number two (Check Database links)

Check Database links

Check the existence of the public and private links with the following query at each site involved to propagation:

column owner format a15
column db_link format a45
column username format a15
select owner, db_link, username from dba_db_links;

Result:

OWNER DB_LINK Username
--------------- ----------------------------- ---------------
PUBLIC db1.WORLD
USER1 db1.WORLD


Action:

Test each of the links and ensure that the global name matches the link name with the following query:

connect as database link owner

select * from global_name@db.world;

It is important that the correct links exist for the user who owns the job that performs the replication push job and if the db

Check replication group status

For propagation to be successful the status of the replication groups must be normal:

select gname, status from dba_repgroup;

Result:

GNAME STATUS

------------------------------ ---------

GROUP1 NORMAL

Action:

If the group status is QUIESCING The dbms_repcat.suspend _master_activity () API call has been issued for the master repgroup and is in the process of suspending (quiescing) the master repgroup. Neither DML nor dbms_repcat operations can be performed on objects in the master repgroup. In this case we need to issue a resume in an attempt to get replication working again, do the following:

Connect to the master definition site as repadmin and resume activity on the repgroup:

execute dbms_repcat.resume_master_activity('');

Check status of existing admin requests

Data cannot move between replicated sites if there are outstanding admin requests for the associated master group and there should not be any such requests if the group displays with status normal. Use the following query to check for admin requests:

select id, request, status, master from dba_repcatlog;

Result:

ID REQUEST STATUS MASTER

---------- ----------------------------- -------------- ------

222 RESUME_MASTER_ACTIVITY AWAIT_CALLBACK EE817

221 SUSPEND_MASTER_ACTIVITY AWAIT_CALLBACK EE817

222 RESUME_MASTER_ACTIVITY READY REP817

Action:

You will need to delete the current suspend_master_activity request(s) using the dbms_repcat.purge_master_log('',null,null) package.

execute dbms_repcat.purge_master_log('',null,null)

To purge the suspend_master_activity request from the example above:

execute dbms_repcat.purge_master_log('221',null,null);

PL/SQL procedure successfully completed.

The resume_master_activity request(s) should be the only one(s).

Check replicated object status

For the propagation and queuing of data changes to be successful to remote sites, replicated tables (objects) must display as valid at all replication sites.

select oname, status, generation_status,

replication_trigger_exists, internal_package_exists

from dba_repobject

where status != 'VALID'

or replication_trigger_exists != 'Y'

or internal_package_exists != 'Y';

Result:

ONAME STATUS GENERATIO R I

------------------------------ ---------- --------- - -

CUSTOMERS_TABLE VALID N N

Action:

If the above statement returns invalid replication objects, execute the following statement with Snapadmin user:

SQL>Alter Materialized View mview_name Compile;

If problems persist, run the following statement to ensure that all dependant SYS and SYSTEM objects are valid:

column owner format a25

column object_name format a30

column object_type format a15

select owner, object_name, object_type

from dba_objects

where status != 'VALID'

and owner in ('SYS', 'SYSTEM');

Checking for errors

If a job fails while attempting to push or purge replication data changes, errors will be written to the alertSID.log. Additional and more detailed information will go into the following files referred to by the alert.log:

- V9 and above: SID_cjq0_nnnnn.TRC and SID_jnnn_nnnn.TRC

The format of these files may vary between operating systems, their location can be determined by running the following from SQLPLUS:

SQL> show parameter dump_dest

Result :

NAME TYPE VALUE

--------------------- ----------- ------------------------------

background_dump_dest string /u/app/oracle/bdump

core_dump_dest string /u/app/oracle/cdump

user_dump_dest string /u/app/oracle/udump

Action:

Check dump files with Oracle DBA

Check & Purge of deferred transactions

Oracle uses a lazy algorithm to purge deferred transactions from the local queue. It is important that the purge job runs regularly to clear down this queue because the same underlying table is used for transactions waiting to go to remote sites as for those which have been pushed but not purged. If there is a large number of transactions to be purged it can affect the performance:

declare

rc binary_integer;

begin

rc := sys.dbms_defer_sys.purge(delay_seconds=>0);

end;

Check for locks

When a Materialized view could not be refreshed and the refresh process hangs, it’s very important to check for locks. Use the following select statement to identify sessions blocking propagation at the site where propagation is pushing data too:

SET ECHO off

SET linesize 132 pagesize 66

break on Kill on username on terminal

column Kill heading 'Kill String' format a13

column res heading 'Resource Type' format 999

column id1 format 9999990

column id2 format 9999990

column lmode heading 'Lock Held' format a20

column request heading 'Lock Requested' format a20

column serial# format 99999

column username format a10 heading "Username"

column terminal heading Terminal format a12

column tab format a35 heading "Table Name"

column owner format a9

column Address format a18

selectnvl(S.USERNAME,'Internal') username,

nvl(S.TERMINAL,'None') terminal,

L.SID||','||S.SERIAL# Kill,

U1.NAME||'.'||substr(T1.NAME,1,20) tab,

decode(L.LMODE,1,'No Lock',

2,'Row Share',

3,'Row Exclusive',

4,'Share',

5,'Share Row Exclusive',

6,'Exclusive',null) lmode,

decode(L.REQUEST,1,'No Lock',

2,'Row Share',

3,'Row Exclusive',

4,'Share',

5,'Share Row Exclusive',

6,'Exclusive',null) request

fromV$LOCK L,

V$SESSION S,

SYS.USER$ U1,

SYS.OBJ$ T1

where L.SID = S.SID

and T1.OBJ# = decode(L.ID2,0,L.ID1,L.ID2)

and U1.USER# = T1.OWNER#

and S.TYPE != 'BACKGROUND'

order by 1,2,5;

Result:

Username Kill String Table Name Lock Held Lock Requested

-------- ----------- ---------- --------- ---------------

WAEL 92,6480 USER1.CUSTOMERS_TABLE Row Exclusive

Action:

Check if the concerned materialized view exists under “Table Name” column. If yes, kill session by following:

alter system kill session '92,6480'

Check job_queue_processes is set

The init.ora parameters that control the Oracle job queues must be set, execute the following statement to check that they are correctly set at each replication site:

SQL> show parameter job

Result:

NAME TYPE VALUE

-------------------------------- ------- --------

Job_queue_interval integer 30

Job_queue_processes integer 4 <= must be > 0

Action:

If the parameter don’t exist we can add it to Oracle Spfile :

ALTER SYSTEM set job_queue_processes = 10 SCOPE=SPFILE;

Restart the Database

Check for Schedule push jobs

We can check the UNIX cron by issuing:

db:/oracle9> crontab –l


HOW TO MANAGE DIFFERED QUEUES

Terminating a deferred queue push job that is currently running
There will be situations where the running push job needs to be terminated and prevented from running again, until the current problem that is being encountered is resolved. Perform the following steps:

- Break the job with:

execute dbms_job.broken(, true);

- Kill the Job Queue Process from the Operating System.

- After killing the process, wait approximately 1 minute, to ensure the job is removed from dba_jobs_running.

How to clear down large deferred queues
Database administrators frequently have to make the decision to terminate propagation and manually resynchronise their replicated environments when the deferred queue has become large and or slow.

Once the decision has been made to clear down the queue and resynchronise the data, use DBMS_DEFER_SYS.DELETE_TRAN

Oracle normally recommends customers use dbms_defer_sys.delete_tran with all arguments set to null, to remove transactions from the deferred queue. However when the queue is very large this may not be the most efficient mechanism for clearing the queue.

The basic steps to clear down the queue are listed below, if the queue is large:

- Terminate the current push operation and prevent it from re-running.

- Stop remote sites from replicating to the local site.

- DO NOT attempt to suspend or quiesce the replicated environment that will try to push the queue again and introduce admin requests that also need to be cleaned out.

- If the system schema has optimiser statistics defined, make sure they are up to date, by analyzing the schema with compute statistics. In general Oracle does not recommend adding statistics to the system schema tables.

- From SQLPLUS :

execute dbms_defer_sys.delete_tran (null, null);

By Ahmed HAZZAF

ahmed.hazzaf@one-smartway.com

No comments: