Considerations when backing up from the Oracle Standby Database (Data Guard) with RMAN to shared storage (NFS)


To reduce load on the Primary DB in a Data Guard configuration, you may decide to run your RMAN backups from the Standby site. A very common approach today is to use NFS as the backup location, especially on platforms like Exadata Cloud at Customer. I recently had a case where both the Primary and the Standby site had the same NFS location mounted and used it as the backup destination.

REMARK: To protect against ransomware attacks, products like Cohesity provide NFS targets with a Write Once Read Many (WORM) mechanism. That is, backups can be written but are immutable afterwards and hence cannot be modified by ransomware.

To be able to back up your Oracle DBs from the Primary and Standby site to shared NFS, and to restore and recover the DB on either site from that shared NFS, a couple of things have to be considered:

Visibility of Backups

Backups written to disk are only visible on the site that created them. Backups to tape are visible on both the Primary and the Standby site. In any case, an RMAN catalog has to be used to see backups from both sites. The RMAN documentation says the following on this topic:

In a Data Guard environment, the recovery catalog considers disk backups as accessible only to the database with which they are associated, whereas tape backups created on one database are accessible to all databases.

Note:
You can transfer a backup from a standby host to a primary host or vice versa, connect as TARGET to the database on this host, and then use the CATALOG command to catalog the backup. After a file is cataloged by the target database, the file is associated with the target database.

REMARK: You may use the following command in RMAN to make backups of the other site visible in your session:

RMAN> SET BACKUP FILES FOR DEVICE TYPE DISK TO ACCESSIBLE;

My colleague blogged about it some time ago.

The catch is that this setting is not persistent and has to be repeated in every RMAN session. As a consequence, an archivelog deletion policy "BACKED UP 1 TIMES TO DISK" still does not take backups from the other site into account, even after the above command has been used.

So only cataloged backups are visible to both sites. This means we need an RMAN job that regularly catalogs the backups taken on the other site.
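Such a catalog job could be sketched as a small shell function that only prints the RMAN commands, so that the caller decides how to connect. This is a minimal sketch, not a finished job: the function name `emit_catalog_script` is made up for illustration, and the backup directory is whatever your shared NFS mount is called.

```shell
# Sketch: print the RMAN commands that catalog backup pieces found under a
# shared backup location. Nothing is executed here; the output is meant to be
# piped into an rman session by the caller.
#
# emit_catalog_script DIR
#   DIR: shared NFS backup location (hypothetical example: /my_nfs_backup)
emit_catalog_script() {
    backup_dir=${1:-}
    case $backup_dir in
        *\'*) echo "backup dir must not contain a single quote" >&2; return 1 ;;
        /*)   ;;   # accept absolute paths only, so the prefix is unambiguous
        *)    echo "backup dir must be an absolute path" >&2; return 1 ;;
    esac
    # CATALOG START WITH treats the string as a prefix of the file names.
    cat <<EOF
CATALOG START WITH '${backup_dir}' NOPROMPT;
LIST BACKUP SUMMARY;
EOF
}
```

The output can then be piped into an RMAN session that is connected to the local target and to the recovery catalog, using your own catalog connect string.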

RMAN-Connection with password required

Backups on the Standby site require an RMAN connection to the DB with a password, i.e.

RMAN> connect target sys/mysupersecretpwd

instead of

RMAN> connect target /

because a consistent backup needs the redo that was produced during the backup. That is, at the end of e.g. an INC0 backup the current online redo has to be archived, and that operation has to happen on the Primary site. RMAN therefore needs credentials to log in to the Primary site and archive the redo there, so that it can then be backed up at the standby site.

See MOS Note "RMAN-06820 ORA-17629 During Backup at Standby Site (Doc ID 1616074.1)"
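To keep the password out of the script file itself, a backup job on the standby could build its RMAN commands along the following lines. This is only a sketch under assumptions: the function name `emit_standby_backup_script` and the environment variable `SYS_PWD` are invented for this example, the catalog connection is left out, and channels and backup format are assumed to be configured already.

```shell
# Sketch: print the RMAN commands for a level 0 backup on the standby site,
# connecting with a password instead of "connect target /".
#
# emit_standby_backup_script TNS_ALIAS
#   TNS_ALIAS: Oracle Net alias of the standby (hypothetical: standby_site)
#   SYS_PWD:   environment variable holding the SYS password (hypothetical name)
emit_standby_backup_script() {
    tns_alias=${1:-}
    if [ -z "$tns_alias" ] || [ -z "${SYS_PWD:-}" ]; then
        echo "usage: SYS_PWD=<password> emit_standby_backup_script TNS_ALIAS" >&2
        return 1
    fi
    # Passwords containing special characters would need additional quoting.
    cat <<EOF
CONNECT TARGET sys/${SYS_PWD}@${tns_alias}
BACKUP INCREMENTAL LEVEL 0 DATABASE PLUS ARCHIVELOG;
EOF
}
```

Feeding the output to rman on standard input avoids putting the password on the rman command line, where it could show up in the process list.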

Retention policy on Standby site

The command "CONFIGURE RETENTION POLICY TO RECOVERY WINDOW OF x DAYS;" fails on the Standby site with RMAN-05021, e.g.

RMAN> CONFIGURE RETENTION POLICY TO RECOVERY WINDOW OF 2 DAYS;

resync only records later than timestamp 1128819277
RMAN-00571: ===========================================================
RMAN-00569: =============== ERROR MESSAGE STACK FOLLOWS ===============
RMAN-00571: ===========================================================
RMAN-03002: failure of configure command at 02/15/2023 13:47:08
RMAN-05021: this configuration cannot be changed for a BACKUP or STANDBY control file

The workaround is NOT to use the command on the standby and instead either

  • set it on the Primary and recreate the standby controlfile
  • or specify the retention policy in the delete obsolete statement, e.g.
    delete noprompt obsolete recovery window of <RET_POLICY> days device type disk;

See MOS Note "RMAN-5021 this configuration cannot be changed for a BACKUP or STANDBY (Doc ID 1519386.1)"
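If the retention is passed explicitly on every run, it helps to validate the value before it reaches RMAN. A minimal sketch, assuming the retention comes in as a script argument (the function name `emit_delete_obsolete` is hypothetical):

```shell
# Sketch: print a DELETE OBSOLETE statement with an explicit recovery window,
# so that no CONFIGURE RETENTION POLICY is needed on the standby.
#
# emit_delete_obsolete DAYS
#   DAYS: recovery window in days, a positive integer
emit_delete_obsolete() {
    days=${1:-}
    case $days in
        ''|*[!0-9]*)
            echo "retention must be a positive integer number of days" >&2
            return 1 ;;
    esac
    if [ "$days" -eq 0 ]; then
        echo "retention must be greater than 0" >&2
        return 1
    fi
    printf 'DELETE NOPROMPT OBSOLETE RECOVERY WINDOW OF %s DAYS DEVICE TYPE DISK;\n' "$days"
}
```

Rejecting empty or non-numeric input up front keeps a broken job parameter from turning into a malformed DELETE statement.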

Snapshot controlfile with unique name per site

If your snapshot controlfile is on the shared backup location, make sure the snapshot controlfile has a unique name on primary and standby. E.g. on Primary:

CONFIGURE SNAPSHOT CONTROLFILE NAME TO '/my_nfs_backup/snapcf_P_MYDB.f';

And on Standby:

CONFIGURE SNAPSHOT CONTROLFILE NAME TO '/my_nfs_backup/snapcf_S_MYDB.f';
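When several databases share the same NFS location, the names can be generated instead of typed by hand. The following is a sketch only: the function name `emit_snapshot_cf_config` and the idea of a short site tag are my own illustration. The tag could be P or S as above, or anything else that stays unique per site.

```shell
# Sketch: print the CONFIGURE statement for a site-specific snapshot
# controlfile name on a shared backup location.
#
# emit_snapshot_cf_config DIR SITE_TAG DB_NAME
#   DIR:      shared backup location (absolute path)
#   SITE_TAG: tag that is unique per site, e.g. P or S
#   DB_NAME:  database name used in the file name
emit_snapshot_cf_config() {
    backup_dir=${1:-}
    site_tag=${2:-}
    db_name=${3:-}
    case $backup_dir in
        *\'*) echo "backup dir must not contain a single quote" >&2; return 1 ;;
        /*)   ;;
        *)    echo "backup dir must be an absolute path" >&2; return 1 ;;
    esac
    # Allow only letters, digits and underscores in the generated file name.
    for part in "$site_tag" "$db_name"; do
        case $part in
            ''|*[!A-Za-z0-9_]*)
                echo "site tag and db name must be alphanumeric" >&2
                return 1 ;;
        esac
    done
    printf "CONFIGURE SNAPSHOT CONTROLFILE NAME TO '%s/snapcf_%s_%s.f';\n" \
        "$backup_dir" "$site_tag" "$db_name"
}
```

Called with `/my_nfs_backup P MYDB` and `/my_nfs_backup S MYDB`, it prints the two statements shown above.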

Make sure the standby is in sync with the Primary before starting the backup

You may want to check that the standby DB is in sync with the primary before starting a backup.

With the broker active the following SQL can be used on the standby DB to check if there is a lag:

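SELECT
  CASE
    -- no lag, and the value was computed within the last 30 seconds
    WHEN decode(value, '+00 00:00:00', 'no lag', NULL, 'unknown', 'lag') = 'no lag'
         AND (sysdate - to_date(time_computed, 'mm/dd/yyyy hh24:mi:ss')) * 60 * 60 * 24 < 31
    THEN 'no_lag'
    WHEN decode(value, '+00 00:00:00', 'no lag', NULL, 'unknown', 'lag') = 'lag'
    THEN 'lag'
    -- no 'apply lag' row, a NULL value, or a stale computation
    ELSE 'unknown'
  END lag
-- the outer join against dual returns a row even without an 'apply lag' entry
FROM dual, v$dataguard_stats
WHERE decode(dummy, 'X', 'apply lag') = name(+);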

Please note that I do not allow any lag (the standby should be in sync), but I do accept a lag computation that is up to 30 seconds old. E.g. when looking at the output of the DGMGRL command "show configuration lag", I would report no lag in the following case:

Apply Lag:          0 seconds (computed 20 seconds ago)

But I would show a lag here:

Apply Lag:          1 seconds (computed 1 second ago)
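The decision rule can also be written down outside of SQL, for example in the shell wrapper that starts the backup. This is a sketch of the same logic as the query above, assuming the wrapper has already fetched the apply lag value and the age of its computation in seconds; the function name `classify_lag` is made up for illustration.

```shell
# Sketch: classify the apply lag the same way as the SQL above.
#
# classify_lag VALUE AGE_SECONDS
#   VALUE:       apply lag as reported by v$dataguard_stats, e.g. '+00 00:00:00'
#   AGE_SECONDS: seconds since the value was computed
# Prints no_lag, lag or unknown.
classify_lag() {
    value=${1:-}
    age=${2:-}
    max_age=30   # accept computations that are at most 30 seconds old

    # No value at all: nothing can be said about the lag.
    if [ -z "$value" ]; then
        echo unknown
        return 0
    fi
    # Any value other than zero counts as lag, regardless of its age.
    if [ "$value" != '+00 00:00:00' ]; then
        echo lag
        return 0
    fi
    # Zero lag only counts if the computation is recent enough.
    case $age in
        ''|*[!0-9]*) echo unknown; return 0 ;;
    esac
    if [ "$age" -le "$max_age" ]; then
        echo no_lag
    else
        echo unknown
    fi
}
```

A backup wrapper could then start the backup only when the function prints `no_lag` and skip or retry in the other two cases.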

ARCHIVELOG DELETION POLICY considerations

The ARCHIVELOG DELETION POLICY should be adjusted when archivelogs are written to the Fast Recovery Area and backups are taken on the standby site. That is, archived redo has to become reclaimable once it has been applied to the standby site(s) and has been backed up. We usually use the following settings:

On the site where backups happen:

RMAN> CONFIGURE ARCHIVELOG DELETION POLICY TO APPLIED ON ALL STANDBY BACKED UP 1 TIMES TO DISK;

On the site where no backups happen:

RMAN> CONFIGURE ARCHIVELOG DELETION POLICY TO APPLIED ON ALL STANDBY;

If archivelog backups have been cataloged at both sites (i.e. also at the site that did not take the backup), then both sites can use

RMAN> CONFIGURE ARCHIVELOG DELETION POLICY TO APPLIED ON ALL STANDBY BACKED UP 1 TIMES TO DISK;

so that archived redo in the Fast Recovery Area becomes reclaimable once it has been applied on all standby databases and has been backed up.
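The choice between the two policies boils down to one question per site: does this site know about archivelog backups, either because it takes them itself or because they are cataloged there? A small sketch of that decision, with a hypothetical function name `emit_deletion_policy` and a plain yes/no argument:

```shell
# Sketch: print the archivelog deletion policy for one site.
#
# emit_deletion_policy KNOWS_BACKUPS
#   KNOWS_BACKUPS: yes if the site takes the archivelog backups itself or has
#                  the other site's backups cataloged, no otherwise
emit_deletion_policy() {
    case ${1:-} in
        yes)
            echo "CONFIGURE ARCHIVELOG DELETION POLICY TO APPLIED ON ALL STANDBY BACKED UP 1 TIMES TO DISK;" ;;
        no)
            echo "CONFIGURE ARCHIVELOG DELETION POLICY TO APPLIED ON ALL STANDBY;" ;;
        *)
            echo "usage: emit_deletion_policy yes|no" >&2
            return 1 ;;
    esac
}
```

Setting the stricter policy on a site that never sees the backups would presumably keep its archived redo from ever becoming reclaimable, which is why the argument is made explicit here.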

Summary

In summary, consider the following when backing up on Primary and Standby (or only on Standby) to a "disk" that is visible on both Primary and Standby:

  • use an rman catalog
  • catalog all backups regularly on the site which didn't do the backup
    E.g. with
    RMAN> catalog start with '/rman_disk_backup' noprompt;
    This is necessary to "see" all backups on both sites. If you restore a database on a site that has not taken all of the backups itself, catalog them before starting restore and recovery.
  • provide a password when connecting to the DB for a backup on the standby site
    E.g. with
    RMAN> connect target sys/mysupersecretpwd@standby_site
    Otherwise the current redo cannot be archived after the backup, which leaves the backup "incomplete": the archivelog needed to recover to a consistent point is not part of the backup.
  • if the snapshot controlfile is on the shared NFS storage then make sure to have a unique name for both sites
  • check that the standby DB is in sync with the primary before starting a backup
  • when changing the retention policy, do that on the primary site and recreate the standby controlfile. Alternatively, always specify the retention when running "delete obsolete".
    E.g.
    delete noprompt obsolete recovery window of 31 days device type disk;
