Identify the Corruption Extension for Block Corruption, Table/Index Inconsistency, Data Dictionary and Lost Writes (Doc ID 836658.1)

 

In this Document

Purpose
Scope
 Corruption Types
Details
 Datafile Block Corruption - Intra block corruption
 RMAN - Identify Datafile Block Corruptions
 DBVerify - Identify Datafile Block Corruptions
 RMAN Vs DBVerify - Datafile Intra Block Corruption
 Identify TABLE / INDEX Mismatch
 Identify Extents Inconsistency in Locally Managed Tablespace
 Identify Extents Inconsistency in Dictionary Managed Tablespaces
 Identify Data Dictionary Inconsistency
 Identify corruption caused by LOST WRITES
References

Applies to:

Oracle Database - Enterprise Edition - Version 10.2.0.1 to 12.1.0.2 [Release 10.2 to 12.1]
Oracle Database Cloud Schema Service - Version N/A and later
Gen 1 Exadata Cloud at Customer (Oracle Exadata Database Cloud Machine) - Version N/A and later
Oracle Cloud Infrastructure - Database Service - Version N/A and later
Oracle Database Cloud Exadata Service - Version N/A and later
Information in this document applies to any platform.

Purpose

The purpose of this article is to describe in a single note how to identify the most common corruptions/inconsistencies in Oracle 10g or above.

Scope

Corruption Types

There are several types of corruptions like:

  • Datafile Block Corruption - Physical/Logical
  • Table/Index Mismatch
  • Extents Inconsistencies
  • Data Dictionary Inconsistencies

 
The next section is intended to describe the tools used to detect corruption for each corruption type.

Details

Datafile Block Corruption - Intra block corruption

It refers to intra block corruptions that may cause different errors like ORA-1578, ORA-8103, ORA-1410, ORA-600 etc.

Oracle classifies the corruptions as Physical and Logical.  See Note 840978.1

RMAN - Identify Datafile Block Corruptions

  • To identify both Physical and Logical Block Corruptions use the "CHECK LOGICAL" option. The next command checks the complete database for both corruptions without actually doing a backup:
$ rman target /
RMAN>  backup check logical validate database;
The next command checks the complete database for both corruptions in a backup:
$ rman target /
RMAN> backup check logical database
  • Check the view V$DATABASE_BLOCK_CORRUPTION to identify the block corruptions detected by RMAN. 
  • Use Doc ID 472231.1 (section "Step 2: Identify the corrupt segments") to identify all the Corrupted Objects in the Database reported by RMAN.
  • The above command can use PARALLELISM using multiple channels to make the validation faster.  Reference Doc ID 472231.1 for examples of PARALLELISM.
  • By Default RMAN backups, without the CHECK LOGICAL option, only detect Physical Block Corruptions.
  • In 10g and when using Locally Managed Tablespaces (LMT) RMAN may not check for blocks that belong to free extents; blocks in extents present in dba_free_space.

DBVerify - Identify Datafile Block Corruptions

  • DBVERIFY identifies Physical and Logical Intra Block Corruptions by default. Dbverify cannot be run for the whole database in a single command. It does not need a database connection either:
dbv file=<datafile name> blocksize=<datafile Block size>
  • Use Doc ID 819533.1 to identify all the corrupt objects reported by DBVerify
  • Use Doc ID 369076.1 to simulate running dbv in PARALLEL for a single datafile.

RMAN Vs DBVerify - Datafile Intra Block Corruption

It has been a dilemma of what tool to use when identifying intra block corruptions.  Here are some comparisons between RMAN and DBV:
  • When the logical option is used by RMAN, it does exactly the same checks as DBV does for intra block corruption.
  • RMAN can be run with PARALLELISM using multiple channels making it faster than DBV which  cannot be run in parallel in a single command.  See Note 472231.1 for examples.
  • DBV checks for empty blocks. In 10g RMAN may not check blocks in free extents when Locally Managed Tablespaces are used. In 11g RMAN checks for both free and used extents.
  • Both DBV and RMAN (11g) can check for a range of blocks. RMAN: VALIDATE DATAFILE 1 BLOCK 10 to 100;.  DBV: start=10 end=100
  • RMAN keeps corruption information in the control file (v$database_block_corruption, v$backup_corruption). DBV does not. 
  • RMAN may not report the corruption details like what is exactly corrupted in a block reported as a LOGICAL corrupted block. DBV reports the corruption details in the screen or in a log file.
  • DBV can scan blocks with a higher SCN than a given SCN (HIGH_SCN clause).
  • DBV does not need a connection to the database.
  • RMAN does not detect the Logical Corruptions described in Doc ID 7517208.8 (DBV does).

Identify TABLE / INDEX Mismatch

Table / Index inconsistencies is when an entry in the Table does not exist in the Index or vice versa. The common errors are ORA-8102, ORA-600 [kdsgrp1], ORA-1499 by "analyze validate structure cascade".

  • The tool to identify TABLE / INDEX inconsistencies is the ANALYZE command:
analyze table <table name> validate structure cascade <ONLINE>;
When an inconsistency is identified, the above analyze command will produce error ORA-1499 and a trace file.
  • Use Doc ID 100419.1 to run ANALYZE with the CASCADE option for a specific tablespace.
  • Use Doc ID 563070.1 when the trace file produced by ANALYZE command has the message "Table/Index row count mismatch".

Identify Extents Inconsistency in Locally Managed Tablespace

It refers to inconsistencies in the section used by Locally Managed Tablespaces (LMT) to keep information about used and free extents. 

Examples of these inconsistencies are:

  • Overlapped Extents (2 segments could incorrectly use the same block). The common errors are ORA-8103, ORA-1410, ORA-600 [kdddgb2]
  • A free extent reported as used


LMT is identified by EXTENT_MANAGEMENT='LOCAL' in view dba_tablespaces.

  • To identify inconsistencies in LMT for MSSM (Manual Segment Space Management - dba_tablespaces.SEGMENT_SPACE_MANAGEMENT='MANUAL'):
alter session set tracefile_identifier='TABLESPACE_VERIFY';
execute dbms_space_admin.tablespace_verify('&tablespace_name')
oradebug setmypid
oradebug tracefile_name
  • To identify inconsistencies in LMT for ASSM (Automatic Segment Space Management - dba_tablespaces.SEGMENT_SPACE_MANAGEMENT='AUTO'):
alter session set tracefile_identifier='ASSM_TABLESPACE_VERIFY';
execute dbms_space_admin.assm_tablespace_verify('&tablespace_name',dbms_space_admin.TS_VERIFY_BITMAPS)
oradebug setmypid
oradebug tracefile_name
If there are inconsistencies, dbms_space_admin  generates a trace file. Note that the session running dbms_space_admin may not report in the screen if inconsistencies are detected.

Reference Doc ID 887263.1 to identify the segments reported in the trace file.

 

Identify Extents Inconsistency in Dictionary Managed Tablespaces

It refers to inconsistencies in data dictionary tables FET$/UET$ used by Dictionary Managed Tablespaces  to keep information of free and used extents respectively.

Examples of these inconsistencies are:

  • Overlapped Extents (2 segments could incorrectly use the same block). The common errors are ORA-8103, ORA-1410
  • Free extent reported as used

Dictionary Managed Tablespace are identified by EXTENT_MANAGEMENT='DICTIONARY' in view dba_tablespaces.

  • To identify inconsistencies in Dictionary Managed Tablespaces run script hcheck:
Doc ID 136697.1 "hcheck.sql" script to check for known problems in Oracle8i, Oracle9i, Oracle10g and Oracle 11g

 

Identify Data Dictionary Inconsistency

It refers to inconsistencies between Data Dictionary tables.  These tables are owned by the user SYS and are stored in the SYSTEM tablespace to keep track of the users, tables, indexes, etc that are created in the database. 

Example of this inconsistency is: 

  • 'Problem: OBJ$.OWNER# not in USER$" which refers to a user in table OBJ$ that does not exist in USER$.
  • To identify known Data Dictionary inconsistencies  run script hcheck:
Doc ID 136697.1 "hcheck.sql" script to check for known problems in Oracle8i, Oracle9i, Oracle10g and Oracle 11g

  

Identify corruption caused by LOST WRITES

A data block lost write occurs when an I/O subsystem acknowledges the completion of the block write, while in fact the write did not occur in the persistent storage.  The result is that the block in the database is a stale/old copy which is not logical or physical corrupt; block internal structures are correct. Reference Doc ID 840978.1 for more information about Physical and Logical block corruption.  

A block with lost changes may produce several errors when compared with another context like ORA-600 [kdsgrp1] (table/index inconsistency or invalid chained row pointer), ORA-8103 (old object id),ORA-00600:[ktsircinfo_num1], ORA-00600:[kclchkblk_3],  ORA-00600:[kclchkblkdma_3] etc; or the next errors during media recovery (like in a physical standby): ORA-600 [3020], ORA-752 (if db_lost_write_protect is enabled).

  • DBV/RMAN are not intended to identify inconsistency caused by LOST Write:

Identifying the corruption extension by lost IO is not straight forward as dbverify/rman run intra-block checks (blocks are not compared with another context). The block itself is healthy as structures are valid (not garbage). However, in very rare cases a block can be indirectly exposed to logical corruption especially in the space management area if there is a lost write. Example is that the block was being marked as full in the metadata but that change was lost. Subsequent inserts may logically corrupt the block.

  • Media RECOVERY / Physical Standby

The best option is to have a media recovery in place like a standby database or restore/recover the database in another system.  Media recovery performs checks to identify if the block content is the one expected as the redo structure keeps track of block previous version (expected scn) and compare it with the current block scn.  If there is a mismatch then ORA-600 [3020] or ORA-752 are produced.

If a physical standby database is configured, then set now DB_LOST_WRITE_PROTECT to TYPICAL on primary and standby databases. By doing so the primary database will create redo entries for Reads (select statements) which helps to detect faster a lost write in the standby.

There is the option to run a diagnostic trial recovery, this scans the redo for problems but does not actually make any changes to the recovered database. Trial recovery reports any additional corruptions in the alert.log. The RECOVER ... TEST statement can be used to invoke trial recovery. Refer to Doc ID 283262.1 for additional details on trial recovery.

  • Overlapped Extents


Lost Writes may cause fatal problems like segments overwriting each other due to space management inconsistency in the metadata; overlapped extents.  Use the next sections in this document:

Identify Extents Inconsistency in Locally Managed Tablespace
Identify Data Dictionary Inconsistency 

 

  • ANALYZE


Running "analyze table validate structure cascade" may detect inconsistency caused by Lost Writes as it compares index content with the respective table content.  If there is a mismatch, ORA-1499 is reported.  It may also identify old block versions only if they have a different data object id that may cause ORA-8103, ORA-1410 etc.  Analyze with the cascade option may take significant time for all tables in the entire database.

  • Segment space management metadata DBMS_SPACE_ADMIN.SEGMENT_VERIFY

This procedure checks the metadata associated to space management for a specific segment.  Example for an LOB segment stored in ASSM tablespace:

alter session set tracefile_identifier='ASSM_SEGMENT_VERIFY';
exec DBMS_SPACE_ADMIN.ASSM_SEGMENT_VERIFY('&OWNER','&LOBSEGMENT','LOB',null,DBMS_SPACE_ADMIN.SEGMENT_VERIFY_SPECIFIC,DBMS_SPACE_ADMIN.BITMAPS_CHECK)
oradebug setmypid
oradebug tracefile_name

For MSSM segment:

alter session set tracefile_identifier='SEGMENT_VERIFY';
exec dbms_space_admin.segment_verify('&TABLESPACENAME',&header_file,&header_block)
oradebug setmypid
oradebug tracefile_name

 

  • EXPORT

Running export may help to identify if a chained row has an invalid pointer which may cause ORA-00600 [25027] or ORA-00600 [kdsgrp1] but export does not identify an old block version or if there is a table/index mismatch.  In UNIX systems it can be done to /dev/null:

exp <schema>/<schema_passwd> full=y log=exp_validation.log file=/dev/null volsize=0

 

  • ASM

If ASM is used run validations as described in Doc ID 1675152.1.

 

 

 

  

References

NOTE:369076.1 - How To Run DBV In Parallel On A Single Data File
NOTE:136697.1 - hcheck.sql - Script to Check Data Dictionary for Known Problems
NOTE:819533.1 - How to identify the corrupt Object reported by ORA-1578 / RMAN / DBVERIFY
NOTE:472231.1 - How to identify all the Corrupted Objects in the Database with RMAN
NOTE:840978.1 - Physical and Logical Block Corruptions. All you wanted to know about it.
NOTE:887263.1 - How to identify the SEGMENT reported by dbms_space_admin.TABLESPACE_VERIFY
NOTE:1554054.1 - Quick method to identify table/index mismatch when analyze validate structure cascade takes significant time
NOTE:100419.1 - SCRIPT: VALIDATE.SQL to ANALYZE .. VALIDATE STRUCTURE objects in a Tablespace
NOTE:563070.1 - ORA-1499. Table/Index row count mismatch
NOTE:1675152.1 - Collecting The Required Information For Support To Validate & Troubleshooting ASM Diskgroup Corruptions.

Comments

Popular posts from this blog

How To Check if Asynchronous I/O is Working On Linux (Doc ID 237299.1)

Primary Note for Handling Oracle Database Corruption Issues (Doc ID 1088018.1)