14 October 2011

Using Incremental Global Statistics for Partitioned Tables

Oracle has been working on improving statistics gathering in the database from release to release. With 11g my favorite statistics gathering features are: setting database/schema/table preferences - and incremental global statistics gathering on partitioned tables.
I recently came across a comparatively large database and noticed that nearly 2000 segments had no statistics on them. Furthermore the largest (partitioned) table in the database did not have global stats gathered for more than 6 years - and many of its partitions had no statistics at all. This did cause problems with sub-optimal plans being used by some queries.
Further analysis revealed that a 4 hour window was allowed for maintenance, including database statistics gathering - during work days only. Weekend windows were disabled. The job would run for exactly 4 hours every day. It quickly became clear that the job never completed and was terminated at the end of the maintenance window; persistently leaving many segments without any statistics behind.
The default action when incremental statistics are not enabled for partitioned tables is to maintain global table statistics using a full table scan. Obviously, for multi-terabyte tables this comes at a significant I/O cost and therefore takes a lot of time. The largest table happened to be in line for (global) statistics gathering and a full table scan across all partitions never returned during each 4 hour maintenance window.
Oracle suggests that the key difference between the internal GATHER_DATABASE_STATS_JOB_PROC procedure in DBMS_STATS, which runs during the maintenance window, and GATHER_DATABASE_STATS with option “GATHER AUTO” is that of the former prioritising objects that need statistics gathered. Although I didn’t investigate this to any great extent it probably helps explain the persistent order for gathering statistics causing statistics not to be gathered for some segments that were prioritised lower. On the other hand the latter option appears to run through segments eligible for statistics gathering in alphabetic order.

When to Use Incremental Statistics Gathering?

Many use cases of partitioned tables involve creating new partitions on a regular basis and filling those partitions with data. As the new partitions and data are added, statistics should be gathered on this data.
A typical scenario for incremental statistics is therefore instances where a new daily, weekly or monthly partition is added and data for the period is loaded. I.e. not all partitions are continuously loaded with substantial amounts of data.
The picture is less clear cut when it comes to back-filling data in older partitions. If all partitions in a table are “touched” by DML on a regular basis, then the partitions with significant changes, i.e. 10+% for INSERTs+UPDATEs+DELETEs compared to NUM_ROWS per default [6] will be analyzed again. In this case the only advantage of this approach is that global statistics on the table are gathered by calculating statistics for all partitions - as opposed to a full tablescan, which in itself could be a significant saver, preventing ORA-1555’s after hours of runs and eliminating statistics gathering job runs persistently never completing within the maintenance window.

How to Enable Incremental Statistics Gathering?

This is quite simple. Oracle has clearly stipulated the conditions required for Incremental Statistics Gathering [1].  The table preferences for partitioned tables incremental statistics gathering must be set as follows:
begin
dbms_stats.set_table_prefs('TESTUSER', 'INCTAB', 'ESTIMATE_PERCENT', DBMS_STATS.AUTO_SAMPLE_SIZE);
dbms_stats.set_table_prefs('TESTUSER', 'INCTAB', 'INCREMENTAL', 'TRUE');
dbms_stats.set_table_prefs('TESTUSER', 'INCTAB', 'PUBLISH', 'TRUE');
dbms_stats.set_table_prefs('TESTUSER', 'INCTAB', 'GRANULARITY', 'AUTO');
end;
/
To optionally remove any previous statistics first, run:
exec dbms_stats.delete_table_stats(ownname=>'TESTUSER', tabname=>'INCTAB', cascade_columns=>true, cascade_indexes=>true)
Running the following will immediately get started with incremental statistics gathering on all tables where it has been set-up:
exec DBMS_STATS.GATHER_DATABASE_STATS(OPTIONS=>'GATHER AUTO')

The Test Case

The following DDL and DML sets up the test case, which was run on an 11.2.0.2.3 database on Linux. It first gathers statistics on all database objects eligible for statistics gathering. It then sets up tablespace, user and partitioned table, populates the table with test data as well as does some test queries to generate COL_USAGE$ statistics for histograms (used later). Finally, incremental statistics are enabled and database statistics are gathered again in order to generate statistics on the new table.
exec DBMS_STATS.GATHER_DATABASE_STATS(OPTIONS=>'GATHER AUTO')
create tablespace testdata datafile '/u01/oradata/orcl1/testdata_01.dbf' size 1023m extent management local autoallocate segment space management auto;
create user testuser identified by testuser
 default tablespace testdata
 quota unlimited on testdata;
grant create session, create table, create view, create procedure, create sequence to testuser;
DROP TABLE TESTUSER.INCTAB;
CREATE TABLE TESTUSER.INCTAB
  (    TESTCOL1 VARCHAR2(120) NOT NULL ENABLE,
       TESTCOL2 VARCHAR2(30) NOT NULL ENABLE,
       TESTCOL3 DATE NOT NULL ENABLE,
       TESTCOL4 NUMBER(38,5),
       CONSTRAINT PK_INCTAB PRIMARY KEY (TESTCOL1, TESTCOL2, TESTCOL3)
  )
 TABLESPACE TESTDATA
 PARTITION BY RANGE (TESTCOL3)
(PARTITION PART_20110830  VALUES LESS THAN (TO_DATE('2011-09-01', 'YYYY-MM-DD'))
 SEGMENT CREATION IMMEDIATE TABLESPACE TESTDATA ,
PARTITION PART_20110901  VALUES LESS THAN (TO_DATE('2011-09-02', 'YYYY-MM-DD'))
 SEGMENT CREATION IMMEDIATE TABLESPACE TESTDATA ,
PARTITION PART_20110902  VALUES LESS THAN (TO_DATE('2011-09-03', 'YYYY-MM-DD'))
 SEGMENT CREATION IMMEDIATE TABLESPACE TESTDATA ,
PARTITION PART_20110903  VALUES LESS THAN (TO_DATE('2011-09-04', 'YYYY-MM-DD'))
 SEGMENT CREATION IMMEDIATE TABLESPACE TESTDATA ,
PARTITION PART_20110904  VALUES LESS THAN (TO_DATE('2011-09-05', 'YYYY-MM-DD'))
 SEGMENT CREATION IMMEDIATE TABLESPACE TESTDATA ,
PARTITION PART_20110905  VALUES LESS THAN (TO_DATE('2011-09-06', 'YYYY-MM-DD'))
 SEGMENT CREATION IMMEDIATE TABLESPACE TESTDATA ,
PARTITION PART_20110906  VALUES LESS THAN (TO_DATE('2011-09-07', 'YYYY-MM-DD'))
 SEGMENT CREATION IMMEDIATE TABLESPACE TESTDATA ,
PARTITION PART_20110907  VALUES LESS THAN (TO_DATE('2011-09-08', 'YYYY-MM-DD'))
 SEGMENT CREATION IMMEDIATE TABLESPACE TESTDATA
);
truncate table TESTUSER.INCTAB;
insert into TESTUSER.INCTAB
select * from (
select 'test1' TESTCOL1, 'METRIC1' TESTCOL2, TO_DATE('2011-09-01 00:00:00', 'YYYY-MM-DD HH24:MI:SS') + 7*rownum/10001 TESTCOL3, 0 TESTCOL4
 from dba_objects o1, dba_extents o2
where rownum < 10001
union all
select 'test2', 'METRIC2', TO_DATE('2011-09-01 00:00:00', 'YYYY-MM-DD HH24:MI:SS') + 7*rownum/5001, 0
 from dba_objects o1, dba_extents o2
where rownum < 5001
union all
select 'test3', 'METRIC3', TO_DATE('2011-09-01 00:00:00', 'YYYY-MM-DD HH24:MI:SS') + 7*rownum/1001, 0
 from dba_objects o1, dba_extents o2
where rownum < 1001
union all
select 'test4', 'METRIC4', TO_DATE('2011-09-01 00:00:00', 'YYYY-MM-DD HH24:MI:SS') + 7*rownum/101, 0
 from dba_objects o1, dba_extents o2
where rownum < 101
union all
select 'test5', 'METRIC5', TO_DATE('2011-09-01 00:00:00', 'YYYY-MM-DD HH24:MI:SS') + 7*rownum/51, 0
 from dba_objects o1, dba_extents o2
where rownum < 51
);
commit;
exec dbms_stats.delete_table_stats(ownname=>'TESTUSER', tabname=>'INCTAB', cascade_columns=>true, cascade_indexes=>true)
-- Prime some equality predicate operations to prime col_usage$ for histograms
select count(*) from TESTUSER.INCTAB where TESTCOL1='test2';
select count(*) from TESTUSER.INCTAB where TESTCOL1 between 'test2' and 'test5';
select count(*) from TESTUSER.INCTAB where TESTCOL1 like 'test2%';
select count(*) from TESTUSER.INCTAB where TESTCOL2='METRIC3';
select count(*) from TESTUSER.INCTAB where TESTCOL2 between 'METRIC3' and 'METRIC5';
select count(*) from TESTUSER.INCTAB where TESTCOL2 like 'METRIC%';
exec dbms_stats.flush_database_monitoring_info
set long 8192
SELECT dbms_stats.report_col_usage('TESTUSER', 'INCTAB') from dual;
###############################################################################
COLUMN USAGE REPORT FOR TESTUSER.INCTAB
.......................................
1. TESTCOL1                            : EQ RANGE LIKE
2. TESTCOL2                            : EQ RANGE LIKE
###############################################################################
begin
dbms_stats.set_table_prefs('TESTUSER', 'INCTAB', 'ESTIMATE_PERCENT', DBMS_STATS.AUTO_SAMPLE_SIZE);
dbms_stats.set_table_prefs('TESTUSER', 'INCTAB', 'INCREMENTAL', 'TRUE');
dbms_stats.set_table_prefs('TESTUSER', 'INCTAB', 'PUBLISH', 'TRUE');
dbms_stats.set_table_prefs('TESTUSER', 'INCTAB', 'GRANULARITY', 'AUTO');
end;
/
exec DBMS_STATS.GATHER_DATABASE_STATS(OPTIONS=>'GATHER AUTO');
select partition_name,sample_size,num_rows, last_analyzed
 from dba_tab_partitions
where table_owner='TESTUSER'
  and table_name='INCTAB'
order by partition_name;
PARTITION_NAME                 SAMPLE_SIZE   NUM_ROWS LAST_ANALYZED
------------------------------ ----------- ---------- -------------------
PART_20110830                                      0 2011-10-14 13:25:36
PART_20110901                         2305       2305 2011-10-14 13:25:36
PART_20110902                         2307       2307 2011-10-14 13:25:36
PART_20110903                         2309       2309 2011-10-14 13:25:35
PART_20110904                         2307       2307 2011-10-14 13:25:35
PART_20110905                         2309       2309 2011-10-14 13:25:35
PART_20110906                         2307       2307 2011-10-14 13:25:35
PART_20110907                         2306       2306 2011-10-14 13:25:35
8 rows selected.
select partition_name, inserts, updates, deletes, timestamp, truncated, drop_segments
from dba_tab_modifications
where table_owner='TESTUSER'
and table_name='INCTAB';
no rows selected
select sample_size,num_rows, last_analyzed
 from dba_tables
where owner='TESTUSER'
  and table_name='INCTAB';
SAMPLE_SIZE   NUM_ROWS LAST_ANALYZED
----------- ---------- -------------------
     16150      16150 2011-10-14 13:25:35
select COLUMN_NAME, NUM_DISTINCT, DENSITY, NUM_NULLS, LAST_ANALYZED, SAMPLE_SIZE, AVG_COL_LEN, HISTOGRAM
 from dba_tab_columns
where owner='TESTUSER'
  and table_name='INCTAB'
order by column_id;
COLUMN_NAME                    NUM_DISTINCT    DENSITY  NUM_NULLS LAST_ANALYZED       SAMPLE_SIZE AVG_COL_LEN HISTOGRAM
------------------------------ ------------ ---------- ---------- ------------------- ----------- ----------- ---------------
TESTCOL1                                  5  .00003096          0 2011-10-14 13:25:35       16150           6 FREQUENCY
TESTCOL2                                  5  .00003096          0 2011-10-14 13:25:35       16150           8 FREQUENCY
TESTCOL3                              16048 .000062313          0 2011-10-14 13:25:35       16150           8 NONE
TESTCOL4                                  1          1          0 2011-10-14 13:25:35       16150           2 NONE
select column_name, count(*)
 from dba_tab_histograms
where owner='TESTUSER'
  and table_name = 'INCTAB'
group by column_name
order by column_name;
COLUMN_NAME                      COUNT(*)
------------------------------ ----------
TESTCOL1                                5
TESTCOL2                                5
TESTCOL3                                2
TESTCOL4                                2

Observations

Statistics Gathering Thresholds

This section shows that there is an adherence to the 10% threshold on both partitions and global table statistics as documented by Oracle [6]. It shows this only for INSERTs in order to keep the example simple and concise.
One interesting observation is that statistics are gathered on individual partitions for which 10+% rows (compared to NUM_ROWS) are changed. However, this does not lead to an update in global table statistics. Global table statistics are not computed until at least 10+% of rows for all partitions in the table have changed.
Another interesting observation is that when global table statistics are gathered, all partitions for which outstanding table modifications have been recorded will have statistics gathered individually as well - before global table statistics are computed from the incremental partition statistics.
Refer to the current statistics that exist on the table in the previous section.
Then proceed with adding 10% minus one row to one partition (230 / 2305 existing rows):
insert into TESTUSER.INCTAB
select * from (
select 'test6' TESTCOL1, 'METRIC1' TESTCOL2, TO_DATE('2011-09-01 00:00:00', 'YYYY-MM-DD HH24:MI:SS') + rownum/231 TESTCOL3, 0 TESTCOL4
 from dba_objects o1, dba_extents o2
where rownum < 231
);
230 rows created.
commit;
exec dbms_stats.flush_database_monitoring_info
select partition_name, inserts, updates, deletes, timestamp, truncated, drop_segments
from dba_tab_modifications
where table_owner='TESTUSER'
and table_name='INCTAB';
PARTITION_NAME                    INSERTS    UPDATES    DELETES TIMESTAMP           TRU DROP_SEGMENTS
------------------------------ ---------- ---------- ---------- ------------------- --- -------------
                                     230          0          0 2011-10-14 13:31:48 NO              0
PART_20110901                         230          0          0 2011-10-14 13:31:48 NO              0
exec DBMS_STATS.GATHER_DATABASE_STATS(OPTIONS=>'GATHER AUTO')
select partition_name,sample_size,num_rows, last_analyzed
 from dba_tab_partitions
where table_owner='TESTUSER'
  and table_name='INCTAB'
order by partition_name;
PARTITION_NAME                 SAMPLE_SIZE   NUM_ROWS LAST_ANALYZED
------------------------------ ----------- ---------- -------------------
PART_20110830                                      0 2011-10-14 13:25:36
PART_20110901                         2305       2305 2011-10-14 13:25:36
PART_20110902                         2307       2307 2011-10-14 13:25:36
PART_20110903                         2309       2309 2011-10-14 13:25:35
PART_20110904                         2307       2307 2011-10-14 13:25:35
PART_20110905                         2309       2309 2011-10-14 13:25:35
PART_20110906                         2307       2307 2011-10-14 13:25:35
PART_20110907                         2306       2306 2011-10-14 13:25:35
8 rows selected.
As can be seen, no new statistics have been gathered at this point. Let’s add one more row (231 / 2305) and see what happens next:
insert into TESTUSER.INCTAB
select * from (
select 'test7' TESTCOL1, 'METRIC1' TESTCOL2, TO_DATE('2011-09-01 00:00:00', 'YYYY-MM-DD HH24:MI:SS') + rownum/2 TESTCOL3, 0 TESTCOL4
 from dba_objects o1, dba_extents o2
where rownum < 2
);
1 row created.
commit;
exec dbms_stats.flush_database_monitoring_info
select partition_name, inserts, updates, deletes, timestamp, truncated, drop_segments
from dba_tab_modifications
where table_owner='TESTUSER'
and table_name='INCTAB';
PARTITION_NAME                    INSERTS    UPDATES    DELETES TIMESTAMP           TRU DROP_SEGMENTS
------------------------------ ---------- ---------- ---------- ------------------- --- -------------
                                     231          0          0 2011-10-14 13:36:27 NO              0
PART_20110901                         231          0          0 2011-10-14 13:36:27 NO              0
exec DBMS_STATS.GATHER_DATABASE_STATS(OPTIONS=>'GATHER AUTO')
select partition_name, inserts, updates, deletes, timestamp, truncated, drop_segments
from dba_tab_modifications
where table_owner='TESTUSER'
and table_name='INCTAB';
PARTITION_NAME                    INSERTS    UPDATES    DELETES TIMESTAMP           TRU DROP_SEGMENTS
------------------------------ ---------- ---------- ---------- ------------------- --- -------------
                                     231          0          0 2011-10-14 13:36:27 NO              0
select partition_name,sample_size,num_rows, last_analyzed
 from dba_tab_partitions
where table_owner='TESTUSER'
  and table_name='INCTAB'
order by partition_name;
PARTITION_NAME                 SAMPLE_SIZE   NUM_ROWS LAST_ANALYZED
------------------------------ ----------- ---------- -------------------
PART_20110830                                      0 2011-10-14 13:25:36
PART_20110901                         2536       2536 2011-10-14 13:36:47
PART_20110902                         2307       2307 2011-10-14 13:25:36
PART_20110903                         2309       2309 2011-10-14 13:25:35
PART_20110904                         2307       2307 2011-10-14 13:25:35
PART_20110905                         2309       2309 2011-10-14 13:25:35
PART_20110906                         2307       2307 2011-10-14 13:25:35
PART_20110907                         2306       2306 2011-10-14 13:25:35
8 rows selected.
select sample_size,num_rows, last_analyzed
 from dba_tables
where owner='TESTUSER'
  and table_name='INCTAB';
SAMPLE_SIZE   NUM_ROWS LAST_ANALYZED
----------- ---------- -------------------
     16150      16150 2011-10-14 13:25:35
We can see that statistics for the partition have been gathered. However, global statistics remain unaffected.
Let’s add enough rows to make up exactly 10% rows to the table over all, i.e. 1615 - we have already inserted 231 - so lets insert 1384 more spread across the partitions:
insert into TESTUSER.INCTAB
select * from (
select 'test8' TESTCOL1, 'METRIC1' TESTCOL2, TO_DATE('2011-09-01 00:00:00', 'YYYY-MM-DD HH24:MI:SS') + 7*rownum/1385 TESTCOL3, 0 TESTCOL4
 from dba_objects o1, dba_extents o2
where rownum < 1385
);
commit;
exec dbms_stats.flush_database_monitoring_info
select partition_name, inserts, updates, deletes, timestamp, truncated, drop_segments
from dba_tab_modifications
where table_owner='TESTUSER'
and table_name='INCTAB';
PARTITION_NAME                    INSERTS    UPDATES    DELETES TIMESTAMP           TRU DROP_SEGMENTS
------------------------------ ---------- ---------- ---------- ------------------- --- -------------
                                    1615          0          0 2011-10-14 13:47:37 NO              0
PART_20110901                         198          0          0 2011-10-14 13:47:37 NO              0
PART_20110902                         198          0          0 2011-10-14 13:46:02 NO              0
PART_20110903                         198          0          0 2011-10-14 13:46:02 NO              0
PART_20110904                         197          0          0 2011-10-14 13:46:02 NO              0
PART_20110905                         198          0          0 2011-10-14 13:46:02 NO              0
PART_20110906                         198          0          0 2011-10-14 13:46:02 NO              0
PART_20110907                         197          0          0 2011-10-14 13:46:02 NO              0
8 rows selected.
exec DBMS_STATS.GATHER_DATABASE_STATS(OPTIONS=>'GATHER AUTO')
select partition_name, inserts, updates, deletes, timestamp, truncated, drop_segments
from dba_tab_modifications
where table_owner='TESTUSER'
and table_name='INCTAB';
PARTITION_NAME                    INSERTS    UPDATES    DELETES TIMESTAMP           TRU DROP_SEGMENTS
------------------------------ ---------- ---------- ---------- ------------------- --- -------------
                                    1615          0          0 2011-10-14 13:47:37 NO              0
PART_20110901                         198          0          0 2011-10-14 13:47:37 NO              0
PART_20110902                         198          0          0 2011-10-14 13:46:02 NO              0
PART_20110903                         198          0          0 2011-10-14 13:46:02 NO              0
PART_20110904                         197          0          0 2011-10-14 13:46:02 NO              0
PART_20110905                         198          0          0 2011-10-14 13:46:02 NO              0
PART_20110906                         198          0          0 2011-10-14 13:46:02 NO              0
PART_20110907                         197          0          0 2011-10-14 13:46:02 NO              0
8 rows selected.
select sample_size,num_rows, last_analyzed
 from dba_tables
where owner='TESTUSER'
  and table_name='INCTAB';
SAMPLE_SIZE   NUM_ROWS LAST_ANALYZED
----------- ---------- -------------------
     16150      16150 2011-10-14 13:25:35
That was unexpected - no statistics have been gathered at exactly 10%. Perhaps > 10% rows must be changed. Let’s try one more row:
insert into TESTUSER.INCTAB
select * from (
select 'test9' TESTCOL1, 'METRIC1' TESTCOL2, TO_DATE('2011-09-01 00:00:00', 'YYYY-MM-DD HH24:MI:SS') + rownum/2 TESTCOL3, 0 TESTCOL4
 from dba_objects o1, dba_extents o2
where rownum < 2
);
commit;
exec dbms_stats.flush_database_monitoring_info
select partition_name, inserts, updates, deletes, timestamp, truncated, drop_segments
from dba_tab_modifications
where table_owner='TESTUSER'
and table_name='INCTAB';
PARTITION_NAME                    INSERTS    UPDATES    DELETES TIMESTAMP           TRU DROP_SEGMENTS
------------------------------ ---------- ---------- ---------- ------------------- --- -------------
                                    1616          0          0 2011-10-14 13:49:36 NO              0
PART_20110901                         199          0          0 2011-10-14 13:49:36 NO              0
PART_20110902                         198          0          0 2011-10-14 13:46:02 NO              0
PART_20110903                         198          0          0 2011-10-14 13:46:02 NO              0
PART_20110904                         197          0          0 2011-10-14 13:46:02 NO              0
PART_20110905                         198          0          0 2011-10-14 13:46:02 NO              0
PART_20110906                         198          0          0 2011-10-14 13:46:02 NO              0
PART_20110907                         197          0          0 2011-10-14 13:46:02 NO              0
8 rows selected.
exec DBMS_STATS.GATHER_DATABASE_STATS(OPTIONS=>'GATHER AUTO')
select partition_name, inserts, updates, deletes, timestamp, truncated, drop_segments
from dba_tab_modifications
where table_owner='TESTUSER'
and table_name='INCTAB';
no rows selected
select sample_size,num_rows, last_analyzed
 from dba_tables
where owner='TESTUSER'
  and table_name='INCTAB';
SAMPLE_SIZE   NUM_ROWS LAST_ANALYZED
----------- ---------- -------------------
     17766      17766 2011-10-14 13:50:00
select partition_name,sample_size,num_rows, last_analyzed
 from dba_tab_partitions
where table_owner='TESTUSER'
  and table_name='INCTAB'
order by partition_name;
PARTITION_NAME                 SAMPLE_SIZE   NUM_ROWS LAST_ANALYZED
------------------------------ ----------- ---------- -------------------
PART_20110830                                      0 2011-10-14 13:25:36
PART_20110901                         2735       2735 2011-10-14 13:49:59
PART_20110902                         2505       2505 2011-10-14 13:50:00
PART_20110903                         2507       2507 2011-10-14 13:49:59
PART_20110904                         2504       2504 2011-10-14 13:49:59
PART_20110905                         2507       2507 2011-10-14 13:50:00
PART_20110906                         2505       2505 2011-10-14 13:50:00
PART_20110907                         2503       2503 2011-10-14 13:50:00
8 rows selected.
We can see that all partitions that have outstanding table modifications will have incremental statistics gathered - prior to global table statistics being computed.

Using Stale Percentage

With 11g it is possible to set the value for how many rows are be added/modified to a table or partition(s) before new statistics are gathered in accordance with DBA_TAB_MODIFICATIONS. This also works for tables with incremental statistics enabled:
exec dbms_stats.set_table_prefs('TESTUSER', 'INCTAB','STALE_PERCENT',20)
Running through the example above again, this time with < / = / > 20% of rows attains the same outcome.
Where no significant benefit is expected in terms of more optimal plans or cost calculation accuracy from the default 10% this number can be amended.

METHOD_OPT is Ignored

For those DBAs who know the data or for whom plan stability is of importance, it seems natural to consider altering histograms parameters. However, unfortunately METHOD_OPT is ignored.
The following demonstrates this:
exec dbms_stats.set_table_prefs('TESTUSER', 'INCTAB', 'METHOD_OPT', 'FOR ALL COLUMNS SIZE 1');
exec dbms_stats.delete_table_stats(ownname=>'TESTUSER', tabname=>'INCTAB', cascade_columns=>true, cascade_indexes=>true)
exec DBMS_STATS.GATHER_DATABASE_STATS(OPTIONS=>'GATHER AUTO');
select COLUMN_NAME, NUM_DISTINCT, DENSITY, NUM_NULLS, LAST_ANALYZED, SAMPLE_SIZE, AVG_COL_LEN, HISTOGRAM
 from dba_tab_columns
where owner='TESTUSER'
  and table_name='INCTAB'
order by column_id;
COLUMN_NAME                    NUM_DISTINCT    DENSITY  NUM_NULLS LAST_ANALYZED       SAMPLE_SIZE AVG_COL_LEN HISTOGRAM
------------------------------ ------------ ---------- ---------- ------------------- ----------- ----------- ---------------
TESTCOL1                                  5  .00003096          0 2011-10-14 10:32:09       16150           6 FREQUENCY
TESTCOL2                                  5  .00003096          0 2011-10-14 10:32:09       16150           8 FREQUENCY
TESTCOL3                              16048 .000062313          0 2011-10-14 10:32:09       16150           8 NONE
TESTCOL4                                  1          1          0 2011-10-14 10:32:09       16150           2 NONE
select column_name, count(*)
 from dba_tab_histograms
where owner='TESTUSER'
  and table_name = 'INCTAB'
group by column_name
order by column_name;
COLUMN_NAME                      COUNT(*)
------------------------------ ----------
TESTCOL1                                5
TESTCOL2                                5
TESTCOL3                                2
TESTCOL4                                2

Interesting Objects

The actual incremental statistics used for global statistics calculations on partitioned tables appear to be stored in the two tables SYS.WRI$_OPTSTAT_SYNOPSIS_HEAD$ and SYS.WRI$_OPTSTAT_SYNOPSIS$.
The default statistics gathering configuration that is picked up for segments that have not been overridden is stored in SYS.OPTSTAT_HIST_CONTROL$.
The statistics gathering preferences for individual tables are stored in SYS.DBA_TAB_STAT_PREFS.

Bugs & Concerns

New Oracle database features often appear with an entourage of bugs and concerns to be addressed or kept in mind. The incremental statistics gathering feature is not any different.
1)    The implementation in 11.1 is buggy. ORA-1422, as an example, springs to mind during GATHER_DATABASE_STATS; fixed in a subsequent patch [4].
2)    Locking statistics for some partitions can cause partitions to be re-gathered and the WRI$_OPTSTAT_SYNOPSIS$ table to continue to grow [3].
3)      Insufficient partitioning of the WRI$_OPTSTAT_SYNOPSIS$ table in 11.1 could cause deletes on the incremental statistics to take hours at end, which could cause the GATHER_DATABASE_JOB to not complete in time during maintenance windows.
4)      METHOD_OPT is ignored, even when explicitly set on a partitioned table with incremental statistics enabled using DBMS_STATS.SET_TABLE_PREFS; this means that histograms may be gathered depending on data skewing and COL_USAGE$ contents on predicates that filter on candidate columns on the table. This behaviour is also alluded to on Oracle Support [2]. What this means is that histograms could cause plan changes on a per-partition basis with incremental statistics enabled - whether good or bad. Whereas METHOD_OPT can be set for tables normally, e.g. for all columns size 1, this is not possible when enabling incremental statistics. However, SQL baselines could be used to control what plans are deemed “good” against partitioned tables with incremental statistics enabled.
5)      Larger SYSAUX in order to store synopsis data. For a 4TB database I noticed that 22GB was used for synopsis data relating to 6 partitioned tables (constituting 75+% of the data in the database) with incremental statistics enabled, equalling a 0.55% overhead. So make sure to size SYSAUX accordingly. It may be worth considering auto extensible SYSAUX data files, at least for the first few runs. I also suspect that space consumption may well increase slightly over the initial 31 day period (the historic statistics retention period) and then stabilise. The reason is that if partition statistics are re-gathered, previous statistics will be stored for historic purposes, which allows for rollback of statistics.
6)    Global index statistics for local partitioned indexes still appear to be maintained with a full index scan, i.e. not with local partition statistics. However, local index partition statistics are still gathered on a per-partition basis [1].

Final Thoughts

I really like the new incremental statistics gathering feature as it resolves statistics gathering issues and resource wastage with large partitioned tables. However, using it in 11.1 is rather buggy and prone to performance issues when it comes to the OPTSTAT tables. I therefore strongly recommend that upgrading to 11.2 be considered if table sizes reach the point where incremental statistics are considered.
Regarding the database I was working on, running GATHER_DATABASE_STATS with the GATHER AUTO option on-and-off at quiet periods of the day over a couple of days quickly managed to catch up on segments that had no or stale statistics; the job can be interrupted without causing loss to statistics already gathered on existing segments.

References

[1] Oracle Database Performance Tuning Guide - 11g Release 2 (11.2). Managing Optimizer Statistics. URL: http://download.oracle.com/docs/cd/E18283_01/server.112/e16638/stats.htm
[2] Oracle Support. DBMS_STATS 'gather auto' and 'gather stale' and column histograms [ID 1073120.1]. URL: https://support.oracle.com/CSP/main/article?cmd=show&type=NOT&doctype=HOWTO&id=1073120.1
[3] Oracle Support. Bug 12369250 - sys.wri$_optstat_synopsis$ growth / all partitions are regathered stats in incremental mode when some partitions are locked [ID 12369250.8]. URL: https://support.oracle.com/CSP/main/article?cmd=show&type=NOT&doctype=PATCH&id=12369250.8
[4] Oracle Support. Patch 8301385: DBMS_STATS ERRORS WITH ORA-01422: EXACT FETCH RETURNS MORE THAN REQUESTED NUMBER. URL: (article not available for ordinary support at 05-OCT-2011)
[5] Oracle Support. Collect statistics for a large partitioned table takes a lot when incremental is used [ID 1302628.1]. URL: https://support.oracle.com/CSP/main/article?cmd=show&type=NOT&doctype=HOWTO&id=1073120.1
[6] Oracle Support. Relation between Table Monitoring and STATISTICS_LEVEL parameter in 10g [ID 252597.1]. URL: https://support.oracle.com/CSP/main/article?cmd=show&type=NOT&doctype=BULLETIN&id=252597.1

1 comment:

Shrish Chaturvedi said...

Thank you for such an illustrated set of examples.
I have conducted almost a similar test case. Only difference is that i have a list partitioned table and i have created a local partitioned (prefix) on that. When i try to gather incremental stats on the table (after makking all the table preferences) it just does not work. However, if i do not have a local index on the tables, it works as expected and gathers stats only on the new / dml effected partitions. Is there a way fo achieving incremental stats gather on a table with local partitioned index?