Oracle
has been working on improving statistics gathering in the database from
release to release. With 11g my favorite statistics gathering features
are: setting database/schema/table preferences - and incremental global
statistics gathering on partitioned tables.
I
recently came across a comparatively large database and noticed that
nearly 2000 segments had no statistics on them. Furthermore the largest
(partitioned) table in the database did not have global stats gathered
for more than 6 years - and many of its partitions had no statistics at
all. This did cause problems with sub-optimal plans being used by some
queries.
Further
analysis revealed that a 4 hour window was allowed for maintenance,
including database statistics gathering - during work days only. Weekend
windows were disabled. The job would run for exactly 4 hours every day.
It quickly became clear that the job never completed and was terminated
at the end of the maintenance window; persistently leaving many
segments without any statistics behind.
The
default action when incremental statistics are not enabled for
partitioned tables is to maintain global table statistics using a full
table scan. Obviously, for multi-terabyte tables this comes at a
significant I/O cost and therefore takes a lot of time. The largest
table happened to be in line for (global) statistics gathering and a
full table scan across all partitions never returned during each 4 hour
maintenance window.
Oracle
suggests that the key difference between the internal
GATHER_DATABASE_STATS_JOB_PROC procedure in DBMS_STATS, which runs
during the maintenance window, and GATHER_DATABASE_STATS with option
“GATHER AUTO” is that of the former prioritising objects that need
statistics gathered. Although I didn’t investigate this to any great
extent it probably helps explain the persistent order for gathering
statistics causing statistics not to be gathered for some segments that
were prioritised lower. On the other hand the latter option appears to
run through segments eligible for statistics gathering in alphabetic
order.
When to Use Incremental Statistics Gathering?
Many
use cases of partitioned tables involve creating new partitions on a
regular basis and filling those partitions with data. As the new
partitions and data are added, statistics should be gathered on this
data.
A
typical scenario for incremental statistics is therefore instances
where a new daily, weekly or monthly partition is added and data for the
period is loaded. I.e. not all partitions are continuously loaded with
substantial amounts of data.
The
picture is less clear cut when it comes to back-filling data in older
partitions. If all partitions in a table are “touched” by DML on a
regular basis, then the partitions with significant changes, i.e. 10+%
for INSERTs+UPDATEs+DELETEs compared to NUM_ROWS per default [6] will be
analyzed again. In this case the only advantage of this approach is
that global statistics on the table are gathered by calculating
statistics for all partitions - as opposed to a full tablescan, which in
itself could be a significant saver, preventing ORA-1555’s after hours
of runs and eliminating statistics gathering job runs persistently never
completing within the maintenance window.
How to Enable Incremental Statistics Gathering?
This
is quite simple. Oracle has clearly stipulated the conditions required
for Incremental Statistics Gathering [1]. The table preferences for
partitioned tables incremental statistics gathering must be set as
follows:
begin
dbms_stats.set_table_prefs('TESTUSER', 'INCTAB', 'ESTIMATE_PERCENT', DBMS_STATS.AUTO_SAMPLE_SIZE);
dbms_stats.set_table_prefs('TESTUSER', 'INCTAB', 'INCREMENTAL', 'TRUE');
dbms_stats.set_table_prefs('TESTUSER', 'INCTAB', 'PUBLISH', 'TRUE');
dbms_stats.set_table_prefs('TESTUSER', 'INCTAB', 'GRANULARITY', 'AUTO');
end;
/
To optionally remove any previous statistics first, run:
exec
dbms_stats.delete_table_stats(ownname=>'TESTUSER',
tabname=>'INCTAB', cascade_columns=>true,
cascade_indexes=>true)
Running the following will immediately get started with incremental statistics gathering on all tables where it has been set-up:
exec DBMS_STATS.GATHER_DATABASE_STATS(OPTIONS=>'GATHER AUTO')
The Test Case
The
following DDL and DML sets up the test case, which was run on an
11.2.0.2.3 database on Linux. It first gathers statistics on all
database objects eligible for statistics gathering. It then sets up
tablespace, user and partitioned table, populates the table with test
data as well as does some test queries to generate COL_USAGE$ statistics
for histograms (used later). Finally, incremental statistics are
enabled and database statistics are gathered again in order to generate
statistics on the new table.
exec DBMS_STATS.GATHER_DATABASE_STATS(OPTIONS=>'GATHER AUTO')
create
tablespace testdata datafile '/u01/oradata/orcl1/testdata_01.dbf' size
1023m extent management local autoallocate segment space management
auto;
create user testuser identified by testuser
default tablespace testdata
quota unlimited on testdata;
grant create session, create table, create view, create procedure, create sequence to testuser;
DROP TABLE TESTUSER.INCTAB;
CREATE TABLE TESTUSER.INCTAB
( TESTCOL1 VARCHAR2(120) NOT NULL ENABLE,
TESTCOL2 VARCHAR2(30) NOT NULL ENABLE,
TESTCOL3 DATE NOT NULL ENABLE,
TESTCOL4 NUMBER(38,5),
CONSTRAINT PK_INCTAB PRIMARY KEY (TESTCOL1, TESTCOL2, TESTCOL3)
)
TABLESPACE TESTDATA
PARTITION BY RANGE (TESTCOL3)
(PARTITION PART_20110830 VALUES LESS THAN (TO_DATE('2011-09-01', 'YYYY-MM-DD'))
SEGMENT CREATION IMMEDIATE TABLESPACE TESTDATA ,
PARTITION PART_20110901 VALUES LESS THAN (TO_DATE('2011-09-02', 'YYYY-MM-DD'))
SEGMENT CREATION IMMEDIATE TABLESPACE TESTDATA ,
PARTITION PART_20110902 VALUES LESS THAN (TO_DATE('2011-09-03', 'YYYY-MM-DD'))
SEGMENT CREATION IMMEDIATE TABLESPACE TESTDATA ,
PARTITION PART_20110903 VALUES LESS THAN (TO_DATE('2011-09-04', 'YYYY-MM-DD'))
SEGMENT CREATION IMMEDIATE TABLESPACE TESTDATA ,
PARTITION PART_20110904 VALUES LESS THAN (TO_DATE('2011-09-05', 'YYYY-MM-DD'))
SEGMENT CREATION IMMEDIATE TABLESPACE TESTDATA ,
PARTITION PART_20110905 VALUES LESS THAN (TO_DATE('2011-09-06', 'YYYY-MM-DD'))
SEGMENT CREATION IMMEDIATE TABLESPACE TESTDATA ,
PARTITION PART_20110906 VALUES LESS THAN (TO_DATE('2011-09-07', 'YYYY-MM-DD'))
SEGMENT CREATION IMMEDIATE TABLESPACE TESTDATA ,
PARTITION PART_20110907 VALUES LESS THAN (TO_DATE('2011-09-08', 'YYYY-MM-DD'))
SEGMENT CREATION IMMEDIATE TABLESPACE TESTDATA
);
truncate table TESTUSER.INCTAB;
insert into TESTUSER.INCTAB
select * from (
select
'test1' TESTCOL1, 'METRIC1' TESTCOL2, TO_DATE('2011-09-01 00:00:00',
'YYYY-MM-DD HH24:MI:SS') + 7*rownum/10001 TESTCOL3, 0 TESTCOL4
from dba_objects o1, dba_extents o2
where rownum < 10001
union all
select 'test2', 'METRIC2', TO_DATE('2011-09-01 00:00:00', 'YYYY-MM-DD HH24:MI:SS') + 7*rownum/5001, 0
from dba_objects o1, dba_extents o2
where rownum < 5001
union all
select 'test3', 'METRIC3', TO_DATE('2011-09-01 00:00:00', 'YYYY-MM-DD HH24:MI:SS') + 7*rownum/1001, 0
from dba_objects o1, dba_extents o2
where rownum < 1001
union all
select 'test4', 'METRIC4', TO_DATE('2011-09-01 00:00:00', 'YYYY-MM-DD HH24:MI:SS') + 7*rownum/101, 0
from dba_objects o1, dba_extents o2
where rownum < 101
union all
select 'test5', 'METRIC5', TO_DATE('2011-09-01 00:00:00', 'YYYY-MM-DD HH24:MI:SS') + 7*rownum/51, 0
from dba_objects o1, dba_extents o2
where rownum < 51
);
commit;
exec
dbms_stats.delete_table_stats(ownname=>'TESTUSER',
tabname=>'INCTAB', cascade_columns=>true,
cascade_indexes=>true)
-- Prime some equality predicate operations to prime col_usage$ for histograms
select count(*) from TESTUSER.INCTAB where TESTCOL1='test2';
select count(*) from TESTUSER.INCTAB where TESTCOL1 between 'test2' and 'test5';
select count(*) from TESTUSER.INCTAB where TESTCOL1 like 'test2%';
select count(*) from TESTUSER.INCTAB where TESTCOL2='METRIC3';
select count(*) from TESTUSER.INCTAB where TESTCOL2 between 'METRIC3' and 'METRIC5';
select count(*) from TESTUSER.INCTAB where TESTCOL2 like 'METRIC%';
exec dbms_stats.flush_database_monitoring_info
set long 8192
SELECT dbms_stats.report_col_usage('TESTUSER', 'INCTAB') from dual;
###############################################################################
COLUMN USAGE REPORT FOR TESTUSER.INCTAB
.......................................
1. TESTCOL1 : EQ RANGE LIKE
2. TESTCOL2 : EQ RANGE LIKE
###############################################################################
begin
dbms_stats.set_table_prefs('TESTUSER', 'INCTAB', 'ESTIMATE_PERCENT', DBMS_STATS.AUTO_SAMPLE_SIZE);
dbms_stats.set_table_prefs('TESTUSER', 'INCTAB', 'INCREMENTAL', 'TRUE');
dbms_stats.set_table_prefs('TESTUSER', 'INCTAB', 'PUBLISH', 'TRUE');
dbms_stats.set_table_prefs('TESTUSER', 'INCTAB', 'GRANULARITY', 'AUTO');
end;
/
exec DBMS_STATS.GATHER_DATABASE_STATS(OPTIONS=>'GATHER AUTO');
select partition_name,sample_size,num_rows, last_analyzed
from dba_tab_partitions
where table_owner='TESTUSER'
and table_name='INCTAB'
order by partition_name;
PARTITION_NAME SAMPLE_SIZE NUM_ROWS LAST_ANALYZED
------------------------------ ----------- ---------- -------------------
PART_20110830 0 2011-10-14 13:25:36
PART_20110901 2305 2305 2011-10-14 13:25:36
PART_20110902 2307 2307 2011-10-14 13:25:36
PART_20110903 2309 2309 2011-10-14 13:25:35
PART_20110904 2307 2307 2011-10-14 13:25:35
PART_20110905 2309 2309 2011-10-14 13:25:35
PART_20110906 2307 2307 2011-10-14 13:25:35
PART_20110907 2306 2306 2011-10-14 13:25:35
8 rows selected.
select partition_name, inserts, updates, deletes, timestamp, truncated, drop_segments
from dba_tab_modifications
where table_owner='TESTUSER'
and table_name='INCTAB';
no rows selected
select sample_size,num_rows, last_analyzed
from dba_tables
where owner='TESTUSER'
and table_name='INCTAB';
SAMPLE_SIZE NUM_ROWS LAST_ANALYZED
----------- ---------- -------------------
16150 16150 2011-10-14 13:25:35
select COLUMN_NAME, NUM_DISTINCT, DENSITY, NUM_NULLS, LAST_ANALYZED, SAMPLE_SIZE, AVG_COL_LEN, HISTOGRAM
from dba_tab_columns
where owner='TESTUSER'
and table_name='INCTAB'
order by column_id;
COLUMN_NAME NUM_DISTINCT DENSITY NUM_NULLS LAST_ANALYZED SAMPLE_SIZE AVG_COL_LEN HISTOGRAM
------------------------------ ------------ ---------- ---------- ------------------- ----------- ----------- ---------------
TESTCOL1 5 .00003096 0 2011-10-14 13:25:35 16150 6 FREQUENCY
TESTCOL2 5 .00003096 0 2011-10-14 13:25:35 16150 8 FREQUENCY
TESTCOL3 16048 .000062313 0 2011-10-14 13:25:35 16150 8 NONE
TESTCOL4 1 1 0 2011-10-14 13:25:35 16150 2 NONE
select column_name, count(*)
from dba_tab_histograms
where owner='TESTUSER'
and table_name = 'INCTAB'
group by column_name
order by column_name;
COLUMN_NAME COUNT(*)
------------------------------ ----------
TESTCOL1 5
TESTCOL2 5
TESTCOL3 2
TESTCOL4 2
Observations
Statistics Gathering Thresholds
This
section shows that there is an adherence to the 10% threshold on both
partitions and global table statistics as documented by Oracle [6]. It
shows this only for INSERTs in order to keep the example simple and
concise.
One
interesting observation is that statistics are gathered on individual
partitions for which 10+% rows (compared to NUM_ROWS) are changed.
However, this does not lead to an update in global table statistics.
Global table statistics are not computed until at least 10+% of rows for
all partitions in the table have changed.
Another
interesting observation is that when global table statistics are
gathered, all partitions for which outstanding table modifications have
been recorded will have statistics gathered individually as well -
before global table statistics are computed from the incremental
partition statistics.
Refer to the current statistics that exist on the table in the previous section.
Then proceed with adding 10% minus one row to one partition (230 / 2305 existing rows):
insert into TESTUSER.INCTAB
select * from (
select
'test6' TESTCOL1, 'METRIC1' TESTCOL2, TO_DATE('2011-09-01 00:00:00',
'YYYY-MM-DD HH24:MI:SS') + rownum/231 TESTCOL3, 0 TESTCOL4
from dba_objects o1, dba_extents o2
where rownum < 231
);
230 rows created.
commit;
exec dbms_stats.flush_database_monitoring_info
select partition_name, inserts, updates, deletes, timestamp, truncated, drop_segments
from dba_tab_modifications
where table_owner='TESTUSER'
and table_name='INCTAB';
PARTITION_NAME INSERTS UPDATES DELETES TIMESTAMP TRU DROP_SEGMENTS
------------------------------ ---------- ---------- ---------- ------------------- --- -------------
230 0 0 2011-10-14 13:31:48 NO 0
PART_20110901 230 0 0 2011-10-14 13:31:48 NO 0
exec DBMS_STATS.GATHER_DATABASE_STATS(OPTIONS=>'GATHER AUTO')
select partition_name,sample_size,num_rows, last_analyzed
from dba_tab_partitions
where table_owner='TESTUSER'
and table_name='INCTAB'
order by partition_name;
PARTITION_NAME SAMPLE_SIZE NUM_ROWS LAST_ANALYZED
------------------------------ ----------- ---------- -------------------
PART_20110830 0 2011-10-14 13:25:36
PART_20110901 2305 2305 2011-10-14 13:25:36
PART_20110902 2307 2307 2011-10-14 13:25:36
PART_20110903 2309 2309 2011-10-14 13:25:35
PART_20110904 2307 2307 2011-10-14 13:25:35
PART_20110905 2309 2309 2011-10-14 13:25:35
PART_20110906 2307 2307 2011-10-14 13:25:35
PART_20110907 2306 2306 2011-10-14 13:25:35
8 rows selected.
As
can be seen, no new statistics have been gathered at this point. Let’s
add one more row (231 / 2305) and see what happens next:
insert into TESTUSER.INCTAB
select * from (
select
'test7' TESTCOL1, 'METRIC1' TESTCOL2, TO_DATE('2011-09-01 00:00:00',
'YYYY-MM-DD HH24:MI:SS') + rownum/2 TESTCOL3, 0 TESTCOL4
from dba_objects o1, dba_extents o2
where rownum < 2
);
1 row created.
commit;
exec dbms_stats.flush_database_monitoring_info
select partition_name, inserts, updates, deletes, timestamp, truncated, drop_segments
from dba_tab_modifications
where table_owner='TESTUSER'
and table_name='INCTAB';
PARTITION_NAME INSERTS UPDATES DELETES TIMESTAMP TRU DROP_SEGMENTS
------------------------------ ---------- ---------- ---------- ------------------- --- -------------
231 0 0 2011-10-14 13:36:27 NO 0
PART_20110901 231 0 0 2011-10-14 13:36:27 NO 0
exec DBMS_STATS.GATHER_DATABASE_STATS(OPTIONS=>'GATHER AUTO')
select partition_name, inserts, updates, deletes, timestamp, truncated, drop_segments
from dba_tab_modifications
where table_owner='TESTUSER'
and table_name='INCTAB';
PARTITION_NAME INSERTS UPDATES DELETES TIMESTAMP TRU DROP_SEGMENTS
------------------------------ ---------- ---------- ---------- ------------------- --- -------------
231 0 0 2011-10-14 13:36:27 NO 0
select partition_name,sample_size,num_rows, last_analyzed
from dba_tab_partitions
where table_owner='TESTUSER'
and table_name='INCTAB'
order by partition_name;
PARTITION_NAME SAMPLE_SIZE NUM_ROWS LAST_ANALYZED
------------------------------ ----------- ---------- -------------------
PART_20110830 0 2011-10-14 13:25:36
PART_20110901 2536 2536 2011-10-14 13:36:47
PART_20110902 2307 2307 2011-10-14 13:25:36
PART_20110903 2309 2309 2011-10-14 13:25:35
PART_20110904 2307 2307 2011-10-14 13:25:35
PART_20110905 2309 2309 2011-10-14 13:25:35
PART_20110906 2307 2307 2011-10-14 13:25:35
PART_20110907 2306 2306 2011-10-14 13:25:35
8 rows selected.
select sample_size,num_rows, last_analyzed
from dba_tables
where owner='TESTUSER'
and table_name='INCTAB';
SAMPLE_SIZE NUM_ROWS LAST_ANALYZED
----------- ---------- -------------------
16150 16150 2011-10-14 13:25:35
We can see that statistics for the partition have been gathered. However, global statistics remain unaffected.
Let’s
add enough rows to make up exactly 10% rows to the table over all, i.e.
1615 - we have already inserted 231 - so lets insert 1384 more spread
across the partitions:
insert into TESTUSER.INCTAB
select * from (
select
'test8' TESTCOL1, 'METRIC1' TESTCOL2, TO_DATE('2011-09-01 00:00:00',
'YYYY-MM-DD HH24:MI:SS') + 7*rownum/1385 TESTCOL3, 0 TESTCOL4
from dba_objects o1, dba_extents o2
where rownum < 1385
);
commit;
exec dbms_stats.flush_database_monitoring_info
select partition_name, inserts, updates, deletes, timestamp, truncated, drop_segments
from dba_tab_modifications
where table_owner='TESTUSER'
and table_name='INCTAB';
PARTITION_NAME INSERTS UPDATES DELETES TIMESTAMP TRU DROP_SEGMENTS
------------------------------ ---------- ---------- ---------- ------------------- --- -------------
1615 0 0 2011-10-14 13:47:37 NO 0
PART_20110901 198 0 0 2011-10-14 13:47:37 NO 0
PART_20110902 198 0 0 2011-10-14 13:46:02 NO 0
PART_20110903 198 0 0 2011-10-14 13:46:02 NO 0
PART_20110904 197 0 0 2011-10-14 13:46:02 NO 0
PART_20110905 198 0 0 2011-10-14 13:46:02 NO 0
PART_20110906 198 0 0 2011-10-14 13:46:02 NO 0
PART_20110907 197 0 0 2011-10-14 13:46:02 NO 0
8 rows selected.
exec DBMS_STATS.GATHER_DATABASE_STATS(OPTIONS=>'GATHER AUTO')
select partition_name, inserts, updates, deletes, timestamp, truncated, drop_segments
from dba_tab_modifications
where table_owner='TESTUSER'
and table_name='INCTAB';
PARTITION_NAME INSERTS UPDATES DELETES TIMESTAMP TRU DROP_SEGMENTS
------------------------------ ---------- ---------- ---------- ------------------- --- -------------
1615 0 0 2011-10-14 13:47:37 NO 0
PART_20110901 198 0 0 2011-10-14 13:47:37 NO 0
PART_20110902 198 0 0 2011-10-14 13:46:02 NO 0
PART_20110903 198 0 0 2011-10-14 13:46:02 NO 0
PART_20110904 197 0 0 2011-10-14 13:46:02 NO 0
PART_20110905 198 0 0 2011-10-14 13:46:02 NO 0
PART_20110906 198 0 0 2011-10-14 13:46:02 NO 0
PART_20110907 197 0 0 2011-10-14 13:46:02 NO 0
8 rows selected.
select sample_size,num_rows, last_analyzed
from dba_tables
where owner='TESTUSER'
and table_name='INCTAB';
SAMPLE_SIZE NUM_ROWS LAST_ANALYZED
----------- ---------- -------------------
16150 16150 2011-10-14 13:25:35
That
was unexpected - no statistics have been gathered at exactly 10%.
Perhaps > 10% rows must be changed. Let’s try one more row:
insert into TESTUSER.INCTAB
select * from (
select
'test9' TESTCOL1, 'METRIC1' TESTCOL2, TO_DATE('2011-09-01 00:00:00',
'YYYY-MM-DD HH24:MI:SS') + rownum/2 TESTCOL3, 0 TESTCOL4
from dba_objects o1, dba_extents o2
where rownum < 2
);
commit;
exec dbms_stats.flush_database_monitoring_info
select partition_name, inserts, updates, deletes, timestamp, truncated, drop_segments
from dba_tab_modifications
where table_owner='TESTUSER'
and table_name='INCTAB';
PARTITION_NAME INSERTS UPDATES DELETES TIMESTAMP TRU DROP_SEGMENTS
------------------------------ ---------- ---------- ---------- ------------------- --- -------------
1616 0 0 2011-10-14 13:49:36 NO 0
PART_20110901 199 0 0 2011-10-14 13:49:36 NO 0
PART_20110902 198 0 0 2011-10-14 13:46:02 NO 0
PART_20110903 198 0 0 2011-10-14 13:46:02 NO 0
PART_20110904 197 0 0 2011-10-14 13:46:02 NO 0
PART_20110905 198 0 0 2011-10-14 13:46:02 NO 0
PART_20110906 198 0 0 2011-10-14 13:46:02 NO 0
PART_20110907 197 0 0 2011-10-14 13:46:02 NO 0
8 rows selected.
exec DBMS_STATS.GATHER_DATABASE_STATS(OPTIONS=>'GATHER AUTO')
select partition_name, inserts, updates, deletes, timestamp, truncated, drop_segments
from dba_tab_modifications
where table_owner='TESTUSER'
and table_name='INCTAB';
no rows selected
select sample_size,num_rows, last_analyzed
from dba_tables
where owner='TESTUSER'
and table_name='INCTAB';
SAMPLE_SIZE NUM_ROWS LAST_ANALYZED
----------- ---------- -------------------
17766 17766 2011-10-14 13:50:00
select partition_name,sample_size,num_rows, last_analyzed
from dba_tab_partitions
where table_owner='TESTUSER'
and table_name='INCTAB'
order by partition_name;
PARTITION_NAME SAMPLE_SIZE NUM_ROWS LAST_ANALYZED
------------------------------ ----------- ---------- -------------------
PART_20110830 0 2011-10-14 13:25:36
PART_20110901 2735 2735 2011-10-14 13:49:59
PART_20110902 2505 2505 2011-10-14 13:50:00
PART_20110903 2507 2507 2011-10-14 13:49:59
PART_20110904 2504 2504 2011-10-14 13:49:59
PART_20110905 2507 2507 2011-10-14 13:50:00
PART_20110906 2505 2505 2011-10-14 13:50:00
PART_20110907 2503 2503 2011-10-14 13:50:00
8 rows selected.
We
can see that all partitions that have outstanding table modifications
will have incremental statistics gathered - prior to global table
statistics being computed.
Using Stale Percentage
With
11g it is possible to set the value for how many rows are be
added/modified to a table or partition(s) before new statistics are
gathered in accordance with DBA_TAB_MODIFICATIONS. This also works for
tables with incremental statistics enabled:
exec dbms_stats.set_table_prefs('TESTUSER', 'INCTAB','STALE_PERCENT',20)
Running through the example above again, this time with < / = / > 20% of rows attains the same outcome.
Where
no significant benefit is expected in terms of more optimal plans or
cost calculation accuracy from the default 10% this number can be
amended.
METHOD_OPT is Ignored
For
those DBAs who know the data or for whom plan stability is of
importance, it seems natural to consider altering histograms parameters.
However, unfortunately METHOD_OPT is ignored.
The following demonstrates this:
exec dbms_stats.set_table_prefs('TESTUSER', 'INCTAB', 'METHOD_OPT', 'FOR ALL COLUMNS SIZE 1');
exec
dbms_stats.delete_table_stats(ownname=>'TESTUSER',
tabname=>'INCTAB', cascade_columns=>true,
cascade_indexes=>true)
exec DBMS_STATS.GATHER_DATABASE_STATS(OPTIONS=>'GATHER AUTO');
select COLUMN_NAME, NUM_DISTINCT, DENSITY, NUM_NULLS, LAST_ANALYZED, SAMPLE_SIZE, AVG_COL_LEN, HISTOGRAM
from dba_tab_columns
where owner='TESTUSER'
and table_name='INCTAB'
order by column_id;
COLUMN_NAME NUM_DISTINCT DENSITY NUM_NULLS LAST_ANALYZED SAMPLE_SIZE AVG_COL_LEN HISTOGRAM
------------------------------ ------------ ---------- ---------- ------------------- ----------- ----------- ---------------
TESTCOL1 5 .00003096 0 2011-10-14 10:32:09 16150 6 FREQUENCY
TESTCOL2 5 .00003096 0 2011-10-14 10:32:09 16150 8 FREQUENCY
TESTCOL3 16048 .000062313 0 2011-10-14 10:32:09 16150 8 NONE
TESTCOL4 1 1 0 2011-10-14 10:32:09 16150 2 NONE
select column_name, count(*)
from dba_tab_histograms
where owner='TESTUSER'
and table_name = 'INCTAB'
group by column_name
order by column_name;
COLUMN_NAME COUNT(*)
------------------------------ ----------
TESTCOL1 5
TESTCOL2 5
TESTCOL3 2
TESTCOL4 2
Interesting Objects
The
actual incremental statistics used for global statistics calculations
on partitioned tables appear to be stored in the two tables
SYS.WRI$_OPTSTAT_SYNOPSIS_HEAD$ and SYS.WRI$_OPTSTAT_SYNOPSIS$.
The
default statistics gathering configuration that is picked up for
segments that have not been overridden is stored in
SYS.OPTSTAT_HIST_CONTROL$.
The statistics gathering preferences for individual tables are stored in SYS.DBA_TAB_STAT_PREFS.
Bugs & Concerns
New
Oracle database features often appear with an entourage of bugs and
concerns to be addressed or kept in mind. The incremental statistics
gathering feature is not any different.
1)
The implementation in 11.1 is buggy. ORA-1422, as an example, springs
to mind during GATHER_DATABASE_STATS; fixed in a subsequent patch [4].
2)
Locking statistics for some partitions can cause partitions to be
re-gathered and the WRI$_OPTSTAT_SYNOPSIS$ table to continue to grow
[3].
3)
Insufficient partitioning of the WRI$_OPTSTAT_SYNOPSIS$ table in
11.1 could cause deletes on the incremental statistics to take hours at
end, which could cause the GATHER_DATABASE_JOB to not complete in time
during maintenance windows.
4)
METHOD_OPT is ignored, even when explicitly set on a partitioned
table with incremental statistics enabled using
DBMS_STATS.SET_TABLE_PREFS; this means that histograms may be gathered
depending on data skewing and COL_USAGE$ contents on predicates that
filter on candidate columns on the table. This behaviour is also alluded
to on Oracle Support [2]. What this means is that histograms could
cause plan changes on a per-partition basis with incremental statistics
enabled - whether good or bad. Whereas METHOD_OPT can be set for tables
normally, e.g. for all columns size 1, this is not possible when
enabling incremental statistics. However, SQL baselines could be used to
control what plans are deemed “good” against partitioned tables with
incremental statistics enabled.
5)
Larger SYSAUX in order to store synopsis data. For a 4TB database I
noticed that 22GB was used for synopsis data relating to 6 partitioned
tables (constituting 75+% of the data in the database) with incremental
statistics enabled, equalling a 0.55% overhead. So make sure to size
SYSAUX accordingly. It may be worth considering auto extensible SYSAUX
data files, at least for the first few runs. I also suspect that space
consumption may well increase slightly over the initial 31 day period
(the historic statistics retention period) and then stabilise. The
reason is that if partition statistics are re-gathered, previous
statistics will be stored for historic purposes, which allows for
rollback of statistics.
6)
Global index statistics for local partitioned indexes still appear to
be maintained with a full index scan, i.e. not with local partition
statistics. However, local index partition statistics are still gathered
on a per-partition basis [1].
Final Thoughts
I
really like the new incremental statistics gathering feature as it
resolves statistics gathering issues and resource wastage with large
partitioned tables. However, using it in 11.1 is rather buggy and prone
to performance issues when it comes to the OPTSTAT tables. I therefore
strongly recommend that upgrading to 11.2 be considered if table sizes
reach the point where incremental statistics are considered.
Regarding
the database I was working on, running GATHER_DATABASE_STATS with the
GATHER AUTO option on-and-off at quiet periods of the day over a couple
of days quickly managed to catch up on segments that had no or stale
statistics; the job can be interrupted without causing loss to
statistics already gathered on existing segments.
References
[1] Oracle Database Performance Tuning Guide - 11g Release 2 (11.2). Managing Optimizer Statistics. URL: http://download.oracle.com/docs/cd/E18283_01/server.112/e16638/stats.htm
[2] Oracle Support. DBMS_STATS 'gather auto' and 'gather stale' and column histograms [ID 1073120.1]. URL: https://support.oracle.com/CSP/main/article?cmd=show&type=NOT&doctype=HOWTO&id=1073120.1
[3]
Oracle Support. Bug 12369250 - sys.wri$_optstat_synopsis$ growth / all
partitions are regathered stats in incremental mode when some partitions
are locked [ID 12369250.8]. URL: https://support.oracle.com/CSP/main/article?cmd=show&type=NOT&doctype=PATCH&id=12369250.8
[4]
Oracle Support. Patch 8301385: DBMS_STATS ERRORS WITH ORA-01422: EXACT
FETCH RETURNS MORE THAN REQUESTED NUMBER. URL: (article not available
for ordinary support at 05-OCT-2011)
[5] Oracle Support. Collect statistics for a large partitioned table takes a lot when incremental is used [ID 1302628.1]. URL: https://support.oracle.com/CSP/main/article?cmd=show&type=NOT&doctype=HOWTO&id=1073120.1
[6] Oracle Support. Relation between Table Monitoring and STATISTICS_LEVEL parameter in 10g [ID 252597.1]. URL: https://support.oracle.com/CSP/main/article?cmd=show&type=NOT&doctype=BULLETIN&id=252597.1
1 comment:
Thank you for such an illustrated set of examples.
I have conducted almost a similar test case. Only difference is that i have a list partitioned table and i have created a local partitioned (prefix) on that. When i try to gather incremental stats on the table (after makking all the table preferences) it just does not work. However, if i do not have a local index on the tables, it works as expected and gathers stats only on the new / dml effected partitions. Is there a way fo achieving incremental stats gather on a table with local partitioned index?
Post a Comment