A customer, who is on 19c with his Oracle databases asked me recently why he still has Height Balanced Histograms in his database? E.g.
SQL> select histogram, count(*) from dba_tab_columns where histogram <> 'NONE' group by histogram order by 2;
HISTOGRAM COUNT(*)
--------------- ----------
TOP-FREQUENCY 4
HEIGHT BALANCED 5
HYBRID 39
FREQUENCY 492
SQL>
In 12.1. Oracle introduced Top Frequency and Hybrid Histograms, which should replace Height Balanced histograms. There are 2 main reasons why Height Balanced histograms may still be there:
1. Top Frequency and/or Hybrid histograms are disabled
By setting the preference ENABLE_HYBRID_HISTOGRAMS and ENABLE_TOP_FREQ_HISTOGRAMS to 0 (globally or on table level) you can disable the new histogram types. The default value is 3 and enables the 2 new histogram types:
SQL> select dbms_stats.get_prefs('ENABLE_HYBRID_HISTOGRAMS') hist_enabled from dual;
HIST_ENABLED
------------
3
SQL> select dbms_stats.get_prefs('ENABLE_TOP_FREQ_HISTOGRAMS') hist_enabled from dual;
HIST_ENABLED
------------
3
SQL>
2. When gathering statistics and using a non-default value for ESTIMATE_PERCENT (default is DBMS_STATS.AUTO_SAMPLE_SIZE) then Height Balanced histograms will be used instead of the new histogram types.
The question is on how to find out what caused HEIGHT BALANCED histograms to be created?
First let’s check what table-columns have Height Balanced histograms and when they’ve been created:
SQL> select table_name, column_name, last_analyzed from dba_tab_columns where histogram='HEIGHT BALANCED';
TABLE_NAME COLUMN_NAME LAST_ANALYZED
-------------------------------- -------------------------------- -------------------
T1 TIMESTAMP 27.07.2022 13:11:31
T1 LAST_DDL_TIME 27.07.2022 13:11:31
T1 CREATED 27.07.2022 13:11:31
T1 OBJECT_ID 27.07.2022 13:11:31
T1 OBJECT_NAME 27.07.2022 13:11:31
SQL>
Ok, I do only have a test table T1 (copy of ALL_OBJECTS) where 5 columns have a HEIGHT BALANCED histogram. What caused them to be created?
Are the new histogram types disabled?
SQL> select dbms_stats.get_prefs('ENABLE_HYBRID_HISTOGRAMS') hist_enabled from dual;
HIST_ENABLED
------------
3
SQL> select dbms_stats.get_prefs('ENABLE_TOP_FREQ_HISTOGRAMS') hist_enabled from dual;
HIST_ENABLED
------------
3
SQL>
REMARK: Alternatively you may also run this query to get the global preferences:
SQL> select sname, nvl(to_char(sval1),spare4) value
2 from sys.optstat_hist_control$
3 where sname like '%HISTOGRAMS';
SNAME VALUE
------------------------------ --------------------------------
ENABLE_TOP_FREQ_HISTOGRAMS 3
ENABLE_HYBRID_HISTOGRAMS 3
SQL>
So globally the new histogram types are enabled and I could also check if something specific has been set on table-level:
SQL> select dbms_stats.get_prefs('ENABLE_HYBRID_HISTOGRAMS',ownname=>'CBLEILE',tabname=>'T1') hist_enabled from dual;
HIST_ENABLED
------------
3
SQL> select dbms_stats.get_prefs('ENABLE_TOP_FREQ_HISTOGRAMS',ownname=>'CBLEILE',tabname=>'T1') hist_enabled from dual;
HIST_ENABLED
------------
3
SQL>
I.e. the new histogram types are enabled. So that’s not the reason I still have height balanced histograms. To find out how statistics were gathered on table T1 the NOTES column on DBA_OPTSTAT_OPERATIONS is very useful. I split this into 2 SQL-statements to improve the readability:
SQL> select target, end_time, operation
2 from dba_optstat_operations
3 where end_time between to_date('27-JUL-2022 13:11:00','dd-mon-yyyy hh24:mi:ss')
4 and to_date('27-JUL-2022 13:12:00','dd-mon-yyyy hh24:mi:ss');
TARGET END_TIME OPERATION
--------------- ----------------------------------- ------------------------
"CBLEILE"."T1" 27-JUL-22 01.11.31.849682 PM +01:00 gather_table_stats
SQL> select notes
2 from dba_optstat_operations
3 where end_time between to_date('27-JUL-2022 13:11:00','dd-mon-yyyy hh24:mi:ss')
4 and to_date('27-JUL-2022 13:12:00','dd-mon-yyyy hh24:mi:ss');
NOTES
--------------------------------------------------
<params><param name="block_sample" val="FALSE"/><p
aram name="cascade" val="NULL"/><param name="concu
rrent" val="FALSE"/><param name="degree" val="NULL
"/><param name="estimate_percent" val="50"/><param
name="force" val="FALSE"/><param name="granularit
y" val="AUTO"/><param name="method_opt" val="FOR A
LL COLUMNS SIZE 254"/><param name="no_invalidate"
val="NULL"/><param name="ownname" val="CBLEILE"/><
param name="partname" val=""/><param name="reporti
ng_mode" val="FALSE"/><param name="statid" val=""/
><param name="statown" val=""/><param name="statta
b" val=""/><param name="stattype" val="DATA"/><par
am name="tabname" val="T1"/></params>
So statistics were gathered with dbms_stats.gather_table_stats and ESTIMATE_PERCENT => 50 was used (see the NOTES column). Hence HEIGHT BALANCED histograms were created.
To fix this there are 2 possibilities:
- Tell the developer or DBA who gathers statistics to use the default for ESTIMATE_PERCENT so that he can fix the code accordingly.
- Ignore non-default settings when gathering statistics. I.e. there is a preference PREFERENCE_OVERRIDES_PARAMETER which actually ignores parameters provided in dbms_stats and uses the preference on the table instead. E.g.
SQL> select dbms_stats.get_prefs('ESTIMATE_PERCENT','CBLEILE','T1') from dual;
DBMS_STATS.GET_PREFS('ESTIMATE_PERCENT','CBLEILE','T1')
------------------------------------------------------------------------------
DBMS_STATS.AUTO_SAMPLE_SIZE
SQL> exec dbms_stats.set_table_prefs('CBLEILE','T1','PREFERENCE_OVERRIDES_PARAMETER','TRUE');
SQL> exec dbms_stats.gather_table_stats('CBLEILE','T1',ESTIMATE_PERCENT=>50,options=>'GATHER',method_opt=>'FOR ALL COLUMNS SIZE 254');
SQL> select column_name, histogram from user_tab_columns
2 where table_name='T1' and histogram <> 'NONE';
COLUMN_NAME HISTOGRAM
-------------------------------- ---------------
OBJECT_TYPE FREQUENCY
SQL>
Why do I only have a frequency based histogram and no other histograms anymore?
The reason is that the setting
method_opt=>’FOR ALL COLUMNS SIZE 254′
has also been overwritten by the preference on the table:
SQL> select dbms_stats.get_prefs('METHOD_OPT','CBLEILE','T1') t1_prefs from dual;
T1_PREFS
----------------------------
FOR ALL COLUMNS SIZE AUTO
I.e. to use ‚FOR ALL COLUMNS SIZE 254‘ I have to set the preference as well (here for test purposes):
SQL> exec dbms_stats.set_table_prefs('CBLEILE','T1','METHOD_OPT','FOR ALL COLUMNS SIZE 254');
SQL> exec dbms_stats.gather_table_stats('CBLEILE','T1',ESTIMATE_PERCENT=>50,options=>'GATHER',method_opt=>'FOR ALL COLUMNS SIZE 1');
REMARK: I used method_opt=>’FOR ALL COLUMNS SIZE 1′ on purpose which means „disable histograms“.
SQL> select column_name, histogram from user_tab_columns
2 where table_name='T1' and histogram <> 'NONE';
COLUMN_NAME HISTOGRAM
-------------------------------- ---------------
OWNER FREQUENCY
OBJECT_NAME HYBRID
SUBOBJECT_NAME HYBRID
OBJECT_ID HYBRID
DATA_OBJECT_ID HYBRID
OBJECT_TYPE FREQUENCY
CREATED HYBRID
LAST_DDL_TIME HYBRID
TIMESTAMP HYBRID
STATUS FREQUENCY
TEMPORARY FREQUENCY
GENERATED FREQUENCY
SECONDARY FREQUENCY
NAMESPACE FREQUENCY
SHARING FREQUENCY
EDITIONABLE FREQUENCY
ORACLE_MAINTAINED FREQUENCY
APPLICATION FREQUENCY
DEFAULT_COLLATION FREQUENCY
DUPLICATED FREQUENCY
SHARDED FREQUENCY
21 rows selected.
SQL>
I.e. with the preference PREFERENCE_OVERRIDES_PARAMETER = TRUE I can „force“ to use all preferences set on the table (or the global preferences if the table preferences have not been set). So I overwrote my manual settings ESTIMATE_PERCENT=>50 and method_opt=>’FOR ALL COLUMNS SIZE 1′ here.
REMARK: Check if statistics have been gathered, because options=>’GATHER‘ may also have been overwritten with the preference on the table.
Be careful with PREFERENCE_OVERRIDES_PARAMETER = TRUE because it may have unwanted side-effects (as seen above) or statistics gathering in the application may take longer than before.
Summary: If you still see HEIGHT BALANCED histograms in your database then you probably use ESTIMATE_PERCENT <> Default when gathering statistics. It’s recommended to fix that and gather statistics with ESTIMATE_PERCENT => DBMS_STATS.AUTO_SAMPLE_SIZE. There have been lots of improvements since 11g to speed up stats gathering (like e.g. the use of approx_for_count_distinct = TRUE). Hence settings like ESTIMATE_PERCENT => 1 to speed up stats gathering on huge tables are usually not necessary anymore today.