MySQL Blog Archive
For the latest blogs go toblogs.oracle.com/mysql
MySQL 8.0: Improvements to Information_schema
Posted on Sep 13, 2016 byGopal Shankar
Category:Monitoring
Tags:information_schema,mysql-8-0

Coinciding with the new native data dictionary in MySQL 8.0, we have made a number of useful enhancements to ourINFORMATION_SCHEMA subsystem design in MySQL 8.0. In this post I will first go through our legacy implementation as it has stood since MySQL 5.1, and then cover what’s changed.

Background

INFORMATION_SCHEMA was first introduced into MySQL 5.0, as a standards compliant way of retrieving meta data from a running MySQL server. When we look at the history ofINFORMATION_SCHEMA there have been a number of complaints about the performance of certain queries, particularly in the case that there are many database objects (schemas, tables etc).

In an effort to address these reported issues, since MySQL 5.1 we have made a number of performance optimizations to speed up the execution ofINFORMATION_SCHEMA queries. The optimizations aredescribed in the MySQL manual, and apply when the user provides an explicit schema name or table name in the query.

Alas, despite these improvementsINFORMATION_SCHEMA performance is still a major pain point for many of our users. The key reason behind these performance issues in the currentINFORMATION_SCHEMA implementation is thatINFORMATION_SCHEMA tables are implemented as temporary tables that are created on-the-fly during query execution. These temporary tables are populated via:

  1. Meta data from files, e.g. table definitions from .FRM files.
  2. Details from storage engines, e.g. dynamic table statistics.
  3. Data from global data structures in the MySQL server.

For a MySQL server having hundreds of database, each with hundreds of tables within them, theINFORMATION_SCHEMA query would end-up doing lot of I/O reading each individual FRM files from the file system. And it would also end-up using more CPU cycles in effort to open the table and prepare related in-memory data structures. It does attempt to use the MySQL server table cache (the system variable ‘table_definition_cache‘), however in large server instances it’s very rare to have a table cache that is large enough to accommodate all of these tables.

One can easily face the above mentioned performance issue if the optimization is not used by theINFORMATION_SCHEMA query. For example, let us consider the two queries below,

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
mysql>EXPLAINSELECTTABLE_NAMEFROMINFORMATION_SCHEMA.TABLESWHERE
    ->TABLE_SCHEMA='test'ANDTABLE_NAME='t1'\G
    ***************************1.row ***************************
          id:1
      select_type:SIMPLE
        table:TABLES
      partitions:NULL
        type:ALL
    possible_keys:NULL
          key:TABLE_SCHEMA,TABLE_NAME
      key_len:NULL
          ref:NULL
        rows:NULL
    filtered:NULL
        Extra:Usingwhere;Skip_open_table;Scanned0databases
    1rowinset,1warning(0.00sec)
 
    mysql>EXPLAINSELECTTABLE_NAMEFROMINFORMATION_SCHEMA.TABLESWHERE
    ->TABLE_SCHEMAlike'test%'ANDTABLE_NAMElike't%'\G
    ***************************1.row ***************************
          id:1
      select_type:SIMPLE
        table:TABLES
      partitions:NULL
        type:ALL
    possible_keys:NULL
          key:NULL
      key_len:NULL
          ref:NULL
        rows:NULL
    filtered:NULL
        Extra:Usingwhere;Skip_open_table;Scannedalldatabases
    1rowinset,1warning(0.00sec)

As we can see from theEXPLAIN output, we see that the former query would use the values provided inWHERE clause for theTABLE_SCHEMA andTABLE_NAME field as a key to read the desiredFRM files from the file system. However, the latter query would end up reading all theFRM in the entire data directory, which is very costly and does not scale.

Changes in MySQL 8.0

One of the major changes in 8.0 is the introduction of a native data dictionary based on InnoDB.  This change has enabled us to get rid of file-based metadata store (FRM files) and also help MySQL to move towards supporting transactional DDL. For more details on introduction of data dictionary feature in 8.0 and its benefits, please look at Staale’spost here.

Now that the metadata of all database tables is stored in transactional data dictionary tables, it enables us to design anINFORMATION_SCHEMA table as a database VIEW over the data dictionary tables. This eliminates costs such as the creation of temporary tables for eachINFORMATION_SCHEMA query during execution on-the-fly, and also scanning file-system directories to find FRM files. It is also now possible to utilize the full power of the MySQL optimizer to prepare better query execution plans using indexes on data dictionary tables.

The following diagram explains the difference in design in MySQL 5.7 and 8.0.

overview_of_isIf we consider the above example underBackground, we see that the optimizer plans to use indexes on data dictionary tables, in both the cases.

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
mysql>EXPLAINSELECTTABLE_NAMEFROMINFORMATION_SCHEMA.TABLESWHERETABLE_SCHEMA='test'ANDTABLE_NAME='t1';
  +--+-----------+-----++------+------------------+----------++----------------------+----+--------+----------------------------------+
  |id|select_type|table||type  |possible_keys    |key      ||ref                  |rows|filtered|Extra                            |
  +--+-----------+-----++------+------------------+----------++----------------------+----+--------+----------------------------------+
  |1|SIMPLE    |cat  ||index|PRIMARY          |name      ||NULL                  |  1|  100.00|Usingindex                      |
  |1|SIMPLE    |sch  ||eq_ref|PRIMARY,catalog_id|catalog_id||mysql.cat.id,const    |  1|  100.00|Usingindex                      |
  |1|SIMPLE    |tbl  ||eq_ref|schema_id        |schema_id||mysql.sch.id,const    |  1|  10.00|Usingindexcondition;Usingwhere|
  |1|SIMPLE    |col  ||eq_ref|PRIMARY          |PRIMARY  ||mysql.tbl.collation_id|  1|  100.00|Usingindex                      |
  +--+-----------+-----++------+------------------+----------++----------------------+----+--------+----------------------------------+
 
  mysql>EXPLAINSELECTTABLE_NAMEFROMINFORMATION_SCHEMA.TABLESWHERETABLE_SCHEMAlike'test%'ANDTABLE_NAMElike't%';
  +--+-----------+-----++------+------------------+----------++-----------------------+----+--------+---------------------------------+
  |id|select_type|table||type  |possible_keys    |key      ||ref                  |rows|filtered|Extra                            |
  +--+-----------+-----++------+------------------+----------++-----------------------+----+--------+---------------------------------+
  |1|SIMPLE    |cat  ||index|PRIMARY          |name      ||NULL                  |  1|  100.00|Usingindex                      |
  |1|SIMPLE    |sch  ||ref  |PRIMARY,catalog_id|catalog_id||mysql.cat.id          |  6|  16.67|Usingwhere;Usingindex        |
  |1|SIMPLE    |tbl  ||ref  |schema_id        |schema_id||mysql.sch.id          |  26|    1.11|Usingindexcondition;Usingwhere|
  |1|SIMPLE    |col  ||eq_ref|PRIMARY          |PRIMARY  ||mysql.tbl.collation_id|  1|  100.00|Usingindex                      |
  +--+-----------+-----++------+------------------+----------++-----------------------+----+--------+---------------------------------+

When we look at performance gain with this newINFORMATION_SCHEMA design in 8.0, we see that it is much more efficient than MySQL 5.7. As an example,this query is now ~100 times faster (with 100 databases with 50 tables each). A separate blog will describe more about performance of INFORMATION_SCHEMA in 8.0.

1
2
3
SELECTTABLE_SCHEMA,TABLE_NAME,TABLE_TYPE,ENGINE,ROW_FORMAT
  FROMinformation_schema.tables
  WHERETABLE_SCHEMALIKE'db%';

Sources of Metadata

Not all theINFORMATION_SCHEMA tables are implemented as a VIEW over the data dictionary tables in 8.0. Currently we have the followingINFORMATION_SCHEMA tables designed as views:

  • SCHEMATA
  • TABLES
  • COLUMNS
  • VIEWS
  • CHARACTER_SETS
  • COLLATIONS
  • COLLATION_CHARACTER_SET_APPLICABILITY
  • STATISTICS
  • KEY_COLUMN_USAGE
  • TABLE_CONSTRAINTS

Upcoming MySQL 8.0 versions aims to provide even the following INFORMATION_SCHEMA tables as views:

  • EVENTS
  • TRIGGERS
  • ROUTINES
  • REFERENTIAL_CONSTRAINTS

To describe theINFORMATION_SCHEMA queries which are not directly implemented as VIEWs over data dictionary tables, let me first describe that there are two types of meta data which are presented inINFORMATION_SCHEMA tables:

  1. Static table metadata. For example:TABLE_SCHEMA,TABLE_NAME,TABLE_TYPE,ENGINE. These statistics will be read directly from the data dictionary.
  2. Dynamic table metadata. For example:AUTO_INCREMENT,AVG_ROW_LENGTH,DATA_FREE. Dynamic metadata frequently changes (for example: the auto_increment value will advance after each insert).In many cases the dynamic metadata will also incur some cost to accurately calculate on demand, and accuracy may not be beneficial for the typical query. Consider the case of theDATA_FREE statistic which shows the number of free bytes in a table – a cached value is usually sufficient.

In MySQL 8.0 (Prior to 8.0.3 RC1), the dynamic table metadata willdefault to being cached. This is configurable via the settinginformation_schema_stats (defaultcached), and can be changed toinformation_schema_stats=latest in order toalways retrieve the dynamic information directly from the storage engine (at the cost of slightly higher query execution).

As an alternative, the user can also executeANALYZE TABLE on the table, to update the cached dynamic statistics.

In MySQL 8.0.3 RC1 onwards we have replaced the optioninformation_schema_statswith a new optioninformation_schema_stats_expiry .Please seeFurther improvements on INFORMATION_SCHEMA in MySQL 8.0.3 for details.

Conclusion

TheINFORMATION_SCHEMA design in 8.0 is a step forward enabling:

    • Simple and maintainable implementation.
    • Us to get rid of numerousINFORMATION_SCHEMA legacy bugs.
    • Proper use of the MySQL optimizer forINFORMATION_SCHEMA queries.
    • INFORMATION_SCHEMA queries to execute ~100 times faster, compared to 5.7, when retrieving static table metadata, as show in query Q1.

There is more to discuss aboutINFORMATION_SCHEMA in 8.0. The new implementation comes with a few changes in behavior when compared to the oldINFORMATION_SCHEMA implementation. Please check the MySQL manual for more details about it.

Thanks for using MySQL!