Detecting and Removing Unused Indexes in MySQL

Preface: The following post is a backup from a post first published on the Moviepilot Techblog, which is going to be replaced by the Moviepilot Labs Blog. The content is a bit outdated, as the way to go today is using MariaDB instead of OurDelta. The very content about the UserStats plugin and using it for detecting and removing unused indexes is still valid, though – and a nice way of getting rid of performance killers…

MySQL performance depends on a balanced usage of MySQL indexes. While it is easy to add an index and identify queries not using indexes via EXPLAIN during development or slow.log it is a lot harder to get rid of unused indexes. Finding and removing them might be crucial for your performance as indexes can create a remarkable cpu cycle and i/o overhead during updates to tables (INSERT/UPDATE/DELETE).

The default MySQL community edition server from mysql.com or your Linux/BSD distribution (which you shouldn’t use for a lot of reasons anyway) is not yet helpfull in this regard. There are however inofficial patches for advanced statistics that provide the details needed for optimizing your list of indexes. The easiest way to get started with a patched MySQL server is using a pre-patched binary. At Moviepilot an OurDelta’s pre-patchted MySQL 5.0 server that includes the UserStats patch is running fine for about a year now.

Let’s assume you already installed OurDelta’s MySQL 5.0, which is fairly more than adding and using an apt-source in Debian/Ubuntu or similar in rpm-based distributions. After installation the MySQL server behaves

Enable UserStats‘ Enhanced Statistics

As stated on the official patch originator’s (Percona) documentation, UserStats is enabled by setting the global variable „userstat_running“ to „on“. You can do this on the fly by entering your mysql command line interface and issuing „SET GLOBAL userstat_running = 1;“ as shown below:

mysql> SET GLOBAL userstat_running = 1;
Query OK, 0 rows affected (0.00 sec)

The UserStats counter is now running and only has a slight impact on your cpu performance. For us it’s fine to run it by default but you might enable it on an on-demand basis

Grab Statistics

The UserStats statistics can be retrieved in two ways. The simple way is using „SHOW INDEX_STATISTICS“. This will provide with an unsorted list of all indexes that have been used so far with count times.

mysql> show index_statistics;
+-------------+----------+--------------------------+---------+
|Table_schema |Table_name|Index_name                |Rows_read|
+-------------+----------+--------------------------+---------+
|de_moviepilot|broadcasts|movie_id_and_ends_at_index|  7244936|
|fr_moviepilot|place_keyw|lft_and_rgt               |    46965|
|de_moviepilot|mushes916 |index_mushes_on_user_id_an|   310538|
|de_moviepilot|mushes567 |top                       |   137855|
|de_moviepilot|mushes402 |PRIMARY                   |  3033119|
...
|pl_moviepilot|u_settings|index_user_settings_on_use|   469600|
|de_moviepilot|answers   |answerable_id_and_answerab| 11162446|
|es_moviepilot|cinema_the|PRIMARY                   |    76805|
|de_moviepilot|list_items|PRIMARY                   |    14208|
+-------------+----------+--------------------------+---------+
10689 rows in set (0.03 sec)

This table is already quite useful as it gives you handy details about your indexes. As „SHOW“ only processes WHERE-clauses, ignores LIKE-clauses and rejects ORDER you should rather query the virtual table in information_schema like this:

mysql> select * from information_schema.INDEX_STATISTICS\
ORDER BY Rows_read DESC LIMIT 0,10;
+-------------+----------+-------------------------+------------+
|TABLE_SCHEMA |TABLE_NAME|INDEX_NAME               |ROWS_READ   |
+-------------+----------+-------------------------+------------+
|de_moviepilot|images    |parent_id_and_thumbnail_o|138769917931|
|de_moviepilot|ratings   |PRIMARY                  |116200730622|
|de_moviepilot|ratings   |top_on_ratings           |111350089590|
|de_moviepilot|events    |index_events_on_parent_id| 97002618593|
|de_moviepilot|ratings   |movie_id_and_user_id_and_| 45962792087|
|de_moviepilot|neighbours|PRIMARY                  | 34403784465|
|de_moviepilot|plot_keywo|lft_and_rgt              | 30943317768|
|de_moviepilot|comments  |index_comments_on_comment| 26576184065|
|de_moviepilot|comments  |commentable_type_and_comm| 25467669528|
|moviepilot   |users     |type_and_id_idx          | 21950479057|
+-------------+----------+-------------------------+------------+
10 rows in set (0.02 sec)

You just got the list of the ten most used MyIsam/InnoDb indexes in your database. See tables TABLE_STATISTICS, CLIENT_STATISTICS and USER_STATISTICS in information_schema for further details on table, client and user stats. Feel free to check your InnoDb tables for ones with few writes that maybe should be migrated to MyIsam or heave write MyIsam tables vice versa.

Detect Unused Indexes

But our task for this post is detecting unused indexes. As you already might have noticed, INDEX_STATISTICS only shows indexes that have been used at least once. If you need a list of unused indexes, meaning indexes that have been accessed zero times, you can get them by comparing the list of available indexes and the list of used indexes on a per table base.

select disctinct(INDEX_NAME) from STATISTICS \
where INDEX_NAME != 'PRIMARY' and INDEX_SCHEMA = '${DB}' \
and table_name = '${TABLE}' and INDEX_NAME not in (select \
INDEX_NAME from index_statistics where INDEX_SCHEMA =
'${DB}' and table_name = '${TABLE}');

The variables are placeholders ${DB} and ${TABLE} for usage in shell scripts. Just replace them by a database and table name of your choice.

Putting it all together

As the query above only works on a table basis (I am sure, there are better queries for this issue), and you might want to run this on a regular basis, we wrote a little shell script called „unused_indexes.sh“, available on our snippets repo on github. The script checks all tables in all or a specific database:

$ ./unused_indexes.sh
usage: -d DATABASE (OR -a for all databases) [-f TABLENAMEFILTER]
# check all databases/tables
$ ./unused_indexes.sh -a
# check all tables in database "moviepilot"
$ ./unused_indexes.sh -d moviepilot

The output looks similar to

unused indexes in table moviepilot.stat_promo:
referrer_index mandant_index
---------------------------------------
unused indexes in table moviepilot.stat_promo_del:
c_i_m
---------------------------------------
unused indexes in table mp.comments:
comment_id meta

As we „sharded“ some large tables by splitting them we’d also like to be able to exclude tables:

# check all tables in all databases not matching "%mushes%"
$ ./unused_indexes.sh -a -f mushes
# check all tables in database "moviepilot" not matching "%mushes%"
$ ./unused_indexes.sh -d moviepilot -f mushes

Pitfalls

Please keep in mind that you should enable UserStats for a period long enough to grab statistics that show an average usage of your application and database setup. Also keep in mind that you might have indexes that are only used a few times when running scheduled jobs like importers and therefore might seem to be unused but are important anyway. Also consider flushing your statistics from time to time. As your application’s behaviour changes through deployments your index usage does, too. It might be a good idea to flush UserStats after every deployment.

The current version of unused_indexes.sh ignores all indexes that have been used at least once. It might be a good idea also checking indexes that have been used fewer than n times – just use the SELECT … ORDER BY from above.