Wednesday, March 28, 2012

Impact of index only scans in mysql

Introduction

Index only scans is an access method when information is retrieved from the table using only information in the index tree without having to do an additional seek to read the actual row.

It's a well known concept that is implemented in mysql, Oracle, Pg 9.2 has patches.

So I've decided to see for myself how big impact can be.

Goal

We have a big table 'CGM' with g, m and disabled columns. Users table with id and name.

We want to find all users who are member of a group X:

    SELECT u.* FROM CGM JOIN Users u ON u.id = CGM.m
    WHERE CGM.g = X

Sometimes we add 'disabled = 0' condition to the query.

I wrote a script in perl that performes profiling.

First results

First results looked like this:

                    Rate            g   g,disabled          g,m g,m,disabled
    g            13307/s           --          -1%          -3%          -3%
    g,disabled   13474/s           1%           --          -2%          -2%
    g,m          13714/s           3%           2%           --          -0%
    g,m,disabled 13714/s           3%           2%          -0%           --

Not impressive, but it's clear that there is no I/O at all and script is testing CPU intensive task.

Impressive results

When I/O comes to scene situation is completly different. To achive this I've lowered InnoDB buffer pool size to its minimum (1M)

                    Rate            g   g,disabled g,m,disabled          g,m
    g             5531/s           --          -0%         -57%         -58%
    g,disabled    5547/s           0%           --         -57%         -57%
    g,m,disabled 12799/s         131%         131%           --          -2%
    g,m          13033/s         136%         135%           2%           --

In above test we don't check disabled column and two indexes are suitable. Next test adds condition on disabled column.

                    Rate            g          g,m   g,disabled g,m,disabled
    g             5531/s           --          -0%          -5%         -57%
    g,m           5547/s           0%           --          -4%         -57%
    g,disabled    5802/s           5%           5%           --         -55%
    g,m,disabled 12800/s         131%         131%         121%           --

Memory considerations

To cover with indexes all possible access patterns on CGM table we will need two indexes: (g, m, disabled) and (m, g, disabled). The latter required when query starts from users table and fetches groups. This means that you need additional space.

Let's asume we use 4 bytes long integers for all three columns in CGM. Row size is 12 bytes. The following situations are possible:

From the list, one can assume that covering index needs 1.8 times more memory than minimal indexing. However, mysql don't need to access table, so 36MB turns into 24MB. It's 20% more than minimal.

Conclusions

As you could see covering indexes improve performance when data doesn't fit into memory. Otherwise benefit is close to zero. Longer indexes need more memory, but this additional footprint can be lowered.

Recomendations would be. Don't bother as long as data fits into memory. Create only required indexes at this point, smaller indexes would give you more room to grow until you hit memory limit. If you decided to use covering idexes then cover all possible access patterns or you will need additional space in buffer pool for table data.

That's it.