フレームワーク開発時代のSQLチューニング基礎（２）SQLのやっちゃだめ前編

http://mtl.recruit.co.jp/blog/2011/02/sqlsql1.htmlの転載です。

研究員の棚橋です。前回に引き続いてSQLのチューニング例を例示していきます。

３．SQLのやっちゃ駄目（前編）

　それでは、SQLクエリのチューニングでよくいわれる「やっちゃ駄目」とその対策を２回にわたって実際にみていきましょう。

「＊」ワイルドカードは使わない
範囲検索はBETWEENを使う
MAXやMIN関数でインデックスを使う
GROUP BYやORDER BYにインデックスを使う。
キーに演算はしない。

「＊」ワイルドカードは使わない

　SELECT文を記述するときに、取得する項目を指定するのが面倒で、「＊」（ワイルドカード）を指定する事がままあると思います。ワイルドカード指定は何が悪いのでしょうか？　実際にクエリとそのexplainをみてみましょう。user_levelが50のuser_idを取得するとします。

mysql> select * from users where user_level=50;

| 898987 | 0000898986 | USER898986 |         2 |         50 |     4997 | NULL         | 2011-01-27 00:00:00 |          0 | 
| 899128 | 0000899127 | USER899127 |         2 |         50 |     4959 | NULL         | 2011-01-27 00:00:00 |          0 | 
| 899850 | 0000899849 | USER899849 |         2 |         50 |     4987 | NULL         | 2011-01-27 00:00:00 |          0 | 
| 899954 | 0000899953 | USER899953 |         2 |         50 |     4958 | NULL         | 2011-01-27 00:00:00 |          0 | 
+--------+------------+------------+-----------+------------+----------+--------------+---------------------+------------+
2210 rows in set (0.02 sec)

mysql> explain select * from users where user_level=50;
+----+-------------+-------+------+-------------------------+-------------+---------+-------+------+-------+
| id | select_type | table | type | possible_keys           | key         | key_len | ref   | rows | Extra |
+----+-------------+-------+------+-------------------------+-------------+---------+-------+------+-------+
|  1 | SIMPLE      | users | ref  | users_idx03,users_idx04 | users_idx04 | 4       | const | 1980 |       | 
+----+-------------+-------+------+-------------------------+-------------+---------+-------+------+-------+
1 row in set (0.00 sec)

mysql> select user_id from users where user_level=50;

| 0000898986 | 
| 0000899127 | 
| 0000899849 | 
| 0000899953 | 
+------------+
2210 rows in set (0.00 sec)

mysql> explain select user_id from users where user_level=50;
+----+-------------+-------+------+-------------------------+-------------+---------+-------+------+-------+
| id | select_type | table | type | possible_keys           | key         | key_len | ref   | rows | Extra |
+----+-------------+-------+------+-------------------------+-------------+---------+-------+------+-------+
|  1 | SIMPLE      | users | ref  | users_idx03,users_idx04 | users_idx04 | 4       | const | 1980 |       | 
+----+-------------+-------+------+-------------------------+-------------+---------+-------+------+-------+
1 row in set (0.00 sec)

　確かに速くはなっていますが、そのときの状態によってばらつきがありますから、確実に速くなったとはいえない気がします。explainも全く同じですし、MySQLは同じ検索方法をとったといえるでしょう。では、ワイルドカードは遅いというのは嘘なのでしょうか？実は、ワイルドカードを使わないようにというのは検索速度より、メモリやデータキャッシュに関係します。

　ワイルドカードを使った場合、必要のないデータまでメモリ上に展開されますが、項目を指定した場合は必要なデータのみメモリ上に展開されます。メモリは高速ですが、無限の容量がある訳ではないので、効率よく使用するために項目指定をするようにしましょう。　
また、項目指定の方が明らかに速い検索も存在します。例をみてみましょう。

mysql> select * from sales_history where user_id=0000654321;
+---------+------------+---------+------------+--------------+---------------------+
| id      | user_id    | item_id | sales_type | complete_flg | created_at          |
+---------+------------+---------+------------+--------------+---------------------+
| 1518822 | 0000654321 |    3286 |          0 |            0 | 2010-04-30 00:00:00 | 
| 2333849 | 0000654321 |    5644 |          0 |            0 | 2011-01-12 00:00:00 | 
| 2516249 | 0000654321 |    3432 |          0 |            0 | 2010-03-18 00:00:00 | 
| 2631634 | 0000654321 |    5368 |          0 |            0 | 2009-10-02 00:00:00 | 
+---------+------------+---------+------------+--------------+---------------------+
4 rows in set (1.59 sec)

mysql> explain select * from sales_history where user_id=0000654321;
+----+-------------+---------------+------+-----------------------------------------+------+---------+------+---------+-------------+
| id | select_type | table         | type | possible_keys                           | key  | key_len | ref  | rows    | Extra       |
+----+-------------+---------------+------+-----------------------------------------+------+---------+------+---------+-------------+
|  1 | SIMPLE      | sales_history | ALL  | sales_history_idx01,sales_history_idx02 | NULL | NULL    | NULL | 3600189 | Using where | 
+----+-------------+---------------+------+-----------------------------------------+------+---------+------+---------+-------------+
1 row in set (0.00 sec)

mysql> select item_id from sales_history where user_id=0000654321;
+---------+
| item_id |
+---------+
|    3286 | 
|    3432 | 
|    5368 | 
|    5644 | 
+---------+
4 rows in set (0.87 sec)

mysql> explain select item_id from sales_history where user_id=0000654321;
+----+-------------+---------------+-------+-----------------------------------------+---------------------+---------+------+---------+--------------------------+
| id | select_type | table         | type  | possible_keys                           | key                 | key_len | ref  | rows    | Extra                    |
+----+-------------+---------------+-------+-----------------------------------------+---------------------+---------+------+---------+--------------------------+
|  1 | SIMPLE      | sales_history | index | sales_history_idx01,sales_history_idx02 | sales_history_idx01 | 38      | NULL | 3600189 | Using where; Using index | 
+----+-------------+---------------+-------+-----------------------------------------+---------------------+---------+------+---------+--------------------------+
1 row in set (0.00 sec)

確かに速くなっていますし、explainの結果にも差異があります。インデックスを使ったから早いんだろう？と思いますか？実はここにカラクリがあります。候補にあがったインデックスは二つ、idx01はuser_idとitem_idに張られたインデックス、idx02はuser_idに張られたインデックスです。両クエリとも条件はuser_idです、インデックスを使うのならidx02の方を使うのかと思いきや、MySQLはidx01を選択しました。
　カラクリはここにあります、idx01はuser_idとitem_idに張られている、条件と取得するデータがインデックスのキーとして存在しているのです。つまり、MySQLはidx01インデックスを検索するだけで、クエリの結果に必要な情報すべてを取得できるのです。

　通常の検索ではインデックスを使っても、最終的には直接データを参照して必要な情報を取得しますが、取得する情報がインデックスにすべて含まれているときは、インデックスを検索するだけでクエリが完結するのです。

　メモリの効率的な利用や、実際に高速化するクエリ。これだけでもワイルドカードを使わず、項目を指定する価値があります。出来るだけ項目を指定するようにしましょう。

範囲検索はBETWEENを使う

ある項目が値A以上値B以下のものを選択するという場合、例えば以下のように書きます。

select * from users where user_level>=10 and user_level<=50

この例だとusers表からuse r_levelが10以上50以下のデータを検索するという事になります。このような場合、以上以下の複数の条件を各よりSQLのBETWEENを使った方が高速になります。

mysql> select * from users where user_level >=10 and user_level<=50;

| 898987 | 0000898986 | USER898986 |         2 |         50 |     4997 | NULL         | 2011-01-27 00:00:00 |          0 | 
| 899128 | 0000899127 | USER899127 |         2 |         50 |     4959 | NULL         | 2011-01-27 00:00:00 |          0 | 
| 899850 | 0000899849 | USER899849 |         2 |         50 |     4987 | NULL         | 2011-01-27 00:00:00 |          0 | 
| 899954 | 0000899953 | USER899953 |         2 |         50 |     4958 | NULL         | 2011-01-27 00:00:00 |          0 | 
+--------+------------+------------+-----------+------------+----------+--------------+---------------------+------------+
92017 rows in set (0.63 sec)

mysql> explain select * from users where user_level >=10 and user_level<=50;
+----+-------------+-------+-------+-------------------------+-------------+---------+------+-------+-------------+
| id | select_type | table | type  | possible_keys           | key         | key_len | ref  | rows  | Extra       |
+----+-------------+-------+-------+-------------------------+-------------+---------+------+-------+-------------+
|  1 | SIMPLE      | users | range | users_idx03,users_idx04 | users_idx04 | 4       | NULL | 85832 | Using where | 
+----+-------------+-------+-------+-------------------------+-------------+---------+------+-------+-------------+
1 row in set (0.00 sec)

mysql> select * from users where user_level between 10 and 50;

| 898987 | 0000898986 | USER898986 |         2 |         50 |     4997 | NULL         | 2011-01-27 00:00:00 |          0 | 
| 899128 | 0000899127 | USER899127 |         2 |         50 |     4959 | NULL         | 2011-01-27 00:00:00 |          0 | 
| 899850 | 0000899849 | USER899849 |         2 |         50 |     4987 | NULL         | 2011-01-27 00:00:00 |          0 | 
| 899954 | 0000899953 | USER899953 |         2 |         50 |     4958 | NULL         | 2011-01-27 00:00:00 |          0 | 
+--------+------------+------------+-----------+------------+----------+--------------+---------------------+------------+
92017 rows in set (0.61 sec)

mysql> explain select * from users where user_level between 10 and 50;
+----+-------------+-------+-------+-------------------------+-------------+---------+------+-------+-------------+
| id | select_type | table | type  | possible_keys           | key         | key_len | ref  | rows  | Extra       |
+----+-------------+-------+-------+-------------------------+-------------+---------+------+-------+-------------+
|  1 | SIMPLE      | users | range | users_idx03,users_idx04 | users_idx04 | 4       | NULL | 85832 | Using where | 
+----+-------------+-------+-------+-------------------------+-------------+---------+------+-------+-------------+
1 row in set (0.00 sec)

　確かに速く？なっているけど、期待したほどではありません。explainの結果も同じですし？

　このような範囲検索の場合、BETWEENが速くなるというのは言い過ぎで、「速くなる可能性がある」が正解です。BETWEENでは検索時にインデックスが使えれば、インデックスを一回参照する事でデータを検索できます。条件を複数書いた場合は、データベースサーバによりますが、インデックスを複数回検索する事があるので、結果的に速くなる事があるというだけで、データベースがインデックスを一回だけ参照する場合は検索速度に違いがでないのです。（explainの結果からMySQLは内部的に同じ動作をしているようです。）

　ですが、BETWEENを使う意味はあります。それはSQLクエリがすっきりすることです。コードの可読性をあげる事はいい事です。

MAXやMIN関数でインデックスを使う

MAXやMIN関数を適用する項目にインデックスが設定されていると、高速に動作します。usersテーブルを例にとってみてみましょう。

mysql> drop index users_idx01 on users;
Query OK, 900000 rows affected (30.71 sec)
Records: 900000  Duplicates: 0  Warnings: 0

mysql> select max(user_id) from users;
+--------------+
| max(user_id) |
+--------------+
| 0000899999   | 
+--------------+
1 row in set (0.45 sec)

mysql> explain select max(user_id) from users;
+----+-------------+-------+------+---------------+------+---------+------+--------+-------+
| id | select_type | table | type | possible_keys | key  | key_len | ref  | rows   | Extra |
+----+-------------+-------+------+---------------+------+---------+------+--------+-------+
|  1 | SIMPLE      | users | ALL  | NULL          | NULL | NULL    | NULL | 900324 |       | 
+----+-------------+-------+------+---------------+------+---------+------+--------+-------+
1 row in set (0.00 sec)

mysql> create index users_idx01 on users(user_id);
Query OK, 900000 rows affected (28.35 sec)
Records: 900000  Duplicates: 0  Warnings: 0

mysql> select max(user_id) from users;
+--------------+
| max(user_id) |
+--------------+
| 0000899999   | 
+--------------+
1 row in set (0.00 sec)

mysql> explain select max(user_id) from users;
+----+-------------+-------+------+---------------+------+---------+------+------+------------------------------+
| id | select_type | table | type | possible_keys | key  | key_len | ref  | rows | Extra                        |
+----+-------------+-------+------+---------------+------+---------+------+------+------------------------------+
|  1 | SIMPLE      | NULL  | NULL | NULL          | NULL | NULL    | NULL | NULL | Select tables optimized away | 
+----+-------------+-------+------+---------------+------+---------+------+------+------------------------------+
1 row in set (0.00 sec)

　速くなってます。これは一目瞭然で、explainのExtraに出ているように「最大限の最適化検索をしました。」という事です。実際にはインデックスがないと、全件を検索して最大値（最小値）を求めますが、インデックスがあると、インデックスの最大値（最小値）を検索すればよいだけなので、非常に高速になったという訳です。

GROUP BYやORDER BYにインデックスを使う。

これも、他の手法と同じでインデックスを効率よく使うことが肝という例です。usersテーブルのuser_expにはインデックスがないので、この項目を利用して。

mysql> select * from users order by user_exp;

| 834414 | 0000834413 | USER834413 |         2 |        400 |    39999 | NULL         | 2011-01-27 00:00:00 |          0 | 
| 209947 | 0000209946 | USER209946 |         0 |        400 |    40000 | NULL         | 2011-01-27 00:00:00 |          0 | 
| 360382 | 0000360381 | USER360381 |         1 |        400 |    40000 | NULL         | 2011-01-27 00:00:00 |          0 | 
| 340522 | 0000340521 | USER340521 |         1 |        400 |    40000 | NULL         | 2011-01-27 00:00:00 |          0 | 
+--------+------------+------------+-----------+------------+----------+--------------+---------------------+------------+
900000 rows in set (2 min 9.29 sec)

mysql> explain select * from users order by user_exp;
+----+-------------+-------+------+---------------+------+---------+------+--------+----------------+
| id | select_type | table | type | possible_keys | key  | key_len | ref  | rows   | Extra          |
+----+-------------+-------+------+---------------+------+---------+------+--------+----------------+
|  1 | SIMPLE      | users | ALL  | NULL          | NULL | NULL    | NULL | 900153 | Using filesort | 
+----+-------------+-------+------+---------------+------+---------+------+--------+----------------+
1 row in set (0.00 sec)

mysql> create index users_idx05 on users(user_exp);
Query OK, 900000 rows affected (37.99 sec)
Records: 900000  Duplicates: 0  Warnings: 0

mysql> select * from users order by user_exp;

| 885921 | 0000885920 | USER885920 |         2 |        400 |    39999 | NULL         | 2011-01-27 00:00:00 |          0 | 
| 209947 | 0000209946 | USER209946 |         0 |        400 |    40000 | NULL         | 2011-01-27 00:00:00 |          0 | 
| 340522 | 0000340521 | USER340521 |         1 |        400 |    40000 | NULL         | 2011-01-27 00:00:00 |          0 | 
| 360382 | 0000360381 | USER360381 |         1 |        400 |    40000 | NULL         | 2011-01-27 00:00:00 |          0 | 
+--------+------------+------------+-----------+------------+----------+--------------+---------------------+------------+
900000 rows in set (6.73 sec)

mysql> explain select * from users order by user_exp;
+----+-------------+-------+-------+---------------+-------------+---------+------+--------+-------+
| id | select_type | table | type  | possible_keys | key         | key_len | ref  | rows   | Extra |
+----+-------------+-------+-------+---------------+-------------+---------+------+--------+-------+
|  1 | SIMPLE      | users | index | NULL          | users_idx05 | 4       | NULL | 900257 |       | 
+----+-------------+-------+-------+---------------+-------------+---------+------+--------+-------+
1 row in set (0.00 sec)

圧倒的に速くなりました。

mysql> select user_level,sum(user_exp) from users group by user_level;

|        397 |      89290895 | 
|        398 |      90428195 | 
|        399 |      89465111 | 
|        400 |      89806399 | 
+------------+---------------+
400 rows in set (0.66 sec)

mysql> explain select user_level,sum(user_exp) from users group by user_level;
+----+-------------+-------+------+---------------+------+---------+------+--------+---------------------------------+
| id | select_type | table | type | possible_keys | key  | key_len | ref  | rows   | Extra                           |
+----+-------------+-------+------+---------------+------+---------+------+--------+---------------------------------+
|  1 | SIMPLE      | users | ALL  | NULL          | NULL | NULL    | NULL | 900346 | Using temporary; Using filesort | 
+----+-------------+-------+------+---------------+------+---------+------+--------+---------------------------------+
1 row in set (0.00 sec)

mysql> create index users_idx06 on users(user_level,user_exp);
Query OK, 900000 rows affected (33.70 sec)
Records: 900000  Duplicates: 0  Warnings: 0

mysql> select user_level,sum(user_exp) from users force index(users_idx06) group by user_level;

|        397 |      89290895 | 
|        398 |      90428195 | 
|        399 |      89465111 | 
|        400 |      89806399 | 
+------------+---------------+
400 rows in set (0.11 sec)

mysql> explain select user_level,sum(user_exp) from users force index(users_idx06) group by user_level;
+----+-------------+-------+-------+---------------+-------------+---------+------+--------+-------------+
| id | select_type | table | type  | possible_keys | key         | key_len | ref  | rows   | Extra       |
+----+-------------+-------+-------+---------------+-------------+---------+------+--------+-------------+
|  1 | SIMPLE      | users | index | NULL          | users_idx06 | 8       | NULL | 900162 | Using index | 
+----+-------------+-------+-------+---------------+-------------+---------+------+--------+-------------+
1 row in set (0.00 sec)

　ORDER BYも速くなりました。ただし、インデックスを利用しても遅くなる場合があります。これは、意図したインデックスが利用されない場合です。そのため上記の例ではFROM句でFORCE INDEXを記述しています。

　今回の例のように多数のインデックスが作成されていると、意図しないインデックスの使用によりかえって遅くなる事があります。インデックスは便利ですが、よく考えて作成するようにしましょう。

キーに演算はしない。

条件に指定した項目でインデックスが作成されている場合、その項目に関数や演算を用いると、せっかく作成したインデックスが使われません。例えば以下のような場合です。

mysql> create index sales_history_idx01 on sales_history(created_at);
Query OK, 3600000 rows affected (1 min 7.09 sec)
Records: 3600000  Duplicates: 0  Warnings: 0

mysql> select * from sales_history where date_format(created_at,"%Y%m%d")="20101012";

| 3598752 | 0000370765 |    2354 |          0 |            0 | 2010-10-12 00:00:00 | 
| 3598958 | 0000863257 |    4547 |          0 |            0 | 2010-10-12 00:00:00 | 
| 3599063 | 0000669464 |    3283 |          0 |            0 | 2010-10-12 00:00:00 | 
| 3599886 | 0000702869 |    3655 |          0 |            0 | 2010-10-12 00:00:00 | 
+---------+------------+---------+------------+--------------+---------------------+
7237 rows in set (3.43 sec)


mysql> explain select * from sales_history where date_format(created_at,"%Y%m%d")="20101012";
+----+-------------+---------------+------+---------------+------+---------+------+---------+-------------+
| id | select_type | table         | type | possible_keys | key  | key_len | ref  | rows    | Extra       |
+----+-------------+---------------+------+---------------+------+---------+------+---------+-------------+
|  1 | SIMPLE      | sales_history | ALL  | NULL          | NULL | NULL    | NULL | 3600264 | Using where | 
+----+-------------+---------------+------+---------------+------+---------+------+---------+-------------+
1 row in set (0.00 sec)

mysql> select * from sales_history where created_at between "2010-10-12 00:00:00" and "2010-10-12 23:59:59";

| 3598752 | 0000370765 |    2354 |          0 |            0 | 2010-10-12 00:00:00 | 
| 3598958 | 0000863257 |    4547 |          0 |            0 | 2010-10-12 00:00:00 | 
| 3599063 | 0000669464 |    3283 |          0 |            0 | 2010-10-12 00:00:00 | 
| 3599886 | 0000702869 |    3655 |          0 |            0 | 2010-10-12 00:00:00 | 
+---------+------------+---------+------------+--------------+---------------------+
7237 rows in set (0.05 sec)

mysql> explain select * from sales_history where created_at between "2010-10-12 00:00:00" and "2010-10-12 23:59:59";
+----+-------------+---------------+-------+---------------------+---------------------+---------+------+------+-------------+
| id | select_type | table         | type  | possible_keys       | key                 | key_len | ref  | rows | Extra       |
+----+-------------+---------------+-------+---------------------+---------------------+---------+------+------+-------------+
|  1 | SIMPLE      | sales_history | range | sales_history_idx01 | sales_history_idx01 | 9       | NULL | 8030 | Using where | 
+----+-------------+---------------+-------+---------------------+---------------------+---------+------+------+-------------+
1 row in set (0.00 sec)

　関数を使った場合は3.43秒、使わなかった場合は0.05秒。差は歴然ですね。explainの結果も関数を使った場合は全表検索、使わない場合はインデックス検索、インデックスの威力がわかる好例です。せっかく作ったインデクスを有効に活用するために、条件の指定方法に注意を払いましょう。

最終回の次回は、さらに「やっちゃダメ」を紹介します。

hiroohiのメモ

はてななのでITやスタートアップ周りの話（ほとんどが自分への備忘録だけど）を書いています。