MySQL 知识-- 相关子查询优化

本文最后更新于:10 天前

相关子查询的 MySQL 错误优化(MySQL 5.7 以后不适用)

问题

对于不相关子查询来说,我们会认为先执行子查询,在执行外层查询,但是实际不会: MySQL 会将相关的外层表压缩到子查询中,它认为这样可以更高效地查找到数据行

例如:

select select * from test.tabname where id in(select id from test.tabname2 where name='name');

# MySQL 不会这样执行:
select group_concat(id) from test.tabname2 where name='name';
# 内层查询结果:1,3,5,7,9,11,13,15,17,1
select * from test.tabname where id in(1,3,5,7,9,11,13,15,17,19);

# MySQL 实际执行:
select * from test.tabname 
where exists(
    select * from test.tabname2 
    where tabname.id=tabname2.id 
    and tabname2.name='name');
# 这时,子查询先根据 id 来关联外部表 tabname ,通过 explain 我们可以看到子查询是一个相关子查询(DEPENDENT SUBQUERY)

explain select * from test.tabname where id in(select id from test.tabname2 where name='name');
# +----+--------------------+----------+-----------------+---------------+---------+---------+------+------+-------------+
# | id | select_type        | table    | type            | possible_keys | key     | key_len | ref  | rows | Extra       |
# +----+--------------------+----------+-----------------+---------------+---------+---------+------+------+-------------+
# |  1 | PRIMARY            | tabname  | ALL             | NULL          | NULL    | NULL    | NULL |   30 | Using where | 
# |  2 | DEPENDENT SUBQUERY | tabname2 | unique_subquery | PRIMARY       | PRIMARY | 4       | func |    1 | Using where | 
# +----+--------------------+----------+-----------------+---------------+---------+---------+------+------+-------------+

explain select * from test.tabname
where exists(
    select * from test.tabname2
    where tabname.id=tabname2.id and tabname2.name='name'
    );
# +----+--------------------+----------+--------+---------------+---------+---------+-----------------+------+-------------+
# | id | select_type        | table    | type   | possible_keys | key     | key_len | ref             | rows | Extra       |
# +----+--------------------+----------+--------+---------------+---------+---------+-----------------+------+-------------+
# |  1 | PRIMARY            | tabname  | ALL    | NULL          | NULL    | NULL    | NULL            |   30 | Using where | 
# |  2 | DEPENDENT SUBQUERY | tabname2 | eq_ref | PRIMARY       | PRIMARY | 4       | test.tabname.id |    1 | Using where | 
# +----+--------------------+----------+--------+---------------+---------+---------+-----------------+------+-------------+

# 然而,MySQL 的优化把不相关子查询变成了相关子查询,效率更低: 先扫描外层表,再逐个匹配内层表,如果外层表很大,这样效率降低更加明显

解决方案:使用连接查询

explain select tabname. * from test.tabname inner join test.tabname2 using(id) where tabname2.name='name';

# +----+-------------+----------+--------+---------------+---------+---------+------------------+------+-------------+
# | id | select_type | table    | type   | possible_keys | key     | key_len | ref              | rows | Extra       |
# +----+-------------+----------+--------+---------------+---------+---------+------------------+------+-------------+
# |  1 | SIMPLE      | tabname2 | ALL    | PRIMARY       | NULL    | NULL    | NULL             |   20 | Using where | 
# |  1 | SIMPLE      | tabname  | eq_ref | PRIMARY       | PRIMARY | 4       | test.tabname2.id |    1 |             | 
# +----+-------------+----------+--------+---------------+---------+---------+------------------+------+-------------+

# 这次可以看到,select_type 变为两个简单查询simple
# 首先访问的是 tabname2,因为表 tabname2 的记录比较少,只需该表全表扫描,再查询子查询