Contents

Mysql hash join

hash join

mysql์—์„œ join์„ ์‹คํ–‰ํ•˜๋Š” ์œ ์ผํ•œ ์•Œ๊ณ ๋ฆฌ์ฆ˜์€ BNL ์ด์—ˆ๋‹ค. BNL์€ ์™ธ๋ถ€ ํ…Œ์ด๋ธ”์„ ์Šค์บ”ํ•˜๋ฉด์„œ ์ผ์ • ํฌ๊ธฐ์˜ ๋ธ”๋ก(๋ฉ”๋ชจ๋ฆฌ ๋ฒ„ํผ)๋ฅผ ์ฑ„์šธ ๋•Œ๊นŒ์ง€ ํ–‰์„ ์ฝ๊ณ , ๋‚ด๋ถ€ ํ…Œ์ด๋ธ”์„ ์Šค์บ”ํ•˜๋ฉด์„œ ์™ธ๋ถ€ ํ…Œ์ด๋ธ”์˜ ๋ฉ”๋ชจ๋ฆฌ ๋ฒ„ํผ์— ์žˆ๋Š” ํ–‰๊ณผ ์ผ์น˜ํ•˜๋Š” ์ง€ ํ™•์ธํ•˜๊ณ  ๊ฒฐํ•ฉํ•œ ํ›„ ๊ฒฐ๊ณผ์„ธํŠธ์— ์ถ”๊ฐ€ํ•˜๋Š” ๋ฐฉ์‹์ด๋‹ค.

๊ทธ ์ดํ›„๋กœ๋Š”, hash join์ด๋ผ๋Š” BNL ๋ณด๋‹ค ๋” ํšจ์œจ์ ์œผ๋กœ ์ฟผ๋ฆฌ๋ฅผ ์‹คํ–‰ํ•˜๊ฒŒ ๋œ๋‹ค. hash join์€ ๋‘ ์ž…๋ ฅ ์‚ฌ์ด์—์„œ ์ผ์น˜ํ•˜๋Š” ํ–‰์„ ์ฐพ๊ธฐ ์œ„ํ•ด ํ•ด์‰ฌ ํ…Œ์ด๋ธ”์„ ์ด์šฉํ•˜๋Š” ๋ฐฉ๋ฒ•์ด๋‹ค.

1
2
3
4
SELECT
  given_name, country_name
FROM
  persons JOIN countries ON persons.country_id = countries.country_id;

build phase

์œ„์™€ ๊ฐ™์€ ์ฟผ๋ฆฌ๊ฐ€ ์žˆ๋‹ค๊ณ  ๊ฐ€์ •ํ•ด๋ณด์ž. build phase ์—์„œ in-memory hash table์˜ key๋ฅผ ๋งŒ๋“ค๊ธฐ ์œ„ํ•ด ๋‘ ํ…Œ์ด๋ธ” ์ค‘ ์ž‘์€ ๊ฒƒ(๋ฐ”์ดํŠธ ์ˆ˜๊ฐ€)์„ ์„ ํƒํ•ด join ์กฐ๊ฑด์œผ๋กœ ๊ฑธ๋ฆฐ column์„ ํ•ด์‰ฌ ํ…Œ์ด๋ธ”์˜ key๋กœ ์‚ฌ์šฉํ•˜๊ณ  ํ•ด์‰ฌํ…Œ์ด๋ธ”์˜ ๋ชจ๋“  ํ–‰์„ ์ €์žฅํ•œ๋‹ค.

probe phase

์ด์ œ ๋‚˜๋จธ์ง€ ๋‚จ์€ persons ํ…Œ์ด๋ธ”์˜ country_id๋ฅผ ํ•ด์‰ฌ ํ…Œ์ด๋ธ”์˜ key์™€ ๋งค์นญํ•ด์„œ ์ƒ์ˆ˜์‹œ๊ฐ„์— ๋ฐ์ดํ„ฐ๋ฅผ ์กฐํšŒํ•˜์—ฌ client์— ๋„˜๊ฒจ์ฃผ๊ฒŒ ๋œ๋‹ค.

์œ„์™€ ๊ฐ™์€ ์ž‘์—…์ด ์ •ํ•ด์ง„ ๋ฉ”๋ชจ๋ฆฌ ๋‚ด(join_buffer_size)์—์„œ ์ด๋ฃจ์–ด ์งˆ ์ˆ˜ ์žˆ๋‹ค๋ฉด, hash join์€ ๋งค์šฐ ์ž˜ ์ž‘๋™ํ•œ๋‹ค. ํ•˜์ง€๋งŒ ๋นŒ๋“œ ์ž…๋ ฅ์ด ๋ฉ”๋ชจ๋ฆฌ ์ œํ•œ๋ณด๋‹ค ํฌ๋‹ค๋ฉด ๋””์Šคํฌ๋กœ ๋„˜์น˜๊ฒŒ ๋œ๋‹ค.

spill to disk

๋””์Šคํฌ ๊ธฐ๋ฐ˜์˜ ์ž‘์—…์œผ๋กœ ๋„˜์น˜๊ฒŒ ๋˜๋ฉด ์„ฑ๋Šฅ์— ๋ถ€์ •์ ์ธ ์˜ํ–ฅ์„ ๋ผ์นœ๋‹ค. ์˜ค๋กœ์ง ๋””์Šคํฌ์— ์“ฐ๊ฒŒ ๋˜๋ฉด ๋” ๋‚˜์œ ์„ฑ๋Šฅ์ €ํ•˜๋ฅผ ์ดˆ๋ž˜ํ•˜๊ฒŒ ๋˜๋ฏ€๋กœ, chunk ๋‹จ์œ„๋กœ ์ž˜๋ผ์„œ ๋””์Šคํฌ์— ์ž…๋ ฅํ•œ๋‹ค. ์ด ๋•Œ๋„ ํ•ด์‰ฌ ํ•จ์ˆ˜๊ฐ€ ์‚ฌ์šฉ๋˜์–ด ์ตœ์•…์˜ ์„ฑ๋Šฅ์ €ํ•˜๋ฅผ ๋ง‰๋Š”๋‹ค.

์ด๋ ‡๊ฒŒ ๋„˜์น˜๋Š” ๋ฐ์ดํ„ฐ๋Š” chunk์— ์“ฐ์ด๋Š” ์ด์œ ๋Š” dist IO๋ฅผ ์ตœ๋Œ€ํ•œ ํ”ผํ•˜๊ธฐ ์œ„ํ•ด์„œ์ด๋‹ค. ์ถ”๊ฐ€๋กœ, probe phase์—์„œ ํ•ด์‰ฌ ํ…Œ์ด๋ธ”์—์„œ ์ผ์น˜ํ•˜๋Š” ๋ชจ๋“  ํ–‰์„ ์ฐพ๋Š” ๊ฒƒ ๋ฟ ์•„๋‹ˆ๋ผ ๋””์Šคํฌ์— ์“ฐ์—ฌ์ง„ ํ–‰์—์„œ๋„ ํ•ด์‰ฌํ•จ์ˆ˜๋ฅผ ์ด์šฉํ•ด ์ผ์น˜ํ•˜๋Š” ํ–‰์„ ์ฐพ๋Š”๋‹ค.

hash join์€ ์–ธ์ œ ์‚ฌ์šฉํ•  ์ˆ˜ ์žˆ๋Š”์ง€ ?

mysql 8.0.18 ์ด์ „์—์„œ๋Š” hash join์„ ์‚ฌ์šฉํ•˜๊ธฐ ์œ„ํ•ด์„ , optimizer_switch flag๋ฅผ ๊ฑด๋“œ๋ ค์„œ hash join ์„ ์‚ฌ์šฉํ•˜๋„๋ก ํ•ด์•ผํ–ˆ๋‹ค. ๊ทธ ์ดํ›„ ๋ฒ„์ „์—์„œ๋Š” ๊ธฐ๋ณธ์ ์œผ๋กœ equi-join์— ๋Œ€ํ•ด hash join์„ ์‚ฌ์šฉํ•œ๋‹ค. ๊ผญ equi-join์ด ์•„๋‹ˆ๋”๋ผ๋„(cartesian์ด๋ผ๋„) hash join์„ ์‚ฌ์šฉํ•œ๋‹ค.

๋‹ค์Œ์€ hash join์„ ์‚ฌ์šฉํ•˜๋Š” ์˜ˆ์ œ์ด๋‹ค.

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
# inner non-equi join

mysql> EXPLAIN FORMAT=TREE SELECT * FROM t1 JOIN t2 ON t1.c1 < t2.c1\G
*************************** 1. row ***************************
EXPLAIN: -> Filter: (t1.c1 < t2.c1)  (cost=4.70 rows=12)
    -> Inner hash join (no condition)  (cost=4.70 rows=12)
        -> Table scan on t2  (cost=0.08 rows=6)
        -> Hash
            -> Table scan on t1  (cost=0.85 rows=6)

# Semijoin:

mysql> EXPLAIN FORMAT=TREE SELECT * FROM t1
    ->     WHERE t1.c1 IN (SELECT t2.c2 FROM t2)\G
*************************** 1. row ***************************
EXPLAIN: -> Hash semijoin (t2.c2 = t1.c1)  (cost=0.70 rows=1)
    -> Table scan on t1  (cost=0.35 rows=1)
    -> Hash
        -> Table scan on t2  (cost=0.35 rows=1)

# Antijoin

mysql> EXPLAIN FORMAT=TREE SELECT * FROM t2
    ->     WHERE NOT EXISTS (SELECT * FROM t1 WHERE t1.c1 = t2.c1)\G
*************************** 1. row ***************************
EXPLAIN: -> Hash antijoin (t1.c1 = t2.c1)  (cost=0.70 rows=1)
    -> Table scan on t2  (cost=0.35 rows=1)
    -> Hash
        -> Table scan on t1  (cost=0.35 rows=1)

1 row in set, 1 warning (0.00 sec)

mysql> SHOW WARNINGS\G
*************************** 1. row ***************************
  Level: Note
   Code: 1276
Message: Field or reference 't3.t2.c1' of SELECT #2 was resolved in SELECT #1

# Left outer join:
mysql> EXPLAIN FORMAT=TREE SELECT * FROM t1 LEFT JOIN t2 ON t1.c1 = t2.c1\G
*************************** 1. row ***************************
EXPLAIN: -> Left hash join (t2.c1 = t1.c1)  (cost=0.70 rows=1)
    -> Table scan on t1  (cost=0.35 rows=1)
    -> Hash
        -> Table scan on t2  (cost=0.35 rows=1)