编辑
2025-07-23
数仓
0
请注意,本文编写于 294 天前,最后修改于 149 天前,其中某些信息可能已经过时。

优化前:

sql
SELECT /*+ mapjoin(dd),skewjoin(a(url))*/ a.* except(url_tag, join_key), CASE WHEN cc.url IS NOT NULL THEN 'black_url' WHEN dd.url IS NOT NULL THEN 'black_url' ELSE '' END AS url_tag FROM ( select *, 1 AS join_key from dwd_crawler_urls_step1_di a where ymd = '${P_DATE}' AND url_tag != 'black_url' ) as LEFT JOIN black_urls2 cc ON replace(a.host, 'www.', '') = replace(cc.url, 'www.', '') LEFT JOIN black_urls1 dd ON INSTR(a.url, dd.url) > 0

优化后

sql
SELECT /*+ mapjoin(dd),skewjoin(a(url))*/ a.* except(url_tag, join_key), CASE WHEN cc.url IS NOT NULL THEN 'black_url' WHEN dd.url IS NOT NULL THEN 'black_url' ELSE '' END AS url_tag FROM ( select *, 1 AS join_key from dwd_crawler_urls_step1_di a where ymd = '${P_DATE}' AND url_tag != 'black_url' ) a LEFT JOIN black_urls1 dd ON INSTR(a.url, dd.url) > 0 LEFT JOIN black_urls2 cc ON replace(a.host, 'www.', '') = replace(cc.url, 'www.', '')

优化前执行时间

Image Image

优化后执行时间

Image Image

本文作者:ender

本文链接:

版权声明:本博客所有文章除特别声明外,均采用 BY-NC-SA 许可协议。转载请注明出处!