开发人员掌握了这个技术,SQL效率会有几百倍的性能提升

老虎刘

发布于 2022-06-22 18:00:59

2780

发布于 2022-06-22 18:00:59

文章被收录于专栏：老虎刘谈oracle性能优化

完成相同业务逻辑的SQL,写法不同,执行效率可能会有几百上千倍的差距,今天我们通过几个案例来说明一下:

case1 :

原sql代码如下(执行时间1.2分钟):

with holder_clear_temp as
  ( select distinct t.principal_holder_account
    from ch_member.holder_account s, ch_member.clear_agency_relation t
    where s.holder_account = t.principal_holder_account      and s.holder_account_status = '1'
      and t.agency_status = '1'      and t.agency_type in ('1','2')
      and t.agency_holder_account = :1      and t.principal_holder_account != :2
  )  , holder_settle_temp as
  ( select t.principal_holder_account, t.product_category
    from ch_member.holder_account s, ch_member.settle_agency_rel t
    where s.holder_account = t.principal_holder_account
      and s.holder_account_status = '1'      and t.agency_status = '1'
      and t.agency_type in ('1','2')      and t.agency_holder_account = :3
      and t.principal_holder_account != :4
      and not exists
      ( select 1 from holder_clear_temp c where c.principal_holder_account=t.principal_holder_account  )
  ) , temp as
  ( select jour.BALANCE_CHG_SN
    from ch_his.HIS_ACCOUNT_CHG_BALANCE_JOUR jour
    inner join ch_stock.product_info info
      on (info.product_code = jour.product_code
        or 
        (info.pub_product_code = jour.product_code and info.has_distribution_flag='1'))
    where 1=1
      and (exists
      ( select 1 from holder_clear_temp c where jour.holder_account=c.principal_holder_account )
      or exists
      ( select 1 from holder_settle_temp s where jour.holder_account=s.principal_holder_account
          and info.product_Category =s.product_category
      ) )
      and jour.init_date >= :5  and jour.init_date <= :6
  union all
    select jour.BALANCE_CHG_SN
    from ch_stock.ACCOUNT_CHG_BALANCE_JOUR jour
    inner join ch_stock.product_info info
      on (info.product_code = jour.product_code
        or (info.pub_product_code = jour.product_code and info.has_distribution_flag='1'))
    where 1=1
      and ( exists
      ( select 1 from holder_clear_temp c where jour.holder_account=c.principal_holder_account )
      or exists
      ( select 1 from holder_settle_temp s
        where jour.holder_account=s.principal_holder_account and info.product_Category =s.product_category
      ) )
      and jour.init_date >= :7 and jour.init_date <= :8
  )
select count(1) from temp;

这个sql相对复杂一点,我们通过sql monitor显示的执行计划可以明显的看出瓶颈所在: 因为谓词条件使用了or 连接两个exists子查询,所以只能使用filter操作,而主查询返回的记录数又比较多,就导致sql执行时间比较长. 根据sql写法和执行计划反馈的信息,我们就可以通过改写来优化这个SQL.

sql monitor显示(部分):

建议有兴趣的朋友可以下载sql monitor文件,调动一下自己的思维,先自己分析一下该如何优化这个SQL . 我之前分享到微信群和QQ群,有一个群友已经找到了优化的核心所在,尽管还有一些小瑕疵.

注:sql monitor文件可以在QQ群16778072 下载. 这个sql monitor的"Plan Statistics"页面显示的执行计划有点不太正确,但是不影响大局."Plan"页面显示的是正常的.

改写后的SQL:

with holder_clear_temp as
  ( select distinct t.principal_holder_account
    from ch_member.holder_account s, ch_member.clear_agency_relation t
    where s.holder_account = t.principal_holder_account      and s.holder_account_status = '1'
      and t.agency_status = '1'      and t.agency_type in ('1','2')
      and t.agency_holder_account = '2110348'      and t.principal_holder_account != '2110348'
  )  , 
   holder_settle_temp as
  ( select t.principal_holder_account, t.product_category
    from ch_member.holder_account s, ch_member.settle_agency_rel t
    where s.holder_account = t.principal_holder_account
      and s.holder_account_status = '1'      and t.agency_status = '1'
      and t.agency_type in ('1','2')      and t.agency_holder_account = '2110348'
      and t.principal_holder_account != '2110348'
      and not exists
      ( select 1 from holder_clear_temp c where c.principal_holder_account=t.principal_holder_account  )
  ) , 
  exists_temp as
  (select principal_holder_account,'xx' as product_category  from holder_clear_temp
   union
   select principal_holder_account,product_category          from holder_settle_temp
  ),
  temp as
 ( 
  select jour.BALANCE_CHG_SN
    from ch_his.HIS_ACCOUNT_CHG_BALANCE_JOUR jour,ch_stock.product_info info,
         exists_temp uuu
    where info.product_code = jour.product_code
       and jour.holder_account=uuu.principal_holder_account 
       and (uuu.product_category='xx' or info.product_Category =uuu.product_category)
      and jour.init_date >= 20190205      
      and jour.init_date <= 20190505
 union all
   select jour.BALANCE_CHG_SN
    from ch_his.HIS_ACCOUNT_CHG_BALANCE_JOUR jour,ch_stock.product_info info,
         exists_temp uuu 
    where (info.pub_product_code = jour.product_code and info.has_distribution_flag='1')
       and jour.holder_account=uuu.principal_holder_account 
       and (uuu.product_category='xx' or info.product_Category =uuu.product_category)
      and jour.init_date >= 20190205      
      and jour.init_date <= 20190505
      and lnnvl(info.product_code = jour.product_code)
-------------------------------------------------------------------------------------      
  union all
    select jour.BALANCE_CHG_SN
    from ch_stock.ACCOUNT_CHG_BALANCE_JOUR jour
    inner join ch_stock.product_info info
      on (info.product_code = jour.product_code or (info.pub_product_code = jour.product_code and info.has_distribution_flag='1'))
    where 1=1
      and ( exists
          ( select 1 from holder_clear_temp c  where jour.holder_account=c.principal_holder_account )
           or 
           exists
          ( select 1 from holder_settle_temp s where jour.holder_account=s.principal_holder_account and info.product_Category =s.product_category
          ) )
      and jour.init_date >= 20190205 
      and jour.init_date <= 20190505
  )
select count(1) from temp;

改写效果:

经过改写后,原来执行1.2分钟的SQL,现场测试只需要耗时0.6秒(这个测试只改了耗时较长union all的上半部分,如果下半部分也做相同改写,预计最终执行时间不到0.3秒,性能提升达200多倍).

改写说明:

原sql用or 连接的两个exists ,存在相同的关联条件,我们通过一个union(注意不是union all)把它合并在一起,通过CTE(with as)定义为exists_temp ,然后就可以与主查询的两个表做关联,而不是做filter. 因为主查询两个表的关联关系也存在一个or,优化器必然会使用concat,那样就会拆分成4段做union all. 我只希望主查询做concat,就人工做了concat,将主查询拆分成了union all.

注:网上很多sql优化专家在对or 改写的时候,基本上全部改成了union,这是不等价的改写方法,标准改写请参考一下本例的union all配合lnnvl的写法.

case2:

原SQL:

SELECT A.FLOW_INID, A.CURR_STEP, A.FLOW_NAME, A.FINS_NAME, 
      TO_CHAR(A.INST_CRDA, 'YYYY-MM-DD HH24:MI:SS') INST_CRDA, 'manual_rel' RELA_TYPE
FROM FLOW_INST A
WHERE EXISTS
  (SELECT 1
  FROM FLOW_RELATE_INFO B
  WHERE A.FLOW_INID = B.RELATE_FLOW_INID    AND B.FLOW_INID = :1
  )
  OR 
    EXISTS
  (SELECT 1
  FROM FLOW_RELATE_INFO B
  WHERE A.FLOW_INID = B.FLOW_INID    AND B.RELATE_FLOW_INID = :2
  )
UNION ALL
SELECT S.FLOW_INID, S.CURR_STEP, S.FLOW_NAME
, S.FINS_NAME, TO_CHAR(S.INST_CRDA, 'YYYY-MM-DD HH24:MI:SS') INST_CRDA, 'auto_rel' RELA_TYPE
FROM FLOW_INST S, 
(SELECT FI.FLOW_INID, FI.PARA_INID
  FROM FLOW_INST FI
  WHERE FI.FLOW_INID = :3
) F
WHERE ((F.FLOW_INID = S.PARA_INID  AND S.IF_SUB = 1) OR F.PARA_INID = S.FLOW_INID )  AND S.DEL_FLAG = 0;

执行计划:
----------------------------------------------------------------------------------------------------------------------------------------------
| Id  | Operation                      | Name              | Starts | E-Rows |E-Bytes| Cost (%CPU)| E-Time   | A-Rows |   A-Time   | Buffers |
----------------------------------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT               |                   |      1 |        |       |   194K(100)|          |      0 |00:00:01.83 |     352K|
|   1 |  UNION-ALL                     |                   |      1 |        |       |            |          |      0 |00:00:01.83 |     352K|
|*  2 |   FILTER                       |                   |      1 |        |       |            |          |      0 |00:00:01.83 |     352K|
|   3 |    TABLE ACCESS FULL           | FLOW_INST         |      1 |  57564 |  5115K|  1094   (1)| 00:00:14 |  58003 |00:00:00.05 |    4156 |
|*  4 |    TABLE ACCESS FULL           | FLOW_RELATE_INFO  |  58003 |      1 |    11 |     4   (0)| 00:00:01 |      0 |00:00:01.75 |     348K|
|   5 |   CONCATENATION                |                   |      1 |        |       |            |          |      0 |00:00:00.01 |       8 |
|   6 |    NESTED LOOPS                |                   |      1 |      1 |   106 |     3   (0)| 00:00:01 |      0 |00:00:00.01 |       3 |
|*  7 |     TABLE ACCESS BY INDEX ROWID| FLOW_INST         |      1 |      1 |     8 |     2   (0)| 00:00:01 |      0 |00:00:00.01 |       3 |
|*  8 |      INDEX UNIQUE SCAN         | PK_FLOW_INST      |      1 |      1 |       |     1   (0)| 00:00:01 |      1 |00:00:00.01 |       2 |
|*  9 |     TABLE ACCESS BY INDEX ROWID| FLOW_INST         |      0 |      1 |    98 |     1   (0)| 00:00:01 |      0 |00:00:00.01 |       0 |
|* 10 |      INDEX UNIQUE SCAN         | PK_FLOW_INST      |      0 |      1 |       |     0   (0)|          |      0 |00:00:00.01 |       0 |
|  11 |    NESTED LOOPS                |                   |      1 |      1 |   106 |     4   (0)| 00:00:01 |      0 |00:00:00.01 |       5 |
|  12 |     TABLE ACCESS BY INDEX ROWID| FLOW_INST         |      1 |      1 |     8 |     2   (0)| 00:00:01 |      1 |00:00:00.01 |       3 |
|* 13 |      INDEX UNIQUE SCAN         | PK_FLOW_INST      |      1 |      1 |       |     1   (0)| 00:00:01 |      1 |00:00:00.01 |       2 |
|* 14 |     TABLE ACCESS BY INDEX ROWID| FLOW_INST         |      1 |      1 |    98 |     2   (0)| 00:00:01 |      0 |00:00:00.01 |       2 |
|* 15 |      INDEX RANGE SCAN          | IDX_SUBFLOW_CHECK |      1 |      1 |       |     1   (0)| 00:00:01 |      0 |00:00:00.01 |       2 |
----------------------------------------------------------------------------------------------------------------------------------------------

分析:

这是一个OA系统的业务SQL,执行时间接近2秒. FLOW_RELATE_INFO 表只有480条记录,8 blocks.

在不改写SQL的情况下,我们可以通过创建FLOW_RELATE_INFO表上 (FLOW_INID,RELATE_FLOW_INID)两字段联合索引,将sql执行效率提高到0.37秒(OA系统相对可以接受的一个响应时间):

这个创建小表索引提升效率的方法,也是对那些小表不需要创建索引说法的一个反证.

如果我们改写这个sql,可以不需要创建索引,就能得到一个更好的性能提升:不到0.01秒.

SQL改写结果如下:

SELECT A.FLOW_INID, A.CURR_STEP, A.FLOW_NAME
, A.FINS_NAME, TO_CHAR(A.INST_CRDA, 'YYYY-MM-DD HH24:MI:SS') INST_CRDA, 'manual_rel' RELA_TYPE
FROM FLOW_INST A
WHERE FLOW_INID in
  (SELECT RELATE_FLOW_INID
  FROM FLOW_RELATE_INFO
  WHERE FLOW_INID = '77913'
  union 
  SELECT FLOW_INID
  FROM FLOW_RELATE_INFO B
  WHERE RELATE_FLOW_INID= '77913'
  )
UNION ALL
SELECT S.FLOW_INID, S.CURR_STEP, S.FLOW_NAME
, S.FINS_NAME, TO_CHAR(S.INST_CRDA, 'YYYY-MM-DD HH24:MI:SS') INST_CRDA, 'auto_rel' RELA_TYPE
FROM FLOW_INST S, 
(SELECT FI.FLOW_INID, FI.PARA_INID
  FROM FLOW_INST FI
  WHERE FI.FLOW_INID = '77913'
) F
WHERE ((F.FLOW_INID = S.PARA_INID AND S.IF_SUB = 1) OR F.PARA_INID = S.FLOW_INID )
  AND S.DEL_FLAG = 0;
  
  执行计划:
  ----------------------------------------------------------------------------------------------------------------------------------------------
| Id  | Operation                      | Name              | Starts | E-Rows |E-Bytes| Cost (%CPU)| E-Time   | A-Rows |   A-Time   | Buffers |
----------------------------------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT               |                   |      1 |        |       |    19 (100)|          |      0 |00:00:00.01 |      20 |
|   1 |  UNION-ALL                     |                   |      1 |        |       |            |          |      0 |00:00:00.01 |      20 |
|   2 |   NESTED LOOPS                 |                   |      1 |      2 |   198 |    12  (17)| 00:00:01 |      0 |00:00:00.01 |      12 |
|   3 |    NESTED LOOPS                |                   |      1 |      2 |   198 |    12  (17)| 00:00:01 |      0 |00:00:00.01 |      12 |
|   4 |     VIEW                       | VW_NSO_1          |      1 |      2 |    16 |    10  (20)| 00:00:01 |      0 |00:00:00.01 |      12 |
|   5 |      SORT UNIQUE               |                   |      1 |      2 |    22 |    10  (20)| 00:00:01 |      0 |00:00:00.01 |      12 |
|   6 |       UNION-ALL                |                   |      1 |        |       |            |          |      0 |00:00:00.01 |      12 |
|*  7 |        TABLE ACCESS FULL       | FLOW_RELATE_INFO  |      1 |      1 |    11 |     4   (0)| 00:00:01 |      0 |00:00:00.01 |       6 |
|*  8 |        TABLE ACCESS FULL       | FLOW_RELATE_INFO  |      1 |      1 |    11 |     4   (0)| 00:00:01 |      0 |00:00:00.01 |       6 |
|*  9 |     INDEX UNIQUE SCAN          | PK_FLOW_INST      |      0 |      1 |       |     0   (0)|          |      0 |00:00:00.01 |       0 |
|  10 |    TABLE ACCESS BY INDEX ROWID | FLOW_INST         |      0 |      1 |    91 |     1   (0)| 00:00:01 |      0 |00:00:00.01 |       0 |
|  11 |   CONCATENATION                |                   |      1 |        |       |            |          |      0 |00:00:00.01 |       8 |
|  12 |    NESTED LOOPS                |                   |      1 |      1 |   106 |     3   (0)| 00:00:01 |      0 |00:00:00.01 |       3 |
|* 13 |     TABLE ACCESS BY INDEX ROWID| FLOW_INST         |      1 |      1 |     8 |     2   (0)| 00:00:01 |      0 |00:00:00.01 |       3 |
|* 14 |      INDEX UNIQUE SCAN         | PK_FLOW_INST      |      1 |      1 |       |     1   (0)| 00:00:01 |      1 |00:00:00.01 |       2 |
|* 15 |     TABLE ACCESS BY INDEX ROWID| FLOW_INST         |      0 |      1 |    98 |     1   (0)| 00:00:01 |      0 |00:00:00.01 |       0 |
|* 16 |      INDEX UNIQUE SCAN         | PK_FLOW_INST      |      0 |      1 |       |     0   (0)|          |      0 |00:00:00.01 |       0 |
|  17 |    NESTED LOOPS                |                   |      1 |      1 |   106 |     4   (0)| 00:00:01 |      0 |00:00:00.01 |       5 |
|  18 |     TABLE ACCESS BY INDEX ROWID| FLOW_INST         |      1 |      1 |     8 |     2   (0)| 00:00:01 |      1 |00:00:00.01 |       3 |
|* 19 |      INDEX UNIQUE SCAN         | PK_FLOW_INST      |      1 |      1 |       |     1   (0)| 00:00:01 |      1 |00:00:00.01 |       2 |
|* 20 |     TABLE ACCESS BY INDEX ROWID| FLOW_INST         |      1 |      1 |    98 |     2   (0)| 00:00:01 |      0 |00:00:00.01 |       2 |
|* 21 |      INDEX RANGE SCAN          | IDX_SUBFLOW_CHECK |      1 |      1 |       |     1   (0)| 00:00:01 |      0 |00:00:00.01 |       2 |
----------------------------------------------------------------------------------------------------------------------------------------------

case3:

原sql:

SELECT C.fd_txtname As "文件名称",a.fd_start_time AS "开始时间",
       a.fd_end_time AS "结束时间",c.N AS "数据量"  
FROM dapdw.tb_dapetl_log_proc a
join dapdw.tb_dapetl_distribute_spool B on a.fd_proc_name=b.fd_id
join 
(SELECT 'YCCTOEAL_INSTALMENT_DYYMMDD.DAT' as fd_txtname,COUNT(1) as N FROM C_EAL_LOANDEPO_HIS where data_Dt = 20190217 and source_id ='YCC01' 
UNION ALL
SELECT 'YCCTOEAL_UNDRAWN_DYYMMDD.DAT',COUNT(1) FROM C_EAL_LOANDEPO_HIS where data_Dt = 20190217 and source_id ='YCC03' 
UNION ALL
SELECT 'YCCTOEAL_FEE_DYYMMDD.DAT',COUNT(1) FROM C_EAL_LOANDEPO_HIS where data_Dt = 20190217 and source_id ='YCC05' 
UNION ALL
SELECT 'NDSTOEAL_FXSPOT_DYYMMDD.DAT',COUNT(1) FROM C_EAL_LOANDEPO_HIS where data_Dt = 20190217 and source_id ='NDS04' 
UNION ALL
SELECT 'YI2TOEAL_LOAN_DYYMMDD.DAT',COUNT(1) FROM C_EAL_LOANDEPO_HIS where data_Dt = 20190217 and source_id ='YI201' 
UNION ALL
SELECT 'YRLTOEAL_CCFD_DYYMMDD.DAT',COUNT(1) FROM C_EAL_LOANDEPO_HIS where data_Dt = 20190217 and source_id ='YRL01' 
) C
ON C.fd_txtname=B.fd_txtname 
WHERE A.FD_DATE=20190217　
;

sql分析:

union all部分的C_EAL_LOANDEPO_HIS 表占用空间几十G,以data_Dt字段按天分区,有50个分区,data_Dt字段是varchar2类型.

存在两个问题:

1.data_Dt字段类型不匹配,发生了隐式类型转换,无法实现分区裁剪.类型匹配只需要访问一个分区,但是使用number类型变量要访问全部50个分区.

2.C_EAL_LOANDEPO_HIS表6次重复访问,可以使用case when的写法,只需要访问一次.

解决了上面两个问题后,预计改写后的SQL,执行执行效率会是原来的50*6=300倍. 只需要将data_Dt=20190217改成data_Dt='20190217',然后再配合case when, 不需要union all,只需要访问C_EAL_LOANDEPO_HIS表一次就能实现原SQL的业务逻辑,这个改写比较简单,这里就不多做说明.

总结:

今天通过3个case 来谈谈sql写法的重要性: 实现相同逻辑,写法不同,可能会有成百上千倍的性能差异.

只有熟练掌握分析执行计划的方法,再加上对各种SQL低效写法的了解,才能让SQL得以用最少的资源,最快的速度,完成业务需求.

感谢大家的阅读!

本文参与腾讯云自媒体同步曝光计划，分享自微信公众号。

原始发表：2019-05-18，如有侵权请联系 cloudcommunity@tencent.com 删除

sql