完成相同业务逻辑的SQL,写法不同,执行效率可能会有几百上千倍的差距,今天我们通过几个案例来说明一下:
case1 :
原sql代码如下(执行时间1.2分钟):
with holder_clear_temp as
( select distinct t.principal_holder_account
from ch_member.holder_account s, ch_member.clear_agency_relation t
where s.holder_account = t.principal_holder_account and s.holder_account_status = '1'
and t.agency_status = '1' and t.agency_type in ('1','2')
and t.agency_holder_account = :1 and t.principal_holder_account != :2
) , holder_settle_temp as
( select t.principal_holder_account, t.product_category
from ch_member.holder_account s, ch_member.settle_agency_rel t
where s.holder_account = t.principal_holder_account
and s.holder_account_status = '1' and t.agency_status = '1'
and t.agency_type in ('1','2') and t.agency_holder_account = :3
and t.principal_holder_account != :4
and not exists
( select 1 from holder_clear_temp c where c.principal_holder_account=t.principal_holder_account )
) , temp as
( select jour.BALANCE_CHG_SN
from ch_his.HIS_ACCOUNT_CHG_BALANCE_JOUR jour
inner join ch_stock.product_info info
on (info.product_code = jour.product_code
or
(info.pub_product_code = jour.product_code and info.has_distribution_flag='1'))
where 1=1
and (exists
( select 1 from holder_clear_temp c where jour.holder_account=c.principal_holder_account )
or exists
( select 1 from holder_settle_temp s where jour.holder_account=s.principal_holder_account
and info.product_Category =s.product_category
) )
and jour.init_date >= :5 and jour.init_date <= :6
union all
select jour.BALANCE_CHG_SN
from ch_stock.ACCOUNT_CHG_BALANCE_JOUR jour
inner join ch_stock.product_info info
on (info.product_code = jour.product_code
or (info.pub_product_code = jour.product_code and info.has_distribution_flag='1'))
where 1=1
and ( exists
( select 1 from holder_clear_temp c where jour.holder_account=c.principal_holder_account )
or exists
( select 1 from holder_settle_temp s
where jour.holder_account=s.principal_holder_account and info.product_Category =s.product_category
) )
and jour.init_date >= :7 and jour.init_date <= :8
)
select count(1) from temp;
这个sql相对复杂一点,我们通过sql monitor显示的执行计划可以明显的看出瓶颈所在: 因为谓词条件使用了or 连接两个exists子查询,所以只能使用filter操作,而主查询返回的记录数又比较多,就导致sql执行时间比较长. 根据sql写法和执行计划反馈的信息,我们就可以通过改写来优化这个SQL.
sql monitor显示(部分):
建议有兴趣的朋友可以下载sql monitor文件,调动一下自己的思维,先自己分析一下该如何优化这个SQL . 我之前分享到微信群和QQ群,有一个群友已经找到了优化的核心所在,尽管还有一些小瑕疵.
注:sql monitor文件可以在QQ群16778072 下载. 这个sql monitor的"Plan Statistics"页面显示的执行计划有点不太正确,但是不影响大局."Plan"页面显示的是正常的.
改写后的SQL:
with holder_clear_temp as
( select distinct t.principal_holder_account
from ch_member.holder_account s, ch_member.clear_agency_relation t
where s.holder_account = t.principal_holder_account and s.holder_account_status = '1'
and t.agency_status = '1' and t.agency_type in ('1','2')
and t.agency_holder_account = '2110348' and t.principal_holder_account != '2110348'
) ,
holder_settle_temp as
( select t.principal_holder_account, t.product_category
from ch_member.holder_account s, ch_member.settle_agency_rel t
where s.holder_account = t.principal_holder_account
and s.holder_account_status = '1' and t.agency_status = '1'
and t.agency_type in ('1','2') and t.agency_holder_account = '2110348'
and t.principal_holder_account != '2110348'
and not exists
( select 1 from holder_clear_temp c where c.principal_holder_account=t.principal_holder_account )
) ,
exists_temp as
(select principal_holder_account,'xx' as product_category from holder_clear_temp
union
select principal_holder_account,product_category from holder_settle_temp
),
temp as
(
select jour.BALANCE_CHG_SN
from ch_his.HIS_ACCOUNT_CHG_BALANCE_JOUR jour,ch_stock.product_info info,
exists_temp uuu
where info.product_code = jour.product_code
and jour.holder_account=uuu.principal_holder_account
and (uuu.product_category='xx' or info.product_Category =uuu.product_category)
and jour.init_date >= 20190205
and jour.init_date <= 20190505
union all
select jour.BALANCE_CHG_SN
from ch_his.HIS_ACCOUNT_CHG_BALANCE_JOUR jour,ch_stock.product_info info,
exists_temp uuu
where (info.pub_product_code = jour.product_code and info.has_distribution_flag='1')
and jour.holder_account=uuu.principal_holder_account
and (uuu.product_category='xx' or info.product_Category =uuu.product_category)
and jour.init_date >= 20190205
and jour.init_date <= 20190505
and lnnvl(info.product_code = jour.product_code)
-------------------------------------------------------------------------------------
union all
select jour.BALANCE_CHG_SN
from ch_stock.ACCOUNT_CHG_BALANCE_JOUR jour
inner join ch_stock.product_info info
on (info.product_code = jour.product_code or (info.pub_product_code = jour.product_code and info.has_distribution_flag='1'))
where 1=1
and ( exists
( select 1 from holder_clear_temp c where jour.holder_account=c.principal_holder_account )
or
exists
( select 1 from holder_settle_temp s where jour.holder_account=s.principal_holder_account and info.product_Category =s.product_category
) )
and jour.init_date >= 20190205
and jour.init_date <= 20190505
)
select count(1) from temp;
改写效果:
经过改写后,原来执行1.2分钟的SQL,现场测试只需要耗时0.6秒(这个测试只改了耗时较长union all的上半部分,如果下半部分也做相同改写,预计最终执行时间不到0.3秒,性能提升达200多倍).
改写说明:
原sql用or 连接的两个exists ,存在相同的关联条件,我们通过一个union(注意不是union all)把它合并在一起,通过CTE(with as)定义为exists_temp ,然后就可以与主查询的两个表做关联,而不是做filter. 因为主查询两个表的关联关系也存在一个or,优化器必然会使用concat,那样就会拆分成4段做union all. 我只希望主查询做concat,就人工做了concat,将主查询拆分成了union all.
注:网上很多sql优化专家在对or 改写的时候,基本上全部改成了union,这是不等价的改写方法,标准改写请参考一下本例的union all配合lnnvl的写法.
case2:
原SQL:
SELECT A.FLOW_INID, A.CURR_STEP, A.FLOW_NAME, A.FINS_NAME,
TO_CHAR(A.INST_CRDA, 'YYYY-MM-DD HH24:MI:SS') INST_CRDA, 'manual_rel' RELA_TYPE
FROM FLOW_INST A
WHERE EXISTS
(SELECT 1
FROM FLOW_RELATE_INFO B
WHERE A.FLOW_INID = B.RELATE_FLOW_INID AND B.FLOW_INID = :1
)
OR
EXISTS
(SELECT 1
FROM FLOW_RELATE_INFO B
WHERE A.FLOW_INID = B.FLOW_INID AND B.RELATE_FLOW_INID = :2
)
UNION ALL
SELECT S.FLOW_INID, S.CURR_STEP, S.FLOW_NAME
, S.FINS_NAME, TO_CHAR(S.INST_CRDA, 'YYYY-MM-DD HH24:MI:SS') INST_CRDA, 'auto_rel' RELA_TYPE
FROM FLOW_INST S,
(SELECT FI.FLOW_INID, FI.PARA_INID
FROM FLOW_INST FI
WHERE FI.FLOW_INID = :3
) F
WHERE ((F.FLOW_INID = S.PARA_INID AND S.IF_SUB = 1) OR F.PARA_INID = S.FLOW_INID ) AND S.DEL_FLAG = 0;
执行计划:
----------------------------------------------------------------------------------------------------------------------------------------------
| Id | Operation | Name | Starts | E-Rows |E-Bytes| Cost (%CPU)| E-Time | A-Rows | A-Time | Buffers |
----------------------------------------------------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | | | 194K(100)| | 0 |00:00:01.83 | 352K|
| 1 | UNION-ALL | | 1 | | | | | 0 |00:00:01.83 | 352K|
|* 2 | FILTER | | 1 | | | | | 0 |00:00:01.83 | 352K|
| 3 | TABLE ACCESS FULL | FLOW_INST | 1 | 57564 | 5115K| 1094 (1)| 00:00:14 | 58003 |00:00:00.05 | 4156 |
|* 4 | TABLE ACCESS FULL | FLOW_RELATE_INFO | 58003 | 1 | 11 | 4 (0)| 00:00:01 | 0 |00:00:01.75 | 348K|
| 5 | CONCATENATION | | 1 | | | | | 0 |00:00:00.01 | 8 |
| 6 | NESTED LOOPS | | 1 | 1 | 106 | 3 (0)| 00:00:01 | 0 |00:00:00.01 | 3 |
|* 7 | TABLE ACCESS BY INDEX ROWID| FLOW_INST | 1 | 1 | 8 | 2 (0)| 00:00:01 | 0 |00:00:00.01 | 3 |
|* 8 | INDEX UNIQUE SCAN | PK_FLOW_INST | 1 | 1 | | 1 (0)| 00:00:01 | 1 |00:00:00.01 | 2 |
|* 9 | TABLE ACCESS BY INDEX ROWID| FLOW_INST | 0 | 1 | 98 | 1 (0)| 00:00:01 | 0 |00:00:00.01 | 0 |
|* 10 | INDEX UNIQUE SCAN | PK_FLOW_INST | 0 | 1 | | 0 (0)| | 0 |00:00:00.01 | 0 |
| 11 | NESTED LOOPS | | 1 | 1 | 106 | 4 (0)| 00:00:01 | 0 |00:00:00.01 | 5 |
| 12 | TABLE ACCESS BY INDEX ROWID| FLOW_INST | 1 | 1 | 8 | 2 (0)| 00:00:01 | 1 |00:00:00.01 | 3 |
|* 13 | INDEX UNIQUE SCAN | PK_FLOW_INST | 1 | 1 | | 1 (0)| 00:00:01 | 1 |00:00:00.01 | 2 |
|* 14 | TABLE ACCESS BY INDEX ROWID| FLOW_INST | 1 | 1 | 98 | 2 (0)| 00:00:01 | 0 |00:00:00.01 | 2 |
|* 15 | INDEX RANGE SCAN | IDX_SUBFLOW_CHECK | 1 | 1 | | 1 (0)| 00:00:01 | 0 |00:00:00.01 | 2 |
----------------------------------------------------------------------------------------------------------------------------------------------
分析:
这是一个OA系统的业务SQL,执行时间接近2秒. FLOW_RELATE_INFO 表只有480条记录,8 blocks.
在不改写SQL的情况下,我们可以通过创建FLOW_RELATE_INFO表上 (FLOW_INID,RELATE_FLOW_INID)两字段联合索引,将sql执行效率提高到0.37秒(OA系统相对可以接受的一个响应时间):
这个创建小表索引提升效率的方法,也是对那些小表不需要创建索引说法的一个反证.
如果我们改写这个sql,可以不需要创建索引,就能得到一个更好的性能提升:不到0.01秒.
SQL改写结果如下:
SELECT A.FLOW_INID, A.CURR_STEP, A.FLOW_NAME
, A.FINS_NAME, TO_CHAR(A.INST_CRDA, 'YYYY-MM-DD HH24:MI:SS') INST_CRDA, 'manual_rel' RELA_TYPE
FROM FLOW_INST A
WHERE FLOW_INID in
(SELECT RELATE_FLOW_INID
FROM FLOW_RELATE_INFO
WHERE FLOW_INID = '77913'
union
SELECT FLOW_INID
FROM FLOW_RELATE_INFO B
WHERE RELATE_FLOW_INID= '77913'
)
UNION ALL
SELECT S.FLOW_INID, S.CURR_STEP, S.FLOW_NAME
, S.FINS_NAME, TO_CHAR(S.INST_CRDA, 'YYYY-MM-DD HH24:MI:SS') INST_CRDA, 'auto_rel' RELA_TYPE
FROM FLOW_INST S,
(SELECT FI.FLOW_INID, FI.PARA_INID
FROM FLOW_INST FI
WHERE FI.FLOW_INID = '77913'
) F
WHERE ((F.FLOW_INID = S.PARA_INID AND S.IF_SUB = 1) OR F.PARA_INID = S.FLOW_INID )
AND S.DEL_FLAG = 0;
执行计划:
----------------------------------------------------------------------------------------------------------------------------------------------
| Id | Operation | Name | Starts | E-Rows |E-Bytes| Cost (%CPU)| E-Time | A-Rows | A-Time | Buffers |
----------------------------------------------------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | | | 19 (100)| | 0 |00:00:00.01 | 20 |
| 1 | UNION-ALL | | 1 | | | | | 0 |00:00:00.01 | 20 |
| 2 | NESTED LOOPS | | 1 | 2 | 198 | 12 (17)| 00:00:01 | 0 |00:00:00.01 | 12 |
| 3 | NESTED LOOPS | | 1 | 2 | 198 | 12 (17)| 00:00:01 | 0 |00:00:00.01 | 12 |
| 4 | VIEW | VW_NSO_1 | 1 | 2 | 16 | 10 (20)| 00:00:01 | 0 |00:00:00.01 | 12 |
| 5 | SORT UNIQUE | | 1 | 2 | 22 | 10 (20)| 00:00:01 | 0 |00:00:00.01 | 12 |
| 6 | UNION-ALL | | 1 | | | | | 0 |00:00:00.01 | 12 |
|* 7 | TABLE ACCESS FULL | FLOW_RELATE_INFO | 1 | 1 | 11 | 4 (0)| 00:00:01 | 0 |00:00:00.01 | 6 |
|* 8 | TABLE ACCESS FULL | FLOW_RELATE_INFO | 1 | 1 | 11 | 4 (0)| 00:00:01 | 0 |00:00:00.01 | 6 |
|* 9 | INDEX UNIQUE SCAN | PK_FLOW_INST | 0 | 1 | | 0 (0)| | 0 |00:00:00.01 | 0 |
| 10 | TABLE ACCESS BY INDEX ROWID | FLOW_INST | 0 | 1 | 91 | 1 (0)| 00:00:01 | 0 |00:00:00.01 | 0 |
| 11 | CONCATENATION | | 1 | | | | | 0 |00:00:00.01 | 8 |
| 12 | NESTED LOOPS | | 1 | 1 | 106 | 3 (0)| 00:00:01 | 0 |00:00:00.01 | 3 |
|* 13 | TABLE ACCESS BY INDEX ROWID| FLOW_INST | 1 | 1 | 8 | 2 (0)| 00:00:01 | 0 |00:00:00.01 | 3 |
|* 14 | INDEX UNIQUE SCAN | PK_FLOW_INST | 1 | 1 | | 1 (0)| 00:00:01 | 1 |00:00:00.01 | 2 |
|* 15 | TABLE ACCESS BY INDEX ROWID| FLOW_INST | 0 | 1 | 98 | 1 (0)| 00:00:01 | 0 |00:00:00.01 | 0 |
|* 16 | INDEX UNIQUE SCAN | PK_FLOW_INST | 0 | 1 | | 0 (0)| | 0 |00:00:00.01 | 0 |
| 17 | NESTED LOOPS | | 1 | 1 | 106 | 4 (0)| 00:00:01 | 0 |00:00:00.01 | 5 |
| 18 | TABLE ACCESS BY INDEX ROWID| FLOW_INST | 1 | 1 | 8 | 2 (0)| 00:00:01 | 1 |00:00:00.01 | 3 |
|* 19 | INDEX UNIQUE SCAN | PK_FLOW_INST | 1 | 1 | | 1 (0)| 00:00:01 | 1 |00:00:00.01 | 2 |
|* 20 | TABLE ACCESS BY INDEX ROWID| FLOW_INST | 1 | 1 | 98 | 2 (0)| 00:00:01 | 0 |00:00:00.01 | 2 |
|* 21 | INDEX RANGE SCAN | IDX_SUBFLOW_CHECK | 1 | 1 | | 1 (0)| 00:00:01 | 0 |00:00:00.01 | 2 |
----------------------------------------------------------------------------------------------------------------------------------------------
case3:
原sql:
SELECT C.fd_txtname As "文件名称",a.fd_start_time AS "开始时间",
a.fd_end_time AS "结束时间",c.N AS "数据量"
FROM dapdw.tb_dapetl_log_proc a
join dapdw.tb_dapetl_distribute_spool B on a.fd_proc_name=b.fd_id
join
(SELECT 'YCCTOEAL_INSTALMENT_DYYMMDD.DAT' as fd_txtname,COUNT(1) as N FROM C_EAL_LOANDEPO_HIS where data_Dt = 20190217 and source_id ='YCC01'
UNION ALL
SELECT 'YCCTOEAL_UNDRAWN_DYYMMDD.DAT',COUNT(1) FROM C_EAL_LOANDEPO_HIS where data_Dt = 20190217 and source_id ='YCC03'
UNION ALL
SELECT 'YCCTOEAL_FEE_DYYMMDD.DAT',COUNT(1) FROM C_EAL_LOANDEPO_HIS where data_Dt = 20190217 and source_id ='YCC05'
UNION ALL
SELECT 'NDSTOEAL_FXSPOT_DYYMMDD.DAT',COUNT(1) FROM C_EAL_LOANDEPO_HIS where data_Dt = 20190217 and source_id ='NDS04'
UNION ALL
SELECT 'YI2TOEAL_LOAN_DYYMMDD.DAT',COUNT(1) FROM C_EAL_LOANDEPO_HIS where data_Dt = 20190217 and source_id ='YI201'
UNION ALL
SELECT 'YRLTOEAL_CCFD_DYYMMDD.DAT',COUNT(1) FROM C_EAL_LOANDEPO_HIS where data_Dt = 20190217 and source_id ='YRL01'
) C
ON C.fd_txtname=B.fd_txtname
WHERE A.FD_DATE=20190217
;
sql分析:
union all部分的C_EAL_LOANDEPO_HIS 表占用空间几十G,以data_Dt字段按天分区,有50个分区,data_Dt字段是varchar2类型.
存在两个问题:
1.data_Dt字段类型不匹配,发生了隐式类型转换,无法实现分区裁剪.类型匹配只需要访问一个分区,但是使用number类型变量要访问全部50个分区.
2.C_EAL_LOANDEPO_HIS表6次重复访问,可以使用case when的写法,只需要访问一次.
解决了上面两个问题后,预计改写后的SQL,执行执行效率会是原来的50*6=300倍. 只需要将data_Dt=20190217改成data_Dt='20190217',然后再配合case when, 不需要union all,只需要访问C_EAL_LOANDEPO_HIS表一次就能实现原SQL的业务逻辑,这个改写比较简单,这里就不多做说明.
总结:
今天通过3个case 来谈谈sql写法的重要性: 实现相同逻辑,写法不同,可能会有成百上千倍的性能差异.
只有熟练掌握分析执行计划的方法,再加上对各种SQL低效写法的了解,才能让SQL得以用最少的资源,最快的速度,完成业务需求.
感谢大家的阅读!
本文分享自 老虎刘谈oracle性能优化 微信公众号,前往查看
如有侵权,请联系 cloudcommunity@tencent.com 删除。
本文参与 腾讯云自媒体同步曝光计划 ,欢迎热爱写作的你一起参与!