Hive窗口函数sum、min、max、avg入门
现有 hive 表 cookie1, 内容如下:
hive> select * from cookie1;
cookie1 2015-04-10 1
cookie1 2015-04-11 5
cookie1 2015-04-12 7
cookie1 2015-04-13 3
cookie1 2015-04-14 2
cookie1 2015-04-15 4
cookie1 2015-04-16 4
select cookieid, createtime, pv,
sum(pv) over(partition by cookieid) as sum1
from cookie1;
结果:
cookie1 2015-04-16 4 26
cookie1 2015-04-15 4 26
cookie1 2015-04-14 2 26
cookie1 2015-04-13 3 26
cookie1 2015-04-12 7 26
cookie1 2015-04-11 5 26
cookie1 2015-04-10 1 26
partition by cookieid : 按照cookieid分组
select cookieid, createtime, pv,
sum(pv) over(partition by cookieid order by createtime) as sum2
from cookie1;
# 结果
cookie1 2015-04-10 1 1
cookie1 2015-04-11 5 6
cookie1 2015-04-12 7 13
cookie1 2015-04-13 3 16
cookie1 2015-04-14 2 18
cookie1 2015-04-15 4 22
cookie1 2015-04-16 4 26
以上SQL语句是下面这个SQL语句的简写:
select
cookieid,
createtime,
pv,
sum(pv) over(partition by cookieid order by createtime rows between unbounded preceding and current row) as sum3
from cookie1;
order by : 在同一个组内,先累加完相同createtime的pv,再累加其他createtime的pv, 比如 :
现在在表末尾加一条数据cookie1 2015-04-10 1 1
,那么结果就是 :
cookie1 2015-04-10 1 1
cookie1 2015-04-10 1 2
cookie1 2015-04-11 5 7
cookie1 2015-04-12 7 14
cookie1 2015-04-13 3 17
cookie1 2015-04-14 2 19
cookie1 2015-04-15 4 23
cookie1 2015-04-16 4 27
与cookie1 2015-04-10 1 1
这两条数据是否相邻无关。
unbounded : 起点
preceding : 往前
following : 往后
between...end... : 指定操作的范围
current row : 当前行,计算到哪一行就是哪一行
between unbounded preceding and current row : 从起点到当前行,往前(对应表中的从上到下的顺序)累加
默认就是从起点到当前行往前累加,所以between unbounded preceding and current row
这个条件可以不写。
select cookieid, createtime, pv,
sum(pv) over(partition by cookieid order by createtime rows between current row and unbounded following) as sum4
from cookie1;
结果分析 :
cookie1 2015-04-10 1 26 (1+5+7+3+2+4+4=26)
cookie1 2015-04-11 5 25 (5+7+3+2+4+4=25)
cookie1 2015-04-12 7 20 (7+3+2+4+4=20)
cookie1 2015-04-13 3 13 (3+2+4+4=13)
cookie1 2015-04-14 2 10 (2+4+4=10)
cookie1 2015-04-15 4 8 (4+4=8)
cookie1 2015-04-16 4 4 (4)
select cookieid, createtime, pv,
sum(pv) over(partition by cookieid order by createtime rows between 3 preceding and current row) as sum5
from cookie1;
结果分析 :
cookie1 2015-04-10 1 1 (这一行前面没有记录,只能取本身的值)
cookie1 2015-04-11 5 6 (这一行前面有1,不够取3行的值,只能取一行与本身相加,取5+1=6)
cookie1 2015-04-12 7 13 (这一行前面有1,5,不够取3行的值,只能取两行与本身相加,取7+5+1=13)
cookie1 2015-04-13 3 16 (这一行前面有3行可以取,依次往回取三行,取3+7+5+1=16)
cookie1 2015-04-14 2 17 (同理,取2+3+7+5=17)
cookie1 2015-04-15 4 16 (同理,取4+2+3+7=16)
cookie1 2015-04-16 4 13 (同理,取4+4+2+3=13)
select cookieid, createtime, pv,
sum(pv) over(partition by cookieid order by createtime rows between 3 preceding and 1 following) as sum6
from cookie1;
结果分析 :
cookie1 2015-04-10 1 6 (这一行前面没有记录,后面可以取一条记录,1+5=6)
cookie1 2015-04-11 5 13 (这一行前面只能取一条记录,后面可以取一条记录,1+5+7=13)
cookie1 2015-04-12 7 16 (这一行前面只能取两条记录,后面可以取一条记录,1+5+7+3=16)
cookie1 2015-04-13 3 18 (这一行前面可以取三条记录,后面可以取一条记录,1+5+7+3+2=18)
cookie1 2015-04-14 2 21 (同理,5+7+3+2+4=21)
cookie1 2015-04-15 4 20 (同理,7+3+2+4+4=20)
cookie1 2015-04-16 4 13 (这一行前面可以取三条记录,后面没有记录,3+2+4+4=13)
sum、avg、max、min这些窗口函数的语法都是一样的,以下用一个SQL语句来演示 :
select cookieid,createtime,pv,
avg(pv) over(partition by cookieid) as avg1,
avg(pv) over(partition by cookieid order by createtime) as avg2,
avg(pv) over(partition by cookieid order by createtime rows between unbounded preceding and current row) as avg3,
avg(pv) over(partition by cookieid order by createtime rows between current row and unbounded following) as avg4,
avg(pv) over(partition by cookieid order by createtime rows between 3 preceding and current row) as avg5,
avg(pv) over(partition by cookieid order by createtime rows between 3 preceding and 1 following) as avg6
from cookie1;
结果 :
cookie1 2015-04-10 1 3.7142857142857144 1.0 1.0 3.7142857142857144 1.0 3.0
cookie1 2015-04-11 5 3.7142857142857144 3.0 3.0 4.166666666666667 3.0 4.333333333333333
cookie1 2015-04-12 7 3.7142857142857144 4.333333333333333 4.333333333333333 4.0 4.333333333333333 4.0
cookie1 2015-04-13 3 3.7142857142857144 4.0 4.0 3.25 4.0 3.6
cookie1 2015-04-14 2 3.7142857142857144 3.6 3.6 3.3333333333333335 4.25 4.2
cookie1 2015-04-15 4 3.7142857142857144 3.6666666666666665 3.6666666666666665 4.0 4.0 4.0
cookie1 2015-04-16 4 3.7142857142857144 3.7142857142857144 3.7142857142857144 4.0 3.25 3.25
select cookieid, createtime, pv,
min(pv) over(partition by cookieid) as min1,
min(pv) over(partition by cookieid order by createtime) as min2,
min(pv) over(partition by cookieid order by createtime rows between unbounded preceding and current row) as min3,
min(pv) over(partition by cookieid order by createtime rows between current row and unbounded following) as min4,
min(pv) over(partition by cookieid order by createtime rows between 3 preceding and current row) as min5,
min(pv) over(partition by cookieid order by createtime rows between 3 preceding and 1 following) as min6
from cookie1;
结果 :
cookie1 2015-04-10 1 1 1 1 1 1 1
cookie1 2015-04-11 5 1 1 1 2 1 1
cookie1 2015-04-12 7 1 1 1 2 1 1
cookie1 2015-04-13 3 1 1 1 2 1 1
cookie1 2015-04-14 2 1 1 1 2 2 2
cookie1 2015-04-15 4 1 1 1 4 2 2
cookie1 2015-04-16 4 1 1 1 4 2 2
select cookieid,createtime,pv,
max(pv) over(partition by cookieid) as max1,
max(pv) over(partition by cookieid order by createtime) as max2,
max(pv) over(partition by cookieid order by createtime rows between unbounded preceding and current row) as max3,
max(pv) over(partition by cookieid order by createtime rows between current row and unbounded following) as max4,
max(pv) over(partition by cookieid order by createtime rows between 3 preceding and current row) as max5,
max(pv) over(partition by cookieid order by createtime rows between 3 preceding and 1 following) as max6
from cookie1;
结果 :
cookie1 2015-04-10 1 7 1 1 7 1 5
cookie1 2015-04-11 5 7 5 5 7 5 7
cookie1 2015-04-12 7 7 7 7 7 7 7
cookie1 2015-04-13 3 7 7 7 4 7 7
cookie1 2015-04-14 2 7 7 7 4 7 7
cookie1 2015-04-15 4 7 7 7 4 7 7
cookie1 2015-04-16 4 7 7 7 4 4 4
扫码关注腾讯云开发者
领取腾讯云代金券
Copyright © 2013 - 2025 Tencent Cloud. All Rights Reserved. 腾讯云 版权所有
深圳市腾讯计算机系统有限公司 ICP备案/许可证号:粤B2-20090059 深公网安备号 44030502008569
腾讯云计算(北京)有限责任公司 京ICP证150476号 | 京ICP备11018762号 | 京公网安备号11010802020287
Copyright © 2013 - 2025 Tencent Cloud.
All Rights Reserved. 腾讯云 版权所有