I have a table with the following data:
dt device id count
2018-10-05 computer 7541185957382 6
2018-10-20 computer 7541185957382 3
2018-10-14 computer 7553187775734 6
2018-10-17 computer 7553187775734 10
2018-10-21 computer 7553187775734 2
2018-10-22 computer 7549187067178 5
2018-10-20 computer 7553187757256 3
2018-10-11 computer 7549187067178 10
I want to get the first and last dt for each id, so I used the window functions first_value and last_value as follows:
select id,last_value(dt) over (partition by id order by dt) last_dt
from table
order by id
;
But I got an error:
FAILED: SemanticException Failed to breakup Windowing invocations into Groups. At least 1 group must only depend on input columns. Also check for circular dependencies.
Underlying error: Primitve type DATE not supported in Value Boundary expression
I can't diagnose the problem and would appreciate any help.
Posted on 2018-10-25 01:02:37
The query works if you add a rows between clause to the window specification:
hive> select id,last_value(dt) over (partition by id order by dt
rows between unbounded preceding and unbounded following) last_dt
from table order by id;
Result:
+----------------+-------------+--+
| id | last_dt |
+----------------+-------------+--+
| 7541185957382 | 2018-10-20 |
| 7541185957382 | 2018-10-20 |
| 7549187067178 | 2018-10-22 |
| 7549187067178 | 2018-10-22 |
| 7553187757256 | 2018-10-20 |
| 7553187775734 | 2018-10-21 |
| 7553187775734 | 2018-10-21 |
| 7553187775734 | 2018-10-21 |
+----------------+-------------+--+
There is a Jira issue about this primitive type support, and it was fixed in Hive 2.1.0.
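As a rough sketch of why the explicit frame helps: when a window has an order by but no frame, Hive defaults to range between unbounded preceding and current row, which is a value boundary on the ordering column, and that appears to be where the DATE type was rejected in older versions. A rows frame avoids the value boundary, so first_value and last_value can both be computed the same way. The table name so and the aliases first_dt and last_dt below are assumptions for illustration (the question simply calls the table "table").

```sql
-- Sketch: first and last dt per id using an explicit ROWS frame.
-- Assumes a table named so with columns id and dt (DATE).
select id,
       first_value(dt) over (partition by id order by dt
           rows between unbounded preceding and unbounded following) as first_dt,
       last_value(dt) over (partition by id order by dt
           rows between unbounded preceding and unbounded following) as last_dt
from so
order by id;
```

This still returns one row per input row, so ids that appear several times in the table are repeated in the output.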
Update:
For distinct records, you can use the ROW_NUMBER window function and keep only the first row of each partition in the result set.
hive> select id,last_dt from
(select id,last_value(dt) over (partition by id order by dt
rows between unbounded preceding and unbounded following) last_dt,
ROW_NUMBER() over (partition by id order by dt)rn
from so )t
where t.rn=1;
Result:
+----------------+-------------+--+
| id | last_dt |
+----------------+-------------+--+
| 7541185957382 | 2018-10-20 |
| 7553187757256 | 2018-10-20 |
| 7553187775734 | 2018-10-21 |
| 7549187067178 | 2018-10-22 |
+----------------+-------------+--+
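If only one row per id is needed, a plain aggregate may be simpler than combining window functions with a row-number filter. This is an alternative sketch rather than part of the answer above; it assumes the same table so, and that "first" and "last" mean the earliest and latest dt for each id. The aliases first_dt and last_dt are illustrative.

```sql
-- Sketch: earliest and latest dt per id with a plain aggregate.
-- Assumes a table named so with columns id and dt (DATE).
select id,
       min(dt) as first_dt,
       max(dt) as last_dt
from so
group by id;
```

Because this uses no window specification, it does not depend on how the engine handles DATE in a window frame.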
https://stackoverflow.com/questions/52979610