我有一个由日志和时间戳组成的表,例如:
timestmp log_error
1507031197631 Er7
1507031197621 Er8
1507031197409 Er9
1506888444602 Er10
1506880074401 Er10
1506880047684 Er10
1506880030996 Er10
1506879980929 Er10
1506879977580 Er10
1506879974250 Er10
1506879970901 Er10
1506879964241 Er10
1506879954212 Er10
1506879900817 Er10我想编写一个SQL查询,它在时间戳的某个时间间隔(5分钟)内忽略相同的连续错误(在本例中是Er10)。我怎样才能做到这一点?使用自内连接?我想要的结果是:
timestmp log_error
1507031197631 Er7
1507031197621 Er8
1507031197409 Er9
1506888444602 Er10 /* The last one from this example, based on the difference in timestmp */
1506879900817 Er10 /* The first Er10 registry */发布于 2017-10-04 10:43:57
您可以使用lag()、累积和group by来完成这一任务。
select log_error, min(timestamp), max(timestamp)
from (select l.*,
sum(case when prev_le = log_error and
prev_timestamp > timestamp - "5 minutes"
then 0 else 1
end) over (order by timestamp) as grp
from (select l.*,
lag(log_error) over (order by timestmp) as prev_le,
lag(timestmp) over (order by timestmp) as prev_timestmp
from logs l
) l
) l
group by grp, log_error;注意:无论逻辑是什么,- "5 minutes"都是用于这个目的的。这可能是5 * 60或5 * 60 * 1000。
发布于 2017-10-04 10:47:47
可以使用row_number生成连续log_error值的组。这种方法被称为“塔比妥聚糖法”。
select log_error, min(timestmp), max(timestmp)
from (
select t.*,
row_number() over (order by timestmp)
- row_number() over (partition by log_error order by timestmp) as grp
from your_table t
) t
group by log_error, grp;我承认,结果格式并不完全是您想要的,但它有您需要的信息。
https://stackoverflow.com/questions/46562775
复制相似问题