I have a table with 3 different statuses:
1. CLICKED
2. CLAIMED
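3. BOUGHT

They must happen in that specific order. Using each record's dates, I'm trying to determine whether any records have statuses that did not occur in the correct order.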
For example, given the data below:
Record 121144 has its statuses in the correct order; this is good.
Record 121200 is incorrect because BOUGHT happens before CLICKED, even though CLICKED and CLAIMED are in the right order.
Record 121122 is incorrect because CLICKED comes after CLAIMED.
Record 121111 is also in the correct order (even though all three dates are the same).
Record 121198 is also correct because the statuses it has follow the order, even though there is no BOUGHT.

CREATE TABLE TBL_A
(
number_id int,
country varchar(50),
status varchar(50),
datetime date
);
INSERT INTO TBL_A
VALUES (121144, 'USA', 'CLICKED', '2021-10-09'),
(121144, 'USA', 'CLAIMED', '2021-10-10'),
(121144, 'USA', 'BOUGHT', '2021-10-11'),
(121111, 'CAD', 'CLICKED', '2021-10-12'),
(121111, 'CAD', 'CLAIMED', '2021-10-12'),
(121111, 'CAD', 'BOUGHT', '2021-10-12'),
(121122, 'PES', 'CLICKED', '2021-09-11'),
(121122, 'PES', 'CLAIMED', '2021-09-09'),
(121122, 'PES', 'BOUGHT', '2021-09-12'),
(121198, 'AU', 'CLICKED', '2021-09-11'),
(121198, 'AU', 'CLAIMED', '2021-09-12'),
(121200, 'POR', 'CLICKED', '2021-09-10'),
(121200, 'POR', 'CLAIMED', '2021-09-11'),
(121200, 'POR', 'BOUGHT', '2021-09-08');

Answer posted 2021-11-12 18:55:05
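My answer accounts for the possibility, mentioned by the OP in the comments, that steps can be skipped. Rather than matching an exact sequence, this approach looks for adjacent pairs of rows in which the earlier row has a higher step number than the later one: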
with A as (
  select *,
         case status
           when 'CLICKED' then 1
           when 'CLAIMED' then 2
           when 'BOUGHT' then 3 end as desired_order
  from TBL_A
), B as (
  select *,
         row_number() over (
           partition by number_id
           order by datetime, desired_order) as rn -- handles date ties
  from A
), C as (
  select *,
         -- flag a row whose predecessor (by rn) has a higher step number
         case when lag(desired_order) over (partition by number_id order by rn) >
                   desired_order then 'Y' end as flag
  from B
)
select number_id, min(country) as country,
       case min(flag) when 'Y' then 'Out of order' else 'In order' end as "status"
from C
group by number_id;

https://dbfiddle.uk/?rdbms=sqlserver_2014&fiddle=f0ee1de8e8e81229ddc23acc97bce7d7
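To check the lag-based query against the question's sample rows without a SQL Server instance, here is a minimal sketch that runs the same SQL in an in-memory SQLite database through Python's standard `sqlite3` module. It assumes SQLite 3.25 or later (the version that added window functions); the helper name `lag_check` is hypothetical.

```python
import sqlite3

# Sample rows from the question's INSERT statement:
# (number_id, country, status, datetime); ISO date strings sort chronologically.
ROWS = [
    (121144, 'USA', 'CLICKED', '2021-10-09'),
    (121144, 'USA', 'CLAIMED', '2021-10-10'),
    (121144, 'USA', 'BOUGHT', '2021-10-11'),
    (121111, 'CAD', 'CLICKED', '2021-10-12'),
    (121111, 'CAD', 'CLAIMED', '2021-10-12'),
    (121111, 'CAD', 'BOUGHT', '2021-10-12'),
    (121122, 'PES', 'CLICKED', '2021-09-11'),
    (121122, 'PES', 'CLAIMED', '2021-09-09'),
    (121122, 'PES', 'BOUGHT', '2021-09-12'),
    (121198, 'AU', 'CLICKED', '2021-09-11'),
    (121198, 'AU', 'CLAIMED', '2021-09-12'),
    (121200, 'POR', 'CLICKED', '2021-09-10'),
    (121200, 'POR', 'CLAIMED', '2021-09-11'),
    (121200, 'POR', 'BOUGHT', '2021-09-08'),
]

# The lag-based query from the answer, reading from TBL_A.
LAG_QUERY = """
with A as (
  select *,
         case status
           when 'CLICKED' then 1
           when 'CLAIMED' then 2
           when 'BOUGHT'  then 3 end as desired_order
  from TBL_A
), B as (
  select *,
         row_number() over (
           partition by number_id
           order by datetime, desired_order) as rn  -- handles date ties
  from A
), C as (
  select *,
         case when lag(desired_order) over (partition by number_id order by rn)
                   > desired_order then 'Y' end as flag
  from B
)
select number_id, min(country) as country,
       case min(flag) when 'Y' then 'Out of order' else 'In order' end as "status"
from C
group by number_id
"""


def lag_check(rows):
    """Load rows into an in-memory TBL_A and return {number_id: verdict}."""
    con = sqlite3.connect(":memory:")
    try:
        con.execute("create table TBL_A (number_id int, country varchar(50), "
                    "status varchar(50), datetime date)")
        con.executemany("insert into TBL_A values (?, ?, ?, ?)", rows)
        return {nid: verdict for nid, _country, verdict in con.execute(LAG_QUERY)}
    finally:
        con.close()


if __name__ == "__main__":
    for nid, verdict in sorted(lag_check(ROWS).items()):
        print(nid, verdict)
```

On the sample data this should report 121122 and 121200 as out of order and the other three records as in order, matching the expectations stated in the question.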
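As Thorston pointed out, you could instead generate two row numbers per row, one ordered by step and one ordered by date, and compare them to find mismatches. Judging from the query plan, this may involve an extra sort, so it is worth trying both approaches on your data.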
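The fragment below elides the unchanged parts with "...". Assuming those are the `A` CTE from the previous query and a final `from B group by number_id`, a complete version of the two-row-number comparison could be run on SQLite like this (an illustration under that assumption, not the answerer's exact code; `row_number_check` is a hypothetical helper name, and SQLite 3.25 or later is assumed for window functions):

```python
import sqlite3

# Sample rows from the question: (number_id, country, status, datetime).
ROWS = [
    (121144, 'USA', 'CLICKED', '2021-10-09'),
    (121144, 'USA', 'CLAIMED', '2021-10-10'),
    (121144, 'USA', 'BOUGHT', '2021-10-11'),
    (121111, 'CAD', 'CLICKED', '2021-10-12'),
    (121111, 'CAD', 'CLAIMED', '2021-10-12'),
    (121111, 'CAD', 'BOUGHT', '2021-10-12'),
    (121122, 'PES', 'CLICKED', '2021-09-11'),
    (121122, 'PES', 'CLAIMED', '2021-09-09'),
    (121122, 'PES', 'BOUGHT', '2021-09-12'),
    (121198, 'AU', 'CLICKED', '2021-09-11'),
    (121198, 'AU', 'CLAIMED', '2021-09-12'),
    (121200, 'POR', 'CLICKED', '2021-09-10'),
    (121200, 'POR', 'CLAIMED', '2021-09-11'),
    (121200, 'POR', 'BOUGHT', '2021-09-08'),
]

# rn1 numbers rows by step, rn2 by date (step breaks date ties);
# any row where they differ means the dates disagree with the step order.
RN_QUERY = """
with A as (
  select *,
         case status
           when 'CLICKED' then 1
           when 'CLAIMED' then 2
           when 'BOUGHT'  then 3 end as desired_order
  from TBL_A
), B as (
  select *,
         row_number() over (partition by number_id
                            order by desired_order) as rn1,
         row_number() over (partition by number_id
                            order by datetime, desired_order) as rn2
  from A
)
select number_id, min(country) as country,
       case when max(case when rn1 <> rn2 then 1 else 0 end) = 1
            then 'Out of order' else 'In order' end as status
from B
group by number_id
"""


def row_number_check(rows):
    """Load rows into an in-memory TBL_A and return {number_id: verdict}."""
    con = sqlite3.connect(":memory:")
    try:
        con.execute("create table TBL_A (number_id int, country varchar(50), "
                    "status varchar(50), datetime date)")
        con.executemany("insert into TBL_A values (?, ?, ?, ?)", rows)
        return {nid: verdict for nid, _country, verdict in con.execute(RN_QUERY)}
    finally:
        con.close()


if __name__ == "__main__":
    for nid, verdict in sorted(row_number_check(ROWS).items()):
        print(nid, verdict)
```

A skipped step still passes here: for CLICKED followed by BOUGHT, both numberings are 1, 2, so nothing mismatches.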
...
), B as (
  select *,
         row_number() over (
           partition by number_id
           order by desired_order) as rn1,
         row_number() over (
           partition by number_id
           order by datetime, desired_order) as rn2
  from A
)
select
  number_id, min(country) as country,
  case when max(case when rn1 <> rn2 then 1 else 0 end) = 1
       then 'Out of order' else 'In order' end as status
...

Answer posted 2021-11-12 17:52:11
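Here is one approach that uses string aggregation and manipulation. It works as expected on the sample data and also accounts for edge cases such as skipped statuses, missing statuses, and records with a single status.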
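The idea behind the query below is to join each record's statuses, in date order, into a string such as `CLICKED>CLAIMED`, then accept the record if that string appears as a substring of the allowed sequences. A minimal Python sketch of the same logic, with `str.find` standing in for `charindex` as the date tie-breaker and Python's `in` standing in for `charindex(...) > 0` (the function names are hypothetical):

```python
from collections import defaultdict

# Sample rows from the question: (number_id, country, status, datetime).
ROWS = [
    (121144, 'USA', 'CLICKED', '2021-10-09'),
    (121144, 'USA', 'CLAIMED', '2021-10-10'),
    (121144, 'USA', 'BOUGHT', '2021-10-11'),
    (121111, 'CAD', 'CLICKED', '2021-10-12'),
    (121111, 'CAD', 'CLAIMED', '2021-10-12'),
    (121111, 'CAD', 'BOUGHT', '2021-10-12'),
    (121122, 'PES', 'CLICKED', '2021-09-11'),
    (121122, 'PES', 'CLAIMED', '2021-09-09'),
    (121122, 'PES', 'BOUGHT', '2021-09-12'),
    (121198, 'AU', 'CLICKED', '2021-09-11'),
    (121198, 'AU', 'CLAIMED', '2021-09-12'),
    (121200, 'POR', 'CLICKED', '2021-09-10'),
    (121200, 'POR', 'CLAIMED', '2021-09-11'),
    (121200, 'POR', 'BOUGHT', '2021-09-08'),
]

# Statuses in step order; a status's position in this string breaks date ties.
STEP_STRING = "CLICKED>CLAIMED>BOUGHT"
# Comma-separated allowed sequences; a record passes if its sequence is a substring.
ALLOWED = "CLICKED>CLAIMED>BOUGHT,CLICKED>BOUGHT"


def event_orders(rows):
    """Return {(number_id, country): 'STATUS>STATUS>...'} in date order."""
    groups = defaultdict(list)
    for number_id, country, status, date in rows:
        groups[(number_id, country)].append((date, STEP_STRING.find(status), status))
    return {key: ">".join(status for _, _, status in sorted(events))
            for key, events in groups.items()}


def order_flags(rows):
    """Mimic charindex(event_order, ALLOWED) > 0 with a substring test."""
    return {number_id: "Ordered" if sequence in ALLOWED else "Unordered"
            for (number_id, _country), sequence in event_orders(rows).items()}


if __name__ == "__main__":
    for nid, flag in sorted(order_flags(ROWS).items()):
        print(nid, flag)
```

Because the check is a substring match, a record with only CLAIMED then BOUGHT, or a single status, is also reported as ordered; that is how the approach tolerates missing statuses.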
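with cte as (
  select *,
         -- join statuses in date order; the status's position in the
         -- step string breaks ties between rows on the same date
         listagg(status, '>') within group (
           order by datetime, charindex(status, 'CLICKED>CLAIMED>BOUGHT')
         ) over (partition by number_id, country) as event_order
  from TBL_A
)
select distinct
  number_id,
  country,
  -- ordered if the sequence occurs inside one of the allowed sequences
  case when charindex(event_order, 'CLICKED>CLAIMED>BOUGHT,CLICKED>BOUGHT') > 0
       then 'Ordered' else 'Unordered' end as order_flag
from cte
order by number_id;

Answer posted 2021-11-13 09:09:11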
Using ARRAY_AGG ordered by datetime:
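SELECT number_id,
       -- the step number breaks ties between rows on the same date (e.g. 121111)
       ARRAY_AGG(status) WITHIN GROUP (ORDER BY datetime,
           CASE status WHEN 'CLICKED' THEN 1 WHEN 'CLAIMED' THEN 2 ELSE 3 END) AS statuses, -- debug
       CASE WHEN ARRAY_AGG(status) WITHIN GROUP (ORDER BY datetime,
                 CASE status WHEN 'CLICKED' THEN 1 WHEN 'CLAIMED' THEN 2 ELSE 3 END)
                 IN (ARRAY_CONSTRUCT('CLICKED', 'CLAIMED', 'BOUGHT'),
                     ARRAY_CONSTRUCT('CLICKED', 'CLAIMED')) THEN 'In order'
            ELSE 'Out of order'
       END AS status
FROM TBL_A
GROUP BY number_id;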
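This approach compares the whole date-ordered list of statuses against a fixed list of accepted sequences. A minimal Python sketch of that comparison, under one assumption: rows sharing a date are ordered by step number (ordering by datetime alone leaves the order of record 121111's same-day rows unspecified). The names `STEP`, `statuses_by_id` and `array_check` are hypothetical.

```python
from collections import defaultdict

# Sample rows from the question: (number_id, country, status, datetime).
ROWS = [
    (121144, 'USA', 'CLICKED', '2021-10-09'),
    (121144, 'USA', 'CLAIMED', '2021-10-10'),
    (121144, 'USA', 'BOUGHT', '2021-10-11'),
    (121111, 'CAD', 'CLICKED', '2021-10-12'),
    (121111, 'CAD', 'CLAIMED', '2021-10-12'),
    (121111, 'CAD', 'BOUGHT', '2021-10-12'),
    (121122, 'PES', 'CLICKED', '2021-09-11'),
    (121122, 'PES', 'CLAIMED', '2021-09-09'),
    (121122, 'PES', 'BOUGHT', '2021-09-12'),
    (121198, 'AU', 'CLICKED', '2021-09-11'),
    (121198, 'AU', 'CLAIMED', '2021-09-12'),
    (121200, 'POR', 'CLICKED', '2021-09-10'),
    (121200, 'POR', 'CLAIMED', '2021-09-11'),
    (121200, 'POR', 'BOUGHT', '2021-09-08'),
]

# Assumed step numbers, used only to break ties between rows on the same date.
STEP = {"CLICKED": 1, "CLAIMED": 2, "BOUGHT": 3}
# The two sequences accepted by the answer's IN (ARRAY_CONSTRUCT(...), ...) list.
ALLOWED = [["CLICKED", "CLAIMED", "BOUGHT"], ["CLICKED", "CLAIMED"]]


def statuses_by_id(rows):
    """Return {number_id: [status, ...]} ordered by date, then by step number."""
    groups = defaultdict(list)
    for number_id, _country, status, date in rows:
        groups[number_id].append((date, STEP[status], status))
    return {nid: [status for _, _, status in sorted(events)]
            for nid, events in groups.items()}


def array_check(rows):
    """Exact list comparison, like ARRAY_AGG(...) IN (ARRAY_CONSTRUCT(...), ...)."""
    return {nid: "In order" if seq in ALLOWED else "Out of order"
            for nid, seq in statuses_by_id(rows).items()}


if __name__ == "__main__":
    for nid, verdict in sorted(array_check(ROWS).items()):
        print(nid, verdict)
```

Because the comparison is exact, only the listed sequences pass; a skipped step such as CLICKED followed by BOUGHT is reported as out of order unless that sequence is added to the list.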

https://stackoverflow.com/questions/69946796