现有骑手id,订单id列表,订单配送距离列表,配送费列表,其中订单id、配送距离、配送费一一对应。
+-----------+---------------------------+----------------------------+-----------------------------+
| rider_id | order_list | distance_list | payment_list |
+-----------+---------------------------+----------------------------+-----------------------------+
| r001 | 0001,0005,0008 | 8.05,2.32,4.35 | 7.50,5.00,15.00 |
| r002 | 0002,0004,0006,0009,0010 | 3.01,10.98,0.78,5.05,6.05 | 13.00,15.00,5.00,9.50,7.00 |
| r003 | 0003,0007 | 4.12,8.11 | 3.50,8.00 |
| r004 | NULL | NULL | NULL |
+-----------+---------------------------+----------------------------+-----------------------------+
原始数据中order_list中的数据,与distance_list、payment_list内的数据,一一对应,请将数据拆解出rider_id、order_id,distance,payment,其中distance和payment为对应订单id的距离和配送费。
期望结果
+-----------+-----------+-----------+----------+
| rider_id | order_id | distance | payment |
+-----------+-----------+-----------+----------+
| r001 | 0001 | 8.05 | 7.50 |
| r001 | 0005 | 2.32 | 5.00 |
| r001 | 0008 | 4.35 | 15.00 |
| r002 | 0002 | 3.01 | 13.00 |
| r002 | 0004 | 10.98 | 15.00 |
| r002 | 0006 | 0.78 | 5.00 |
| r002 | 0009 | 5.05 | 9.50 |
| r002 | 0010 | 6.05 | 7.00 |
| r003 | 0003 | 4.12 | 3.50 |
| r003 | 0007 | 8.11 | 8.00 |
+-----------+-----------+-----------+----------+
我们通过posexplode对order_list 进行炸裂,查看带位置的数据
执行SQL
select rider_id, t2.pos, t2.order_id
from t2_delivery_orders t1
lateral view posexplode(split(order_list, ',')) t2 as pos, order_id
SQL结果
+-----------+------+-----------+
| rider_id | pos | order_id |
+-----------+------+-----------+
| r001 | 0 | 0001 |
| r001 | 1 | 0005 |
| r001 | 2 | 0008 |
| r002 | 0 | 0002 |
| r002 | 1 | 0004 |
| r002 | 2 | 0006 |
| r002 | 3 | 0009 |
| r002 | 4 | 0010 |
| r003 | 0 | 0003 |
| r003 | 1 | 0007 |
+-----------+------+-----------+
上面可以看到,pos列是orderid中每个订单对应的数组下标。
使用posexplode同时对order_list 和 distance_list 进行炸裂处理
执行SQL
select rider_id, t2.pos, t2.order_id, t3.pos as t3_pos, t3.distance
from t2_delivery_orders t1
lateral view posexplode(split(order_list, ',')) t2 as pos, order_id
lateral view posexplode(split(distance_list, ',')) t3 as pos, distance
SQL结果
+-----------+------+-----------+---------+-----------+
| rider_id | pos | order_id | t3_pos | distance |
+-----------+------+-----------+---------+-----------+
| r001 | 0 | 0001 | 0 | 8.05 |
| r001 | 0 | 0001 | 1 | 2.32 |
| r001 | 0 | 0001 | 2 | 4.35 |
| r001 | 1 | 0005 | 0 | 8.05 |
| r001 | 1 | 0005 | 1 | 2.32 |
| r001 | 1 | 0005 | 2 | 4.35 |
| r001 | 2 | 0008 | 0 | 8.05 |
| r001 | 2 | 0008 | 1 | 2.32 |
| r001 | 2 | 0008 | 2 | 4.35 |
| r002 | 0 | 0002 | 0 | 3.01 |
| r002 | 0 | 0002 | 1 | 10.98 |
| r002 | 0 | 0002 | 2 | 0.78 |
| r002 | 0 | 0002 | 3 | 5.05 |
| r002 | 0 | 0002 | 4 | 6.05 |
| r002 | 1 | 0004 | 0 | 3.01 |
| r002 | 1 | 0004 | 1 | 10.98 |
| r002 | 1 | 0004 | 2 | 0.78 |
| r002 | 1 | 0004 | 3 | 5.05 |
| r002 | 1 | 0004 | 4 | 6.05 |
| r002 | 2 | 0006 | 0 | 3.01 |
| r002 | 2 | 0006 | 1 | 10.98 |
| r002 | 2 | 0006 | 2 | 0.78 |
| r002 | 2 | 0006 | 3 | 5.05 |
| r002 | 2 | 0006 | 4 | 6.05 |
| r002 | 3 | 0009 | 0 | 3.01 |
| r002 | 3 | 0009 | 1 | 10.98 |
| r002 | 3 | 0009 | 2 | 0.78 |
| r002 | 3 | 0009 | 3 | 5.05 |
| r002 | 3 | 0009 | 4 | 6.05 |
| r002 | 4 | 0010 | 0 | 3.01 |
| r002 | 4 | 0010 | 1 | 10.98 |
| r002 | 4 | 0010 | 2 | 0.78 |
| r002 | 4 | 0010 | 3 | 5.05 |
| r002 | 4 | 0010 | 4 | 6.05 |
| r003 | 0 | 0003 | 0 | 4.12 |
| r003 | 0 | 0003 | 1 | 8.11 |
| r003 | 1 | 0007 | 0 | 4.12 |
| r003 | 1 | 0007 | 1 | 8.11 |
+-----------+------+-----------+---------+-----------+
可以看到结果中,两列均炸开了,但是炸开的结果order_list和distance_list中的元素数据进行了笛卡尔积。我们想要一一对应,添加where条件限制两个pos相等
执行SQL
select rider_id, t2.pos, t2.order_id, t3.pos as t3_pos, t3.distance
from t2_delivery_orders t1
lateral view posexplode(split(order_list, ',')) t2 as pos, order_id
lateral view posexplode(split(distance_list, ',')) t3 as pos, distance
where t2.pos = t3.pos
SQL结果
+-----------+------+-----------+---------+-----------+
| rider_id | pos | order_id | t3_pos | distance |
+-----------+------+-----------+---------+-----------+
| r001 | 0 | 0001 | 0 | 8.05 |
| r001 | 1 | 0005 | 1 | 2.32 |
| r001 | 2 | 0008 | 2 | 4.35 |
| r002 | 0 | 0002 | 0 | 3.01 |
| r002 | 1 | 0004 | 1 | 10.98 |
| r002 | 2 | 0006 | 2 | 0.78 |
| r002 | 3 | 0009 | 3 | 5.05 |
| r002 | 4 | 0010 | 4 | 6.05 |
| r003 | 0 | 0003 | 0 | 4.12 |
| r003 | 1 | 0007 | 1 | 8.11 |
+-----------+------+-----------+---------+-----------+
可以看到这个是符合我们预期的了。
增加对payment_list的处理,select 去掉pos相关列,得到最终结果
执行SQL
select rider_id, order_id, t3.distance, t4.payment
from t2_delivery_orders t1
lateral view posexplode(split(order_list, ',')) t2 as pos, order_id
lateral view posexplode(split(distance_list, ',')) t3 as pos, distance
lateral view posexplode(split(payment_list, ',')) t4 as pos, payment
where t2.pos = t3.pos
and t2.pos = t4.pos
SQL结果
+-----------+-----------+-----------+----------+
| rider_id | order_id | distance | payment |
+-----------+-----------+-----------+----------+
| r001 | 0001 | 8.05 | 7.50 |
| r001 | 0005 | 2.32 | 5.00 |
| r001 | 0008 | 4.35 | 15.00 |
| r002 | 0002 | 3.01 | 13.00 |
| r002 | 0004 | 10.98 | 15.00 |
| r002 | 0006 | 0.78 | 5.00 |
| r002 | 0009 | 5.05 | 9.50 |
| r002 | 0010 | 6.05 | 7.00 |
| r003 | 0003 | 4.12 | 3.50 |
| r003 | 0007 | 8.11 | 8.00 |
+-----------+-----------+-----------+----------+
--建表语句
CREATE TABLE IF NOT EXISTS t2_delivery_orders
(
rider_id string, -- 骑手ID
order_list string, -- 订单id列表
distance_list STRING, --订单距离列表
payment_list STRING --配送费列表
)
COMMENT '骑手配送订单表';
--插入数据
INSERT INTO t2_delivery_orders VALUES
('r001', '0001,0005,0008', '8.05,2.32,4.35', '7.50,5.00,15.00'),
('r002', '0002,0004,0006,0009,0010', '3.01,10.98,0.78,5.05,6.05', '13.00,15.00,5.00,9.50,7.00'),
('r003', '0003,0007', '4.12,8.11', '3.50,8.00'),
('r004', null, null, null);