前往小程序,Get更优阅读体验!
立即前往
首页
学习
活动
专区
工具
TVP
发布
社区首页 >专栏 >滴滴大数据面试SQL-截止目前登陆用户数及登陆用户列表

滴滴大数据面试SQL-截止目前登陆用户数及登陆用户列表

作者头像
数据仓库晨曦
发布2024-06-27 20:40:29
860
发布2024-06-27 20:40:29
举报
文章被收录于专栏:数据仓库技术数据仓库技术

一、题目

已知有用户登录记录表包含登录日期和登录用户ID,请查询出截止到当前日期累积登录用户数及登陆用户列表

样例数据

代码语言:javascript
复制
+-------------+----------+
|  log_date   | user_id  |
+-------------+----------+
| 2024-01-01  | a        |
| 2024-01-02  | a        |
| 2024-01-02  | b        |
| 2024-01-03  | b        |
| 2024-01-04  | c        |
| 2024-01-05  | b        |
| 2024-01-05  | c        |
| 2024-01-05  | d        |
| 2024-01-05  | e        |
+-------------+----------+

期望结果

代码语言:javascript
复制
+-------------+-----------+------------------------+
|  log_date   | user_cnt  |       user_list        |
+-------------+-----------+------------------------+
| 2024-01-01  | 1         | ["a"]                  |
| 2024-01-02  | 2         | ["a","b"]              |
| 2024-01-03  | 2         | ["a","b"]              |
| 2024-01-04  | 3         | ["a","b","c"]          |
| 2024-01-05  | 5         | ["a","b","c","d","e"]  |
+-------------+-----------+------------------------+

二、分析

统计截止到当前行的登录用户数,考察的是聚合函数开窗函数;查询用户列表考察的数据对数据的聚合、数组去重、数组排序等操作。属于深度考察开窗函数、数组操作等知识内容。

维度

评分

题目难度

⭐️⭐️⭐️⭐️

题目清晰度

⭐️⭐️⭐️⭐️⭐

业务常见度

⭐️⭐️⭐️⭐️

三、SQL

1.数据聚合

  1. 通过使用count(distinct )over(order by )的方式,聚合函数开窗,带有排序则统计到当前行的方式 ,完成对截止到当前行的数据统计;
  2. collect_set 也是对数据的聚合,所以也可以使用相同的方式完成截止到当前行的聚合;

执行SQL

代码语言:javascript
复制
select log_date,
       user_id,
       count(user_id) over (order by log_date asc)       as user_cnt,
       collect_list(user_id) over (order by log_date asc) as user_list
from t_user_login

查询结果

代码语言:javascript
复制
+-------------+----------+-----------+----------------------------------------+
|  log_date   | user_id  | user_cnt  |               user_list                |
+-------------+----------+-----------+----------------------------------------+
| 2024-01-01  | a        | 1         | ["a"]                                  |
| 2024-01-02  | b        | 3         | ["a","b","a"]                          |
| 2024-01-02  | a        | 3         | ["a","b","a"]                          |
| 2024-01-03  | b        | 4         | ["a","b","a","b"]                      |
| 2024-01-04  | c        | 5         | ["a","b","a","b","c"]                  |
| 2024-01-05  | e        | 9         | ["a","b","a","b","c","e","d","c","b"]  |
| 2024-01-05  | d        | 9         | ["a","b","a","b","c","e","d","c","b"]  |
| 2024-01-05  | c        | 9         | ["a","b","a","b","c","e","d","c","b"]  |
| 2024-01-05  | b        | 9         | ["a","b","a","b","c","e","d","c","b"]  |
+-------------+----------+-----------+----------------------------------------+

注意

  1. 这里count()函数中特意没有使用去重函数,主要是为了方便大家观察第2行、第3行数据,可以看到user_cnt都是3。这里说明order by 之后 有相同的数据,则取较大值,并且不区分先后顺序;
  2. collect_list()函数进行聚合处理之后的结果,user_list也是一样的,包含到截止相同排序的最后一行数据。
2.数据去重聚合

这里我们加上去重,使用collect_set替换掉collect_list

执行SQL

代码语言:javascript
复制
select log_date,
       user_id,
       count(distinct user_id) over (order by log_date asc) as user_cnt,
       collect_set(user_id) over (order by log_date asc)    as user_list
from t_user_login

查询结果

代码语言:javascript
复制
+-------------+----------+-----------+------------------------+
|  log_date   | user_id  | user_cnt  |       user_list        |
+-------------+----------+-----------+------------------------+
| 2024-01-01  | a        | 1         | ["a"]                  |
| 2024-01-02  | b        | 2         | ["a","b"]              |
| 2024-01-02  | a        | 2         | ["a","b"]              |
| 2024-01-03  | b        | 2         | ["a","b"]              |
| 2024-01-04  | c        | 3         | ["a","b","c"]          |
| 2024-01-05  | e        | 5         | ["a","b","c","e","d"]  |
| 2024-01-05  | d        | 5         | ["a","b","c","e","d"]  |
| 2024-01-05  | c        | 5         | ["a","b","c","e","d"]  |
| 2024-01-05  | b        | 5         | ["a","b","c","e","d"]  |
+-------------+----------+-----------+------------------------+

3.对用户列表排序,然后去重得到最后结果

上面数据观察可知相同日期的结果相同我们只要对用户列表进行排序,然后使用group by 进行去重即可得到最终结果

执行SQL

代码语言:javascript
复制
select log_date,
       user_cnt,
       user_list
from (select log_date,
             user_id,
             count(distinct user_id) over (order by log_date asc)          as user_cnt,
             sort_array(collect_set(user_id) over (order by log_date asc)) as user_list
      from t_user_login) t
group by log_date, user_cnt, user_list

查询结果

代码语言:javascript
复制
+-------------+-----------+------------------------+
|  log_date   | user_cnt  |       user_list        |
+-------------+-----------+------------------------+
| 2024-01-01  | 1         | ["a"]                  |
| 2024-01-02  | 2         | ["a","b"]              |
| 2024-01-03  | 2         | ["a","b"]              |
| 2024-01-04  | 3         | ["a","b","c"]          |
| 2024-01-05  | 5         | ["a","b","c","d","e"]  |
+-------------+-----------+------------------------+

四、建表语句和数据插入

代码语言:javascript
复制
-- 建表语句
CREATE TABLE t_user_login
(
log_date STRING,
user_id STRING)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
STORED AS TEXTFILE;
-- 数据插入语句
INSERT INTO t_user_login
(log_date, user_id) VALUES
('2024-01-01','a'),
('2024-01-02','a'),
('2024-01-02','b'),
('2024-01-03','b'),
('2024-01-04','c'),
('2024-01-05','b'),
('2024-01-05','c'),
('2024-01-05','d'),
('2024-01-05','e');
本文参与 腾讯云自媒体同步曝光计划,分享自微信公众号。
原始发表:2024-06-26,如有侵权请联系 cloudcommunity@tencent.com 删除

本文分享自 数据仓库技术 微信公众号,前往查看

如有侵权,请联系 cloudcommunity@tencent.com 删除。

本文参与 腾讯云自媒体同步曝光计划  ,欢迎热爱写作的你一起参与!

评论
登录后参与评论
0 条评论
热度
最新
推荐阅读
目录
  • 一、题目
  • 二、分析
  • 三、SQL
    • 1.数据聚合
      • 3.对用户列表排序,然后去重得到最后结果
      • 四、建表语句和数据插入
      相关产品与服务
      大数据
      全栈大数据产品,面向海量数据场景,帮助您 “智理无数,心中有数”!
      领券
      问题归档专栏文章快讯文章归档关键词归档开发者手册归档开发者手册 Section 归档