前往小程序,Get更优阅读体验!
立即前往
首页
学习
活动
专区
工具
TVP
发布
社区首页 >专栏 >常见大数据面试SQL-查询每个产品每年总销售额

常见大数据面试SQL-查询每个产品每年总销售额

作者头像
数据仓库晨曦
发布2024-06-28 18:10:03
710
发布2024-06-28 18:10:03
举报
文章被收录于专栏:数据仓库技术数据仓库技术

一、题目

已知有表如下,记录了每个产品id、产品名称、产品销售开始日期、产品销售结束日期以及产品日均销售金额,请计算出每个产品每年的销售金额

样例数据

代码语言:javascript
复制
+-------------+---------------+----------------------+----------------------+----------------------+
| product_id  | product_name  |     period_start     |      period_end      | average_daily_sales  |
+-------------+---------------+----------------------+----------------------+----------------------+
| 1           | LC Phone      | 2019-01-25 00:00:00  | 2019-02-28 00:00:00  | 100                  |
| 2           | LC T-Shirt    | 2018-12-01 00:00:00  | 2020-01-01 00:00:00  | 10                   |
| 3           | LC Keychain   | 2019-12-01 00:00:00  | 2020-01-31 00:00:00  | 1                    |
+-------------+---------------+----------------------+----------------------+----------------------+

期望结果

代码语言:javascript
复制
+------------+--------------+-------------+--------------+
| product_id | product_name | report_year | total_amount |
+------------+--------------+-------------+--------------+
| 1          | LC Phone     |    2019     | 3500         |
| 2          | LC T-Shirt   |    2018     | 310          |
| 2          | LC T-Shirt   |    2019     | 3650         |
| 2          | LC T-Shirt   |    2020     | 10           |
| 3          | LC Keychain  |    2019     | 31           |
| 3          | LC Keychain  |    2020     | 31           |
+------------+--------------+-------------+--------------+

说明

  • LC Phone 在 2019-01-25 至 2019-02-28 期间销售,该产品销售时间总计35天。销售总额 35*100 = 3500。
  • LC T-shirt 在 2018-12-01 至 2020-01-01 期间销售,该产品在2018年、2019年、2020年的销售时间分别是31天、365天、1天,2018年、2019年、2020年的销售总额分别是3110=310、36510=3650、1*10=10。
  • LC Keychain 在 2019-12-01 至 2020-01-31 期间销售,该产品在2019年、2020年的销售时间分别是:31天、31天,2019年、2020年的销售总额分别是31*1=31、31*1=31。

二、分析

题目中给出的是每个产品的开始时间和结束时间,这里最关键的问题在于一个时间段可能跨年,并且不知道跨几年。

维度

评分

题目难度

⭐️⭐️⭐️⭐️

题目清晰度

⭐️⭐️⭐️⭐️⭐️

业务常见度

⭐️⭐️⭐️⭐️

三、SQL

1.构建年份维表

首先我们生成一份年份的维表,包含年份、每年开始日期和每年结束日期,这里有多重方式,由于题目中只有三年,咱们直接union all 拼一个出来.

执行SQL

代码语言:javascript
复制
with dim_year as (select '2018' as year, '2018-01-01' as year_first_day, '2018-12-31' as year_end_day
                  union all
                  select '2019' as year, '2019-01-01' as year_first_day, '2019-12-31' as year_end_day
                  union all
                  select '2020' as year, '2020-01-01' as year_first_day, '2020-12-31' as year_end_day)
select year,year_first_day,year_end_day from dim_year

查询结果

代码语言:javascript
复制
+-------+-----------------+---------------+
| year  | year_first_day  | year_end_day  |
+-------+-----------------+---------------+
| 2018  | 2018-01-01      | 2018-12-31    |
| 2019  | 2019-01-01      | 2019-12-31    |
| 2020  | 2020-01-01      | 2020-12-31    |
+-------+-----------------+---------------+

2.维表与原始数据进行关联

把原始数据与年份维表进行笛卡尔积,得到每年与原始数据的一个交叉值.

执行SQL

代码语言:javascript
复制
set hive.strict.checks.cartesian.product = false;
with dim_year as (select '2018' as year, '2018-01-01' as year_first_day, '2018-12-31' as year_end_day
                  union all
                  select '2019' as year, '2019-01-01' as year_first_day, '2019-12-31' as year_end_day
                  union all
                  select '2020' as year, '2020-01-01' as year_first_day, '2020-12-31' as year_end_day)
select
    product_id,
    product_name,
    period_start,
    period_end,
    average_daily_sales,
    year,
    year_first_day,
    year_end_day
from t_product_sales
left join dim_year

查询结果

代码语言:javascript
复制
+-------------+---------------+----------------------+----------------------+----------------------+-------+-----------------+---------------+
| product_id  | product_name  |     period_start     |      period_end      | average_daily_sales  | year  | year_first_day  | year_end_day  |
+-------------+---------------+----------------------+----------------------+----------------------+-------+-----------------+---------------+
| 1           | LC Phone      | 2019-01-25 00:00:00  | 2019-02-28 00:00:00  | 100                  | 2019  | 2019-01-01      | 2019-12-31    |
| 1           | LC Phone      | 2019-01-25 00:00:00  | 2019-02-28 00:00:00  | 100                  | 2020  | 2020-01-01      | 2020-12-31    |
| 1           | LC Phone      | 2019-01-25 00:00:00  | 2019-02-28 00:00:00  | 100                  | 2018  | 2018-01-01      | 2018-12-31    |
| 2           | LC T-Shirt    | 2018-12-01 00:00:00  | 2020-01-01 00:00:00  | 10                   | 2019  | 2019-01-01      | 2019-12-31    |
| 2           | LC T-Shirt    | 2018-12-01 00:00:00  | 2020-01-01 00:00:00  | 10                   | 2020  | 2020-01-01      | 2020-12-31    |
| 2           | LC T-Shirt    | 2018-12-01 00:00:00  | 2020-01-01 00:00:00  | 10                   | 2018  | 2018-01-01      | 2018-12-31    |
| 3           | LC Keychain   | 2019-12-01 00:00:00  | 2020-01-31 00:00:00  | 1                    | 2019  | 2019-01-01      | 2019-12-31    |
| 3           | LC Keychain   | 2019-12-01 00:00:00  | 2020-01-31 00:00:00  | 1                    | 2020  | 2020-01-01      | 2020-12-31    |
| 3           | LC Keychain   | 2019-12-01 00:00:00  | 2020-01-31 00:00:00  | 1                    | 2018  | 2018-01-01      | 2018-12-31    |
+-------------+---------------+----------------------+----------------------+----------------------+-------+-----------------+---------------+

3.比较计算每年每个产品在售天数

我们先观察2中的结果,可以看到原始记录中每行数据都与所有年都有一行记录。我们从销售日期和每年开始日期中取较大日期 得到一个开始时间,然后从销售截止日期和每年的结束日期取较小日期 得到一个结束日期,然后用结束日期减去开始日期。

执行SQL

代码语言:javascript
复制
with dim_year as (select '2018' as year, '2018-01-01' as year_first_day, '2018-12-31' as year_end_day
                  union all
                  select '2019' as year, '2019-01-01' as year_first_day, '2019-12-31' as year_end_day
                  union all
                  select '2020' as year, '2020-01-01' as year_first_day, '2020-12-31' as year_end_day),
     tmp as (select product_id,
                    product_name,
                    period_start,
                    period_end,
                    average_daily_sales,
                    year,
                    year_first_day,
                    year_end_day,
                    datediff(
                            if(to_date(period_end) > to_date(year_end_day), to_date(year_end_day), to_date(period_end)),
                            if(to_date(period_start) > to_date(year_first_day), to_date(period_start),
                               to_date(year_first_day))) as date_diff
             from t_product_sales
                      left join dim_year)
select product_id,
       product_name,
       year,
       year_first_day,
       year_end_day,
       (date_diff + 1) * average_daily_sales
from tmp
where date_diff > 0

查询结果

代码语言:javascript
复制
+-------------+---------------+----------------------+----------------------+----------------------+-------+-----------------+---------------+------------+
| product_id  | product_name  |     period_start     |      period_end      | average_daily_sales  | year  | year_first_day  | year_end_day  | date_diff  |
+-------------+---------------+----------------------+----------------------+----------------------+-------+-----------------+---------------+------------+
| 1           | LC Phone      | 2019-01-25 00:00:00  | 2019-02-28 00:00:00  | 100                  | 2019  | 2019-01-01      | 2019-12-31    | 34         |
| 1           | LC Phone      | 2019-01-25 00:00:00  | 2019-02-28 00:00:00  | 100                  | 2020  | 2020-01-01      | 2020-12-31    | -307       |
| 1           | LC Phone      | 2019-01-25 00:00:00  | 2019-02-28 00:00:00  | 100                  | 2018  | 2018-01-01      | 2018-12-31    | -25        |
| 2           | LC T-Shirt    | 2018-12-01 00:00:00  | 2020-01-01 00:00:00  | 10                   | 2019  | 2019-01-01      | 2019-12-31    | 364        |
| 2           | LC T-Shirt    | 2018-12-01 00:00:00  | 2020-01-01 00:00:00  | 10                   | 2020  | 2020-01-01      | 2020-12-31    | 0          |
| 2           | LC T-Shirt    | 2018-12-01 00:00:00  | 2020-01-01 00:00:00  | 10                   | 2018  | 2018-01-01      | 2018-12-31    | 30         |
| 3           | LC Keychain   | 2019-12-01 00:00:00  | 2020-01-31 00:00:00  | 1                    | 2019  | 2019-01-01      | 2019-12-31    | 30         |
| 3           | LC Keychain   | 2019-12-01 00:00:00  | 2020-01-31 00:00:00  | 1                    | 2020  | 2020-01-01      | 2020-12-31    | 30         |
| 3           | LC Keychain   | 2019-12-01 00:00:00  | 2020-01-31 00:00:00  | 1                    | 2018  | 2018-01-01      | 2018-12-31    | -335       |
+-------------+---------------+----------------------+----------------------+----------------------+-------+-----------------+---------------+------------+

查看上面结果,就可以发现,只有在当年有销售时间的数据date_diff >=0,如果在当年没有销售时间,则date_diff为负。但是这个date_diff 的值比预期小1,这是因为我们算了日期差,所以我们在结果上+1即可。

4.筛选符合条件数据,计算最终结果

得到有销售的年份和天数之后,乘以每天的销售和即可得到最终结果。

执行SQL

代码语言:javascript
复制
with dim_year as (select '2018' as year, '2018-01-01' as year_first_day, '2018-12-31' as year_end_day
                  union all
                  select '2019' as year, '2019-01-01' as year_first_day, '2019-12-31' as year_end_day
                  union all
                  select '2020' as year, '2020-01-01' as year_first_day, '2020-12-31' as year_end_day),
     tmp as (select product_id
                  , product_name
                  , period_start
                  , period_end
                  , average_daily_sales
                  , year
                  , year_first_day
                  , year_end_day
                  , datediff(if(to_date(period_end)
                                    > to_date(year_end_day)
                                 , to_date(year_end_day)
                                 , to_date(period_end))
             , if(to_date(period_start)
                      > to_date(year_first_day)
                                 , to_date(period_start)
                                 , to_date(year_first_day))) as date_diff
             from t_product_sales
                      left join dim_year)
select product_id,
       product_name,
       year,
       (date_diff + 1) * average_daily_sales as total_amount
from tmp
where date_diff >= 0

查询结果

代码语言:javascript
复制
+-------------+---------------+-------+---------------+
| product_id  | product_name  | year  | total_amount  |
+-------------+---------------+-------+---------------+
| 1           | LC Phone      | 2019  | 3500          |
| 2           | LC T-Shirt    | 2018  | 310           |
| 2           | LC T-Shirt    | 2019  | 3650          |
| 2           | LC T-Shirt    | 2020  | 10            |
| 3           | LC Keychain   | 2019  | 31            |
| 3           | LC Keychain   | 2020  | 31            |
+-------------+---------------+-------+---------------+

四、建表语句和数据插入

代码语言:javascript
复制
--建表语句
create table if not exists t_product_sales
(
    product_id          bigint,
    product_name        string,
    period_start        string,
    period_end          string,
    average_daily_sales bigint
)
    ROW FORMAT DELIMITED
        FIELDS TERMINATED BY ','
    STORED AS orc;

--插入数据

insert into t_product_sales(product_id, product_name, period_start, period_end, average_daily_sales)
values (1, 'LC Phone', '2019-01-25 00:00:00', '2019-02-28 00:00:00', 100),
       (2, 'LC T-Shirt', '2018-12-01 00:00:00', '2020-01-01 00:00:00', 10),
       (3, 'LC Keychain', '2019-12-01 00:00:00', '2020-01-31 00:00:00', 1);
本文参与 腾讯云自媒体同步曝光计划,分享自微信公众号。
原始发表:2024-06-27,如有侵权请联系 cloudcommunity@tencent.com 删除

本文分享自 数据仓库技术 微信公众号,前往查看

如有侵权,请联系 cloudcommunity@tencent.com 删除。

本文参与 腾讯云自媒体同步曝光计划  ,欢迎热爱写作的你一起参与!

评论
登录后参与评论
0 条评论
热度
最新
推荐阅读
目录
  • 一、题目
  • 二、分析
  • 三、SQL
    • 1.构建年份维表
      • 2.维表与原始数据进行关联
        • 3.比较计算每年每个产品在售天数
          • 4.筛选符合条件数据,计算最终结果
          • 四、建表语句和数据插入
          相关产品与服务
          大数据
          全栈大数据产品,面向海量数据场景,帮助您 “智理无数,心中有数”!
          领券
          问题归档专栏文章快讯文章归档关键词归档开发者手册归档开发者手册 Section 归档