1.解析json
使用函数 json_tuple,函数参数用法:
json_tuple(json_object,'key1','key2','keyN')
函数使用方法:
select T1.key1,T1.key2,jf_key1,jf_key2 from table1 T1
lateral view json_tuple(json_field_name,'key1','key2') jf as jf_key1,jf_key2
2.多个表按key聚合
例如:
A表:设备的rom升级版本
device_id,time,rom_version
B表:设备的app升级版本
device_id,time,app_version
C表:统计设备今日的rom版本,app版本
device_id,rom_version,app_version
HQL:
SELECT IF(A.device_id is not null,A.device_id,B.device_id) device_id,
IF(A.device_id is not null,A.rom_version,'') rom_version,
IF(B.device_id is not null,B.app_version,'') app_version
FROM (
SELECT device_id,rom_version
FROM device_rom_upgrade
where ds=20180108
) A
full outer join
(
SELECT device_id,app_version
FROM device_app_upgrade
where ds=20180108
)B on (A.device_id=B.device_id)
3.取最近的一条升级记录
SELECT A.device_id,A.rom_version,A.time,A.row_num
FROM
(
SELECT device_rom_upgrade .*,row_number() over(partition by device_id order by time desc) row_num
FROM device_rom_upgrade
)A where A.row_num=1
4.多个select结果join
例如:统计
select A.field1,A.field2,B.field1,B.field2
from
(
select field1,field2
from table1
where condition
)A
left join
(
select field1,field2
from table2
where condition
)B
on (A.field1=B.field1)
注意事项:hive小于2.2.0时, on 里面的条件只能是=或<>,不能是>或<
参见:https://cwiki.apache.org/confluence/display/Hive/LanguageManual+Joins
5. hive中解析json数组
例如表A的param字段为json格式的字符串,其json格式如下
{
"arry_key":
[
{
"arr_obj_key1":"arry_obj_value1",
"arr_obj_key2":"arry_obj_value2"
},
{
}
],
"key1":"value1",
"key2":"value2"
}
解析方法:
1)先用json_tuple取出param
2)使用正则表达式抠出数组中的对象[{},{}],用split把对象分成数组;
3)用explode打成多行
4)最后再通过json_tuple取出数组对象里面的值
select A.key1,A.key2,array_inner_obj.arr_obj_key1,array_inner_obj.arr_obj_key2,p.key1,p.key2 from A lateral view json_tuple(params,'arry_key','key1','key2') p as p_arry,key1,key2 lateral view posexplode(split(regexp_replace(regexp_replace(p.p_arry,'\\\\}\\\\,\\\\{','\\\\}\\\\|\\\\|\\\\{'),'\\\\[|\\\\]',''), '\\\\|\\\\|')) p_arry as p_array_index, array_element lateral view json_tuple(p_arry.array_element,'arr_obj_key1','arr_obj_key2') array_inner_obj as arr_obj_key1,arr_obj_key2 where A.ds=20180110
参考http://blog.csdn.net/lfq1532632051/article/details/63262519
注意事项:
1) hive中使用\转义时要使用4个\,例如"\N" ,需要输入"\\\\N"
2) insert overwrite会先清空原来的数据,再插入新数据。