In earlier posts, I have already covered the basics.
Hive's data definition language (DDL) boils down to operations that define databases and tables.
If we do not create a database, tables simply go into the default database; that much is easy.
Note, however, that tables in the default database are stored directly under /user/hive/warehouse, without a per-database subdirectory.
Note: Hive's DDL closely resembles the SQL DDL of traditional relational databases.
DDL statements
These include CREATE, ALTER, SHOW, DESCRIBE, DROP, and so on.
In more detail:
Hive supports much of the existing functionality of standard SQL data definition language (DDL), including the following statements:
CREATE DATABASE/SCHEMA, TABLE, VIEW, FUNCTION, INDEX
DROP DATABASE/SCHEMA, TABLE, VIEW, INDEX
TRUNCATE TABLE
ALTER DATABASE/SCHEMA, TABLE, VIEW
MSCK REPAIR TABLE (or ALTER TABLE RECOVER PARTITIONS)
SHOW DATABASES/SCHEMAS, TABLES, TBLPROPERTIES, PARTITIONS, FUNCTIONS, INDEX[ES]
DESCRIBE DATABASE, table_name, view_name
For details on Hive's data types and DDL, see the official reference:
https://cwiki.apache.org/confluence/display/Hive/LanguageManual+DDL
Hive DDL: database operations
1. Create a database
create database if not exists dealer_db;
2. View a database's definition
describe database dealer_db;
3. List databases
show databases;
4. Drop a database
drop database if exists testdb cascade;
5. Switch the current database
use dealer_db;
Alternatively, using the full syntax:
1. Create a database (CREATE)
CREATE DATABASE [IF NOT EXISTS] database_name
  [COMMENT database_comment]
  [LOCATION hdfs_path]
  [WITH DBPROPERTIES (property_name=property_value, ...)];
Default location: /user/hive/warehouse/db_name.db/table_name/partition_name/...
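To make the optional clauses concrete, here is a hedged sketch; the database name, location, and property values are made up for illustration:

```sql
-- Hypothetical names and paths, for illustration only.
CREATE DATABASE IF NOT EXISTS sales_db
  COMMENT 'Dealer sales data'
  LOCATION '/user/hive/warehouse/sales_db.db'
  WITH DBPROPERTIES ('creator' = 'analyst', 'purpose' = 'reporting');

-- EXTENDED also prints the DBPROPERTIES set above.
DESCRIBE DATABASE EXTENDED sales_db;
```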
2. Use a database (USE)
USE database_name;
3. Drop a database (DROP)
DROP DATABASE [IF EXISTS] database_name [RESTRICT|CASCADE];
RESTRICT (the default) refuses to drop a non-empty database; CASCADE drops its tables as well.
4. Alter a database (ALTER)
4.1 Alter database properties
ALTER DATABASE database_name SET DBPROPERTIES (property_name=property_value, ...);
4.2 Change the database owner
ALTER DATABASE database_name SET OWNER [USER|ROLE] user_or_role;
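A minimal sketch of both ALTER forms; the property keys and the principal name are assumptions, and SET OWNER requires Hive 0.13 or later:

```sql
-- Hypothetical property keys and user name, for illustration.
ALTER DATABASE dealer_db SET DBPROPERTIES ('edited-by' = 'zls');
ALTER DATABASE dealer_db SET OWNER USER hadoop;
```

Note that DBPROPERTIES keys can only be added or overwritten this way, not removed.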
Hive的数据定义操作(DDL)之数据表相关操作
1. Create an ordinary table
create table if not exists dsjtgdk
(id int,
name string
)row format delimited fields terminated by '\t'
stored as textfile;
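To see the tab delimiter in action, a hedged sketch of loading data; the file path and its contents are made up:

```sql
-- Suppose /tmp/dsjtgdk.txt contains tab-separated lines such as:
--   1<TAB>zhangsan
--   2<TAB>lisi
LOAD DATA LOCAL INPATH '/tmp/dsjtgdk.txt' INTO TABLE dsjtgdk;
SELECT * FROM dsjtgdk;
```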
2. Other ways to create a table
(1) create table ... as select ...
(2) create table tablename2 like tablename1;
3. Create a partitioned table
create table dealer_action_log
(companyId INT comment 'company ID',
userid INT comment 'salesperson ID',
absolutepath STRING comment 'absolute path',
`timestamp` STRING comment 'access timestamp'
)partitioned by (dt string)
row format delimited fields terminated by ','
stored as textfile;
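Partitions of such a table are typically populated and pruned by the partition column. A hedged sketch; the file path and date value are made up:

```sql
-- Load one day's logs into a specific partition (path and date are hypothetical).
LOAD DATA LOCAL INPATH '/tmp/action_log_20180101.txt'
INTO TABLE dealer_action_log PARTITION (dt = '2018-01-01');

-- List partitions, and prune by the partition column when querying.
SHOW PARTITIONS dealer_action_log;
SELECT * FROM dealer_action_log WHERE dt = '2018-01-01';
```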
4. Create a bucketed table
create table dealer_leads
(leads_id string,
dealer_id string,
create_time string
)clustered by (dealer_id) sorted by (leads_id)
into 10 buckets
row format delimited fields terminated by '\t'
stored as textfile;
How a bucketed table is read and written (very important): each row is assigned to a bucket by hashing the CLUSTERED BY column, so writes and sampled reads must agree on the bucket count.
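A hedged sketch of the write path for the table above; the raw_leads staging table is an assumption. With bucketing enforcement enabled, Hive launches one reducer per bucket and hashes rows on the CLUSTERED BY column automatically:

```sql
-- Let Hive set reducer count = bucket count and hash/sort rows itself
-- (these settings are implied and always on in Hive 2.x).
set hive.enforce.bucketing = true;
set hive.enforce.sorting = true;

-- raw_leads is a hypothetical unbucketed staging table with the same columns.
INSERT OVERWRITE TABLE dealer_leads
SELECT leads_id, dealer_id, create_time
FROM raw_leads;
```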
5. List tables
show tables;
show tables '*info';  -- filter the tables to list with a wildcard pattern ('*' matches any sequence of characters)
6. View a table's definition
describe dealerinfo;
7. View detailed table information
describe formatted dealerinfo;
8. Alter a table
(1) Rename a table
alter table dealerinfo rename to dealer_info;
(2) Add columns
alter table dealer_info add columns (provinceid int);
9. Drop a table
drop table if exists dealer_info;
Alternatively, using the full syntax:
1. Create a table (full CREATE TABLE syntax)
# Manual table creation
CREATE [TEMPORARY] [EXTERNAL] TABLE [IF NOT EXISTS] [db_name.]table_name    -- (Note: TEMPORARY available in Hive 0.14.0 and later)
  [(col_name data_type [COMMENT col_comment], ...)]
  [COMMENT table_comment]
  [PARTITIONED BY (col_name data_type [COMMENT col_comment], ...)]
  [CLUSTERED BY (col_name, col_name, ...) [SORTED BY (col_name [ASC|DESC], ...)] INTO num_buckets BUCKETS]
  [SKEWED BY (col_name, col_name, ...)    -- (Note: available in Hive 0.10.0 and later)
     ON ((col_value, col_value, ...), (col_value, col_value, ...), ...)
     [STORED AS DIRECTORIES]]
  [
   [ROW FORMAT row_format]
   [STORED AS file_format]
     | STORED BY 'storage.handler.class.name' [WITH SERDEPROPERTIES (...)]    -- (Note: available in Hive 0.6.0 and later)
  ]
  [LOCATION hdfs_path]
  [TBLPROPERTIES (property_name=property_value, ...)]    -- (Note: available in Hive 0.6.0 and later)
  [AS select_statement];    -- (Note: available in Hive 0.5.0 and later; not supported for external tables)

# Copy an existing table's structure
CREATE [TEMPORARY] [EXTERNAL] TABLE [IF NOT EXISTS] [db_name.]table_name
  LIKE existing_table_or_view_name
  [LOCATION hdfs_path];

# Data types
data_type
  : primitive_type
  | array_type
  | map_type
  | struct_type
  | union_type    -- (Note: available in Hive 0.7.0 and later)

primitive_type
  : TINYINT | SMALLINT | INT | BIGINT
  | BOOLEAN
  | FLOAT | DOUBLE
  | STRING
  | BINARY                     -- (Note: available in Hive 0.8.0 and later)
  | TIMESTAMP                  -- (Note: available in Hive 0.8.0 and later)
  | DECIMAL                    -- (Note: available in Hive 0.11.0 and later)
  | DECIMAL(precision, scale)  -- (Note: available in Hive 0.13.0 and later)
  | DATE                       -- (Note: available in Hive 0.12.0 and later)
  | VARCHAR                    -- (Note: available in Hive 0.12.0 and later)
  | CHAR                       -- (Note: available in Hive 0.13.0 and later)

array_type  : ARRAY < data_type >
map_type    : MAP < primitive_type, data_type >
struct_type : STRUCT < col_name : data_type [COMMENT col_comment], ... >
union_type  : UNIONTYPE < data_type, data_type, ... >    -- (Note: available in Hive 0.7.0 and later)

# Row and column delimiters
row_format
  : DELIMITED [FIELDS TERMINATED BY char [ESCAPED BY char]]
        [COLLECTION ITEMS TERMINATED BY char]
        [MAP KEYS TERMINATED BY char]
        [LINES TERMINATED BY char]
        [NULL DEFINED AS char]    -- (Note: available in Hive 0.13 and later)
  | SERDE serde_name [WITH SERDEPROPERTIES (property_name=property_value, ...)]

# File storage formats
file_format
  : SEQUENCEFILE
  | TEXTFILE       -- (Default, depending on the hive.default.fileformat configuration)
  | RCFILE         -- (Note: available in Hive 0.6.0 and later)
  | ORC            -- (Note: available in Hive 0.11.0 and later)
  | PARQUET        -- (Note: available in Hive 0.13.0 and later)
  | AVRO           -- (Note: available in Hive 0.14.0 and later)
  | INPUTFORMAT input_format_classname OUTPUTFORMAT output_format_classname
2. Row format, file storage format, and SerDe
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\001' STORED AS SEQUENCEFILE;
ROW FORMAT SERDE ... STORED AS SEQUENCEFILE;
RegEx SerDe
CREATE TABLE apachelog (
  host STRING,
  identity STRING,
  user STRING,
  time STRING,
  request STRING,
  status STRING,
  size STRING,
  referer STRING,
  agent STRING)
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.RegexSerDe'
WITH SERDEPROPERTIES (
  "input.regex" = "([^]*) ([^]*) ([^]*) (-|\\[^\\]*\\]) ([^ \"]*|\"[^\"]*\") (-|[0-9]*) (-|[0-9]*)(?: ([^ \"]*|\".*\") ([^ \"]*|\".*\"))?"
)
STORED AS TEXTFILE;
3. Partitioned tables (PARTITIONED BY)
CREATE TABLE page_view(
  viewTime INT,
  userid BIGINT,
  page_url STRING,
  referrer_url STRING,
  ip STRING COMMENT 'IP Address of the User')
COMMENT 'This is the page view table'
PARTITIONED BY (dt STRING, country STRING)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\001'
STORED AS SEQUENCEFILE;
4. External tables (EXTERNAL)
An external table can use a custom HDFS location, and dropping it removes only the metadata, not the data; you still need to specify the field delimiter.
CREATE EXTERNAL TABLE page_view(
  viewTime INT,
  userid BIGINT,
  page_url STRING,
  referrer_url STRING,
  ip STRING COMMENT 'IP Address of the User',
  country STRING COMMENT 'country of origination')
COMMENT 'This is the staging page view table'
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\054'
STORED AS TEXTFILE
LOCATION '';
5. Create Table As Select (CTAS)
CTAS derives the new table's columns and column types from the query result; you can specify your own column delimiters and file storage format.
CREATE TABLE new_key_value_store
ROW FORMAT SERDE "org.apache.hadoop.hive.serde2.columnar.ColumnarSerDe"
STORED AS RCFile
AS
SELECT (key % 1024) new_key, concat(key, value) key_value_pair
FROM key_value_store
SORT BY new_key, key_value_pair;
6. Copy an existing table's structure (CREATE TABLE LIKE)
Copies the table structure without copying the data (all properties are identical; only the table name differs).
CREATE TABLE empty_key_value_store LIKE key_value_store;
7. Bucketed sorted tables
CREATE TABLE page_view(
  viewTime INT,
  userid BIGINT,
  page_url STRING,
  referrer_url STRING,
  ip STRING COMMENT 'IP Address of the User')
COMMENT 'This is the page view table'
PARTITIONED BY (dt STRING, country STRING)
CLUSTERED BY (userid) SORTED BY (viewTime) INTO 32 BUCKETS
ROW FORMAT DELIMITED
  FIELDS TERMINATED BY '\001'
  COLLECTION ITEMS TERMINATED BY '\002'
  MAP KEYS TERMINATED BY '\003'
STORED AS SEQUENCEFILE;
(1) The table above is bucketed by userid, and within each bucket rows are sorted by viewTime in ascending order. This enables more efficient sampling and lets internal operators exploit the data layout, improving query performance.
(2) The CLUSTERED BY and SORTED BY clauses in the table definition only describe how data is read; they do not enforce how it is written. When inserting, you must manually set the number of reducers equal to the number of buckets and use CLUSTER BY and SORT BY in the insert query.
For example, the table definition:
CREATE TABLE user_info_bucketed(
  user_id BIGINT,
  firstname STRING,
  lastname STRING)
COMMENT 'A bucketed copy of user_info'
PARTITIONED BY (ds STRING)
CLUSTERED BY (user_id) INTO 256 BUCKETS;
And, for example, a matching insert statement:
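A hedged sketch of the manual approach just described; the user_info source table, its column names, and the ds value are assumptions:

```sql
-- Manual bucketed write: reducers = buckets, CLUSTER BY the bucket column.
set mapred.reduce.tasks = 256;

INSERT OVERWRITE TABLE user_info_bucketed
PARTITION (ds = '2009-02-25')
SELECT userid, firstname, lastname
FROM user_info
WHERE ds = '2009-02-25'
CLUSTER BY userid;
```

Alternatively, `set hive.enforce.bucketing = true;` makes Hive choose the reducer count and bucket the rows automatically, so the CLUSTER BY and reducer settings can be omitted.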
8. Skewed tables
When a few values of a column occur extremely often, operations involving them run much more slowly than for other values, causing skew. Hive can store such hot values in separate files to improve performance.
(1) Single-column example
CREATE TABLE list_bucket_single (key STRING, value STRING) SKEWED BY (key) ON (1,5,6) [STORED AS DIRECTORIES];
(2) Multi-column example
CREATE TABLE list_bucket_multiple (col1 STRING, col2 INT, col3 STRING) SKEWED BY (col1, col2) ON (('s1',1), ('s3',3), ('s13',13), ('s78',78)) [STORED AS DIRECTORIES];
9. Temporary tables
A temporary table is visible only to the current session; it is stored in the user's scratch directory and dropped when the session ends.
If a temporary table has the same name as a permanent table, queries resolve to the temporary table until it is dropped or renamed.
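A minimal sketch (Hive 0.14+); the dealer_info source table is assumed to exist, and the temporary table name is made up:

```sql
-- Session-scoped scratch copy; dropped automatically at session end.
CREATE TEMPORARY TABLE tmp_dealer_snapshot AS
SELECT * FROM dealer_info;

SELECT count(*) FROM tmp_dealer_snapshot;
```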
For more details, see:
Hadoop Hive concept series, part 15: HiveQL, Hive's query language
https://www.cnblogs.com/zlslch/p/6105543.html