ClickHouse preparation
Local table
create table student on cluster luopc_mpp_cluster (
    id UInt8,
    name String,
    age UInt8,
    create_time DateTime
) engine = ReplicatedMergeTree('/clickhouse/tables/{shard}/student', '{replica}')
primary key (id)
order by (id, age);
Distributed table
create table student_all on cluster luopc_mpp_cluster (
    id UInt8,
    name String,
    age UInt8,
    create_time DateTime
) engine = Distributed(luopc_mpp_cluster, default, student, rand());
Insert data
insert into student_all values
(1,'a',17,'2021-05-08 12:00:00'),
(2,'b',25,'2021-05-08 12:00:00'),
(3,'c',20,'2021-05-08 12:00:00'),
(4,'d',22,'2021-05-08 12:00:00'),
(5,'e',30,'2021-05-08 12:00:00');
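Notes
Because the local table is created with on cluster, it exists on every node of the cluster and can be queried from each of them. The distributed table is built on top of the local table
and works much like a view: it provides cluster-wide queries and writes, while the actual data is stored in the local tables.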
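The difference between the two tables can be checked with a few queries. This is a minimal sketch that assumes the tables above exist and that you are connected to one node of the cluster. No row counts are shown, because they depend on how rand() spread the inserted rows across the shards.

```sql
-- rows stored on the shard served by the node you are connected to
select count() from student;

-- rows across all shards, read through the distributed table
select count() from student_all;

-- per-node row counts; hostName() is evaluated on each shard that answers the query
select hostName() as host, count() as row_count
from student_all
group by host
order by host;
```

If the first count is smaller than the second, the remaining rows live on other shards, which is the behavior described in the notes above.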
MySQL preparation
Create table
CREATE TABLE `student` (
`id` int(11) NOT NULL,
`name` varchar(100) NOT NULL,
`age` int(11) NOT NULL,
`create_time` datetime NOT NULL
) ENGINE=InnoDB DEFAULT CHARSET=latin1;
Insert data
INSERT INTO test.student VALUES
(6, 'f', 25, '2021-06-28 12:00:00');
Run DataX
python datax/bin/datax.py mysqltoclickhousedemo.json
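The contents of mysqltoclickhousedemo.json are not shown in these notes. A job file for this setup might look roughly like the sketch below. It assumes the mysqlreader and clickhousewriter plugins are installed, reads test.student from MySQL, and writes to the distributed table student_all in the default database. Every value in angle brackets (hosts, users, passwords) is a placeholder, and the ports 3306 and 8123 are only the usual defaults. The parameter names follow the plugin documentation as I understand it, so check them against your DataX version.

```json
{
  "job": {
    "setting": {
      "speed": {
        "channel": 1
      }
    },
    "content": [
      {
        "reader": {
          "name": "mysqlreader",
          "parameter": {
            "username": "<mysql-user>",
            "password": "<mysql-password>",
            "column": ["id", "name", "age", "create_time"],
            "connection": [
              {
                "table": ["student"],
                "jdbcUrl": ["jdbc:mysql://<mysql-host>:3306/test"]
              }
            ]
          }
        },
        "writer": {
          "name": "clickhousewriter",
          "parameter": {
            "username": "<clickhouse-user>",
            "password": "<clickhouse-password>",
            "column": ["id", "name", "age", "create_time"],
            "connection": [
              {
                "table": ["student_all"],
                "jdbcUrl": "jdbc:clickhouse://<clickhouse-host>:8123/default"
              }
            ]
          }
        }
      }
    ]
  }
}
```

Writing to student_all rather than student lets the Distributed engine decide which shard receives each row. Writing straight to the local table on one node would also work, but all imported rows would then land on that shard.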
Data before the import
Data after the import
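One way to compare the two states is to run the same query against the distributed table before and after the job. This is only a sketch, and the original results are not reproduced here. If the job succeeded, the second run should additionally return the row with id 6 that was inserted into MySQL above.

```sql
-- run once before and once after the DataX job, then compare the two results
select id, name, age, create_time
from student_all
order by id;
```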
The DataX execution log is as follows