
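ETL (Extract-Transform-Load) comes up when building a data warehouse, doing data analysis, or moving data between heterogeneous databases.
Tip: in general, the following situations can also be handled with ETL:
ETL consists of three main steps: extract, transform, and load.
The whole ETL process is handled like a pipeline stream.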
Since the data extraction takes time, it is common to execute the three phases in parallel. While the data is being extracted, another transformation process executes. It processes the already received data and prepares it for loading. As soon as there is some data ready to be loaded into the target, the data loading kicks off without waiting for the completion of the previous phases
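The streaming idea can be illustrated in plain Ruby. The following is only a minimal sketch, not Kiba code: `run_pipeline` is a hypothetical helper, and it interleaves the three phases row by row on a single thread with a lazy enumerator rather than running them truly in parallel. It shows that the first row is loaded before the second row is even extracted.

```ruby
# Hypothetical helper: pushes each line through extract -> transform -> load
# one row at a time and records the order in which the phases run.
def run_pipeline(lines)
  log = []

  # Extract: hand out rows one by one instead of building the full list first.
  extract = Enumerator.new do |yielder|
    lines.each do |line|
      log << "extract #{line}"
      yielder << line
    end
  end

  # Transform: lazily split each line into fields as soon as it arrives.
  transformed = extract.lazy.map do |line|
    fields = line.split(';')
    log << "transform #{fields.last}"
    fields
  end

  # Load: consuming the lazy chain drives the whole pipeline.
  loaded = []
  transformed.each do |fields|
    log << "load #{fields.last}"
    loaded << fields
  end

  [log, loaded]
end

log, _loaded = run_pipeline(['7/3/2015;10,96;FA1986', '7/3/2015;85,11;FA1987'])
puts log
```

The printed log alternates extract, transform, load for FA1986 and only then starts on FA1987, which is the pipeline behaviour described above in miniature.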
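Ruby's kiba gem makes it easy to implement lightweight ETL.
This post shares the basic usage of kiba; for details, see the official documentation and How to reformat CSV files with Kiba (in-depth, hands-on tutorial).
Tip: at the time of writing, the latest version of this gem is kiba 0.6.1.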
[root@h102 ~]# cat /etc/issue
CentOS release 6.6 (Final)
Kernel \r on an \m
[root@h102 ~]# uname -a
Linux h102.temp 2.6.32-504.el6.x86_64 #1 SMP Wed Oct 15 04:27:16 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux
[root@h102 ~]# ruby -v
ruby 2.3.0p0 (2015-12-25 revision 53290) [x86_64-linux]
[root@h102 ~]# gem --version
2.5.1
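[root@h102 ~]#

Here we follow the experiment in How to reformat CSV files with Kiba (in-depth, hands-on tutorial) step by step to try out the basic usage of Kiba.
The source data uses semicolons as separators, day/month/year dates, and decimal commas: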
date_facture;montant_eur;numero_commande
7/3/2015;10,96;FA1986
7/3/2015;85,11;FA1987
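8/3/2015;6,41;FA1988

The target format uses commas as separators, renamed and reordered columns, ISO 8601 dates, and decimal points:
invoice_number,invoice_date,amount_eur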
FA1986,2015-03-07,10.96
FA1987,2015-03-07,85.11
FA1988,2015-03-08,6.41

[root@h102 ~]# mkdir kiba
[root@h102 ~]# cd kiba
[root@h102 kiba]# ls
[root@h102 kiba]#
[root@h102 kiba]# vim Gemfile
[root@h102 kiba]# cat Gemfile
source 'https://gems.ruby-china.org'
gem 'kiba', '~> 0.6.0'
gem 'awesome_print'
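[root@h102 kiba]#

The Gemfile uses source 'https://gems.ruby-china.org' because 'https://rubygems.org' is blocked by the firewall here.
gem 'kiba', '~> 0.6.0' pulls in kiba, which this project depends on; the constraint accepts any 0.6.x release, and 0.6.1 is the latest at the time of writing.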
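To see which versions a constraint such as '~> 0.6.0' accepts, RubyGems' own `Gem::Requirement` and `Gem::Version` classes can be queried directly. This is a small sketch; `allowed?` is a hypothetical helper name introduced only for this illustration.

```ruby
require 'rubygems'

# Hypothetical helper: does `version` satisfy the Gemfile-style `constraint`?
def allowed?(constraint, version)
  Gem::Requirement.new(constraint).satisfied_by?(Gem::Version.new(version))
end

# '~> 0.6.0' means ">= 0.6.0 and < 0.7".
%w[0.5.9 0.6.0 0.6.1 0.7.0].each do |version|
  puts "#{version}: #{allowed?('~> 0.6.0', version)}"
end
```

Only 0.6.0 and 0.6.1 pass the check, which is consistent with Bundler resolving the constraint to kiba 0.6.1 in the install log further down.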
gem 'awesome_print' is a handy pretty-printing tool.
Here is how its output differs from plain printing with p:
[root@h102 ~]# irb
2.3.0 :001 > require 'awesome_print'
=> true
2.3.0 :002 > p (1..8).to_a
[1, 2, 3, 4, 5, 6, 7, 8]
=> [1, 2, 3, 4, 5, 6, 7, 8]
2.3.0 :003 > ap (1..8).to_a
[
[0] 1,
[1] 2,
[2] 3,
[3] 4,
[4] 5,
[5] 6,
[6] 7,
[7] 8
]
=> nil
2.3.0 :004 >

It displays the structure and contents of an object in a human-friendly way; for more detailed usage, see awesome_print.
[root@h102 kiba]# bundle install
Don't run Bundler as root. Bundler can ask for sudo if it is needed, and installing your bundle
as root will break this application for all non-root users on this machine.
Fetching gem metadata from https://gems.ruby-china.org/..
Fetching version metadata from https://gems.ruby-china.org/.
Resolving dependencies...
Installing awesome_print 1.7.0
Installing kiba 0.6.1
Using bundler 1.12.5
Bundle complete! 2 Gemfile dependencies, 3 gems now installed.
Use `bundle show [gemname]` to see where a bundled gem is installed.
[root@h102 kiba]# echo "puts 'Hello from Kiba'" > convert-csv.etl
[root@h102 kiba]# bundle exec kiba convert-csv.etl
Hello from Kiba
[root@h102 kiba]#

Note: the bundler gem must already be installed, otherwise the bundle command is not available.
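The .etl file above only prints a greeting. As a rough preview of where the tutorial is heading, here is a sketch of the CSV conversion written as plain Ruby classes that follow the conventions Kiba expects, as far as I understand its README: a source responds to `each`, a transform to `process(row)`, and a destination to `write(row)` and `close`. The class names and the `run_job` loop are hypothetical stand-ins so that the example runs without the gem. The naive `split`/`join` parsing is also an assumption that only holds for unquoted fields like the sample data.

```ruby
require 'date'

# Source: yields one Hash per data line, keyed by the header names.
class SemicolonSource
  def initialize(text)
    @text = text
  end

  def each
    header, *lines = @text.lines.map(&:chomp).reject(&:empty?)
    keys = header.split(';').map(&:to_sym)
    lines.each { |line| yield keys.zip(line.split(';')).to_h }
  end
end

# Transform: renames and reorders columns, rewrites the date and the amount.
class ReformatInvoice
  def process(row)
    {
      invoice_number: row[:numero_commande],
      invoice_date: Date.strptime(row[:date_facture], '%d/%m/%Y').strftime('%Y-%m-%d'),
      amount_eur: row[:montant_eur].tr(',', '.')
    }
  end
end

# Destination: collects comma-separated lines in memory, header first.
class CommaDestination
  attr_reader :output

  def initialize
    @output = []
  end

  def write(row)
    @output << row.keys.join(',') if @output.empty?
    @output << row.values.join(',')
  end

  def close
    @output.freeze
  end
end

# Hand-rolled stand-in for the Kiba runner (hypothetical, not Kiba's code).
def run_job(source, transforms, destination)
  source.each do |row|
    row = transforms.reduce(row) { |r, t| r && t.process(r) }
    destination.write(row) if row
  end
  destination.close
end

input = [
  'date_facture;montant_eur;numero_commande',
  '7/3/2015;10,96;FA1986',
  '7/3/2015;85,11;FA1987',
  '8/3/2015;6,41;FA1988'
].join("\n")

destination = CommaDestination.new
run_job(SemicolonSource.new(input), [ReformatInvoice.new], destination)
puts destination.output
```

In an actual .etl file the `run_job` loop would not be needed; the classes would instead be declared with Kiba's `source`, `transform`, and `destination` keywords and executed with bundle exec kiba. A real job would likely also use Ruby's csv library rather than `split` so that quoted fields are handled.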
This article is a repost; see the original source.
In case of infringement, contact cloudcommunity@tencent.com to have it removed.