关系代数是Calcite的核心。每个查询都被表示为一棵关系运算符的树。你可以将一条SQL语句翻译为关系代数,也可以直接建立树状结构。
规则器规则使用保留语义的数学特性来转换表达树。例如,如果一个过滤操作没有引入其他输入的列,那么可以将一个过滤器下推至连接之前。
Calcite通过对关系表达式进行反复应用规划器规则来优化查询。一个成本模型指导了优化的整个过程,规划器生成一个替代的表达式,语义与之前表达式相同,但具有更低的成本。
规划过程是可扩展的。你可以添加自己的关系运算符、规划器规则、成本模型和统计数据。
建立关系表达式的最简单方法时使用代数构建器RelBuilder
,下面是一个示例:
final FrameworkConfig config;
final RelBuilder builder = RelBuilder.create(config);
final RelNode node = builder
.scan("EMP")
.build();
System.out.println(RelOptUtil.toString(node));
(你可以从这个文件RelBUilderExample.java中获取更多的例子)
以上代码输出下列内容:
LogicalTableScan(table=[[scott, EMP]])
它相当于对EMP
表进行了全表扫描,相当于这句SQL:
SELECT *
FROM scott.EMP;
现在,让我们添加一个项目,相当于这句SQL:
SELECT ename, deptno
FROM scott.EMP;
我们仅仅只需要在调用build
方法之前调用project
方法即可:
final RelNode node = builder
.scan("EMP")
.project(builder.field("DEPTNO"), builder.field("ENAME"))
.build();
System.out.println(RelOptUtil.toString(node));
代码输出以下内容:
LogicalProject(DEPTNO=[$7], ENAME=[$1])
LogicalTableScan(table=[[scott, EMP]])
两次调用builder.field
方法创建了简单的表达式,从输入的通过调用scan
方法产生的关系表达式(TableScan)中返回字段。
Calcite将字段按照序号进行转换,7和1
一个查询包含了过滤和聚合:
final RelNode node = builder
.scan("EMP")
.aggregate(builder.groupKey("DEPTNO"),
builder.count(false, "C"),
builder.sum(false, "S", builder.field("SAL")))
.filter(
builder.call(SqlStdOperatorTable.GREATER_THAN,
builder.field("C"),
builder.literal(10)))
.build();
System.out.println(RelOptUtil.toString(node));
相等于下列SQL语句:
SELECT deptno, count(*) AS c, sum(sal) AS s
FROM emp
GROUP BY deptno
HAVING count(*) > 10
同时代码输出以下内容:
LogicalFilter(condition=[>($1, 10)])
LogicalAggregate(group=[{7}], C=[COUNT()], S=[SUM($5)])
LogicalTableScan(table=[[scott, EMP]])
构建器使用栈来存储一个步骤所产生的关系表达式,并将其作为输入传递给下一个步骤,这使得产生关系表达式的方法可以产生一个构造器。
大多数时候,你将使用的唯一栈方法是build()
,以获得最后的关系表达式,即树的根部。
有时栈会嵌套的很深,让人感到困惑。为了让事情变得更加简单一些,你可以从栈中移除表达式。例如,在这里我们建立了一个灌木状的连接:
.
join
/ \
join join
/ \ / \
CUSTOMERS ORDERS LINE_ITEMS PRODUCTS
我们分三个步骤来构建它。将中间结果存储在left
,right
两个变量中,当创建最终的Join
时,使用push()
方法将它们放回栈中:
final RelNode left = builder
.scan("CUSTOMERS")
.scan("ORDERS")
.join(JoinRelType.INNER, "ORDER_ID")
.build();
final RelNode right = builder
.scan("LINE_ITEMS")
.scan("PRODUCTS")
.join(JoinRelType.INNER, "PRODUCT_ID")
.build();
final RelNode result = builder
.push(left)
.push(right)
.join(JoinRelType.INNER, "ORDER_ID")
.build();
默认的Relbuilder
创建的是没有公约的逻辑RelNode
,但你可以通过adoptConvention()
切换到使用一个不同的约定:
final RelNode result = builder
.push(input)
.adoptConvention(EnumerableConvention.INSTANCE)
.sort(toCollation)
.build();
在这种情况下,我们在输入的RelNode
之上创建了一个EnumerableSort
。
你可以通过名称或者序号来引用一个字段。
序号是从0开始。每个运算符都保证其输出字段的出现顺序。例如,Project返回由每个标量表达式产生的字段。
操作符的字段名保证是唯一的,但有时这意味着名字并不完全是你期望的那样。例如,当你将EMP连接到DEPT时,其中一个输出字段被称为DEPTNO
,而另一个字段被称为DEPTNO_1
。
一些关系表达式方法让你对字段名称有更多的控制:
project
让你用alias(expr, fieldName)
来包装表达式。它删除了包装器但是保留了建议的名称(只要它是唯一的)values(String[] fieldNames, Object... values)
接受一个字段名的数组。如果数组中的任何元素为空,构建器将生成一个唯一的名称。如果一个表达式使用了一个输入字段,或者转换了一个输入字段,那么它将使用该输入字段的名称。
一旦唯一的字段名被分配,这些名字是不可更改的。如果你有一个特定的RelNode
实例,你可以相信字段名不会改变。事实上,整个关系表达式不可变的。
但是如果一个关系表达式已经通过了几个重写规则(详见RelOptRule),那么产生的表达式的字段名可能与原来的不太一样。在这一点上,最好使用序号来引用字段。
当你建立一个接收多个输入的关系表达式时,你需要建立考虑到这一点的字段引用。这在构建连接条件的时候常常发生。
假设你正在建立一个关于EMP的连接,EMP有8个字段[EMNO, ENAME, JOB, MGR, HIREDATE, SAL, COMM, DEPTNO],而DEPT有3个字段[DEPTNO, DNAME, LOC]。在内部,Calcite将这些字段表示为一个有11个字段的组合输入行:左边输入的第一个字段是字段#0(基于0,记住),右边输入的第一个字段是字段#8
但是通过 builder API,你可以指定哪个字段的输入。要引用 "SAL",内部字段#5,可以写 builder.field(2, 0, "SAL"),builder.field(2, "EMP", "SAL"),或者 builder.field(2, 0, 5)。这意味着 "两个输入中的输入#0的字段#5"。(为什么它需要知道有两个输入?因为它们被存储在堆栈中;输入 #1 在堆栈的顶部,而输入 #0 在它的下面。如果我们不告诉构建器有两个输入,它就不知道要到多深的地方去找0号输入)。
同样,要引用 "DNAME",内部字段 #9 (8 + 1),请写 builder.field(2, 1, "DNAME"), builder.field(2, "DEPT", "DNAME"), 或者 builder.field(2, 1, 1)。
警告。目前的API是实验性的,如有变化,恕不另行通知。一个SQL递归查询,例如这个产生序列1, 2, 3, ...10的查询:
WITH RECURSIVE aux(i) AS (
VALUES (1)
UNION ALL
SELECT i+1 FROM aux WHERE i < 10
)
SELECT * FROM aux
可以通过对TransientTable
和RepeatUnion
的扫描来生成:
final RelNode node = builder
.values(new String[] { "i" }, 1)
.transientScan("aux")
.filter(
builder.call(
SqlStdOperatorTable.LESS_THAN,
builder.field(0),
builder.literal(10)))
.project(
builder.call(
SqlStdOperatorTable.PLUS,
builder.field(0),
builder.literal(1)))
.repeatUnion("aux", true)
.build();
System.out.println(RelOptUtil.toString(node));
以上代码输出以下内容:
LogicalRepeatUnion(all=[true])
LogicalTableSpool(readType=[LAZY], writeType=[LAZY], tableName=[aux])
LogicalValues(tuples=[[{ 1 }]])
LogicalTableSpool(readType=[LAZY], writeType=[LAZY], tableName=[aux])
LogicalProject($f0=[+($0, 1)])
LogicalFilter(condition=[<($0, 10)])
LogicalTableScan(table=[[aux]])
以下方法创建一个关系表达式(RelNode),将其推入栈,并返回RelBuilder。
METHOD | DESCRIPTION |
---|---|
scan(tableName) | Creates a TableScan. |
functionScan(operator, n, expr...) functionScan(operator, n, exprList) | Creates a TableFunctionScan of the n most recent relational expressions. |
transientScan(tableName [, rowType]) | Creates a TableScan on a TransientTable with the given type (if not specified, the most recent relational expression’s type will be used). |
values(fieldNames, value...) values(rowType, tupleList) | Creates a Values. |
filter([variablesSet, ] exprList) filter([variablesSet, ] expr...) | Creates a Filter over the AND of the given predicates; if variablesSet is specified, the predicates may reference those variables. |
project(expr...) project(exprList [, fieldNames]) | Creates a Project. To override the default name, wrap expressions using alias, or specify the fieldNames argument. |
projectPlus(expr...) projectPlus(exprList) | Variant of project that keeps original fields and appends the given expressions. |
projectExcept(expr...) projectExcept(exprList) | Variant of project that keeps original fields and removes the given expressions. |
permute(mapping) | Creates a Project that permutes the fields using mapping. |
convert(rowType [, rename]) | Creates a Project that converts the fields to the given types, optionally also renaming them. |
aggregate(groupKey, aggCall...) aggregate(groupKey, aggCallList) | Creates an Aggregate. |
distinct() | Creates an Aggregate that eliminates duplicate records. |
pivot(groupKey, aggCalls, axes, values) | Adds a pivot operation, implemented by generating an Aggregate with a column for each combination of measures and values |
unpivot(includeNulls, measureNames, axisNames, axisMap) | Adds an unpivot operation, implemented by generating a Join to a Values that converts each row to several rows |
sort(fieldOrdinal...) sort(expr...) sort(exprList) | Creates a Sort. In the first form, field ordinals are 0-based, and a negative ordinal indicates descending; for example, -2 means field 1 descending. In the other forms, you can wrap expressions in as, nullsFirst or nullsLast. |
sortLimit(offset, fetch, expr...) sortLimit(offset, fetch, exprList) | Creates a Sort with offset and limit. |
limit(offset, fetch) | Creates a Sort that does not sort, only applies with offset and limit. |
exchange(distribution) | Creates an Exchange. |
sortExchange(distribution, collation) | Creates a SortExchange. |
correlate(joinType, correlationId, requiredField...) correlate(joinType, correlationId, requiredFieldList) | Creates a Correlate of the two most recent relational expressions, with a variable name and required field expressions for the left relation. |
join(joinType, expr...) join(joinType, exprList) join(joinType, fieldName...) | Creates a Join of the two most recent relational expressions. The first form joins on a boolean expression (multiple conditions are combined using AND). The last form joins on named fields; each side must have a field of each name. |
semiJoin(expr) | Creates a Join with SEMI join type of the two most recent relational expressions. |
antiJoin(expr) | Creates a Join with ANTI join type of the two most recent relational expressions. |
union(all [, n]) | Creates a Union of the n (default two) most recent relational expressions. |
intersect(all [, n]) | Creates an Intersect of the n (default two) most recent relational expressions. |
minus(all) | Creates a Minus of the two most recent relational expressions. |
repeatUnion(tableName, all [, n]) | Creates a RepeatUnion associated to a TransientTable of the two most recent relational expressions, with n maximum number of iterations (default -1, i.e. no limit). |
snapshot(period) | Creates a Snapshot of the given snapshot period. |
match(pattern, strictStart, strictEnd, patterns, measures, after, subsets, allRows, partitionKeys, orderKeys, interval) | Creates a Match. |
参数类型详解:
expr
、interval
,RexNodeexpr...
, requiredField...
Array of RexNodeexprList
, measureList
, partitionKeys
, orderKeys
, requiredFieldList
Iterable of RexNodefieldOrdinal
Ordinal of a field within its row (starting from 0)fieldName
Name of a field, unique within its rowfieldName...
Array of StringfieldNames
Iterable of StringrowType
RelDataTypegroupKey
RelBuilder.GroupKeyaggCall...
Array of RelBuilder.AggCallaggCallList
Iterable of RelBuilder.AggCallvalue...
Array of Objectvalue
ObjecttupleList
Iterable of List of RexLiteralall
, distinct
, strictStart
, strictEnd
, allRows
booleanalias
StringcorrelationId
CorrelationIdvariablesSet
Iterable of CorrelationIdvarHolder
Holder of RexCorrelVariablepatterns
Map whose key is String, value is RexNodesubsets
Map whose key is String, value is a sorted set of Stringdistribution
RelDistributioncollation
RelCollationoperator
SqlOperatorjoinType
JoinRelType构造器的方法包含各种优化,包括:
and
和or
操作可能有多个操作数),例如(将x = 1 AND TRUE转换为x = 1)sort
再limit
,效果和你使用sortLimit
一样有一些注释方法可以为栈上的顶级关系表达式添加注释信息:
METHOD | DESCRIPTION |
---|---|
as(alias) | Assigns a table alias to the top relational expression on the stack |
variable(varHolder) | Creates a correlation variable referencing the top relational expression |
METHOD | DESCRIPTION |
---|---|
build() | Pops the most recently created relational expression off the stack |
push(rel) | Pushes a relational expression onto the stack. Relational methods such as scan, above, call this method, but user code generally does not |
pushAll(collection) | Pushes a collection of relational expressions onto the stack |
peek() | Returns the relational expression most recently put onto the stack, but does not remove it |
以下方法返回一个标量表达式(RexNode)。
它们中的许多都使用栈中的内容。例如,field("DEPTNO")返回一个对刚刚添加到堆栈中的关系表达式的"DEPTNO"字段的引用。
METHOD | DESCRIPTION |
---|---|
literal(value) | Constant |
field(fieldName) | Reference, by name, to a field of the top-most relational expression |
field(fieldOrdinal) | Reference, by ordinal, to a field of the top-most relational expression |
field(inputCount, inputOrdinal, fieldName) | Reference, by name, to a field of the (inputCount - inputOrdinal)th relational expression |
field(inputCount, inputOrdinal, fieldOrdinal) | Reference, by ordinal, to a field of the (inputCount - inputOrdinal)th relational expression |
field(inputCount, alias, fieldName) | Reference, by table alias and field name, to a field at most inputCount - 1 elements from the top of the stack |
field(alias, fieldName) | Reference, by table alias and field name, to a field of the top-most relational expressions |
field(expr, fieldName) | Reference, by name, to a field of a record-valued expression |
field(expr, fieldOrdinal) | Reference, by ordinal, to a field of a record-valued expression |
fields(fieldOrdinalList) | List of expressions referencing input fields by ordinal |
fields(mapping) | List of expressions referencing input fields by a given mapping |
fields(collation) | List of expressions, exprList, such that sort(exprList) would replicate collation |
call(op, expr...) call(op, exprList) | Call to a function or operator |
and(expr...) and(exprList) | Logical AND. Flattens nested ANDs, and optimizes cases involving TRUE and FALSE. |
or(expr...) or(exprList) | Logical OR. Flattens nested ORs, and optimizes cases involving TRUE and FALSE. |
not(expr) | Logical NOT |
equals(expr, expr) | Equals |
isNull(expr) | Checks whether an expression is null |
isNotNull(expr) | Checks whether an expression is not null |
alias(expr, fieldName) | Renames an expression (only valid as an argument to project) |
cast(expr, typeName) cast(expr, typeName, precision) cast(expr, typeName, precision, scale) | Converts an expression to a given type |
desc(expr) | Changes sort direction to descending (only valid as an argument to sort or sortLimit) |
nullsFirst(expr) | Changes sort order to nulls first (only valid as an argument to sort or sortLimit) |
nullsLast(expr) | Changes sort order to nulls last (only valid as an argument to sort or sortLimit) |
cursor(n, input) | Reference to inputth (0-based) relational input of a TableFunctionScan with n inputs (see functionScan) |
以下方法将子查询转换为标量值(在in、exists、some、all、unique的情况下是BOOLEAN;对于scalarQuery是任何标量类型。对于arrayQuery是ARRAY,对于mapQuery是MAP,对于multisetQuery是MULTISET)
就如下所示,relFn
是一个函数,它接收一个RelBuilder
参数并返回一个RelNode
。你通常把它作为一个lambda表达式来实现;该方法用一个具有正确上下文的RelBuilder
来调用你的代码,而你的代码则返回将成为子查询的RelNode
。
METHOD | DESCRIPTION |
---|---|
all(expr, op, relFn) | Returns whether expr has a particular relation to all of the values of the sub-query |
arrayQuery(relFn) | Returns the rows of a sub-query as an ARRAY |
exists(relFn) | Tests whether sub-query is non-empty |
in(expr, relFn) in(exprList, relFn) | Tests whether a value occurs in a sub-query |
mapQuery(relFn) | Returns the rows of a sub-query as a MAP |
multisetQuery(relFn) | Returns the rows of a sub-query as a MULTISET |
scalarQuery(relFn) | Returns the value of the sole column of the sole row of a sub-query |
some(expr, op, relFn) | Returns whether expr has a particular relation to one or more of the values of the sub-query |
unique(relFn) | Returns whether the rows of a sub-query are unique |
以下方法返回用于匹配的模式。
METHOD | DESCRIPTION |
---|---|
patternConcat(pattern...) | Concatenates patterns |
patternAlter(pattern...) | Alternates patterns |
patternQuantify(pattern, min, max) | Quantifies a pattern |
patternPermute(pattern...) | Permutes a pattern |
patternExclude(pattern) | Excludes a pattern |
以下方法返回RelBuilder.GroupKey。
METHOD | DESCRIPTION |
---|---|
groupKey(fieldName...) groupKey(fieldOrdinal...) groupKey(expr...) groupKey(exprList) | Creates a group key of the given expressions |
groupKey(exprList, exprListList) | Creates a group key of the given expressions with grouping sets |
groupKey(bitSet [, bitSets]) | Creates a group key of the given input columns, with multiple grouping sets if bitSets is specified |
以下方法返回RelBuilder.AggCall。
METHOD | DESCRIPTION |
---|---|
aggregateCall(op, expr...) aggregateCall(op, exprList) | Creates a call to a given aggregate function |
count([ distinct, alias, ] expr...) count([ distinct, alias, ] exprList) | Creates a call to the COUNT aggregate function |
countStar(alias) | Creates a call to the COUNT(*) aggregate function |
sum([ distinct, alias, ] expr) | Creates a call to the SUM aggregate function |
min([ alias, ] expr) | Creates a call to the MIN aggregate function |
max([ alias, ] expr) | Creates a call to the MAX aggregate function |
要进一步修改AggCall
,请调用以下方法:
METHOD | DESCRIPTION |
---|---|
approximate(approximate) | Allows approximate value for the aggregate of approximate |
as(alias) | Assigns a column alias to this expression (see SQL AS) |
distinct() | Eliminates duplicate values before aggregating (see SQL DISTINCT) |
distinct(distinct) | Eliminates duplicate values before aggregating if distinct |
filter(expr) | Filters rows before aggregating (see SQL FILTER (WHERE ...)) |
sort(expr...) sort(exprList) | Sorts rows before aggregating (see SQL WITHIN GROUP) |
unique(expr...) unique(exprList) | Makes rows unique before aggregating (see SQL WITHIN DISTINCT) |
over() | Converts this AggCall into a windowed aggregate (see OverCall below) |
创建一个 RelBuilder.OverCall,它代表了对窗口聚合函数的调用,创建一个聚合调用,然后调用其over()
方法,例如count().over()
要进一步修改OverCall
,调用以下方法:
METHOD | DESCRIPTION |
---|---|
rangeUnbounded() | Creates an unbounded range-based window, RANGE BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING |
rangeFrom(lower) | Creates a range-based window bounded below, RANGE BETWEEN lower AND CURRENT ROW |
rangeTo(upper) | Creates a range-based window bounded above, RANGE BETWEEN CURRENT ROW AND upper |
rangeBetween(lower, upper) | Creates a range-based window, RANGE BETWEEN lower AND upper |
rowsUnbounded() | Creates an unbounded row-based window, ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING |
rowsFrom(lower) | Creates a row-based window bounded below, ROWS BETWEEN lower AND CURRENT ROW |
rowsTo(upper) | Creates a row-based window bounded above, ROWS BETWEEN CURRENT ROW AND upper |
rowsBetween(lower, upper) | Creates a rows-based window, ROWS BETWEEN lower AND upper |
partitionBy(expr...) partitionBy(exprList) | Partitions the window on the given expressions (see SQL PARTITION BY) |
orderBy(expr...) sort(exprList) | Sorts the rows in the window (see SQL ORDER BY) |
allowPartial(b) | Sets whether to allow partial width windows; default true |
nullWhenCountZero(b) | Sets whether whether the aggregate function should evaluate to null if no rows are in the window; default false |
as(alias) | Assigns a column alias (see SQL AS) and converts this OverCall to a RexNode |
toRex() | Converts this OverCall to a RexNode |
本文分享自 Tyrant Lucifer 微信公众号,前往查看
如有侵权,请联系 cloudcommunity@tencent.com 删除。
本文参与 腾讯云自媒体同步曝光计划 ,欢迎热爱写作的你一起参与!