前往小程序,Get更优阅读体验!
立即前往
首页
学习
活动
专区
工具
TVP
发布
社区首页 >专栏 >自己动手写数据库:规划器(Planner)的实现

自己动手写数据库:规划器(Planner)的实现

作者头像
望月从良
发布2023-09-25 16:08:27
2170
发布2023-09-25 16:08:27
举报
文章被收录于专栏:Coding迪斯尼Coding迪斯尼

在上一节我们说明了不同查询树其对应的执行效率不一样。给定 sql 语句,sql 解释器会构造出不同的查询树,因此我们需要专门计算哪种查询树具有最优效率,在数据库系统中,专门负责此工作的模块叫规划器,本节我们研究该模块的实现。

首先我们先给出规划器的接口,在项目目录下创建新文件夹 planner,在里面添加文件 interface.go,然后实现代码如下:

代码语言:javascript
复制
package planner

import (
    "record_manager"
)

type Plan interface {
    Open() interface{}
    BlocksAccessed() int               //对应 B(s)
    RecordsOutput() int                //对应 R(s)
    DistinctValues(fldName string) int //对应 V(s,F)
    Schema() record_manager.SchemaInterface
}

Plan 接口对象跟我们前面的 Scan 对象很像,不同在于 Scan 对象接入表的数据,而 Plan 接口对象接入表的 meta data 数据。在后面的实现中,我们会针对 Select, Project, Product 等关系代数运算去创建对应的 Plan 接口对象,下面我们先看第一个 Plan 实例的实现,创建文件 table_plan.go,实现代码如下:

代码语言:javascript
复制
package planner

import (
    metadata_manager "metadata_management"
    "query"
    "record_manager"
    "tx"
)

type TablePlan struct {
    tx      *tx.Transation
    tblName string
    layout  *record_manager.Layout
    si      *metadata_manager.StatInfo
}

func NewTablePlan(tx *tx.Transation, tblName string, md *metadata_manager.MetaDataManager) *TablePlan {
    tablePlanner := TablePlan{
        tx:      tx,
        tblName: tblName,
    }

    tablePlanner.layout = md.GetLayout(tablePlanner.tblName, tablePlanner.tx)
    tablePlanner.si = md.GetStatInfo(tblName, tablePlanner.layout, tx)

    return &tablePlanner
}

func (t *TablePlan) Open() interface{} {
    return query.NewTableScan(t.tx, t.tblName, t.layout)
}

func (t *TablePlan) RecordsOutput() int {
    return t.si.RecordsOutput()
}

func (t *TablePlan) BlocksAccessed() int {
    return t.si.BlocksAccessed()
}

func (t *TablePlan) DistinctValues(tblName string) int {
    return t.si.DistinctValues(tblName)
}

func (t *TablePlan) Schema() record_manager.SchemaInterface {
    return t.layout.Schema()
}

Plan 的实现在结构上与我们前面说过的 Scan 一样,最底层是 TablePlan,他直接返回对应数据库表的统计信息,实现 SelectPlan, ProjectPlan, ProductPlan 的时候需要传入一个 Plan 接口对象,他们相关接口的调用会转向调用输入 Plan 对象的接口。其中较为复杂的是 SelectPlan 的实现,因为它的执行依赖于传入的 Predicate 对象,其中我们以前在 Predicate 对象中实现的 ReductionFactor 接口就会被用于 RecordsAccessed,以便估计查询条件执行后所返回的数据库表缩小的程度,接口 EquatesWithConstant 用于 DistinctValues 以便用于检测 Predicate 对象对应的查询是否是”A=c”这种类型,其中 A 是字段名,c 是常量。

以前我们为了调试方便,在 Predicate和 Term 类的实现中注释掉了 ReductionFactor ,现在我们回去将他们反注释回来,,对于这两个函数的逻辑,我们将在代码的调试演示视频中再解释一下,相关视频请在 B 站搜索” coding 迪斯尼“。下面我们看 SelectPlan 的实现,创建文件 select_plan.go 实现代码如下:

代码语言:javascript
复制
package planner

import (
    "query"
    "record_manager"
)

type SelectPlan struct {
    p    Plan
    pred *query.Predicate
}

func NewSelectPlan(p Plan, pred *query.Predicate) *SelectPlan {
    return &SelectPlan{
        p:    p,
        pred: pred,
    }
}

func (s *SelectPlan) Open() interface{} {
    scan := s.p.Open()
    return query.NewSelectionScan(scan.(query.UpdateScan), s.pred)
}

func (s *SelectPlan) BlocksAccessed() int {
    return s.p.BlocksAccessed()
}

func (s *SelectPlan) RecordsOutput() int {
    return s.p.RecordsOutput() / s.pred.ReductionFactor(s.p)
}

func (s *SelectPlan) min(a int, b int) int {
    if a <= b {
        return a
    }

    return b
}

func (s *SelectPlan) DistinctValues(fldName string) int {
    if s.pred.EquatesWithConstant(fldName) != nil {
        //如果查询是 A=c 类型,A 是字段,c 是常量,那么查询结果返回一条数据
        return 1
    } else {
        //如果查询是 A=B 类型,A,B 都是字段,那么查询结果返回不同类型数值较小的那个字段
        fldName2 := s.pred.EquatesWithField(fldName)
        if fldName2 != "" {
            return s.min(s.p.DistinctValues(fldName), s.p.DistinctValues(fldName2))
        } else {
            return s.p.DistinctValues(fldName)
        }
    }
}

func (s *SelectPlan) Schema() record_manager.SchemaInterface {
    return s.p.Schema()
}

可以看到 Plan 接口实例的实现跟前面 Scan 接口实例实现的逻辑差不多,很多接口要依赖于传入的 Plan 成员,下面我们看到的 ProjectPlan 跟 ProjectScan 如出一辙,相应接口就是调用到传入的 Plan 对象,创建 project_scan.go,输入代码如下:

代码语言:javascript
复制
package planner

import (
    "query"
    "record_manager"
)

type ProjectPlan struct {
    p      Plan
    schema *record_manager.Schema
}

func NewProjectPlan(p Plan, fieldList []string) *ProjectPlan {
    project_plan := ProjectPlan{
        p:      p,
        schema: record_manager.NewSchema(),
    }

    for _, field := range fieldList {
        project_plan.schema.Add(field, project_plan.p.Schema())
    }

    return &project_plan
}

func (p *ProjectPlan) Open() interface{} {
    s := p.p.Open()
    return query.NewProjectScan(s.(query.Scan), p.schema.Fields())
}

func (p *ProjectPlan) BlocksAccessed() int {
    return p.p.BlocksAccessed()
}

func (p *ProjectPlan) RecordsOutput() int {
    return p.p.RecordsOutput()
}

func (p *ProjectPlan) DistinctValues(fldName string) int {
    return p.DistinctValues(fldName)
}

func (p *ProjectPlan) Schema() record_manager.SchemaInterface {
    return p.schema
}

最后我们看 ProductPlan 的实现,创建 product_plan.go 文件,实现代码如下:

代码语言:javascript
复制
package planner

import (
    "query"
    "record_manager"
)

type ProductScan struct {
    p1     Plan
    p2     Plan
    schema *record_manager.Schema
}

func NewProductScan(p1 Plan, p2 Plan) *ProductScan {
    product_scan := ProductScan{
        p1:     p1,
        p2:     p2,
        schema: record_manager.NewSchema(),
    }

    product_scan.schema.AddAll(p1.Schema())
    product_scan.schema.AddAll(p2.Schema())
    return &product_scan
}

func (p *ProductScan) Open() interface{} {
    s1 := p.p1.Open()
    s2 := p.p2.Open()
    return query.NewProductScan(s1.(query.Scan), s2.(query.Scan))
}

func (p *ProductScan) BlocksAccessed() int {
    return p.p1.BlocksAccessed() + (p.p1.RecordsOutput() * p.p2.BlocksAccessed())
}

func (p *ProductScan) DistinctValues(fldName string) int {
    if p.p1.Schema().HasFields(fldName) {
        return p.p1.DistinctValues(fldName)
    } else {
        return p.p2.DistinctValues(fldName)
    }
}

func (p *ProductScan) Schema() record_manager.SchemaInterface {
    return p.schema
}

为了调用如上代码进行测试,我们完成测试代码如下,在 main.go 中输入如下代码:

代码语言:javascript
复制
package main

import (
    bmg "buffer_manager"
    fm "file_manager"
    "fmt"
    lm "log_manager"
    metadata_manager "metadata_management"
    "planner"
    "query"
    "record_manager"
    "tx"
)

func printStats(n int, p planner.Plan) {
    fmt.Printf("Here are the stats for plan p %d\n", n)
    fmt.Printf("\tR(p%d):%d\n", n, p.RecordsOutput())
    fmt.Printf("\tB(p%d):%d\n", n, p.BlocksAccessed())
}

func createStudentTable() (*tx.Transation, *metadata_manager.MetaDataManager) {
    file_manager, _ := fm.NewFileManager("student", 2048)
    log_manager, _ := lm.NewLogManager(file_manager, "logfile.log")
    buffer_manager := bmg.NewBufferManager(file_manager, log_manager, 3)

    tx := tx.NewTransation(file_manager, log_manager, buffer_manager)
    sch := record_manager.NewSchema()

    sch.AddStringField("sname", 16)
    sch.AddIntField("majorId")
    sch.AddIntField("gradyear")
    layout := record_manager.NewLayoutWithSchema(sch)
    for _, field_name := range layout.Schema().Fields() {
        offset := layout.Offset(field_name)
        fmt.Printf("%s has offset %d\n", field_name, offset)
    }

    ts := query.NewTableScan(tx, "student", layout)
    fmt.Println("Filling the table with 50 random records")
    ts.BeforeFirst()
    val_for_field_sname := make([]int, 0)
    for i := 0; i < 50; i++ {
        ts.Insert() //指向一个可用插槽
        ts.SetInt("majorId", i)
        ts.SetInt("gradyear", 1990+i)
        val_for_field_sname = append(val_for_field_sname, i)
        s := fmt.Sprintf("sname_%d", i)
        ts.SetString("sname", s)
        fmt.Printf("inserting into slot %s: {%d , %s}\n", ts.GetRid().ToString(), i, s)
    }
    mdm := metadata_manager.NewMetaDataManager(false, tx)
    mdm.CreateTable("student", sch, tx)

    return tx, mdm
}

func main() {
    //构造 student 表
    tx, mdm := createStudentTable()
    p1 := planner.NewTablePlan(tx, "student", mdm)
    n := 10
    t := query.NewTerm(query.NewExpressionWithString("majorId"),
        query.NewExpressionWithConstant(query.NewConstantWithInt(&n)))
    pred := query.NewPredicateWithTerms(t)
    p2 := planner.NewSelectPlan(p1, pred)

    n1 := 2000
    t2 := query.NewTerm(query.NewExpressionWithString("gradyear"),
        query.NewExpressionWithConstant(query.NewConstantWithInt(&n1)))
    pred2 := query.NewPredicateWithTerms(t2)
    p3 := planner.NewSelectPlan(p1, pred2)

    c := make([]string, 0)
    c = append(c, "sname")
    c = append(c, "majorId")
    c = append(c, "gradyear")
    p4 := planner.NewProjectPlan(p3, c)

    printStats(1, p1)
    printStats(2, p2)
    printStats(3, p3)
    printStats(4, p4)
}

在上面代码中,我们创建了 student 表,他有三个字段分别为 sname, majorId, gradyear,然后我们创建 50 条记录插入表中,接下来我们创建 TablePlan, SelectPlan, ProjectPlan 来计算表中的查询数值,上面代码运行后输出结果如下:

代码语言:javascript
复制
GOROOT=/usr/local/go #gosetup
GOPATH=/Users/my/go #gosetup
/usr/local/go/bin/go build -o /Users/my/Library/Caches/JetBrains/GoLand2023.2/tmp/GoLand/___1go_build_main_go /Users/my/Documents/b站代码/代码/simple_db/main.go #gosetup
/Users/my/Library/Caches/JetBrains/GoLand2023.2/tmp/GoLand/___1go_build_main_go
sname has offset 8
majorId has offset 32
gradyear has offset 40
Filling the table with 50 random records
inserting into slot [ 0 , 0 ]: {0 , sname_0}
inserting into slot [ 0 , 1 ]: {1 , sname_1}
inserting into slot [ 0 , 2 ]: {2 , sname_2}
inserting into slot [ 0 , 3 ]: {3 , sname_3}
inserting into slot [ 0 , 4 ]: {4 , sname_4}
inserting into slot [ 0 , 5 ]: {5 , sname_5}
inserting into slot [ 0 , 6 ]: {6 , sname_6}
inserting into slot [ 0 , 7 ]: {7 , sname_7}
inserting into slot [ 0 , 8 ]: {8 , sname_8}
inserting into slot [ 0 , 9 ]: {9 , sname_9}
inserting into slot [ 0 , 10 ]: {10 , sname_10}
inserting into slot [ 0 , 11 ]: {11 , sname_11}
inserting into slot [ 0 , 12 ]: {12 , sname_12}
inserting into slot [ 0 , 13 ]: {13 , sname_13}
inserting into slot [ 0 , 14 ]: {14 , sname_14}
inserting into slot [ 0 , 15 ]: {15 , sname_15}
inserting into slot [ 0 , 16 ]: {16 , sname_16}
inserting into slot [ 0 , 17 ]: {17 , sname_17}
inserting into slot [ 0 , 18 ]: {18 , sname_18}
inserting into slot [ 0 , 19 ]: {19 , sname_19}
inserting into slot [ 0 , 20 ]: {20 , sname_20}
inserting into slot [ 0 , 21 ]: {21 , sname_21}
inserting into slot [ 0 , 22 ]: {22 , sname_22}
inserting into slot [ 0 , 23 ]: {23 , sname_23}
inserting into slot [ 0 , 24 ]: {24 , sname_24}
inserting into slot [ 0 , 25 ]: {25 , sname_25}
inserting into slot [ 0 , 26 ]: {26 , sname_26}
inserting into slot [ 0 , 27 ]: {27 , sname_27}
inserting into slot [ 0 , 28 ]: {28 , sname_28}
inserting into slot [ 0 , 29 ]: {29 , sname_29}
inserting into slot [ 0 , 30 ]: {30 , sname_30}
inserting into slot [ 0 , 31 ]: {31 , sname_31}
inserting into slot [ 0 , 32 ]: {32 , sname_32}
inserting into slot [ 0 , 33 ]: {33 , sname_33}
inserting into slot [ 0 , 34 ]: {34 , sname_34}
inserting into slot [ 0 , 35 ]: {35 , sname_35}
inserting into slot [ 0 , 36 ]: {36 , sname_36}
inserting into slot [ 0 , 37 ]: {37 , sname_37}
inserting into slot [ 0 , 38 ]: {38 , sname_38}
inserting into slot [ 0 , 39 ]: {39 , sname_39}
inserting into slot [ 0 , 40 ]: {40 , sname_40}
inserting into slot [ 0 , 41 ]: {41 , sname_41}
inserting into slot [ 1 , 0 ]: {42 , sname_42}
inserting into slot [ 1 , 1 ]: {43 , sname_43}
inserting into slot [ 1 , 2 ]: {44 , sname_44}
inserting into slot [ 1 , 3 ]: {45 , sname_45}
inserting into slot [ 1 , 4 ]: {46 , sname_46}
inserting into slot [ 1 , 5 ]: {47 , sname_47}
inserting into slot [ 1 , 6 ]: {48 , sname_48}
inserting into slot [ 1 , 7 ]: {49 , sname_49}
Here are the stats for plan p 1
        R(p1):50
        B(p1):2
Here are the stats for plan p 2
        R(p2):2
        B(p2):2
Here are the stats for plan p 3
        R(p3):2
        B(p3):2
Here are the stats for plan p 4
        R(p4):2
        B(p4):2

Process finished with the exit code 0

从输出可以看出,50 条记录占据了两个区块,第一个区块存放了 41 条记录,第二个区块存放了 7 条记录,从 TablePlan 的输出我们看到 B(s)=2,也就是它表明数据库表有 2 个区块,R(s)=50,表内有 50 条记录,p2, p3 , p4 的输出我将在视频演示中进行讲解,请在 B 站搜索“Coding 迪斯尼”查看相关视频。

代码下载:

链接: https://pan.baidu.com/s/1ICnk3FImKIsnUMpsuB77CA 提取码: 3abp

本文参与 腾讯云自媒体同步曝光计划,分享自微信公众号。
原始发表:2023-09-25 13:01,如有侵权请联系 cloudcommunity@tencent.com 删除

本文分享自 Coding迪斯尼 微信公众号,前往查看

如有侵权,请联系 cloudcommunity@tencent.com 删除。

本文参与 腾讯云自媒体同步曝光计划  ,欢迎热爱写作的你一起参与!

评论
登录后参与评论
0 条评论
热度
最新
推荐阅读
相关产品与服务
数据库
云数据库为企业提供了完善的关系型数据库、非关系型数据库、分析型数据库和数据库生态工具。您可以通过产品选择和组合搭建,轻松实现高可靠、高可用性、高性能等数据库需求。云数据库服务也可大幅减少您的运维工作量,更专注于业务发展,让企业一站式享受数据上云及分布式架构的技术红利!
领券
问题归档专栏文章快讯文章归档关键词归档开发者手册归档开发者手册 Section 归档