golang源码分析：boltdb（3）

golangLeetcode

发布于 2023-09-06 19:29:54

1520

发布于 2023-09-06 19:29:54

可用的空闲页号信息存储在freelist中，具体位于freelist.go文件中，定义如下：

// freelist represents a list of all pages that are available for allocation.
// It also tracks pages that have been freed but are still in use by open transactions.
type freelist struct {
  ids     []pgid          // all free and available free page ids.
  pending map[txid][]pgid // mapping of soon-to-be free page ids by tx.
  cache   map[pgid]bool   // fast lookup of all free and pending page ids.
}

它的三个属性分别表示所有空闲可用的页，刚刚事务中被释放的可用页，但是事务还没提交，可能还被读事务在用的空闲页，以及每个页id到是否空闲的映射关系。

// newFreelist returns an empty, initialized freelist.
func newFreelist() *freelist {
  return &freelist{
    pending: make(map[txid][]pgid),
    cache:   make(map[pgid]bool),
  }
}

加载一个磁盘页后会进行reindex操作，过程中会对空闲页id进行排序，并建立是否可用的映射关系

    func (f *freelist) read(p *page) {
            ids := ((*[maxAllocSize]pgid)(unsafe.Pointer(&p.ptr)))[idx:count]
    f.ids = make([]pgid, len(ids))
    copy(f.ids, ids)


    // Make sure they're sorted.
    sort.Sort(pgids(f.ids))
        f.reindex()

// reindex rebuilds the free cache based on available and pending free lists.
func (f *freelist) reindex() {
  f.cache = make(map[pgid]bool, len(f.ids))
  for _, id := range f.ids {
    f.cache[id] = true
  }
  for _, pendingIDs := range f.pending {
    for _, pendingID := range pendingIDs {
      f.cache[pendingID] = true
    }
  }
}

bolt_amd64.go中可以看到最大可以分配空间是2G

// maxAllocSize is the size used when creating array pointers.
const maxAllocSize = 0x7FFFFFFF

逻辑视图上是通过Bucket管理数据的关联关系的，关系是一个树形结构，具体的定义和数据解析位于bucket.go

func (b *Bucket) dereference() {
  if b.rootNode != nil {
    b.rootNode.root().dereference()
  }


  for _, child := range b.buckets {
    child.dereference()
  }
}

// Bucket represents a collection of key/value pairs inside the database.
type Bucket struct {
  *bucket
  tx       *Tx                // the associated transaction
  buckets  map[string]*Bucket // subbucket cache
  page     *page              // inline page reference
  rootNode *node              // materialized node for the root page.
  nodes    map[pgid]*node     // node cache


  // Sets the threshold for filling nodes when they split. By default,
  // the bucket will fill to 50% but it can be useful to increase this
  // amount if you know that your write workloads are mostly append-only.
  //
  // This is non-persisted across transactions so it must be set in every Tx.
  FillPercent float64
}

每一个Bucket在物理存储上都是一个B+树，它包含了root节点 pageid和全局递增的sequence，意味着每一个Bucket的都不一样。Bucket上存了关联事务的指针，用map存储了key到子Bucket的映射关联。满足内联的条件有两个

该子Bucket再没有嵌套的子Bucket了。
整个子Bucket的大小不能超过page size/4。

如果上述两个条件都符合，就不会为这个Bucket单独申请一个页，而是通过内联的方式，将多个Bucket存储到一个页里面，以节省空间。所以Bucket里面缓存了内联页的指针。

它同时也存储了根页面的指针，以及pageid到node页的映射缓存。

// bucket represents the on-file representation of a bucket.
// This is stored as the "value" of a bucket key. If the bucket is small enough,
// then its root page can be stored inline in the "value", after the bucket
// header. In the case of inline buckets, the "root" will be 0.
type bucket struct {
  root     pgid   // page id of the bucket's root-level page
  sequence uint64 // monotonically incrementing, used by NextSequence()
}

其中node代表了一个具体的磁盘页，它的定义位于node.go

// node represents an in-memory, deserialized page.

type node struct {

  bucket     *Bucket

  isLeaf     bool

  unbalanced bool

  spilled    bool

  key        []byte

  pgid       pgid

  parent     *node

  children   nodes

  inodes     inodes

}

其中key []byte和bucket *Bucket，分别代表存储的key和对应Bucket，parent指向了父亲页，children指向了孩子页列表，共同构成了一个B+树，inodes指向了页内存储的数据列表。每一个项都可以表示为

type inode struct {
  flags uint32
  pgid  pgid
  key   []byte
  value []byte
}
        / inode represents an internal node inside of a node.
// It can be used to point to elements in a page or point
// to an element which hasn't been added to a page yet.

type inodes []inode

key和value分别代表了页中存储的键值对信息。node的解析过程如下：

// dereference causes the node to copy all its inode key/value references to heap memory.
// This is required when the mmap is reallocated so inodes are not pointing to stale data.
func (n *node) dereference() {
  if n.key != nil {
    key := make([]byte, len(n.key))
    copy(key, n.key)
    n.key = key
    _assert(n.pgid == 0 || len(n.key) > 0, "dereference: zero-length node key on existing node")
  }
          for i := range n.inodes {
    inode := &n.inodes[i]


    key := make([]byte, len(inode.key))
    copy(key, inode.key)
    inode.key = key
    _assert(len(inode.key) > 0, "dereference: zero-length inode key")


    value := make([]byte, len(inode.value))
    copy(value, inode.value)
    inode.value = value
  }
          for _, child := range n.children {
    child.dereference()
  }


  // Update statistics.
  n.bucket.tx.stats.NodeDeref++

整个过程是一个递归的过程，获取祖先节点的过程也是一样的：

func (n *node) root() *node {
  if n.parent == nil {
    return n
  }
  return n.parent.root()
}

本文参与腾讯云自媒体同步曝光计划，分享自微信公众号。

原始发表：2023-05-26，如有侵权请联系 cloudcommunity@tencent.com 删除

存储

本文分享自 golang算法架构leetcode技术php 微信公众号，前往查看

如有侵权，请联系 cloudcommunity@tencent.com 删除。

本文参与腾讯云自媒体同步曝光计划，欢迎热爱写作的你一起参与！

登录后参与评论

0 条评论

热度

golang源码分析：boltdb（3）

golang源码分析：boltdb（3）

社区

活动

资源

关于

腾讯云开发者

热门产品

热门推荐

更多推荐