3. No UNDO: if you regret an update halfway through, how is the data rolled back?

顽石神

Published 2022-02-26 07:12:25

Included in the column: PostgreSQL实战 (PostgreSQL in Practice)

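Different architectures give products different characteristics. After going through PostgreSQL's core processes, you will notice that there is no UNDO module, the component many people know from other databases. Without UNDO, if I modify a row inside a transaction, realize the change was wrong, and decide I don't want it after all, can the data still be reverted?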

Before answering this question, let's look at PostgreSQL's storage structure. (The figure below is taken from the internet.)

The HeapTupleHeaderData structure contains seven fields, but only four of them are needed for the rest of this discussion:

t_xmin holds the txid of the transaction that inserted this tuple.
t_xmax holds the txid of the transaction that deleted or updated this tuple. If this tuple has not been deleted or updated, t_xmax is set to 0, which means INVALID.
t_cid holds the command id (cid): the number of SQL commands executed earlier in the current transaction before the command that wrote this tuple, counting from 0. For example, if a single transaction runs three INSERT commands ('BEGIN; INSERT; INSERT; INSERT; COMMIT;'), a tuple inserted by the first command gets t_cid 0, a tuple inserted by the second gets t_cid 1, and so on.
t_ctid holds the tuple identifier (tid) that points either to the tuple itself or to a newer version of it. A tid identifies a tuple within a table as a pair of block number and offset within that block. When this tuple is updated, its t_ctid points to the new tuple; otherwise it points to itself.
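As a minimal sketch of these four fields, the toy Python model below (an illustration, not PostgreSQL's C code) reproduces the t_cid example: three INSERTs in one transaction receive t_cid 0, 1 and 2, and each fresh tuple's t_ctid points to itself. The class names `TupleHeader` and `Page`, the fixed block number 0, and the txid 100 are assumptions made for illustration.

```python
from dataclasses import dataclass, field

INVALID_XID = 0  # t_xmax == 0 means "not deleted or updated"


@dataclass
class TupleHeader:
    """Toy model of the four HeapTupleHeaderData fields discussed above."""
    t_xmin: int                    # txid of the inserting transaction
    t_xmax: int = INVALID_XID      # txid of the deleting/updating transaction
    t_cid: int = 0                 # command id inside the transaction, from 0
    t_ctid: tuple = (0, 0)         # (block, offset): itself or a newer version


@dataclass
class Page:
    """Hypothetical single page, assumed to be block number 0."""
    tuples: list = field(default_factory=list)

    def insert(self, txid: int, cid: int) -> TupleHeader:
        offset = len(self.tuples) + 1          # tuple offsets are numbered from 1
        tup = TupleHeader(t_xmin=txid, t_cid=cid, t_ctid=(0, offset))
        self.tuples.append(tup)
        return tup


# BEGIN; INSERT; INSERT; INSERT; COMMIT;  run as txid 100
page = Page()
for cid in range(3):        # each INSERT is the next command in the transaction
    page.insert(txid=100, cid=cid)

for t in page.tuples:
    print(t)
```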

The figure below shows how a single SQL statement modifies a row:

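As the figure shows, the current transaction ID (txid) is 100. To execute an UPDATE, it sets the t_xmax field of the original row to the current transaction ID, 100, and writes a new, modified version of the row into the page whose t_xmin is also the current transaction ID, 100. The t_ctid of the original row is set to point to the new version.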
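The UPDATE steps above can be sketched in the same kind of toy model. It assumes, hypothetically, that the row was originally inserted by txid 99 and that a page is a plain Python list; the point is only that nothing is overwritten in place: the old version gets t_xmax = 100 and a t_ctid pointing to the new version, whose t_xmin is 100.

```python
from dataclasses import dataclass, field


@dataclass
class TupleHeader:
    t_xmin: int
    t_xmax: int = 0          # 0 = INVALID: not deleted or updated
    t_cid: int = 0
    t_ctid: tuple = (0, 0)   # (block, offset)
    data: str = ""           # stand-in for the row's column values


@dataclass
class Page:
    tuples: list = field(default_factory=list)  # offset n is stored at index n - 1

    def add(self, tup: TupleHeader) -> int:
        offset = len(self.tuples) + 1
        tup.t_ctid = (0, offset)                 # a new tuple points to itself
        self.tuples.append(tup)
        return offset

    def update(self, offset: int, txid: int, cid: int, new_data: str) -> int:
        """Simplified UPDATE: stamp the old version, append a new version."""
        old = self.tuples[offset - 1]
        old.t_xmax = txid                        # old version: updated by txid
        new_offset = self.add(TupleHeader(t_xmin=txid, t_cid=cid, data=new_data))
        old.t_ctid = (0, new_offset)             # old version now points to the new one
        return new_offset


page = Page()
a = page.add(TupleHeader(t_xmin=99, data="A"))   # hypothetical: inserted earlier by txid 99
b = page.update(a, txid=100, cid=0, new_data="B")
for t in page.tuples:
    print(t)
```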

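At this point the modified row exists in the database in two "versions": the first is the row as it was before the change, and the second is the row after the change. Under Read Committed, if a new session (txid=101) queries table tbl, the database first finds tuple A and then follows it to tuple B. Which of the two it returns is decided by the visibility rules: while transaction 100 has not committed, session 101 sees A; once 100 commits, it sees B.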
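A simplified sketch of what session 101 sees. Here a snapshot is modeled, as an assumption, by the set of txids that had committed when the statement started; real PostgreSQL snapshots and the commit log are more involved, and the reader's own changes and t_cid are ignored. The reader starts at tuple A, follows t_ctid, and returns the first version that passes the visibility check.

```python
# Two versions of one row after txid 100's UPDATE (tuple A -> tuple B).
versions = {
    (0, 1): {"xmin": 99, "xmax": 100, "ctid": (0, 2), "data": "A"},
    (0, 2): {"xmin": 100, "xmax": 0, "ctid": (0, 2), "data": "B"},
}


def visible(tup: dict, committed: set) -> bool:
    """Simplified rule: `committed` holds txids committed when the snapshot was taken."""
    if tup["xmin"] not in committed:
        return False        # inserter not committed (still running or aborted)
    if tup["xmax"] == 0 or tup["xmax"] not in committed:
        return True         # never deleted, or the deleter has not committed
    return False            # deleted or updated by a committed transaction


def read_row(start_tid: tuple, committed: set):
    """Start at tuple A and follow t_ctid until a visible version is found."""
    tid = start_tid
    while True:
        tup = versions[tid]
        if visible(tup, committed):
            return tup["data"]
        if tup["ctid"] == tid:   # end of the chain: no visible version
            return None
        tid = tup["ctid"]


# Read Committed: every statement of session 101 takes a fresh snapshot.
print(read_row((0, 1), committed={99}))        # txid 100 still running
print(read_row((0, 1), committed={99, 100}))   # after txid 100 commits
```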

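PostgreSQL keeps redundant "multiple versions" at the row level, which form a version chain, and combines them with "visibility" rules; together these implement the database's MVCC mechanism.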

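This design makes rolling back even a large transaction very fast, because nothing in the data pages has to be undone: the transaction is simply recorded as aborted, and the row versions it created become invisible. However, for rows that are updated frequently and whose dead tuples are not cleaned up in time, performance degrades severely. So when maintaining a PostgreSQL database, each table's dead-tuple count is a very important metric, and the autovacuum parameters need to be tuned so that dead tuples are reclaimed promptly.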
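This also answers the title question. A minimal model, assuming a dictionary stands in for the commit log (PostgreSQL stores transaction status in pg_xact), shows why rollback is cheap: ROLLBACK only records that the transaction aborted. No tuple is touched; the visibility rule then ignores B, whose inserter aborted, and treats A as not updated, because its t_xmax belongs to an aborted transaction. B is left as a dead tuple for VACUUM to reclaim later.

```python
IN_PROGRESS, COMMITTED, ABORTED = "in progress", "committed", "aborted"

# Toy commit log: transaction status lives outside the heap;
# the tuples themselves only carry txids.
clog = {99: COMMITTED, 100: IN_PROGRESS}

# Heap contents after txid 100 ran its UPDATE: old version A, new version B.
heap = [
    {"xmin": 99, "xmax": 100, "data": "A"},
    {"xmin": 100, "xmax": 0, "data": "B"},
]


def visible(tup: dict) -> bool:
    """Simplified check for another session (snapshots and t_cid ignored)."""
    if clog.get(tup["xmin"]) != COMMITTED:
        return False
    return tup["xmax"] == 0 or clog.get(tup["xmax"]) != COMMITTED


def dead(tup: dict) -> bool:
    """Versions nobody can see any more: inserter aborted or deleter committed."""
    return clog.get(tup["xmin"]) == ABORTED or clog.get(tup["xmax"]) == COMMITTED


def rollback(txid: int) -> None:
    # In this model no tuple is modified, so the cost does not grow
    # with the number of rows the transaction changed.
    clog[txid] = ABORTED


print([t["data"] for t in heap if visible(t)])  # ['A'] while txid 100 is running
rollback(100)
print([t["data"] for t in heap if visible(t)])  # ['A']: B's inserter aborted
print([t["data"] for t in heap if dead(t)])     # ['B'] awaits VACUUM
```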

Impact of frequent updates and deletes

Next, let's look at an example to get a feel for how dead tuples affect performance:

Create a table and insert 2 million rows:

postgres=> create table t_mvcc(id int primary key,val text);

postgres=> insert into t_mvcc select generate_series(1,2000000),(random()*26)::integer;

postgres=> \timing

postgres=> select count(*) from t_mvcc;

-[ RECORD 1 ]--

count | 2000000

Time: 287.797 ms

A full-table count takes less than 300 ms. Next, repeat the delete and insert a few times:

postgres=> delete from t_mvcc;

postgres=> insert into t_mvcc select generate_series(1,2000000),(random()*26)::integer;

postgres=> select count(*) from t_mvcc;

-[ RECORD 1 ]--

count | 2000000

Time: 4690.441 ms (00:04.690)

After a few rounds, the same count takes about 16 times as long (4,690 ms versus 288 ms).
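A toy count of the work behind this slowdown, assuming no VACUUM runs between rounds and scaling the table down to 2,000 rows: each DELETE only stamps t_xmax and each INSERT appends new versions, so count(*) stays the same while the number of versions a sequential scan must examine grows with every round. The model counts tuple versions only; real timings also depend on I/O, caching and other factors.

```python
def simulate(rows: int, rounds: int):
    """Toy heap with no VACUUM: each tuple is [xmin, xmax]; xmax 0 means live."""
    heap = []
    txid = 1
    heap.extend([txid, 0] for _ in range(rows))       # initial INSERT
    for _ in range(rounds):
        txid += 1
        for tup in heap:                               # DELETE FROM t_mvcc
            if tup[1] == 0:
                tup[1] = txid                          # marked, not removed
        txid += 1
        heap.extend([txid, 0] for _ in range(rows))    # INSERT the rows again
    live = sum(1 for tup in heap if tup[1] == 0)       # what count(*) returns
    return live, len(heap)                             # vs. versions a seq scan visits


for rounds in range(4):
    live, scanned = simulate(2_000, rounds)
    print(f"rounds={rounds} count(*)={live} versions scanned={scanned}")
```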

Why doesn't the table shrink after a DELETE?

Let's start with an experiment:

postgres=> insert into t_mvcc select generate_series(1,40000000),(random()*26)::integer;

INSERT 0 40000000

Time: 393689.209 ms (06:33.689)

postgres=> SELECT pg_size_pretty(pg_table_size('t_mvcc'));

pg_size_pretty

----------------

1859 MB

(1 row)

Time: 0.707 ms

postgres=> delete from t_mvcc where id<20000000;

DELETE 19999999

Time: 86116.454 ms (01:26.116)

postgres=> SELECT pg_size_pretty(pg_table_size('t_mvcc'));

pg_size_pretty

----------------

1859 MB

(1 row)

Time: 6.890 ms

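We inserted 40 million rows into the table and deleted about 20 million of them (19,999,999), yet the table size did not change after the delete. The reason is that DELETE only marks the rows (by setting t_xmax to the deleting transaction's ID) instead of physically removing them; the actual cleanup of dead tuples is done by VACUUM.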
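A minimal sketch of the same behavior, using a hypothetical `ToyTable` whose list of slots stands in for the data file and a scale of 1:10,000. It assumes every deleting transaction has committed and no older snapshot still needs the deleted rows. In this model VACUUM frees the dead slots for reuse but does not shrink the file; real plain VACUUM can additionally truncate empty pages at the end of a table, and VACUUM FULL rewrites the table to return space to the operating system, neither of which is modeled here.

```python
class ToyTable:
    """Hypothetical heap: `slots` models the data file and never shrinks."""

    def __init__(self):
        self.slots = []   # each entry is [xmin, xmax], or None once VACUUM frees it
        self.free = []    # indexes of freed slots that INSERT may reuse
        self.txid = 0

    def _next_txid(self) -> int:
        self.txid += 1
        return self.txid

    def insert(self, n: int) -> None:
        txid = self._next_txid()
        for _ in range(n):
            if self.free:
                self.slots[self.free.pop()] = [txid, 0]  # reuse space freed by VACUUM
            else:
                self.slots.append([txid, 0])             # otherwise the file grows

    def delete(self, n: int) -> None:
        """DELETE only stamps xmax on n live tuples; nothing is physically removed."""
        txid = self._next_txid()
        for tup in self.slots:
            if n == 0:
                break
            if tup is not None and tup[1] == 0:
                tup[1] = txid
                n -= 1

    def vacuum(self) -> None:
        """Free dead tuples (assumes all deleters committed, no old snapshots)."""
        for i, tup in enumerate(self.slots):
            if tup is not None and tup[1] != 0:
                self.slots[i] = None
                self.free.append(i)

    def size(self) -> int:
        return len(self.slots)

    def dead(self) -> int:
        return sum(1 for tup in self.slots if tup is not None and tup[1] != 0)


t = ToyTable()
t.insert(4_000)            # scaled-down stand-in for the 40 million rows
t.delete(2_000)
print(t.size(), t.dead())  # DELETE did not shrink the table
t.vacuum()
print(t.size(), t.dead())  # dead tuples reclaimed, size unchanged
t.insert(1_000)
print(t.size())            # new rows reuse the freed slots
```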

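VACUUM is very important in day-to-day PostgreSQL operations: it affects statistics collection, reclamation of space in data blocks, and transaction ID (XID) recycling. Setting the vacuum parameters sensibly has a major effect on performance; a later article will walk through typical autovacuum tuning cases.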

Have you run into similar situations in your own workloads? If so, how did you optimize them?

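Originality statement: this article is published on the Tencent Cloud Developer Community with the author's authorization. Reproduction without permission is prohibited.

In case of infringement, contact cloudcommunity@tencent.com to have it removed.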

TencentDB for PostgreSQL

#mvcc

#autovaccum
