
Comparing Elasticsearch Data with esdiff

Author: 保持热爱奔赴山海
Published: 2025-09-18 15:33:37
Included in the column: 数据库相关

Project URL: https://github.com/olivere/esdiff (the project has been archived, so it may not work with later Elasticsearch versions; use it with care).

The esdiff tool iterates over two indices in Elasticsearch 5.x, 6.x, or 7.x and performs a diff between the documents in those indices.

It does so by scrolling through both indices. To get a stable sort order, it sorts by _id by default (_uid in ES 5.x).
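
To make the idea concrete, here is a minimal sketch of how two streams that are already sorted by id can be compared in a single pass. This is not esdiff's actual code: the function name diff_sorted, the tuple layout, the lowercase mode names other than "deleted", and the plain string comparison of ids are all assumptions made for illustration. The sample documents loosely follow the ES 5.x and ES 6.x examples in this article; the body of document 2 is a placeholder.

```python
from typing import Dict, Iterable, Iterator, Optional, Tuple

Doc = Tuple[str, Dict]  # (document id, _source body); hypothetical layout


def diff_sorted(src: Iterable[Doc], dst: Iterable[Doc]) -> Iterator[Tuple[str, str]]:
    """Yield (mode, id) pairs for two streams that are both sorted by id."""
    src_it, dst_it = iter(src), iter(dst)
    s: Optional[Doc] = next(src_it, None)
    d: Optional[Doc] = next(dst_it, None)
    while s is not None or d is not None:
        if d is None or (s is not None and s[0] < d[0]):
            # The id exists only in the source.
            yield ("deleted", s[0])
            s = next(src_it, None)
        elif s is None or d[0] < s[0]:
            # The id exists only in the destination.
            yield ("created", d[0])
            d = next(dst_it, None)
        else:
            # Same id on both sides: compare the bodies.
            yield ("unchanged" if s[1] == d[1] else "updated", s[0])
            s = next(src_it, None)
            d = next(dst_it, None)


src = [
    ("1", {"message": "Welcome to Golang"}),
    ("2", {"message": "placeholder"}),  # real content not shown in the article
    ("3", {"message": "Playing the piano is fun as well"}),
]
dst = [
    ("1", {"message": "Welcome to Golang"}),
    ("3", {"message": "Playing the guitar is fun as well"}),
    ("4", {"message": "Climbed that mountain"}),
]
for mode, doc_id in diff_sorted(src, dst):
    print(mode, doc_id)
```

Because both sides are consumed in the same order, neither index has to fit in memory, which is why a stable sort order matters for this kind of comparison.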

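Compiling requires Go 1.11 or later. Note that the go install ...@latest form used in the next section requires Go 1.16 or later.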

Installation

go install github.com/olivere/esdiff@latest

Usage example

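First, set up the Elasticsearch test clusters (the docker-compose file used here starts one each for 5.x, 6.x, and 7.x), then seed a few documents.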

$ mkdir -p data

# Create an Elasticsearch 5.x cluster on http://localhost:19200
# Create an Elasticsearch 6.x cluster on http://localhost:29200
# Create an Elasticsearch 7.x cluster on http://localhost:39200

# Increase your docker memory limit (6.0GiB) in Docker App > Preferences > Advanced.
$ docker-compose up -d

Creating esdiff_elasticsearch5_1 ... done
Creating esdiff_elasticsearch6_1 ... done
Creating esdiff_elasticsearch7_1 ... done

# Check docker containers
$ docker-compose ps
         Name                        Command               State                 Ports
----------------------------------------------------------------------------------------------------
esdiff_elasticsearch5_1   /bin/bash bin/es-docker          Up      0.0.0.0:19200->9200/tcp, 9300/tcp
esdiff_elasticsearch6_1   /usr/local/bin/docker-entr ...   Up      0.0.0.0:29200->9200/tcp, 9300/tcp
esdiff_elasticsearch7_1   /usr/local/bin/docker-entr ...   Up      0.0.0.0:39200->9200/tcp, 9300/tcp

# Check docker container logs
$ docker-compose logs -f elasticsearch5
Attaching to esdiff_elasticsearch5_1
elasticsearch5_1  | [2019-07-02T14:17:33,351][WARN ][o.e.b.JNANatives         ] Unable to lock JVM Memory: error=12, reason=Cannot allocate memory
elasticsearch5_1  | [2019-07-02T14:17:33,355][WARN ][o.e.b.JNANatives         ] This can result in part of the JVM being swapped out.
elasticsearch5_1  | [2019-07-02T14:17:33,355][WARN ][o.e.b.JNANatives         ] Increase RLIMIT_MEMLOCK, soft limit: 83968000, hard limit: 83968000
elasticsearch5_1  | [2019-07-02T14:17:33,356][WARN ][o.e.b.JNANatives         ] These can be adjusted by modifying /etc/security/limits.conf, for example:
elasticsearch5_1  | # allow user 'elasticsearch' mlockall
........

# Add some documents
$ ./seed/01.sh

# Compile
$ go build
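
The containers can take a while to start, so it may help to wait until each node answers before seeding or diffing. The following is a rough sketch, assuming the three ports from the docker-compose setup above and relying on the version.number field that the Elasticsearch root endpoint returns. The helper names fetch_root and wait_for_version, the retry count, and the delay are hypothetical.

```python
import json
import time
import urllib.request
from typing import Callable, Optional


def fetch_root(url: str) -> dict:
    """GET the cluster root endpoint and decode its JSON body."""
    with urllib.request.urlopen(url, timeout=5) as resp:
        return json.load(resp)


def wait_for_version(url: str,
                     fetch: Callable[[str], dict] = fetch_root,
                     attempts: int = 30,
                     delay: float = 2.0) -> Optional[str]:
    """Poll url until it answers; return version.number, or None if it never does."""
    for _ in range(attempts):
        try:
            return fetch(url)["version"]["number"]
        except (OSError, KeyError, ValueError):
            # Connection refused, malformed body, or unexpected shape: retry.
            time.sleep(delay)
    return None


# Example call (not executed here, since it needs the running containers):
#   for port in (19200, 29200, 39200):
#       print(port, wait_for_version(f"http://localhost:{port}"))
```

The fetch parameter is injectable so the retry logic can be exercised without a running cluster.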

Let's run a simple diff.

Comparing an index with itself on the same cluster should return only unchanged documents:

$ ./esdiff -u=true 'http://localhost:19200/index01/tweet' 'http://localhost:19200/index01/tweet'
Unchanged       1
Unchanged       2
Unchanged       3

The following example returns the differences between the indices in ES 5.x and ES 6.x:

$ ./esdiff -u=true 'http://localhost:19200/index01/tweet' 'http://localhost:29200/index01/_doc'
Unchanged       1
Deleted 2
Updated 3       {*diff.Document}.Source["message"]:
        -: "Playing the piano is fun as well"
        +: "Playing the guitar is fun as well"

Created 4       {*diff.Document}:
        -: (*diff.Document)(nil)
        +: &diff.Document{ID: "4", Source: map[string]interface {}{"message": "Climbed that mountain", "user": "sandrae"}}

ES 5.x and ES 7.x, with different documents:

$ ./esdiff -u=true 'http://localhost:19200/index01/tweet' 'http://localhost:39200/index01/_doc'
Unchanged       1
Deleted 2
Updated 3       {*diff.Document}.Source["message"]:
        -: "Playing the piano is fun as well"
        +: "Playing the flute, oh boy"

Created 5       {*diff.Document}:
        -: (*diff.Document)(nil)
        +: &diff.Document{ID: "5", Source: map[string]interface {}{"message": "Ran that marathon", "user": "sandrae"}}

Output options

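Note that you can pass additional options to filter for the kinds of modes you are interested in. For example, to also see all unchanged documents but hide the deleted ones, use -u=true -d=false: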

$ ./esdiff -u=true -d=false 'http://localhost:19200/index01/tweet' 'http://localhost:29200/index01/_doc'
Unchanged       1
Updated 3       {*diff.Document}.Source["message"]:
        -: "Playing the piano is fun as well"
        +: "Playing the guitar is fun as well"

Created 4       {*diff.Document}:
        -: (*diff.Document)(nil)
        +: &diff.Document{ID: "4", Source: map[string]interface {}{"message": "Climbed that mountain", "user": "sandrae"}}

Format options

You can use JSON as the output format instead. Combined with jq and jiq (and other jq-related tools), this is quite powerful.

$ ./esdiff -o=json 'http://localhost:29200/index01/_doc' 'http://localhost:39200/index01/_doc' | jq 'select(.mode | contains("deleted"))'
{
  "mode": "deleted",
  "_id": "4",
  "src": {
    "_id": "4",
    "_source": {
      "message": "Climbed that mountain",
      "user": "sandrae"
    }
  },
  "dst": null
}
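
If jq is not available, the same kind of selection can be done in a few lines of Python. This is a sketch under one assumption: that esdiff -o=json writes one compact JSON object per line, as the output in the filter example of this article suggests. The helper names parse_events and ids_by_mode are hypothetical, and the sample line is copied from that example.

```python
import json
from collections import Counter
from typing import Iterable, List


def parse_events(lines: Iterable[str]) -> List[dict]:
    """Decode one JSON object per non-empty line."""
    return [json.loads(line) for line in lines if line.strip()]


def ids_by_mode(events: Iterable[dict], mode: str) -> List[str]:
    """Return the _id of every event whose mode matches exactly."""
    return [e["_id"] for e in events if e.get("mode") == mode]


sample = [
    '{"mode":"deleted","_id":"1","src":{"_id":"1","_source":'
    '{"message":"Welcome to Golang","user":"olivere"}},"dst":null}',
]
events = parse_events(sample)
print(Counter(e["mode"] for e in events))  # tally of events per mode
print(ids_by_mode(events, "deleted"))      # ids reported as deleted
```

To process real output, replace sample with sys.stdin and pipe the esdiff output into the script.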

Filter options

You can also pass a query to filter the source and/or the destination, using the -sf and -df flags respectively:

$ ./esdiff -o=json -sf='{"term":{"user":"olivere"}}' 'http://localhost:29200/index01/_doc' 'http://localhost:19200/index01/_doc'
{"mode":"deleted","_id":"1","src":{"_id":"1","_source":{"message":"Welcome to Golang","user":"olivere"}},"dst":null}
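
Hand-quoting a JSON query on the shell is easy to get wrong. One option is to build the argument list in a script and let json.dumps produce the query string. The sketch below uses only the -o, -sf, and -df flags shown in this article; the function name esdiff_args and its parameters are hypothetical, and the default ./esdiff path assumes the binary built earlier.

```python
import json
import shlex
from typing import List, Optional


def esdiff_args(src_url: str, dst_url: str,
                src_filter: Optional[dict] = None,
                dst_filter: Optional[dict] = None,
                binary: str = "./esdiff") -> List[str]:
    """Build an argv list for esdiff with JSON output and optional raw query filters."""
    args = [binary, "-o=json"]
    if src_filter is not None:
        # Compact separators keep the query free of spaces.
        args.append("-sf=" + json.dumps(src_filter, separators=(",", ":")))
    if dst_filter is not None:
        args.append("-df=" + json.dumps(dst_filter, separators=(",", ":")))
    return args + [src_url, dst_url]


argv = esdiff_args(
    "http://localhost:29200/index01/_doc",
    "http://localhost:19200/index01/_doc",
    src_filter={"term": {"user": "olivere"}},
)
print(shlex.join(argv))  # shell-quoted form, for logging or copy and paste
```

Passing argv as a list to subprocess.run(argv, capture_output=True, text=True) avoids shell quoting altogether.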

All options

Use -h to show all options:

$ ./esdiff -h
General usage:

        esdiff [flags] <source-url> <destination-url>

General flags:
  -a    Print added docs (default true)
  -c    Print changed docs (default true)
  -d    Print deleted docs (default true)
  -df string
        Raw query for filtering the destination, e.g. {"term":{"name.keyword":"Oliver"}}
  -dsort string
        Field to sort the destination, e.g. "id" or "-id" (prepend with - for descending)
  -exclude string
        Raw source filter for excluding certain fields from the source, e.g. "hash_value,sub.*"
  -include string
        Raw source filter for including certain fields from the source, e.g. "obj.*"
  -o string
        Output format, e.g. json
  -sf string
        Raw query for filtering the source, e.g. {"term":{"user":"olivere"}}
  -size int
        Batch size (default 100)
  -ssort string
        Field to sort the source, e.g. "id" or "-id" (prepend with - for descending)
  -u    Print unchanged docs
  -replace-with string
        Replace the id in the document with the unique field you need from the source,e.g. "unique_key"

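Original-content statement: this article is published on the Tencent Cloud Developer Community with the author's authorization and may not be reproduced without permission.

In case of infringement, contact cloudcommunity@tencent.com to request removal.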