配置文件位于config目录
bin/elasticsearch -E配置名=配置值
输出集群的结点信息
输出集群的详细结点信息,其中master栏有*表示主结点
输出集群的详细信息
es有专门的Index API,用于创建、更新、删除索引配置等
# 创建文档时,如果索引不存在,es 会自动创建对应index、type# request#索引名index_name/类型type/idPUT /test_index/doc/1 { "username":"alfred", "age":1}# response{ "_index":"test_index", "_type":"doc", "_id":"1", "_version":1, # 每次对文档有变化的操作都会更新+1,包含了锁的机制
"result":"created", "_shards":{ "total":2, "successful":1, "failed":0
}, "_seq_no":0, "_primary_term":1}
# requestPOST /test_index/doc
{ "username":"tom", "age":20}# response{ "_index":"test_index", "_type":"doc", "_id":"Mj-H2ABSmWv7ZHR8Oa3", # 自动生成
"_version":1, "result":"created", "_shards":{ "total":2, "successful":1, "failed":0
}, "_seq_no":0, "_primary_term":1}
# request#索引名index_name/类型type/idGET /test_index/doc/1# 200 response{ "_index":"test_index", "_type":"doc", "_id":"1", "_version":1, "found":true, "_source":{ # 文档的原始数据
"username":"alfred", "age":1
}
}# 404 response{ "_index":"test_index", "_type":"doc", "_id":"2", # 不存在的id "found":false}
# request# 用到_search,并把查询语句作为json格式放到http body中发送到 esGET /test_index/doc/_search{ "query":{ "term":{ # 匹配id为1的
"_id":"1"
}
}
}# response{ "took":0, # 查询耗时,单位ms
"timed_out":false, "_shards":{ "total":5, "successful":5, "skipped":0, "failed":0
}, "hits":{ "total":1, # 符合条件的总文档数
"max_score":1, "hits":[
{ # 返回的文档详情数据数组,默认前10个文档
"_index":"test_index", "_type":"doc", "_id":"1", "_version":1, "_score":1, # 文档的得分
"_source":{ # 文档的原始数据
"username":"alfred", "age":1
}
},
{
...
}
]
}
}
es允许一次创建多个文档,从而减少网络传输开销,提升写入速率
# requestPOST _bulk# action_type支持: # index 创建文档,如果已经存在就覆盖# create 创建文档,如果已经存在就报错# update 更新文档# delete 删除文档{"index":{"_index":"test_index","_type":"doc","_id":3}}
{"username":"alfred","age":10}
{"delete":{"_index":"test_index","_type":"doc","_id":1}}
{"update":{"_id":"2","_index":"test_index","_type":"doc"}}
{"doc":{"age":"20"}}# response{ "took":33, # 耗时,单位ms "errors":false, "items":[ # 每个bulk操作的返回结果
{ "index":{ "_index":"test_index", "_type":"doc", "_id":"1", "_version":1, "result":"created", "_shards":{ "total":2, "successful":1, "failed":0
}, "_seq_no":0, "_primary_term":1, "status":201
}
},
{ "delete":{ "_index":"test_index", "_type":"doc", "_id":"1", "_version":2, "result":"deleted", "_shards":{ "total":2, "successful":1, "failed":0
}, "_seq_no":0, "_primary_term":1, "status":200
}
},
{ "update":{ "_index":"test_index", "_type":"doc", "_id":"1", "_version":2, "result":"updated", "_shards":{ "total":2, "successful":1, "failed":0
}, "_seq_no":0, "_primary_term":1, "status":200
}
}
]
}
# requestGET /_mget
{ "docs":[
{ "_index":"test_index", "_type":"doc", "_id":"1"
},
{ "_index":"test_index", "_type":"doc", "_id":2
}
]
}# response{ "docs":[
{ "_index":"test_index", "_type":"doc", "_id":"1", "found":false # 未找到
},
{ "_index":"test_index", "_type":"doc", "_id":"2", "_version":2, "found":true, "_source":{ "username":"lee", "age":"20"
}
}
]
}
es提供了一个测试分词的 api 接口,方便验证分词效果,endpoint 是 _analyze
# requestPOST _analyze{ "analyzer": "standard", # 分词器
"text":"hello world!" # 测试文本}# response{ "tokens": [
{ "token":"hello", # 分词结果
"start_offset":0, # 起始偏移
"end_offset":5, # 结束偏移
"type":"<ALPHANUM>", "position":0 # 分词位置
},
{ "token":"world", "start_offset":6, "end_offset":11, "type":"<ALPHANUM>", "position":1
}
]
}
# requestPOST test_index/_analyze{ "field":"username", # 测试字段
"text":"hello world!" # 测试文本}
# requestPOST _analyze{ "tokenizer": "standard", "filter": ["lowercase"], # 自定义 analyzer
"text":"Hello World!"}
类似数据库中的表结构定义:
# requestGET /test_index/_mapping# response{ "test_index": { # 索引
"mappings": { "doc": { # type "properties": { "age": { "type": "integer"
}, "username": { "type": "keyword"
}
}
}
}
}
}
# requestPUT my_index
{ "mappings": { # mappings 关键词
"doc": { # type "properties": { "title": { "type": "text"
}, "name": { "type": "keyword"
}, "age": { "type": "integer"
}
}
}
}
}# response{ "acknowledged": true, "shards_acknowledged": true, "index": "my_index"}
# requestPUT my_index{ "mappings": { "my_type": { "dynamic": false, "properties": { "user": { "properties": { "name": { "type": "text"
}, "social_networks": { "dynamic": true, "properties": {}
}
}
}
}
}
}}
PUT my_index
{ "mappings": { "doc": { "properties":{ "first_name":{ "type": "text", "copy_to": "full_name"
}, "last_name":{ "type": "text", "copy_to": "full_name"
}, "full_name":{ "type":"text"
}
}
}
}
}
PUT my_index/doc/1{ "first_name":"John", "last_name":"Smith"}
GET my_index/_search
{ "query":{ "match": { "full_name":{ "query":"John Smith", "operator": "and"
}
}
}
}
# requestPUT my_index
{ "mappings":{ "doc": { "properties": { "cookie": { "type": "text", "index": false
}
}
}
}
}
PUT my_index/doc/1
{ "cookie":"name=alfred"}GET my_index/_search
{ "query":{ "match": { "cookie":"name"
}
}
}# response{ "error":{ "root_cause":[ ......
"index": "my_index3", "caused_by":{ "type":"illegal_argument_exception", "reason":"Cannot search on field [cookie] since it is not indexed"
}
]
}, "status":400
}
# requestPUT my_index{ "mappings":{ "doc":{ "properties":{ "cookie":{ "type":"text", "index_options":"offsets"
}
}
}
}
}
# requestPUT my_index{ "mappings":{ "my_type":{ "properties": { "status_code":{ "type": "keyword", "null_value":"NULL"
}
}
}
}
}
允许对同一个字段采用不同的配置,比如分词,场景例子如对人名实现拼音搜索,只需要在人名中新增一个子字段为pinyin 即可
# request{ "test_index":{ "mappings":{ "doc":{ "properties":{ "username":{ "type":"text", "fields":{ "pinyin":{ "type":"text", "analyzer":"pinyin"
}
}
}
}
}
}
}
}GET test_index/_search
{ "query":{ "match":{ "username.pinyin":"hanhan"
}
}
}
# requestPUT /test_index/doc/1{ "username":"alfred", "age":1}
GET /test_index/_mapping# response{ "test_index":{ "mappings":{ "doc":{ "properties": { "age":{ "type":"long"
}, "username":{ "type":"text", "fields":{ "keyword":{ "type":"keyword", # es自动识别 age 为long 类型,username 为 text 类型
"ignore_above":256
}
}
}
}
}
}
}
}
JSON 类型 | es 类型 |
---|---|
null | 忽略 |
boolean | boolean |
浮点类型 | float |
整数 | long |
object | object |
array | 由第一个非 null 值的类型决定 |
string | 匹配为日期则设定为date 类型(默认开启),匹配为数字的话设为 float 或 long 类型(默认关闭),设为 text 类型,并附带 keyword 的子字段 |
# requestPUT /test_index/doc/1{ "username":"alfred", "age":14, "birth":"1988-10-10", "married":false, "year":"18", "tags":["boy", "fashion"], "money":100.1}
GET /test_index/_mapping# response{ "test_index":{ "mappings":{ "doc":{ "properties":{ "age":{ "type":"long"
}, "birth":{ "type":"date"
}, "married":{ "type":"boolean"
}, "money":{ "type":"float"
}, "tags":{ "type":"text", "fields":{ "keyword":{ "type":"keyword", "ignore_above":256
}
}
}, "username":{ "type":"text", "fields":{ "keyword":{ "type":"keyword", "ignore_above":256
}
}
}, "year":{ "type":"text", "fields":{ "keyword":{ "type":"keyword", "ignore_above":256
}
}
}
}
}
}
}
}
["strict_date_optional_time", "yyyy/MM/dd HH:mm:ss Z"]
# requestPUT my_index{ "mappings":{ "my_type":{ "dynamic_date_formats":["MM/dd/yyyy"]
}
}
}
PUT my_index/my_type/1
{ "create_date":"09/25/2015"}# 关闭日期自动识别机制PUT my_index{ "mappings":{ "my_type":{ "date_detection":false
}
}
}
# requestPUT my_index{ "mappings":{ "my_type":{ "numeric_detection":true
}
}
}
PUT my_index/my_type/1
{ "my_float":"1.0", "my_integer":"1"}# responseGET my_index/_mapping{ "my_index1":{ "mappings":{ "my_type":{ "numeric_detection":true, "properties":{ "my_float":{ "type":"float"
}, "my_integer":{ "type":"long"
}
}
}
}
}
}
# requestPUT test_index{ "mappings":{ "doc":{ "dynamic_templates":[ # 数组,可指定多个匹配规则
{ "strings":{ # template 的名称
"match_mapping_type":"string", # 匹配规则
"mapping":{ # 设置 mapping 信息
"type":"keyword"
}
}
}
]
}
}
}
# 字符串默认使用 keyword 类型# es默认会为字符串设置 text 类型,并增加一个 keyword 的子字段# requestPUT test_index
{ "mappings":{ "doc":{ "dynamic_templates":[
{ "strings_as_keywords":{ "match_mapping_type":"string", "mapping":{ "type":"keyword"
}
}
}
]
}
}
}
# 以 message 开头的字段都设置为 text 类型# requestPUT test_index
{ "mappings":{ "doc":{ "dynamic_templates":[
{ "message_as_text":{ "match_mapping_type":"string", "match":"message*", "mapping":{ "type":"text"
}
}
}
]
}
}
}
# double 类型设定为 float,节省空间# requestPUT test_index
{ "mappings":{ "doc": { "dynamic_templates":[
{ "double_as_float":{ "match_mapping_type":"double", "mapping":{ "type":"float"
}
}
}
]
}
}
}
# 请求 /{Index}/{Type}/{id}POST /accounts/person/1{ "name": "John", "lastname": "Doe", "job_description": "Systems administrator and Linux specialist"}# 响应{ "_index": "accounts", "_type": "person", "_id":"1", "_version": 1, "result": "created", "_shards": { "total": 2, "successful": 1, "failed": 0
}, "created": true}
和Create不同的是,使用GET
# 请求 /{Index}/{Type}/{id}GET /accounts/person/1
{ "name": "John", "lastname": "Doe", "job_description": "Systems administrator and Linux specialist"}# 响应{ "_index": "accounts", "_type": "person", "_id":"1", "_version": 1, "result": "created", "_shards": { "total": 2, "successful": 1, "failed": 0
}, "created": true}
# 请求POST /accounts/person/1/_update
{ "doc":{ "job_description": "Systems administrator and Linux specialist"
}
}# 响应{ "_index": "accounts", "_type": "person", "_id": "1", "_version": 2, "result": "updated", "_shards": { "total": 2, "successful":1, "failed":0
}
}
# 请求DELETE /accounts/person/1DELETE /accounts# 响应{ "found": true, "_index": "accounts", "_type": "person", "_id": "1", "_version":3, "result":"deleted", "_shards":{ "total":2, "successful":1, "failed":0
}
}
# 请求GET /accounts/person/_search?q=john
# 请求GET /accounts/person/_search{ "query": { "match": { "name":"john"
}
}
}
因为 filebeat 缺乏数据转换能力,所以官方新增了 Elasticsearch Ingest Node 作为能力补充,在数据写入es前进行数据转换
基于分隔符原理解析数据,解决 grok 解析时消耗过多 cpu 资源的问题
%{clientip} %{ident} %{auth} [%{timestamp}] "%{request}" %{response} %{bytes} "%{referrer}" "%{agent}"
默认分词器
tokenizer:
token filters:
特性:
tokenizer:
特性:
tokenizer:
特性:
按照 stop word 语气助词等修饰性的词语切分,如 the、an、的、这等等
tokenizer:
token filters:
特性:
特性:
tokenizer:
token filters:
特性:
特性:
我的博客即将同步至腾讯云开发者社区,邀请大家一同入驻:https://cloud.tencent.com/developer/support-plan?invite_code=1y1u52rqoxs5s
扫码关注腾讯云开发者
领取腾讯云代金券
Copyright © 2013 - 2025 Tencent Cloud. All Rights Reserved. 腾讯云 版权所有
深圳市腾讯计算机系统有限公司 ICP备案/许可证号:粤B2-20090059 深公网安备号 44030502008569
腾讯云计算(北京)有限责任公司 京ICP证150476号 | 京ICP备11018762号 | 京公网安备号11010802020287
Copyright © 2013 - 2025 Tencent Cloud.
All Rights Reserved. 腾讯云 版权所有