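Elasticsearch Essentials: The Basics (JD Cloud Developer Community)

# 1. Index Definition

> Official documentation: https://www.elastic.co/guide/en/elasticsearch/reference/8.1/indices.html

## Indexes at a Glance

## What Is an Index

> Definition:
> > 1. A collection of documents that share the same document structure (mapping)
> > 2. Identified by a unique index name
> > 3. A cluster can contain many indexes
> > 4. Different indexes typically hold data for different business domains
>
> Notes:
> > 1. Index names must be **lowercase**
> > 2. Index names cannot be longer than 255 bytes (multi-byte characters reach the limit faster)
> > 3. Field names may contain uppercase letters, but all-lowercase names are recommended for consistency

## Creating an Index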

![image.png](https://s3.cn-north-1.jdcloud-oss.com/shendengbucket1/2023-06-02-17-10BxSJ5qzXGIGtUBJ.png)

***

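## Index Settings Explained

> Official documentation: https://www.elastic.co/guide/en/elasticsearch/reference/8.1/index-modules.html

> Note:
> > Static settings can only be set at index creation time or on a closed index; dynamic settings can be changed on a live index.
>
> Questions to consider:
> > 1. **Why can't the number of primary shards be changed after creation?**
> >
> > ```text
> > A document is routed to a particular shard in an index using the following formula:
> > shard_num = hash(_routing) % num_primary_shards
> > The default value used for _routing is the document's _id.
> > ```
> >
> > * Elasticsearch uses this formula to decide which shard stores a document at write time, and uses the same formula to locate the document at read time. If the shard count changed, existing documents could no longer be found.
> > * Put simply: hash the ID, divide by the number of primary shards, and take the remainder. If the divisor changes, the result changes too.
> >
> > 2. **What if the business genuinely needs more primary shards?**
> >
> > * Use reindex to migrate the data into another index: https://www.elastic.co/guide/en/elasticsearch/reference/8.1/docs-reindex.html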

![image.png](https://s3.cn-north-1.jdcloud-oss.com/shendengbucket1/2023-06-02-17-10RcuIYluJnZ9A8cZ.png)
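
The routing formula `shard_num = hash(_routing) % num_primary_shards` can be illustrated with a small Python sketch. This is a toy model rather than Elasticsearch's implementation: Elasticsearch hashes `_routing` with Murmur3, whereas the sketch substitutes `zlib.crc32` purely to get a deterministic hash, and `shard_for` is a hypothetical helper name.

```python
import zlib


def shard_for(routing: str, num_primary_shards: int) -> int:
    """Toy version of shard_num = hash(_routing) % num_primary_shards."""
    # crc32 stands in for the real hash function (an assumption for illustration).
    return zlib.crc32(routing.encode("utf-8")) % num_primary_shards


doc_ids = [str(i) for i in range(1, 101)]
# Documents whose target shard differs between a 5-shard and a 6-shard index.
moved = [d for d in doc_ids if shard_for(d, 5) != shard_for(d, 6)]
print(f"{len(moved)} of {len(doc_ids)} documents would map to a different shard")
```

Because most documents land on a different shard once the divisor changes, a lookup by `_id` would consult the wrong shard, which is why the data has to be rewritten (for example with reindex) instead.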

***

## Basic Index Operations

![image.png](https://s3.cn-north-1.jdcloud-oss.com/shendengbucket1/2023-06-02-17-10xVRZgsuqg6z9xXm.png)

***

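# 2. Mapping Parameter: dynamic

> Official documentation: https://www.elastic.co/guide/en/elasticsearch/reference/8.1/dynamic.html

## Core Functionality

> Automatically detects the type of a new field and adds the field to the mapping.
> 
> In other words, even if a field is not defined in the index mapping, Elasticsearch detects its type dynamically.

## First Look at dynamic

```shell
# Delete the test01 index so that we start from a clean state
DELETE test01

# Index a document directly, without defining a mapping first
POST test01/_doc/1
{
  "name":"kangrui10"
}

# Inspect the mapping of test01 to see which type was assigned to the name field.
# name is mapped as text with a keyword sub-field, which is often not what we want:
# in our business queries name is not searched as analyzed text, it is matched exactly (like "AND name = xxx").
# With this mapping, an exact match has to target name.keyword, which is less convenient.
GET test01/_mapping

# Response: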
{
  "test01" : {
    "mappings" : {
      "properties" : {
        "name" : {
          "type" : "text",
          "fields" : {
            "keyword" : {
              "type" : "keyword",
              "ignore_above" : 256
            }
          }
        }
      }
    }
  }
}
```

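## dynamic Options

| Value | Official description | Explanation |
| --- | --- | --- |
| true | New fields are added to the mapping (default). | If dynamic is not specified when the mapping is created, it **defaults to true**: any field without an explicitly defined type gets a type detected by Elasticsearch |
| runtime | New fields are added to the mapping as runtime fields. These fields are not indexed, and are loaded from \_source at query time. | New fields become runtime fields: they can be queried, but are evaluated at search time instead of being indexed |
| false | New fields are ignored. These fields will not be indexed or searchable, but will still appear in the \_source field of returned hits. These fields will not be added to the mapping, and new fields must be added explicitly. | With **false**, a field that is **not defined in the mapping** can still be written, but it cannot be searched and is not added to the mapping; in other words, the new field is not indexed |
| strict | If new fields are detected, an exception is thrown and the document is rejected. New fields must be explicitly added to the mapping. | With strict, writing a document that contains a field **not defined in the mapping fails with an error**. Recommended for production because it is stricter: new fields must be added to the mapping by hand |

## Drawbacks of Dynamic Mapping

* Detected types are reasonably accurate, but not necessarily what the user wants
    * For example, a dynamically mapped text field gets the default standard analyzer, whereas Chinese text usually calls for the IK analyzer
* Extra storage
    * Strings are mapped as both text and keyword, which takes more storage
* Mapping explosion
    * If a query is sent by mistake as a write (PUT instead of GET), the request body is indexed as a document and many unintended fields get created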

***

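# 3. Mapping Parameter: doc\_values

> Official documentation: https://www.elastic.co/guide/en/elasticsearch/reference/8.1/doc-values.html

## Core Functionality

> Doc values are an additional, ordered **forward index** (a document => field value mapping) that Lucene builds at index time alongside the inverted index.
> 
> Doc values are essentially a serialized columnar store. This structure is well suited to aggregations, sorting, and script access to field values. It also compresses well, especially for numeric types, which reduces disk usage and speeds up access.
> 
> Almost all field types support doc values, with the notable exception of text and annotated\_text fields.

## What Is a Forward Index

> A forward index resembles a database table: each document ID is associated with its data, so looking up a document ID returns the corresponding values.

## doc\_values Options

* true: the default; doc values are enabled
* false: must be set explicitly. Sorting, aggregating, and accessing the field from scripts are no longer available, but disk space is saved

## Exam Practice

```shell
# Create an index named test03 whose fields meet these conditions:
#     1. speaker: keyword
#     2. line_id: keyword and not aggregatable
#     3. speech_number: integer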

PUT test03
{
  "mappings": {
    "properties": {
      "speaker": {
        "type": "keyword"
      },
      "line_id":{
        "type": "keyword",
        "doc_values": false
      },
      "speech_number":{
        "type": "integer"
      }
    }
  }
}
```
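
To make the difference between the inverted index and the forward index (doc values) concrete, here is a minimal Python sketch. It is a conceptual toy, not Lucene's on-disk format; the sample values and the `terms_agg` helper are made up for illustration.

```python
from collections import defaultdict

docs = {1: "kibana", 2: "elasticsearch", 3: "kibana"}  # doc id -> field value

# Inverted index: term -> ids of the documents containing it (used for search).
inverted = defaultdict(list)
for doc_id, value in docs.items():
    inverted[value].append(doc_id)

# Forward, column-style index: doc id -> value (used for sorting and aggregations).
forward = dict(sorted(docs.items()))


def terms_agg(doc_ids):
    """Count field values for the given documents by reading the forward index."""
    counts = defaultdict(int)
    for doc_id in doc_ids:
        counts[forward[doc_id]] += 1
    return dict(counts)


print(inverted["kibana"])      # which documents contain the term
print(terms_agg([1, 2, 3]))    # value counts across the matched documents
```

Search answers "which documents contain this term", while an aggregation answers "what values do these documents have". Without the forward structure the second question has no efficient answer, which is why `doc_values: false` disables aggregation on `line_id`.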

***

# 4. Analyzers

## Installing the IK Chinese Analyzer

> https://github.com/medcl/elasticsearch-analysis-ik

## What Is an Inverted Index

![image.png](https://s3.cn-north-1.jdcloud-oss.com/shendengbucket1/2023-06-02-17-11fmN8SrPFKHnsJSH.png)

## How Data Is Indexed

![image.png](https://s3.cn-north-1.jdcloud-oss.com/shendengbucket1/2023-06-02-17-11KeteerKzBkbdeeb.png)

## Types of Analyzers

> Official documentation: https://www.elastic.co/guide/en/elasticsearch/reference/8.1/analysis-analyzers.html

![image.png](https://s3.cn-north-1.jdcloud-oss.com/shendengbucket1/2023-06-02-17-11mPHmnxzTAbvSe9U.png)

***

# 5. Custom Analyzers

## The Three Stages of a Custom Analyzer

### 1. Character filters

> Official documentation: https://www.elastic.co/guide/en/elasticsearch/reference/8.1/analysis-charfilters.html
> 
> Zero or more may be configured.

[HTML Strip Character Filter](https://www.elastic.co/guide/en/elasticsearch/reference/8.1/analysis-htmlstrip-charfilter.html)

> Purpose: strips HTML elements such as `<b>` and decodes HTML entities such as `&amp;`

[Mapping Character Filter](https://www.elastic.co/guide/en/elasticsearch/reference/8.1/analysis-mapping-charfilter.html)

> Purpose: replaces specified characters

[Pattern Replace Character Filter](https://www.elastic.co/guide/en/elasticsearch/reference/8.1/analysis-pattern-replace-charfilter.html)

> Purpose: replaces characters that match a regular expression

### 2. Tokenizer: splitting text into tokens

> Official documentation: https://www.elastic.co/guide/en/elasticsearch/reference/8.1/analysis-tokenizers.html#\_word\_oriented\_tokenizers
> 
> Exactly one must be configured.
> 
> The tokenizer splits the text into tokens.

### 3. Token filters: post-processing tokens

> Official documentation: https://www.elastic.co/guide/en/elasticsearch/reference/8.1/analysis-tokenfilters.html
> 
> Zero or more may be configured.
> 
> Tokens are processed further after tokenization, for example lowercased, stripped of stop words, or expanded with synonyms.

## Exam Practice

> A document contains text like `doc & cat`. Index it so that a match\_phrase query for either `doc & cat` or `doc and cat` finds it.
> 
> Analysis:
> > 1. What match\_phrase does: it **analyzes the search phrase into tokens**. **Every token must be present in the analyzed field**, **in the same order**, and by default the tokens must be adjacent.
> > 
> > 2. Making `&` and `and` equivalent at query time is a custom requirement, so it calls for a custom analyzer.
> > 
> > 3. How to define a custom analyzer: https://www.elastic.co/guide/en/elasticsearch/reference/8.1/analysis-custom-analyzer.html
> > 
> > 4. Solution 1 relies on the [Mapping Character Filter](https://www.elastic.co/guide/en/elasticsearch/reference/8.1/analysis-mapping-charfilter.html)
> > 
> > 5. Solution 2 relies on the synonym token filter: https://www.elastic.co/guide/en/elasticsearch/reference/8.1/analysis-synonym-tokenfilter.html

### Solution 1

```shell
# Create the index
PUT /test01
{
  "settings": {
    "analysis": {
      "analyzer": {
        "my_analyzer": {
          "char_filter": [
            "my_mappings_char_filter"
          ],
          "tokenizer": "standard"
        }
      },
      "char_filter": {
        "my_mappings_char_filter": {
          "type": "mapping",
          "mappings": [
            "& => and"
          ]
        }
      }
    }
  },
  "mappings": {
    "properties": {
      "content":{
        "type": "text",
        "analyzer": "my_analyzer"
      }
    }
  }
}
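# Notes
# Stage 1 (character filters): my_mappings_char_filter rewrites "&" to "and"
# Stage 2 (tokenizer): the standard tokenizer
# Stage 3 (token filters): none configured
# The content field uses the custom analyzer my_analyzer

# Load test data
PUT test01/_bulk
{"index":{"_id":1}}
{"content":"doc & cat"}
{"index":{"_id":2}}
{"content":"doc and cat"}

# Run the test: searching for either "doc & cat" or "doc and cat" returns both documents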
POST test01/_search
{
  "query": {
    "bool": {
      "must": [
        {
          "match_phrase": {
            "content": "doc & cat"
          }
        }
      ]
    }
  }
}
```
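
The effect of the mapping character filter on phrase matching in the request above can be traced with a short Python sketch. It is a simplified model: `re.findall(r"\w+", ...)` is only a rough stand-in for the standard tokenizer, and `analyze` and `match_phrase` are hypothetical helper names.

```python
import re


def analyze(text: str) -> list[str]:
    """Toy analyzer: mapping char filter first, then a simplified word tokenizer."""
    text = text.replace("&", "and")    # char filter rule "& => and"
    return re.findall(r"\w+", text)    # rough stand-in for the standard tokenizer


def match_phrase(field_text: str, query: str) -> bool:
    """True if the query tokens occur in the field consecutively and in order."""
    field, phrase = analyze(field_text), analyze(query)
    n = len(phrase)
    return any(field[i:i + n] == phrase for i in range(len(field) - n + 1))


print(analyze("doc & cat"))
print(match_phrase("doc and cat", "doc & cat"))
```

Since the same analyzer runs at index time and at query time, both spellings end up as the token sequence `doc`, `and`, `cat`, so either query matches either document.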

### Solution 2

```shell
# Approach: treat "&" and "and" as synonyms by using a token filter
# Create the index
PUT /test02
{
  "settings": {
    "analysis": {
      "analyzer": {
        "my_synonym_analyzer": {
          "tokenizer": "whitespace",
          "filter": [
            "my_synonym"
          ]
        }
      },
      "filter": {
        "my_synonym": {
          "type": "synonym",
          "lenient": true,
          "synonyms": [
            "& => and"
          ]
        }
      }
    }
  },
  "mappings": {
    "properties": {
      "content": {
        "type": "text",
        "analyzer": "my_synonym_analyzer"
      }
    }
  }
}
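# Notes
# Stage 1 (character filters): none configured
# Stage 2 (tokenizer): the whitespace tokenizer. Why not the standard tokenizer? Because it drops "&" during tokenization, leaving nothing for a token filter to act on
# Stage 3 (token filters): the synonym filter, which maps "&" to "and"
# The content field uses the custom analyzer my_synonym_analyzer

# Load test data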
PUT test02/_bulk
{"index":{"_id":1}}
{"content":"doc & cat"}
{"index":{"_id":2}}
{"content":"doc and cat"}

# Run the test
POST test02/_search
{
  "query": {
    "bool": {
      "must": [
        {
          "match_phrase": {
            "content": "doc & cat"
          }
        }
      ]
    }
  }
}
```
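
Why this approach needs the whitespace tokenizer can be seen in a small Python sketch. It is a simplified model under stated assumptions: the regular expression only approximates the standard tokenizer, and `standard_like` and `whitespace_with_synonyms` are hypothetical names.

```python
import re

SYNONYMS = {"&": "and"}  # mirrors the synonym rule "& => and"


def standard_like(text: str) -> list[str]:
    """Rough stand-in for the standard tokenizer: keeps word characters only."""
    return re.findall(r"\w+", text)


def whitespace_with_synonyms(text: str) -> list[str]:
    """Whitespace tokenizer followed by a synonym token filter."""
    return [SYNONYMS.get(token, token) for token in text.split()]


print(standard_like("doc & cat"))             # "&" never becomes a token
print(whitespace_with_synonyms("doc & cat"))  # "&" survives and is rewritten
```

With a standard-like tokenizer the `&` is gone before any token filter runs, so the indexed phrase would be `doc cat` and a query for `doc and cat` could not match it. Splitting on whitespace keeps `&` as a token that the synonym filter can rewrite.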

***

# 6. multi-fields

> Official documentation: https://www.elastic.co/guide/en/elasticsearch/reference/8.1/multi-fields.html

> One field, multiple types: for example, indexing the same field with two different analyzers

```shell
PUT my-index-000001
{
  "mappings": {
    "properties": {
      "city": {
        "type": "text",
        "analyzer":"standard",
        "fields": {
          "fieldText": { 
            "type":  "text",
            "analyzer":"ik_smart"
          }
        }
      }
    }
  }
}
```

***

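# 7. runtime\_field: Runtime Fields

> Official documentation: https://www.elastic.co/guide/en/elasticsearch/reference/8.1/runtime.html

## Background

> Suppose the business needs to sort by the difference between two numeric fields, that is, by a field that does not exist. What then?
> 
> One option is to backfill the data and add a field that stores the difference. But what if backfilling a new field is not allowed?

## Solution

> Use a runtime field, whose value is computed at query time instead of being indexed.

## Use Cases

* Add fields to existing documents without reindexing
* Work with data without knowing its structure in advance
* Override the value returned from an indexed field at query time
* Define a field for a specific purpose without changing the underlying schema

## Characteristics

* Lucene is unaware of them: they are not indexed and have no doc\_values
* They do not support relevance scoring, because there is no inverted index behind them
* They break the traditional define-before-use model
* They help prevent mapping explosion
* They make the API more flexible
* **Note: they make searches slower**

## Usage

* Defined in the search request: the field can be queried even though the mapping has no such field
* Defined in the mapping, dynamically or statically: a runtime field is added to the mapping itself

## Exam Practice 1

```shell
# Given the following index and data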
PUT test03
{
  "mappings": {
    "properties": {
      "emotion": {
        "type": "integer"
      }
    }
  }
}
POST test03/_bulk
{"index":{"_id":1}}
{"emotion":2}
{"index":{"_id":2}}
{"emotion":5}
{"index":{"_id":3}}
{"emotion":10}
{"index":{"_id":4}}
{"emotion":3}

# Requirement: when emotion > 5, return emotion_falg = '1'
# Requirement: when emotion < 5, return emotion_falg = '-1'
# Requirement: when emotion = 5, return emotion_falg = '0'
```

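### Solution 1

> Define the runtime field in the search request: https://www.elastic.co/guide/en/elasticsearch/reference/8.1/runtime-search-request.html
> 
> The field does not exist in \_source, so the request must include `"fields": ["*"]` for its value to be returned.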

```shell
GET test03/_search
{
  "fields": [
    "*"
  ], 
  "runtime_mappings": {
    "emotion_falg": {
      "type": "keyword",
      "script": {
        "source": """
          if(doc['emotion'].value>5)emit('1');
          if(doc['emotion'].value<5)emit('-1');
          if(doc['emotion'].value==5)emit('0');
          """
      }
    }
  }
}
```
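
The Painless script in the request above evaluates once per document at read time. A plain Python rendering of the same branching logic, run against the sample data, is sketched below; the function only mirrors the script and is not how Elasticsearch executes it.

```python
def emotion_falg(doc: dict) -> str:
    """Python equivalent of the Painless script: emit '1', '-1', or '0'."""
    value = doc["emotion"]
    if value > 5:
        return "1"
    if value < 5:
        return "-1"
    return "0"


docs = [{"emotion": 2}, {"emotion": 5}, {"emotion": 10}, {"emotion": 3}]
# The value is computed per document when it is read; nothing is stored.
flags = [emotion_falg(d) for d in docs]
print(flags)
```

Because the value is derived on every read, no reindexing is needed to introduce the field, at the cost of extra work per query.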

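### Solution 2

> Define the runtime field when creating the index: https://www.elastic.co/guide/en/elasticsearch/reference/8.1/runtime-mapping-fields.html
> 
> With this approach the runtime field can also be used in queries.

```shell
# Create the index and define the runtime field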
PUT test03_01
{
  "mappings": {
    "runtime": {
      "emotion_falg": {
        "type": "keyword",
        "script": {
          "source": """
          if(doc['emotion'].value>5)emit('1');
          if(doc['emotion'].value<5)emit('-1');
          if(doc['emotion'].value==5)emit('0');
          """
        }
      }
    },
    "properties": {
      "emotion": {
        "type": "integer"
      }
    }
  }
}
# Load test data
POST test03_01/_bulk
{"index":{"_id":1}}
{"emotion":2}
{"index":{"_id":2}}
{"emotion":5}
{"index":{"_id":3}}
{"emotion":10}
{"index":{"_id":4}}
{"emotion":3}
# Run a test query
GET test03_01/_search
{
  "fields": [
    "*"
  ]
}
```

## Exam Practice 2

```shell
# Given the following index and data
PUT task04
{
  "mappings": {
    "properties": {
      "A":{
        "type": "long"
      },
      "B":{
        "type": "long"
      }
    }
  }
}
PUT task04/_bulk
{"index":{"_id":1}}
{"A":100,"B":2}
{"index":{"_id":2}}
{"A":120,"B":2}
{"index":{"_id":3}}
{"A":120,"B":25}
{"index":{"_id":4}}
{"A":21,"B":25}

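# Task: in the task04 index, create a runtime field named A_B whose value is A - B; then create a range aggregation with three buckets (below 0, 0 to 100, 100 and above) and return the document counts
# Topics covered:
# 1. Defining a runtime field in the search request: https://www.elastic.co/guide/en/elasticsearch/reference/8.1/runtime-search-request.html
# 2. Range aggregation: https://www.elastic.co/guide/en/elasticsearch/reference/8.1/search-aggregations-bucket-range-aggregation.html
```

### Solution

```shell
# Run the query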
GET task04/_search
{
  "fields": [
    "*"
  ], 
  "size": 0, 
  "runtime_mappings": {
    "A_B": {
      "type": "long",
      "script": {
        "source": """
          emit(doc['A'].value - doc['B'].value);
          """
      }
    }
  },
  "aggs": {
    "price_ranges_A_B": {
      "range": {
        "field": "A_B",
        "ranges": [
          { "to": 0 },
          { "from": 0, "to": 100 },
          { "from": 100 }
        ]
      }
    }
  }
}
```
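
The expected bucket counts can be checked by hand with a short Python sketch that applies the same bucketing rule to the sample data. It assumes the documented range semantics, where each bucket includes its `from` value and excludes its `to` value; `range_agg` is a hypothetical helper.

```python
docs = [
    {"A": 100, "B": 2},
    {"A": 120, "B": 2},
    {"A": 120, "B": 25},
    {"A": 21, "B": 25},
]
ranges = [(None, 0), (0, 100), (100, None)]  # (from, to); None means unbounded


def range_agg(values, buckets):
    """Count values per bucket; `from` is inclusive and `to` is exclusive."""
    counts = []
    for lo, hi in buckets:
        counts.append(sum(
            1 for v in values
            if (lo is None or v >= lo) and (hi is None or v < hi)
        ))
    return counts


a_b = [d["A"] - d["B"] for d in docs]  # the runtime field A_B
print(a_b, range_agg(a_b, ranges))
```

The differences are 98, 118, 95, and -4, so the buckets hold 1, 2, and 1 documents respectively. A value of exactly 100 would fall into the last bucket, not the middle one.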

***

# 8. Search-highlighted

## First Look at Highlighting Syntax

> Official documentation: https://www.elastic.co/guide/en/elasticsearch/reference/8.1/highlighting.html

***

# 9. Search-Order

## First Look at Sort Syntax

> Official documentation: https://www.elastic.co/guide/en/elasticsearch/reference/8.1/sort-search-results.html

```shell
# Note: text fields cannot be sorted or aggregated on by default; if that is really required, fielddata must be enabled
GET /kibana_sample_data_ecommerce/_search
{
  "query": {
    "match": {
      "customer_last_name": "wood"
    }
  },
  "highlight": {
    "number_of_fragments": 3,
    "fragment_size": 150,
    "fields": {
      "customer_last_name": {
        "pre_tags": [
          "<em>"
        ],
        "post_tags": [
          "</em>"
        ]
      }
    }
  },
  "sort": [
    {
      "currency": {
        "order": "desc"
      },
      "_score": {
        "order": "asc"
      }
    }
  ]
}
```

***

# 10. Search-Page

## First Look at Pagination Syntax

> Official documentation: https://www.elastic.co/guide/en/elasticsearch/reference/8.1/paginate-search-results.html

```shell
# Note: from starts at 0, not 1
GET kibana_sample_data_ecommerce/_search
{
  "from": 5,
  "size": 20,
  "query": {
    "match": {
      "customer_last_name": "wood"
    }
  }
}
```
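
The `from` and `size` parameters behave like a zero-based slice over the sorted result list. The Python sketch below models that with made-up hit names; `page` is a hypothetical helper whose defaults mirror the search API defaults of `from: 0` and `size: 10`.

```python
def page(hits, from_=0, size=10):
    """Return the slice of hits selected by from/size (from is zero-based)."""
    return hits[from_:from_ + size]


hits = [f"hit-{i}" for i in range(100)]  # pretend these are the sorted results
selected = page(hits, from_=5, size=20)
print(selected[0], selected[-1], len(selected))
```

With `from: 5` and `size: 20` the response skips the first five hits and returns the next twenty. By default `from + size` cannot exceed 10,000 (the `index.max_result_window` setting), so deep pagination calls for `search_after` instead.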

## Exam Practice 1

### Solution

```shell
# Task:
# In the spoken lines of the play, highlight the word Hamlet (in the text_entry field), starting the highlight with "#aaa#" and ending it with "#bbb#".
# Return all of the speech_number field lines in reverse order; '20' speech lines per page, starting from line '40'.

# Highlighting: apply to the text_entry field; highlight the keyword Hamlet
# Paging: from: 40; size: 20
# speech_number: descending order

POST test09/_search
{
  "from": 40,
  "size": 20,
  "query": {
    "bool": {
      "must": [
        {
          "match": {
            "text_entry": "Hamlet"
          }
        }
      ]
    }
  },
  "highlight": {
    "fields": {
      "text_entry": {
        "pre_tags": [
          "#aaa#"
        ],
        "post_tags": [
          "#bbb#"
        ]
      }
    }
  },
  "sort": [
    {
      "speech_number.keyword": {
        "order": "desc"
      }
    }
  ]
}
```

***

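# 11. Search-AsyncSearch

> Official documentation: https://www.elastic.co/guide/en/elasticsearch/reference/8.1/async-search.html

## Available Since

> 7.7.0

## Use Cases

> Lets users retrieve partial results while a search is still running asynchronously, instead of waiting for the final response only after the query completes.

## Common Commands

> * Submit an async search
>     * POST /sales\*/\_async\_search?size=0
> * Get the async search results
>     * GET /\_async\_search/{id}
> * Get the async search status
>     * GET /\_async\_search/status/{id}
> * Delete or cancel an async search
>     * DELETE /\_async\_search/{id}

## Async Search Response Fields

| Field | Meaning |
| --- | --- |
| id | Unique identifier returned for the async search |
| is\_partial | Once the query is no longer running, indicates whether the search succeeded or failed on all shards. While the query is still executing, is\_partial is always true |
| is\_running | Whether the search is still executing |
| total | Number of shards the search will run on |
| successful | Number of shards that have completed the search successfully |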

***

# 12. Index Aliases

> Official documentation: https://www.elastic.co/guide/en/elasticsearch/reference/8.1/aliases.html

## What Aliases Are For

> In Elasticsearch, an index alias works like a shortcut or symbolic link that can point to one or more indexes. Aliases give us a great deal of flexibility and can be used to:
>
> 1. Switch seamlessly from one index to another on a running cluster, with no downtime
> 2. Group multiple indexes; for example, with monthly indexes, an alias can represent the last 3 months
> 3. Expose a subset of an index's data, similar to a database view

## Searching Multiple Indexes Without Aliases

### Options

> Option 1: POST index\_01,index\_02,index\_03/\_search
> 
> Option 2: POST index\_\*/\_search

## Three Ways to Create an Alias

### Specify the Alias When Creating the Index

```shell
# Give test05 the alias test05_aliases
PUT test05
{
  "mappings": {
    "properties": {
      "name":{
        "type": "keyword"
      }
    }
  },
  "aliases": {
    "test05_aliases": {}
  }
}
```

### Specify the Alias Through an Index Template

```shell
PUT _index_template/template_1
{
  "index_patterns": ["te*", "bar*"],
  "template": {
    "settings": {
      "number_of_shards": 1
    },
    "mappings": {
      "_source": {
        "enabled": true
      },
      "properties": {
        "host_name": {
          "type": "keyword"
        },
        "created_at": {
          "type": "date",
          "format": "EEE MMM dd HH:mm:ss Z yyyy"
        }
      }
    },
    "aliases": {
      "mydata": { }
    }
  },
  "priority": 500,
  "composed_of": ["component_template1", "runtime_component_template"], 
  "version": 3,
  "_meta": {
    "description": "my custom"
  }
}
```

### Add an Alias to an Existing Index

```shell
POST _aliases
{
  "actions": [
    {
      "add": {
        "index": "logs-nginx.access-prod",
        "alias": "logs"
      }
    }
  ]
}
```

## Removing an Alias

```shell
POST _aliases
{
  "actions": [
    {
      "remove": {
        "index": "logs-nginx.access-prod",
        "alias": "logs"
      }
    }
  ]
}
```

## Exam Practice 1

```shell
# Define an index alias for 'accounts-row' called 'accounts-male': apply a filter to only show the male account owners

POST _aliases
{
  "actions": [
    {
      "add": {
        "index": "accounts-row",
        "alias": "accounts-male",
        "filter": {
          "bool": {
            "filter": [
              {
                "term": {
                  "gender.keyword": "male"
                }
              }
            ]
          }
        }
      }
    }
  ]
}
```

***

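# 13. Search-template

> Official documentation: https://www.elastic.co/guide/en/elasticsearch/reference/8.1/search-template.html

## Features

> Templates accept parameters at run time. Search templates are stored on the server, so they can be modified without changing client code.

## First Look at search-template

```shell
# Create a search template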
PUT _scripts/my-search-template
{
  "script": {
    "lang": "mustache",
    "source": {
      "query": {
        "match": {
          "{{query_key}}": "{{query_value}}"
        }
      },
      "from": "{{from}}",
      "size": "{{size}}"
    }
  }
}

# Query with the search template
GET my-index/_search/template
{
  "id": "my-search-template",
  "params": {
    "query_key": "your field",
    "query_value": "your field value",
    "from": 0,
    "size": 10
  }
}
```

## Search Template Operations

### Create a Search Template

```shell
PUT _scripts/my-search-template
{
  "script": {
    "lang": "mustache",
    "source": {
      "query": {
        "match": {
          "message": "{{query_string}}"
        }
      },
      "from": "{{from}}",
      "size": "{{size}}"
    },
    "params": {
      "query_string": "My query string"
    }
  }
}
```

### Validate a Search Template

```shell
POST _render/template
{
  "id": "my-search-template",
  "params": {
    "query_string": "hello world",
    "from": 20,
    "size": 10
  }
}
```

### Run a Search Template

```shell
GET my-index/_search/template
{
  "id": "my-search-template",
  "params": {
    "query_string": "hello world",
    "from": 0,
    "size": 10
  }
}
```

### List All Search Templates

> GET \_cluster/state/metadata?pretty&filter\_path=metadata.stored\_scripts

### Delete a Search Template

> DELETE \_scripts/my-search-template

***

# 14. Search DSL: Simple Queries

> Official documentation: https://www.elastic.co/guide/en/elasticsearch/reference/8.1/query-dsl.html

## Choosing a Query Type

![image.png](https://s3.cn-north-1.jdcloud-oss.com/shendengbucket1/2023-06-02-17-12g23k8cp6oD0G9Mt.png)

## Query Categories

![image.png](https://s3.cn-north-1.jdcloud-oss.com/shendengbucket1/2023-06-02-17-13igDpVr6ddZxjBME.png)

## Custom Scoring

### How to Customize Scoring

![image.png](https://s3.cn-north-1.jdcloud-oss.com/shendengbucket1/2023-06-02-17-13qhJgvNDUUfGTpUP.png)

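### 1. Index Boost: Adjusting Relevance at the Index Level

```shell
# A batch of documents carries different tags but shares one structure. Documents are stored in different indexes (A, B, C) by tag, and results must be shown strictly grouped by tag. Which query fits best?
# Requirement: show class A first, then class B, then class C

# Test data
PUT /index_a_123/_doc/1
{
  "title":"this is index_a..."
}
PUT /index_b_123/_doc/1
{
  "title":"this is index_b..."
}
PUT /index_c_123/_doc/1
{
  "title":"this is index_c..."
}
# A plain query without boosting: all three hits come back with the same score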
POST index_*_123/_search
{
  "query": {
    "bool": {
      "must": [
        {
          "match": {
            "title": "this"
          }
        }
      ]
    }
  }
}
```

> Official documentation: https://www.elastic.co/guide/en/elasticsearch/reference/8.1/search-search.html
> 
> **indices\_boost**

```shell
# Boost the weight at the index level
POST index_*_123/_search
{
  "indices_boost": [
    {
      "index_a_123": 10
    },
    {
      "index_b_123": 5
    },
    {
      "index_c_123": 1
    }
  ], 
  "query": {
    "bool": {
      "must": [
        {
          "match": {
            "title": "this"
          }
        }
      ]
    }
  }
}
```
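
How `indices_boost` reorders hits that would otherwise tie can be sketched in a few lines of Python. The raw score of 0.3 is an invented placeholder, and `boosted` is a hypothetical helper that models the boost as a simple multiplier on each hit's score.

```python
hits = [
    {"_index": "index_c_123", "_score": 0.3},
    {"_index": "index_a_123", "_score": 0.3},
    {"_index": "index_b_123", "_score": 0.3},
]
indices_boost = {"index_a_123": 10, "index_b_123": 5, "index_c_123": 1}


def boosted(hit):
    """Multiply the hit's score by the boost of its index (default 1)."""
    return hit["_score"] * indices_boost.get(hit["_index"], 1)


ranked = sorted(hits, key=boosted, reverse=True)
print([h["_index"] for h in ranked])
```

The ordering A, B, C holds here because the raw scores are equal. If raw scores differ widely between indexes, a boost alone does not strictly guarantee that every A document outranks every B document, so an explicit sort may be the safer choice when strict grouping is required.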

### 2. boosting: Adjusting Document Relevance

```shell
# An index, index_a, has several fields. Implement the following query:
# 1) The title field matches 'ssas' or 'sasa'.
# 2) If the tags field (an array field) contains 'pingpang', raise the score.
# Task: write the DSL.

# Test data
PUT index_a/_bulk
{"index":{"_id":1}}
{"title":"ssas","tags":"basketball"}
{"index":{"_id":2}}
{"title":"sasa","tags":"pingpang; football"}
```

```shell
# Solution 1
POST index_a/_search
{
  "query": {
    "bool": {
      "must": [
        {
          "bool": {
            "should": [
              {
                "match": {
                  "title": "ssas"
                }
              },
              {
                "match": {
                  "title": "sasa"
                }
              }
            ]
          }
        }
      ],
      "should": [
        {
          "match": {
            "tags": {
              "query": "pingpang",
              "boost": 1
            }
            
          }
        }
      ]
    }
  }
}
# Solution 2
# Reference: https://www.elastic.co/guide/en/elasticsearch/reference/8.1/query-dsl-function-score-query.html
POST index_a/_search
{
  "query": {
    "bool": {
      "should": [
        {
          "function_score": {
            "query": {
              "match": {
                "tags": {
                  "query": "pingpang"
                }
              }
            },
            "boost": 1
          }
        }
      ],
      "must": [
        {
          "bool": {
            "should": [
              {
                "match": {
                  "title": "ssas"
                }
              },
              {
                "match": {
                  "title": "sasa"
                }
              }
            ]
          }
        }
      ]
    }
  }
}
```

### 3. negative\_boost: Lowering Relevance

```text
If some results are unwanted but should not be excluded outright with must_not, consider the negative_boost parameter of the boosting query.
In other words: lower their score.
negative_boost
(Required, float) Floating point number between 0 and 1.0 used to decrease the relevance scores of documents matching the negative query.
```

> Official documentation: https://www.elastic.co/guide/en/elasticsearch/reference/8.1/query-dsl-boosting-query.html

```shell
POST index_a/_search
{
  "query": {
    "boosting": {
      "positive": {
        "term": {
          "tags": "football"
        }
      },
      "negative": {
        "term": {
          "tags": "pingpang"
        }
      },
      "negative_boost": 0.5
    }
  }
}
```
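
The scoring rule of the boosting query is simple enough to state as a function. The sketch below is a model of the documented behavior, with invented input scores; `boosting_score` is a hypothetical name.

```python
def boosting_score(positive_score, matches_negative, negative_boost=0.5):
    """Score of a boosting query: negative matches are demoted, not excluded."""
    if matches_negative:
        return positive_score * negative_boost
    return positive_score


print(boosting_score(2.0, matches_negative=False))
print(boosting_score(2.0, matches_negative=True))
```

A document that matches both the positive and the negative query still appears in the results, only with its score multiplied by `negative_boost`. That is the difference from `must_not`, which would drop it entirely.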

### 4. function\_score: Custom Scoring

```shell
# How can relevance be boosted by both sales and view counts at the same time?
# Problem: for products, compute a boosted relevance that takes both sales and views into account,
# for example oldScore * (sales + views)
#
# Product   Sales   Views
# A         10      10
# B         20      20
# C         30      30
#
# Sample data
PUT goods_index/_bulk
{"index":{"_id":1}}
{"name":"A","sales_count":10,"view_count":10}
{"index":{"_id":2}}
{"name":"B","sales_count":20,"view_count":20}
{"index":{"_id":3}}
{"name":"C","sales_count":30,"view_count":30}
```

> Official documentation: https://www.elastic.co/guide/en/elasticsearch/reference/8.1/query-dsl-function-score-query.html
> 
> Key point: script\_score

```shell
POST goods_index/_search
{
  "query": {
    "function_score": {
      "query": {
        "match_all": {}
      },
      "script_score": {
        "script": {
          "source": "_score * (doc['sales_count'].value+doc['view_count'].value)"
        }
      }
    }
  }
}
```
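
The scores this request should produce for the sample data can be worked out with a short Python sketch. It rests on two assumptions worth stating: `match_all` gives every document a query score of 1.0, and `function_score` uses its default `boost_mode` of `multiply`, which multiplies the query score by the function result. `final_score` is a hypothetical helper.

```python
goods = [
    {"name": "A", "sales_count": 10, "view_count": 10},
    {"name": "B", "sales_count": 20, "view_count": 20},
    {"name": "C", "sales_count": 30, "view_count": 30},
]


def final_score(doc, query_score=1.0):
    """function_score with script_score and the default boost_mode (multiply)."""
    # The script: _score * (sales_count + view_count)
    script_result = query_score * (doc["sales_count"] + doc["view_count"])
    # boost_mode "multiply" combines the query score with the function result.
    return query_score * script_result


scores = {d["name"]: final_score(d) for d in goods}
print(scores)
```

With a query score of 1.0 the result is exactly `sales_count + view_count`. For a query whose score is not 1.0, this combination applies the query score twice; setting `boost_mode` to `replace` keeps the result equal to `oldScore * (sales + views)`.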

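## Exam Practice

```text
Query several fields of a document so that the final score is the sum of the per-field scores, and set a boost on specific fields.
Analysis
    1. What is tested: search plus field-based scoring
    2. Detail 1: most_fields (sums the field scores); detail 2: boost to raise a field's weight
        2.1 The difference between most_fields and best_fields
    3. Official documentation: https://www.elastic.co/guide/en/elasticsearch/reference/8.1/query-dsl-multi-match-query.html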
```
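
The difference between `most_fields` and `best_fields` comes down to how per-field scores are combined. The Python sketch below models that combination with invented field scores; `combine` is a hypothetical helper, and it deliberately ignores everything else that goes into real scoring.

```python
def combine(field_scores, boosts, mode, tie_breaker=0.0):
    """Combine per-field scores in the style of multi_match types."""
    weighted = [score * boosts.get(field, 1.0)
                for field, score in field_scores.items()]
    if mode == "most_fields":
        return sum(weighted)  # every matching field adds to the score
    best = max(weighted)
    # best_fields: the best field wins; the rest count only via tie_breaker.
    return best + tie_breaker * (sum(weighted) - best)


field_scores = {"title": 1.0, "body": 0.5}
boosts = {"title": 2.0}  # like writing "title^2" in the fields list
print(combine(field_scores, boosts, "most_fields"))
print(combine(field_scores, boosts, "best_fields"))
```

For the exam task, which asks for the sum of the field scores plus a per-field boost, `most_fields` with a caret boost on the chosen field is the combination that matches this model.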

***

# 15. Search DSL: Complex Queries with bool

> Official documentation: https://www.elastic.co/guide/en/elasticsearch/reference/8.1/query-dsl-bool-query.html

## Basic Syntax

![image.png](https://s3.cn-north-1.jdcloud-oss.com/shendengbucket1/2023-06-02-17-13EE8nXb13sY7qzYFK.png)

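## Exam Practice

```text
Write a query that requires a keyword to appear in at least two of a document's four fields.
Features used: bool query, should / minimum_should_match
    1. The bool query
    2. The detail: minimum_should_match
Note: if the bool query has at least one should clause and no must or filter clauses, minimum_should_match defaults to 1; otherwise it defaults to 0
```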

```shell
POST test_index/_search
{
  "query": {
    "bool": {
      "should": [
        {
          "match": {
            "field1": "kr"
          }
        },
        {
          "match": {
            "field2": "kr"
          }
        },
        {
          "match": {
            "field3": "kr"
          }
        },
        {
          "match": {
            "field4": "kr"
          }
        }
      ],
      "minimum_should_match": 2
    }
  }
}
```
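
The effect of `minimum_should_match: 2` over four `should` clauses can be modeled as a simple count. In the Python sketch below the field names `f1` to `f4` and the sample documents are invented, and whitespace splitting stands in for real text analysis.

```python
def matches(doc, keyword, fields, minimum_should_match=2):
    """True if the keyword appears in at least N of the given fields."""
    hit_count = sum(1 for f in fields if keyword in doc.get(f, "").split())
    return hit_count >= minimum_should_match


fields = ["f1", "f2", "f3", "f4"]
one_field = {"f1": "kr test", "f2": "other"}
two_fields = {"f1": "kr", "f3": "hello kr"}
print(matches(one_field, "kr", fields), matches(two_fields, "kr", fields))
```

Each `should` clause is one vote, and the document is returned only when the number of votes reaches the threshold.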

***

# 16. Search-Aggregations

> Official documentation: https://www.elastic.co/guide/en/elasticsearch/reference/8.1/search-aggregations.html

## Aggregation Categories

![image.png](https://s3.cn-north-1.jdcloud-oss.com/shendengbucket1/2023-06-02-17-13kPCqYhIWZG6R46Mp.png)
![image.png](https://s3.cn-north-1.jdcloud-oss.com/shendengbucket1/2023-06-02-17-1357WaBPeCzWXKdBse.png)

## Bucket Aggregations

### terms

> Official documentation: https://www.elastic.co/guide/en/elasticsearch/reference/8.1/search-aggregations-bucket-terms-aggregation.html

```shell
# Count documents per author
POST bilili_elasticsearch/_search
{
  "size": 0,
  "aggs": {
    "agg_user": {
      "terms": {
        "field": "user",
        "size": 1
      }
    }
  }
}
```

### date\_histogram

> Official documentation: https://www.elastic.co/guide/en/elasticsearch/reference/8.1/search-aggregations-bucket-datehistogram-aggregation.html

```shell
# Bucket documents by month on up_time
POST bilili_elasticsearch/_search
{
  "size": 0,
  "aggs": {
    "agg_up_time": {
      "date_histogram": {
        "field": "up_time",
        "calendar_interval": "month"
      }
    }
  }
}
```

## Metrics Aggregations

### Max

> Official documentation: https://www.elastic.co/guide/en/elasticsearch/reference/8.1/search-aggregations-metrics-max-aggregation.html

```shell
# Get the maximum up_time
POST bilili_elasticsearch/_search
{
  "size": 0,
  "aggs": {
    "agg_max_up_time": {
      "max": {
        "field": "up_time"
      }
    }
  }
}
```

### Top\_hits

> Official documentation: https://www.elastic.co/guide/en/elasticsearch/reference/8.1/search-aggregations-metrics-top-hits-aggregation.html

```shell
# Aggregate by user and keep only one bucket, then return the top 3 hits in that bucket, sorted by the given field
POST bilili_elasticsearch/_search
{
  "size": 0,
  "aggs": {
    "terms_agg_user": {
      "terms": {
        "field": "user",
        "size": 1
      },
      "aggs": {
        "top_user_hits": {
          "top_hits": {
            "_source": {
              "includes": [
                "video_time",
                "title",
                "see",
                "user",
                "up_time"
              ]
            }, 
            "sort": [
              {
                "see":{
                  "order": "desc"
                }
              }
            ], 
            "size": 3
          }
        }
      }
    }
  }
}

# Response:
{
  "took" : 91,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 1000,
      "relation" : "eq"
    },
    "max_score" : null,
    "hits" : [ ]
  },
  "aggregations" : {
    "terms_agg_user" : {
      "doc_count_error_upper_bound" : 0,
      "sum_other_doc_count" : 975,
      "buckets" : [
        {
          "key" : "Elastic搜索",
          "doc_count" : 25,
          "top_user_hits" : {
            "hits" : {
              "total" : {
                "value" : 25,
                "relation" : "eq"
              },
              "max_score" : null,
              "hits" : [
                {
                  "_index" : "bilili_elasticsearch",
                  "_id" : "5ccCVoQBUyqsIDX6wIcm",
                  "_score" : null,
                  "_source" : {
                    "video_time" : "03:45",
                    "see" : "92",
                    "up_time" : "2021-03-19",
                    "title" : "Elastic 社区大会2021: 用加 Gatling 进行Elasticsearch的负载测试，寓教于乐。",
                    "user" : "Elastic搜索"
                  },
                  "sort" : [
                    "92"
                  ]
                },
                {
                  "_index" : "bilili_elasticsearch",
                  "_id" : "8scCVoQBUyqsIDX6wIgn",
                  "_score" : null,
                  "_source" : {
                    "video_time" : "10:18",
                    "see" : "79",
                    "up_time" : "2020-10-20",
                    "title" : "为Elasticsearch启动htpps访问",
                    "user" : "Elastic搜索"
                  },
                  "sort" : [
                    "79"
                  ]
                },
                {
                  "_index" : "bilili_elasticsearch",
                  "_id" : "7scCVoQBUyqsIDX6wIcm",
                  "_score" : null,
                  "_source" : {
                    "video_time" : "04:41",
                    "see" : "71",
                    "up_time" : "2021-03-19",
                    "title" : "Elastic 社区大会2021: Elasticsearch作为一个地理空间的数据库",
                    "user" : "Elastic搜索"
                  },
                  "sort" : [
                    "71"
                  ]
                }
              ]
            }
          }
        }
      ]
    }
  }
}
```

## Pipeline Aggregations

> Pipeline: aggregations computed over the output of other aggregations
> 
> Official documentation: https://www.elastic.co/guide/en/elasticsearch/reference/8.1/search-aggregations-pipeline.html

### bucket\_selector

> Official documentation: https://www.elastic.co/guide/en/elasticsearch/reference/8.1/search-aggregations-pipeline-bucket-selector-aggregation.html

```shell
# Bucket by month on order_date and keep only the months whose summed total_unique_products exceeds 1000
POST kibana_sample_data_ecommerce/_search
{
  "size": 0,
  "aggs": {
    "date_his_aggs": {
      "date_histogram": {
        "field": "order_date",
        "calendar_interval": "month"
      },
      "aggs": {
        "sum_aggs": {
          "sum": {
            "field": "total_unique_products"
          }
        },
        "sales_bucket_filter": {
          "bucket_selector": {
            "buckets_path": {
              "totalSales": "sum_aggs"
            },
            "script": "params.totalSales > 1000"
          }
        }
      }
    }
  }
}
```

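## Exam Practice

```text
The earthquakes index holds earthquake data for the past 30 months. With a single query, return the following:
- The average mag for each of the past 30 months
- The month with the highest average mag in the past 30 months, along with that average
- The search must not return any documents

max_bucket documentation: https://www.elastic.co/guide/en/elasticsearch/reference/8.1/search-aggregations-pipeline-max-bucket-aggregation.html
```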

```shell
POST earthquakes/_search
{
  "size": 0, 
  "query": {
    "range": {
      "time": {
        "gte": "now-30M/d",
        "lte": "now"
      }
    }
  },
  "aggs": {
    "agg_time_his": {
      "date_histogram": {
        "field": "time",
        "calendar_interval": "month"
      },
      "aggs": {
        "avg_aggs": {
          "avg": {
            "field": "mag"
          }
        }
      }
    },
    "max_mag_sales": {
      "max_bucket": {
        "buckets_path": "agg_time_his>avg_aggs" 
      }
    }
  }
}
```
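
The two-step shape of this request, a per-month average followed by a pick of the best month, can be mirrored in plain Python. The sample data below is invented for illustration, and the code only models the aggregation logic, not the query or date math.

```python
from collections import defaultdict

quakes = [  # hypothetical sample data: (month, mag)
    ("2023-01", 4.0), ("2023-01", 6.0),
    ("2023-02", 5.5), ("2023-02", 6.5),
    ("2023-03", 3.0),
]

by_month = defaultdict(list)
for month, mag in quakes:
    by_month[month].append(mag)

# date_histogram + avg: one average per monthly bucket.
monthly_avg = {m: sum(v) / len(v) for m, v in by_month.items()}
# max_bucket: the bucket with the highest average, together with that value.
best_month = max(monthly_avg, key=monthly_avg.get)
print(monthly_avg, best_month, monthly_avg[best_month])
```

This is why `max_bucket` sits next to the histogram as a sibling and points into it with `buckets_path`: it consumes the finished per-month averages instead of the raw documents.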

***