Elasticsearch Must-Knows: Advanced - JD Cloud Developer Community

# 17. Cross-Cluster Search - CCS

> Official docs: https://www.elastic.co/guide/en/elasticsearch/reference/8.1/modules-cross-cluster-search.html

## Background and Motivation for Cross-Cluster Search

![image.png](https://s3.cn-north-1.jdcloud-oss.com/shendengbucket1/2023-06-02-17-18bW24618rJsbJlxX2d.png)

## What Cross-Cluster Search Is

![image.png](https://s3.cn-north-1.jdcloud-oss.com/shendengbucket1/2023-06-02-17-18JszrcAZEgc02JPB.png)

## Setting Up the Cross-Cluster Search Environment

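> Step 1: Set up two local single-node clusters. For local practice you can disable security with `xpack.security.enabled: false`.
> 
> Step 2: Run the following request on each cluster. The seeds must be the transport addresses of the remote clusters (here ports 9301 and 9302), not their HTTP ports:
> 
> `PUT _cluster/settings {"persistent": {"cluster": {"remote": {"cluster_one": {"seeds": ["172.21.0.14:9301"]}, "cluster_two": {"seeds": ["172.21.0.14:9302"]}}}}}`
> 
> Step 3: Verify that the clusters can reach each other.
> 
> * Option 1: In Kibana, open Stack Management -> Remote Clusters. The status should be `connected`, shown with a green check mark.
> * Option 2: `GET _remote/info`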

## Cross-Cluster Search Walkthrough

```shell
# Step 1: Add the following data in cluster 1
PUT test01/_bulk
{"index":{"_id":1}}
{"title":"this is from cluster01..."}

# Step 2: Add the following data in cluster 2
PUT test01/_bulk
{"index":{"_id":1}}
{"title":"this is from cluster02..."}

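# Step 3: Run the cross-cluster search. Syntax: POST <cluster_alias_1>:<index>,<cluster_alias_2>:<index>/_search
# The JSON after the request line is the response returned by Elasticsearch, not a request body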
POST cluster_one:test01,cluster_two:test01/_search
{
  "took" : 7,
  "timed_out" : false,
  "num_reduce_phases" : 3,
  "_shards" : {
    "total" : 2,
    "successful" : 2,
    "skipped" : 0,
    "failed" : 0
  },
  "_clusters" : {
    "total" : 2,
    "successful" : 2,
    "skipped" : 0
  },
  "hits" : {
    "total" : {
      "value" : 2,
      "relation" : "eq"
    },
    "max_score" : 1.0,
    "hits" : [
      {
        "_index" : "cluster_two:test01",
        "_id" : "1",
        "_score" : 1.0,
        "_source" : {
          "title" : "this is from cluster02..."
        }
      },
      {
        "_index" : "cluster_one:test01",
        "_id" : "1",
        "_score" : 1.0,
        "_source" : {
          "title" : "this is from cluster01..."
        }
      }
    ]
  }
}
```
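
Every hit's `_index` in the response is prefixed with the cluster alias, e.g. `cluster_two:test01`. As a client-side illustration, the Python sketch below builds the `<alias>:<index>` target expression and groups hits by cluster from an already-parsed response. The helper names are hypothetical, the sample dict is trimmed from the response above, and nothing here calls Elasticsearch.

```python
def ccs_target(targets):
    """Build '<alias>:<index>,...' from (cluster_alias, index) pairs."""
    return ",".join(f"{alias}:{index}" for alias, index in targets)


def split_index(qualified):
    """'cluster_two:test01' -> ('cluster_two', 'test01'); local indices have no alias."""
    alias, sep, index = qualified.partition(":")
    return (alias, index) if sep else (None, alias)


def group_hits_by_cluster(response):
    """Group each hit's _source by the cluster alias found in its _index."""
    groups = {}
    for hit in response["hits"]["hits"]:
        alias, _index = split_index(hit["_index"])
        groups.setdefault(alias, []).append(hit["_source"])
    return groups


# Trimmed copy of the search response shown above
sample_response = {
    "hits": {
        "hits": [
            {"_index": "cluster_two:test01", "_source": {"title": "this is from cluster02..."}},
            {"_index": "cluster_one:test01", "_source": {"title": "this is from cluster01..."}},
        ]
    }
}

if __name__ == "__main__":
    print(ccs_target([("cluster_one", "test01"), ("cluster_two", "test01")]))
    print(group_hits_by_cluster(sample_response))
```

Splitting on the first `:` is safe because index names created in 7.0 and later may not contain a colon.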

# 18. Cross-Cluster Replication - CCR (paid feature)

> Official docs: https://www.elastic.co/guide/en/elasticsearch/reference/current/xpack-ccr.html

## How to Keep a Cluster Highly Available

> 1. Replicas
> 2. Snapshot and restore
> 3. Cross-cluster replication (similar to MySQL primary/replica replication)

## Cross-Cluster Replication Overview

![image.png](https://s3.cn-north-1.jdcloud-oss.com/shendengbucket1/2023-06-02-17-19LVxyHCOQNuEQzJ19.png)

## Configuring Cross-Cluster Replication

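> 1. Prepare two clusters that can reach each other over the network.
> 2. Activate a license (a 30-day trial is available) under Stack Management -> License Management.
> 3. Decide which cluster is the leader and which is the follower.
> 4. On the follower cluster, register the leader cluster as a remote cluster: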
>
> ```shell
> PUT /_cluster/settings
> {
>   "persistent": {
>     "cluster": {
>       "remote": {
>         "leader": {
>           "seeds": [
>             "43.143.79.199:8901"
>           ]
>         }
>       }
>     }
>   }
> }
> ```

> 5. On the follower cluster, configure which leader indices to replicate (in Kibana):
>
> Stack Management -> Cross-Cluster Replication -> Create a follower index.
> 
> ![image.png](https://s3.cn-north-1.jdcloud-oss.com/shendengbucket1/2023-06-02-17-19xkbnaAoEppqDvOb.png)

> 6. Activate the configuration from step 5:
> 
> ![image.png](https://s3.cn-north-1.jdcloud-oss.com/shendengbucket1/2023-06-02-17-19qE36TfE19RrgafDyD.png)

***

# 19. Index Templates

> Official docs: https://www.elastic.co/guide/en/elasticsearch/reference/8.1/index-templates.html

## Component Templates in 8.x

> 1. Create a component template for index settings
>
> ```shell
> # Component template: index settings
> PUT _component_template/template_sttting_part
> {
>   "template": {
>     "settings": {
>       "number_of_shards": 3,
>       "number_of_replicas": 0
>     }
>   }
> }
> ```
>
> 2. Create a component template for index mappings
>
> ```shell
> # Component template: index mappings
> PUT _component_template/template_mapping_part
> {
>   "template": {
>     "mappings": {
>       "properties": {
>         "host_name":{
>           "type": "keyword"
>         },
>         "created_at":{
>           "type": "date",
>           "format": "EEE MMM dd HH:mm:ss Z yyyy"
>         }
>       }
>     }
>   }
> }
> ```
>
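> 3. Create an index template that links the component templates to matching indices
> > **Note: if several component templates in `composed_of` set the same option, the later one overrides the earlier one, so the order of the list matters.**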
>
> ```shell
> # Link the component templates to indices through an index template
> # Every index whose name matches tem_* is created with these settings and mappings
> PUT _index_template/template_1
> {
>   "index_patterns": [
>     "tem_*"
>   ],
>   "composed_of": [
>     "template_sttting_part",
>     "template_mapping_part"
>   ]
> }
> ```
>
> 4. Test
>
> ```shell
> # Create a test index
> PUT tem_001
> ```
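
To make the override rule concrete, here is a simplified Python model of how the templates listed in `composed_of` are layered: each later component template is merged over the earlier ones, and the index template's own `template` block, if present, is applied last. `deep_merge`, `compose` and the `override_part` template are hypothetical, and Elasticsearch's real merge has extra rules (for example around mapping conflicts) that this sketch ignores.

```python
def deep_merge(base, override):
    """Return a new dict: keys in `override` win, nested dicts are merged recursively."""
    merged = dict(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = deep_merge(merged[key], value)
        else:
            merged[key] = value
    return merged


def compose(component_templates, composed_of, own_template=None):
    """Layer component templates in composed_of order, then the index template's own block."""
    result = {}
    for name in composed_of:  # later entries override earlier ones
        result = deep_merge(result, component_templates[name]["template"])
    if own_template is not None:
        result = deep_merge(result, own_template)  # applied last, so it wins
    return result


components = {
    "template_sttting_part": {
        "template": {"settings": {"number_of_shards": 3, "number_of_replicas": 0}}
    },
    # hypothetical second component template that also sets number_of_shards
    "override_part": {"template": {"settings": {"number_of_shards": 1}}},
}

if __name__ == "__main__":
    print(compose(components, ["template_sttting_part", "override_part"]))
    print(compose(components, ["override_part", "template_sttting_part"]))
```

Swapping the two names in `composed_of` flips which `number_of_shards` survives, which is exactly why the order of the list matters.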

## Basic Index Template Operations

![image.png](https://s3.cn-north-1.jdcloud-oss.com/shendengbucket1/2023-06-02-17-196hc2bQKiqOXF852V.png)

## Hands-on Practice

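> Requirement 1: Without an explicit mapping, numeric values are dynamically mapped as `long`. The business values are actually small, so this wastes storage. Make `integer` the default instead.
> > Index templates, official docs: https://www.elastic.co/guide/en/elasticsearch/reference/8.1/index-templates.html
> > Mapping dynamic templates, official docs: https://www.elastic.co/guide/en/elasticsearch/reference/8.1/dynamic-templates.html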

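```shell
# Combine mapping dynamic templates with index templates
# 1. Create the mapping component template: values detected as long are mapped as integer
PUT _component_template/template_mapping_part_01
{
  "template": {
    "mappings": {
      "dynamic_templates": [
        { "integers": { "match_mapping_type": "long", "mapping": { "type": "integer" } } }
      ]
    }
  }
}

# 2. Create the index template that references the component template
PUT _index_template/template_2
{
  "index_patterns": ["tem1_*"],
  "composed_of": ["template_mapping_part_01"]
}

# 3. Create test data
POST tem1_001/_doc/1
{
  "age":18
}

# 4. Check the mapping to verify
GET tem1_001/_mapping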
```

> Requirement 2: Map every field whose name starts with `date_` to the `date` type.
> > Index templates, official docs: https://www.elastic.co/guide/en/elasticsearch/reference/8.1/index-templates.html
> > Mapping dynamic templates, official docs: https://www.elastic.co/guide/en/elasticsearch/reference/8.1/dynamic-templates.html

```shell
# Combine mapping dynamic templates with index templates
# 1. Create the mapping component template
PUT _component_template/template_mapping_part_01
{
  "template": {
    "mappings": {
      "dynamic_templates": [
        {
          "integers": {
            "match_mapping_type": "long",
            "mapping": {
              "type": "integer"
            }
          }
        },
        {
          "date_type_process": {
            "match": "date_*",
            "mapping": {
              "type": "date",
              "format": "yyyy-MM-dd HH:mm:ss"
            }
          }
        }
      ]
    }
  }
}

# 2. Create the index template that references the component template
PUT _index_template/template_2
{
  "index_patterns": ["tem1_*"],
  "composed_of": ["template_mapping_part_01"]
}

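# 3. Create test data. tem1_001 already exists from Requirement 1, and templates only apply when an index is created, so delete it first
DELETE tem1_001

POST tem1_001/_doc/2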
{
  "age":19,
  "date_aoe":"2022-01-01 18:18:00"
}

# 4. Check the mapping to verify
GET tem1_001/_mapping
```

## Practice: A Real Exam Question

> Define a dynamic_template so that fields whose names start with `x_` are mapped as `integer`.
> > Index templates, official docs: https://www.elastic.co/guide/en/elasticsearch/reference/8.1/index-templates.html
> > Mapping dynamic templates, official docs: https://www.elastic.co/guide/en/elasticsearch/reference/8.1/dynamic-templates.html

```shell
# Combine mapping dynamic templates with index templates
# 1. Create the mapping component template
PUT _component_template/template_mapping_part_02
{
  "template": {
    "mappings": {
      "dynamic_templates": [
        {
          "integers": {
            "match": "x_*",
            "mapping": {
              "type": "integer"
            }
          }
        },
        {
          "keywords": {
            "match_mapping_type": "string",
            "mapping": {
              "type": "keyword"
            }
          }
        }
      ]
    }
  }
}

# 2. Create the index template that references the component template
PUT _index_template/template_3
{
  "index_patterns": ["tem2_*"],
  "composed_of": ["template_mapping_part_02"]
}
```
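
Dynamic templates are checked in the order they are listed, and the first match wins. If `keywords` (any string) comes before `integers` (`x_*`), a string value in an `x_` field such as `"x_age": "18"` is mapped as `keyword`, not `integer`. The Python sketch below models only `match` and `match_mapping_type`; `detected_type` and `resolve` are hypothetical helpers, and the real type detection (date detection, numeric detection, etc.) is richer.

```python
from fnmatch import fnmatchcase


def detected_type(value):
    """Rough equivalent of the JSON type that dynamic mapping detects."""
    if isinstance(value, bool):  # check bool before int: bool is a subclass of int
        return "boolean"
    if isinstance(value, int):
        return "long"
    if isinstance(value, float):
        return "double"
    if isinstance(value, str):
        return "string"
    return "object"


def resolve(dynamic_templates, field, value):
    """Return (template_name, mapped_type) for the first matching template."""
    for entry in dynamic_templates:
        [(name, rule)] = entry.items()
        if "match_mapping_type" in rule and rule["match_mapping_type"] != detected_type(value):
            continue
        if "match" in rule and not fnmatchcase(field, rule["match"]):
            continue
        return name, rule["mapping"]["type"]
    return None, None  # no template matched: default dynamic mapping applies


keywords_first = [
    {"keywords": {"match_mapping_type": "string", "mapping": {"type": "keyword"}}},
    {"integers": {"match": "x_*", "mapping": {"type": "integer"}}},
]
integers_first = list(reversed(keywords_first))

if __name__ == "__main__":
    print(resolve(keywords_first, "x_age", "18"))
    print(resolve(integers_first, "x_age", "18"))
```

Listing `integers` first keeps every `x_` field an integer, while other string fields still fall through to `keywords`.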

***

# 20. ILM: Index Lifecycle Management

> Official docs: https://www.elastic.co/guide/en/elasticsearch/reference/8.1/index-lifecycle-management.html

## What Is the Index Lifecycle

> An index is born -> ages -> falls ill -> dies
> 
> Have you ever considered what happens if an index is never managed again after it is created?

## What Is Index Lifecycle Management

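> **What happens when an index grows too large?**
> > * Recovering a large index takes far longer than recovering a small one.
> > * Searches slow down, and writes and updates are also affected to varying degrees.
> > * Once an index is large enough, a health problem in it can make the cluster's core business unavailable.
>
> **Best practices**
> > * A single shard can hold at most 2,147,483,519 documents (Integer.MAX_VALUE - 128), roughly 2.1 billion.
> > * Official recommendation: keep each shard between 30GB and 50GB. An index whose data grows without bound will exceed this sooner or later.
>
> **Users rarely need all the data**
> > * In some scenarios the business mostly cares about recent data, such as the last 3 or 7 days.
> > * A large index lumps all historical data together, which does not suit such queries.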
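
A quick back-of-the-envelope check of the two limits above: the primary shard count has to satisfy both the size guideline and the per-shard document limit. The sketch assumes data is spread evenly across primaries and uses the 50GB upper end of the guideline; `primary_shards_needed` is a hypothetical helper, not an Elasticsearch API.

```python
import math

# Lucene's hard limit on the number of documents in a single shard
MAX_DOCS_PER_SHARD = 2_147_483_519


def primary_shards_needed(total_size_gb, total_docs, target_shard_gb=50):
    """Smallest primary shard count that keeps every shard under both limits."""
    by_size = math.ceil(total_size_gb / target_shard_gb)
    by_docs = math.ceil(total_docs / MAX_DOCS_PER_SHARD)
    return max(1, by_size, by_docs)


if __name__ == "__main__":
    print(primary_shards_needed(600, 1_000_000_000))   # bounded by size
    print(primary_shards_needed(10, 5_000_000_000))    # bounded by document count
```

Since a fixed shard count cannot keep up with unbounded growth, this is the motivation for rolling over to new indices, as the next sections show.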

## Evolution of Index Lifecycle Management

![image.png](https://s3.cn-north-1.jdcloud-oss.com/shendengbucket1/2023-06-02-17-20QY20HUFAoWRqqrdm.png)

## ILM Prelude: Rollover

> Official docs: https://www.elastic.co/guide/en/elasticsearch/reference/8.1/index-rollover.html

```shell
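# 0. Test prerequisite: shorten the ILM poll interval (default 10 minutes)
# This only affects ILM-driven rollover; the manual _rollover call in step 3 is evaluated immediately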
PUT _cluster/settings
{
  "persistent": {
    "indices.lifecycle.poll_interval": "1s"
  }
}

# 1. Create the index and make it the write index of an alias
PUT test_index-0001
{
  "aliases": {
    "my-test-index-alias": {
      "is_write_index": true
    }
  }
}

# 2. Bulk-index data through the alias
PUT my-test-index-alias/_bulk
{"index":{"_id":1}}
{"title":"testing 01"}
{"index":{"_id":2}}
{"title":"testing 02"}
{"index":{"_id":3}}
{"title":"testing 03"}
{"index":{"_id":4}}
{"title":"testing 04"}
{"index":{"_id":5}}
{"title":"testing 05"}

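# 3. Call the rollover API. The conditions are checked right away; if any one of them is met,
# the alias rolls over to a new write index (test_index-000002)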
POST my-test-index-alias/_rollover
{
  "conditions": {
    "max_age": "7d",
    "max_docs": 5,
    "max_primary_shard_size": "50gb"
  }
}

# 4. Write more data through the alias; after a successful rollover it lands in the new write index
PUT my-test-index-alias/_bulk
{"index":{"_id":7}}
{"title":"testing 07"}

# 5. Search through the alias to verify the rollover (the alias now covers both indices)
POST my-test-index-alias/_search
```
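
The rollover API names the new index by incrementing the trailing number of the old name and zero-padding it to six digits, so `test_index-0001` is followed by `test_index-000002` (unless you pass an explicit target name). In 8.1, meeting any single `max_*` condition is enough. The Python sketch below models both rules; the function names and the unit-normalized keys (`max_age_seconds`, `max_primary_shard_size_bytes`) are hypothetical.

```python
import re


def next_rollover_name(index_name):
    """test_index-0001 -> test_index-000002 (number + 1, zero-padded to 6 digits)."""
    match = re.fullmatch(r"(.*-)(\d+)", index_name)
    if match is None:
        raise ValueError("index name must end with '-' followed by a number")
    prefix, number = match.groups()
    return f"{prefix}{int(number) + 1:06d}"


def should_roll_over(stats, conditions):
    """True if ANY configured max_* condition is reached."""
    observed = {
        "max_age_seconds": stats["age_seconds"],
        "max_docs": stats["docs"],
        "max_primary_shard_size_bytes": stats["largest_primary_shard_bytes"],
    }
    return any(
        name in conditions and value >= conditions[name]
        for name, value in observed.items()
    )


# The conditions from step 3, converted to seconds and bytes
conditions = {
    "max_age_seconds": 7 * 24 * 3600,
    "max_docs": 5,
    "max_primary_shard_size_bytes": 50 * 1024**3,
}

if __name__ == "__main__":
    print(next_rollover_name("test_index-0001"))
    print(should_roll_over({"age_seconds": 60, "docs": 5, "largest_primary_shard_bytes": 4096}, conditions))
```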

## ILM Prelude: Shrink (Reducing the Number of Primary Shards)

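> Official docs: https://www.elastic.co/guide/en/elasticsearch/reference/8.1/ilm-shrink.html
> 
> Core steps:
>
> 1. Relocate a copy of every shard of the index to a single node.
> 2. Block writes to the index.
> 3. Only then can the index be shrunk.
>
> The target primary shard count must also be a factor of the source count, and the cluster health should be green.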

```shell
# 1. Prepare test data: copy the Kibana sample web logs into an index with 5 primary shards
DELETE kibana_sample_data_logs_ext
PUT kibana_sample_data_logs_ext
{
  "settings": {
    "number_of_shards": 5,
    "number_of_replicas": 0
  }
}
POST _reindex
{
  "source": {
    "index": "kibana_sample_data_logs"
  },
  "dest": {
    "index": "kibana_sample_data_logs_ext"
  }
}

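# 2. Required settings before shrinking
# index.number_of_replicas: drop the source replicas so only the primaries need to be relocated
# index.routing.allocation.include._tier_preference: route all shards to hot nodes (with a single hot node every shard ends up there; on larger clusters the docs use index.routing.allocation.require._name)
# index.blocks.write: make the source index read-only (required; the shrunk index inherits this block unless the shrink request sets it to null)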
PUT kibana_sample_data_logs_ext/_settings
{
  "settings": {
    "index.number_of_replicas": 0,
    "index.routing.allocation.include._tier_preference": "data_hot",
    "index.blocks.write": true
  }
}

# 3. Shrink into a single primary shard with best_compression and attach an alias
POST kibana_sample_data_logs_ext/_shrink/kibana_sample_data_logs_ext_shrink
{
  "settings":{
    "index.number_of_replicas": 0,
    "index.number_of_shards": 1,
    "index.codec":"best_compression"
  },
  "aliases":{
    "kibana_sample_data_logs_alias":{}
  }
}
```
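
The shrink preconditions can be checked before sending the request. The sketch below encodes three of them on plain Python values: the target count must be a smaller factor of the source count (so the 5-shard index above can only go to 1 shard), writes must be blocked, and one copy of every shard must sit on the same node. `shrink_targets`, `shrink_blockers` and the node names are hypothetical, and the green-health check is left out.

```python
def shrink_targets(source_shards):
    """Primary shard counts the index can be shrunk to: proper factors of the source count."""
    return [n for n in range(1, source_shards) if source_shards % n == 0]


def shrink_blockers(settings, shard_copy_nodes, source_shards, target_shards):
    """Return the unmet shrink preconditions (an empty list means ready)."""
    problems = []
    if target_shards not in shrink_targets(source_shards):
        problems.append("target shard count must be a smaller factor of the source count")
    if settings.get("index.blocks.write") is not True:
        problems.append("index.blocks.write must be true")
    if len(set(shard_copy_nodes)) != 1:
        problems.append("one copy of every shard must be on the same node")
    return problems


if __name__ == "__main__":
    print(shrink_targets(5))  # the 5-shard index in this section
    print(shrink_blockers({"index.blocks.write": True}, ["node-1"] * 5, 5, 1))
```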

## ILM in Practice

### The Big Picture: Four Phases

> Official docs: https://www.elastic.co/guide/en/elasticsearch/reference/8.1/overview-index-lifecycle-management.html

> Lifecycle phases (policy): https://www.elastic.co/guide/en/elasticsearch/reference/8.1/ilm-index-lifecycle.html
> 
> **Hot phase** (birth)
> > Set priority
> > 
> > Unfollow
> > 
> > Rollover
> > 
> > Read-only
> > 
> > Shrink
> > 
> > Force Merge
> > 
> > Search snapshot
>
> **Warm phase** (aging)
> > Set priority
> > 
> > Unfollow
> > 
> > Read-only
> > 
> > Allocate
> > 
> > Migrate
> > 
> > Shrink
> > 
> > Force Merge
>
> **Cold phase** (illness)
> > Search snapshot
>
> **Delete phase** (death)
> > Delete

### Walkthrough

> Official docs:
> 
> https://www.elastic.co/guide/en/elasticsearch/reference/8.1/set-up-lifecycle-policy.html
> 
> https://www.elastic.co/guide/en/elasticsearch/reference/8.1/ilm-actions.html

#### 1. Create the policy

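> * **Hot** phase: rollover at max\_age 3d, max\_docs 5 or max\_size 50gb; priority 100
> * Warm phase: min\_age 15s; force merge segments; move from hot nodes to warm nodes (ILM applies the migrate action implicitly); set replicas to 0; priority 50
> * Cold phase: min\_age 30s; move from warm nodes to cold nodes; priority 0
> * Delete phase: min\_age 45s; delete the index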

```shell
PUT _ilm/policy/kr_20221114_policy
{
  "policy": {
    "phases": {
      "hot": {
        "min_age": "0ms",
        "actions": {
          "set_priority": {
            "priority": 100
          },
          "rollover": {
            "max_size": "50gb",
            "max_primary_shard_size": "50gb",
            "max_age": "3d",
            "max_docs": 5
          }
        }
      },
      "warm": {
        "min_age": "15s",
        "actions": {
          "forcemerge": {
            "max_num_segments": 1
          },
          "set_priority": {
            "priority": 50
          },
          "allocate": {
            "number_of_replicas": 0
          }
        }
      },
      "cold": {
        "min_age": "30s",
        "actions": {
          "set_priority": {
            "priority": 0
          }
        }
      },
      "delete": {
        "min_age": "45s",
        "actions": {
          "delete": {
            "delete_searchable_snapshot": true
          }
        }
      }
    }
  }
}
```
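
For a rolled-over index, each phase's `min_age` is counted from the rollover time (otherwise from index creation). The Python sketch below computes which phase `kr_20221114_policy` makes an index eligible for at a given age. It is a simplified model under that assumption: the real transition also waits for the current phase's actions to finish and for the next ILM poll. `parse_duration` and `expected_phase` are hypothetical helpers that only understand the units used in this article.

```python
import re

PHASE_ORDER = ["hot", "warm", "cold", "delete"]
UNIT_SECONDS = {"ms": 0.001, "s": 1, "m": 60, "h": 3600, "d": 86400}


def parse_duration(text):
    """'15s' -> 15, '3d' -> 259200 (seconds)."""
    match = re.fullmatch(r"(\d+)(ms|s|m|h|d)", text)
    if match is None:
        raise ValueError(f"unsupported duration: {text}")
    amount, unit = match.groups()
    return int(amount) * UNIT_SECONDS[unit]


def expected_phase(phases, seconds_since_rollover):
    """Latest phase whose min_age has elapsed (ignores time spent running actions)."""
    current = None
    for name in PHASE_ORDER:
        min_age = parse_duration(phases[name].get("min_age", "0ms")) if name in phases else None
        if min_age is not None and min_age <= seconds_since_rollover:
            current = name
    return current


# min_age values from kr_20221114_policy above
kr_policy_phases = {
    "hot": {"min_age": "0ms"},
    "warm": {"min_age": "15s"},
    "cold": {"min_age": "30s"},
    "delete": {"min_age": "45s"},
}

if __name__ == "__main__":
    for age in (5, 20, 35, 50):
        print(age, expected_phase(kr_policy_phases, age))
```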

![image.png](https://s3.cn-north-1.jdcloud-oss.com/shendengbucket1/2023-06-02-17-20Pbcg6U7sH7pAPGP.png)

#### 2. Create the index template

```shell
PUT _index_template/kr_20221114_template
{
  "index_patterns": ["kr_index-*"],
  "template": {
    "settings": {
      "index": {
        "lifecycle": {
          "name": "kr_20221114_policy",
          "rollover_alias": "kr-index-alias"
        },
        "routing": {
          "allocation": {
            "include": {
              "_tier_preference": "data_hot"
            }
          }
        },
        "number_of_shards": "3",
        "number_of_replicas": "1"
      }
    },
    "aliases": {},
    "mappings": {}
  }
}
```

#### 3. For testing, shorten the ILM poll interval

```shell
PUT _cluster/settings
{
  "persistent": {
    "indices.lifecycle.poll_interval": "1s"
  }
}
```

#### 4. Test

```shell
# Create the initial index and make it the write index of the rollover alias
PUT kr_index-0001
{
  "aliases": {
    "kr-index-alias": {
      "is_write_index": true
    }
  }
}
# Index data through the alias
PUT kr-index-alias/_bulk
{"index":{"_id":1}}
{"title":"testing 01"}
{"index":{"_id":2}}
{"title":"testing 02"}
{"index":{"_id":3}}
{"title":"testing 03"}
{"index":{"_id":4}}
{"title":"testing 04"}
{"index":{"_id":5}}
{"title":"testing 05"}
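# Index more data through the alias. ILM checks the rollover conditions on every poll,
# so once kr_index-0001 holds 5 documents it is rolled over to kr_index-000002
PUT kr-index-alias/_bulk
{"index":{"_id":6}}
{"title":"testing 06"}
# Inspect the indices
GET kr_index-0001

GET _cat/indices?v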
```

#### Summary

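> Step 1: Configure the ILM policy
>
> * Horizontal axis: phases (Hot, Warm, Cold, Delete), i.e. birth, aging, illness and death
> * Vertical axis: actions (rollover, forcemerge, readonly, delete, ...)
>
> Step 2: Create an index template that binds the policy and specifies the rollover alias
> 
> Step 3: Create the initial index
> 
> Step 4: The index rolls over and moves through the phases according to the policy from step 1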

***

# 21. Data Streams

> Official docs: https://www.elastic.co/guide/en/elasticsearch/reference/8.1/data-streams.html

## Features

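> A data stream lets you store time-series data across multiple indices while exposing a single name (the data stream name) for all requests.
>
> * Indexing and search requests are sent to the data stream.
> * The data stream routes these requests to its backing indices.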
>
> ![image.png](https://s3.cn-north-1.jdcloud-oss.com/shendengbucket1/2023-06-02-17-2055lZ6X5tCfudA2PJ.png)

## Backing indices

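> Each data stream consists of one or more hidden backing indices:
>
> * They are created automatically.
> * They require a matching index template (one with `data_stream` enabled).
>
> Rollover creates new backing indices automatically:
>
> * The newest backing index becomes the data stream's write index.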
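
Backing indices are named `.ds-<data-stream>-<yyyy.MM.dd>-<generation>`, where the date is the backing index's creation date and the generation is a six-digit counter that rollover increments. The Python sketch below reproduces that naming and the "newest backing index is the write index" rule; `backing_index_name` and `roll_over` are hypothetical helpers, and the generation is read from the last existing name.

```python
from datetime import date


def backing_index_name(data_stream, created, generation):
    """.ds-<data-stream>-<yyyy.MM.dd>-<generation>, generation zero-padded to 6 digits."""
    return f".ds-{data_stream}-{created:%Y.%m.%d}-{generation:06d}"


def roll_over(data_stream, backing_indices, today):
    """Append the next-generation backing index; it becomes the write index."""
    if backing_indices:
        generation = int(backing_indices[-1].rsplit("-", 1)[1]) + 1
    else:
        generation = 1
    write_index = backing_index_name(data_stream, today, generation)
    return backing_indices + [write_index], write_index


if __name__ == "__main__":
    indices, write_index = roll_over("my-data-stream", [], date(2099, 5, 6))
    indices, write_index = roll_over("my-data-stream", indices, date(2099, 5, 7))
    print(indices, write_index)
```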

## Use Cases

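> 1. Logs, events, metrics and other continuously generated (rarely updated) business data
> 2. Two core characteristics:
>     1. Time-series data
>     2. Data that is rarely or never updated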

## Core Steps to Create a Data Stream

![image.png](https://s3.cn-north-1.jdcloud-oss.com/shendengbucket1/2023-06-02-17-21kWiLOrcnd216aHl21.png)

> Official docs: https://www.elastic.co/guide/en/elasticsearch/reference/8.1/set-up-a-data-stream.html
> 
> Set up a data stream
> 
> To set up a data stream, follow these steps:
>
> 1. [Create an index lifecycle policy](https://www.elastic.co/guide/en/elasticsearch/reference/8.1/set-up-a-data-stream.html#create-index-lifecycle-policy)
> 2. [Create component templates](https://www.elastic.co/guide/en/elasticsearch/reference/8.1/set-up-a-data-stream.html#create-component-templates)
> 3. [Create an index template](https://www.elastic.co/guide/en/elasticsearch/reference/8.1/set-up-a-data-stream.html#create-index-template)
> 4. [Create the data stream](https://www.elastic.co/guide/en/elasticsearch/reference/8.1/set-up-a-data-stream.html#create-data-stream)
> 5. [Secure the data stream](https://www.elastic.co/guide/en/elasticsearch/reference/8.1/set-up-a-data-stream.html#secure-data-stream)

## Walkthrough

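> 1. Create a data stream named my-data-stream.
> 2. Name the index template my-index-template.
> 3. Apply it to every index matching "my-data-stream\*".
> 4. Newly written data lives on data\_hot nodes.
> 5. After 3 minutes, roll over and move to data\_warm nodes.
> 6. After another 5 minutes, move to data\_cold nodes.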

```shell
# Step 1. Create the ILM policy
PUT _ilm/policy/my-lifecycle-policy
{
  "policy": {
    "phases": {
      "hot": {
        "actions": {
          "rollover": {
            "max_size": "50gb",
            "max_age": "3m",
            "max_docs": 5
          },
          "set_priority": {
            "priority": 100
          }
        }
      },
      "warm": {
        "min_age": "5m",
        "actions": {
          "allocate": {
            "number_of_replicas": 0
          }, 
          "forcemerge": {
            "max_num_segments": 1
          },
          "set_priority": {
            "priority": 50
          }
        }
      },
      "cold": {
        "min_age": "6m",
        "actions": {
          "freeze":{}
        }
      },
      "delete": {
        "min_age": "45s",
        "actions": {
          "delete": {}
        }
      }
    }
  }
}

# Step 2. Create a component template for mappings
PUT _component_template/my-mappings
{
  "template": {
    "mappings": {
      "properties": {
        "@timestamp": {
          "type": "date",
          "format": "date_optional_time||epoch_millis"
        },
        "message": {
          "type": "wildcard"
        }
      }
    }
  },
  "_meta": {
    "description": "Mappings for @timestamp and message fields",
    "my-custom-meta-field": "More arbitrary metadata"
  }
}

# Step 3. Create a component template for settings
PUT _component_template/my-settings
{
  "template": {
    "settings": {
      "index.lifecycle.name": "my-lifecycle-policy",
      "index.routing.allocation.include._tier_preference":"data_hot"
    }
  },
  "_meta": {
    "description": "Settings for ILM",
    "my-custom-meta-field": "More arbitrary metadata"
  }
}

# Step 4. Create the index template
PUT _index_template/my-index-template
{
  "index_patterns": ["my-data-stream*"],
  "data_stream": { },
  "composed_of": [ "my-mappings", "my-settings" ],
  "priority": 500,
  "_meta": {
    "description": "Template for my time series data",
    "my-custom-meta-field": "More arbitrary metadata"
  }
}

# Step 5. Index test data; the first write creates the data stream (bulk requests must use the create action)
PUT my-data-stream/_bulk
{ "create":{ } }
{ "@timestamp": "2099-05-06T16:21:15.000Z", "message": "192.0.2.42 - - [06/May/2099:16:21:15 +0000] \"GET /images/bg.jpg HTTP/1.0\" 200 24736" }
{ "create":{ } }
{ "@timestamp": "2099-05-06T16:25:42.000Z", "message": "192.0.2.255 - - [06/May/2099:16:25:42 +0000] \"GET /favicon.ico HTTP/1.0\" 200 3638" }

POST my-data-stream/_doc
{
  "@timestamp": "2099-05-06T16:21:15.000Z",
  "message": "192.0.2.42 - - [06/May/2099:16:21:15 +0000] \"GET /images/bg.jpg HTTP/1.0\" 200 24736"
}

# Step 6. Inspect the data stream's backing indices
GET /_resolve/index/my-data-stream*
```

***

# 22. Ingest Pipelines (Data Preprocessing)

> Official docs: https://www.elastic.co/guide/en/elasticsearch/reference/8.1/ingest.html

## Cluster Node Roles

> Official docs: https://www.elastic.co/guide/en/elasticsearch/reference/8.1/modules-node.html

![image.png](https://s3.cn-north-1.jdcloud-oss.com/shendengbucket1/2023-06-02-17-2121Fj6g5GyUUmiQVO.png)

## Ingest Use Cases

![image.png](https://s3.cn-north-1.jdcloud-oss.com/shendengbucket1/2023-06-02-17-22TXkfQtq9Z2ydsDl.png)

## Ingest Pipeline Syntax

![image.png](https://s3.cn-north-1.jdcloud-oss.com/shendengbucket1/2023-06-02-17-22CM6WSmgkraxuuaD.png)

## Hands-on Practice: Ingest

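> index\_a contains some documents. Create index\_b and use the reindex API to index the documents of index\_a into index\_b, such that:
>
> 1. A new integer field is added whose value is the character length of index\_a's field\_x.
> 2. A new array field is added whose value is the list of words in field\_y.
> 3. field\_y is a space-separated group of words, such as "foo bar"; after indexing into index\_b it must become ["foo","bar"].
>
> Official docs: https://www.elastic.co/guide/en/elasticsearch/reference/8.1/script-processor.html
> 
> Official docs: https://www.elastic.co/guide/en/elasticsearch/reference/8.1/split-processor.html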

```shell
# Add test data
POST index_a/_bulk
{"index":{"_id":1}}
{"field_x":"test_01","field_y":"foo bar"}

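# Create the ingest pipeline: each processor goes in its own object
# field_x_length is the new integer field (name chosen here); split turns field_y into an array of words
PUT _ingest/pipeline/my_index_b_pipline
{
  "processors": [
    {
      "script": {
        "lang": "painless",
        "source": """
          ctx.field_x_length = ctx.field_x.length();
        """
      }
    },
    {
      "split": {
        "field": "field_y",
        "separator": " ",
        "target_field": "field_y"
      }
    }
  ]
}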

# Run the reindex
POST _reindex
{
  "source": {
    "index": "index_a"
  },
  "dest": {
    "index": "index_b",
    "pipeline": "my_index_b_pipline"
  }
}

# Verify
GET index_b/_search
```
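
Before a real reindex you can dry-run the pipeline with `POST _ingest/pipeline/my_index_b_pipline/_simulate`. As an offline illustration, the Python function below applies the same two transformations to a dict. It assumes ASCII input (Painless `length()` counts UTF-16 units, Python `len` counts code points), and `field_x_length` is a hypothetical name for the new integer field the requirement asks for.

```python
def simulate_index_b_pipeline(source):
    """Python stand-in for the script + split processors above."""
    doc = dict(source)
    # script processor: new integer field holding the length of field_x
    doc["field_x_length"] = len(doc["field_x"])
    # split processor: "foo bar" -> ["foo", "bar"]
    doc["field_y"] = doc["field_y"].split(" ")
    return doc


if __name__ == "__main__":
    print(simulate_index_b_pipeline({"field_x": "test_01", "field_y": "foo bar"}))
```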

## Ingest-Enrich

> Official docs: https://www.elastic.co/guide/en/elasticsearch/reference/8.1/ingest-enriching-data.html

![image.png](https://s3.cn-north-1.jdcloud-oss.com/shendengbucket1/2023-06-02-17-22sifnovzuJZiO0X6.png)

### Ingest-Enrich: Steps

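> 1. Create the source index
> 2. Create the enrich policy
> 3. Execute the policy to build the enrich index
> 4. Create the ingest pipeline
> 5. Run the pipeline through reindex or update\_by\_query
> 6. The target index is produced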

### Hands-on Practice: Ingest-Enrich

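> There are two indices, a and b, that both contain the field field\_a, and each has other fields of its own. Create a new index, such as c, that contains every document of index a. For documents whose join field field\_a has the same value in a and b, the other fields of the b document must be added to the corresponding document in c.
> 
> Official docs: https://www.elastic.co/guide/en/elasticsearch/reference/8.1/match-enrich-policy-type.html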

```shell
# Test data: index_test_a
DELETE index_test_a
PUT index_test_a
{
  "mappings": {
    "properties": {
      "field_a": {
        "type": "keyword"
      },
      "title": {
        "type": "keyword"
      },
      "publish_time": {
        "type": "date"
      }
    }
  }
}
POST index_test_a/_bulk
{"index":{"_id":1}}
{"field_a":"aaa","title":"elasticsearch in action","publish_time":"2017-07-01T00:00:00"}

# Test data: index_test_b
PUT index_test_b
{
  "mappings": {
    "properties": {
      "field_a": {
        "type": "keyword"
      },
      "author": {
        "type": "keyword"
      },
      "publisher": {
        "type": "keyword"
      }
    }
  }
}
POST index_test_b/_bulk
{"index":{"_id":1}}
{"field_a":"aaa","author":"jerry","publisher":"Tsinghua"}

# 1. Create the enrich policy
PUT /_enrich/policy/index_test_b_enrich
{
  "match": {
    "indices": "index_test_b",
    "match_field": "field_a",
    "enrich_fields": ["author", "publisher"]
  }
}
# 2. Execute the enrich policy to build the enrich index
POST /_enrich/policy/index_test_b_enrich/_execute

# 3. Create the ingest pipeline that uses the enrich policy
PUT /_ingest/pipeline/index_test_b_policy
{
  "processors" : [
    {
      "enrich" : {
        "policy_name": "index_test_b_enrich",
        "field" : "field_a",
        "target_field": "auto_add_fields",
        "max_matches": "1"
      }
    },
    {
      "script": {
        "lang": "painless",
        "source": """
          if(ctx.auto_add_fields!=null && ctx.auto_add_fields.publisher!=null){
            ctx.publisher = ctx.auto_add_fields.publisher;
          }
          if(ctx.auto_add_fields!=null && ctx.auto_add_fields.author!=null){
            ctx.author = ctx.auto_add_fields.author;
          }
        """
      }
    },
    {
      "remove": {
        "field": "auto_add_fields"
      }
    }
  ]
}

# 4. Run the reindex
POST _reindex
{
  "source": {
    "index": "index_test_a"
  },
  "dest": {
    "index": "index_test_c",
    "pipeline": "index_test_b_policy"
  }
}

# Verify
GET index_test_c/_search
```
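
Conceptually, this enrich setup is a left join: every document of index a is kept, and when a b document has the same `field_a`, its `author` and `publisher` are copied over. The Python sketch below shows that join on in-memory lists. `enrich_join` is a hypothetical helper; note that the real enrich index is a snapshot taken when the policy is executed, so later changes to index b need another `_execute`.

```python
def enrich_join(docs_a, docs_b, match_field, enrich_fields):
    """Copy enrich_fields from the matching b document into a copy of each a document."""
    lookup = {}
    for doc in docs_b:
        lookup.setdefault(doc[match_field], doc)  # keep one match, like max_matches: 1
    result = []
    for doc in docs_a:
        enriched = dict(doc)
        match = lookup.get(doc.get(match_field))
        if match is not None:
            for field in enrich_fields:
                if match.get(field) is not None:
                    enriched[field] = match[field]
        result.append(enriched)
    return result


index_test_a_docs = [
    {"field_a": "aaa", "title": "elasticsearch in action", "publish_time": "2017-07-01T00:00:00"},
    {"field_a": "zzz", "title": "no match"},  # hypothetical document without a partner in b
]
index_test_b_docs = [{"field_a": "aaa", "author": "jerry", "publisher": "Tsinghua"}]

if __name__ == "__main__":
    for doc in enrich_join(index_test_a_docs, index_test_b_docs, "field_a", ["author", "publisher"]):
        print(doc)
```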

***

# 23. Painless Scripting

> Official docs: https://www.elastic.co/guide/en/elasticsearch/reference/8.1/modules-scripting.html
> 
> Official docs: https://www.elastic.co/guide/en/elasticsearch/painless/8.1/index.html

## History

![image.png](https://s3.cn-north-1.jdcloud-oss.com/shendengbucket1/2023-06-02-17-22AmsaILvvnRp42Jv.png)

## Overview

![image.png](https://s3.cn-north-1.jdcloud-oss.com/shendengbucket1/2023-06-02-17-22AGX7RnRvryY8ZJX.png)

## Script Syntax

![image.png](https://s3.cn-north-1.jdcloud-oss.com/shendengbucket1/2023-06-02-17-23IrJjeo7IIMdTQSl.png)

## Use Cases

![image.png](https://s3.cn-north-1.jdcloud-oss.com/shendengbucket1/2023-06-02-17-23rVxxj6ZIDYtzrbQ.png)

### Custom Fields

```shell
# script_field
GET kibana_sample_data_flights/_search
{
  "_source": {
    "includes": [
      "DistanceKilometers"
    ]
  },
  "script_fields": {
    "my_double_field": {
      "script": {
        "lang": "expression",
        "source": "doc['DistanceKilometers'] * multiplier",
        "params": {
          "multiplier": 2
        }
      }
    }
  }
}
```

```shell
# runtime field
PUT kibana_sample_data_flights/_mapping
{
  "runtime":{
    "day_of_week":{
      "type":"keyword",
      "script":{
        "source": "emit(doc['timestamp'].value.getDayOfWeekEnum().toString())"
      }
    }
  }
}

GET kibana_sample_data_flights/_search
{
  "size": 0,
  "fields": [
    "timestamp",
    "day_of_week"
  ],
  "aggs": {
    "day_of_week_agg": {
      "terms": {
        "field": "day_of_week",
        "size": 10
      }
    }
  }
}
```

### Custom Scoring

```shell
POST kibana_sample_data_flights/_search
{
  "query": {
    "function_score": {
      "query": {
        "match_all": {}
      },
      "script_score": {
        "script": {
          "lang": "expression",
          "source": "_score * doc['DistanceKilometers']"
        }
      }
    }
  }
}
```

### Custom Updates and Deletes

```shell
# Update a single document (replace the document ID with one from your own index)
POST kibana_sample_data_flights/_update/sJGOwYMBklfFAitU5ZIt
{
  "script": {
    "lang": "painless",
    "source": """
      ctx._source.last = params.last;
      ctx._source.nick = params.nick;
    """,
    "params": {
      "last":"aaa",
      "nick":"bbb"
    }
  }
}

# Update every document that matches a query
POST kibana_sample_data_flights/_update_by_query
{
  "query": {
    "term": {
      "OriginWeather": {
        "value": "Sunny"
      }
    }
  },
  "script": {
    "source": """
      ctx._source.last = params.last;
      ctx._source.nick = params.nick;
    """,
    "lang": "painless",
    "params": {
      "last":"aaa",
      "nick":"bbb"
    }
  }
}
```

## Scripts by Scenario

![image.png](https://s3.cn-north-1.jdcloud-oss.com/shendengbucket1/2023-06-02-17-23F23colUxmjn8dn523.png)

## Hands-on Practice

> Reindex with the following requirements:
>
> 1. Trim leading and trailing spaces from every element of a source field that is an array.
> 2. Add a new field whose value concatenates the values of two fields from the source index.
>
> Ingest pipeline foreach: https://www.elastic.co/guide/en/elasticsearch/reference/8.1/foreach-processor.html
> 
> Ingest pipeline trim: https://www.elastic.co/guide/en/elasticsearch/reference/8.1/trim-processor.html

```shell
# 1. Create the source index
PUT test_index_source
{
  "mappings": {
    "properties": {
      "first_name":{
        "type": "keyword"
      },
      "last_name":{
        "type": "keyword"
      },
      "tags":{
        "type": "keyword"
      }
    }
  }
}
# 2. Add test data
PUT test_index_source/_doc/1
{
  "first_name": "kang",
  "last_name": "rui",
  "tags": [
    " aaa ",
    " bbb ",
    " ccc"
  ]
}

# 3. Create the ingest pipeline
PUT _ingest/pipeline/test_index_source_pipline
{
  "processors": [
    {
      "foreach": {
        "field": "tags",
        "processor": {
          "trim": {
            "field": "_ingest._value"
          }
        }
      }
    },
    {
      "script": {
        "lang": "painless",
        "source": """
          ctx['full_name'] = ctx['first_name'] + ' '+ ctx['last_name']
        """
      }
    },
    {
      "remove": {
        "field": "first_name"
      }
    },
    {
      "remove": {
        "field": "last_name"
      }
    }
  ]
}

# 4. Run the reindex
POST _reindex
{
  "source": {
    "index": "test_index_source"
  },
  "dest": {
    "index": "test_index_source_reindex",
    "pipeline": "test_index_source_pipline"
  }
}

GET test_index_source_reindex/_search
```
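
The pipeline above chains four processors: `foreach` + `trim` over `tags`, a script that builds `full_name`, and two `remove` processors. The Python function below applies the same steps to the test document so the expected output is easy to check by hand. It is an offline model, not an Elasticsearch call, and `simulate_source_pipeline` is a hypothetical name; Python's `strip()` also removes tabs and newlines, which matches `trim` for the plain spaces used here.

```python
def simulate_source_pipeline(source):
    """Python stand-in for test_index_source_pipline."""
    doc = dict(source)
    doc["tags"] = [tag.strip() for tag in doc["tags"]]             # foreach + trim
    doc["full_name"] = doc["first_name"] + " " + doc["last_name"]  # script processor
    del doc["first_name"]                                          # remove processor
    del doc["last_name"]                                           # remove processor
    return doc


if __name__ == "__main__":
    print(simulate_source_pipeline(
        {"first_name": "kang", "last_name": "rui", "tags": [" aaa ", " bbb ", " ccc"]}
    ))
```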

***

# 24. update\_by\_query

> Official docs: https://www.elastic.co/guide/en/elasticsearch/reference/8.1/docs-update-by-query.html

## Kinds of Updates

![image.png](https://s3.cn-north-1.jdcloud-oss.com/shendengbucket1/2023-06-02-17-23bOxDUxYEGOIH46wg.png)

## Update Scenarios

![image.png](https://s3.cn-north-1.jdcloud-oss.com/shendengbucket1/2023-06-02-17-23zmJeGL7WxRvCBta.png)

## Bulk Updates with a Painless Script

```shell
PUT mytest
{
  "mappings": {
    "properties": {
      "counter":{
        "type": "long"
      }
    }
  }
}

POST mytest/_bulk
{"index":{"_id":1}}
{"counter":100}
{"index":{"_id":2}}
{"counter":200}

POST mytest/_update_by_query
{
  "query": {
    "term": {
      "counter": {
        "value": 100
      }
    }
  },
  "script": {
    "source": """
      ctx._source.counter++;
    """,
    "lang": "painless"
  }
}

GET mytest/_search
```

## Bulk Updates with an Ingest Pipeline

```shell
PUT _ingest/pipeline/mytest-pipline
{
  "processors": [
    {
      "set": {
        "field": "foo",
        "value": "bar"
      }
    }
  ]
}

POST mytest/_update_by_query?pipeline=mytest-pipline
{
  "query": {
    "match_all": {}
  }
}

GET mytest/_search
```

## Hands-on Practice

> Add a new field e to the index whose value concatenates the existing fields a, b, c and d

```shell
POST mytest01/_bulk
{"index":{"_id":1}}
{"a":"a","b":"b","c":"c","d":"d"}

# Solution 1: ingest pipeline + update_by_query
PUT _ingest/pipeline/mytest01_pipline
{
  "processors": [
    {
      "script": {
        "lang": "painless",
        "source": """
           ctx.e = ctx.a +' '+ ctx.b +' ' + ctx.c + ' '+ ctx.d;
        """
      }
    }
  ]
}

POST mytest01/_update_by_query?pipeline=mytest01_pipline
{
  "query": {
    "match_all": {}
  }
}

# Solution 2: update_by_query with a script
POST mytest01/_update_by_query
{
  "query": {
    "match_all": {}
  },
  "script": {
    "source": """
      ctx._source.e = ctx._source.a +' '+ ctx._source.b +' ' + ctx._source.c + ' '+ ctx._source.d;
    """,
    "lang": "painless"
  }
}
```

***

# 25. reindex

> Official docs: https://www.elastic.co/guide/en/elasticsearch/reference/8.1/docs-reindex.html

## reindex Use Cases

![image.png](https://s3.cn-north-1.jdcloud-oss.com/shendengbucket1/2023-06-02-17-24FmhZXffmQ7Uuy24L.png)

## Filtering the Source Index with a Query

```shell
POST _reindex
{
  "source": {
    "index": "YYY",
    "query": {
      "term": {
        "name": {
          "value": "kr"
        }
      }
    }
  },
  "dest": {
    "index": "XXX"
  }
}
```

## Reindexing Only Selected Fields

```shell
POST _reindex
{
  "source": {
    "index": "YYY",
    "_source":["name","title"]
  },
  "dest": {
    "index": "XXX"
  }
}
```

## Reindexing with a Script

```shell
POST text0001/_bulk
{"index":{}}
{"foo":"bar","count":10}

POST _reindex
{
  "source": {
    "index": "text0001"
  },
  "dest": {
    "index": "text0002"
  },
  "script": {
    "lang": "painless", 
    "source": """
      if(ctx._source.count !=null && ctx._source.foo == 'bar'){
        ctx._source.count++;
        ctx._source.remove('foo');
      }
    """
  }
}

GET text0002/_search
```

## Reindexing with an Ingest Pipeline

```shell
POST index_a/_bulk
{"index":{"_id":1}}
{"title":" foo bar "}

PUT _ingest/pipeline/my-trim-pipeline
{
  "description": "describe pipeline",
  "processors": [
    {
      "trim": {
        "field": "title"
      }
    }
  ]
}

POST _reindex
{
  "source": {
    "index": "index_a"
  },
  "dest": {
    "index": "index_b",
    "pipeline": "my-trim-pipeline"
  }
}
GET index_b/_search
```

***

# 26. Cluster Health Diagnosis

## Cluster Health Status

![image.png](https://s3.cn-north-1.jdcloud-oss.com/shendengbucket1/2023-06-02-17-24wArw0Ey488uWrE9h.png)

## How to Diagnose Cluster Health

![image.png](https://s3.cn-north-1.jdcloud-oss.com/shendengbucket1/2023-06-02-17-24m9SjmCVfdD6MmEm.png)

## Resolving Unassigned Shards

![image.png](https://s3.cn-north-1.jdcloud-oss.com/shendengbucket1/2023-06-02-17-25I97jrOvKIm5CycL.png)

***

# 27. Snapshot and Restore

> Official docs: https://www.elastic.co/guide/en/elasticsearch/reference/8.1/snapshot-restore.html

## Ways to Back Up and Restore Indices

![image.png](https://s3.cn-north-1.jdcloud-oss.com/shendengbucket1/2023-06-02-17-25McOITt6ptu17M2kZ.png)

## Common Snapshot Commands

![image.png](https://s3.cn-north-1.jdcloud-oss.com/shendengbucket1/2023-06-02-17-258MAlIY5WbMavqZU.png)

## Registering a Snapshot Repository

> Official docs: https://www.elastic.co/guide/en/elasticsearch/reference/8.1/snapshots-filesystem-repository.html

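> 1. Create the backup directory first, then add the following setting to elasticsearch.yml on every master and data node and restart them: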
>
> ```shell
> path.repo: /home/es_study/elasticsearch-node-2/backup
> ```
>
> 2. Register the snapshot repository
>
> ```shell
> PUT _snapshot/my_fs_backup
> {
>   "type": "fs",
>   "settings": {
>     "location": "/home/es_study/elasticsearch-node-2/backup"
>   }
> }
> ```
>
> 3. Verify that the repository was registered
>
> ```shell
> POST /_snapshot/my_fs_backup/_verify
> ```

## Snapshot and Restore Walkthrough

> Official docs: https://www.elastic.co/guide/en/elasticsearch/reference/8.1/restore-snapshot-api.html

```shell
# Take a snapshot of my_index_001
PUT /_snapshot/my_fs_backup/snapshot_1111?wait_for_completion=true
{
  "indices": "my_index_001",
  "ignore_unavailable": true,
  "include_global_state": false,
  "metadata": {
    "taken_by": "user123",
    "taken_because": "backup before upgrading"
  }
}

# Delete the index
DELETE my_index_001

# Restore from the snapshot
POST /_snapshot/my_fs_backup/snapshot_1111/_restore?wait_for_completion=true
{
  "indices": "*", 
  "ignore_unavailable": true,
  "include_global_state": false,
  "include_aliases": false
}
```

## SLM: Snapshot Lifecycle Management

> Official docs: https://www.elastic.co/guide/en/elasticsearch/reference/8.1/snapshots-take-snapshot.html

```shell
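PUT _slm/policy/nightly-snapshots
{
  "schedule": "0 30 1 * * ?",         // when to run (cron: every day at 01:30)
  "name": "<nightly-snap-{now/d}>",   // name of each snapshot, resolved with date math
  "repository": "my_repository",      // name of a registered repository (e.g. my_fs_backup above)
  "config": {
    "indices": "*",                   // indices to back up
    "include_global_state": true
  },
  "retention": {                       // retention: when old snapshots are deleted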
    "expire_after": "30d",
    "min_count": 5,
    "max_count": 50
  }
}
```
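
The three `retention` settings interact: snapshots older than `expire_after` become eligible for deletion, `max_count` caps how many are kept even if they have not expired, and `min_count` is always preserved even if they have. The Python sketch below is a simplified model of that rule for a single policy; `snapshots_to_delete` is a hypothetical helper, and real SLM also treats failed snapshots separately and runs retention on its own schedule (`slm.retention_schedule`).

```python
from datetime import datetime, timedelta


def snapshots_to_delete(snapshots, now, expire_after=None, min_count=None, max_count=None):
    """Simplified SLM retention.

    snapshots: list of (name, start_time) tuples created by one policy.
    Returns the names that retention would delete.
    """
    oldest_first = sorted(snapshots, key=lambda snap: snap[1])
    total = len(oldest_first)
    if min_count is not None and total <= min_count:
        return []  # never drop below min_count
    deletable_by_age = total - (min_count or 0)  # only the oldest ones may expire
    to_delete = []
    for position, (name, started) in enumerate(oldest_first):
        over_max = max_count is not None and position < total - max_count
        expired = (
            expire_after is not None
            and position < deletable_by_age
            and now - started > expire_after
        )
        if over_max or expired:
            to_delete.append(name)
    return to_delete


if __name__ == "__main__":
    now = datetime(2099, 5, 6, 1, 30)
    daily = [(f"nightly-snap-{age}", now - timedelta(days=age)) for age in range(60)]
    doomed = snapshots_to_delete(daily, now, timedelta(days=30), min_count=5, max_count=50)
    print(len(doomed), "of", len(daily), "snapshots would be deleted")
```

With 60 nightly snapshots and the policy above, everything older than 30 days goes, which also brings the count under `max_count`.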

# 28. Searchable Snapshots

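> Official docs: https://www.elastic.co/guide/en/elasticsearch/reference/8.1/searchable-snapshots.html
>
> Mounting a searchable snapshot requires an Enterprise license (the 30-day trial license also includes it).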

```shell
# 1. Add this to elasticsearch.yml (a config file setting, not a console request; restart the node afterwards)
path.repo: "/home/es_study/esBackUp"

# 2. Register the snapshot repository
PUT _snapshot/my_fs_backup
{
  "type": "fs",
  "settings": {
    "location": "/home/es_study/esBackUp"
  }
}

# 3. Add test data
PUT test-search-01/_bulk
{"index":{"_id":1}}
{"title":"test01"}

# 4. Take the snapshot
PUT _snapshot/my_fs_backup/test-search-01-snapshot?wait_for_completion=true
{
  "indices": "test-search-01",
  "ignore_unavailable": true,
  "include_global_state": false,
  "metadata": {
    "taken_by": "kr",
    "taken_because": "backup before upgrading"
  }
}

# 5. Delete the index
DELETE test-search-01

# 6. Mount the snapshot as a searchable snapshot index
# https://www.elastic.co/guide/en/elasticsearch/reference/8.1/searchable-snapshots-api-mount-snapshot.html

POST _snapshot/my_fs_backup/test-search-01-snapshot/_mount?wait_for_completion=true
{
  "index": "test-search-01", 
  "renamed_index": "docs-index-023", 
  "index_settings": { 
    "index.number_of_replicas": 1
  },
  "ignore_index_settings": [ "index.refresh_interval" ] 
}

# 7. Search the mounted index under its renamed name
GET docs-index-023/_search
```