ETCD Single-Node Fault Emergency Recovery
2022-08-11 07:04:00 【!Nine thought & & gentleman!】
Series articles
Building an ETCD cluster in containers
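Preface
In production, an etcd cluster often suffers either a single-node failure or a whole-cluster failure, and each case needs its own repair procedure. This article is an emergency recovery manual for the single-node failure case.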
1. Overall recovery process
Because etcd uses the Raft protocol, a cluster of n nodes can tolerate at most (n-1)/2 failed nodes, rounded down. When a single node of a three-node cluster fails, the cluster therefore remains available and business reads and writes are not affected.
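The tolerance rule can be checked with a little integer arithmetic. The sketch below is only an illustration; the function names `quorum` and `fault_tolerance` are made up for this example and are not part of etcd.

```shell
#!/bin/sh
# quorum SIZE: number of members that must agree (a strict majority).
quorum() {
    echo $(( $1 / 2 + 1 ))
}

# fault_tolerance SIZE: how many members may fail while a majority remains.
# Shell integer division rounds down, which matches (n-1)/2 in the text.
fault_tolerance() {
    echo $(( ($1 - 1) / 2 ))
}

for n in 1 2 3 5; do
    echo "members=$n quorum=$(quorum "$n") tolerated_failures=$(fault_tolerance "$n")"
done
```

Note that a 2-member cluster has a quorum of 2 and tolerates no failures, so the window between removing the broken member and re-adding it should be kept short.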
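The overall recovery flow is: remove the abnormal member from the cluster, delete its local data, add the node back to the cluster, start the node, and wait for data synchronization to finish.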
2. Detailed recovery instructions
2.1 Environment information
Three virtual machines were created locally with vmstation, as follows
Node name | Node IP | Node spec | Operating system | Etcd version | Docker version |
---|---|---|---|---|---|
etcd1 | 192.168.92.128 | 1c1g 20g | CentOS7.4 | v3.5 | 13.1 |
etcd2 | 192.168.92.129 | 1c1g 20g | CentOS7.4 | v3.5 | 13.1 |
etcd3 | 192.168.92.130 | 1c1g 20g | CentOS7.4 | v3.5 | 13.1 |
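Assume that node etcd2 is abnormal and that its local data has been corrupted.
2.2 Remove the abnormal node from the cluster
Remove the abnormal node with the member remove command. The cluster then has only 2 nodes; provided the removed node was not the leader, no leader re-election is triggered and the cluster keeps running normally.
Check the current cluster state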
export ETCDCTL_API=3
export ETCD_ENDPOINTS=192.168.92.128:2379,192.168.92.129:2379,192.168.92.130:2379
etcdctl --endpoints=$ETCD_ENDPOINTS --write-out=table member list
etcdctl --endpoints=$ETCD_ENDPOINTS --write-out=table endpoint status
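The `member remove` command takes a member ID rather than a name, so the ID has to be looked up from `member list` first. The following is a minimal sketch, assuming the default "simple" output format of `etcdctl member list` (comma-separated: ID, status, name, peer URLs, client URLs, learner flag). The helper name `member_id_by_name` and the sample IDs are invented for illustration.

```shell
#!/bin/sh
# member_id_by_name NAME: read `etcdctl member list` (simple output) on stdin
# and print the ID of the member whose name equals NAME.
# Assumed line format: ID, status, name, peerURLs, clientURLs, isLearner
member_id_by_name() {
    awk -F', ' -v name="$1" '$3 == name { print $1 }'
}

# Illustrative sample only; these IDs are made up, not taken from a real cluster.
sample='1a2b3c4d5e6f7081, started, etcd1, http://192.168.92.128:2380, http://192.168.92.128:2379, false
9f8e7d6c5b4a3021, started, etcd2, http://192.168.92.129:2380, http://192.168.92.129:2379, false'

printf '%s\n' "$sample" | member_id_by_name etcd2

# Against a live cluster (not executed here):
#   id=$(etcdctl --endpoints=$ETCD_ENDPOINTS member list | member_id_by_name etcd2)
#   etcdctl --endpoints=$ETCD_ENDPOINTS member remove "$id"
```

Looking the ID up by name avoids copying a 16-character hex string by hand during an incident.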
2.2 Delete the abnormal node's data
2.2.1 Stop the abnormal member
docker stop etcd2
2.2.2 Delete the data
Because the data directory is mounted with -v /data/etcd:/data/etcd, deleting the contents of that host directory clears the etcd data.
rm -rf /data/etcd/*
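A guarded variant of the cleanup can reduce the risk of deleting the wrong path. This is a sketch under assumptions: `clean_data_dir` is a hypothetical helper, and `find -mindepth ... -delete` relies on GNU or BSD find rather than strict POSIX.

```shell
#!/bin/sh
# clean_data_dir DIR: delete everything inside DIR but keep DIR itself.
# Refuses an empty argument, "/", or a non-directory, so an unset variable
# can never turn the cleanup into a wipe of the root filesystem.
clean_data_dir() {
    dir=$1
    if [ -z "$dir" ] || [ "$dir" = "/" ] || [ ! -d "$dir" ]; then
        echo "refusing to clean '$dir'" >&2
        return 1
    fi
    # -mindepth 1 skips DIR itself; hidden entries are removed as well,
    # which a plain `rm -rf DIR/*` would leave behind.
    find "$dir" -mindepth 1 -delete
}

# Demonstration on a throwaway directory, not on /data/etcd.
demo=$(mktemp -d)
mkdir -p "$demo/member/snap" "$demo/member/wal"
touch "$demo/member/snap/db" "$demo/.hidden"
clean_data_dir "$demo" && ls -A "$demo" | wc -l
rmdir "$demo"

# On the failed node (not executed here):
#   docker stop etcd2
#   docker rm etcd2        # frees the container name for a later `docker run --name etcd2`
#   clean_data_dir /data/etcd
```

The `docker rm` step is included because a later `docker run` with the same `--name` fails while the stopped container still exists.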
2.3 Re-add the node to the cluster
Use the following commands to add the abnormal node back to the cluster. Once the node has started, cluster data synchronization and leader election complete automatically.
export ETCDCTL_API=3
export ETCD_ENDPOINTS=192.168.92.128:2379,192.168.92.129:2379,192.168.92.130:2379
etcdctl --endpoints=$ETCD_ENDPOINTS member add etcd2 --peer-urls=http://192.168.92.129:2380
2.4 Start the node
2.4.1 The complete startup script
cat start_etcd.sh
#!/bin/sh
name="etcd2"
host="192.168.92.129"
cluster="etcd1=http://192.168.92.128:2380,etcd2=http://192.168.92.129:2380,etcd3=http://192.168.92.130:2380"
docker run -d --privileged=true -p 2379:2379 -p 2380:2380 -v /data/etcd:/data/etcd --name $name --net=host quay.io/coreos/etcd:v3.5.0 /usr/local/bin/etcd --name $name --data-dir /data/etcd --listen-client-urls http://$host:2379 --advertise-client-urls http://$host:2379 --listen-peer-urls http://$host:2380 --initial-advertise-peer-urls http://$host:2380 --initial-cluster $cluster --initial-cluster-token tkn --initial-cluster-state existing --log-level info --logger zap --log-outputs stderr
Note: because the etcd data has been deleted, the node fetches its data from the other members when it restarts. The --initial-cluster-state parameter must therefore be changed from new to existing
--initial-cluster-state existing
2.4.2 Check the logs
docker logs 8bf31834f8ce
2.5 Wait for cluster data synchronization to finish and the cluster to recover
Check the member information of the current cluster
export ETCDCTL_API=3
export ETCD_ENDPOINTS=192.168.92.128:2379,192.168.92.129:2379,192.168.92.130:2379
etcdctl --endpoints=$ETCD_ENDPOINTS --write-out=table member list
etcdctl --endpoints=$ETCD_ENDPOINTS --write-out=table endpoint status
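Instead of re-running the status commands by hand, the wait can be scripted as a bounded retry loop. The sketch below is generic; `wait_until` is a hypothetical helper, and the retry count and delay in the usage comment are arbitrary values, not recommendations from the original procedure.

```shell
#!/bin/sh
# wait_until TRIES DELAY CMD...: run CMD until it exits 0, at most TRIES times,
# sleeping DELAY seconds between attempts. Returns 1 if CMD never succeeds.
wait_until() {
    tries=$1
    delay=$2
    shift 2
    i=1
    while [ "$i" -le "$tries" ]; do
        if "$@" >/dev/null 2>&1; then
            echo "ok after $i attempt(s)"
            return 0
        fi
        i=$((i + 1))
        # Sleep only when another attempt will follow.
        if [ "$i" -le "$tries" ]; then
            sleep "$delay"
        fi
    done
    echo "gave up after $tries attempt(s)" >&2
    return 1
}

# Against a live cluster (not executed here); `endpoint health` exits non-zero
# while any listed endpoint is still unhealthy:
#   wait_until 30 2 etcdctl --endpoints=$ETCD_ENDPOINTS endpoint health
```

Once the loop succeeds, the `member list` and `endpoint status` tables can be used to confirm that etcd2 is listed as started and that its raft index is catching up with the other members.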
Summary
Because the cluster keeps multiple replicas of the data, a single abnormal node does not make the whole cluster abnormal. Recovery only requires starting the affected node normally and letting it synchronize its data.