Analysis of the RAC cluster component gipc failing to correctly detect the heartbeat network state
2022-04-23 06:02:00 【还不算晕】
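Recently, one node of a clustered database in a user's environment could not start and join the cluster. The cluster is version 11.2. The cluster logs made the problem fairly clear: the cluster alert log pointed to the CSSD process log, and the CSSD log reported that there was no network heartbeat: has a disk HB, but no network HB. The troubleshooting steps were as follows: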
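1. First, confirmed the database heartbeat network IPs from the hosts file, and confirmed at the operating system level that the heartbeat NIC was in a normal state and that the nodes could ping each other and connect over SSH.
2. Used gpnptool get to confirm that the heartbeat network used by the cluster was the one checked in the previous step.
3. In the 11.2 cluster stack, the GIPC process is responsible for monitoring the cluster network state. The GIPC process log showed the heartbeat network as eth1 - rank 0, which indicates an abnormal state (the normal value is eth1 - rank 99).
4. Step 1 had already shown the heartbeat network to be healthy at the host level, so, given how the cluster components behave, the next step was to make the cluster re-detect the heartbeat network state (usually by killing the GIPC process or restarting the cluster software).
5. This time neither killing the GIPC process nor restarting the cluster software helped. After restarting the NIC at the operating system level, the GIPC process recognized the NIC state correctly and the cluster started normally.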
The relevant logs are as follows:
1. Heartbeat network information in GPnP at the time of the failure:
[grid@nphisdb1 gpnpd]$gpnptool get
Warning: some command line parameters were defaulted. Resulting command line:
/u01/app/11.2.0/grid_1/bin/gpnptool.bin get -o-
<?xml version="1.0" encoding="UTF-8"?><gpnp:GPnP-Profile Version="1.0" xmlns="http://www.grid-pnp.org/2005/11/gpnp-profile" xmlns:gpnp="http://www.grid-pnp.org/2005/11/gpnp-profile" xmlns:orcl="http://www.oracle.com/gpnp/2005/11/gpnp-profile" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://www.grid-pnp.org/2005/11/gpnp-profile gpnp-profile.xsd" ProfileSequence="4" ClusterUId="a3268b3b769cdf7dbfc43c8ffd69e87f" ClusterName="nphisdb-cluster" PALocation=""><gpnp:Network-Profile><gpnp:HostNetwork id="gen" HostName="*"><gpnp:Network id="net1" IP="192.168.205.0" Adapter="eth0" Use="public"/><gpnp:Network id="net2" IP="10.10.10.0" Adapter="eth1" Use="cluster_interconnect"/></gpnp:HostNetwork></gpnp:Network-Profile><orcl:CSS-Profile id="css" DiscoveryString="+asm" LeaseDuration="400"/><orcl:ASM-Profile id="asm" DiscoveryString="/dev/oracleasm/disks" SPFile="+CRS/nphisdb-cluster/asmparameterfile/registry.253.1028034033"/><ds:Signature xmlns:ds="http://www.w3.org/2000/09/xmldsig#"><ds:SignedInfo><ds:CanonicalizationMethod Algorithm="http://www.w3.org/2001/10/xml-exc-c14n#"/><ds:SignatureMethod Algorithm="http://www.w3.org/2000/09/xmldsig#rsa-sha1"/><ds:Reference URI=""><ds:Transforms><ds:Transform Algorithm="http://www.w3.org/2000/09/xmldsig#enveloped-signature"/><ds:Transform Algorithm="http://www.w3.org/2001/10/xml-exc-c14n#"> <InclusiveNamespaces xmlns="http://www.w3.org/2001/10/xml-exc-c14n#" PrefixList="gpnp orcl xsi"/></ds:Transform></ds:Transforms><ds:DigestMethod Algorithm="http://www.w3.org/2000/09/xmldsig#sha1"/><ds:DigestValue>bjVFpM9uJREXWTWBP6GSC1A11Zw=</ds:DigestValue></ds:Reference></ds:SignedInfo><ds:SignatureValue>UN5iBJd7mbmW8usjptRlTXtIBf05z76r+MyCNOSlXAGcsTE/zbb2BFeZkH0LMpyF5jbpQUzHE+U3wjUzZl/VsQS+y9QPeANVz1q1E9XDpfsxJwhRyhv0MNtK4/yy9xr9Y/zgTdg6dO2utm2Hy9pyCoDIrQ75gsmnZCtmPrfwR0A=</ds:SignatureValue></ds:Signature></gpnp:GPnP-Profile>
Success.
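Picking the interconnect adapter out of that single-line profile by eye is error-prone, so here is a minimal Python sketch that does it. It assumes the profile arrives as one line of XML between the warning and the "Success." line, as in the output above. The function names and the trimmed SAMPLE_PROFILE are hypothetical helpers for illustration and are not part of gpnptool.

```python
import xml.etree.ElementTree as ET

# Namespace URI bound to the gpnp: prefix in the profile shown above.
GPNP_URI = "http://www.grid-pnp.org/2005/11/gpnp-profile"

# Trimmed copy of the profile (signature and CSS/ASM sections omitted).
SAMPLE_PROFILE = (
    '<gpnp:GPnP-Profile xmlns:gpnp="http://www.grid-pnp.org/2005/11/gpnp-profile">'
    '<gpnp:Network-Profile><gpnp:HostNetwork id="gen" HostName="*">'
    '<gpnp:Network id="net1" IP="192.168.205.0" Adapter="eth0" Use="public"/>'
    '<gpnp:Network id="net2" IP="10.10.10.0" Adapter="eth1" Use="cluster_interconnect"/>'
    '</gpnp:HostNetwork></gpnp:Network-Profile></gpnp:GPnP-Profile>'
)


def extract_profile_xml(tool_output):
    """Return the first line that looks like XML (assumes a one-line profile)."""
    for line in tool_output.splitlines():
        if line.lstrip().startswith("<"):
            return line.strip()
    raise ValueError("no XML profile found in output")


def interconnect_networks(profile_xml):
    """List the networks whose Use attribute is cluster_interconnect."""
    # Parse from bytes so an XML declaration with an encoding is accepted.
    root = ET.fromstring(profile_xml.encode("utf-8"))
    found = []
    for net in root.iter(f"{{{GPNP_URI}}}Network"):
        if net.get("Use") == "cluster_interconnect":
            found.append({
                "id": net.get("id"),
                "adapter": net.get("Adapter"),
                "subnet": net.get("IP"),
            })
    return found


if __name__ == "__main__":
    print(interconnect_networks(SAMPLE_PROFILE))
```

For the profile above this should report eth1 on subnet 10.10.10.0, which is the network that was checked at the operating system level.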
2. Check the rank value of the network in the GIPC process log
2022-03-20 13:30:58.580: [ CLSINET][346261248] Returning NETDATA: 1 interfaces
2022-03-20 13:30:58.580: [ CLSINET][346261248] # 0 Interface 'eth1',ip='10.10.10.1',mac='40-f2-e9-64-24-5e',mask='255.255.255.0',net='10.10.10.0',use='cluster_interconnect'
2022-03-20 13:31:00.903: [GIPCDMON][346261248] gipcdMonitorSaveInfMetrics: inf[ 0] eth1 - rank 0, avgms 30000000000.000000 [ 32 / 0 / 0 ]
2022-03-20 13:31:01.430: [GIPCDCLT][350463744] gipcdClientThread: req from local client of type gipcdmsgtypeInterfaceMetrics, endp 000000000000046d
2022-03-20 13:31:02.431: [GIPCDCLT][350463744] gipcdClientThread: req from local client of type gipcdmsgtypeInterfaceMetrics, endp 0000000000000199
2022-03-20 13:31:03.432: [GIPCDCLT][350463744] gipcdClientThread: req from local client of type gipcdmsgtypeInterfaceMetrics, endp 000000000000032e
2022-03-20 13:31:03.584: [ CLSINET][346261248] Returning NETDATA: 1 interfaces
2022-03-20 13:31:03.584: [ CLSINET][346261248] # 0 Interface 'eth1',ip='10.10.10.1',mac='40-f2-e9-64-24-5e',mask='255.255.255.0',net='10.10.10.0',use='cluster_interconnect'
2022-03-20 13:31:06.433: [GIPCDCLT][350463744] gipcdClientThread: req from local client of type gipcdmsgtypeInterfaceMetrics, endp 000000000000046d
2022-03-20 13:31:07.434: [GIPCDCLT][350463744] gipcdClientThread: req from local client of type gipcdmsgtypeInterfaceMetrics, endp 0000000000000199
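The CLSINET lines above show which interface gipcd is monitoring. As a rough cross-check, the following Python sketch parses one such line and compares the interface name and computed subnet with the adapter and subnet from the GPnP profile. The regular expression is inferred only from the two CLSINET lines in this log, so it is an assumption about the format, and the function names are hypothetical.

```python
import ipaddress
import re

# Pattern inferred from the CLSINET lines in this log (an assumption,
# not a documented format).
CLSINET_RE = re.compile(
    r"Interface '(?P<name>[^']+)',ip='(?P<ip>[^']+)',mac='(?P<mac>[^']+)',"
    r"mask='(?P<mask>[^']+)',net='(?P<net>[^']+)',use='(?P<use>[^']+)'"
)


def parse_clsinet(line):
    """Return the interface fields as a dict, or None if the line does not match."""
    match = CLSINET_RE.search(line)
    return match.groupdict() if match else None


def matches_profile(iface, adapter, subnet):
    """True if the interface name and its computed subnet equal the profile values."""
    # strict=False lets us pass a host address together with its netmask.
    network = ipaddress.ip_network(f"{iface['ip']}/{iface['mask']}", strict=False)
    return iface["name"] == adapter and str(network.network_address) == subnet


if __name__ == "__main__":
    sample = (
        "2022-03-20 13:30:58.580: [ CLSINET][346261248] # 0 Interface 'eth1',"
        "ip='10.10.10.1',mac='40-f2-e9-64-24-5e',mask='255.255.255.0',"
        "net='10.10.10.0',use='cluster_interconnect'"
    )
    iface = parse_clsinet(sample)
    print(iface["name"], matches_profile(iface, "eth1", "10.10.10.0"))
```

In this incident the check passes: gipcd was watching the right interface on the right subnet, so the rank 0 value was not caused by a mismatch between the profile and the NIC configuration.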
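3. After restarting the cluster software failed to resolve the problem, restart the NIC
4. Check the GIPC process log; the rank has returned to the normal value of 99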
[grid@nphisdb1 gipcd]$tail -f gipcd.log |grep rank
2022-03-20 13:38:30.626: [GIPCDMON][346261248] gipcdMonitorSaveInfMetrics: inf[ 0] eth1 - rank 99, avgms 1.143791 [ 300 / 306 / 306 ]
2022-03-20 13:39:00.634: [GIPCDMON][346261248] gipcdMonitorSaveInfMetrics: inf[ 0] eth1 - rank 99, avgms 0.628019 [ 204 / 207 / 207 ]
2022-03-20 13:39:30.642: [GIPCDMON][346261248] gipcdMonitorSaveInfMetrics: inf[ 0] eth1 - rank 99, avgms 1.564626 [ 153 / 147 / 147 ]
2022-03-20 13:40:00.642: [GIPCDMON][346261248] gipcdMonitorSaveInfMetrics: inf[ 0] eth1 - rank 99, avgms 1.052632 [ 119 / 114 / 114 ]
2022-03-20 13:40:30.644: [GIPCDMON][346261248] gipcdMonitorSaveInfMetrics: inf[ 0] eth1 - rank 99, avgms 1.016949 [ 121 / 118 / 118 ]
2022-03-20 13:41:00.655: [GIPCDMON][346261248] gipcdMonitorSaveInfMetrics: inf[ 0] eth1 - rank 99, avgms 1.636364 [ 115 / 110 / 110 ]
2022-03-20 13:41:30.658: [GIPCDMON][346261248] gipcdMonitorSaveInfMetrics: inf[ 0] eth1 - rank 99, avgms 1.071429 [ 117 / 112 / 112 ]
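The rank lines from before and after the NIC restart can also be scanned mechanically rather than with grep alone. The Python sketch below extracts the timestamp, interface, rank and avgms from gipcdMonitorSaveInfMetrics lines and flags rank 0 as abnormal, following the observation in this article that 99 is the healthy value. The pattern is inferred from the log lines shown here, and the function names are hypothetical.

```python
import re

# Pattern inferred from the gipcdMonitorSaveInfMetrics lines in this article.
RANK_RE = re.compile(
    r"(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d{3}): \[\s*GIPCDMON\]\[\d+\]\s*"
    r"gipcdMonitorSaveInfMetrics:\s+inf\[\s*\d+\]\s+(?P<iface>\S+) - rank\s+(?P<rank>\d+),"
    r"\s+avgms\s+(?P<avgms>[\d.]+)"
)


def rank_events(lines):
    """Return (timestamp, interface, rank, avgms) for every rank line found."""
    events = []
    for line in lines:
        match = RANK_RE.search(line)
        if match:
            events.append((
                match.group("ts"),
                match.group("iface"),
                int(match.group("rank")),
                float(match.group("avgms")),
            ))
    return events


def abnormal_events(events):
    """Keep only the events with rank 0, the abnormal value seen in this incident."""
    return [event for event in events if event[2] == 0]


if __name__ == "__main__":
    sample = [
        "2022-03-20 13:31:00.903: [GIPCDMON][346261248] gipcdMonitorSaveInfMetrics: "
        "inf[ 0] eth1 - rank 0, avgms 30000000000.000000 [ 32 / 0 / 0 ]",
        "2022-03-20 13:38:30.626: [GIPCDMON][346261248] gipcdMonitorSaveInfMetrics: "
        "inf[ 0] eth1 - rank 99, avgms 1.143791 [ 300 / 306 / 306 ]",
    ]
    for event in rank_events(sample):
        print(event)
```

The bracketed counters at the end of each line are deliberately left unparsed, since the article does not say what they mean.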
Copyright notice
This article was written by [还不算晕]. Please include a link to the original when reposting. Thank you.
https://blog.csdn.net/q947817003/article/details/124046037