当前位置:网站首页>关于Postgres主从复制延迟监控的错误告警问题
关于Postgres主从复制延迟监控的错误告警问题
2022-04-23 06:01:00 【洒满阳光的午后】
在使用Prometheus监控的Postgres数据库时, 会发生主从复制延迟产生告警,但实际上数据库正常的问题,我们使用的exporter为https://github.com/prometheus-community/postgres_exporter,告警表达式为:
pg_replication_lag > 300
这个指标的说明如下:
# HELP pg_replication_lag Replication lag behind master in seconds
# TYPE pg_replication_lag gauge
pg_replication_lag{server=""}
但实际上,这个指标表示的是主从之间有多长时间未发生复制,从https://github.com/prometheus-community/postgres_exporter/blob/master/queries.yaml我们可以查到这个指标使用的SQL为:
SELECT
CASE
WHEN NOT pg_is_in_recovery() THEN 0
ELSE GREATEST(0, EXTRACT(EPOCH FROM (now() - pg_last_xact_replay_TIMESTAMP())))
END
AS LAG
官方文档对两个函数的说明如下:
Name | Return Type | Description |
---|---|---|
pg_is_in_recovery() | bool | True if recovery is still in progress. |
pg_last_xact_replay_TIMESTAMP() | timestamp with time zone | Get time stamp of last transaction replayed during recovery. This is the time at which the commit or abort WAL record for that transaction was generated on the primary. If no transactions have been replayed during recovery, this function returns NULL. Otherwise, if recovery is still in progress this will increase monotonically. If recovery has completed then this value will remain static at the value of the last transaction applied during that recovery. When the server has been started normally without recovery the function returns NULL. |
此SQL的结果为:主库为0,从库为当前时间与最后一次恢复事务时间之差。
所以一个特殊情况是,如主库中无事务提交,那么pg_last_xact_replay_TIMESTAMP()
的值会保持不变,相对应的pg_replication_lag
值则会不断增加,但并不代表主从复制发生故障。
因此,如果想要避免误告警,我们可以在主库创建一张测试表,每分钟更新表中的数据,保持数据库的活跃,这样如果发生告警才真正表示数据发生了严重故障。具体做法如下:
psql (11.7)
Type "help" for help.
postgres=# CREATE DATABASE test;
postgres=# \c test
postgres=# CREATE TABLE test(id INT PRIMARY KEY NOT NULL DEFAULT 1, time varchar(255));
postgres=# INSERT INTO "public"."test"("time") VALUES ('123');
配置定时任务:
# crontab -e
* * * * * time=`date`;/usr/pgsql-11/bin/psql -h localhost -p 18083 -d test -c "UPDATE public.test SET time = '${time}'"
版权声明
本文为[洒满阳光的午后]所创,转载请带上原文链接,感谢
https://zhangrongjie.blog.csdn.net/article/details/124351273
边栏推荐
- 【代码解析(7)】Communication-Efficient Learning of Deep Networks from Decentralized Data
- ES入门学习笔记
- Use the SED command to process text efficiently
- CentOS8搭建PHP8.0.3运行环境
- 模仿扇贝短文阅读页面
- High performance gateway for interconnection between VPC and IDC based on dpdk
- Typescript (top)
- surprise库中evaluate函数弃用解决方法
- Batch modify / batch update the value of a field in the database
- Installing redis using a small leather panel in the window environment
猜你喜欢
随机推荐
用Future与CountDownLatch实现多线程执行多个异步任务,任务全部完成后返回结果
Installing redis using a small leather panel in the window environment
offset和client獲取dom元素比特置信息
MySQL【sql性能分析+sql调优】
【代码解析(4)】Communication-Efficient Learning of Deep Networks from Decentralized Data
数据库基本概念:OLTP/OLAP/HTAP、RPO/RTO、MPP
Ansible basic commands, roles, built-in variables and tests judgment
Curry realization of function continuous call calculation and accumulation
Include of ansible module_ Tasks: why is the imported task not executed after adding tags?
SQL学习|基础查询与排列
关于 synchronized、ThreadLocal、线程池、Atomic 原子类的 JUC 面试题
rdma 编程详解
bcc安装和基本工具使用说明
The arithmetic square root of X in leetcode
时间格式不对,运行sql文件报错
JS handwriting compatibility event binding
[MySQL basics] startup options, system variables and status variables
TP6 的 each 遍历用法
Redis 详解(基础+数据类型+事务+持久化+发布订阅+主从复制+哨兵+缓存穿透、击穿、雪崩)
JQ序列化后PHP后台解析