当前位置:网站首页>关于Postgres主从复制延迟监控的错误告警问题
关于Postgres主从复制延迟监控的错误告警问题
2022-04-23 06:01:00 【洒满阳光的午后】
在使用Prometheus监控的Postgres数据库时, 会发生主从复制延迟产生告警,但实际上数据库正常的问题,我们使用的exporter为https://github.com/prometheus-community/postgres_exporter,告警表达式为:
pg_replication_lag > 300
这个指标的说明如下:
# HELP pg_replication_lag Replication lag behind master in seconds
# TYPE pg_replication_lag gauge
pg_replication_lag{server=""}
但实际上,这个指标表示的是主从之间有多长时间未发生复制,从https://github.com/prometheus-community/postgres_exporter/blob/master/queries.yaml我们可以查到这个指标使用的SQL为:
SELECT
CASE
WHEN NOT pg_is_in_recovery() THEN 0
ELSE GREATEST(0, EXTRACT(EPOCH FROM (now() - pg_last_xact_replay_TIMESTAMP())))
END
AS LAG
官方文档对两个函数的说明如下:
Name | Return Type | Description |
---|---|---|
pg_is_in_recovery() | bool | True if recovery is still in progress. |
pg_last_xact_replay_TIMESTAMP() | timestamp with time zone | Get time stamp of last transaction replayed during recovery. This is the time at which the commit or abort WAL record for that transaction was generated on the primary. If no transactions have been replayed during recovery, this function returns NULL. Otherwise, if recovery is still in progress this will increase monotonically. If recovery has completed then this value will remain static at the value of the last transaction applied during that recovery. When the server has been started normally without recovery the function returns NULL. |
此SQL的结果为:主库为0,从库为当前时间与最后一次恢复事务时间之差。
所以一个特殊情况是,如主库中无事务提交,那么pg_last_xact_replay_TIMESTAMP()
的值会保持不变,相对应的pg_replication_lag
值则会不断增加,但并不代表主从复制发生故障。
因此,如果想要避免误告警,我们可以在主库创建一张测试表,每分钟更新表中的数据,保持数据库的活跃,这样如果发生告警才真正表示数据发生了严重故障。具体做法如下:
psql (11.7)
Type "help" for help.
postgres=# CREATE DATABASE test;
postgres=# \c test
postgres=# CREATE TABLE test(id INT PRIMARY KEY NOT NULL DEFAULT 1, time varchar(255));
postgres=# INSERT INTO "public"."test"("time") VALUES ('123');
配置定时任务:
# crontab -e
* * * * * time=`date`;/usr/pgsql-11/bin/psql -h localhost -p 18083 -d test -c "UPDATE public.test SET time = '${time}'"
版权声明
本文为[洒满阳光的午后]所创,转载请带上原文链接,感谢
https://zhangrongjie.blog.csdn.net/article/details/124351273
边栏推荐
- Baidu map coordinates, Google coordinates and Tencent coordinates are mutually transformed
- 【不积跬步无以至千里】Oracle应用导数Ora-01455报错处理
- Typescript (lower)
- DNA reveals surprise ancestry of mysterious Chinese mummies
- MySQL索引【数据结构+索引创建原则】
- Passerelle haute performance pour l'interconnexion entre VPC et IDC basée sur dpdk
- 使用百度智能云人脸检测接口实现照片质量检测
- 异常记录-5
- 【Lombok快速入门】
- rdam 原理解析
猜你喜欢
随机推荐
PHP 无限极分类和树形
多线程
【不积跬步无以至千里】Oracle应用导数Ora-01455报错处理
Offset et client pour obtenir des informations sur l'emplacement des éléments Dom
Binary sum of leetcode questions
【OSS文件上传快速入门】
Prometheus Cortex多租户读写的实现
mysql密码过期的方法
rdam 原理解析
【MySQL基础篇】启动选项、系统变量、状态变量
[step by step, even thousands of miles] MySQL reports a large number of unauthenticated user connection errors
Tensorflow&&Pytorch常见报错
Practice of openvswitch VLAN network
异常记录-5
Use the SED command to process text efficiently
Introduction to DDoS attack / defense
【代码解析(6)】Communication-Efficient Learning of Deep Networks from Decentralized Data
SQL学习|窗口函数
异常记录-8
异常记录-6