当前位置:网站首页>Error alarm of Postgres master-slave replication delay monitoring
Error alarm of Postgres master-slave replication delay monitoring
2022-04-23 07:05:00 【A sunny afternoon】
In the use of Prometheus Monitored Postgres Database time , An alarm will be generated due to the delay of master-slave replication , But in fact, the problem of normal database , What we use exporter by https://github.com/prometheus-community/postgres_exporter, The alarm expression is :
pg_replication_lag > 300
The description of this indicator is as follows :
# HELP pg_replication_lag Replication lag behind master in seconds
# TYPE pg_replication_lag gauge
pg_replication_lag{server=""}
But actually , This indicator indicates how long there has been no replication between master and slave , from https://github.com/prometheus-community/postgres_exporter/blob/master/queries.yaml We can find out the use of this indicator SQL by :
SELECT
CASE
WHEN NOT pg_is_in_recovery() THEN 0
ELSE GREATEST(0, EXTRACT(EPOCH FROM (now() - pg_last_xact_replay_TIMESTAMP())))
END
AS LAG
The official documents describe the two functions as follows :
| Name | Return Type | Description |
|---|---|---|
| pg_is_in_recovery() | bool | True if recovery is still in progress. |
| pg_last_xact_replay_TIMESTAMP() | timestamp with time zone | Get time stamp of last transaction replayed during recovery. This is the time at which the commit or abort WAL record for that transaction was generated on the primary. If no transactions have been replayed during recovery, this function returns NULL. Otherwise, if recovery is still in progress this will increase monotonically. If recovery has completed then this value will remain static at the value of the last transaction applied during that recovery. When the server has been started normally without recovery the function returns NULL. |
this SQL As the result of the : The main warehouse is 0, From the library is the difference between the current time and the last recovery transaction time .
So a special case is , If there is no transaction commit in the main database , that pg_last_xact_replay_TIMESTAMP() The value of remains unchanged , Corresponding pg_replication_lag The value will continue to increase , However, it does not mean that the master-slave replication fails .
therefore , If you want to avoid false alarms , We can create a test table in the main library , Update the data in the table every minute , Keep the database active , In this way, if an alarm occurs, it really indicates that there is a serious fault in the data . The specific methods are as follows :
psql (11.7)
Type "help" for help.
postgres=# CREATE DATABASE test;
postgres=# \c test
postgres=# CREATE TABLE test(id INT PRIMARY KEY NOT NULL DEFAULT 1, time varchar(255));
postgres=# INSERT INTO "public"."test"("time") VALUES ('123');
Configure scheduled tasks :
# crontab -e
* * * * * time=`date`;/usr/pgsql-11/bin/psql -h localhost -p 18083 -d test -c "UPDATE public.test SET time = '${time}'"
版权声明
本文为[A sunny afternoon]所创,转载请带上原文链接,感谢
https://yzsam.com/2022/04/202204230600558247.html
边栏推荐
- 异常记录-19
- Construire un blog Cloud basé sur ECS (bénédiction sur le Code Cloud Xiaobao, explication détaillée de la tâche iphone13 gratuite)
- openvswitch 编译安装
- LeetCode刷题|368最大整除子集(动态规划)
- DNA reveals surprise ancestry of mysterious Chinese mummies
- Introduction to common APIs for EBFP programming
- Arranges the objects specified in the array in front of the array
- Winter combat camp hands-on combat - MySQL database rapid deployment practice lead mouse cloud Xiaobao
- virtio 与vhost_net介绍
- 异常记录-22
猜你喜欢

Winter combat camp hands-on combat - first understand the cloud foundation, hands-on practice ECS ECS ECS novice on the road to get the mouse cloud Xiaobao backpack shadowless

Prometheus Cortex使用Block存储时的相关问题

Winter combat camp hands-on combat - cloud essential environment preparation, hands-on practical operation, quickly build lamp environment, lead mouse cloud Xiaobao backpack without shadow

rdma 编程详解

冬季实战营 动手实战-初识上云基础,动手实操ECS云服务器新手上路 领鼠标 云小宝 背包 无影

阿里矢量库的图标使用教程(在线,下载)

Memcached source code analysis

通过源码探究@ModelAndView如何实现数据与页面的转发

阿里云日志服务sls的典型应用场景

Redis 详解(基础+数据类型+事务+持久化+发布订阅+主从复制+哨兵+缓存穿透、击穿、雪崩)
随机推荐
【Lombok快速入门】
qs. In the stringify interface, the input parameter is converted into a & connected string (with the application / x-www-form-urlencoded request header)
将博客搬至CSDN
js 格式化当前时间 日期推算
Practice of openvswitch VLAN network
Use the SED command to process text efficiently
SSM项目在阿里云部署
Thanos compact component test summary (processing historical data)
RAC环境集群组件gipc无法正确识别心跳网络状态问题分析
RAC环境数据库节点参数设置不当导致监听无法连接问题排查
Virtio and Vhost_ Net introduction
【Shell脚本练习】将新加的磁盘批量添加到指定的VG中
OVS and OVS + dpdk architecture analysis
Prometheus的relabel_configs和metric_relabel_configs解释及用法示例
JS format current time and date calculation
异常记录-15
基于ECS搭建云上博客(体验有礼)
[ES6 quick start]
try catch 不能捕获异步错误
一个DG环境的ORA-16047: DGID mismatch between destination setting and target database问题排查及监听VNCR特性