当前位置:网站首页>Error alarm of Postgres master-slave replication delay monitoring
Error alarm of Postgres master-slave replication delay monitoring
2022-04-23 07:05:00 【A sunny afternoon】
In the use of Prometheus Monitored Postgres Database time , An alarm will be generated due to the delay of master-slave replication , But in fact, the problem of normal database , What we use exporter by https://github.com/prometheus-community/postgres_exporter, The alarm expression is :
pg_replication_lag > 300
The description of this indicator is as follows :
# HELP pg_replication_lag Replication lag behind master in seconds
# TYPE pg_replication_lag gauge
pg_replication_lag{server=""}
But actually , This indicator indicates how long there has been no replication between master and slave , from https://github.com/prometheus-community/postgres_exporter/blob/master/queries.yaml We can find out the use of this indicator SQL by :
SELECT
CASE
WHEN NOT pg_is_in_recovery() THEN 0
ELSE GREATEST(0, EXTRACT(EPOCH FROM (now() - pg_last_xact_replay_TIMESTAMP())))
END
AS LAG
The official documents describe the two functions as follows :
Name | Return Type | Description |
---|---|---|
pg_is_in_recovery() | bool | True if recovery is still in progress. |
pg_last_xact_replay_TIMESTAMP() | timestamp with time zone | Get time stamp of last transaction replayed during recovery. This is the time at which the commit or abort WAL record for that transaction was generated on the primary. If no transactions have been replayed during recovery, this function returns NULL. Otherwise, if recovery is still in progress this will increase monotonically. If recovery has completed then this value will remain static at the value of the last transaction applied during that recovery. When the server has been started normally without recovery the function returns NULL. |
this SQL As the result of the : The main warehouse is 0, From the library is the difference between the current time and the last recovery transaction time .
So a special case is , If there is no transaction commit in the main database , that pg_last_xact_replay_TIMESTAMP()
The value of remains unchanged , Corresponding pg_replication_lag
The value will continue to increase , However, it does not mean that the master-slave replication fails .
therefore , If you want to avoid false alarms , We can create a test table in the main library , Update the data in the table every minute , Keep the database active , In this way, if an alarm occurs, it really indicates that there is a serious fault in the data . The specific methods are as follows :
psql (11.7)
Type "help" for help.
postgres=# CREATE DATABASE test;
postgres=# \c test
postgres=# CREATE TABLE test(id INT PRIMARY KEY NOT NULL DEFAULT 1, time varchar(255));
postgres=# INSERT INTO "public"."test"("time") VALUES ('123');
Configure scheduled tasks :
# crontab -e
* * * * * time=`date`;/usr/pgsql-11/bin/psql -h localhost -p 18083 -d test -c "UPDATE public.test SET time = '${time}'"
版权声明
本文为[A sunny afternoon]所创,转载请带上原文链接,感谢
https://yzsam.com/2022/04/202204230600558247.html
边栏推荐
- Ansible basic commands, roles, built-in variables and tests judgment
- [step by step, even thousands of miles] MySQL reports a large number of unauthenticated user connection errors
- Prometheus Thanos与Cortex组件比较
- How to use DBA_ hist_ active_ sess_ History analysis database history performance problems
- rdma 编程详解
- rdma 介绍
- 冬季实战营动手实战-上云必备环境准备,动手实操快速搭建LAMP环境 领鼠标 云小宝 背包 无影
- 异常记录-17
- Redis FAQ
- 异常记录-14
猜你喜欢
MySQL 【读写锁+表锁+行锁+MVCC】
Thanos Compact组件测试总结(处理历史数据)
Kids and COVID: why young immune systems are still on top
High performance gateway for interconnection between VPC and IDC based on dpdk
rdam 原理解析
Prometheus的relabel_configs和metric_relabel_configs解释及用法示例
Oracle redo log产生量大的查找思路与案例
Thanos Compactor组件使用
使用prom-label-proxy实现Prometheus Thanos的基于标签的多租户读
[OSS file upload quick start]
随机推荐
Using Prom label proxy to implement label based multi tenant reading of Prometheus thanos
OVS and OVS + dpdk architecture analysis
异常记录-8
【不积跬步无以至千里】Oracle应用导数Ora-01455报错处理
异常记录-12
【Shell脚本练习】将新加的磁盘批量添加到指定的VG中
【不积跬步无以至千里】MySQL报大量unauthenticated user连接错误
Introduction to the top 12 domestic databases in 2021
关于 synchronized、ThreadLocal、线程池、Atomic 原子类的 JUC 面试题
MySQL【ACID+隔离级别+ redo log + undo log】
异常记录-18
Kids and COVID: why young immune systems are still on top
How does VirtualBox modify the IP network segment assigned to the virtual machine in the "network address translation (NAT)" network mode
使用sed命令来高效处理文本
Winter combat camp hands-on combat - cloud essential environment preparation, hands-on practical operation, quickly build lamp environment, lead mouse cloud Xiaobao backpack without shadow
[no steps in a small step to a thousand miles] Oracle Application derivative ora-01455 error reporting processing
try catch 不能捕获异步错误
LeetCode刷题|368最大整除子集(动态规划)
将博客搬至CSDN
异常记录-20