Stability building best practices
2022-04-23 06:08:00 【New ape and horse】
Contents
One Reasonably set the network timeout
Two Separation of core and non-core business
Three Configure the number of Tomcat threads reasonably
Four Try not to retry in the code
Five Weaken unnecessary dependencies
Six Streamline database transactions
Seven Key points of SQL performance optimization
Eight Keep the release process as smooth as possible
Nine Use rate limiting, circuit breaking, degradation, and queuing to handle abnormal traffic
Ten Improve logging and monitoring
This article summarizes some experience in building highly available, high-performance systems.
One Reasonably set the network timeout
1.1 What is a network call timeout?
Network requests between application servers, between an application server and a redis server, or between an application server and an mq server typically have three timeouts:
- connectRequestTimeout: the timeout for obtaining a connection from the client-side connection pool.
- connectTimeout: the timeout for establishing a connection between client and server.
- socketTimeout: the timeout for the client to read data from the server.
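To make the three timeouts concrete, here is a minimal sketch using only JDK sockets. `TimedConnector` is a hypothetical helper, not a class from any library, and the `Semaphore` is only a stand-in for a real connection pool; an actual HTTP, redis, or mq client exposes these settings through its own configuration.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;
import java.util.concurrent.Semaphore;
import java.util.concurrent.TimeUnit;

// Hypothetical helper for illustration only.
final class TimedConnector {
    private final Semaphore pool;              // stands in for a client-side connection pool
    private final int connectRequestTimeoutMs; // max wait for a free pool slot
    private final int connectTimeoutMs;        // max wait for the TCP connection
    private final int socketTimeoutMs;         // max wait for each blocking read

    TimedConnector(int poolSize, int connectRequestTimeoutMs,
                   int connectTimeoutMs, int socketTimeoutMs) {
        this.pool = new Semaphore(poolSize);
        this.connectRequestTimeoutMs = connectRequestTimeoutMs;
        this.connectTimeoutMs = connectTimeoutMs;
        this.socketTimeoutMs = socketTimeoutMs;
    }

    /** Borrows a pool slot, connects, and arms the read timeout; every step is bounded. */
    Socket open(String host, int port) throws IOException, InterruptedException {
        // connectRequestTimeout: give up if no pool slot frees up in time
        if (!pool.tryAcquire(connectRequestTimeoutMs, TimeUnit.MILLISECONDS)) {
            throw new IOException("no free connection within " + connectRequestTimeoutMs + " ms");
        }
        Socket socket = new Socket();
        try {
            // connectTimeout: bounded wait for the connection to be established
            socket.connect(new InetSocketAddress(host, port), connectTimeoutMs);
            // socketTimeout: a later read() that blocks longer throws SocketTimeoutException
            socket.setSoTimeout(socketTimeoutMs);
            return socket;
        } catch (IOException e) {
            try { socket.close(); } finally { pool.release(); }
            throw e;
        }
    }

    /** Closes the socket and returns its slot to the pool. */
    void close(Socket socket) throws IOException {
        try { socket.close(); } finally { pool.release(); }
    }
}
```

Each of the three waits fails fast with an exception instead of blocking a thread indefinitely, which is the point of setting them.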
1.2 Why do I need to set the timeout?
The system's connection pool and thread pool resources are limited. If no timeout is set and a downstream service becomes slow or fails, a large number of threads end up waiting for it to return. Normal requests then have to wait or are rejected, responses slow down, throughput (QPS) drops, and the user experience gets worse. Setting timeouts avoids this situation.
1.3 How to reasonably set the timeout?
A simple rule: none of the three timeouts (socketTimeout, connectTimeout, connectRequestTimeout) should exceed 300ms, and each should be as short as the system can tolerate.
Set the timeout according to the system's 99 line. The 99 line is the minimum time within which 99% of network requests complete. Put simply, if an interface receives 10,000 requests a day, the 99 line is the minimum time that covers 9,900 of those requests. For the calculation, see this article ([python] Detailed explanation of the numpy library's np.percentile_brucewong0516's blog -CSDN Blog _numpy percentile).
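The 99 line can be computed directly from recorded latencies. The sketch below uses the nearest-rank definition, which matches the "9,900 out of 10,000 requests" description above. `Percentile` and `nearestRank` are illustrative names; note that numpy's `np.percentile` interpolates linearly by default, so its result can differ slightly from this one.

```java
import java.util.Arrays;

final class Percentile {
    /**
     * Nearest-rank percentile: the smallest sample such that at least
     * {@code percent}% of all samples are less than or equal to it.
     */
    static long nearestRank(long[] samples, int percent) {
        if (samples.length == 0 || percent < 1 || percent > 100) {
            throw new IllegalArgumentException("need samples and 1 <= percent <= 100");
        }
        long[] sorted = samples.clone();
        Arrays.sort(sorted);
        // 1-based rank = ceil(percent * n / 100), computed in integer arithmetic
        int rank = (int) (((long) percent * sorted.length + 99) / 100);
        return sorted[rank - 1];
    }

    public static void main(String[] args) {
        long[] latenciesMs = {12, 15, 11, 240, 13, 14, 16, 12, 13, 15};
        System.out.println("p99 = " + nearestRank(latenciesMs, 99) + " ms");
    }
}
```

With only a handful of samples the 99 line is simply the slowest request, so in practice it should be computed over a full day (or at least a peak period) of traffic.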
redis reads and writes normally take 2-3ms, so the redis timeout should be set shorter, preferably no more than 50ms. The same goes for the mq timeout.
Two Separation of core and non-core business
Every company has core and non-core business. For the core business, sort out the core links; the core link should carry the company's most valuable business. On that link, core services may only call other core services, and non-core business must not sit on the core call path. If the company can afford it, deploy the core business across two or even multiple machine rooms.
Three Configure the number of Tomcat threads reasonably
Size the thread count to the workload: CPU-intensive services can be configured with fewer threads, and IO-intensive services with more. For details, see this article (tomcat's maxThreads and acceptCount (maximum number of threads, maximum queue length)_ A new ape -CSDN Blog _max-threads).
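One commonly cited starting point for "fewer for CPU-intensive, more for IO-intensive" is threads = cores x (1 + wait time / compute time). This heuristic is not from the article, and `ThreadSizing` is a hypothetical helper; treat the result as an initial value for maxThreads to be confirmed by load testing.

```java
final class ThreadSizing {
    /**
     * Heuristic: threads = cores * (1 + wait / compute).
     * waitMs is the time a request spends blocked on IO,
     * computeMs is the time it spends on the CPU.
     */
    static int suggestedThreads(int cores, double waitMs, double computeMs) {
        if (cores <= 0 || waitMs < 0 || computeMs <= 0) {
            throw new IllegalArgumentException("invalid input");
        }
        return Math.max(1, (int) Math.round(cores * (1.0 + waitMs / computeMs)));
    }

    public static void main(String[] args) {
        int cores = Runtime.getRuntime().availableProcessors();
        // CPU-intensive: almost no waiting, so roughly one thread per core
        System.out.println("cpu-bound: " + suggestedThreads(cores, 0, 10));
        // IO-intensive: 90 ms waiting per 10 ms of computing, so about 10 threads per core
        System.out.println("io-bound:  " + suggestedThreads(cores, 90, 10));
    }
}
```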
Four Try not to retry in the code
Unless there is a special reason, do not retry in code. Retries should be business-level retries, initiated by the upstream business side.
Why not retry in code?
Retrying in code easily amplifies traffic. If each request is attempted up to 5 times, a normal load of 1x can become 5x while the downstream service is failing, which can easily bring the service down.
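The amplification is easy to see by counting calls. The sketch below is a toy simulation under the assumption that "5 times" means up to 5 attempts per request; `RetryAmplification` and the `downstreamCall` supplier are illustrative names, not part of any framework.

```java
import java.util.function.BooleanSupplier;

final class RetryAmplification {
    /**
     * Sends each request with up to maxAttempts attempts and returns
     * the total number of calls that reached the downstream service.
     */
    static long downstreamCalls(int requests, int maxAttempts, BooleanSupplier downstreamCall) {
        long calls = 0;
        for (int r = 0; r < requests; r++) {
            for (int attempt = 1; attempt <= maxAttempts; attempt++) {
                calls++;
                if (downstreamCall.getAsBoolean()) {
                    break; // success: no further attempts for this request
                }
            }
        }
        return calls;
    }

    public static void main(String[] args) {
        System.out.println("healthy downstream: " + downstreamCalls(1000, 5, () -> true));
        System.out.println("failing downstream: " + downstreamCalls(1000, 5, () -> false));
    }
}
```

The extra load arrives exactly when the downstream service is already struggling, which is why the retry decision is better left to the upstream business side.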
Five Weaken unnecessary dependencies
What is weak dependency?
A weak dependency is a process that has little impact on the main flow and is therefore depended on only weakly.
For example, when a timeout exception occurs in mq/redis and the main functionality is not affected, catch the exception instead of throwing it to the upper layer. Consider this example:
String value = redis.get("key");
if (value == null) {
    value = dao.getOneColumn("");
}
Without a catch that treats redis as a weak dependency, a redis failure throws an exception straight to the upper layer, and the data is never read from the database.
For mq, besides catching the exception, consider how to handle messages that would be lost while mq is unavailable. For example, write the message to a log instead of sending it to mq, and reprocess it afterwards.
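A minimal sketch of the catch-and-fall-back pattern for the redis read described in this section. `WeakCacheRead`, `cacheGet`, and `dbGet` are hypothetical names that stand in for the article's `redis.get` and `dao.getOneColumn`; a real service would use its own logger rather than `System.err`.

```java
import java.util.function.Function;

final class WeakCacheRead {
    /**
     * Reads through the cache, but treats any cache failure as a miss
     * and falls back to the database instead of failing the request.
     */
    static String read(String key,
                       Function<String, String> cacheGet,
                       Function<String, String> dbGet) {
        String value = null;
        try {
            value = cacheGet.apply(key);
        } catch (RuntimeException e) {
            // Weak dependency: record the failure and continue, do not rethrow.
            System.err.println("cache read failed, falling back to database: " + e);
        }
        if (value == null) {
            value = dbGet.apply(key);
        }
        return value;
    }

    public static void main(String[] args) {
        Function<String, String> brokenCache = k -> {
            throw new IllegalStateException("redis timeout");
        };
        System.out.println(read("key", brokenCache, k -> "value-from-db"));
    }
}
```

The same shape applies to mq sends: catch the failure, persist the message somewhere durable such as a log, and replay it later.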
Six Streamline database transactions
Keep the operations inside a transaction as few as possible to shorten its execution time, and never make RPC calls inside a transaction.
Seven Key points of SQL performance optimization
7.1 How to define slow SQL?
In theory, user-facing SQL should execute within 10ms; anything over 50ms can be classified as slow SQL.
7.2 What is an appropriate limit on the number of rows per query?
For example, limit should not exceed 100 or 200, and the number of ids in an in clause should likewise be capped at 100 or 200.
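When the caller has more ids than the cap allows, one option is to split them into batches and issue one query per batch. A minimal sketch, with `IdBatcher` and `chunk` as illustrative names and 200 taken from the limit suggested above:

```java
import java.util.ArrayList;
import java.util.List;

final class IdBatcher {
    /** Splits ids into consecutive batches of at most maxPerQuery elements. */
    static <T> List<List<T>> chunk(List<T> ids, int maxPerQuery) {
        if (maxPerQuery <= 0) {
            throw new IllegalArgumentException("maxPerQuery must be positive");
        }
        List<List<T>> batches = new ArrayList<>();
        for (int from = 0; from < ids.size(); from += maxPerQuery) {
            int to = Math.min(from + maxPerQuery, ids.size());
            // Copy the slice so each batch is independent of the source list
            batches.add(new ArrayList<>(ids.subList(from, to)));
        }
        return batches;
    }

    public static void main(String[] args) {
        List<Long> ids = new ArrayList<>();
        for (long id = 1; id <= 450; id++) ids.add(id);
        for (List<Long> batch : chunk(ids, 200)) {
            // Each batch would back one "WHERE id IN (...)" query
            System.out.println("query with " + batch.size() + " ids");
        }
    }
}
```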
7.3 How to check that new SQL has no problems?
Use explain to view the SQL execution plan.
Add indexes to the fields used in queries to speed them up.
Eight Keep the release process as smooth as possible
Take database migration as an example. Two core points must be considered during scheme design and release: limit the impact as much as possible, and recover quickly.
How to limit the impact as much as possible?
- Use a grayscale process and roll out step by step.
- Use whitelists and similar mechanisms to verify the functionality.
How to restore functionality quickly?
- Add dynamic switches so that functionality can be restored quickly.
- Prepare the rollback plan before going online.
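The grayscale rollout, whitelist, and dynamic switch from the lists above can live in one small gate object. This is a sketch under assumptions: `ReleaseGate` is a hypothetical class, users are identified by a numeric id, and in a real system the setters would be driven by a configuration center rather than called directly.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.atomic.AtomicInteger;

final class ReleaseGate {
    private final AtomicBoolean enabled = new AtomicBoolean(false);    // dynamic switch
    private final AtomicInteger rolloutPercent = new AtomicInteger(0); // grayscale share, 0-100
    private final Set<Long> whitelist = ConcurrentHashMap.newKeySet(); // users for verification

    void setEnabled(boolean on) { enabled.set(on); }

    void setRolloutPercent(int percent) {
        rolloutPercent.set(Math.max(0, Math.min(100, percent)));
    }

    void allow(long userId) { whitelist.add(userId); }

    /** Decides whether this user takes the new code path (for example, the migrated database). */
    boolean useNewPath(long userId) {
        if (!enabled.get()) {
            return false; // switch off: everyone is back on the old path immediately
        }
        if (whitelist.contains(userId)) {
            return true;  // whitelisted users always see the new path
        }
        // Stable bucketing: the same user always lands in the same bucket 0-99
        return Math.floorMod(Long.hashCode(userId), 100) < rolloutPercent.get();
    }
}
```

Turning the switch off is the fast recovery path: it takes effect on the next request and needs no redeployment.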
Nine Use rate limiting, circuit breaking, degradation, and queuing to handle abnormal traffic
The author has seen a service become temporarily unavailable because of abnormal traffic. For details, see (A record of the occurrence and handling of a large number of HTTP 499 status codes_ A new ape -CSDN Blog _499 Status code).
Of course, abnormal traffic in a service can also be handled with rate limiting, circuit breaking, degradation, and queuing. Any technique that solves the problem is fine.
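As one illustration of rate limiting, here is a minimal token-bucket sketch. The article does not prescribe an algorithm, so the token bucket is a swapped-in choice, and `TokenBucket` is a hypothetical class; the clock is injected so that the behavior can be checked without sleeping.

```java
import java.util.function.LongSupplier;

final class TokenBucket {
    private final long capacity;            // maximum burst size
    private final double permitsPerSecond;  // steady refill rate
    private final LongSupplier nanoClock;   // e.g. System::nanoTime
    private double tokens;
    private long lastRefillNanos;

    TokenBucket(long capacity, double permitsPerSecond, LongSupplier nanoClock) {
        this.capacity = capacity;
        this.permitsPerSecond = permitsPerSecond;
        this.nanoClock = nanoClock;
        this.tokens = capacity;
        this.lastRefillNanos = nanoClock.getAsLong();
    }

    /** Takes one token if available; returns false when the caller should be limited. */
    synchronized boolean tryAcquire() {
        long now = nanoClock.getAsLong();
        double refill = (now - lastRefillNanos) * permitsPerSecond / 1_000_000_000.0;
        tokens = Math.min(capacity, tokens + refill);
        lastRefillNanos = now;
        if (tokens >= 1.0) {
            tokens -= 1.0;
            return true;
        }
        return false;
    }
}
```

A caller that gets `false` can degrade (return a cached or default response), queue the request, or reject it outright, depending on what the business can tolerate.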
Ten Improve logging and monitoring
We should print logs that are reasonable and necessary.
10.1 What is a reasonable and necessary log?
Logs that help us troubleshoot business problems and system problems are reasonable and necessary.
10.2 Why do we need to improve monitoring?
Without monitoring, our services are effectively running naked. When a problem occurs, monitoring helps us find, reproduce, and solve it faster and more effectively.
Copyright notice
This article was written by [New ape and horse]. Please include a link to the original when reposting. Thank you.
https://yzsam.com/2022/04/202204220533487246.html