Stability building best practices
2022-04-23 06:08:00 【New ape and horse】
Catalog
One Reasonably set the network timeout
Two Separation of core and non-core business
Three Reasonably configure the Tomcat thread count
Four Try not to retry in the code
Five Weaken unnecessary dependencies
Six Streamline database transactions
Seven Key points of SQL performance optimization
Eight The release process should be as smooth as possible
Nine Rate limiting, circuit breaking, degradation, and queuing to handle abnormal traffic
Ten Improve logging and monitoring
This article summarizes some experience in building highly available, high-performance systems.
One Reasonably set the network timeout
1.1 What is a network call timeout?
Network requests, such as those between application servers, between an application server and a redis server, or between an application server and an mq server, typically have three timeouts:
- connectRequestTimeout: the timeout for obtaining a connection from the client's connection pool.
- connectTimeout: the timeout for establishing a connection between the client and the server.
- socketTimeout: the timeout for the client to read data from the server.
1.2 Why do I need to set a timeout?
The system's connection pool and thread pool resources are limited. If no timeout is set and a downstream service is slow or failing, a large number of threads pile up waiting for the downstream service to return.
Normal requests then have to wait or are rejected, so responses slow down, throughput (QPS) drops, and the user experience gets worse. Setting timeouts avoids this situation.
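As a minimal sketch of the connect and read timeouts, the JDK's built-in `HttpURLConnection` offers `setConnectTimeout` and `setReadTimeout`. The read timeout plays the role of socketTimeout here, and because this class has no client-side pool there is no counterpart to connectRequestTimeout. The class name `TimeoutDemo`, both helper methods, and the 300 ms connect value are assumptions made for illustration, not part of the original article.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.HttpURLConnection;
import java.net.ServerSocket;
import java.net.SocketTimeoutException;
import java.net.URI;

final class TimeoutDemo {
    /** Issues a GET with explicit timeouts and returns the HTTP status code. */
    static int fetchStatus(String url, int connectTimeoutMs, int readTimeoutMs) throws IOException {
        HttpURLConnection conn = (HttpURLConnection) URI.create(url).toURL().openConnection();
        conn.setConnectTimeout(connectTimeoutMs); // limit on establishing the connection
        conn.setReadTimeout(readTimeoutMs);       // limit on waiting for response data
        try {
            return conn.getResponseCode();
        } finally {
            conn.disconnect();
        }
    }

    /** Returns true if a call to a local server that never replies fails with a timeout. */
    static boolean timesOutAgainstSilentServer(int readTimeoutMs) {
        // The socket is bound but never accepts or answers, simulating a hung downstream.
        try (ServerSocket silent = new ServerSocket(0)) {
            String url = "http://127.0.0.1:" + silent.getLocalPort() + "/";
            try {
                fetchStatus(url, 300, readTimeoutMs);
                return false;
            } catch (SocketTimeoutException expected) {
                return true; // the calling thread is released instead of waiting forever
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

With the read timeout in place, the calling thread gets a `SocketTimeoutException` after the configured interval rather than being held by the silent server.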
1.3 How to set the timeout reasonably?
A simple rule: none of the three timeouts (socketTimeout, connectTimeout, connectRequestTimeout) should exceed 300 ms, and each should be as short as the system can tolerate.
Set the timeout according to the system's 99 line. The 99 line (the 99th percentile, P99) is the minimum time within which 99% of network requests complete. Put simply, if an interface is called 10,000 times a day,
the minimum time that covers 9,900 of those requests is the 99 line. For the calculation, see the article "[python] numpy library np.percentile explained" (brucewong0516's blog on CSDN).
redis reads and writes normally take 2-3 ms, so the redis timeout should be shorter still; try not to exceed 50 ms. The same goes for the mq timeout.
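The 99 line described above can be computed from a sample of recorded latencies. The sketch below uses the nearest-rank definition, which matches the "minimum time that covers 99% of requests" wording; note that np.percentile interpolates linearly by default, so its result can differ slightly. The class name `Percentiles` and the synthetic sample data are assumptions for illustration.

```java
import java.util.Arrays;

final class Percentiles {
    /**
     * Nearest-rank percentile: the smallest sample such that at least
     * p percent of all samples are less than or equal to it.
     */
    static long percentile(long[] latenciesMs, double p) {
        if (latenciesMs.length == 0) {
            throw new IllegalArgumentException("no samples");
        }
        if (p <= 0 || p > 100) {
            throw new IllegalArgumentException("p must be in (0, 100]");
        }
        long[] sorted = latenciesMs.clone();   // leave the caller's array untouched
        Arrays.sort(sorted);
        int rank = (int) Math.ceil(p * sorted.length / 100.0); // 1-based rank
        return sorted[rank - 1];
    }

    public static void main(String[] args) {
        // Synthetic data: 10,000 requests with latencies of 1..10000 ms.
        long[] samples = new long[10_000];
        for (int i = 0; i < samples.length; i++) {
            samples[i] = i + 1;
        }
        System.out.println("P99 = " + percentile(samples, 99) + " ms");
    }
}
```

In practice the samples would come from access logs or a metrics system, and the timeout would be set somewhat above the measured value.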
Two Separation of core and non-core business
Every company has core and non-core business. For the core business, sort out the core links; the core link should cover the company's most valuable business. On that link, a core service may call only other core services
and should not depend on non-core business. If the company can afford it, deploy the core business across two or even more data centers.
Three Reasonably configure the Tomcat thread count
Set the number of threads according to the workload: CPU-intensive services can be configured with fewer threads, IO-intensive services with more. For details, see the article "tomcat's maxThreads and acceptCount (maximum number of threads, maximum queue length)" (A new ape's blog on CSDN).
Four Try not to retry in the code
Unless there is a special reason, do not retry in code. Retries should be business-level retries, initiated by the upstream business side.
Why not retry in code?
Retrying in code easily amplifies traffic: if each request is attempted up to 5 times, a normal load of 1x can grow to 5x, which can easily bring the service down.
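The amplification can be made concrete with a small simulation. This is only a sketch: the class `RetryAmplification`, the `downstreamCalls` helper, and the request counts are hypothetical, and the downstream service is modeled as a simple function that reports success or failure.

```java
import java.util.function.BooleanSupplier;

final class RetryAmplification {
    /**
     * Sends `requests` requests, attempting each one up to `maxAttempts` times,
     * and returns how many calls actually reached the downstream service.
     */
    static long downstreamCalls(int requests, int maxAttempts, BooleanSupplier downstream) {
        long calls = 0;
        for (int r = 0; r < requests; r++) {
            for (int attempt = 1; attempt <= maxAttempts; attempt++) {
                calls++;
                if (downstream.getAsBoolean()) {
                    break; // success: no further attempts for this request
                }
            }
        }
        return calls;
    }

    public static void main(String[] args) {
        System.out.println("healthy downstream: " + downstreamCalls(1_000, 5, () -> true));
        System.out.println("failing downstream: " + downstreamCalls(1_000, 5, () -> false));
    }
}
```

A healthy downstream sees 1,000 calls, while a failing one sees 5,000, so the extra load arrives exactly when the service is least able to absorb it.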
Five Weaken unnecessary dependencies
What is weak dependency?
A weak dependency is a step that has little impact on the main flow and is therefore depended on only weakly: its failure must not break the main flow.
For example, if a timeout exception occurs when calling mq or redis and the main function is not affected, catch the exception instead of throwing it to the upper layer. For instance:
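String value = null;
try {
    value = redis.get("key");
} catch (Exception e) {
    // redis is a weak dependency: swallow the failure and fall back to the database
}
if (value == null) {
    value = dao.getOneColumn("");
}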
If the weak dependency on redis is not wrapped in a catch, then when redis fails the exception is thrown directly to the upper layer and the data cannot be read from the database.
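A self-contained version of this pattern might look like the sketch below. The class `WeakDependency` and its `read` method are hypothetical; the cache and the database are modeled as plain functions so the example runs without a real redis client or DAO.

```java
import java.util.function.Function;

final class WeakDependency {
    /**
     * Reads through a cache that is allowed to fail.
     * The database lookup is the strong dependency and is not guarded.
     */
    static String read(String key,
                       Function<String, String> cache,
                       Function<String, String> database) {
        String value = null;
        try {
            value = cache.apply(key);
        } catch (RuntimeException e) {
            // Weak dependency: report the failure but do not propagate it.
            System.err.println("cache unavailable, falling back to database: " + e.getMessage());
        }
        if (value == null) {
            value = database.apply(key); // cache miss or cache failure
        }
        return value;
    }
}
```

Calling `read` with a cache function that throws still returns the database value, which is the behavior the paragraph above asks for.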
Fault tolerance for mq needs more than a catch: you also have to consider how to handle messages that are lost while mq is unavailable, for example by writing the messages that should have been sent to a log and reprocessing them afterwards.
Six Streamline database transactions
Keep the operations inside a transaction as few as possible to shorten transaction execution time, and never make RPC calls inside a transaction.
Seven Key points of SQL performance optimization
7.1 How to define slow SQL?
In theory, user-facing SQL should execute within 10 ms; anything over 50 ms can be classified as slow SQL.
7.2 What is an appropriate limit on the number of rows per query?
For example, do not allow limit to exceed 100 or 200, and likewise cap the number of ids in an in clause at 100 or 200.
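One way to respect such a cap is to split a large id list into batches before querying. The following is a minimal sketch; the class `IdBatches`, the `partition` helper, and the batch size of 100 used in the note below are illustrative assumptions.

```java
import java.util.ArrayList;
import java.util.List;

final class IdBatches {
    /** Splits ids into consecutive batches of at most batchSize elements. */
    static <T> List<List<T>> partition(List<T> ids, int batchSize) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        List<List<T>> batches = new ArrayList<>();
        for (int from = 0; from < ids.size(); from += batchSize) {
            int to = Math.min(from + batchSize, ids.size());
            // Copy the view so each batch is independent of the source list.
            batches.add(new ArrayList<>(ids.subList(from, to)));
        }
        return batches;
    }
}
```

Each batch can then be bound to its own in clause, so `partition(ids, 100)` turns 250 ids into three queries of 100, 100, and 50 ids.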
7.3 How to check whether new SQL has problems?
Use explain to view the SQL execution plan.
Add indexes on the fields used in queries to speed them up.
Eight The release process should be as smooth as possible
For a change such as a database migration, two core points must be considered during both design and release: limit the scope of impact as much as possible, and recover quickly.
How to limit the scope of impact as much as possible?
- Roll out gradually with a grayscale process, step by step.
- Use whitelists and similar mechanisms to verify the functionality.
How to restore functionality quickly?
- Add dynamic switches so that functionality can be restored quickly.
- Prepare an online rollback plan in advance.
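The whitelist and dynamic switch ideas can be combined in one small component. This is a sketch under assumptions: the class `ReleaseSwitch` and its method names are hypothetical, and a real system would usually load these values from a configuration center rather than hold them in memory.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

final class ReleaseSwitch {
    // Off by default, so a fresh deployment keeps every user on the old path.
    private volatile boolean enabledForAll = false;
    private final Set<String> whitelist = ConcurrentHashMap.newKeySet();

    /** Grayscale step: let a single whitelisted user exercise the new path. */
    void allow(String userId) {
        whitelist.add(userId);
    }

    /** Final step: open the new path to everyone. */
    void enableForAll() {
        enabledForAll = true;
    }

    /** Fast recovery: send everyone back to the old path without a redeploy. */
    void rollback() {
        enabledForAll = false;
        whitelist.clear();
    }

    boolean useNewPath(String userId) {
        return enabledForAll || whitelist.contains(userId);
    }
}
```

Request handling code would branch on `useNewPath(userId)`, which keeps the old implementation available until the new one has been verified.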
Nine Rate limiting, circuit breaking, degradation, and queuing to handle abnormal traffic
I once ran into a case where abnormal traffic made a service temporarily unavailable; for details, see the article "A record of the occurrence and handling of a large number of HTTP 499 status codes" (A new ape's blog on CSDN).
Of course, abnormal traffic in a service can also be handled with rate limiting, circuit breaking, degradation, and queuing. Any approach that solves the problem is fine.
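As one illustration of rate limiting, here is a minimal fixed-window limiter. It is a sketch, not the approach used in the incident mentioned above: the class `FixedWindowLimiter` is hypothetical, and the clock is injected so the behavior can be checked without sleeping.

```java
import java.util.function.LongSupplier;

final class FixedWindowLimiter {
    private final int limitPerWindow;
    private final long windowMillis;
    private final LongSupplier clock; // e.g. System::currentTimeMillis
    private long windowStart;
    private int used;

    FixedWindowLimiter(int limitPerWindow, long windowMillis, LongSupplier clock) {
        this.limitPerWindow = limitPerWindow;
        this.windowMillis = windowMillis;
        this.clock = clock;
        this.windowStart = clock.getAsLong();
    }

    /** Returns true if the request may proceed, false if it should be rejected or degraded. */
    synchronized boolean tryAcquire() {
        long now = clock.getAsLong();
        if (now - windowStart >= windowMillis) {
            // A new window begins: reset the counter.
            windowStart = now;
            used = 0;
        }
        if (used < limitPerWindow) {
            used++;
            return true;
        }
        return false;
    }
}
```

A caller that gets `false` can return a degraded response or place the request in a queue. A fixed window allows short bursts at window boundaries, so production systems often prefer a token bucket or sliding window.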
Ten Improve logging and monitoring
We should print log data that is reasonable and necessary.
10.1 What is a reasonable and necessary log?
A log is reasonable and necessary if it helps us troubleshoot business problems and system problems.
10.2 Why do we need to improve monitoring?
Without monitoring, our services are effectively running naked. When a problem does occur, monitoring helps us find, reproduce, and solve it faster and more effectively.
Copyright notice
This article was written by [New ape and horse]. Please include the original link when reposting. Thanks.
https://yzsam.com/2022/04/202204220533487246.html