Stability building best practices
2022-04-23 06:08:00 【New ape and horse】
Contents
One Set network timeouts sensibly
Two Separate core and non-core business
Three Configure the number of Tomcat threads sensibly
Four Avoid retrying in code
Five Weaken unnecessary dependencies
Six Keep database transactions lean
Seven Key points of SQL performance optimization
Eight Make the release process as smooth as possible
Nine Handle abnormal traffic with rate limiting, circuit breaking, degradation, and queuing
Ten Improve logging and monitoring
This article summarizes some experience in building highly available, high-performance systems.
One Set network timeouts sensibly
1.1 What is a network call timeout?
Network requests occur between application servers, between an application server and a redis server, and between an application server and an mq server. These requests typically have three timeouts:
- connectRequestTimeout: the timeout for obtaining a connection from the client's connection pool.
- connectTimeout: the timeout for establishing a connection between client and server.
- socketTimeout: the timeout for the client to read data from the server.
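As a minimal sketch of where such timeouts are configured, the example below uses the JDK's built-in `java.net.http.HttpClient`. The class name `TimeoutConfig` and the millisecond values are assumptions chosen for illustration. The JDK client exposes a connect timeout and a per-request response timeout (which plays roughly the role of socketTimeout here); it has no separate setting for waiting on a pooled connection, so connectRequestTimeout has no direct counterpart in this sketch.

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.time.Duration;

public class TimeoutConfig {
    // Illustrative values only; tune them against your own latency data.
    public static final Duration CONNECT_TIMEOUT = Duration.ofMillis(200);
    public static final Duration RESPONSE_TIMEOUT = Duration.ofMillis(300);

    // Bounds the time spent establishing a connection (connectTimeout).
    public static HttpClient newClient() {
        return HttpClient.newBuilder()
                .connectTimeout(CONNECT_TIMEOUT)
                .build();
    }

    // Bounds the wait for the response; if it is exceeded, the send call
    // fails with java.net.http.HttpTimeoutException.
    public static HttpRequest newRequest(String url) {
        return HttpRequest.newBuilder(URI.create(url))
                .timeout(RESPONSE_TIMEOUT)
                .GET()
                .build();
    }
}
```

Other clients (HTTP libraries with connection pools, redis and mq clients) expose their own timeout settings under different names, so check the documentation of the client actually in use.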
1.2 Why set a timeout?
The system's connection pools and thread pools are limited resources. If no timeout is set, a slow or failing downstream service leaves a large number of threads waiting for it to return.
Normal requests then have to wait or are rejected, so the service responds slowly, throughput and QPS drop, and the user experience gets worse. Setting timeouts avoids this.
1.3 How to set the timeout sensibly?
A simple rule: none of the three timeouts (socketTimeout, connectTimeout, connectRequestTimeout) should exceed 300ms, and each should be as short as the system can tolerate.
Set the timeout according to the system's 99 line. The 99 line (the 99th percentile, or P99) is the minimum time within which 99% of network requests complete. Put simply, if an interface receives 10,000 requests a day, the 99 line is the minimum time within which 9,900 of those requests complete. For the calculation itself, see this article ([python] numpy library np.percentile explained, brucewong0516's blog on CSDN).
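The 99 line can be computed from recorded latencies in a few lines. The sketch below uses the nearest-rank definition, which matches the description above (the smallest value that covers 99% of requests); the class and method names are made up for illustration. Note that np.percentile interpolates linearly by default, so its result can differ slightly from the nearest-rank value.

```java
import java.util.Arrays;

public class Percentile {
    // Nearest-rank percentile: the smallest sample such that at least
    // p percent of all samples are less than or equal to it.
    public static long percentile(long[] latenciesMs, int p) {
        if (latenciesMs.length == 0) {
            throw new IllegalArgumentException("no samples");
        }
        if (p <= 0 || p > 100) {
            throw new IllegalArgumentException("p must be in 1..100");
        }
        long[] sorted = latenciesMs.clone(); // leave the caller's array untouched
        Arrays.sort(sorted);
        // 1-based rank, rounded up; multiply before dividing to avoid rounding drift.
        int rank = (int) Math.ceil((double) p * sorted.length / 100.0);
        return sorted[rank - 1];
    }
}
```

With 10,000 samples, `percentile(samples, 99)` returns the 9,900th smallest latency, which is a reasonable starting point for the timeout value.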
redis reads and writes normally take 2-3ms, so its timeout should be shorter; try not to exceed 50ms. The same goes for mq timeouts.
Two Separate core and non-core business
Every company has core and non-core business. For the core business we can map out the core link, which should carry the company's most valuable business. On this link, a core service may call only other core services;
non-core services can only be called. If the company can afford it, deploy the core business across two machine rooms, or even more.
Three Configure the number of Tomcat threads sensibly
Set the thread count to fit the workload: CPU-intensive services can be configured with fewer threads, IO-intensive services with more. For details, see this article (tomcat's maxThreads and acceptCount (maximum number of threads, maximum queue length), A new ape's blog on CSDN).
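One widely cited rule of thumb for turning "fewer for CPU-bound, more for IO-bound" into a number is threads = cores × target CPU utilization × (1 + wait time / compute time). It is not taken from this article, and the class and parameter names below are assumptions; treat the result as a starting value for maxThreads to be validated with load tests, not as a definitive setting.

```java
public class ThreadPoolSizing {
    // Heuristic: cores * targetCpuUtilization * (1 + waitMs / computeMs).
    // waitMs is the time a request spends blocked on IO, computeMs the time
    // it spends on the CPU; both are averages measured per request.
    public static int suggestedThreads(int cores, double targetCpuUtilization,
                                       double waitMs, double computeMs) {
        if (cores <= 0 || computeMs <= 0 || waitMs < 0
                || targetCpuUtilization <= 0 || targetCpuUtilization > 1) {
            throw new IllegalArgumentException("invalid sizing inputs");
        }
        double threads = cores * targetCpuUtilization * (1 + waitMs / computeMs);
        return Math.max(1, (int) Math.ceil(threads));
    }
}
```

For a purely CPU-bound service on 4 cores the heuristic suggests 4 threads; if each request waits 90ms on IO for every 10ms of computation, it suggests 40.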
Four Avoid retrying in code
Unless there is a special reason, do not retry in code. Retries should be business retries, initiated by the upstream business side.
Why not retry in code?
Retrying in code easily amplifies traffic: if each request is attempted 5 times, traffic becomes 5 times the normal volume, which can easily bring the service down.
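The amplification gets worse when several layers of a call chain each retry, because the multipliers compound. The sketch below computes the worst-case multiplier on the deepest service; the class and method names are invented for illustration, and "attempts" counts every try, including the first one.

```java
public class RetryAmplification {
    // Worst-case load multiplier on the deepest service when every layer in a
    // call chain makes up to attemptsPerLayer attempts per incoming request.
    public static long worstCaseMultiplier(int attemptsPerLayer, int layers) {
        if (attemptsPerLayer < 1 || layers < 0) {
            throw new IllegalArgumentException("attempts must be >= 1 and layers >= 0");
        }
        long multiplier = 1;
        for (int i = 0; i < layers; i++) {
            // multiplyExact throws ArithmeticException instead of silently overflowing
            multiplier = Math.multiplyExact(multiplier, (long) attemptsPerLayer);
        }
        return multiplier;
    }
}
```

A single layer making 5 attempts gives the 5x figure above; three layers that each make 5 attempts can hit the bottom service with up to 125 times the normal volume.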
Five Weaken unnecessary dependencies
What is a weak dependency?
A weak dependency is a dependency on a process that has little impact on the main flow; it should be treated as optional.
For example, when mq or redis throws a timeout exception and the main function is not affected, catch the exception instead of throwing it to the upper layer. For instance:
// No catch around redis: if redis fails, the exception propagates and the database is never read.
String value = redis.get("key");
if (value == null) {
    value = dao.getOneColumn("");
}
Without a catch around the weak redis dependency, a redis failure throws an exception straight to the upper layer, and the data can no longer be read from the database.
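A minimal sketch of the weakened version follows. To stay self-contained it takes the cache read and the database read as `Supplier` arguments; the class name `WeakDependency` and both parameter names are assumptions standing in for the redis client and DAO of a real project.

```java
import java.util.function.Supplier;

public class WeakDependency {
    // Reads from the cache first; treats any cache failure as a miss and
    // falls back to the database, so the main flow keeps working.
    public static String readWithFallback(Supplier<String> cacheRead,
                                          Supplier<String> dbRead) {
        String value = null;
        try {
            value = cacheRead.get();
        } catch (RuntimeException e) {
            // Weak dependency: record the failure and carry on instead of rethrowing.
            System.err.println("cache read failed, falling back to database: " + e.getMessage());
        }
        if (value == null) {
            value = dbRead.get();
        }
        return value;
    }
}
```

A call might look like `readWithFallback(() -> redis.get("key"), () -> dao.getOneColumn(""))`. Exceptions from the database read are deliberately not caught, since the database is a strong dependency in this example.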
Fault tolerance for mq needs more than a catch: consider how to handle messages that would be lost while mq is unavailable. For example, write the messages that could not be sent to a log and reprocess them afterwards.
Six Keep database transactions lean
Keep the operations inside a transaction as few as possible to shorten its execution time, and never make an RPC call inside a transaction.
Seven Key points of SQL performance optimization
7.1 How to define slow SQL?
In theory, user-facing SQL should execute within 10ms; anything over 50ms can be classified as slow SQL.
7.2 What is an appropriate limit on the number of rows queried?
For example, limit should not exceed 100 or 200, and the number of ids in an in clause should likewise be capped at 100 or 200.
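One way to respect a cap on the in list is to split a long id list into batches and run one query per batch. The helper below is a small sketch of that idea; the class name `IdBatcher` is invented for illustration, and the batch size is left to the caller.

```java
import java.util.ArrayList;
import java.util.List;

public class IdBatcher {
    // Splits ids into consecutive batches of at most maxBatchSize elements,
    // so that each batch can be used in its own "in (...)" query.
    public static <T> List<List<T>> partition(List<T> ids, int maxBatchSize) {
        if (maxBatchSize <= 0) {
            throw new IllegalArgumentException("maxBatchSize must be positive");
        }
        List<List<T>> batches = new ArrayList<>();
        for (int start = 0; start < ids.size(); start += maxBatchSize) {
            int end = Math.min(start + maxBatchSize, ids.size());
            // Copy the view so each batch is independent of the source list.
            batches.add(new ArrayList<>(ids.subList(start, end)));
        }
        return batches;
    }
}
```

With 450 ids and a cap of 200, this yields three batches of 200, 200, and 50 ids.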
7.3 How to check that new SQL has no problems?
Use explain to see the SQL's execution plan.
Add indexes to the fields used in queries to speed them up.
Eight Make the release process as smooth as possible
Take database migration as an example. Two core points must be considered during solution design and release: limit the impact as much as possible, and recover quickly.
How to limit the impact as much as possible?
- Use a grayscale (gradual) rollout and proceed step by step.
- To verify functionality, add whitelists and similar controls.
How to restore functionality quickly?
- Add dynamic switches so that functionality can be restored quickly.
- Prepare an online rollback plan in advance.
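The whitelist and dynamic-switch ideas above can be combined in one small component. The sketch below is an in-memory illustration with invented names (`ReleaseSwitch`, `useNewPath`); in a real system the switch and whitelist would more likely be driven by a configuration center so that they can be changed without a redeploy.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicBoolean;

public class ReleaseSwitch {
    // Off by default: new code paths stay dark until explicitly enabled.
    private final AtomicBoolean enabled = new AtomicBoolean(false);
    private final Set<String> whitelist = ConcurrentHashMap.newKeySet();

    // Flipping this to false is the fast-recovery path.
    public void setEnabled(boolean on) {
        enabled.set(on);
    }

    // Adds a user to the grayscale whitelist.
    public void allow(String userId) {
        whitelist.add(userId);
    }

    // The new path is used only when the switch is on and the user is whitelisted.
    public boolean useNewPath(String userId) {
        return enabled.get() && whitelist.contains(userId);
    }
}
```

Callers branch on `useNewPath(userId)` and keep the old path as the default, so turning the switch off restores the previous behavior for everyone at once.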
Nine Handle abnormal traffic with rate limiting, circuit breaking, degradation, and queuing
The author has seen a service become temporarily unavailable because of abnormal traffic; for details, see (A record of the occurrence and handling of a large number of 499 http status codes, A new ape's blog on CSDN).
Of course, abnormal traffic in a service can also be handled with circuit breaking, degradation, and queuing. Whatever solves the problem is fine.
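As an illustration of rate limiting, here is a minimal token-bucket sketch. It is not the mechanism described in the linked article, and the class name and constructor parameters are assumptions. The current time is passed in explicitly so the behavior is deterministic and easy to test; production code would pass `System.nanoTime()`.

```java
public class TokenBucket {
    private final long capacity;        // maximum burst size
    private final long nanosPerPermit;  // time needed to earn one permit
    private long tokens;
    private long lastRefillNanos;

    public TokenBucket(long capacity, long permitsPerSecond, long nowNanos) {
        if (capacity <= 0 || permitsPerSecond <= 0 || permitsPerSecond > 1_000_000_000L) {
            throw new IllegalArgumentException("invalid bucket settings");
        }
        this.capacity = capacity;
        this.nanosPerPermit = 1_000_000_000L / permitsPerSecond;
        this.tokens = capacity; // start full
        this.lastRefillNanos = nowNanos;
    }

    // Returns true if the request may proceed; false means the caller
    // should reject, degrade, or queue it.
    public synchronized boolean tryAcquire(long nowNanos) {
        long newPermits = (nowNanos - lastRefillNanos) / nanosPerPermit;
        if (newPermits > 0) {
            tokens = Math.min(capacity, tokens + newPermits);
            // Advance only by whole permits so fractional time is not lost.
            lastRefillNanos += newPermits * nanosPerPermit;
        }
        if (tokens > 0) {
            tokens--;
            return true;
        }
        return false;
    }
}
```

A bucket created with `new TokenBucket(2, 1, now)` lets two requests through immediately, rejects the third, and admits one more after a full second has passed.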
Ten Improve logging and monitoring
We should print reasonable and necessary log data.
10.1 What is a reasonable and necessary log?
Logs that help us troubleshoot business problems and system problems are reasonable and necessary.
10.2 Why improve monitoring?
Without monitoring, our services are effectively running naked. When a problem occurs, monitoring helps us find, reproduce, and solve it faster and more effectively.
Copyright notice
This article was written by [New ape and horse]. Please include a link to the original when reposting. Thank you.
https://yzsam.com/2022/04/202204220533487246.html