Chaos gets you started with chaos engineering quickly
2022-04-23 07:05:00 【Alibaba cloud cloud Lab】
This scenario introduces the ideas and principles of chaos engineering and lets you try out Fault Drill (AHAS Chaos), Alibaba Cloud's product in the field of chaos engineering. Complete the experience between November 9 and November 23 to receive a "TOMY Tomica alloy car model".
Address: https://developer.aliyun.com/adc/series/activity/1111
This scenario involves the following technologies or products:
Container Service ACK:
Container Service for Kubernetes (ACK for short) provides high-performance, scalable management of containerized applications and supports full lifecycle management of enterprise containerized applications. It is the only product from China selected for the 2020 Gartner public cloud container report, and it ranked first in China in the 2019 Forrester container report. It integrates Alibaba Cloud's virtualization, storage, network, and security capabilities to help enterprises run Kubernetes containerized applications efficiently in the cloud.
Fault Drill (Chaos):
Fault Drill (Chaos) is a cloud-native chaos engineering platform that provides large-scale, low-cost, diversified fault drill services with a controllable impact. Chaos provides one-stop architecture analysis, fault inspection, fault injection, steady-state measurement, and other functions. It helps users improve the fault tolerance and recoverability of distributed systems and helps systems move to the cloud smoothly.
Principle introduction
You have probably seen news reports of a PLA regiment holding live combat exercises somewhere. For an army, the best training is practice. Even when regular training is systematic and thorough, real combat can still reveal unexpected problems that never appeared in training. Only live exercises expose these problems, so that the next stage of training can be planned better and the army's combat effectiveness improved.
Design for failure
Isn't our software system the same? "Everything fails, all the time." In normal development, even if we have envisioned all kinds of scenarios and fixed every bug, all kinds of situations still arise once the system is online. Our software systems need such practical exercises too. You need to consider failure scenarios from the start of system design, treat designing for failure as part of system design, and prepare strategies for recovering from failure. This improves the availability of the whole system. Only when you accept that things will fail over time, and build that idea into the architecture, can the system remain unaffected when a failure occurs, or at least keep the loss to a minimum.
Fault drill
Chaos engineering was born from this design-for-failure idea. Designing for failure asks us to prepare for failure in advance, but are the measures we prepared really effective when a fault actually occurs? Does the fault recovery tool actually achieve disaster recovery? Are the people handling the fault skilled enough? These questions are hard to verify, yet real faults often expose them. This is the point of chaos engineering. Chaos engineering is like a drill: by deliberately creating failures, it identifies possible weaknesses in the system, verifies whether the system and its people handle unexpected problems as expected in a real, complex environment, and improves the system's immunity. This is what Fault Drill (Chaos) provides.
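The drill idea described above (check the steady state, deliberately create a failure, observe, then recover) can be sketched in a few lines of Python. This is only an illustrative sketch, not how Fault Drill (Chaos) is implemented; run_drill, DrillResult, and ToySystem are hypothetical names invented for this example.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class DrillResult:
    steady_before: bool
    steady_during: bool
    steady_after: bool

    @property
    def hypothesis_held(self) -> bool:
        # The hypothesis holds only if the system stayed healthy throughout.
        return self.steady_before and self.steady_during and self.steady_after


def run_drill(check_steady_state: Callable[[], bool],
              inject_fault: Callable[[], None],
              recover: Callable[[], None]) -> DrillResult:
    """Run one drill: verify steady state, inject a fault, observe, recover."""
    before = check_steady_state()
    if not before:
        # Do not inject faults into a system that is already unhealthy.
        return DrillResult(False, False, False)
    during = False
    try:
        inject_fault()
        during = check_steady_state()
    finally:
        recover()  # always terminate the experiment, even if something raised
    after = check_steady_state()
    return DrillResult(before, during, after)


class ToySystem:
    """Hypothetical system: healthy if the primary is up or a fallback exists."""

    def __init__(self, has_fallback: bool) -> None:
        self.primary_up = True
        self.has_fallback = has_fallback

    def healthy(self) -> bool:
        return self.primary_up or self.has_fallback

    def break_primary(self) -> None:
        self.primary_up = False

    def restore_primary(self) -> None:
        self.primary_up = True


robust = ToySystem(has_fallback=True)
fragile = ToySystem(has_fallback=False)
robust_result = run_drill(robust.healthy, robust.break_primary, robust.restore_primary)
fragile_result = run_drill(fragile.healthy, fragile.break_primary, fragile.restore_primary)
print(robust_result.hypothesis_held, fragile_result.hypothesis_held)
```

The fragile system recovers after the drill but fails the hypothesis during the fault, which is exactly the kind of weakness a drill is meant to expose.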
Create experimental resources
Alibaba Cloud provides ACK + Chaos cloud product resources.
4-hour resource link:
https://developer.aliyun.com/adc/scenario/e9b27357ab9c4785bc7f43fb62f872e3
Installing the probe
1 Go back to the Container Service console page and click the < icon at the top of the left navigation bar.
2 In the left navigation bar of the cluster list page, click App Catalog.
3 On the App Catalog page, click ack-ahas-pilot.
4 On the ack-ahas-pilot details page, click Create.
When the following page is returned, the probe has been deployed.
View the overall system architecture through architecture awareness
1 Copy the Application High Availability Service console address below, open a new tab in the Firefox browser, and paste the address to open the Application High Availability Service console.
https://chaos.console.aliyun.com/
2 At the top of the overview page, select the region where the resources are located. For example, in the figure below, switch to China East 1 (Hangzhou).
3 In the left navigation bar, click Fault Drill > Architecture Awareness.
4 On the architecture map page, click View in the Kubernetes monitoring view card.
5 On the architecture map page, open the Kubernetes monitoring view drop-down list, select default as the namespace, and click OK to view the Kubernetes monitoring view of the experimental resources.
Automatic recovery scenario drill
One fault-tolerance strategy in distributed system design is fault recovery (failback): through health checks and similar mechanisms, the system automatically redeploys when a machine or application has a problem. We use Chaos to run a fault drill and test whether our system has this capability.
1 Make a steady-state hypothesis. Define a steady-state metric to evaluate the health of the system, and monitor it while the chaos experiment runs.
We define the steady state as: the frontend interface is reachable, and functions such as the shopping cart and ordering work normally.
2 Simulate real events.
2.1 Switch to the Application High Availability Service console. In the left navigation bar, click My Space.
2.2 On the My Space page, select New blank drill from the New drill drop-down list.
2.3 On the drill configuration page, do the following:
(1) Set the drill name.
(2) In the drill object configuration wizard, select frontend as the drill application and frontend-group as the application group, select any machine from the machine list, and click Add drill.
(3) In the Select drill fault dialog box, choose Java application > Delay > In-container Java delay, and click OK.
(4) On the drill configuration page, click In-container Java delay.
(5) In the In-container Java delay panel, enter the fully qualified class name, method name, process keyword, and target container name in turn, then click Close.
Fully qualified class name: enter com.alibabacloud.hipstershop.web.HealthController.
Method name: enter health.
Process keyword: enter java.
Target container name: select frontend.
(6) In the drill content area, click Save.
(7) Click Next.
(8) In the monitoring policy area of the global configuration, click New policy.
(9) In the New policy dialog box, select Business monitoring > Business status observation (Http), and click OK.
(10) In the Business status observation (Http) panel, select get as the request type and enter http://<external endpoint of frontend>/ as the URL.
Note:
The external endpoint of frontend is shown on the Access method tab of the frontend service in the Container Service ACK console.
(11) In the global configuration wizard, click Next.
(12) In the success dialog box, click Drill details.
2.4 On the drill details page, click Drill.
2.5 In the Start drill dialog box, click OK.
3 Check the effect of the experiment.
3.1 On the drill record details page, view the Business status observation (Http) time-series chart. After the fault is injected, calls to the health interface first drop and then quickly return to normal automatically, which shows that our design worked.
3.2 Switch back to the Container Service ACK console. On the frontend service page, click the Events tab.
You can see that frontend was automatically scaled out.
4 Terminate the experiment.
4.1 Switch to the Application High Availability Service console. On the drill record details page, click Terminate.
4.2 In the Stop drill dialog box, click OK.
4.3 Wait for the drill scenario to end, then click OK in the result feedback dialog box.
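The recovery behaviour exercised in this scenario (a delayed health method makes the health check fail, and the unhealthy instance is then redeployed) can be modelled roughly as follows. This is a simplified sketch and not the actual Kubernetes or ACK probe logic; Instance, probe, Supervisor, the timeout, and the failure threshold are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Instance:
    name: str
    health_latency_s: float = 0.01  # simulated response time of the health endpoint


def probe(instance: Instance, timeout_s: float) -> bool:
    """A probe succeeds only if the health endpoint answers within the timeout."""
    return instance.health_latency_s <= timeout_s


class Supervisor:
    """Replaces an instance after several consecutive failed health probes."""

    def __init__(self, instances: List[Instance], timeout_s: float = 1.0,
                 failure_threshold: int = 3) -> None:
        self.instances = instances
        self.timeout_s = timeout_s
        self.failure_threshold = failure_threshold
        self.failures = {inst.name: 0 for inst in instances}
        self.replaced: List[str] = []

    def tick(self) -> None:
        """Probe every instance once and replace those that keep failing."""
        for index, inst in enumerate(self.instances):
            if probe(inst, self.timeout_s):
                self.failures[inst.name] = 0
                continue
            self.failures[inst.name] += 1
            if self.failures[inst.name] >= self.failure_threshold:
                # Redeploy: swap the unhealthy instance for a fresh one.
                fresh = Instance(name=f"{inst.name}-restarted")
                self.instances[index] = fresh
                del self.failures[inst.name]
                self.failures[fresh.name] = 0
                self.replaced.append(inst.name)


pods = [Instance("frontend-0"), Instance("frontend-1")]
supervisor = Supervisor(pods)
pods[0].health_latency_s = 5.0  # the injected delay on the health method
for _ in range(3):
    supervisor.tick()
print(supervisor.replaced)
```

Requiring several consecutive failures before replacing an instance is a common design choice: it avoids restarting on a single slow response while still recovering from a persistent fault.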
Strong and weak dependency scenario drill
In a microservices architecture, there are many dependencies between services. When an unimportant weak dependency goes down, a robust system should still work properly. We use Chaos to run a fault drill and test how our system handles strong and weak dependencies.
1 Make a steady-state hypothesis.
1.1 Switch back to the Container Service ACK console and click the external endpoint of frontend.
1.2 On the Hipster Shop page, refresh the page several times. The order of the products on the page is different each time. You can think of this as the product recommendation service making personalized recommendations that give products different priorities. So we define the steady state as: each time the page is refreshed, the order of the products is different.
2 Simulate real events.
2.1 Switch to the Application High Availability Service console. In the left navigation bar, click My Space.
2.2 On the My Space page, select New blank drill from the New drill drop-down list.
2.3 On the drill configuration page, do the following:
(1) Set the drill name.
(2) In the drill object configuration wizard, select recommendationservice as the drill application and recommendationservice-group as the application group, select a machine from the machine list, and click Add drill.
(3) In the Select drill fault dialog box, choose Java application > Delay > In-container Java delay, and click OK.
(4) In the drill content area, click In-container Java delay.
(5) In the In-container Java delay panel, enter the fully qualified class name, method name, process keyword, and target container name in turn, then click Close.
Fully qualified class name: enter com.alibabacloud.hipstershop.recomendationservice.service.RecommendationServiceImpl.
Method name: enter sortProduct.
Process keyword: enter java.
Target container name: select recommendationservice.
(6) In the drill object, click Save.
(7) Click Next.
(8) In the global configuration, click Next.
(9) In the success dialog box, click Drill details.
2.4 On the drill details page, click Drill.
2.5 In the Start drill dialog box, click OK.
3 Check the effect of the experiment.
3.1 Switch back to the Container Service ACK console. On the Stateless page, click frontend.
3.2 On the frontend page, click the Access method tab, and then click the external endpoint of frontend.
3.3 On the Hipster Shop page, refresh the page several times. The product order no longer changes between refreshes. This shows that the recommendation service is down but other services are not affected.
4 Terminate the experiment.
4.1 Switch to the Application High Availability Service console. On the drill record details page, click Terminate.
4.2 In the Stop drill dialog box, click OK.
4.3 In the result feedback dialog box, click OK.
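One way a caller can treat the recommendation service as a weak dependency is to bound the call with a timeout and fall back to a fixed default order, which matches the behaviour observed in this drill. The sketch below is an assumption about how such a fallback could look, not the Hipster Shop code; PRODUCTS, sort_products, list_products, and the timeout value are hypothetical.

```python
import concurrent.futures
import random
import time
from typing import Callable, List

PRODUCTS = ["camera", "bike", "mug", "lamp"]  # hypothetical catalogue
_executor = concurrent.futures.ThreadPoolExecutor(max_workers=4)


def sort_products(products: List[str], delay_s: float = 0.0) -> List[str]:
    """Stand-in for the recommendation call; delay_s simulates the injected delay."""
    time.sleep(delay_s)
    shuffled = list(products)
    random.shuffle(shuffled)
    return shuffled


def list_products(recommend: Callable[[List[str]], List[str]],
                  timeout_s: float = 0.2) -> List[str]:
    """Return recommended order, or the default order if the dependency is too slow."""
    future = _executor.submit(recommend, PRODUCTS)
    try:
        return future.result(timeout=timeout_s)
    except concurrent.futures.TimeoutError:
        # Weak dependency: degrade gracefully instead of failing the page.
        return list(PRODUCTS)


def delayed_recommendation(products: List[str]) -> List[str]:
    return sort_products(products, delay_s=0.6)


normal_page = list_products(sort_products)
degraded_page = list_products(delayed_recommendation)
print(normal_page, degraded_page)
```

With the delay injected, every call returns the same default order, so the page still renders while the personalized ordering is lost.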
Failure retry scenario drill
In a microservices architecture, a large system is split into many small services with a large number of RPC calls between them. RPC calls often fail because of network jitter and similar causes. A retry mechanism can raise the final success rate of requests, reduce the impact of failures, and make the system more stable. We use Chaos to inject a failure into the system and observe how the system's failure retry behaves.
1 Make a steady-state hypothesis.
1.1 Switch back to the Container Service ACK console. On the Stateless page, click cartservice.
1.2 On the cartservice page, click Scale.
1.3 In the Scale dialog box, change the desired number of pods to 2 and click OK.
Wait for the status to change to Running, which indicates that the pods were scaled out successfully.
1.4 Switch to the Hipster Shop page and click the shopping cart.
When the following page is returned, the shopping cart service is working normally. So we define the steady state as: the frontend shopping cart function works normally.
2 Simulate real events.
2.1 Switch to the Application High Availability Service console. In the left navigation bar, click My Space.
2.2 On the My Space page, select New blank drill from the New drill drop-down list.
2.3 On the drill configuration page, do the following:
(1) Set the drill name.
(2) In the drill object, select cartservice as the drill application and cartservice-group as the application group, select any machine from the machine list, and click Add drill.
(3) In the Select drill fault dialog box, choose Java application > Throw exceptions > In-container Java delay throwing custom exception, and click OK.
(4) In the drill content area, click In-container Java delay throwing custom exception.
(5) In the In-container Java delay throwing custom exception panel, enter the method name, fully qualified class name, exception, process keyword, and target container name in turn, then click Close.
Method name: enter viewCart.
Fully qualified class name: enter com.alibabacloud.hipstershop.cartserviceprovider.service.CartServiceImpl.
Exception: enter java.lang.Exception.
Process keyword: enter java.
Target container name: select cartservice.
(6) In the drill object, click Save.
(7) Click Next.
(8) In the global configuration, click Next.
(9) In the success dialog box, click Drill details.
2.4 On the drill details page, click Drill.
2.5 In the Start drill dialog box, click OK.
3 Check the effect of the experiment.
3.1 Switch to the Hipster Shop page and click the shopping cart.
The following page is returned and the shopping cart cannot be accessed. This is because traffic was not switched to the machine that is still up. It also shows that our system cannot retry on failure: either retry was never designed in, or it did not take effect. Through this fault injection, we found a flaw in the system.
3.2 Switch to the Application High Availability Service console. On the drill record details page, click Terminate.
3.3 In the Stop drill dialog box, click OK.
3.4 In the result feedback dialog box, click OK.
When the following page is returned, the drill has ended.
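The flaw found in this drill is that a failed call is not retried against the replica that is still healthy. A minimal sketch of retry with failover across replicas might look like the following. It is not the Hipster Shop or Dubbo implementation; call_with_retry, healthy_replica, faulty_replica, and CartError are hypothetical names used only for illustration.

```python
from typing import Callable, Optional, Sequence


class CartError(Exception):
    """Stand-in for the exception injected into viewCart."""


def call_with_retry(replicas: Sequence[Callable[[], str]],
                    max_attempts: int = 3) -> str:
    """Try replicas in round-robin order until one succeeds or attempts run out."""
    if not replicas:
        raise ValueError("at least one replica is required")
    last_error: Optional[Exception] = None
    for attempt in range(max_attempts):
        replica = replicas[attempt % len(replicas)]
        try:
            return replica()
        except Exception as exc:  # retry on any failure in this sketch
            last_error = exc
    raise RuntimeError(f"all {max_attempts} attempts failed") from last_error


def healthy_replica() -> str:
    return "cart: 2 items"


def faulty_replica() -> str:
    raise CartError("injected java.lang.Exception")


# With retry, the request fails over from the faulty replica to the healthy one.
cart_view = call_with_retry([faulty_replica, healthy_replica])
print(cart_view)
```

Setting max_attempts to 1 reproduces the behaviour seen in the drill: the first replica fails and the error reaches the user. In a real system, retries are usually limited to idempotent calls and combined with a backoff.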
Microservice drill
After the three scenarios above, we have a basic understanding of chaos engineering and have learned the basic functions of Application High Availability Service. However, configuring the parameters manually is still tedious. Next, let's try a faster and more convenient way: strong and weak dependency governance.
1 Switch to the Application High Availability Service console. In the left navigation bar, click Microservice drill, and select the Strong and weak dependency governance page.
2 On the Strong and weak dependency governance page, click Create governance plan.
3 On the configuration wizard page for creating a governance plan, do the following.
3.1 In Application access, enter a custom plan name, select frontend as the governance application, and click Next.
3.2 In the dialog box stating that dependency governance uses a 30-day governance cycle, click OK.
3.3 In Dependency analysis, wait for the analysis to complete, then click Next.
3.4 In Dependency prediction, choose the strength of each dependency yourself. For example, you can predict nacos-standalone and checkoutservice to be strong dependencies and leave the other dependencies as weak dependencies by default. Then click Next.
3.5 In Dependency verification, select any use case to verify. For example, choose the strong and weak dependency verification use case for frontend and nacos-standalone, and click Verify.
3.6 In the parameter confirmation dialog box before verification, click OK to verify.
Note:
If the window does not jump, check whether the pop-up was blocked and allow it manually.
4 On the drill details page, click Drill.
5 In the Start drill dialog box, click OK.
6 Switch to the Hipster Shop page and click any function on the web page. The Hipster Shop pages and related functions can be accessed normally, which shows that frontend has only a weak dependency on nacos-standalone.
7 Switch to the Application High Availability Service console. On the drill record details page, click Terminate.
8 In the Stop drill dialog box, click OK.
9 In the result feedback dialog box, select Does not meet expectation as the conclusion and Weak dependency as the verification result, click OK, and return to strong and weak dependency governance.
10 In Dependency verification, you can verify other use cases. After verification, click Archive plan.
11 In the dialog box asking whether you are sure you want to archive this plan, click Confirm archive.
When the following page is returned, the archive is complete.
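The verification logic in steps 6 to 9 compares a predicted dependency strength with what the drill actually showed. A small sketch of that comparison, assuming a dependency counts as weak when the steady state holds while it is faulted, is shown below. verify_dependency is a hypothetical helper, not part of the console or any Alibaba Cloud API.

```python
from typing import Dict, Union


def verify_dependency(predicted: str,
                      steady_state_held: bool) -> Dict[str, Union[str, bool]]:
    """Compare a predicted strength ("strong" or "weak") with the drill outcome."""
    if predicted not in ("strong", "weak"):
        raise ValueError("predicted must be 'strong' or 'weak'")
    # If the application keeps working while the dependency is faulted,
    # the dependency is weak; otherwise it is strong.
    verified = "weak" if steady_state_held else "strong"
    return {
        "predicted": predicted,
        "verified": verified,
        "meets_expectation": predicted == verified,
    }


# nacos-standalone was predicted strong, but Hipster Shop kept working.
nacos_result = verify_dependency("strong", steady_state_held=True)
print(nacos_result)
```

This mirrors the feedback chosen in step 9: the prediction was strong, the verified result is weak, so the conclusion does not meet the expectation.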
Related scenarios
Offline data analysis based on EMR
Quickly build an NGINX website with Container Service ACK + Container Network File System CNFS
Copyright notice
This article was written by [Alibaba cloud cloud Lab]. Please include the original link when reposting. Thank you.
https://yzsam.com/2022/04/202204230601372310.html