EMR-based offline data analysis: feedback with rewards
2022-04-23 07:02:00 【Alibaba cloud cloud Lab】
"Walking on the Clouds", phase 3: feedback with rewards
Try the product and submit feedback for a chance to win a custom backpack, a T-shirt, a cute Year of the Tiger mouse pad, or an Aliyun universal voucher worth 5 to 100 yuan. Feedback address:
https://developer.aliyun.com/adc/series/ysmb3
Introduction
With data growing explosively, digital transformation has become a hot topic in the IT industry. Data needs deeper value mining to meet changing future demands. Massive offline data analysis applies to many business scenarios, such as large-scale log analysis in e-commerce, user behavior profiling, and large-scale offline computing and analysis tasks in scientific research.
In this scenario you provision and log in to an EMR Hadoop cluster, then perform simple Hive operations such as loading data and running computations. It shows how to build elastic, low-cost offline big data analysis.
After completing this scenario, you will have learned:
1. Basic operations on an EMR cluster, giving you a preliminary understanding of the EMR product
2. Data transfer to an EMR cluster and simple Hive operations, giving you a preliminary grasp of how to perform offline big data analysis
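The load-then-compute pattern described above can be tried locally before touching a cluster. The sketch below is only an analogy, not the lab's actual steps: it uses Python's built-in `sqlite3` module as a stand-in for Hive, and the table name `access_log`, its columns, and the sample rows are all hypothetical. On an EMR cluster the equivalent work would be done with Hive statements against data stored on the cluster.

```python
import sqlite3

# Hypothetical sample of web access log rows: (user_id, page, bytes_sent).
SAMPLE_LOGS = [
    ("u1", "/home", 512),
    ("u2", "/home", 256),
    ("u1", "/cart", 1024),
    ("u3", "/home", 128),
    ("u1", "/home", 512),
]


def load_and_aggregate(rows):
    """Load rows into a table, then count visits and sum bytes per user."""
    conn = sqlite3.connect(":memory:")  # local stand-in, not a Hive connection
    try:
        conn.execute(
            "CREATE TABLE access_log (user_id TEXT, page TEXT, bytes_sent INTEGER)"
        )
        # Load step: bulk-insert the raw records into the table.
        conn.executemany("INSERT INTO access_log VALUES (?, ?, ?)", rows)
        # Compute step: aggregate per user, most active users first.
        cursor = conn.execute(
            "SELECT user_id, COUNT(*) AS visits, SUM(bytes_sent) AS total_bytes "
            "FROM access_log GROUP BY user_id ORDER BY visits DESC, user_id"
        )
        return cursor.fetchall()
    finally:
        conn.close()


if __name__ == "__main__":
    for user_id, visits, total_bytes in load_and_aggregate(SAMPLE_LOGS):
        print(user_id, visits, total_bytes)
```

The same two phases, loading raw records into a table and then running an aggregate query over it, are what the Hive operations in this scenario perform at much larger scale.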
Background knowledge
E-MapReduce (EMR) is a cloud-native open source big data platform that provides customers with easy-to-integrate open source computing and storage engines, including Hadoop, Hive, Spark, Flink, Presto, ClickHouse, Delta, and Hudi. EMR computing resources can be adjusted according to business needs. EMR can be deployed on ECS and ACK in the Alibaba Cloud public cloud, and on proprietary cloud platforms. Product documentation: https://www.aliyun.com/product/emapreduce
Product advantages
Open source ecosystem: provides high-performance, stable versions of open source big data components such as Hadoop, Spark, Hive, Flink, Kafka, HBase, Presto, Impala, and Hudi, which customers can combine flexibly to suit their scenario
Engine optimization: performance is optimized across multiple engines; for example, Spark SQL performance is 6 times that of the open source version. Using JindoFS with OSS improves performance while ensuring data reliability
Convenient operation and maintenance: clusters, nodes, and services can be monitored and maintained easily through the Alibaba Cloud console and OpenAPI. This greatly improves operations efficiency and lets data engineers focus on business development
Cost savings: cluster resources can be matched automatically to demand, so you pay only for actual usage, which reduces waste and cost. Alibaba Cloud preemptible instances and reserved instance vouchers (RI) are supported to reduce costs further
Elastic resources: cluster resources can be adjusted flexibly, and a cluster based on ECS cloud servers or ACK containers can be created in a few minutes to respond quickly to business needs
Safe and reliable: cluster network security policies are set through VPC and security groups. Kerberos identity authentication and data encryption are supported, and Ranger is used for data access control, ensuring data security
| Comparison dimension | EMR | Self-built Hadoop |
|---|---|---|
| Cost | Resources are pay-as-you-go, cluster resources can be adjusted flexibly, and data is stored in tiers, so resource utilization is high. No additional software license cost. | Resources must be estimated in advance and are relatively fixed, so resource utilization is low. Using a Hadoop distribution requires paying an additional license cost. |
| Performance | Performance is greatly improved over the open source version; for example, EMR SparkSQL performance is 6 times that of the open source version. | Uses the open source community version, so performance must be tuned by the user. |
| Ease of use | A Hadoop cluster starts within minutes, allowing an agile response to business needs. | Purchasing servers and deploying Hadoop ecosystem components takes weeks. |
| Elasticity | Clusters can be started and destroyed temporarily on a per-job basis. Cluster resources can be adjusted dynamically and automatically according to a time schedule or cluster load. The JindoFS-based architecture separates computing from storage, so each can be expanded independently. | Computing and storage are coupled, and resources are relatively fixed and cannot be adjusted flexibly. |
| Security | Supports enterprise-level multi-tenant resource management, table-, column-, and row-level permission control, log auditing, and data encryption. | Multi-tenant management must be configured by the user; the capability is incomplete and cannot meet enterprise-level needs. |
| Reliability | Proven in large-scale enterprise environments. Upgraded in step with the open source version and validated through professional compatibility testing, providing a better experience than the community version. | Users must upgrade the open source version themselves, verify the compatibility of each component version, and fix community bugs on their own. |
| Service | A team of senior big data experts provides after-sales technical support. | The community version has no service support, and a Hadoop distribution requires additional license and service fees. |
Copyright notice
This article was created by [Alibaba cloud cloud Lab]. Please include the original link when reposting. Thank you.
https://yzsam.com/2022/04/202204230601371930.html