Offline Data Analysis Based on EMR - Feedback Rewards
2022-04-23 07:02:00 [Alibaba Cloud Lab]
"Walking on the Clouds" Phase 3: Feedback Rewards
Try the products and submit feedback for a chance to win a custom backpack, a T-shirt, an adorable Year of the Tiger mouse pad, or an Alibaba Cloud general-purpose voucher worth 5 to 100 yuan. Feedback address:
https://developer.aliyun.com/adc/series/ysmb3
Introduction
With data growing explosively, digital transformation has become a hot topic in the IT industry, and data must be mined more deeply for value to meet changing future needs. Massive offline data analysis applies to a wide range of business systems, for example massive log analysis in e-commerce, user behavior profiling, and large-scale offline computing and analysis jobs in scientific research.
In this scenario you provision and log in to an EMR Hadoop cluster, perform simple Hive operations, and use Hive to load data and run computations on it. It demonstrates how to build elastic, low-cost offline big data analysis.
After completing this scenario, you will have learned:
1. Basic operations on an EMR cluster, giving you a preliminary understanding of the EMR product
2. How to transfer data to an EMR cluster and perform simple Hive operations, giving you a basic grasp of how to carry out offline big data analysis
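To make the "load the data, then compute" step more concrete, here is a minimal local sketch in Python. It is only an analogy, not part of the lab: in the EMR scenario this work is done with Hive on the cluster. The tab-delimited log format, the sample rows, the table name `access_log`, and the functions `load_rows` and `page_views_by_url` are all hypothetical; the comments show roughly which HiveQL statement each step corresponds to.

```python
import csv
import io
from collections import Counter

# Hypothetical tab-delimited access-log rows: user_id, url, status.
# In the lab, data like this would sit in HDFS/OSS and be loaded into a Hive table.
RAW_LOG = """u1\t/home\t200
u2\t/cart\t200
u1\t/cart\t404
u3\t/home\t200
u1\t/home\t200
"""


def load_rows(text):
    """Parse tab-delimited lines into typed records.

    Loosely mirrors a Hive table declared with ROW FORMAT DELIMITED and a tab
    field separator: the schema is applied when the text is read.
    """
    reader = csv.reader(io.StringIO(text), delimiter="\t")
    return [
        {"user_id": uid, "url": url, "status": int(status)}
        for uid, url, status in reader
    ]


def page_views_by_url(rows, ok_status=200):
    """Roughly equivalent to the (hypothetical) HiveQL query:

    SELECT url, COUNT(*) FROM access_log WHERE status = 200 GROUP BY url;
    """
    return Counter(r["url"] for r in rows if r["status"] == ok_status)


if __name__ == "__main__":
    rows = load_rows(RAW_LOG)
    for url, views in sorted(page_views_by_url(rows).items()):
        print(url, views)
```

On a real cluster the same filter-and-aggregate logic would run as a distributed Hive job over data in HDFS or OSS, which is what lets this style of analysis scale to massive offline datasets.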
Background knowledge
E-MapReduce (EMR) is a cloud-native, open-source big data platform that provides customers with easy-to-integrate open-source big data compute and storage engines such as Hadoop, Hive, Spark, Flink, Presto, ClickHouse, Delta, and Hudi. EMR computing resources can be adjusted according to business needs. EMR can be deployed on Alibaba Cloud public cloud ECS and ACK, as well as on proprietary cloud platforms. Product documentation: https://www.aliyun.com/product/emapreduce
Product advantages
Open-source ecosystem: provides high-performance, stable versions of open-source big data components such as Hadoop, Spark, Hive, Flink, Kafka, HBase, Presto, Impala, and Hudi, which customers can use flexibly depending on the scenario.
Engine optimization: performance optimizations across multiple engines; for example, Spark SQL is 6 times faster than the open-source version. Using JindoFS with OSS improves performance while ensuring data reliability.
Convenient O&M: easily manage, monitor, and maintain clusters, nodes, and services from the Alibaba Cloud console or through the OpenAPI. This greatly improves O&M efficiency and lets data engineers focus on business development.
Cost savings: cluster resources can be matched to demand automatically, and you pay only for actual usage, reducing resource waste and cost. Alibaba Cloud preemptible instances and reserved instance (RI) vouchers are supported to further reduce costs.
Elastic resources: cluster resources can be adjusted flexibly, and a cluster based on cloud servers (ECS) or containers (ACK) can be created in a few minutes to respond quickly to business needs.
Security and reliability: configure cluster network security policies with VPC and security groups, authenticate with Kerberos, control data access with Ranger, and protect data with encryption.
Dimension | EMR | Self-built Hadoop |
---|---|---|
Cost | Resources are pay-as-you-go, cluster resources can be scaled flexibly, data is stored in tiers, and resource utilization is high. No additional software license fees. | Resources must be estimated in advance and are relatively fixed, so utilization is low. Using a commercial Hadoop distribution incurs additional license fees. |
Performance | Significantly faster than the open-source versions; for example, EMR Spark SQL performance is 6 times that of the open-source version. | Uses the open-source community versions; performance must be tuned yourself. |
Ease of use | A Hadoop cluster starts in minutes, responding quickly to business needs. | Purchasing servers and deploying the Hadoop ecosystem components takes weeks. |
Elasticity | Clusters can be started and released on demand for individual jobs. Cluster resources can be scaled automatically based on a schedule or on cluster load. The JindoFS-based compute-storage separation architecture makes it easy to scale compute and storage independently. | Compute and storage are coupled and resources are relatively fixed, so they cannot be adjusted flexibly. |
Security | Enterprise-grade multi-tenant resource management, table-, column-, and row-level permission control, audit logging, and data encryption. | Multi-tenant management must be configured yourself; capabilities are incomplete and cannot meet enterprise-level needs. |
Reliability | Proven at large scale in enterprise environments, upgraded in step with the open-source versions, and validated by professional compatibility testing, providing a better experience than the community versions. | You must update and upgrade the open-source versions yourself, verify compatibility between component versions, and fix community bugs yourself. |
Service | A professional team of senior big data experts provides after-sales support. | The community versions have no support; a commercial Hadoop distribution requires additional license and service fees. |
Copyright notice
This article was created by [Alibaba Cloud Lab]. Please include a link to the original when reposting. Thank you.
https://yzsam.com/2022/04/202204230601371930.html