EMR-based offline data analysis: rewards for feedback
2022-04-23 07:02:00 【Alibaba cloud cloud Lab】
"Walking on the Clouds", phase three: rewards for feedback
Try the product and submit feedback for a chance to win a custom backpack, a T-shirt, a cute Year of the Tiger mouse pad, or an Alibaba Cloud universal voucher worth 5 to 100 yuan. Feedback address:
https://developer.aliyun.com/adc/series/ysmb3
Introduction
With data growing explosively, digital transformation has become a hot topic in the IT industry, and data needs deeper value mining to meet changing future needs. Large-scale offline data analysis applies to many business systems, for example massive log analysis in e-commerce, user behavior profiling, and large offline computing and analysis jobs in scientific research.
In this scenario you provision and log in to an EMR Hadoop cluster, then perform simple Hive operations such as loading data and running computations. It shows how to build elastic, low-cost offline big data analysis.
After completing this scenario, you will know:
1. Basic operations on an EMR cluster, with a preliminary understanding of the EMR product
2. Data transfer on an EMR cluster and simple Hive operations, with a preliminary grasp of how to perform offline big data analysis
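
The load-then-compute workflow described above can be sketched locally before touching a cluster. This is a minimal illustration, not the lab's actual steps: Python's built-in `sqlite3` stands in for Hive, and the table name `user_log`, its columns, and the sample rows are all hypothetical. On an EMR cluster the corresponding statements would be written in HiveQL (`CREATE TABLE`, `LOAD DATA INPATH ... INTO TABLE`, `SELECT ... GROUP BY`).

```python
import sqlite3

# Hypothetical sample of e-commerce log records: (user_id, action).
rows = [
    ("u1", "view"),
    ("u1", "buy"),
    ("u2", "view"),
    ("u3", "view"),
    ("u3", "view"),
]

# An in-memory SQLite database stands in for the Hive warehouse.
conn = sqlite3.connect(":memory:")

# Step 1: create the table (in Hive: CREATE TABLE user_log ...).
conn.execute("CREATE TABLE user_log (user_id TEXT, action TEXT)")

# Step 2: load the data (in Hive: LOAD DATA INPATH ... INTO TABLE user_log).
conn.executemany("INSERT INTO user_log VALUES (?, ?)", rows)

# Step 3: compute an aggregate, here the number of events per action type.
action_counts = conn.execute(
    "SELECT action, COUNT(*) FROM user_log GROUP BY action ORDER BY action"
).fetchall()

print(action_counts)  # [('buy', 1), ('view', 4)]
conn.close()
```

The same three steps (define a table, load files into it, aggregate with SQL) are what the Hive part of the scenario exercises, only at a scale where the data lives in distributed storage instead of memory.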
Background knowledge
E-MapReduce (EMR) is a cloud-native open-source big data platform that provides customers with simple, easy-to-integrate open-source big data computing and storage engines such as Hadoop, Hive, Spark, Flink, Presto, ClickHouse, Delta, and Hudi. EMR computing resources can be adjusted according to business needs. EMR can be deployed on ECS and ACK in the Alibaba Cloud public cloud, and on proprietary cloud platforms. Product documentation: https://www.aliyun.com/product/emapreduce
Product advantages
Open-source ecosystem: Provides high-performance, stable versions of open-source big data components such as Hadoop, Spark, Hive, Flink, Kafka, HBase, Presto, Impala, and Hudi, which customers can combine flexibly according to their scenario.
Engine optimization: Performance is optimized across multiple engines; for example, Spark SQL performance is 6 times that of the open-source version. Using JindoFS with OSS improves performance while ensuring data reliability.
Convenient operations and maintenance: Clusters, nodes, and services can be monitored and operated easily through the Alibaba Cloud console and OpenAPI. This greatly improves operations efficiency and lets data engineers focus on business development.
Cost savings: Cluster resources can be matched automatically to demand, and you pay only for actual usage, which reduces resource waste and cost. Alibaba Cloud preemptible instances and reserved instance vouchers (RI) are supported to cut costs further.
Elastic resources: Cluster resources can be adjusted flexibly. A cluster based on ECS cloud servers or ACK containers can be created in a few minutes, so you can respond quickly to business needs.
Security and reliability: The cluster network security policy is set through VPC and security groups. Kerberos identity authentication and data encryption are supported, and Ranger provides data access control, ensuring data security.
| Dimension | EMR | Self-built Hadoop |
|---|---|---|
| Cost | Pay-as-you-go resources, flexible adjustment of cluster resources, tiered data storage, and high resource utilization. No additional software license cost. | Resources must be estimated in advance and are relatively fixed, so utilization is low. Using a Hadoop distribution incurs additional license cost. |
| Performance | Greatly improved over the open-source version; for example, EMR SparkSQL performance is 6 times that of the open-source version. | Uses the open-source community version; performance must be tuned by yourself. |
| Ease of use | A Hadoop cluster starts within minutes, allowing an agile response to business needs. | Purchasing servers and deploying Hadoop ecosystem components takes weeks. |
| Elasticity | Clusters can be started and destroyed on a per-job basis. Cluster resources can be adjusted dynamically and automatically by time schedule or cluster load. The JindoFS-based architecture separates computing from storage, so each can be scaled independently. | Computing and storage are coupled, resources are relatively fixed, and they cannot be adjusted flexibly. |
| Security | Supports enterprise-grade multi-tenant resource management, table-, column-, and row-level permission control, log auditing, and data encryption. | Multi-tenant management must be configured by yourself; its capabilities are incomplete and cannot meet enterprise-grade needs. |
| Reliability | Proven in large-scale enterprise environments. Upgraded in step with the open-source version and verified through professional compatibility testing, providing a better experience than the community version. | You must upgrade the open-source version yourself, verify the compatibility of each component version, and fix community bugs on your own. |
| Service | A professional, senior team of big data experts provides after-sales support. | The community version has no service support; a Hadoop distribution requires additional license and service fees. |
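
The cost row above can be made concrete with a small back-of-the-envelope calculation. This is a sketch under stated assumptions only: the hourly demand profile and the price of 1.0 per node-hour are hypothetical numbers, not Alibaba Cloud pricing, and the two functions are illustrative helpers rather than any EMR API.

```python
def pay_as_you_go_cost(hourly_demand, price_per_node_hour):
    """Cost when the cluster is resized every hour to match demand."""
    return sum(hourly_demand) * price_per_node_hour


def fixed_cluster_cost(hourly_demand, price_per_node_hour):
    """Cost when the cluster is sized once for peak demand and never resized."""
    return max(hourly_demand) * len(hourly_demand) * price_per_node_hour


# Hypothetical demand: nodes needed in each of six consecutive hours,
# with a two-hour batch peak in the middle.
hourly_demand = [2, 2, 10, 10, 2, 2]
price = 1.0  # hypothetical price per node-hour

elastic = pay_as_you_go_cost(hourly_demand, price)
fixed = fixed_cluster_cost(hourly_demand, price)
utilization = elastic / fixed  # share of the fixed cluster actually used

print(elastic, fixed, round(utilization, 2))  # 28.0 60.0 0.47
```

Under these assumed numbers the peak-sized cluster is less than half utilized, which is the kind of gap the pay-as-you-go and elasticity rows refer to. Real savings depend on the actual workload shape and pricing.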