当前位置:网站首页>Handwritten Google's first generation distributed computing framework MapReduce
Handwritten Google's first generation distributed computing framework MapReduce
2022-04-23 20:15:00 【nobody234】
6.824 lab1 Handwriting Google First generation distributed computing framework MapReduce
lab1 For almost a week , From the first look at a loss , I have no idea what to do , I learned a little go After reading the knowledge of the paper again, sort out the ideas and design the code , Now I gradually have my own understanding of this framework , It's a really disgusting process (java Contestant's initial writing go It made me sick when I was , But now I feel go It's really a very good language , Xi xi xi ), thank XiyouLinux The partners of the interest group , It's really hard for them to stay alone .
Here is the text
----------------------------------- Split line -------------------------------------------
1. MapReduce What is it , Will it be better than my yogurt bun ?
MapReduce yes Google In succession 2003 Google distributed file system was published in (GFS) After the paper , On 2004 The second heavyweight article on distributed released in , Described in the paper MapReduce Is a distributed parallel computing framework , Popular speaking From the perspective of users ( If there is no special indication in the following content, the user refers to the framework user ),MapReduce The frame is such a thing , He gives the following two interfaces for users to use
map(key,value)
reduce(key,list[value])// this key The 1 pet. key
The user gives the program a task , The framework enlarges and reduces the tasks formally , Then through the user-defined Map Function to split the task in connotation , Thus, a series of key value pairs are generated as return values , The return values of these key value pairs will eventually be aggregated by the framework produce key,list[value] As Reduce The input of the function is processed , After processing, it will be output to the hard disk for users to view , Of course, how to aggregate , How to call map and reduce Distributed processing , Users don't need to know how to output , Users only need to complete these two interface processing tasks to get the results .
The above words are very abstract , Of course, it is abstract to discuss any problem without a specific problem scenario , So if you don't understand it, it doesn't matter. Let's first see what problem he has solved . Look back at this part .
2. Why do we need MapReduce( Distributed computing framework ) Or what problems he solved
2.1 Let's think about these questions first
- What is distributed
My definition of him is : By disassembling and distributing a huge computing or storage task , Cooperate with multiple machines through the network , Process the distributed tasks .( It's simple, but at least you can understand it ) - Why do we do distributed
First of all, we can reach such a consensus , That is, when the amount of data is relatively small , What kind of distributed stuff is bullshit , Single machine performance must be better than multi machine cooperation , Only when the single machine cannot meet the demand, we need to find another way to solve the problem through distribution , It can be said that distribution is actually a helpless move ( That is to say, if those who engage in hardware can help us create machines with unlimited memory and unlimited computing power , There is no need for us who engage in software to optimize , So it's all their fault !!! dog's head ) - Why Google First MapRduce Not me
It's simple because I don't have this need , Google as a universal search engine in the world , He not only crawls a huge amount of Web information on the Internet every day , And save this information , After saving, you have to build an index and analyze the content , To be sure, a single machine can't meet his needs , He's definitely going to do it , But you know , We programmers usually sell very expensive , And we always like fishing, which leads to a long project development cycle ,Google Those bosses are very angry , Some of these bosses know technology, such as Larry - page ( Um. ? page ) When he understood his situation, he quickly summed up the following needs- Need a kind of software with horizontal expansion ability , This lateral expansion capability can be understood as , By buying machines and equipment, we can improve the performance of the service , People don't have to rely on software to optimize
- This machine had better not be too expensive, just use it
Page is a technology savvy. He found that it is a coincidence that distributed systems have such capabilities , But you know , Every programmer has his own expertise , Not everyone will be distributed , So is it possible that those students who can do distributed can do something so that other students who can't do distributed can also do distributed ( A little around, ha ha ),MapReduce emerge as the times require .
Let's talk about it as a framework, which shields the details of its own distributed implementation , Distributed storage of data 、 data communication 、 Fault tolerant processing and other parallel computing involves many complex details at the bottom of the system, which are handled by the system , It greatly reduces the burden of software developers . How does he abstract to separate distributed systems from specific tasks
2.2 Our goal is to abstract map reduce
My mother likes to buy this kind of eight treasure rice and cook porridge , This kind of rice is stirred evenly by people in the supermarket and then sold , Out of curiosity, I'd like to see how much it costs , If it's expensive, buy your own ingredients and mix them , So I divided the rice my mother bought into three piles , A pile for my dad , Give me a pile , A pile for my mother , The three of them , The rice in front of them is divided into different types ( millet , Red bean , Mung bean , Chinese wolfberry , Lily …) Divided into 8 Little heap (partition), Three people are 24 Pile up , I think they'll count this 24 Small piles are synthesized according to the type 8 Little heap (shuffle), My sister calculated the price of red bean pile by multiplying the price of ingredients by the weight, and then told me the price of red bean pile ( return ) I wrote it down in my book ( Discs ), My mother is Xiaomi , My father is a mung bean .... I finally calculated how much these eight treasure rice should be sold if they were not mixed according to the records (reduce). Finally, I found that the price difference was not too much
I can't work out this result alone , Fortunately, I will. I can call my family to help , They also don't need to know what the working process of the whole system is , they (worker) as long as
- Perform sorting and splitting (map) Eight little piles
- Execute and calculate the price according to each pile (reduce) And tell me
I (master) It only needs
- Formally split a large pile into three small piles (nMap)
- take 24 Pile mixing 8 Pile up (nReduce)
- Write the calculation results of each pile in the book ( Discs )
In this way, we completed the calculation task happily .
I mean, all concrete things can be abstracted , If necessary , Abstract things can also be embodied , As long as the model is well built , Any concrete problem can be set on the abstract template , A large task can certainly be split in form or in task volume , But if the problem requires the aggregation of certain features to operate together , Splitting is not a mindless thing , We need to break it up formally first , Let the task get smaller first , Then aggregate from the characteristics (Map), Aggregate and then split , In this way, we can meet our calculation requirements (reduce) This is our abstraction .
3. MapReduce Frame design ideas
Now let's change our perspective , Not from the user's point of view , Instead, think about how the framework should be designed , Let's consider the following questions
- form
- Need a central control Coordinator Scheduling tasks
- It takes a lot of worker Perform tasks
- Execute the process
- Coordinator
- After startup, initialize the generation according to the provided file mapjob list
- take DistributeJob Method registration for external use rpc call
- When the outside world has a request , distribution map
- When mapJob Initialize after distribution reducejob list
- distribution reduceJob
- work
- Ask for a task
- Determine the task category (reduce/map)
- inform coorfinator The task that oneself completes , And reapply for the task
- Coordinator
- Possible problems
- coordinator What to do if I hang up , Can't , single master,master Hang up and the system will collapse
- worker What to do if I hang up , Reassign what originally belonged to this worker The task of
4. Code presentation
The details are here https://github.com/nobodyw-cell/labs/tree/main/MIT_6.824/src
5. Optimized details and optimized details
- Concurrent processing is not particularly good , Used a lot of locks , Not based on go Of CSP Concurrency model , Use pipes to do
- worker inform coordinator I didn't tell you the type of task I completed when I completed the task , Easy to cause confusion
版权声明
本文为[nobody234]所创,转载请带上原文链接,感谢
https://yzsam.com/2022/04/202204210555176566.html
边栏推荐
- R语言ggplot2可视化:ggplot2可视化散点图并使用geom_mark_ellipse函数在数据簇或数据分组的数据点周围添加椭圆进行注释
- Vericrypt file hard disk encryption tutorial
- Design of warehouse management database system
- Kubernetes getting started to proficient - install openelb on kubernetes
- Possible root causes include a too low setting for -Xss and illegal cyclic inheritance dependencies
- Error reported by Azkaban: Azkaban jobExecutor. utils. process. ProcessFailureException: Process exited with code 64
- Devops integration - environment variables and building tools of Jenkins service
- Project training of Software College of Shandong University - Innovation Training - network security shooting range experimental platform (V)
- Unity general steps for creating a hyper realistic 3D scene
- PCL点云处理之直线与平面的交点计算(五十三)
猜你喜欢
考研英语唐叔的语法课笔记
Unity创建超写实三维场景的一般步骤
selenium.common.exceptions.WebDriverException: Message: ‘chromedriver‘ executable needs to be in PAT
Distinction between pointer array and array pointer
CVPR 2022 | QueryDet:使用级联稀疏query加速高分辨率下的小目标检测
山东大学软件学院项目实训-创新实训-网络安全靶场实验平台(八)
MySQL 进阶 锁 -- MySQL锁概述、MySQL锁的分类:全局锁(数据备份)、表级锁(表共享读锁、表独占写锁、元数据锁、意向锁)、行级锁(行锁、间隙锁、临键锁)
Shanda Wangan shooting range experimental platform project - personal record (IV)
Project training of Software College of Shandong University - Innovation Training - network security shooting range experimental platform (VII)
Kubernetes entry to mastery - bare metal loadbalance 80 443 port exposure precautions
随机推荐
NC basic usage 2
Shanda Wangan shooting range experimental platform project - personal record (V)
2022 - Data Warehouse - [time dimension table] - year, week and holiday
Kibana reports an error server is not ready yet. Possible causes
Vericrypt file hard disk encryption tutorial
論文寫作 19: 會議論文與期刊論文的區別
MySQL 进阶 锁 -- MySQL锁概述、MySQL锁的分类:全局锁(数据备份)、表级锁(表共享读锁、表独占写锁、元数据锁、意向锁)、行级锁(行锁、间隙锁、临键锁)
R语言使用econocharts包创建微观经济或宏观经济图、indifference函数可视化无差异曲线、自定义计算交叉点、自定义配置indifference函数的参数丰富可视化效果
Cadence OrCAD capture batch change component packaging function introduction graphic tutorial and video demonstration
Error reported by Azkaban: Azkaban jobExecutor. utils. process. ProcessFailureException: Process exited with code 64
[problem solving] 'ASCII' codec can't encode characters in position XX XX: ordinal not in range (128)
PCL点云处理之直线与平面的交点计算(五十三)
An error is reported when sqoop imports data from Mysql to HDFS: sqlexception in nextkeyvalue
【数值预测案例】(3) LSTM 时间序列电量预测,附Tensorflow完整代码
How to create bep-20 pass on BNB chain
Introduction to link database function of cadence OrCAD capture CIS replacement components, graphic tutorial and video demonstration
MySQL advanced lock - overview of MySQL locks and classification of MySQL locks: global lock (data backup), table level lock (table shared read lock, table exclusive write lock, metadata lock and inte
考研英语唐叔的语法课笔记
Sqoop imports tinyint type fields to boolean type
R语言ggplot2可视化分面图(facet_wrap)、使用lineheight参数自定义设置分面图标签栏(灰色标签栏)的高度