当前位置:网站首页>Flink Yarn Per Job - RM启动SlotManager
Flink Yarn Per Job - RM启动SlotManager
2022-08-05 11:05:00 【hyunbar】

ResourceManager
public final void onStart() throws Exception {
try {
startResourceManagerServices();
} catch (Throwable t) {
final ResourceManagerException exception = new ResourceManagerException(String.format("Could not start the ResourceManager %s", getAddress()), t);
onFatalError(exception);
throw exception;
}
}
private void startResourceManagerServices() throws Exception {
try {
leaderElectionService = highAvailabilityServices.getResourceManagerLeaderElectionService();
// 创建了Yarn的RM和NM的客户端,初始化并启动
initialize();
// 通过选举服务,启动ResourceManager
leaderElectionService.start(this);
jobLeaderIdService.start(new JobLeaderIdActionsImpl());
registerTaskExecutorMetrics();
} catch (Exception e) {
handleStartResourceManagerServicesException(e);
}
}
创建了Yarn的RM和NM的客户端,初始化并启动
通过选举服务,启动ResourceManager
创建了Yarn的RM和NM的客户端
ActiveResourceManager
@Override
protected void initialize() throws ResourceManagerException {
try {
resourceManagerDriver.initialize(
this,
new GatewayMainThreadExecutor(),
ioExecutor);
} catch (Exception e) {
throw new ResourceManagerException("Cannot initialize resource provider.", e);
}
}
AbstractResourceManagerDriver
@Override
public final void initialize(
ResourceEventHandler<WorkerType> resourceEventHandler,
ScheduledExecutor mainThreadExecutor,
Executor ioExecutor) throws Exception {
this.resourceEventHandler = Preconditions.checkNotNull(resourceEventHandler);
this.mainThreadExecutor = Preconditions.checkNotNull(mainThreadExecutor);
this.ioExecutor = Preconditions.checkNotNull(ioExecutor);
// 下追
initializeInternal();
}
protected abstract void initializeInternal() throws Exception;
YarnResourceManagerDriver
@Override
protected void initializeInternal() throws Exception {
final YarnContainerEventHandler yarnContainerEventHandler = new YarnContainerEventHandler();
try {
// 创建Yarn的ResourceManager的客户端,并且初始化和启动
resourceManagerClient = yarnResourceManagerClientFactory.createResourceManagerClient(
yarnHeartbeatIntervalMillis,
yarnContainerEventHandler);
resourceManagerClient.init(yarnConfig);
resourceManagerClient.start();
final RegisterApplicationMasterResponse registerApplicationMasterResponse = registerApplicationMaster();
getContainersFromPreviousAttempts(registerApplicationMasterResponse);
taskExecutorProcessSpecContainerResourcePriorityAdapter =
new TaskExecutorProcessSpecContainerResourcePriorityAdapter(
registerApplicationMasterResponse.getMaximumResourceCapability(),
ExternalResourceUtils.getExternalResources(flinkConfig, YarnConfigOptions.EXTERNAL_RESOURCE_YARN_CONFIG_KEY_SUFFIX));
} catch (Exception e) {
throw new ResourceManagerException("Could not start resource manager client.", e);
}
// 创建yarn的 NodeManager的客户端,并且初始化和启动
nodeManagerClient = yarnNodeManagerClientFactory.createNodeManagerClient(yarnContainerEventHandler);
nodeManagerClient.init(yarnConfig);
nodeManagerClient.start();
}
创建Yarn的ResourceManager的客户端,并且初始化和启动
创建yarn的 NodeManager的客户端,并且初始化和启动
启动SlotManager
StandaloneLeaderElectionService
@Override
public void start(LeaderContender newContender) throws Exception {
if (contender != null) {
// Service was already started
throw new IllegalArgumentException("Leader election service cannot be started multiple times.");
}
contender = Preconditions.checkNotNull(newContender);
// directly grant leadership to the given contender
contender.grantLeadership(HighAvailabilityServices.DEFAULT_LEADER_ID);
}
ResourceManager
public void grantLeadership(final UUID newLeaderSessionID) {
final CompletableFuture<Boolean> acceptLeadershipFuture = clearStateFuture
// 下追
.thenComposeAsync((ignored) -> tryAcceptLeadership(newLeaderSessionID), getUnfencedMainThreadExecutor());
... ...
}
private CompletableFuture<Boolean> tryAcceptLeadership(final UUID newLeaderSessionID) {
if (leaderElectionService.hasLeadership(newLeaderSessionID)) {
... ...
startServicesOnLeadership();
return prepareLeadershipAsync().thenApply(ignored -> true);
} else {
return CompletableFuture.completedFuture(false);
}
}
private void startServicesOnLeadership() { // 启动心跳服务:TaskManager、JobMaster startHeartbeatServices(); // 启动slotManager slotManager.start(getFencingToken(), getMainThreadExecutor(), new ResourceActionsImpl()); onLeadership(); } |
启动心跳服务:TaskManager、JobMaster
启动slotManager
private void startHeartbeatServices() {
taskManagerHeartbeatManager = heartbeatServices.createHeartbeatManagerSender(
resourceId,
new TaskManagerHeartbeatListener(),
getMainThreadExecutor(),
log);
jobManagerHeartbeatManager = heartbeatServices.createHeartbeatManagerSender(
resourceId,
new JobManagerHeartbeatListener(),
getMainThreadExecutor(),
log);
}
作为资源的老大,肯定要跟task小弟和job去通信
SlotManagerImpl
public void start(ResourceManagerId newResourceManagerId, Executor newMainThreadExecutor, ResourceActions newResourceActions) {
LOG.info("Starting the SlotManager.");
this.resourceManagerId = Preconditions.checkNotNull(newResourceManagerId);
mainThreadExecutor = Preconditions.checkNotNull(newMainThreadExecutor);
resourceActions = Preconditions.checkNotNull(newResourceActions);
started = true;
taskManagerTimeoutsAndRedundancyCheck = scheduledExecutor.scheduleWithFixedDelay(
() -> mainThreadExecutor.execute(
// 检查超时和多余的TaskManager
() -> checkTaskManagerTimeoutsAndRedundancy()),
0L,
taskManagerTimeout.toMilliseconds(),
TimeUnit.MILLISECONDS);
slotRequestTimeoutCheck = scheduledExecutor.scheduleWithFixedDelay(
() -> mainThreadExecutor.execute(
() -> checkSlotRequestTimeouts()),
0L,
slotRequestTimeout.toMilliseconds(),
TimeUnit.MILLISECONDS);
registerSlotManagerMetrics();
}
检查超时和多余的TaskManager
void checkTaskManagerTimeoutsAndRedundancy() {
if (!taskManagerRegistrations.isEmpty()) {
long currentTime = System.currentTimeMillis();
ArrayList<TaskManagerRegistration> timedOutTaskManagers = new ArrayList<>(taskManagerRegistrations.size());
// first retrieve the timed out TaskManagers
for (TaskManagerRegistration taskManagerRegistration : taskManagerRegistrations.values()) {
if (currentTime - taskManagerRegistration.getIdleSince() >= taskManagerTimeout.toMilliseconds()) {
// we collect the instance ids first in order to avoid concurrent modifications by the
// ResourceActions.releaseResource call
timedOutTaskManagers.add(taskManagerRegistration);
}
}
int slotsDiff = redundantTaskManagerNum * numSlotsPerWorker - freeSlots.size();
if (freeSlots.size() == slots.size()) {
// No need to keep redundant taskManagers if no job is running.
// 如果没有job在运行,释放taskmanager
releaseTaskExecutors(timedOutTaskManagers, timedOutTaskManagers.size());
} else if (slotsDiff > 0) {
// Keep enough redundant taskManagers from time to time.
// 保证随时有足够额taskmanager
int requiredTaskManagers = MathUtils.divideRoundUp(slotsDiff, numSlotsPerWorker);
allocateRedundantTaskManagers(requiredTaskManagers);
} else {
// second we trigger the release resource callback which can decide upon the resource release
int maxReleaseNum = (-slotsDiff) / numSlotsPerWorker;
releaseTaskExecutors(timedOutTaskManagers, Math.min(maxReleaseNum, timedOutTaskManagers.size()));
}
}
}

边栏推荐
- 【加密解密】明文加密解密-已实现【已应用】
- 2022技能大赛训练题:交换机snmp配置
- 智能算力的枢纽如何构建?中国云都的淮海智算中心打了个样
- How to choose coins and determine the corresponding strategy research
- 四、kubeadm单master
- How OpenHarmony Query Device Type
- 七夕来袭!还要做CDH数据迁移怎么办?来看看DistCp
- 支持向量机SVM
- MMDetection in action: MMDetection training and testing
- #yyds干货盘点#【愚公系列】2022年08月 Go教学课程 001-Go语言前提简介
猜你喜欢
随机推荐
DocuWare平台——文档管理的内容服务和工作流自动化的平台详细介绍(下)
微服务结合领域驱动设计落地
解决2022Visual Studio中scanf返回值被忽略问题
Google启动通用图像嵌入挑战赛
365天挑战LeetCode1000题——Day 050 在二叉树中增加一行 二叉树
The fuse: OAuth 2.0 four authorized login methods must read
TiDB 6.0 Placement Rules In SQL 使用实践
软件测试之集成测试
朴素贝叶斯
Guys, I am a novice. I use flinksql to write a simple count of user visits according to the document, but it ends after executing it once.
sqlserver编写通用脚本实现获取一年前日期的方法
What are the standards for electrical engineering
Android development with Kotlin programming language - basic data types
poj2287 Tian Ji -- The Horse Racing(2016xynu暑期集训检测 -----C题)
秘乐短视频挖矿系统开发详情
学生信息管理系统(第一次.....)
The host computer develops C# language: simulates the STC serial port assistant to receive the data sent by the microcontroller
MySQL 中 auto_increment 自动插入主键值
智能算力的枢纽如何构建?中国云都的淮海智算中心打了个样
Linux:记一次CentOS7安装MySQL8(博客合集)







