
Yarn core parameter configuration

2022-04-23 10:12:00 【zhaojiew】

These notes are based on the Atguigu (Shang Silicon Valley) Hadoop course material. I have no environment to test them in yet, so this is a placeholder to fill in later.

Yarn configuration case study and related parameters

Requirement: count how many times each word appears in 1 GB of data. There are 3 servers, each with 4 GB of memory and a 4-core, 4-thread CPU.

1 GB / 128 MB = 8 MapTasks, plus 1 ReduceTask and 1 mrAppMaster, for 10 tasks in total. Across 3 nodes that averages 10 / 3 ≈ 3 tasks per node (distributed 4, 3, 3).
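
The arithmetic above can be sketched in a few lines of Python. This is only an illustration: the function name `plan_containers` is made up here, the split size is assumed to equal the 128 MB used in the text, and the even spread across nodes is an idealization rather than something Yarn guarantees.

```python
import math

def plan_containers(input_mb, split_mb, reduce_tasks, nodes):
    """Estimate the task count and an idealized even spread across nodes."""
    map_tasks = math.ceil(input_mb / split_mb)   # one MapTask per input split
    total = map_tasks + reduce_tasks + 1         # +1 for the mrAppMaster
    base, extra = divmod(total, nodes)
    # The first `extra` nodes take one more task than the others.
    per_node = [base + 1 if i < extra else base for i in range(nodes)]
    return map_tasks, total, per_node

map_tasks, total, per_node = plan_containers(1024, 128, 1, 3)
print(map_tasks, total, per_node)  # 8 10 [4, 3, 3]
```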

Modify yarn-site.xml:

<!-- Select the scheduler; the default is the Capacity Scheduler -->
<property>
    <description>The class to use as the resource scheduler.</description>
    <name>yarn.resourcemanager.scheduler.class</name>
    <value>org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler</value>
</property>
<!-- Number of ResourceManager threads that handle scheduler requests; default 50. If more than 50 jobs are submitted this value can be raised, but not beyond 3 nodes * 4 threads = 12 threads (in practice no more than 8 once other processes are accounted for) -->
<property>
    <description>Number of threads to handle scheduler interface.</description>
    <name>yarn.resourcemanager.scheduler.client.thread-count</name>
    <value>8</value>
</property>
<!-- Whether Yarn auto-detects the hardware configuration; default false. If the node runs many other applications, configure it manually; if it runs nothing else, auto-detection can be used -->
<property>
    <description>Enable auto-detection of node capabilities such as memory and CPU.
    </description>
    <name>yarn.nodemanager.resource.detect-hardware-capabilities</name>
    <value>false</value>
</property>
<!-- Whether to count logical processors (hyperthreads) as CPU cores; default false, meaning physical cores are counted -->
<property>
    <description>Flag to determine if logical processors(such as
hyperthreads) should be counted as cores. Only applicable on Linux
when yarn.nodemanager.resource.cpu-vcores is set to -1 and
yarn.nodemanager.resource.detect-hardware-capabilities is true.
    </description>
    <name>yarn.nodemanager.resource.count-logical-processors-as-cores</name>
    <value>false</value>
</property>
<!-- Multiplier for converting physical cores to vcores; default 1.0 -->
<property>
    <description>Multiplier to determine how to convert physical cores to
vcores. This value is used if yarn.nodemanager.resource.cpu-vcores
is set to -1(which implies auto-calculate vcores) and
yarn.nodemanager.resource.detect-hardware-capabilities is set to true.
The number of vcores will be calculated as number of CPUs * multiplier.
    </description>
    <name>yarn.nodemanager.resource.pcores-vcores-multiplier</name>
    <value>1.0</value>
</property>
<!-- Memory available to the NodeManager; default 8 GB, changed to 4 GB -->
<property>
    <description>Amount of physical memory, in MB, that can be allocated
for containers. If set to -1 and
yarn.nodemanager.resource.detect-hardware-capabilities is true, it is
automatically calculated(in case of Windows and Linux).
In other cases, the default is 8192MB.
    </description>
    <name>yarn.nodemanager.resource.memory-mb</name>
    <value>4096</value>
</property>
<!-- Number of CPU cores (vcores) available to the NodeManager; defaults to 8 when not auto-detected from the hardware, changed to 4 -->
<property>
    <description>Number of vcores that can be allocated
for containers. This is used by the RM scheduler when allocating
resources for containers. This is not used to limit the number of
CPUs used by YARN containers. If it is set to -1 and
yarn.nodemanager.resource.detect-hardware-capabilities is true, it is
automatically determined from the hardware in case of Windows and Linux.
In other cases, number of vcores is 8 by default.</description>
    <name>yarn.nodemanager.resource.cpu-vcores</name>
    <value>4</value>
</property>
<!-- Minimum container memory; default 1 GB -->
<property>
    <description>The minimum allocation for every container request at the
RM in MBs. Memory requests lower than this will be set to the value of
this property. Additionally, a node manager that is configured to have
less memory than this value will be shut down by the resource manager.
    </description>
    <name>yarn.scheduler.minimum-allocation-mb</name>
    <value>1024</value>
</property>
<!-- Maximum container memory; default 8 GB, changed to 2 GB -->
<property>
    <description>The maximum allocation for every container request at the
RM in MBs. Memory requests higher than this will throw an
InvalidResourceRequestException.
    </description>
    <name>yarn.scheduler.maximum-allocation-mb</name>
    <value>2048</value>
</property>
<!-- Minimum number of vcores per container; default 1 -->
<property>
    <description>The minimum allocation for every container request at the
RM in terms of virtual CPU cores. Requests lower than this will be set to
the value of this property. Additionally, a node manager that is configured
to have fewer virtual cores than this value will be shut down by the
resource manager.
    </description>
    <name>yarn.scheduler.minimum-allocation-vcores</name>
    <value>1</value>
</property>
<!-- Maximum number of vcores per container; default 4, changed to 2 -->
<property>
    <description>The maximum allocation for every container request at the
RM in terms of virtual CPU cores. Requests higher than this will throw an
InvalidResourceRequestException.</description>
    <name>yarn.scheduler.maximum-allocation-vcores</name>
    <value>2</value>
</property>
<!-- Virtual memory check; enabled by default, changed to disabled -->
<property>
    <description>Whether virtual memory limits will be enforced for
containers.</description>
    <name>yarn.nodemanager.vmem-check-enabled</name>
    <value>false</value>
</property>
<!-- Ratio of virtual memory to physical memory; default 2.1 -->
<property>
    <description>Ratio between virtual memory to physical memory when
setting memory limits for containers. Container allocations are
expressed in terms of physical memory, and virtual memory usage is
allowed to exceed this allocation by this ratio.
    </description>
    <name>yarn.nodemanager.vmem-pmem-ratio</name>
    <value>2.1</value>
</property>
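
As a rough sanity check on the memory and vcore values above, the following Python sketch parses a yarn-site.xml fragment and verifies that minimum ≤ maximum ≤ node capacity for both resources. The helper names `load_props` and `check_limits` are hypothetical, and the checks are common-sense constraints for this example, not rules that Yarn itself is claimed to enforce in this form.

```python
import xml.etree.ElementTree as ET

# Values copied from the yarn-site.xml settings above, wrapped in a root element.
SAMPLE = """
<configuration>
  <property><name>yarn.nodemanager.resource.memory-mb</name><value>4096</value></property>
  <property><name>yarn.nodemanager.resource.cpu-vcores</name><value>4</value></property>
  <property><name>yarn.scheduler.minimum-allocation-mb</name><value>1024</value></property>
  <property><name>yarn.scheduler.maximum-allocation-mb</name><value>2048</value></property>
  <property><name>yarn.scheduler.minimum-allocation-vcores</name><value>1</value></property>
  <property><name>yarn.scheduler.maximum-allocation-vcores</name><value>2</value></property>
</configuration>
"""

def load_props(xml_text):
    """Return a {name: value} dict for every <property> element."""
    root = ET.fromstring(xml_text)
    return {p.findtext("name").strip(): p.findtext("value").strip()
            for p in root.iter("property")}

def check_limits(props):
    """Return (list of problems, most containers one node can hold)."""
    node_mb = int(props["yarn.nodemanager.resource.memory-mb"])
    node_vc = int(props["yarn.nodemanager.resource.cpu-vcores"])
    min_mb = int(props["yarn.scheduler.minimum-allocation-mb"])
    max_mb = int(props["yarn.scheduler.maximum-allocation-mb"])
    min_vc = int(props["yarn.scheduler.minimum-allocation-vcores"])
    max_vc = int(props["yarn.scheduler.maximum-allocation-vcores"])
    problems = []
    if not min_mb <= max_mb <= node_mb:
        problems.append("memory limits are not ordered min <= max <= node")
    if not min_vc <= max_vc <= node_vc:
        problems.append("vcore limits are not ordered min <= max <= node")
    # With minimum-size containers, whichever resource runs out first sets the cap.
    most = min(node_mb // min_mb, node_vc // min_vc)
    return problems, most

problems, most = check_limits(load_props(SAMPLE))
print(problems, most)  # [] 4
```

With these settings a node can hold at most 4 minimum-size containers, which matches the busiest node in the (4, 3, 3) distribution from the requirement.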

Capacity scheduler multi queue configuration

Requirement 1: the default queue gets 40% of total memory with a maximum capacity of 60%; the hive queue gets 60% of total memory with a maximum capacity of 80%.

Requirement 2: configure queue priority.

Because the Capacity Scheduler is the default scheduler, there is no need to specify its configuration file explicitly; the file is capacity-scheduler.xml.

Configure capacity-scheduler.xml as follows:

Modify the existing properties:

<!-- Specify multiple queues; add the hive queue -->
<property>
    <name>yarn.scheduler.capacity.root.queues</name>
    <value>default,hive</value>
    <description>The queues at this level (root is the root queue).
    </description>
</property>
<!-- Lower the rated capacity of the default queue to 40%; default 100% -->
<property>
    <name>yarn.scheduler.capacity.root.default.capacity</name>
    <value>40</value>
</property>
<!-- Lower the maximum capacity of the default queue to 60%; default 100% -->
<property>
    <name>yarn.scheduler.capacity.root.default.maximum-capacity</name>
    <value>60</value>
</property>

Add properties for the new queue

These also go in capacity-scheduler.xml. Copy the default queue's properties and modify them, which amounts to writing each property a second time for the hive queue.

<!-- Rated resource capacity of the hive queue -->
<property>
    <name>yarn.scheduler.capacity.root.hive.capacity</name>
    <value>60</value>
</property>
<!-- How much of the queue's resources a single user may use; 1 means at most the queue's configured capacity -->
<property>
    <name>yarn.scheduler.capacity.root.hive.user-limit-factor</name>
    <value>1</value>
</property>
<!-- Maximum resource capacity of the hive queue -->
<property>
    <name>yarn.scheduler.capacity.root.hive.maximum-capacity</name>
    <value>80</value>
</property>
<!-- Start the hive queue -->
<property>
    <name>yarn.scheduler.capacity.root.hive.state</name>
    <value>RUNNING</value>
</property>
<!-- Which users may submit jobs to the queue -->
<property>
    <name>yarn.scheduler.capacity.root.hive.acl_submit_applications</name>
    <value>*</value>
</property>
<!-- Which users have administrator rights on the queue (view/kill) -->
<property>
    <name>yarn.scheduler.capacity.root.hive.acl_administer_queue</name>
    <value>*</value>
</property>
<!-- Which users may set the priority of submitted jobs -->
<property>
    <name>yarn.scheduler.capacity.root.hive.acl_application_max_priority</name>
    <value>*</value>
</property>
<!-- Job timeout; it can be updated with: yarn application -appId appId -updateLifetime Timeout -->
<!-- If an application specifies a timeout, the value it requests cannot exceed this maximum for applications submitted to this queue -->
<property>
    <name>yarn.scheduler.capacity.root.hive.maximum-application-lifetime</name>
    <value>-1</value>
</property>
<!-- If an application does not specify a timeout, default-application-lifetime is used as the default -->
<property>
    <name>yarn.scheduler.capacity.root.hive.default-application-lifetime</name>
    <value>-1</value>
</property>
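
The queue percentages can be checked with a small Python sketch. The Capacity Scheduler requires the capacities of sibling queues to add up to 100, and a queue's maximum capacity should not be below its rated capacity. The function name `check_queues` is hypothetical, and the cluster total of 3 × 4096 MB is an assumption taken from the three 4 GB NodeManagers configured earlier.

```python
def check_queues(queues, cluster_mb):
    """Validate sibling queue percentages and convert them to MB.

    Returns {queue: (rated_mb, maximum_mb)}, rounded down to whole MB.
    """
    total = sum(q["capacity"] for q in queues.values())
    if total != 100:
        raise ValueError(f"sibling capacities sum to {total}, expected 100")
    report = {}
    for name, q in queues.items():
        if q["maximum-capacity"] < q["capacity"]:
            raise ValueError(f"{name}: maximum-capacity is below capacity")
        report[name] = (cluster_mb * q["capacity"] // 100,
                        cluster_mb * q["maximum-capacity"] // 100)
    return report

# Percentages from capacity-scheduler.xml above.
queues = {
    "default": {"capacity": 40, "maximum-capacity": 60},
    "hive": {"capacity": 60, "maximum-capacity": 80},
}
report = check_queues(queues, 3 * 4096)
print(report)  # {'default': (4915, 7372), 'hive': (7372, 9830)}
```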

Distribute the configuration files, then restart Yarn or refresh the queues:

yarn rmadmin -refreshQueues

Submit a job to the hive queue

From the shell

hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-3.1.3.jar wordcount -D mapreduce.job.queuename=hive /input /output

From a jar

By default, jobs are submitted to the default queue. To submit a job to another queue, declare the queue in the Driver:

Configuration conf = new Configuration();
conf.set("mapreduce.job.queuename","hive");
Job job = Job.getInstance(conf);

Task priority

The Capacity Scheduler supports job priorities: when resources are tight, higher-priority jobs get resources first. By default, Yarn limits the priority of every job to 0, so this limit must be raised before job priorities can be used.

Modify yarn-site.xml and add the following parameter:

<property>
	<name>yarn.cluster.max-application-priority</name>
	<value>5</value>
</property>

Distribute the configuration and restart Yarn.

Simulate a resource-constrained environment by submitting pi calculation jobs:

hadoop jar /opt/module/hadoop-3.1.3/share/hadoop/mapreduce/hadoop-mapreduce-examples-3.1.3.jar pi 5 2000000

While they are running, submit a higher-priority job and observe that it jumps the queue:
hadoop jar /opt/module/hadoop-3.1.3/share/hadoop/mapreduce/hadoop-mapreduce-examples-3.1.3.jar pi -D mapreduce.job.priority=5 5 2000000

Change the priority of a running job

yarn application -appId <ApplicationID> -updatePriority <priority>
yarn application -appId application_1611133087930_0009 -updatePriority 5
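
The effect of yarn.cluster.max-application-priority can be pictured with a toy model in Python. This is not Yarn's scheduler: the class `PendingApps` is hypothetical, and it assumes only that a requested priority is capped at the cluster maximum and that, among waiting jobs, the higher priority goes first with submission order as the tie-breaker.

```python
import heapq
from itertools import count

class PendingApps:
    """Toy queue of waiting jobs ordered by priority, then submission order."""

    def __init__(self, cluster_max_priority=0):
        self.cluster_max = cluster_max_priority
        self._heap = []
        self._seq = count()  # tie-breaker that preserves submission order

    def submit(self, app_id, priority=0):
        # A request above the cluster maximum is capped to the maximum.
        effective = min(priority, self.cluster_max)
        # heapq is a min-heap, so negate the priority to pop the highest first.
        heapq.heappush(self._heap, (-effective, next(self._seq), app_id))
        return effective

    def next_app(self):
        return heapq.heappop(self._heap)[2]

# With the default limit of 0, a requested priority of 5 has no effect.
capped = PendingApps(cluster_max_priority=0)
capped.submit("pi-1")
print(capped.submit("pi-2", priority=5), capped.next_app())  # 0 pi-1

# With the limit raised to 5, the later high-priority job jumps ahead.
raised = PendingApps(cluster_max_priority=5)
raised.submit("pi-1")
print(raised.submit("pi-2", priority=5), raised.next_app())  # 5 pi-2
```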

Fair scheduler multi queue configuration

Create two queues, test and atguigu (named after the groups the users belong to). The expected behavior is as follows: if the user specifies a queue when submitting a job, the job runs in that queue; if no queue is specified, a job submitted by the test user runs in the root.group.test queue and a job submitted by the atguigu user runs in the root.group.atguigu queue.

Configuring the Fair Scheduler involves two files: yarn-site.xml and the Fair Scheduler queue allocation file, fair-scheduler.xml (the file name can be customized).

Modify yarn-site.xml to select the Fair Scheduler and specify the location of its allocation file:

<property>
    <name>yarn.resourcemanager.scheduler.class</name>
    <value>org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler</value>
    <description>Use the Fair Scheduler</description>
</property>
<property>
    <name>yarn.scheduler.fair.allocation.file</name>
    <value>/opt/module/hadoop-3.1.3/etc/hadoop/fair-scheduler.xml</value>
    <description>Location of the Fair Scheduler queue allocation file</description>
</property>
<property>
    <name>yarn.scheduler.fair.preemption</name>
    <value>false</value>
    <description>Disable resource preemption between queues</description>
</property>

Configure fair-scheduler.xml:

<?xml version="1.0"?>
<allocations>
    <!-- Maximum share of a single queue's resources that Application Masters may occupy, in the range 0-1; enterprises typically use 0.1 -->
    <queueMaxAMShareDefault>0.5</queueMaxAMShareDefault>
    <!-- Default maximum resources for a single queue (test, atguigu, default) -->
    <queueMaxResourcesDefault>4096mb,4vcores</queueMaxResourcesDefault>
    <!-- Add the queue test -->
    <queue name="test">
        <!-- Minimum resources for the queue -->
        <minResources>2048mb,2vcores</minResources>
        <!-- Maximum resources for the queue -->
        <maxResources>4096mb,4vcores</maxResources>
        <!-- Maximum number of applications running concurrently in the queue; default 50, set according to the thread count -->
        <maxRunningApps>4</maxRunningApps>
        <!-- Maximum share of the queue's resources that Application Masters may occupy -->
        <maxAMShare>0.5</maxAMShare>
        <!-- Resource weight of the queue; default 1.0 -->
        <weight>1.0</weight>
        <!-- Resource allocation policy within the queue -->
        <schedulingPolicy>fair</schedulingPolicy>
    </queue>
    <!-- Add the queue atguigu -->
    <queue name="atguigu" type="parent">
        <!-- Minimum resources for the queue -->
        <minResources>2048mb,2vcores</minResources>
        <!-- Maximum resources for the queue -->
        <maxResources>4096mb,4vcores</maxResources>
        <!-- Maximum number of applications running concurrently in the queue; default 50, set according to the thread count -->
        <maxRunningApps>4</maxRunningApps>
        <!-- Maximum share of the queue's resources that Application Masters may occupy -->
        <maxAMShare>0.5</maxAMShare>
        <!-- Resource weight of the queue; default 1.0 -->
        <weight>1.0</weight>
        <!-- Resource allocation policy within the queue -->
        <schedulingPolicy>fair</schedulingPolicy>
    </queue>
    <!-- Queue placement policy for jobs; several rules can be configured, and they are tried in order from the first until one matches -->
    <queuePlacementPolicy>
        <!-- Use the queue specified at submission; if none was specified, continue to the next rule. create="false" means that a specified queue that does not exist is not created automatically -->
        <rule name="specified" create="false"/>
        <!-- Submit to the root.group.username queue. If root.group does not exist, it is not created automatically; if root.group.username does not exist, it is created automatically -->
        <rule name="nestedUserQueue" create="true">
            <rule name="primaryGroup" create="false"/>
        </rule>
        <!-- The last rule must be reject or default. reject means the submission is rejected and fails; default means the job is submitted to the default queue -->
        <rule name="reject" />
    </queuePlacementPolicy>
</allocations>
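
The three placement rules above can be modeled roughly in Python. This is a simplified sketch, not the Fair Scheduler's implementation: the function `place` is hypothetical, it assumes that a rule which cannot place a job simply falls through to the next rule, and the user `bob` in group `staff` is invented for the reject case.

```python
def place(requested, user, group, parent_queues, leaf_queues):
    """Return the queue a job lands in, or None if it is rejected."""
    # Rule 1: specified, create="false". Use the requested queue only if it exists.
    if requested:
        name = requested if requested.startswith("root.") else "root." + requested
        if name in leaf_queues:
            return name
    # Rule 2: nestedUserQueue (create="true") around primaryGroup (create="false").
    # The parent root.<group> must already exist; the user queue is created on demand.
    parent = "root." + group
    if parent in parent_queues:
        child = parent + "." + user
        leaf_queues.add(child)
        return child
    # Rule 3: reject.
    return None

parents = {"root.atguigu"}   # declared with type="parent" in fair-scheduler.xml
leaves = {"root.test"}

print(place("root.test", "atguigu", "atguigu", parents, leaves))  # root.test
print(place(None, "atguigu", "atguigu", parents, leaves))         # root.atguigu.atguigu
print(place(None, "bob", "staff", parents, leaves))               # None
```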

Distribute the configuration, restart Yarn, and test:

hadoop jar /opt/module/hadoop-3.1.3/share/hadoop/mapreduce/hadoop-mapreduce-examples-3.1.3.jar pi -Dmapreduce.job.queuename=root.test 1 1

If no queue is specified, the job is submitted to the queue that matches the user name.

Copyright notice
This article was written by [zhaojiew]. Please include a link to the original when reposting.
https://yzsam.com/2022/04/202204230949430904.html