What Is Machine Reinforcement Learning? What Is the Principle?
2022-08-11 03:51:00 【Program Yuan Keke】
Reinforcement learning (RL), also known as reward-based learning or evaluative learning, is an important machine learning method. It has many applications in fields such as intelligent robot control and analysis and prediction.
So what is reinforcement learning?
Reinforcement learning is the process by which an intelligent system learns a mapping from environment states to actions so as to maximize the value of a reward (reinforcement) signal. It differs from supervised learning in connectionist learning mainly in the teacher signal: the reinforcement signal provided by the environment is an evaluation (usually a scalar) of how good the generated action was, rather than an instruction telling the reinforcement learning system (RLS) how to produce the correct action. Because the external environment provides so little information, the RLS must learn from its own experience. In this way, the RLS acquires knowledge in an action-evaluation loop and improves its course of action to suit the environment.
In layman's terms: when a child is confused while learning, the teacher gives positive feedback (reward or encouragement) if the child's method or thinking is correct, and negative feedback (a lesson or punishment) otherwise. This draws out the child's potential and strengthens his or her ability to learn independently, so that the child keeps exploring on his or her own and eventually finds the correct method or idea for adapting to a changing external environment.
Reinforcement learning differs from traditional machine learning in that it does not receive a label immediately; it only receives feedback (a reward or a penalty). In this sense, reinforcement learning can be described as a kind of supervised learning with delayed labels. It grew out of theories such as animal learning and parameter-perturbation adaptive control.
Principles of reinforcement learning:
If a behavior of the agent leads to a positive reward (reinforcement signal) from the environment, the agent's tendency to produce that behavior in the future is strengthened. The agent's goal is to discover, for each discrete state, the optimal policy that maximizes the expected sum of discounted rewards.
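To make the "discounted reward sum" concrete, here is a minimal sketch of how it can be computed for a single finished episode. The function name `discounted_return` and the discount factor `gamma` are illustrative choices, not taken from the article; the agent's objective is the expectation of this quantity over many episodes.

```python
def discounted_return(rewards, gamma=0.9):
    """Sum of rewards in which the reward at step t is weighted by gamma**t."""
    total = 0.0
    # Walk backwards so each step needs only one multiply and one add:
    # G_t = r_t + gamma * G_{t+1}
    for r in reversed(rewards):
        total = r + gamma * total
    return total


# A reward of 1 that arrives two steps in the future is worth 0.5**2 today.
print(discounted_return([0.0, 0.0, 1.0], gamma=0.5))  # 0.25
```

A `gamma` close to 0 makes the agent short-sighted, while a `gamma` close to 1 makes distant rewards count almost as much as immediate ones.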
Reinforcement learning treats learning as a process of trial and evaluation. The agent selects an action and applies it to the environment. After the environment accepts the action, its state changes, and it generates a reinforcement signal (reward or penalty) that is fed back to the agent. The agent then selects the next action according to the reinforcement signal and the current state of the environment, following the principle of increasing the probability of receiving positive reinforcement (reward). The selected action affects not only the immediate reinforcement value but also the state of the environment at the next moment and the final reinforcement value.
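The select-act-observe-update cycle described above can be sketched in a few lines. The article does not name a specific algorithm, so this example assumes tabular Q-learning, and the four-state corridor environment, the `step` function, and all parameter values are hypothetical choices made only for illustration.

```python
import random

N_STATES = 4      # states 0..3; reaching state 3 ends the episode (toy task)
ACTIONS = (0, 1)  # 0 = move left, 1 = move right


def step(state, action):
    """Environment: accept the action, change state, emit a scalar reward."""
    next_state = max(0, state - 1) if action == 0 else state + 1
    done = next_state == N_STATES - 1
    reward = 1.0 if done else 0.0  # the reinforcement signal
    return next_state, reward, done


def train(episodes=200, alpha=0.5, gamma=0.9, epsilon=0.2, seed=0):
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(N_STATES)]  # value estimate per (state, action)
    for _ in range(episodes):
        state, done = 0, False
        while not done:
            if rng.random() < epsilon:
                action = rng.choice(ACTIONS)  # occasional random trial
            else:
                best = max(q[state])
                action = rng.choice([a for a in ACTIONS if q[state][a] == best])
            next_state, reward, done = step(state, action)
            # The action is credited with the immediate reward plus the
            # discounted value of the state it leads to.
            target = reward if done else reward + gamma * max(q[next_state])
            q[state][action] += alpha * (target - q[state][action])
            state = next_state
    return q


q = train()
policy = [max(ACTIONS, key=lambda a: q[s][a]) for s in range(N_STATES - 1)]
print(policy)  # [1, 1, 1]
```

Only the final move into state 3 is rewarded, yet the learned policy prefers "right" in every state, because each update passes value back from the next state. This is the sense in which an action affects the final reinforcement value and not just the immediate one.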
If the gradient of the reinforcement signal R with respect to the action A (∂R/∂A) were known, a supervised learning algorithm could be used directly. Because R and the action A generated by the agent are not related by an explicit functional form, this gradient cannot be obtained. Reinforcement learning systems therefore need some kind of random unit, with which the agent searches the space of possible actions and finds the correct one.
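One way to picture such a random unit is an agent that samples its action from a probability distribution and shifts probability toward whichever action was just rewarded. The sketch below assumes a linear reward-inaction update, which the article does not name; the three-action environment and its success probabilities are hypothetical. The agent never sees how the reward depends on the action, only the 0/1 reward itself.

```python
import random


def reward_signal(action, rng):
    """Black-box environment: returns only a 0/1 reward, no gradient."""
    success_prob = (0.2, 0.5, 0.8)[action]  # hidden from the agent
    return 1.0 if rng.random() < success_prob else 0.0


def learn_action_probs(steps=5000, lr=0.01, seed=0):
    rng = random.Random(seed)
    probs = [1 / 3, 1 / 3, 1 / 3]  # start with no preference
    for _ in range(steps):
        # The random unit: sample an action instead of computing one.
        action = rng.choices(range(3), weights=probs)[0]
        if reward_signal(action, rng) > 0:
            # Rewarded: move probability mass toward the chosen action.
            # Unrewarded trials leave the distribution unchanged.
            probs = [
                p + lr * ((1.0 if a == action else 0.0) - p)
                for a, p in enumerate(probs)
            ]
    return probs


probs = learn_action_probs()
print([round(p, 3) for p in probs])
```

With these settings most of the probability mass should end up on the third action, the one that is rewarded most often, even though the agent only ever tried actions at random and observed the outcomes.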