What Is Machine Reinforcement Learning? What Is Its Principle?
2022-08-11 03:51:00 【Program Yuan Keke】
Reinforcement Learning (RL), also called evaluative learning, is an important machine learning method. It has many applications in the intelligent control of robots and in analysis and prediction.
So what is reinforcement learning?
Reinforcement learning is the learning, by an intelligent system, of a mapping from the environment (states) to behavior (actions) that maximizes the value of a reward signal (the reinforcement signal). Reinforcement learning differs from supervised learning in connectionist learning mainly in the teacher signal: the reinforcement signal provided by the environment is an evaluation (usually a scalar) of how good the generated action was, rather than an instruction telling the reinforcement learning system (RLS) how to produce the correct action. Because the external environment provides so little information, the RLS must learn from its own experience. In this way, the RLS acquires knowledge in an action-evaluation environment and improves its course of action to adapt to the environment.
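To make the difference between evaluative and instructive feedback concrete, here is a minimal sketch assuming a hypothetical three-armed bandit whose arms pay a reward of 1 with fixed probabilities. The agent is never told which arm is correct; it only receives a scalar reward for the arm it actually pulled and learns from that experience. The ε-greedy selection rule and the names `ARM_PROBS`, `pull`, and `run_bandit` are illustrative assumptions, not something the article prescribes.

```python
import random

# Hypothetical environment: arm i pays 1.0 with probability ARM_PROBS[i].
ARM_PROBS = [0.2, 0.5, 0.8]

def pull(arm: int, rng: random.Random) -> float:
    """Return only a scalar evaluation of the chosen arm, never the 'correct' arm."""
    return 1.0 if rng.random() < ARM_PROBS[arm] else 0.0

def run_bandit(steps: int = 5000, epsilon: float = 0.1, seed: int = 0) -> list[float]:
    rng = random.Random(seed)
    n_arms = len(ARM_PROBS)
    estimates = [0.0] * n_arms  # learned value estimate for each arm
    counts = [0] * n_arms       # number of times each arm was pulled
    for _ in range(steps):
        if rng.random() < epsilon:
            arm = rng.randrange(n_arms)                           # explore at random
        else:
            arm = max(range(n_arms), key=lambda a: estimates[a])  # exploit best estimate
        reward = pull(arm, rng)
        counts[arm] += 1
        # Incremental sample average: Q <- Q + (r - Q) / n
        estimates[arm] += (reward - estimates[arm]) / counts[arm]
    return estimates

if __name__ == "__main__":
    q = run_bandit()
    print("estimated arm values:", [round(v, 2) for v in q])
    print("preferred arm:", max(range(len(q)), key=lambda a: q[a]))
```

A supervised learner would be handed the best arm as a label; here the agent has to discover it by trying arms and comparing the rewards it observes.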

In layman's terms: when a child is confused while learning, if the teacher finds that the child's method or line of thinking is correct, the teacher gives positive feedback (a reward or encouragement); otherwise, the teacher gives negative feedback (a lesson or punishment). This draws out the child's potential and strengthens the child's ability to learn independently, so that the child actively learns and keeps exploring on their own until they find the correct method or idea and can adapt to a changing external environment.
Reinforcement learning differs from traditional supervised machine learning: it does not receive a label immediately, only feedback (a reward or a penalty), which may arrive some time after the action. In this sense, reinforcement learning can be seen as a form of supervised learning with delayed labels. Reinforcement learning grew out of theories such as animal learning and parameter-perturbation adaptive control.
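The "delayed label" idea can be illustrated by computing discounted returns for an episode in which the only nonzero reward arrives at the very end. Walking backwards, each step's return equals the reward that follows it plus γ times the return of the next step, which is how earlier actions receive credit for a reward that comes later. The discount factor `gamma = 0.9` and the reward list are assumptions chosen for illustration.

```python
def discounted_returns(rewards: list[float], gamma: float = 0.9) -> list[float]:
    """returns[t] = rewards[t] + gamma * returns[t + 1], with 0 after the last step."""
    returns = [0.0] * len(rewards)
    g = 0.0
    # Iterate from the last step to the first so each step sees its discounted future.
    for t in range(len(rewards) - 1, -1, -1):
        g = rewards[t] + gamma * g
        returns[t] = g
    return returns

if __name__ == "__main__":
    # Hypothetical episode: no feedback until the final step, which earns +1.
    episode_rewards = [0.0, 0.0, 0.0, 1.0]
    print([round(g, 3) for g in discounted_returns(episode_rewards)])
    # → [0.729, 0.81, 0.9, 1.0]
```

Even though the first three actions received no immediate feedback, each of them ends up with a positive return, discounted by how far it is from the final reward.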
Principles of reinforcement learning:
If a certain behavior of the agent leads to a positive reward (reinforcement signal) from the environment, the agent's tendency to produce that behavior in the future is strengthened. The agent's goal is to discover the optimal policy in each discrete state so as to maximize the expected sum of discounted rewards.
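The objective described above can be written in the usual textbook notation. The symbols here (γ for the discount factor, r for reward, s for state, V^π for the value of a policy π) are standard conventions, not notation taken from the article:

```latex
\begin{aligned}
G_t &= \sum_{k=0}^{\infty} \gamma^{k}\, r_{t+k+1}, \qquad 0 \le \gamma < 1 \\
V^{\pi}(s) &= \mathbb{E}_{\pi}\!\left[\, G_t \mid s_t = s \,\right] \\
V^{\pi^{*}}(s) &\ge V^{\pi}(s) \quad \text{for every state } s \text{ and every policy } \pi
\end{aligned}
```

The first line is the discounted reward sum from time t, the second is its expectation when the agent follows policy π from state s, and the third states that an optimal policy π* is at least as good as any other policy in every state.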
Reinforcement learning treats learning as a process of trial and evaluation. The agent selects an action and applies it to the environment. After the environment receives the action, its state changes, and a reinforcement signal (reward or punishment) is produced and fed back to the agent. The agent then selects the next action according to the reinforcement signal and the current state of the environment, following the principle of increasing the probability of receiving positive reinforcement (reward). The selected action affects not only the immediate reinforcement value but also the state of the environment at the next moment and the final reinforcement value.
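The interaction loop just described (act, let the environment change state and return a reinforcement signal, then choose the next action from the new state) can be sketched with tabular Q-learning. Q-learning is a swapped-in example of one well-known algorithm, not a method the article names, and the five-state "chain" environment, its reward of 1 at the goal, and the hyperparameters are all hypothetical.

```python
import random

N_STATES = 5       # hypothetical chain: states 0..4, state 4 is the goal
ACTIONS = (0, 1)   # 0 = move left, 1 = move right

def step(state: int, action: int) -> tuple[int, float, bool]:
    """Environment: apply the action and return (next_state, reward, done)."""
    next_state = max(0, state - 1) if action == 0 else min(N_STATES - 1, state + 1)
    done = next_state == N_STATES - 1
    reward = 1.0 if done else 0.0  # reinforcement signal fed back to the agent
    return next_state, reward, done

def choose(q: list[list[float]], state: int, epsilon: float, rng: random.Random) -> int:
    if rng.random() < epsilon:
        return rng.choice(ACTIONS)  # occasional random exploration
    best = max(q[state])
    # Break ties at random so an untrained agent does not always move left.
    return rng.choice([a for a in ACTIONS if q[state][a] == best])

def q_learning(episodes: int = 500, alpha: float = 0.1, gamma: float = 0.9,
               epsilon: float = 0.1, seed: int = 0) -> list[list[float]]:
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(N_STATES)]
    for _ in range(episodes):
        state, done = 0, False
        while not done:
            action = choose(q, state, epsilon, rng)         # agent acts
            next_state, reward, done = step(state, action)  # environment responds
            # The target mixes the immediate reward with the value of the next state.
            target = reward if done else reward + gamma * max(q[next_state])
            q[state][action] += alpha * (target - q[state][action])
            state = next_state                              # next choice uses the new state
    return q

if __name__ == "__main__":
    q = q_learning()
    policy = ["left" if q[s][0] > q[s][1] else "right" for s in range(N_STATES - 1)]
    print("greedy policy in states 0-3:", policy)
```

Because the update target includes `gamma * max(q[next_state])`, an action is credited both for its immediate reward and for the state it leads to, which mirrors the last sentence of the paragraph above.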
If the gradient ∂R/∂A were known, a supervised learning algorithm could be used directly. However, there is no explicit functional form relating the reinforcement signal R to the action A produced by the agent, so the gradient ∂R/∂A cannot be obtained. Reinforcement learning systems therefore need some kind of stochastic unit, with which the agent searches the space of possible actions and finds the correct action.
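One classic example of such a stochastic unit is a Bernoulli-logistic unit trained with Williams' REINFORCE rule, used here as an illustrative swap-in since the article does not name a specific algorithm. The unit samples an action, receives only a scalar reward, and moves its weight by `lr * (r - baseline) * (action - p)`, which follows the gradient of expected reward on average, so ∂R/∂A is never needed. The reward function `reward_for`, its success probabilities, the running-average baseline, and the learning rates are all hypothetical.

```python
import math
import random

def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))

def reward_for(action: int, rng: random.Random) -> float:
    """Hypothetical black-box environment: no formula linking R to A is exposed."""
    success_prob = 0.8 if action == 1 else 0.3
    return 1.0 if rng.random() < success_prob else 0.0

def train_unit(steps: int = 3000, lr: float = 0.1, seed: int = 0) -> float:
    rng = random.Random(seed)
    w = 0.0         # the unit's single weight (acts as a bias)
    baseline = 0.0  # running average of reward, used to reduce update variance
    for _ in range(steps):
        p = sigmoid(w)                         # probability of emitting action 1
        action = 1 if rng.random() < p else 0  # random sampling searches the action space
        r = reward_for(action, rng)
        # REINFORCE: (r - b) * d/dw log P(action); for a Bernoulli-logistic unit
        # d/dw log P(action) = action - p.
        w += lr * (r - baseline) * (action - p)
        baseline += 0.01 * (r - baseline)
    return sigmoid(w)

if __name__ == "__main__":
    print(f"P(action = 1) after training: {train_unit():.3f}")
```

The randomness is essential: without sampling, the unit would never try the alternative action and would have no way to learn which one earns more reward.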