What Is Machine Reinforcement Learning? What Is the Principle?
2022-08-11 03:51:00 【Program Yuan Keke】
Reinforcement learning (RL), also called evaluative learning, is an important machine learning method. It has many applications in fields such as intelligent robot control and analysis and prediction.
So what is reinforcement learning?
Reinforcement learning is the learning, by an intelligent system, of a mapping from environment to behavior, with the goal of maximizing the value of a reward signal (reinforcement signal). It differs from supervised learning in connectionist learning mainly in the teacher signal. The reinforcement signal provided by the environment is an evaluation (usually a scalar) of how good the generated action was; it does not tell the reinforcement learning system (RLS) how to produce the correct action. Because the external environment provides so little information, the RLS must learn from its own experience. In this way, the RLS acquires knowledge in an action-evaluation setting and improves its course of action to suit the environment.

In layman's terms, when a child is puzzled or confused while learning, a teacher who sees that the child's method or thinking is correct gives positive feedback (reward or encouragement); otherwise the teacher gives negative feedback (a lesson or punishment). This draws out the child's potential and strengthens the child's ability to learn independently, so that the child relies on their own effort to learn actively and keep exploring, and eventually finds the correct method or idea for adapting to a changing external environment.
Reinforcement learning differs from traditional machine learning in that it does not receive a label immediately; it only receives feedback (a reward or a penalty). In that sense, reinforcement learning can be described as supervised learning with delayed labels. It developed from theories such as animal learning and parameter-perturbation adaptive control.
Principles of reinforcement learning:
If a behavioral strategy of the agent leads to a positive reward (reinforcement signal) from the environment, the agent's tendency to produce that strategy in the future is strengthened. The agent's goal is to find, for each discrete state, the optimal policy that maximizes the expected sum of discounted rewards.
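As a rough illustration of the "discounted reward sum" mentioned above, the sketch below computes it for a single recorded episode. The function name `discounted_return` and the discount factor `gamma` are assumptions introduced here; the article does not name them. The agent's actual objective is the expectation of this quantity over many episodes, which this snippet does not estimate.

```python
def discounted_return(rewards, gamma=0.9):
    """Sum of gamma**t * rewards[t] over one episode (hypothetical helper)."""
    total = 0.0
    # Walk backwards so each step applies G_t = r_t + gamma * G_{t+1}.
    for reward in reversed(rewards):
        total = reward + gamma * total
    return total


# A reward that arrives two steps late is worth gamma**2 of its face value.
episode_rewards = [0.0, 0.0, 1.0]
print(discounted_return(episode_rewards, gamma=0.5))
```

A smaller `gamma` makes the agent value immediate rewards more; a value close to 1 makes distant rewards count almost as much as immediate ones.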
Reinforcement learning treats learning as a process of trial and evaluation. The agent selects an action and applies it to the environment. The environment accepts the action, its state changes, and it generates a reinforcement signal (reward or penalty) that is fed back to the agent. The agent then selects the next action based on the reinforcement signal and the current state of the environment, following the principle of increasing the probability of receiving positive reinforcement (reward). The selected action affects not only the immediate reinforcement value but also the state of the environment at the next moment and the final reinforcement value.
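The interaction loop described above (select an action, let the environment change state, receive a reinforcement signal, select again) can be sketched on a toy problem. Everything named here is an assumption for illustration: the `Corridor` environment, the `train` function, and the parameters `alpha`, `gamma`, and `epsilon`. The update rule is tabular Q-learning, one common algorithm that the article itself does not name.

```python
import random


class Corridor:
    """Hypothetical toy environment: states 0..4, reward 1.0 on reaching state 4."""

    def __init__(self, length=5):
        self.length = length
        self.state = 0

    def reset(self):
        self.state = 0
        return self.state

    def step(self, action):
        # Action 0 moves left, action 1 moves right; the walls clamp the position.
        move = 1 if action == 1 else -1
        self.state = min(max(self.state + move, 0), self.length - 1)
        done = self.state == self.length - 1
        reward = 1.0 if done else 0.0
        return self.state, reward, done


def train(episodes=200, alpha=0.5, gamma=0.9, epsilon=0.2, seed=0):
    rng = random.Random(seed)
    env = Corridor()
    q = [[0.0, 0.0] for _ in range(env.length)]  # q[state][action]
    for _ in range(episodes):
        state = env.reset()
        done = False
        while not done:
            # Usually take the better-rated action; explore at random otherwise.
            if rng.random() < epsilon or q[state][0] == q[state][1]:
                action = rng.randrange(2)
            else:
                action = 0 if q[state][0] > q[state][1] else 1
            next_state, reward, done = env.step(action)
            # Move the estimate toward reward plus discounted future value.
            target = reward if done else reward + gamma * max(q[next_state])
            q[state][action] += alpha * (target - q[state][action])
            state = next_state
    return q


q_table = train()
# Greedy action per non-terminal state (1 means "move right").
greedy_policy = [0 if left > right else 1 for left, right in q_table[:-1]]
print(greedy_policy)
```

Note how the reward arrives only at the last step, yet the estimates for earlier states still rise: each action's value reflects the later states it leads to, which is the "final reinforcement value" effect the paragraph describes.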
If the gradient of the reinforcement signal R with respect to the action A (∂R/∂A) were known, a supervised learning algorithm could be used directly. Because R and the action A generated by the agent are not related by an explicit functional form, this gradient cannot be obtained. A reinforcement learning system therefore needs some kind of random unit, with which the agent searches the space of possible actions and finds the correct action.
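One way to picture such a random unit is a simple random search (stochastic hill climbing) over a single continuous action. This is a minimal sketch under stated assumptions, not the method the article has in mind: `black_box_reward` is a hypothetical environment that returns only a scalar score, and `random_search`, `noise`, and `steps` are names introduced here. The agent never sees the gradient of the reward with respect to the action; it only compares scores.

```python
import random


def black_box_reward(action):
    """Hypothetical environment: returns a scalar score and no gradient."""
    return -(action - 2.0) ** 2


def random_search(steps=500, noise=0.5, seed=0):
    rng = random.Random(seed)
    best_action = 0.0
    best_reward = black_box_reward(best_action)
    for _ in range(steps):
        # The random unit: perturb the current action with Gaussian noise.
        candidate = best_action + rng.gauss(0.0, noise)
        reward = black_box_reward(candidate)
        # Keep the candidate only if the environment scored it higher.
        if reward > best_reward:
            best_action, best_reward = candidate, reward
    return best_action, best_reward


action, reward = random_search()
print(round(action, 2), round(reward, 4))
```

The search drifts toward the best-scoring action purely through trial and evaluation, which is why the randomness is needed when no gradient is available.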