What is Machine Reinforcement Learning? What is the principle?
2022-08-11 03:51:00 【Program Yuan Keke】
Reinforcement learning (RL), sometimes called evaluative learning, is an important machine learning method. It has many applications in fields such as intelligent robot control and analysis and prediction.
So what is reinforcement learning?
Reinforcement learning is the process by which an intelligent system learns a mapping from environment states to behaviors so as to maximize the value of a reward signal (the reinforcement signal). It differs from supervised learning in connectionist learning mainly in the teacher signal. In reinforcement learning, the reinforcement signal provided by the environment is an evaluation (usually a scalar) of how good the generated action was; it does not tell the reinforcement learning system (RLS) how to produce the correct action. Because the external environment provides so little information, the RLS must learn from its own experience. In this way, the RLS acquires knowledge in an action-evaluation loop and improves its course of action to suit the environment.
In layman's terms, when a child is confused while learning, the teacher gives positive feedback (reward or encouragement) if the child's method or thinking is correct, and negative feedback (a lesson or punishment) otherwise. This draws out the child's potential and strengthens his or her ability to learn independently, so that the child relies on his or her own efforts to keep learning and exploring, and eventually finds the correct method or idea for adapting to a changing external environment.
Reinforcement learning differs from traditional machine learning in that the learner does not receive a label immediately; it receives only feedback (a reward or a penalty). In that sense, reinforcement learning can be described as supervised learning with delayed labels. It grew out of theories such as animal learning and parameter-perturbation adaptive control.
Principles of reinforcement learning:
If a behavioral strategy of the agent leads to a positive reward (reinforcement signal) from the environment, the agent's tendency to produce that strategy in the future is strengthened. The agent's goal is to find, in each discrete state, the optimal policy that maximizes the expected sum of discounted rewards.
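To make the "discounted reward sum" concrete, here is a minimal Python sketch of how one such sum could be computed for a single recorded sequence of rewards. The function name `discounted_return` and the discount factor `gamma` are illustrative assumptions, not something the article defines; the expected value the agent maximizes would be the average of this quantity over many such sequences.

```python
def discounted_return(rewards, gamma):
    """Sum of rewards where the reward at step t is weighted by gamma ** t.

    `rewards` is a list of scalar reinforcement signals in time order;
    `gamma` (assumed to lie in [0, 1]) controls how much later rewards count.
    """
    total = 0.0
    # Work backwards so each step is: reward now + gamma * (everything after).
    for reward in reversed(rewards):
        total = reward + gamma * total
    return total


print(discounted_return([1.0, 1.0, 1.0], 0.5))  # 1 + 0.5 + 0.25 = 1.75
```

With `gamma` close to 0 the agent effectively cares only about the immediate reward; with `gamma` close to 1 later rewards count almost as much as early ones.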
Reinforcement learning treats learning as a process of trial and evaluation. The agent selects an action and applies it to the environment. After receiving the action, the environment changes state and produces a reinforcement signal (a reward or a penalty) that is fed back to the agent. The agent then selects the next action based on the reinforcement signal and the current state of the environment; the selection principle is to increase the probability of receiving positive reinforcement (reward). The selected action affects not only the immediate reinforcement value but also the state of the environment at the next moment and the final reinforcement value.
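The select-act-observe loop described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the five-cell corridor environment, the function names, and all parameter values are hypothetical, and the update rule used is tabular Q-learning, one common algorithm that the article itself does not name.

```python
import random

N_STATES = 5          # corridor cells 0..4 (hypothetical toy environment)
GOAL = N_STATES - 1   # reaching the last cell ends the episode
ACTIONS = (0, 1)      # 0 = move left, 1 = move right


def step(state, action):
    """Environment: apply the action, return (next_state, reinforcement signal)."""
    next_state = max(0, state - 1) if action == 0 else min(GOAL, state + 1)
    reward = 1.0 if next_state == GOAL else 0.0
    return next_state, reward


def choose_action(q, state, epsilon, rng):
    """Mostly pick the best-valued action; with probability epsilon pick at random."""
    if rng.random() < epsilon:
        return rng.choice(ACTIONS)
    best = max(q[state])
    # Break ties randomly so the untrained agent does not always go left.
    return rng.choice([a for a in ACTIONS if q[state][a] == best])


def train(episodes=200, alpha=0.5, gamma=0.9, epsilon=0.2, seed=0):
    """Run the interaction loop and return the learned action-value table."""
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(N_STATES)]
    for _ in range(episodes):
        state = 0
        while state != GOAL:
            action = choose_action(q, state, epsilon, rng)
            next_state, reward = step(state, action)
            # Q-learning update: move the estimate toward
            # immediate reward + discounted value of the next state.
            target = reward + gamma * max(q[next_state])
            q[state][action] += alpha * (target - q[state][action])
            state = next_state
    return q


q_table = train()
greedy_policy = [q_table[s].index(max(q_table[s])) for s in range(GOAL)]
print(greedy_policy)  # preferred action in each non-goal cell
```

Note how an action taken in cell 0 earns no immediate reward, yet its value still rises over time because it leads to states from which the reward is reachable. That is the sense in which an action affects the later reinforcement values and not just the immediate one.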
If the gradient of the reinforcement signal R with respect to the action A were known, a supervised learning algorithm could be used directly. However, because the relationship between R and the action A generated by the agent has no explicit functional form, this gradient cannot be obtained. A reinforcement learning system therefore needs some kind of random unit, which the agent uses to search the space of possible actions and find the correct one.
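One simple way to picture such a random unit is random perturbation search (stochastic hill climbing): the agent adds noise to its current action, asks the environment for an evaluation, and keeps the change only if the reinforcement signal improved. The sketch below assumes a hypothetical black-box function `reinforcement` whose best action is 3.0; the agent never sees its formula or gradient, only the scalar it returns.

```python
import random


def reinforcement(action):
    """Black-box evaluation of an action (hypothetical; the best action is 3.0)."""
    return -(action - 3.0) ** 2


def random_search(steps=500, noise=0.5, seed=0):
    """Search the action space using only scalar evaluations, no gradient."""
    rng = random.Random(seed)
    action = 0.0
    best_signal = reinforcement(action)
    for _ in range(steps):
        # The "random unit": perturb the current action with Gaussian noise.
        candidate = action + rng.gauss(0.0, noise)
        signal = reinforcement(candidate)
        # Keep the perturbed action only if the environment rated it higher.
        if signal > best_signal:
            action, best_signal = candidate, signal
    return action


print(random_search())  # ends close to the best action, 3.0
```

The noise level is a design choice: larger noise explores the action space faster but makes fine adjustments near the optimum less likely to be accepted.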