RL reinforcement learning summary (2)
2022-08-06 08:54:00 【Times & Beliefs】
Markov Decision Process
A Markov Decision Process, abbreviated MDP, is the standard framework for describing reinforcement learning problems.
Requirements for a Markov Decision Process
1. The ideal state is reachable; in other words, the process can reach a final state. For example: AlphaGo can play a game through to the winning move.
2. Multiple attempts are possible, meaning the agent has more than one action to choose from.
For example: AlphaGo can choose one of many positions on the board when making a move.
3. The agent's next state depends only on the current state and the action taken in that state, not on any earlier states (the Markov property).
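The third requirement can be written as a formula. This is the usual textbook statement of the Markov property; the symbols s_t (state at step t) and a_t (action at step t) are notation introduced here, not taken from the original post.

```latex
P(s_{t+1} \mid s_t, a_t) = P(s_{t+1} \mid s_1, a_1, \ldots, s_t, a_t)
```

In words: conditioning on the full history gives the same distribution over the next state as conditioning on the current state and action alone.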
The Five Elements of MDP
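The original illustration for this section did not survive. The following is a minimal sketch that assumes the common textbook convention, in which the five elements are the state set S, the action set A, the transition probabilities P, the reward function R, and the discount factor gamma. The two-state toy problem and every name in it (`MDP`, `toy`, `is_valid`, `"start"`, `"goal"`) are made up for illustration.

```python
from dataclasses import dataclass


@dataclass
class MDP:
    states: tuple        # S: all states
    actions: tuple       # A: all actions
    transitions: dict    # P: (state, action) -> {next_state: probability}
    rewards: dict        # R: (state, action) -> immediate reward
    gamma: float         # discount factor, between 0 and 1


# Hypothetical toy problem: try to move from "start" to "goal".
toy = MDP(
    states=("start", "goal"),
    actions=("stay", "move"),
    transitions={
        ("start", "stay"): {"start": 1.0},
        ("start", "move"): {"goal": 0.9, "start": 0.1},
        ("goal", "stay"): {"goal": 1.0},
        ("goal", "move"): {"goal": 1.0},
    },
    rewards={
        ("start", "stay"): 0.0,
        ("start", "move"): 1.0,
        ("goal", "stay"): 0.0,
        ("goal", "move"): 0.0,
    },
    gamma=0.9,
)


def is_valid(mdp):
    """Return True if every (state, action) pair has probabilities summing to 1."""
    for s in mdp.states:
        for a in mdp.actions:
            dist = mdp.transitions.get((s, a))
            if dist is None or abs(sum(dist.values()) - 1.0) > 1e-9:
                return False
    return True
```

Note that the transition table is keyed only by the current state and action, which is exactly the third requirement above.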

Tips explained:

State Value Function

Tips:
In short: the state value is a weighted value, that is, an average over possible outcomes weighted by how likely they are.
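One common way to write down the "weighted value" idea is as an expected discounted return. The formulas below follow the standard textbook definition and are not recovered from the original post; the symbols G_t (return from step t), r (reward), gamma (discount factor), and pi (the agent's policy) are assumptions of this sketch.

```latex
G_t = r_{t+1} + \gamma r_{t+2} + \gamma^2 r_{t+3} + \cdots = \sum_{k=0}^{\infty} \gamma^k r_{t+k+1}

V_\pi(s) = \mathbb{E}_\pi \left[ G_t \mid s_t = s \right]
```

The expectation is the weighting: each possible future is counted in proportion to its probability under the policy and the transition probabilities.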
Bellman Equation




The core of the Bellman equation: the value of the current state = the immediate reward + the (discounted) value of the next state.
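The rule above can be turned into a small computation. This is a minimal sketch of iterative policy evaluation, assuming a fixed policy and the discounted form V(s) = R(s, a) + gamma * sum over s' of P(s' | s, a) * V(s'); the function name `evaluate_policy` and the three-state chain are hypothetical and chosen only to keep the arithmetic easy to check by hand.

```python
def evaluate_policy(states, policy, transitions, rewards, gamma, tol=1e-10):
    """Repeat the Bellman update for a fixed policy until values stop changing."""
    values = {s: 0.0 for s in states}
    while True:
        delta = 0.0
        for s in states:
            a = policy[s]
            # Probability-weighted value of the next state.
            next_value = sum(p * values[s2] for s2, p in transitions[(s, a)].items())
            # Current value = immediate reward + discounted next value.
            new_v = rewards[(s, a)] + gamma * next_value
            delta = max(delta, abs(new_v - values[s]))
            values[s] = new_v
        if delta < tol:
            return values


# Hypothetical chain: A -> B -> end, where "end" is absorbing with zero reward.
states = ("A", "B", "end")
policy = {"A": "go", "B": "go", "end": "go"}
transitions = {
    ("A", "go"): {"B": 1.0},
    ("B", "go"): {"end": 1.0},
    ("end", "go"): {"end": 1.0},
}
rewards = {("A", "go"): 1.0, ("B", "go"): 2.0, ("end", "go"): 0.0}

values = evaluate_policy(states, policy, transitions, rewards, gamma=0.5)
```

With gamma = 0.5 the result can be checked by hand: V(end) = 0, V(B) = 2 + 0.5 * 0 = 2, and V(A) = 1 + 0.5 * 2 = 2.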