[Letter from Andrew Ng] The development of reinforcement learning!
2022-08-10 15:12:00 【Zhengyi】
1. Translation
Dear friends,
While preparing Course 3 of the Machine Learning Specialization (which covers reinforcement learning), I thought hard about why reinforcement learning algorithms are still so demanding to use. They are very sensitive to the choice of hyperparameters, and someone experienced at hyperparameter tuning may get 10 or 100 times better performance. Ten years ago, supervised deep learning algorithms were equally demanding, but they became more stable as research into systematic ways of building supervised models advanced.
Will reinforcement learning algorithms become more robust in the next decade? I hope so. However, reinforcement learning faces a unique hurdle: real-world (non-simulated) benchmarks are difficult to establish.
When supervised deep learning was at an early stage of development, experienced hyperparameter tuners could get much better results than less experienced ones. We had to choose the neural network architecture, the regularization method, the learning rate, the schedule for decreasing the learning rate, the mini-batch size, the momentum, the random weight initialization method, and so on. Choosing well made a huge difference to the algorithm's convergence speed and final performance.
Thanks to research progress over the past decade, we now have more robust optimization algorithms such as Adam, better neural network architectures, and more systematic guidance on default choices for many other hyperparameters, all of which make it easier to get good results. I suspect that scaling up neural networks has also made them more robust. These days, even if I have only 100 training examples, I do not hesitate to train a network with more than 20 million parameters (such as ResNet-50). In contrast, if you train a 1,000-parameter network on 100 examples, every parameter matters much more, so tuning has to be done much more carefully.
My collaborators and I have applied reinforcement learning to cars, helicopters, quadrupeds, robot snakes, and many other applications. Yet today's reinforcement learning algorithms are still finicky. Poorly tuned hyperparameters in supervised deep learning might make your algorithm train 3 or 10 times more slowly (which is bad), but in reinforcement learning they may make training 100 times slower, if the algorithm converges at all! As with supervised learning ten years ago, many techniques have been developed to help reinforcement learning algorithms converge (such as double Q-learning, soft updates, experience replay, and epsilon-greedy exploration with a slowly decreasing epsilon). These methods are clever, and I commend the researchers who developed them, but many of them introduce additional hyperparameters that seem to me very hard to tune.
Further research into reinforcement learning may follow the path of supervised deep learning and give us more robust algorithms and systematic guidance on how to make these choices. But one thing worries me. In supervised learning, benchmark datasets let researchers around the world tune algorithms against the same dataset and build on each other's work. In reinforcement learning, the more commonly used benchmarks are simulated environments such as OpenAI Gym. But getting a reinforcement learning algorithm to work on a simulated robot is much easier than getting it to work on a physical robot.
Many algorithms that excel at simulated tasks struggle with physical robots. Even two copies of the same robot design will differ. Moreover, giving every aspiring reinforcement learning researcher their own copy of every robot is not feasible. While researchers are making rapid progress on reinforcement learning for simulated robots (and for playing video games), the bridge to applications in non-simulated environments is often missing. Many excellent research labs are working on physical robots. But because every robot is unique, one lab's results can be difficult for other labs to replicate, and this slows the rate of progress.
I do not have a solution to these knotty problems. But I hope that all of us in AI can work together to make these algorithms more robust and more widely useful.
Please keep learning!
Andrew Ng
2. Original text
Dear friends,
While working on Course 3 of the Machine Learning Specialization, which covers reinforcement learning, I was reflecting on how reinforcement learning algorithms are still quite finicky. They’re very sensitive to hyperparameter choices, and someone experienced at hyperparameter tuning might get 10x or 100x better performance. Supervised deep learning was equally finicky a decade ago, but it has gradually become more robust with research progress on systematic ways to build supervised models.
Will reinforcement learning (RL) algorithms also become more robust in the next decade? I hope so. However, RL faces a unique obstacle in the difficulty of establishing real-world (non-simulation) benchmarks.
When supervised deep learning was at an earlier stage of development, experienced hyperparameter tuners could get much better results than less-experienced ones. We had to pick the neural network architecture, regularization method, learning rate, schedule for decreasing the learning rate, mini-batch size, momentum, random weight initialization method, and so on. Picking well made a huge difference in the algorithm’s convergence speed and final performance.
Thanks to research progress over the past decade, we now have more robust optimization algorithms like Adam, better neural network architectures, and more systematic guidance for default choices of many other hyperparameters, making it easier to get good results. I suspect that scaling up neural networks (these days, I don’t hesitate to train a 20 million-plus parameter network, like ResNet-50, even if I have only 100 training examples) has also made them more robust. In contrast, if you’re training a 1,000-parameter network on 100 examples, every parameter matters much more, so tuning needs to be done much more carefully.
My collaborators and I have applied RL to cars, helicopters, quadrupeds, robot snakes, and many other applications. Yet today’s RL algorithms still feel finicky. Whereas poorly tuned hyperparameters in supervised deep learning might mean that your algorithm trains 3x or 10x more slowly (which is bad), in reinforcement learning, it feels like they might result in training 100x more slowly — if it converges at all! Similar to supervised learning a decade ago, numerous techniques have been developed to help RL algorithms converge (like double Q learning, soft updates, experience replay, and epsilon-greedy exploration with slowly decreasing epsilon). They’re all clever, and I commend the researchers who developed them, but many of these techniques create additional hyperparameters that seem to me very hard to tune.
Further research in RL may follow the path of supervised deep learning and give us more robust algorithms and systematic guidance for how to make these choices. One thing worries me, though. In supervised learning, benchmark datasets enable the global community of researchers to tune algorithms against the same dataset and build on each other’s work. In RL, the more-commonly used benchmarks are simulated environments like OpenAI Gym. But getting an RL algorithm to work on a simulated robot is much easier than getting it to work on a physical robot.
Many algorithms that work brilliantly in simulation struggle with physical robots. Even two copies of the same robot design will be different. Further, it’s infeasible to give every aspiring RL researcher their own copy of every robot. While researchers are making rapid progress on RL for simulated robots (and for playing video games), the bridge to application in non-simulated environments is often missing. Many excellent research labs are working on physical robots. But because each robot is unique, one lab’s results can be difficult for other labs to replicate, and this impedes the rate of progress.
I don’t have a solution to these knotty issues. But I hope that all of us in AI collectively will manage to make these algorithms more robust and more widely useful.
Keep learning!
Andrew
Andrew Ng. Posted on: 2022-08-05 10:10
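To make the letter's point about extra hyperparameters concrete, here is a minimal sketch (not taken from the letter) of three of the convergence aids it names: epsilon-greedy exploration with a slowly decreasing epsilon, soft updates, and experience replay. Every constant and function name below is a hypothetical choice made for illustration. Note that each technique brings at least one new value that has to be tuned.

```python
import random
from collections import deque

# Hypothetical hyperparameters; none of these values come from the letter.
EPSILON_START = 1.0           # initial exploration probability
EPSILON_END = 0.05            # floor for the exploration probability
EPSILON_DECAY_STEPS = 10_000  # steps over which epsilon decays linearly
TAU = 0.005                   # soft-update mixing rate for target weights
REPLAY_CAPACITY = 50_000      # max transitions kept for experience replay
BATCH_SIZE = 64               # transitions drawn per training step


def epsilon_at(step):
    """Linearly decay epsilon from EPSILON_START to EPSILON_END."""
    fraction = min(step / EPSILON_DECAY_STEPS, 1.0)
    return EPSILON_START + fraction * (EPSILON_END - EPSILON_START)


def epsilon_greedy(q_values, epsilon, rng=random):
    """Pick a random action with probability epsilon, else the greedy one."""
    if rng.random() < epsilon:
        return rng.randrange(len(q_values))
    return max(range(len(q_values)), key=lambda a: q_values[a])


def soft_update(target_weights, online_weights, tau=TAU):
    """Move each target weight a fraction tau toward the online weight."""
    return [(1.0 - tau) * t + tau * w
            for t, w in zip(target_weights, online_weights)]


class ReplayBuffer:
    """Fixed-capacity store of transitions, sampled uniformly at random."""

    def __init__(self, capacity=REPLAY_CAPACITY):
        # A deque with maxlen silently drops the oldest transition when full.
        self.buffer = deque(maxlen=capacity)

    def add(self, transition):
        self.buffer.append(transition)

    def sample(self, batch_size=BATCH_SIZE, rng=random):
        # Never request more items than the buffer currently holds.
        k = min(batch_size, len(self.buffer))
        return rng.sample(list(self.buffer), k)
```

Even this small sketch exposes six tunable values before the learning rate, discount factor, or network architecture are chosen, which is the kind of tuning burden the letter describes.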