Introduction:

 

"The development of general intelligence depends on building algorithms that can generate AI on their own. Only then can AI break free of manual human planning and take a true path of self-development."

 

How can general intelligence be achieved? Jeff Clune, former research manager at OpenAI and associate professor at the University of British Columbia, believes Darwin's theory of evolution has already given us the answer. From single-celled life to human civilization, the development of intelligence seems to follow a pattern: intelligence may not be something that can be planned. It arises from the continual iteration and progress of agents themselves, and in this process the agents produce the evolutionary methods (algorithms) that benefit their own development.

 

A devotee of evolution, Clune studied philosophy early on in order to explore the nature of intelligence. Later, after reading a news report about Hod Lipson, a scientist working on evolutionary algorithms, he turned to artificial intelligence research. He went on to co-found Uber AI Labs with colleagues and served as a research team lead at OpenAI. Clune also contributed to VPT (Video PreTraining), a model that recently drew attention in the AI research community.

 

 

In addition, Professor Clune gave a talk titled "Improving Robot and Deep Reinforcement Learning via Quality Diversity, Open-Ended, and AI-Generating Algorithms" at the 2022 BAAI Conference (Zhiyuan Conference); see the end of the article for details.

 

Recently, the BAAI (Zhiyuan) Community interviewed Professor Clune, asking him to talk about his early research experience and to explain the core ideas behind AI-GAs for our readers.

 

 

(Image source: BAAI Conference official website)

Jeff Clune 

Former Research Manager at OpenAI; Associate Professor, University of British Columbia

Jeff Clune works mainly on deep learning, including deep reinforcement learning. Previously he was a research team lead at OpenAI, a senior research manager and founding member of Uber AI Labs, the Harris Associate Professor of Computer Science at the University of Wyoming, and a research scientist at Cornell University. He holds degrees from Michigan State University (PhD and master's) and the University of Michigan (bachelor's). Since 2015 he has received the Presidential Early Career Award for Scientists and Engineers from the White House, published two papers in Nature and one in PNAS, and received an NSF CAREER award, a Paper of the Decade award, and an Outstanding Young Investigator award, as well as best paper awards, oral presentations, and invited talks at top machine learning conferences (NeurIPS, CVPR, ICLR, and ICML). His research is regularly covered by the media, including The New York Times, NPR, NBC, Wired, the BBC, The Economist, Science, Nature, National Geographic, The Atlantic, and New Scientist.

Interview and writing: Dai Yiming
Editor: Li Mengjia
From Philosophy Student to Computer Science PhD: A Relentless Pursuit of Two Big Questions, Evolution and Intelligence

1. A big fan of Darwin's theory of evolution who hoped to understand the origins of intelligence through philosophy

 

Jeff Clune did not start out in computer science. As an undergraduate, he studied philosophy at the University of Michigan. According to a 2019 Uber Engineering interview [1], Clune has long been obsessed with two questions:

 

  1. How did such complex life forms evolve on Earth? For example, why does nature contain jaguars, eagles, dolphins, whales, and so many other forms of life? What process could give rise to this seemingly endless array of engineering marvels? Darwin's theory of evolution answers some of these questions, but much remains that humans do not understand.

     

  2. How does "thinking" happen? Can we build a machine that thinks?

 

So Clune chose philosophy as his college major. To his disappointment, although philosophy is fascinating, he had no way to test whether his ideas were correct and improve them iteratively. [1]

 

In the early 2000s, Clune read an article [2] about Hod Lipson [7], later a professor at Cornell University, using evolutionary algorithms to develop robots. Deeply inspired, he decided to study machine learning and computer science, because he finally had a way to understand intelligent systems by building them. Clune identifies with the famous line of American theoretical physicist Richard Feynman: "What I cannot create, I do not understand." In Clune's view, artificial intelligence research is an excellent way to tackle both evolution and thinking.

 

2. Rejected again and again as a cross-disciplinary PhD applicant: eight years of study to work with his idol

 

Full of enthusiasm, Clune first contacted Hod Lipson, hoping to join his lab. Unfortunately, the lab required applicants to have a PhD in computer science, and Clune had only an undergraduate degree. When he then applied to PhD programs, he was widely rejected because he had only a philosophy background [1]: computer science PhD programs required an undergraduate degree in computer science.

 

An opportunity came at Michigan State University, where Clune found a professor who worked with researchers using evolutionary algorithms to study how complexity evolves in biological systems. Clune completed a master's degree in philosophy at Michigan State and then applied to its computer science PhD program. Finally, nearly 8 years after reading the article about Lipson, Clune, now holding a PhD, got in touch with Lipson again: "I have a PhD now. Can I join your lab?" [1]

 

Professor Hod Lipson

(Image source: Wikipedia [7])

 

Professor Lipson readily agreed, and Clune was thrilled to join the lab. As he put it, it was like stepping into the dazzling Willy Wonka chocolate factory from the novel Charlie and the Chocolate Factory. [1] Two years later, Clune set up his own lab at the University of Wyoming. Today he receives emails almost every week from students who, like his younger self, are eager to enter the field of artificial intelligence but do not know how.

 

In Professor Hod Lipson's Creative Machines Lab at Cornell University, Clune used a 3D printer to print objects designed by AI [10]. Notably, these objects were generated with an encoding inspired by developmental biology, an approach called CPPNs (Compositional Pattern Producing Networks) [11], proposed by Kenneth Stanley.

(Source: provided by the author)
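To make the CPPN idea concrete: a CPPN maps spatial coordinates to an output value by composing simple functions such as sines, Gaussians, and sigmoids, so that symmetric and repeating patterns fall out of a small network. The sketch below is a minimal, hand-wired illustration that assumes a fixed tiny CPPN mapping (x, y, z) to a material density and turns it into voxels. In Stanley's formulation the topology and weights are evolved (for example with NEAT), so the specific functions, weights, and the 0.5 threshold here are hypothetical.

```python
import math

def gaussian(v: float) -> float:
    return math.exp(-v * v)

def sigmoid(v: float) -> float:
    return 1.0 / (1.0 + math.exp(-v))

def cppn(x: float, y: float, z: float) -> float:
    """Hypothetical hand-wired CPPN: composes simple functions of the coordinates."""
    d = math.sqrt(x * x + y * y)           # distance from the vertical axis
    h1 = gaussian(2.0 * d)                 # radially symmetric bump
    h2 = math.sin(3.0 * math.pi * z)       # repeating bands along z
    return sigmoid(4.0 * h1 + 2.0 * h2 - 2.0)

def voxelize(resolution: int = 8, threshold: float = 0.5):
    """Query the CPPN on a cubic grid in [-1, 1]^3; cells above threshold are solid."""
    coords = [2.0 * i / (resolution - 1) - 1.0 for i in range(resolution)]
    return [[[cppn(x, y, z) > threshold for z in coords] for y in coords] for x in coords]

if __name__ == "__main__":
    grid = voxelize()
    solid = sum(cell for plane in grid for row in plane for cell in row)
    print(f"{solid} of {8 ** 3} voxels are solid")
```

Because the same function is queried at every coordinate, the resulting shape inherits the symmetry and repetition built into the composed functions, which is the property that makes CPPN-style encodings attractive for designing printable objects.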

 

3. Greatness cannot be planned: the big idea that shaped Clune

 

Clune says the biggest influence on his career was Kenneth Stanley, formerly a computer science professor at the University of Central Florida. Stanley co-founded Geometric Intelligence, a startup where Clune also worked; it was later acquired by Uber and became part of the newly created Uber AI Labs (Stanley was also a founding member of the lab). Stanley joined OpenAI in 2020 and is known for many breakthroughs in neuroevolution, including NEAT [9], HyperNEAT, CPPNs, Novelty Search, POET, and Go-Explore (the last two in collaboration with Jeff Clune).

 

(Image source: https://www.goodreads.com/book/show/25670869-why-greatness-cannot-be-planned)

 

Stanley and Joel Lehman co-authored the book Why Greatness Cannot Be Planned: The Myth of the Objective. The book puts forward the "objective paradox": "As soon as you create an objective, you ruin your ability to reach it." Many of the book's ideas are reflected in Stanley's and Clune's work on quality diversity algorithms and open-ended algorithms, for example POET [3] and Enhanced POET (https://arxiv.org/abs/2003.08536).
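Quality diversity (QD) algorithms search for a whole collection of solutions that are both diverse and high-performing, instead of a single optimum for one objective. Below is a minimal sketch of MAP-Elites, one well-known QD algorithm: it keeps the best solution ("elite") found in each cell of a behavior space and creates new candidates by mutating randomly chosen elites. The toy fitness function, the one-dimensional behavior descriptor, and all parameters are hypothetical choices for illustration; this is not the setup used in POET.

```python
import random

NUM_CELLS = 10  # hypothetical number of behavior niches

def evaluate(genome):
    """Return (fitness, behavior cell) for a genome of two floats in [-1, 1]."""
    x, y = genome
    fit = -(x * x + y * y)  # toy objective: closer to the origin is better
    cell = min(int((x + 1.0) / 2.0 * NUM_CELLS), NUM_CELLS - 1)  # descriptor: where x falls
    return fit, cell

def mutate(genome, sigma=0.1):
    """Gaussian perturbation, clipped back into [-1, 1]."""
    return [max(-1.0, min(1.0, g + random.gauss(0.0, sigma))) for g in genome]

def map_elites(iterations=5000, init_random=100, seed=0):
    random.seed(seed)
    archive = {}  # cell -> (fitness, genome): the best solution found in each niche
    for i in range(iterations):
        if i < init_random or not archive:
            child = [random.uniform(-1.0, 1.0) for _ in range(2)]
        else:
            _, parent = random.choice(list(archive.values()))
            child = mutate(parent)
        fit, cell = evaluate(child)
        # Keep the child if its niche is empty or it beats the current elite there.
        if cell not in archive or fit > archive[cell][0]:
            archive[cell] = (fit, child)
    return archive

if __name__ == "__main__":
    for cell, (fit, genome) in sorted(map_elites().items()):
        print(cell, round(fit, 3), [round(g, 2) for g in genome])
```

The result is not one answer but a map of the best solution for every kind of behavior, including niches that score poorly on the objective; such stepping stones are exactly what a single-objective search would discard.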

 

Professor Kenneth Stanley

(Image source: https://www.youtube.com/watch?v=dXQPL9GooyI)

 

Recently, Clune and his research team proposed VPT [8], a video pretraining model that helps AI learn the actions needed to complete tasks from unlabeled video data. The BAAI Community asked Professor Clune to explain it.

 

VPT: Pretraining on video, learning to act

1. Sequential tasks: you can't cook without ingredients

 

Sequential tasks are common in everyday life. A sequential task is work that requires multiple steps, procedures, or actions: following a recipe step by step, browsing the web for the right information, running a chemistry experiment, assembling furniture from instructions, and so on. Such tasks are fairly easy for humans. A person only needs to imitate instructions, diagrams, video tutorials, or other materials to master the actions in a relatively short time and complete the task.

 

For AI, however, such seemingly simple tasks are relatively hard. Clune and his colleagues argue that the main reason is that publicly available data lacks labels. In a video tutorial, for example, no frame is annotated with the action being performed, so it is very difficult for AI to learn these actions just by watching.

 

Reinforcement learning is commonly used for sequential tasks, but according to the VPT paper [4], it is sample-inefficient. Rewards from the environment are often very sparse, some steps require the AI to repeat actions many times, and some tasks are even deceptive hard-exploration problems, all of which make reinforcement learning expensive. On tasks such as browsing the web, using Photoshop, or booking flights, reinforcement learning performs poorly. Finding approaches that need only a few labeled samples and can exploit large-scale unlabeled data has therefore become a focus for researchers.
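A back-of-the-envelope calculation shows why sparse rewards make exploring from scratch so costly. Assume, purely for illustration, that a task pays a reward only after one specific sequence of k correct actions, each chosen from A options, and that the agent explores uniformly at random. The chance of hitting the reward in one attempt is then (1/A)^k, and the expected number of attempts is its reciprocal. The values of A and k below are hypothetical and are not Minecraft's actual action space.

```python
def success_probability(num_actions: int, sequence_length: int) -> float:
    """Chance that uniformly random actions reproduce one specific sequence."""
    return (1.0 / num_actions) ** sequence_length

def expected_attempts(num_actions: int, sequence_length: int) -> float:
    """Mean number of independent attempts until the first success (geometric distribution)."""
    return 1.0 / success_probability(num_actions, sequence_length)

if __name__ == "__main__":
    for k in (2, 5, 10):
        p = success_probability(10, k)
        print(f"A=10, k={k}: p={p:.0e}, expected attempts={expected_attempts(10, k):.0e}")
```

Each extra required step multiplies the cost by A, which is why a behavioral prior that already makes sensible actions likely is so valuable.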

 

2. VPT: Large-scale training on pseudo-labeled data

 

Mainstream semi-supervised imitation learning aims to let a model learn from data that has few or no explicit action labels, by inferring the missing labels (so-called pseudo-labels). VPT follows this idea, using a small labeled set to enable training on a large unlabeled one, with video as the data modality.

 

First, in the VPT project, crowdworkers recorded videos of themselves completing tasks, together with the keyboard and mouse operations (clicks, drags, and so on) they performed along the way; these operations serve as the labels for the corresponding video frames. Naturally, this labeled data is small compared with the whole unlabeled dataset.

 

The VPT training process

(Image source: VPT paper [4])

 

The researchers chose Minecraft (MC), the sandbox game that has swept the world, as the task setting. A large number of Minecraft gameplay videos are already available online as unlabeled data. Minecraft combines building, progression, adventure, and more, and contains many sequential tasks: players gather and combine raw materials in a particular order to craft increasingly complex items.

 

To succeed in Minecraft, an agent must act the way humans do: moving the mouse, pressing the right keyboard shortcuts, dragging and dropping items, attacking animals and plants in the game, and so on. Minecraft also has many simple graphical interfaces for an AI to learn to operate. Finally, Minecraft is already an important platform for testing AI learning and offers a rich set of components, such as a simulated 3D world and a tech tree, which make it friendly for AI training while still challenging.

 

The steps required to craft a stone pickaxe in Minecraft, the number of actions, and the median time humans take to complete them

(Image source: VPT blog [8])

 

Next, the researchers designed an inverse dynamics model (IDM) that learns to label the action sequences in videos. The training objective is: at each time step of a given video, predict the action taken at that step from the frames both before and after it. Researchers have traditionally used behavioral cloning (BC) models, which predict the intent and action distribution at the next time step from past observations only (including past actions and changes in the environment); this requires a large amount of training data. The IDM, by contrast, is non-causal: when predicting an action, it can see both past and future frames. [4]
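The key difference between the two models is which frames each may look at when predicting the action at step t. The sketch below assumes a video stored as a list of frames and uses hypothetical window sizes (`history`, `radius`); it builds the input for each model: BC sees only frames up to t, while the non-causal IDM also sees frames after t. The window sizes and network architecture used in the VPT paper are not reproduced here.

```python
from typing import List, Sequence, TypeVar

Frame = TypeVar("Frame")

def bc_window(frames: Sequence[Frame], t: int, history: int) -> List[Frame]:
    """Causal input for behavioral cloning: frames t-history+1 .. t, never the future."""
    return list(frames[max(0, t - history + 1): t + 1])

def idm_window(frames: Sequence[Frame], t: int, radius: int) -> List[Frame]:
    """Non-causal input for an inverse dynamics model: frames t-radius .. t+radius."""
    return list(frames[max(0, t - radius): t + radius + 1])

if __name__ == "__main__":
    video = [f"frame{i}" for i in range(10)]
    print(bc_window(video, 5, history=3))  # ['frame3', 'frame4', 'frame5']
    print(idm_window(video, 5, radius=2))  # ['frame3', 'frame4', 'frame5', 'frame6', 'frame7']
```

Seeing frame t+1 is what makes the IDM's job easier: the consequence of the action is visible, so the model only has to explain an observed change rather than guess what a player intends to do next.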

 

Moreover, in most settings the environment's dynamics are far simpler than the full range of human behavior. The IDM does not have to model the player's intentions; it only has to infer which action explains the change it observes between frames. As a result, it can be trained with less, and simpler, labeled data.

 

Once the IDM is trained, the researchers use it to generate pseudo-labels for the large-scale unlabeled video data and then train a BC model on these pseudo-labels for imitation learning. After that, the model can be fine-tuned for downstream tasks using behavioral cloning and reinforcement learning.
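The three-stage recipe (train an IDM on a small labeled set, pseudo-label the large unlabeled set, then train BC on the pseudo-labels) can be sketched end to end on a toy problem. Everything below is a hypothetical stand-in: "frames" are integer positions on a line, actions are moves of -1, 0, or +1, the "human players" walk toward a fixed goal, and both "models" are majority-vote lookup tables rather than the neural networks used in VPT.

```python
import random
from collections import Counter, defaultdict

GOAL = 5  # hypothetical position the "human players" walk toward

def expert_action(pos):
    return (GOAL > pos) - (GOAL < pos)  # +1, 0 or -1

def record_episode(start, length=12):
    """Return (frames, actions): frames are positions, actions are the moves taken."""
    frames, actions = [start], []
    for _ in range(length):
        a = expert_action(frames[-1])
        actions.append(a)
        frames.append(frames[-1] + a)
    return frames, actions

def train_idm(labeled):
    """Stage 1: learn which action explains the change between frames t and t+1."""
    votes = defaultdict(Counter)
    for frames, actions in labeled:
        for t, a in enumerate(actions):
            votes[frames[t + 1] - frames[t]][a] += 1
    return {delta: c.most_common(1)[0][0] for delta, c in votes.items()}

def pseudo_label(idm, frames):
    """Stage 2: the IDM looks at the future frame t+1 to label each step of an unlabeled video."""
    return [idm[frames[t + 1] - frames[t]] for t in range(len(frames) - 1)]

def train_bc(videos):
    """Stage 3: behavioral cloning maps only the current frame to the (pseudo-)labeled action."""
    votes = defaultdict(Counter)
    for frames, actions in videos:
        for t, a in enumerate(actions):
            votes[frames[t]][a] += 1
    return {pos: c.most_common(1)[0][0] for pos, c in votes.items()}

labeled = [record_episode(s) for s in (0, 10, 5)]  # small labeled set covering every move
random.seed(0)
unlabeled = [record_episode(random.randint(-10, 20))[0] for _ in range(200)]  # frames only

idm = train_idm(labeled)
pseudo = [(frames, pseudo_label(idm, frames)) for frames in unlabeled]
policy = train_bc(pseudo)
```

Note the asymmetry: only three short labeled episodes are needed to train the IDM, while the BC policy is learned from the much larger pseudo-labeled set, which mirrors the data split the text describes.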

 

Through fine-tuning and reinforcement learning, the pretrained model learned to craft stone and diamond pickaxes

(Image source: VPT blog [8])

 

As shown in the figure above, after the model learned how to craft sticks and a crafting table, the researchers had it attempt the task of crafting a diamond pickaxe. Unlike a reinforcement learning model trained from random initialization, VPT successfully crafted a diamond pickaxe (in 2.5% of 10-minute episodes), and its success rate at collecting all the required crafting materials was on par with humans. This is the first time an agent has crafted a diamond pickaxe in Minecraft. For a human, this task takes on average about 24,000 actions and more than 20 minutes. [8]
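The human baseline quoted above also implies the pace of play. Using only the figures in the text (about 24,000 actions in a little over 20 minutes), the implied rate is at most about 20 actions per second, which gives a sense of how long an action sequence the agent has to get right:

```latex
\frac{24\,000\ \text{actions}}{20\ \text{min} \times 60\ \text{s/min}}
  = \frac{24\,000\ \text{actions}}{1\,200\ \text{s}}
  = 20\ \text{actions/s}
```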

 

Why not have the model generate video frames directly? Clune explains that the goal of this work is for AI to learn, from video, the actions and sequences of operations needed to perform a task. It is more efficient to have the model learn how to act than to learn to generate representations of future video. As an example, imagine a game with many complex cloud patterns in the sky; the shapes of those clouds have no effect on whether the player completes the game. But when the player needs to cross a river, the exact position of each stepping stone is crucial. A video generative model might try to model both the clouds and the stones, whereas humans, and the VPT model, focus on the position of the stones and ignore the clouds.

 

Focusing on what the model learns about actions matters more when applying the model to downstream tasks. The paper [4] notes:

 

"VPT may even be a better general representation learning method, useful even for downstream tasks other than learning to act, for example fine-tuning the model to explain what is happening in a video. Arguably, the most important information in any given scene is captured by features that can predict the distribution of future human actions."

 

From the perspective of guiding a model's actions, generating video is unnecessary and expensive.

 

3. Lessons from VPT: AI progress must stand on the shoulders of giants

 

Asked about VPT's advantages over reinforcement learning, Clune said that the rapid progress of pretrained models over the past two years tells us that to improve AI performance, we need to stand on the shoulders of giants (especially the "shoulders" of huge human datasets). Traditional reinforcement learning requires agents to explore from scratch, which is difficult and inefficient in more general settings. A behavioral prior can improve a model substantially. This holds not only for NLP and vision but also for robotics.

 

Of course, Clune also noted that combining pretrained models with reinforcement learning brings challenges similar to those in NLP: models can become very large, training becomes more expensive, and acquiring and storing high-quality data is particularly costly. Models may also learn things from the data that people do not expect, such as biases.

 

In robotics, Clune sees a challenge unique to the field: running simulators is expensive, and to apply pretrained models to robots, one must bridge the gap between simulation and reality.

 

AI-GAs: Freeing AI from manual "planning" so it can produce "greatness" on its own, another road to general intelligence?

 

1. Endless modules and endless ways to combine them: the manual approach runs into difficulties

 

Debate over the route to AGI has intensified recently. On the topic of general intelligence, Clune believes that the fastest path to human-level AI is likely through AI-generating algorithms (AI-GAs), and that machine learning is a key route to creating powerful AI.

 

In his AI-GAs paper [5], Clune summarizes two ways of building AI.

 

  • The first is the manual approach, which proceeds in two phases. In the first phase, people must discover the various modular components needed to build intelligence.

 

  • In the second phase, people assemble these modules, in various ways, into a complex intelligent system. Most machine learning research groups currently work this way.

 

However, Clune believes it is very hard to achieve intelligence this way. First, people must spend a great deal of time and effort discovering the modules that enable intelligence, and the supply of such modules is nearly endless; it seems impossible to exhaust their variety and forms.

 

Building blocks for intelligent systems proposed by researchers, as cited in Clune's paper

(Image source: AI-GAs paper [5])

 

Even if humans could find all kinds of effective AI components, the work of building intelligence would still be far from done. The paper [5] points out that there are many ways to combine modules and people must find the right combination; components can interact in many nonlinear ways, which makes the performance of the whole system hard to understand and very hard to debug; and building a system from dozens or even hundreds of components poses complex scientific and engineering challenges. It is hard enough to understand how a single module contributes to the overall intelligence of an AI, let alone the effects of different combinations and configurations.
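A quick count shows how fast the space of combinations grows. Assume, as a simplification that is not taken from the paper, that each of n candidate modules is either included or left out and that any two included modules may interact. Then there are 2^n possible systems to choose from and n(n-1)/2 pairwise interactions to understand, before even considering how each module is configured.

```python
from math import comb

def num_configurations(n_modules: int) -> int:
    """Number of ways to choose a subset of modules (each one either in or out)."""
    return 2 ** n_modules

def num_pairwise_interactions(n_modules: int) -> int:
    """Number of distinct module pairs that could interact nonlinearly."""
    return comb(n_modules, 2)

if __name__ == "__main__":
    for n in (10, 50, 100):
        print(n, num_configurations(n), num_pairwise_interactions(n))
```

With a hundred components, the number of subsets alone exceeds 10^30, far beyond anything that could be tested by hand.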

 

Clune also critiques the research paradigm that building large-scale intelligent systems would require. Traditional research is carried out by small teams, whereas a large engineering effort, like the team behind the Apollo moon program, would bind the whole organization to a single large-scale project.

 

2. AI-GAs: Three pillars for a more natural way to build AI

 

Given the many problems with the first path, Clune proposed another way to achieve human-level AI: letting AI generate the algorithms. The main idea is to automate as much as possible, letting AI automatically generate new algorithms that can run and adapt to their environment, and keep iterating and evolving. Looking back at the history of AI, humans have built AI by hand-designing system architectures, hand-coding some components, and letting the machine learn the rest (by fitting with gradient descent) [5]; the endpoint of this trend should be AI that generates its own code and learns.

 

The goal of this approach is for AI to learn how to improve its own general abilities. This requires AI that can discover, refine, and combine the components of an intelligent machine on its own: something humans currently cannot do, but that machines can keep getting better at. [5]

 

AI-GAs also mirror the way intelligent life evolved in nature. In Darwinian evolution, a very simple, unintelligent algorithm adapts life to its environment and iterates through replication at enormous scale. Run long enough, this process ultimately produced human consciousness. Darwinian evolution could be called the first algorithm that generated general intelligence.

 

Clune argues that AI-GAs rest on three pillars: meta-learning the architectures, meta-learning the learning algorithms themselves, and automatically generating the learning challenges and environments. While the first two have seen considerable progress over past decades, the third pillar has been explored very little. [5]

 

Today, AI is trained mainly on datasets supplied by humans. But for AI to learn on its own, we need to set up environments suited to learning rather than just feeding the model data. Just as people design suitable curricula when educating children, autonomous learning in AI requires guiding it from easy to hard, in a certain order, so that it acquires enough knowledge and skills. The AI-GAs paper [5] suggests that in the future there will be "curricula" designed specifically for AI. Research on AI learning environments will be a new frontier for future research and a potentially high-impact field.
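The "from easy to hard" idea can be illustrated with a threshold-based curriculum: the learner practices at one difficulty level until its success rate passes a threshold, then moves to the next. Everything in the sketch below is hypothetical, including the learner model (success rate as a logistic function of skill minus difficulty), the update rule, and the 0.8 threshold. It is a generic curriculum-learning sketch, not POET and not a procedure prescribed in the AI-GAs paper.

```python
import math

def success_rate(skill: float, difficulty: float) -> float:
    """Hypothetical learner model: tasks far above the current skill are rarely solved."""
    return 1.0 / (1.0 + math.exp(difficulty - skill))

def practice(skill: float, difficulty: float, lr: float = 0.5) -> float:
    """In this model, progress per step peaks for tasks solved about half the time."""
    p = success_rate(skill, difficulty)
    return skill + lr * p * (1.0 - p)

def run_curriculum(levels, threshold=0.8, max_steps=10_000):
    """Practice each level until its success rate reaches the threshold, then advance."""
    skill, level, steps = 0.0, 0, 0
    history = []  # (difficulty, cumulative steps when it was mastered)
    while level < len(levels) and steps < max_steps:
        skill = practice(skill, levels[level])
        steps += 1
        if success_rate(skill, levels[level]) >= threshold:
            history.append((levels[level], steps))
            level += 1
    return skill, history

if __name__ == "__main__":
    skill, history = run_curriculum([float(d) for d in range(1, 11)])
    print("curriculum:", history)
    # Same step budget, but training directly on the hardest task.
    _, direct = run_curriculum([10.0], max_steps=history[-1][1])
    print("hardest task only:", direct)
```

Under this toy model, the learner masters all ten levels through the curriculum, while spending the same number of steps on the hardest task alone produces almost no progress, because a task that is almost never solved provides almost no learning signal.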

 

Clune says he has now left OpenAI and will spend more time with his family, focusing on research at UBC and the Vector Institute, including future work on AI-GAs. Interested readers can watch Professor Clune's keynote at the 2022 BAAI Conference and read related research on his personal website (JeffClune.com).

 

 

 

References:

[1] Advancing AI: A Conversation with Jeff Clune, Senior Research Manager at Uber: https://eng.uber.com/jeff-clune-interview/

[2] Scientists Report They Have Made Robot That Makes Its Own Robots: https://www.nytimes.com/2000/08/31/us/scientists-report-they-have-made-robot-that-makes-its-own-robots.html

[3] POET: Endlessly Generating Increasingly Complex and Diverse Learning Environments and their Solutions through the Paired Open-Ended Trailblazer: https://eng.uber.com/poet-open-ended-deep-learning/

[4] Baker, Bowen, et al. "Video PreTraining (VPT): Learning to Act by Watching Unlabeled Online Videos." arXiv preprint arXiv:2206.11795 (2022).

[5] Clune, Jeff. "AI-GAs: AI-generating algorithms, an alternate paradigm for producing general artificial intelligence." arXiv preprint arXiv:1905.10985 (2019).

[6] A Path Towards Autonomous Machine Intelligence: https://openreview.net/pdf?id=BZ5a1r-kVsf

[7] Hod Lipson: https://en.wikipedia.org/wiki/Hod_Lipson

[8] Learning to Play Minecraft with Video PreTraining (VPT): https://openai.com/blog/vpt/

[9] Kenneth Stanley: https://en.wikipedia.org/wiki/Kenneth_Stanley

[10] https://dl.acm.org/doi/abs/10.1145/2078245.2078246

[11] https://link.springer.com/article/10.1007/s10710-007-9028-8