
Deep learning fundamentals: a simple introduction to meta learning (notes from Li Hongyi's course)

2022-04-23 05:59:00 【umbrellalalalala】

Also published on Zhihu under the same account name.

1. A preliminary understanding

[Figure]
Take classification as an example. Previously, the goal of learning was to learn a binary classifier $f^*$; now, the goal is to learn a learning algorithm $F$ that is itself able to learn a binary classifier $f^*$.
[Figure]
Since we want to learn the learning algorithm $F$ directly, we have to consider its parameters. In conventional learning we learn one specific binary classifier $f^*$: if the hand-specified learning algorithm is a perceptron, the parameters to be learned are $w$ and $b$. In meta learning the goal is to learn the algorithm $F$ itself, so the parameters of interest (labeled "component" in the figure above) are things like the network architecture, the initial parameters, and the learning rate. We denote these parameters by $\Phi$, also known as the learnable components.

Training in meta learning means repeatedly adjusting $\Phi$ to obtain a good learning algorithm $F$. At use time, a user can apply $F$ to a given dataset to train a binary classifier $f^*$, and this classifier should be a good one.
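To make the idea of "a learning algorithm with parameters" concrete, here is a minimal Python sketch. It assumes, purely for illustration, that $\Phi$ consists of initial weights, an initial bias, a learning rate, and a number of epochs, and that the inner learner is logistic regression trained by gradient descent (a stand-in for the perceptron mentioned above). The names `Phi` and `F` are hypothetical.

```python
import math
from dataclasses import dataclass
from typing import Callable, List, Sequence, Tuple

Example = Tuple[Sequence[float], int]  # (features, label in {0, 1})


@dataclass
class Phi:
    """Hypothetical learnable components of the learning algorithm."""
    init_w: List[float]  # initial weights
    init_b: float        # initial bias
    lr: float            # learning rate
    epochs: int = 50     # passes over the training data


def F(phi: Phi, train_set: List[Example]) -> Callable[[Sequence[float]], int]:
    """Learning algorithm F_phi: consumes training data, returns a classifier f."""
    w, b = list(phi.init_w), phi.init_b
    for _ in range(phi.epochs):
        for x, y in train_set:
            z = sum(wi * xi for wi, xi in zip(w, x)) + b
            # Sigmoid with the input clamped to avoid overflow in exp.
            p = 1.0 / (1.0 + math.exp(-max(-60.0, min(60.0, z))))
            g = p - y  # gradient of the log loss with respect to z
            w = [wi - phi.lr * g * xi for wi, xi in zip(w, x)]
            b -= phi.lr * g

    def f(x: Sequence[float]) -> int:
        # The learned binary classifier: 1 if the linear score is positive.
        return int(sum(wi * xi for wi, xi in zip(w, x)) + b > 0)

    return f
```

Calling `F(Phi(init_w=[0.0], init_b=0.0, lr=0.5), data)` returns a classifier `f`. Different values of `phi` yield different classifiers from the same data, and meta learning is the search for a good `phi`.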

[Figure]
The figure above shows the overall process of meta learning. Concretely, in conventional learning we have a training set and a test set: we use the perceptron algorithm on the training set to train a binary classifier with low loss, then check on the test set whether that classifier is any good. With meta learning the situation changes. We have a set of training tasks (containing many training tasks) and a test task. Every training task contains its own training data and test data, and so does every test task. We use the set of training tasks to train a good learning algorithm $F_{\Phi^*}$. Then, in the test phase, we use $F_{\Phi^*}$ on the training data of the test task to train a binary classifier $f_{\theta^*}$, and evaluate $f_{\theta^*}$ on the test data of the test task to see whether it is any good.

[Figure]
(Obviously, just as test data must not appear during training in conventional learning, in meta learning the test tasks must not be used during training.)

At this point you may wonder whether meta learning is superfluous. Why first learn a learning algorithm $F$ and then use $F$ to learn a binary classifier $f$? Why not learn the binary classifier $f$ directly?

In practice we often face one problem: a lack of data. Take binary classification again. If you want to classify cats and dogs, labeled training data is plentiful. But suppose you want to classify unusual animals, say African pangolins versus maned wolves, and you have only a small amount of labeled training data. What then? The answer first: find many training tasks, such as cat vs. dog, apple vs. orange, car vs. bicycle, and so on. What these tasks have in common is that labeled data for them is abundant. Use these binary classification tasks as the training tasks of meta learning to train a learning algorithm $F$ for binary classification. Then treat African pangolin vs. maned wolf as the test task. You may have collected only a little labeled data for it, so both the training data and the test data of this test task are scarce. But because you already have a good $F$, you can use $F$ on this small amount of training data to train a pangolin vs. maned wolf classifier (and then evaluate it on the test data).

Next, let me argue that this process is reasonable. We train $F$ on data that is easy to collect, then use $F$ to train $f$ on data that is hard to collect. Why should this $f$ work? Consider how people themselves learn. Before you could tell a computer from a mobile phone, you had probably seen only a few phones and a few computers, yet you learned to tell them apart. It seems that before training yourself on the "computer vs. mobile phone" binary classification you had very little training data, so why did you succeed?

The reason is that you had already accumulated a great deal of experience. Perhaps in primary or middle school you saw the difference between a whiteboard and a blackboard, and learned that objects can be distinguished by color. Perhaps you have eaten tangyuan and xuetangyuan (an ice cream); they look very similar, but you learned that objects can be distinguished by temperature. Perhaps you have met countless round and square objects, and learned that shape is another important cue. In short, before you ever encountered phones and computers, you had trained yourself on countless binary classification problems. In other words, you had seen a huge amount of training data from binary classification training tasks and had learned a learning algorithm $F$ for binary classification. So when you first encounter computers and phones and think about how they differ, you apply $F$ to a small amount of computer and phone training data to train a computer vs. phone classifier, and you easily obtain a good classifier $f$.

That is enough of the analogy. Let's return to the technique of meta learning itself.

Look again at the process in the figure above. We said that $F$ is trained on a collection of training tasks, but we did not say what actually happens during that training. Each training task contains training data and test data. We have explained how the training data and test data of the test task are used, but not how those of the training tasks are used. How do we use the training data in these training tasks to train the learning algorithm $F_{\Phi^*}$? What is the test data in these training tasks for? How do we compute a loss, as we usually would? And once we have a loss, how do we optimize the learnable components $\Phi$?

The answers follow.
[Figure]
The figure above shows the training process. Suppose the set of training tasks contains two tasks: task 1 is apple vs. orange classification, and task 2 is bicycle vs. car classification. Task 1 has some photos of apples and oranges as training data and some other photos of apples and oranges as test data. Task 2 likewise has some photos of bicycles and cars as training data and some as test data.

[Figure]
As in conventional learning, we first initialize the learnable components $\Phi$, which gives us an initial learning algorithm $F_\Phi$.

Start with task 1. Apply $F_\Phi$ to its training data (some photos of apples and oranges) to train a binary classifier $f_{\theta^{1*}}$, where $\theta^{1*}$ denotes the parameters of the binary classifier that $F_\Phi$ learns on task 1.
[Figure]
Then evaluate this classifier on the test data of task 1 to compute a loss $l^1$.

Similarly, repeat the same procedure on task 2 to obtain a loss $l^2$:
[Figure]
The final loss is then simply $l^1+l^2$.

Of course, if the set of training tasks contains $N$ tasks, the final loss is the sum of the losses of all $N$ tasks, as the figure below shows:
[Figure]
Now that we have a loss, we can naturally optimize it with respect to $\Phi$.
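Written out, and assuming the figures here show the form that follows from the two-task case above, the total loss over $N$ training tasks and the resulting objective would be:

$$
L(\Phi) = \sum_{n=1}^{N} l^n, \qquad \Phi^* = \arg\min_{\Phi} L(\Phi)
$$

For the two-task example, $N = 2$ and $L(\Phi) = l^1 + l^2$. Each $l^n$ depends on $\Phi$ through the classifier $f_{\theta^{n*}}$ that $F_\Phi$ learns on task $n$.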

The process above closely resembles conventional learning, but there is a problem. In the perceptron algorithm we can simply take the gradient of the loss with respect to $w$ and $b$ and run gradient descent. In meta learning, however, the parameters are the learnable components $\Phi$, which include things such as the network architecture, with respect to which the loss is not differentiable. How do we optimize them? Li Hongyi's answer: if you can compute the derivative of the loss with respect to $\Phi$, use gradient descent; if not, you can use reinforcement learning or an evolutionary algorithm.
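The whole training loop can be sketched in a few dozen lines of Python. This is only an illustration under several assumptions that are not from the course: $\Phi$ is taken to be an initial weight vector, an initial bias, and a learning rate; the inner learner is logistic regression; the tasks are toy 2-D point clouds instead of photos; and $\Phi$ is optimized with a very simple keep-the-best mutation search, standing in for the evolutionary algorithm option. All function names are hypothetical.

```python
import math
import random

def sigmoid(z):
    z = max(-60.0, min(60.0, z))  # clamp to avoid overflow in exp
    return 1.0 / (1.0 + math.exp(-z))

def inner_train(phi, support, epochs=20):
    """F_phi: train a classifier on one task's training data, return theta."""
    w, b, lr = list(phi["init_w"]), phi["init_b"], phi["lr"]
    for _ in range(epochs):
        for x, y in support:
            g = sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b) - y
            w = [wi - lr * g * xi for wi, xi in zip(w, x)]
            b -= lr * g
    return w, b

def query_loss(theta, query):
    """l^n: mean log loss of the trained classifier on the task's test data."""
    w, b = theta
    total = 0.0
    for x, y in query:
        p = sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)
        p = min(max(p, 1e-12), 1.0 - 1e-12)
        total -= y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return total / len(query)

def meta_loss(phi, tasks):
    """L(phi): sum of l^n over all training tasks."""
    return sum(query_loss(inner_train(phi, s), q) for s, q in tasks)

def mutate(phi, rng, scale=0.1):
    """Randomly perturb every component of phi."""
    return {
        "init_w": [wi + rng.gauss(0, scale) for wi in phi["init_w"]],
        "init_b": phi["init_b"] + rng.gauss(0, scale),
        "lr": max(1e-4, phi["lr"] * math.exp(rng.gauss(0, scale))),
    }

def meta_train(phi, tasks, generations=50, seed=0):
    """Gradient-free search over phi: keep a mutation only if it lowers L."""
    rng = random.Random(seed)
    best, best_loss = phi, meta_loss(phi, tasks)
    for _ in range(generations):
        cand = mutate(best, rng)
        cand_loss = meta_loss(cand, tasks)
        if cand_loss < best_loss:
            best, best_loss = cand, cand_loss
    return best, best_loss

def make_task(rng, n=6):
    """Toy task: class 1 clusters around a random direction, class 0 opposite."""
    angle = rng.uniform(0, math.pi / 2)
    cx, cy = math.cos(angle), math.sin(angle)
    def sample(k):
        out = []
        for i in range(k):
            y = i % 2
            s = 1 if y == 1 else -1
            out.append(([s * cx + rng.gauss(0, 0.3), s * cy + rng.gauss(0, 0.3)], y))
        return out
    return sample(n), sample(n)  # (training data, test data) of the task
```

A typical use is `tasks = [make_task(random.Random(0)) for _ in range(4)]` followed by `meta_train({"init_w": [0.0, 0.0], "init_b": 0.0, "lr": 0.01}, tasks)`. Because a candidate is accepted only when it lowers the loss, the returned loss never exceeds the starting one. If the loss were differentiable with respect to $\Phi$, the mutation step could be replaced by a gradient step.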

2. Differences from ML

At this point the basic flow of meta learning should be clear. Let's compare meta learning with machine learning:
[Figure]
First, as the figure above shows, one looks for $f$ and the other looks for $F$, where $F$ is what is used to find $f$.

Second, ML deals with a single task, while meta learning deals with multiple tasks. Note that the training data within a task is also called the Support set, and the test data within a task is also called the Query set. The learning process of meta learning is called Across-task Training.
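As a small illustration of this vocabulary, the sketch below models a task as a pair of Support set and Query set and holds out one task as the test task. The `Task` class and `split_tasks` helper are hypothetical names, and the feature vectors are made-up placeholders rather than real image data.

```python
from dataclasses import dataclass
from typing import List, Tuple

Example = Tuple[List[float], int]  # (features, label in {0, 1})


@dataclass
class Task:
    name: str
    support: List[Example]  # training data inside the task (Support set)
    query: List[Example]    # test data inside the task (Query set)


def split_tasks(tasks: List[Task], n_test: int = 1) -> Tuple[List[Task], List[Task]]:
    """Hold out the last n_test tasks as test tasks; the rest are training tasks."""
    if not 0 < n_test < len(tasks):
        raise ValueError("n_test must be between 1 and len(tasks) - 1")
    return tasks[:-n_test], tasks[-n_test:]


tasks = [
    Task("apple-vs-orange", [([0.1, 0.9], 0), ([0.8, 0.2], 1)], [([0.2, 0.7], 0)]),
    Task("bicycle-vs-car", [([0.3, 0.4], 0), ([0.9, 0.9], 1)], [([0.8, 0.7], 1)]),
    Task("pangolin-vs-maned-wolf", [([0.5, 0.1], 0), ([0.4, 0.8], 1)], [([0.5, 0.2], 0)]),
]
train_tasks, test_tasks = split_tasks(tasks, n_test=1)
```

Across-task training only ever sees `train_tasks`; the held-out task in `test_tasks` is touched only in the test phase.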

[Figure]
The figure above compares Within-task and Across-task training; its meaning is self-evident.

Of course, the losses of the two also differ.

[Figure]
Techniques used in ML, and problems that arise in ML, can also be used or arise in meta learning. The development task in the figure above corresponds to a "validation task": it sits between the training tasks and the test tasks and can be seen as the counterpart of the validation set in ML.

3. Understanding the applications

Now let's look at the different kinds of meta learning.

How is meta learning categorized? As mentioned earlier, $\Phi$ denotes the parameters in meta learning, called the learnable components. It can contain many components, possibly including the network architecture, the parameter initialization, the learning rate, and so on. Any of these components may or may not be present, so different choices of $\Phi$ correspond to different kinds of meta learning:
[Figure]
For example, if $\Phi$ is the network architecture, meta learning becomes NAS (neural architecture search), which many people already know:

In this case the loss is not differentiable with respect to $\Phi$, so techniques such as reinforcement learning can be used:

[Figure]
Besides reinforcement learning, evolutionary algorithms can also be used. The figure above lists some related papers.

Meta learning also has concrete applications, such as Few-shot Image Classification:
[Figure]
That is enough to note down for now. Li Hongyi has another lecture that covers more specific algorithmic details. Today the goal was only to understand the basic idea of meta learning; I will go through that lecture when I need it.