Postgraduate Work Weekly (Week 13)
2022-08-09 16:48:00 【wangyunpeng33】
NICE-GAN performance evaluation
Foreword
This week I mainly read the paper's source code, thought over some questions, and asked senior students with PyTorch experience, as well as bloggers I follow, for advice. I gained a lot of insight and summarize it here.
I. How to write deep learning code?
I have, of course, been writing code throughout my time as a student, but deep learning code is clearly different from the engineering projects I worked on before. Having obtained the source code of a paper, I still needed to clarify the purpose and logic behind its main modules: model, dataset, and train.
Generally speaking, a good order is to write the model first, then the dataset, and finally train.
The model constitutes the skeleton of the whole deep learning training and inference system, and it determines the input and output formats of the entire AI model. For vision tasks, the architecture is usually a convolutional neural network or the more recent ViT; for NLP tasks, a Transformer or BERT; for time series prediction, an RNN or LSTM. Different models require different input formats: ResNet, for example, takes a multi-channel two-dimensional matrix, while ViT takes image patches carrying positional information. Once the model is chosen, the data input format is fixed, and the corresponding dataset can be built against it.
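A minimal sketch of such a model component, in PyTorch with hypothetical names (SimpleConvNet is not from the paper's code), showing how the class fixes the expected input layout (N, 3, H, W) and output layout (N, num_classes) that the dataset must later match:

```python
import torch
import torch.nn as nn

class SimpleConvNet(nn.Module):
    def __init__(self, num_classes: int = 10):
        super().__init__()
        # Backbone: the "skeleton" that determines the input layout.
        self.features = nn.Sequential(
            nn.Conv2d(3, 32, kernel_size=3, padding=1),
            nn.BatchNorm2d(32),
            nn.ReLU(inplace=True),
            nn.MaxPool2d(2),
            nn.Conv2d(32, 64, kernel_size=3, padding=1),
            nn.BatchNorm2d(64),
            nn.ReLU(inplace=True),
            nn.AdaptiveAvgPool2d(1),
        )
        self.classifier = nn.Linear(64, num_classes)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x is expected to be a multi-channel 2-D batch: (N, 3, H, W).
        h = self.features(x).flatten(1)
        return self.classifier(h)
```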
The dataset component produces data in exactly the input format the model expects. When writing it, we need to consider where and how the data is stored: whether storage is distributed, whether the model must run on multiple machines with multiple GPUs, and whether read/write speed is a bottleneck. If I/O becomes the bottleneck, the data may need to be preloaded into memory. Writing the dataset also feeds back into the model component: once distributed training is decided at the data level, the model must be wrapped in modules such as nn.DataParallel or nn.DistributedDataParallel so that it can run across multiple machines and GPUs. The dataset likewise shapes the training strategy, paving the way for the train component. For example, the batch size is chosen according to GPU memory, and the batch size in turn directly affects the learning rate; similarly, the distribution of the data dictates the sampling strategy used for feature balance, which is also reflected in training.
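A matching dataset sketch under the same assumptions (the samples and sizes are made up for illustration); the commented line at the end shows the reverse fine-tuning of the model for multi-machine, multi-GPU runs:

```python
import torch
from torch.utils.data import Dataset, DataLoader

class InMemoryImageDataset(Dataset):
    def __init__(self, samples):
        # `samples` is a list of (image_tensor, label) pairs that were
        # preloaded into memory to avoid a read/write bottleneck.
        self.samples = samples

    def __len__(self):
        return len(self.samples)

    def __getitem__(self, idx):
        image, label = self.samples[idx]
        return image, label

# Batch size is chosen against GPU memory; it in turn influences the
# learning rate picked later in the train component.
dummy = [(torch.randn(3, 32, 32), i % 10) for i in range(256)]
loader = DataLoader(InMemoryImageDataset(dummy), batch_size=64, shuffle=True)

# For multi-machine, multi-GPU runs, the model is wrapped in reverse:
# model = nn.parallel.DistributedDataParallel(model)  # after init_process_group
```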
The train component builds the model's training strategy and evaluation method, and it is the most important and complex of the three. Building the model and dataset first adds constraints and reduces its complexity. In train we determine the model update strategy according to the training environment (single machine with multiple GPUs, multiple machines with multiple GPUs, or federated learning), as well as the total number of training epochs, the type of optimizer, the learning rate and its decay schedule, the parameter initialization method, and the loss function. In addition, to combat overfitting and improve generalization, appropriate regularization methods are introduced, such as Dropout, BatchNorm, L2 regularization, and data augmentation. Some of these can be implemented directly in train (adding L2 regularization or Mixup), some must go into the model (Dropout and BatchNorm), and some into the dataset (data augmentation).
train also needs to record and visualize key information from the training process, such as the average training loss and the test accuracy at each epoch, writing it to TensorBoard for real-time monitoring in the browser. While building train, we tune hyperparameters according to model performance and keep improving the model and dataset components based on the results.
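A minimal train sketch that reuses the hypothetical SimpleConvNet and loader from the sketches above (all hyperparameter values are placeholders): the optimizer's weight_decay implements L2 regularization, a cosine schedule handles learning rate decay, and the average training loss is logged to TensorBoard each epoch:

```python
import torch
import torch.nn as nn
from torch.utils.tensorboard import SummaryWriter

model = SimpleConvNet(num_classes=10)
criterion = nn.CrossEntropyLoss()
# weight_decay adds L2 regularization directly in the optimizer.
optimizer = torch.optim.SGD(model.parameters(), lr=0.1,
                            momentum=0.9, weight_decay=5e-4)
scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max=30)
writer = SummaryWriter(log_dir="runs/demo")  # monitored live in the browser

for epoch in range(30):
    model.train()
    total_loss, n_batches = 0.0, 0
    for images, labels in loader:
        optimizer.zero_grad()
        loss = criterion(model(images), labels)
        loss.backward()
        optimizer.step()
        total_loss += loss.item()
        n_batches += 1
    scheduler.step()  # per-epoch learning rate decay
    writer.add_scalar("train/avg_loss", total_loss / n_batches, epoch)
writer.close()
```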
II. Evaluating Generator Performance
In practice, generators whose performance differs greatly can be told apart with the naked eye. For generators with small gaps, the usual quantitative metrics are: Inception Score (based on KL divergence), FID (a distance computed at the feature level), and KID (an unbiased kernel estimate).
To overcome the limitations of any single metric, the ablation experiments in the paper use FID together with KID to measure model performance.
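As a reference for how FID is computed, here is a minimal sketch given precomputed Inception features of real and generated images; a full pipeline would first extract these (N, d) feature arrays with a pretrained Inception-v3, and the function name is mine, not from the paper's code:

```python
import numpy as np
from scipy import linalg

def fid(feats_real: np.ndarray, feats_fake: np.ndarray) -> float:
    mu1, mu2 = feats_real.mean(axis=0), feats_fake.mean(axis=0)
    sigma1 = np.cov(feats_real, rowvar=False)
    sigma2 = np.cov(feats_fake, rowvar=False)
    # FID = ||mu1 - mu2||^2 + Tr(S1 + S2 - 2 (S1 S2)^{1/2})
    covmean, _ = linalg.sqrtm(sigma1 @ sigma2, disp=False)
    covmean = covmean.real  # discard tiny imaginary parts from sqrtm
    diff = mu1 - mu2
    return float(diff @ diff + np.trace(sigma1 + sigma2 - 2.0 * covmean))
```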

Ablation experiment
The key components are ablated separately to verify their effectiveness. NICE: No Independent Component for Encoding, i.e. the encoder is reused from the discriminator rather than being a separate module; RA: residual connections added in the CAM attention module; C0x denotes the local scale (10 × 10 receptive field), C1x the intermediate scale (70 × 70 receptive field), C2x the global scale (286 × 286 receptive field); −: the number of shared layers is decreased by 1; +: increased by 1.

The results show that both NICE and RA improve performance.
The second observation concerns model compression: sharing either the first encoder layer or the last decoder layer hurts conversion performance and weakens the domain translation ability.
MMD (Maximum Mean Discrepancy)
By shortening the transition path between domains in the latent space, NICE-GAN, built upon the shared latent space assumption, can facilitate domain translation in the image space.
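Since KID is exactly the squared MMD between Inception features under a degree-3 polynomial kernel (Bińkowski et al., 2018), a minimal sketch of the unbiased estimator, with a made-up function name, looks like this:

```python
import numpy as np

def polynomial_mmd2(X: np.ndarray, Y: np.ndarray) -> float:
    # X: (n, d) and Y: (m, d) feature sets, n, m >= 2.
    d = X.shape[1]
    k = lambda A, B: (A @ B.T / d + 1.0) ** 3  # KID's polynomial kernel
    Kxx, Kyy, Kxy = k(X, X), k(Y, Y), k(X, Y)
    n, m = len(X), len(Y)
    # Unbiased estimate: drop the diagonals of Kxx and Kyy.
    term_xx = (Kxx.sum() - np.trace(Kxx)) / (n * (n - 1))
    term_yy = (Kyy.sum() - np.trace(Kyy)) / (m * (m - 1))
    return float(term_xx + term_yy - 2.0 * Kxy.mean())
```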
Based on the network architecture provided in the paper, NICE-GAN produces more natural-looking translations than Cycle-GAN when applied to the cat and dog dataset.

(a) input (b) NICE-GAN (c) Cycle-GAN