dalle2: hierarchical text-conditional image generation with clip
2022-08-06 07:54:00 【Kun Li】
1.introduction
CLIP is robust to shifts in image distribution and supports zero-shot transfer. Diffusion models produce diverse samples with good fidelity. DALL-E 2 combines the strengths of both models.
2.method

The overview figure is very helpful. Above the dotted line in the figure is CLIP. It is trained in advance and is not trained again during DALL-E 2 training: its weights are frozen.

DALL-E 2 is also trained on paired data, a text and its corresponding image. The text first goes through CLIP's text encoder (a Transformer; CLIP uses ViT for images) to produce a text vector. CLIP itself is basic contrastive learning: what matters is the encoding of the two modalities, and once both are encoded, their similarity is computed directly as a cosine. The image goes through CLIP's image encoder to produce an image vector, and this image vector serves as the ground truth.

The text vector is fed into the first model, the prior, which is a diffusion model (an autoregressive Transformer can also be used). The prior outputs an image vector, supervised by the image vector that CLIP produced, so it is effectively trained in a supervised way. The prior is followed by a decoder. In the earlier DALL-E, the encoder and decoder were trained together in a dVAE, but here the decoder is trained separately and is itself a diffusion model.

The generative model below the dotted line thus turns a single generation step into an explicit two-stage process, and the authors experimented with this explicit generation. The paper calls its method unCLIP: CLIP converts input text and images into features, while DALL-E 2 converts text features into image features and then into an image. The step from image features to image is done by a diffusion model.

The decoder uses both classifier-free guidance and CLIP guidance. Guidance here applies to the decoding process: the input is a noisy image at time t, and the final output is an image. At each step, the noisy image (the intermediate result produced by the U-Net) can be scored by an image classifier. That classifier is typically trained with a cross-entropy loss; the key point is that the gradient of its output with respect to the image can be obtained, and this gradient can be used to guide the diffusion process toward a better decoding.
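The classifier-gradient idea can be sketched in a few lines. This is only a toy illustration, not the DALL-E 2 decoder: it assumes a two-dimensional "image", a logistic classifier standing in for a real image classifier, and a hand-picked mean standing in for the U-Net prediction. The names `classifier_log_prob_grad`, `guided_mean`, and `scale` are hypothetical. The update uses the common form of classifier guidance, shifting the denoising mean by scale × variance × ∇ log p(y | x_t).

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def classifier_log_prob_grad(x_t, w, b):
    """Gradient of log p(y=1 | x_t) with respect to x_t.

    Toy logistic classifier (an assumption for illustration):
    p(y=1 | x) = sigmoid(w . x + b), so d/dx log p = (1 - p) * w.
    """
    logit = sum(wi * xi for wi, xi in zip(w, x_t)) + b
    p = sigmoid(logit)
    return [(1.0 - p) * wi for wi in w]


def guided_mean(mean, variance, grad, scale):
    """Shift the unguided denoising mean along the classifier gradient."""
    return [m + scale * variance * g for m, g in zip(mean, grad)]


# Toy values; none of these come from the paper.
x_t = [0.5, -1.0]        # noisy sample at step t
mean = [0.4, -0.8]       # hypothetical unguided mean from the U-Net
variance = 0.1           # hypothetical step variance
w, b = [2.0, -1.0], 0.0  # hypothetical classifier parameters

grad = classifier_log_prob_grad(x_t, w, b)
shifted = guided_mean(mean, variance, grad, scale=3.0)
print("unguided mean:", mean)
print("guided mean:  ", shifted)
```

The guided mean moves in the direction that raises the classifier's score for the target class, and a larger `scale` pushes harder toward that class at the cost of sample diversity.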
https://www.bilibili.com/video/BV17r4y1u77B?spm_id_from=333.999.0.0&vd_source=4aed82e35f26bb600bc5b46e65e25c22