CVPR 2022 | Collaborative two-stream vision-language pre-training model for cross-modal retrieval
2022-04-22 18:53:00 【Zhiyuan community】

Paper link: https://arxiv.org/abs/2204.07441
Large-scale single-tower pre-training models have achieved impressive results in cross-modal retrieval. Unfortunately, because most of them rely on time-consuming attention layers for cross-modal interaction, their retrieval efficiency is very low. Recently, two-tower models with high inference efficiency, such as CLIP and ALIGN, have also shown good results. However, they only consider instance-level alignment between the modalities, so there is still room for improvement. To overcome these limitations, we propose a novel collaborative two-tower vision-language pre-training model, called COTS, which improves image-text retrieval by strengthening cross-modal interaction. In addition to instance-level alignment through momentum contrastive learning, we propose two additional levels of cross-modal interaction. (1) Token-level interaction: without using a cross-modal interaction module, we design a masked vision-language modeling (MVLM) learning objective, in which a variational autoencoder is applied to the visual encoder to generate visual tokens for each image. (2) Task-level interaction: we design a KL-alignment learning objective between the text-to-image and image-to-text retrieval tasks, where the probability distribution of each task is computed using the negative-sample queues from momentum contrastive learning. In a fair comparison, COTS achieves the best results among all two-tower methods and performs comparably to the latest single-tower methods while being 10,800 times faster at inference. COTS is also applicable to text-to-video retrieval, where it obtains the best results to date on the widely used MSR-VTT dataset.
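To make the instance-level and task-level objectives more concrete, here is a minimal sketch in plain Python. It is not the authors' implementation. The function names, the temperature value, the toy embeddings, and the use of a symmetric KL divergence are all assumptions made for illustration. It also assumes the image queue and the text queue are aligned, meaning that entry j in both queues comes from the same image-text pair, so the two retrieval distributions can be compared position by position.

```python
import math

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def normalize(v):
    # Scale a vector to unit length so the dot product is cosine similarity.
    norm = math.sqrt(dot(v, v))
    return [a / norm for a in v]

def softmax(scores):
    # Subtract the maximum for numerical stability.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def retrieval_distribution(query, positive, negative_queue, temperature=0.07):
    # Probability over [positive] + queued negatives for one query.
    # The temperature value is an assumption, not taken from the paper.
    q = normalize(query)
    keys = [normalize(positive)] + [normalize(k) for k in negative_queue]
    return softmax([dot(q, k) / temperature for k in keys])

def contrastive_loss(dist):
    # Instance-level term: negative log-probability of the positive (index 0).
    return -math.log(dist[0])

def kl(p, q, eps=1e-12):
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q))

def symmetric_kl(p, q):
    # Task-level term (assumed form): pull the image-to-text and
    # text-to-image distributions toward each other.
    return kl(p, q) + kl(q, p)

# Toy embeddings (hypothetical). In practice these come from the two encoders.
image = [0.9, 0.1, 0.0]
text = [0.8, 0.2, 0.1]
text_queue = [[0.0, 1.0, 0.0], [0.1, 0.0, 1.0]]   # negatives for image-to-text
image_queue = [[0.1, 0.9, 0.0], [0.0, 0.1, 0.9]]  # negatives for text-to-image

p_i2t = retrieval_distribution(image, text, text_queue)
p_t2i = retrieval_distribution(text, image, image_queue)

instance_loss = contrastive_loss(p_i2t) + contrastive_loss(p_t2i)
task_loss = symmetric_kl(p_i2t, p_t2i)
total_loss = instance_loss + task_loss
```

Because each tower encodes its input independently, the embeddings of all candidates can be computed once and stored. At query time only a dot product per candidate is needed, which is the source of the inference speed advantage over single-tower models described above.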

Copyright notice
This article was created by [Zhiyuan community]. Please include a link to the original when reposting. Thank you.
https://yzsam.com/2022/04/202204221849502088.html