Bridging Vision and Language Model

Related tags

Deep LearningBriVL
Overview

BriVL

BriVL (Bridging Vision and Language Model) 是首个中文通用图文多模态大规模预训练模型。BriVL模型在图文检索任务上有着优异的效果,超过了同期其他常见的多模态预训练模型(例如UNITER、CLIP)。

BriVL论文:WenLan: Bridging Vision and Language by Large-Scale Multi-Modal Pre-Training

适用场景

适用场景示例:图像检索文本、文本检索图像、图像标注、图像零样本分类、作为其他下游多模态任务的输入特征等。

技术特色

  1. BriVL使用对比学习算法将图像和文本映射到了同一特征空间,可用于弥补图像特征和文本特征之间存在的隔阂。
  2. 基于视觉-语言弱相关的假设,除了能理解对图像的描述性文本外,也可以捕捉图像和文本之间存在的抽象联系。
  3. 图像编码器和文本编码器可分别独立运行,有利于实际生产环境中的部署。

下载专区

模型 语言 参数量(单位:亿) 文件(file)
BriVL-1.0 中文 10亿 BriVL-1.0-5500w.tar

使用BriVL

搭建环境

# 环境要求
lmdb==0.99
timm==0.4.12
easydict==1.9
pandas==1.2.4
jsonlines==2.0.0
tqdm==4.60.0
torchvision==0.9.1
numpy==1.20.2
torch==1.8.1
transformers==4.5.1
msgpack_numpy==0.4.7.1
msgpack_python==0.5.6
Pillow==8.3.1
PyYAML==5.4.1

配置要求在requirements.txt中,可使用下面的命令:

pip install -r requirements.txt

特征提取与计算检索结果

cd evaluation/
bash test_xyb.sh

数据解释

现已放入3个图文对示例:

./data/imgs  # 放入图像
./data/jsonls # 放入图文对描述

引用BriVL

@article{DBLP:journals/corr/abs-2103-06561,
  author    = {Yuqi Huo and
               Manli Zhang and
               Guangzhen Liu and
               Haoyu Lu and
               Yizhao Gao and
               Guoxing Yang and
               Jingyuan Wen and
               Heng Zhang and
               Baogui Xu and
               Weihao Zheng and
               Zongzheng Xi and
               Yueqian Yang and
               Anwen Hu and
               Jinming Zhao and
               Ruichen Li and
               Yida Zhao and
               Liang Zhang and
               Yuqing Song and
               Xin Hong and
               Wanqing Cui and
               Dan Yang Hou and
               Yingyan Li and
               Junyi Li and
               Peiyu Liu and
               Zheng Gong and
               Chuhao Jin and
               Yuchong Sun and
               Shizhe Chen and
               Zhiwu Lu and
               Zhicheng Dou and
               Qin Jin and
               Yanyan Lan and
               Wayne Xin Zhao and
               Ruihua Song and
               Ji{-}Rong Wen},
  title     = {WenLan: Bridging Vision and Language by Large-Scale Multi-Modal Pre-Training},
  journal   = {CoRR},
  volume    = {abs/2103.06561},
  year      = {2021},
  url       = {https://arxiv.org/abs/2103.06561},
  archivePrefix = {arXiv},
  eprint    = {2103.06561},
  timestamp = {Tue, 03 Aug 2021 12:35:30 +0200},
  biburl    = {https://dblp.org/rec/journals/corr/abs-2103-06561.bib},
  bibsource = {dblp computer science bibliography, https://dblp.org}
}
Owner
Wudao is a large-scale pre-training model project initiated by BAAI, aiming to break through the core technology and promote the development of AGI.
This is the second place solution for : UmojaHack Africa 2022: African Snake Antivenom Binding Challenge

UmojaHack-Africa-2022-African-Snake-Antivenom-Binding-Challenge This is the second place solution for : UmojaHack Africa 2022: African Snake Antivenom

Mami Mokhtar 10 Dec 03, 2022
Minecraft Hack Detection With Python

Minecraft Hack Detection An attempt to try and use crowd sourced replays to find

Kuleen Sasse 3 Mar 26, 2022
Contrastive Language-Image Pretraining

CLIP [Blog] [Paper] [Model Card] [Colab] CLIP (Contrastive Language-Image Pre-Training) is a neural network trained on a variety of (image, text) pair

OpenAI 11.5k Jan 08, 2023
Pytorch implemenation of Stochastic Multi-Label Image-to-image Translation (SMIT)

SMIT: Stochastic Multi-Label Image-to-image Translation This repository provides a PyTorch implementation of SMIT. SMIT can stochastically translate a

Biomedical Computer Vision Group @ Uniandes 37 Mar 01, 2022
TensorFlow ROCm port

Documentation TensorFlow is an end-to-end open source platform for machine learning. It has a comprehensive, flexible ecosystem of tools, libraries, a

ROCm Software Platform 622 Jan 09, 2023
Code for "LoRA: Low-Rank Adaptation of Large Language Models"

LoRA: Low-Rank Adaptation of Large Language Models This repo contains the implementation of LoRA in GPT-2 and steps to replicate the results in our re

Microsoft 394 Jan 08, 2023
Implementation of various Vision Transformers I found interesting

Implementation of various Vision Transformers I found interesting

Kim Seonghyeon 78 Dec 06, 2022
Code release for Local Light Field Fusion at SIGGRAPH 2019

Local Light Field Fusion Project | Video | Paper Tensorflow implementation for novel view synthesis from sparse input images. Local Light Field Fusion

1.1k Dec 27, 2022
This is an (re-)implementation of DeepLab-ResNet in TensorFlow for semantic image segmentation on the PASCAL VOC dataset.

DeepLab-ResNet-TensorFlow This is an (re-)implementation of DeepLab-ResNet in TensorFlow for semantic image segmentation on the PASCAL VOC dataset. Up

19 Jan 16, 2022
An OpenAI Gym environment for Super Mario Bros

gym-super-mario-bros An OpenAI Gym environment for Super Mario Bros. & Super Mario Bros. 2 (Lost Levels) on The Nintendo Entertainment System (NES) us

Andrew Stelmach 1 Jan 05, 2022
Anomaly detection in multi-agent trajectories: Code for training, evaluation and the OpenAI highway simulation.

Anomaly Detection in Multi-Agent Trajectories for Automated Driving This is the official project page including the paper, code, simulation, baseline

12 Dec 02, 2022
The Self-Supervised Learner can be used to train a classifier with fewer labeled examples needed using self-supervised learning.

Published by SpaceML • About SpaceML • Quick Colab Example Self-Supervised Learner The Self-Supervised Learner can be used to train a classifier with

SpaceML 92 Nov 30, 2022
Toolchain to build Yoshi's Island from source code

Project-Y Toolchain to build Yoshi's Island (J) V1.0 from source code, by MrL314 Last updated: September 17, 2021 Setup To begin, download this toolch

MrL314 19 Apr 18, 2022
Source code for models described in the paper "AudioCLIP: Extending CLIP to Image, Text and Audio" (https://arxiv.org/abs/2106.13043)

AudioCLIP Extending CLIP to Image, Text and Audio This repository contains implementation of the models described in the paper arXiv:2106.13043. This

458 Jan 02, 2023
This repository is for our paper Exploiting Scene Graphs for Human-Object Interaction Detection accepted by ICCV 2021.

SG2HOI This repository is for our paper Exploiting Scene Graphs for Human-Object Interaction Detection accepted by ICCV 2021. Installation Pytorch 1.7

HT 10 Dec 20, 2022
FastCover: A Self-Supervised Learning Framework for Multi-Hop Influence Maximization in Social Networks by Anonymous.

FastCover: A Self-Supervised Learning Framework for Multi-Hop Influence Maximization in Social Networks by Anonymous.

0 Apr 02, 2021
This repo is customed for VisDrone.

Object Detection for VisDrone(无人机航拍图像目标检测) My environment 1、Windows10 (Linux available) 2、tensorflow = 1.12.0 3、python3.6 (anaconda) 4、cv2 5、ensemble

53 Jul 17, 2022
[ICCV 2021] FaPN: Feature-aligned Pyramid Network for Dense Image Prediction

FaPN: Feature-aligned Pyramid Network for Dense Image Prediction [arXiv] [Project Page] @inproceedings{ huang2021fapn, title={{FaPN}: Feature-alig

Shihua Huang 23 Jul 22, 2022
MMGeneration is a powerful toolkit for generative models, based on PyTorch and MMCV.

Documentation: https://mmgeneration.readthedocs.io/ Introduction English | 简体中文 MMGeneration is a powerful toolkit for generative models, especially f

OpenMMLab 1.3k Dec 29, 2022
Official PyTorch implementation of "Improving Face Recognition with Large AgeGaps by Learning to Distinguish Children" (BMVC 2021)

Inter-Prototype (BMVC 2021): Official Project Webpage This repository provides the official PyTorch implementation of the following paper: Improving F

Jungsoo Lee 16 Jun 30, 2022