(CVPR2021) Kaleido-BERT: Vision-Language Pre-training on Fashion Domain

Last update: Dec 04, 2022

Overview

Kaleido-BERT: Vision-Language Pre-training on Fashion Domain

Mingchen Zhuge*, Dehong Gao*, Deng-Ping Fan#, Linbo Jin, Ben Chen, Haoming Zhou, Minghui Qiu, Ling Shao.

[Paper][Chinese version][Video][Poster][MSRA_Slide][News1][News2][MSRA_Talking][机器之心_Talking]

Introduction

We present a new vision-language (VL) pre-training model dubbed Kaleido-BERT, which introduces a novel kaleido strategy for fashion cross-modality representations from transformers. In contrast to the random masking strategy of recent VL models, we design alignment guided masking to jointly focus more on image-text semantic relations. To this end, we carry out five novel tasks, i.e., rotation, jigsaw, camouflage, grey-to-color, and blank-to-color, for self-supervised VL pre-training on patches of different scales. Kaleido-BERT is conceptually simple and easy to extend to the existing BERT framework. It attains state-of-the-art results by large margins on four downstream tasks: text retrieval (R@1: 4.03% absolute improvement), image retrieval (R@1: 7.13% absolute improvement), category recognition (ACC: 3.28% absolute improvement), and fashion captioning (Bleu4: 1.2 absolute improvement). We validate the efficiency of Kaleido-BERT on a wide range of e-commerce websites, demonstrating its broader potential in real-world applications.
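
The five pre-training tasks operate on image patches cut at several scales. As a rough illustration only, the sketch below splits an image into an n x n grid and builds training samples for two of the named tasks, rotation and jigsaw. It is not the repository's implementation: the helper names (`split_into_grid`, `rotate90`, `make_rotation_sample`, `make_jigsaw_sample`), the nested-list image format, and the choice of four 90-degree rotation classes are all assumptions made for this example.

```python
import random


def split_into_grid(image, n):
    """Split a 2-D image (a list of equal-length rows) into n x n patches.

    Patches are returned in row-major order. Hypothetical helper, not
    taken from the Kaleido-BERT code base.
    """
    height, width = len(image), len(image[0])
    if height % n or width % n:
        raise ValueError("image size must be divisible by the grid size")
    patch_h, patch_w = height // n, width // n
    patches = []
    for gy in range(n):
        for gx in range(n):
            rows = image[gy * patch_h:(gy + 1) * patch_h]
            patches.append([row[gx * patch_w:(gx + 1) * patch_w] for row in rows])
    return patches


def rotate90(patch, k):
    """Return a copy of the patch rotated clockwise by k * 90 degrees."""
    rotated = [list(row) for row in patch]
    for _ in range(k % 4):
        # Reversing the rows and transposing is a clockwise quarter turn.
        rotated = [list(row) for row in zip(*rotated[::-1])]
    return rotated


def make_rotation_sample(patch, rng):
    """Rotate a patch by a random multiple of 90 degrees.

    The label (0..3) is the number of quarter turns; a model would be
    trained to predict it. The 4-class setup is an assumption.
    """
    label = rng.randrange(4)
    return rotate90(patch, label), label


def make_jigsaw_sample(patches, rng):
    """Shuffle the patches; the permutation is the prediction target."""
    permutation = list(range(len(patches)))
    rng.shuffle(permutation)
    return [patches[i] for i in permutation], permutation
```

For example, `split_into_grid(image, 2)` on a 4 x 4 image yields four 2 x 2 patches, which can then be passed to `make_jigsaw_sample` with a seeded `random.Random` for reproducible shuffles.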

Notes

Code will be released on 2021/4/16.
This is the TensorFlow implementation built on Alibaba/EasyTransfer. We will also release a PyTorch version built on Huggingface/Transformers in the future.
If you have difficulty downloading these datasets, modify /dataset/get_pretrain_data.sh, /dataset/get_finetune_data.sh, and /dataset/get_retrieve_data.sh, and comment out some of the wget file links as needed. This will not affect the steps that follow.
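
One way to apply the last note without editing the scripts by hand is sketched below. The function comments out `wget` lines whose text contains any of the given keywords. The sample script text, the example.com URLs, and the keyword are hypothetical, since the actual contents of the dataset scripts are not shown here.

```python
def comment_out_wget(script_text, skip_keywords):
    """Prefix matching wget lines with '# ' and leave all other lines alone.

    A line matches when it starts with 'wget' (ignoring indentation) and
    contains at least one of the skip keywords.
    """
    edited = []
    for line in script_text.splitlines():
        is_wget = line.lstrip().startswith("wget")
        if is_wget and any(keyword in line for keyword in skip_keywords):
            edited.append("# " + line)
        else:
            edited.append(line)
    return "\n".join(edited) + "\n"


# Hypothetical script contents, for illustration only.
sample_script = (
    "mkdir -p data\n"
    "wget https://example.com/images_part1.tar\n"
    "wget https://example.com/images_part2.tar\n"
)

print(comment_out_wget(sample_script, ["part2"]))
```

To use it on a real script, read the file into a string, pass it through `comment_out_wget`, and write the result back after keeping a backup copy.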

Get started

Clone this code

git clone git@github.com:mczhuge/Kaleido-BERT.git
cd Kaleido-BERT

Environment setup (details can be found in conda_env.info)

conda create --name kaleidobert --file conda_env.info
conda activate kaleidobert
conda install tensorflow==1.15.0
pip install boto3 tqdm tensorflow_datasets --index-url=https://mirrors.aliyun.com/pypi/simple/
pip install sentencepiece==0.1.92 scikit-learn --index-url=https://mirrors.aliyun.com/pypi/simple/
pip install joblib==0.14.1
python setup.py develop
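
After the steps above, it can help to confirm that the pinned versions were actually installed. The sketch below compares the three versions pinned in the commands (tensorflow 1.15.0, sentencepiece 0.1.92, joblib 0.14.1) against what the active environment reports. The function names are hypothetical, and it assumes each package exposes standard distribution metadata under the same name used in the install command, which may not hold for every conda build.

```python
try:
    from importlib import metadata as _metadata  # available on Python 3.8+
except ImportError:
    _metadata = None  # older interpreters fall back to pkg_resources below

# Versions pinned in the setup commands of this README.
PINS = {
    "tensorflow": "1.15.0",
    "sentencepiece": "0.1.92",
    "joblib": "0.14.1",
}


def installed_version(name):
    """Return the installed version string of a package, or None if absent."""
    if _metadata is not None:
        try:
            return _metadata.version(name)
        except _metadata.PackageNotFoundError:
            return None
    import pkg_resources  # shipped with setuptools
    try:
        return pkg_resources.get_distribution(name).version
    except pkg_resources.DistributionNotFound:
        return None


def find_mismatches(pins, lookup=installed_version):
    """Map each package whose version differs to (wanted, found)."""
    problems = {}
    for name, wanted in pins.items():
        found = lookup(name)
        if found != wanted:
            problems[name] = (wanted, found)
    return problems
```

Calling `find_mismatches(PINS)` inside the activated `kaleidobert` environment returns an empty dictionary when all three pins match; otherwise each entry shows the wanted and the found version.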

Download Pretrained Dependency

cd Kaleido-BERT/scripts/checkpoint
sh get_checkpoint.sh

Finetune

#Download finetune datasets

cd Kaleido-BERT/scripts/dataset
sh get_finetune_dataset.sh
sh get_retrieve_dataset.sh

#Testing CAT/SUB

cd Kaleido-BERT/scripts
sh run_cat.sh
sh run_subcat.sh

#Testing TIR/ITR

cd Kaleido-BERT/scripts
sh run_i2t.sh
sh run_t2i.sh
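
The introduction reports retrieval quality as R@1. As background, the sketch below shows how a rank-at-K score is commonly computed from similarity scores: a query counts as a hit when fewer than K candidates score strictly higher than its ground-truth match. This is a generic illustration, not the evaluation code run by the scripts above; the function name, the input layout, and the tie-handling rule are assumptions.

```python
def rank_at_k(score_rows, positive_indices, k):
    """Fraction of queries whose ground-truth candidate ranks in the top k.

    score_rows[i] holds the similarity scores of query i against its
    candidates, and positive_indices[i] is the position of the correct
    candidate in that row.
    """
    if not score_rows:
        raise ValueError("at least one query is required")
    hits = 0
    for scores, positive in zip(score_rows, positive_indices):
        # Zero-based rank: how many candidates beat the correct one.
        rank = sum(1 for score in scores if score > scores[positive])
        if rank < k:
            hits += 1
    return hits / len(score_rows)


# Three toy queries with three candidates each (made-up scores).
scores = [
    [0.9, 0.1, 0.3],  # correct candidate 0 is ranked first
    [0.2, 0.8, 0.5],  # correct candidate 2 is ranked second
    [0.4, 0.6, 0.7],  # correct candidate 0 is ranked third
]
positives = [0, 2, 0]

for k in (1, 2, 3):
    print("R@%d = %.3f" % (k, rank_at_k(scores, positives, k)))
```

With these toy scores, one of the three queries is a hit at K=1, two at K=2, and all three at K=3.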

Pre-training

#Download pre-training datasets

cd Kaleido-BERT/scripts/dataset
sh get_pretrain_dataset.sh

#Remove existing checkpoint
rm -rf Kaleido-BERT/checkpoint/pretrained

#Run pre-training
cd Kaleido-BERT/scripts/
sh run_pretrain.sh

Acknowledgement

Thanks to the Alibaba ICBU Search Team and the Alibaba PAI Team for technical support.

Citing Kaleido-BERT

@inproceedings{Zhuge2021KaleidoBERT,
  title={Kaleido-BERT: Vision-Language Pre-training on Fashion Domain},
  author={Zhuge, Mingchen and Gao, Dehong and Fan, Deng-Ping and Jin, Linbo and Chen, Ben and Zhou, Haoming and Qiu, Minghui and Shao, Ling},
  booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition},
  pages={},
  year={2021}
}

Contact

Mingchen Zhuge (email: [email protected] | wechat: tjpxiaoming)
Deng-Ping Fan (email: [email protected])
Dehong Gao (email: [email protected])

Feel free to contact us if you have additional questions.
