Which Apple Keeps Which Doctor Away? Colorful Word Representations with Visual Oracles

Last update: Apr 14, 2022

AppleLM

Which Apple Keeps Which Doctor Away? Colorful Word Representations with Visual Oracles (TASLP 2022)

Setup

This implementation is based on the Hugging Face Transformers library.

Preparation

Download GLUE datasets

The GLUE datasets can be downloaded automatically with the script from https://github.com/nyu-mll/GLUE-baselines:

git clone https://github.com/nyu-mll/GLUE-baselines.git
python GLUE-baselines/download_glue_data.py --data_dir glue_data --tasks all

We recommend placing the glue_data folder under data/. The directory layout then looks like:

AppleLM
└───data
    └───glue_data
        ├───CoLA/
        ├───MRPC/
        └───...
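
Before launching a run, it can help to confirm that the task folders ended up where the layout above expects them. The following is a minimal sketch, not part of the repository: the helper name missing_task_dirs is hypothetical, and the default task list only covers the two folders named in the tree above.

```python
from pathlib import Path


def missing_task_dirs(glue_dir, tasks=("CoLA", "MRPC")):
    """Return the task names that have no folder under glue_dir.

    Hypothetical helper for illustration; extend `tasks` with any other
    GLUE task folders you downloaded.
    """
    root = Path(glue_dir)
    return [task for task in tasks if not (root / task).is_dir()]


if __name__ == "__main__":
    missing = missing_task_dirs("data/glue_data")
    if missing:
        print("Missing GLUE task folders:", ", ".join(missing))
    else:
        print("All listed GLUE task folders are present.")
```

An empty list means every listed task folder was found under data/glue_data.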

Visual Features

Pre-extracted visual features can be downloaded from Google Drive; they are borrowed from the Multi30K repository.

The features are used in the image embedding layer for indexing. Extract train-resnet50-avgpool.npy and place it in the data/ folder.
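
To illustrate what "indexing" into the feature file can look like, here is a small NumPy sketch. It is not the repository's code. It assumes the .npy file stores a 2-D array with one feature vector per image (rows are images, columns are feature dimensions); the function names load_image_features and lookup_features are hypothetical.

```python
import numpy as np


def load_image_features(path="data/train-resnet50-avgpool.npy"):
    """Open the feature matrix memory-mapped, so it is not read fully into RAM.

    Assumption: the file holds a 2-D array of shape (num_images, feature_dim).
    """
    features = np.load(path, mmap_mode="r")
    if features.ndim != 2:
        raise ValueError(f"expected a 2-D array, got shape {features.shape}")
    return features


def lookup_features(features, image_ids):
    """Return one feature vector per requested row index, in the given order."""
    rows = np.asarray(image_ids, dtype=np.int64)
    return np.asarray(features[rows])
```

Calling lookup_features(features, [2, 0]) returns a (2, feature_dim) array holding rows 2 and 0 of the matrix, which is the same row lookup an embedding table performs.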

Training & Evaluation

The example below targets CoLA with BERT-large (whole-word masking). It passes only --do_eval; if the script keeps the flags of the upstream Transformers run_glue.py example (an assumption), add --do_train to fine-tune before evaluating.

export GLUE_DIR=data/glue_data
export CUDA_VISIBLE_DEVICES="0"
export TASK_NAME=CoLA
python ./examples/run_glue_visual-tfidf_att.py \
    --model_type bert \
    --model_name_or_path bert-large-uncased-whole-word-masking \
    --task_name $TASK_NAME \
    --do_eval \
    --do_lower_case \
    --data_dir $GLUE_DIR/$TASK_NAME \
    --max_seq_length 128 \
    --per_gpu_eval_batch_size 32 \
    --per_gpu_train_batch_size 16 \
    --learning_rate 1e-5 \
    --eval_all_checkpoints \
    --save_steps 500 \
    --max_steps 5336 \
    --warmup_steps 320 \
    --image_dir data/train.lc.norm.tok.en \
    --image_embedding_file data/train-resnet50-avgpool.npy \
    --num_img 3 \
    --tfidf 5 \
    --image_merge att-gate \
    --stopwords_dir data/stopwords-en.txt \
    --output_dir experiments/CoLA_bert_wwm
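
The step-related flags above interact, so a small worked sketch may help when adapting them to another task. It assumes, without confirmation from this repository, that the script keeps the behaviour of the upstream Transformers run_glue.py example: linear warmup followed by linear decay of the learning rate, and a checkpoint written every --save_steps optimizer steps. The function names are hypothetical; the default values are taken from the command above.

```python
def linear_schedule_lr(step, base_lr=1e-5, warmup_steps=320, max_steps=5336):
    """Learning rate at `step` under linear warmup then linear decay.

    Assumption: the training script uses this schedule, as the upstream
    run_glue.py example does.
    """
    if step < warmup_steps:
        # Ramp up from 0 to base_lr over the warmup steps.
        return base_lr * step / max(1, warmup_steps)
    # Decay linearly from base_lr down to 0 at max_steps.
    remaining = max_steps - step
    return base_lr * max(0.0, remaining / max(1, max_steps - warmup_steps))


def checkpoint_steps(save_steps=500, max_steps=5336):
    """Steps at which a checkpoint would be written (assumed: every save_steps)."""
    return list(range(save_steps, max_steps + 1, save_steps))


if __name__ == "__main__":
    for step in (0, 160, 320, 2828, 5336):
        print(step, linear_schedule_lr(step))
    print(checkpoint_steps())
```

Under these assumptions the learning rate peaks at 1e-5 at step 320 and reaches zero at step 5336, and checkpoint_steps() lists ten steps (500 through 5000), which are the intermediate checkpoints that --eval_all_checkpoints would pick up.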

Reference

Please cite this paper if it helps your research:

@ARTICLE{zhang2022which,
  author={Zhang, Zhuosheng and Yu, Haojie and Zhao, Hai and Utiyama, Masao},
  journal={IEEE/ACM Transactions on Audio, Speech, and Language Processing}, 
  title={Which Apple Keeps Which Doctor Away? Colorful Word Representations With Visual Oracles}, 
  year={2022},
  volume={30},
  number={},
  pages={49-59},
  doi={10.1109/TASLP.2021.3130972}
}

Owner

Zhuosheng Zhang
