This repo is the code release of EMNLP 2021 conference paper "Connect-the-Dots: Bridging Semantics between Words and Definitions via Aligning Word Sense Inventories".

Overview

Connect-the-Dots: Bridging Semantics between Words and Definitions via Aligning Word Sense Inventories

This repo is the code release of EMNLP 2021 conference paper "Connect-the-Dots: Bridging Semantics between Words and Definitions via Aligning Word Sense Inventories".

1. install python environment.

Follow the instruction of "env_install.txt" to create python virtual environment and install necessary packages. The environment is tested on python >=3.6 and pytorch >=1.8.

2. Gloss alignment algorithm.

Change your dictionary data format into the data format of "wordnet_def.txt" in "data/". Run the following commands to get gloss alignment results.

cd run_align_definitions_main/
python ../model/align_definitions_main.py

3. Download the pretrained model and data.

Visit https://drive.google.com/drive/folders/1I5-iOfWr1E32ahYDCbHKCssMdm74_JXG?usp=sharing. Download the pretrained model (SemEq-General-Large which is based on Roberta-Large) and put it under run_robertaLarge_model_span_WSD_twoStageTune/ and also run_robertaLarge_model_span_FEWS_twoStageTune/. Please make sure that the downloaded model file name is "pretrained_model_CrossEntropy.pt". The script will load the general model and fine-tune on specific WSD datasets to get the expert model.

4. Fine-tune the general model to get an expert model (SemEq-Expert-Large).

All-words WSD:

cd run_robertaLarge_model_span_WSD_twoStageTune/
python ../BERT_model_span/BERT_model_main.py --gpu_id 0 --prepare_data True --eval_dataset WSD --exp_mode twoStageTune --optimizer AdamW --learning_rate 2e-6 --bert_model roberta_large --batch_size 16

Few-shot WSD (FEWS):

cd run_robertaLarge_model_span_FEWS_twoStageTune/
python ../BERT_model_span/BERT_model_main.py --gpu_id 0 --prepare_data True --eval_dataset FEWS --exp_mode twoStageTune --optimizer AdamW --learning_rate 5e-6 --bert_model roberta_large --batch_size 16

5. Evaluate results.

All-words WSD: (you can try different epochs)

cd run_robertaLarge_model_span_WSD_twoStageTune/
python ../evaluate/evaluate_WSD.py --loss CrossEntropy --epoch 1
python ../evaluate/evaluate_WSD_POS.py

Few-shot WSD (FEWS): (you can try different epochs)

cd run_robertaLarge_model_span_FEWS_twoStageTune/
python ../evaluate/evaluate_FEWS.py --loss CrossEntropy --epoch 1

Note that the best results of test set on few-shot setting or zero-shot setting are selected based on dev set across epochs, respectively.

Extra. Apply the trained model to any given sentences to do WSD.

After training, you can apply the trained model (trained_model_CrossEntropy.pt) to any sentences. Examples are included in data_custom/. Examples are based on glosses in WordNet3.0.

cd run_BERT_model_span_CustomData/
python ../BERT_model_span/BERT_model_main.py --gpu_id 0 --prepare_data True --eval_dataset custom_data --exp_mode eval --bert_model roberta_large --batch_size 16

If you think this repo is useful, please cite our work. Thanks!

@inproceedings{yao-etal-2021-connect,
    title = "Connect-the-Dots: Bridging Semantics between Words and Definitions via Aligning Word Sense Inventories",
    author = "Yao, Wenlin  and
      Pan, Xiaoman  and
      Jin, Lifeng  and
      Chen, Jianshu  and
      Yu, Dian  and
      Yu, Dong",
    booktitle = "Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing",
    month = nov,
    year = "2021",
    address = "Online and Punta Cana, Dominican Republic",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/2021.emnlp-main.610",
    pages = "7741--7751",
}

Disclaimer: This repo is only for research purpose. It is not an officially supported Tencent product.

Owner
Research repositories.
pix2pix in tensorflow.js

pix2pix in tensorflow.js This repo is moved to https://github.com/yining1023/pix2pix_tensorflowjs_lite See a live demo here: https://yining1023.github

Yining Shi 47 Oct 04, 2022
Zero-Cost Proxies for Lightweight NAS

Zero-Cost-NAS Companion code for the ICLR2021 paper: Zero-Cost Proxies for Lightweight NAS tl;dr A single minibatch of data is used to score neural ne

SamsungLabs 108 Dec 20, 2022
50-days-of-Statistics-for-Data-Science - This repository consist of a 50-day program

50-days-of-Statistics-for-Data-Science - This repository consist of a 50-day program. All the statistics required for the complete understanding of data science will be uploaded in this repository.

komal_lamba 22 Dec 09, 2022
Adversarial Robustness Toolbox (ART) - Python Library for Machine Learning Security - Evasion, Poisoning, Extraction, Inference - Red and Blue Teams

Adversarial Robustness Toolbox (ART) is a Python library for Machine Learning Security. ART provides tools that enable developers and researchers to defend and evaluate Machine Learning models and ap

3.4k Jan 04, 2023
A general python framework for single object tracking in LiDAR point clouds, based on PyTorch Lightning.

Open3DSOT A general python framework for single object tracking in LiDAR point clouds, based on PyTorch Lightning. The official code release of BAT an

Kangel Zenn 172 Dec 23, 2022
[peer review] An Arbitrary Scale Super-Resolution Approach for 3D MR Images using Implicit Neural Representation

ArSSR This repository is the pytorch implementation of our manuscript "An Arbitrary Scale Super-Resolution Approach for 3-Dimensional Magnetic Resonan

Qing Wu 19 Dec 12, 2022
Image Fusion Transformer

Image-Fusion-Transformer Platform Python 3.7 Pytorch =1.0 Training Dataset MS-COCO 2014 (T.-Y. Lin, M. Maire, S. Belongie, J. Hays, P. Perona, D. Ram

Vibashan VS 68 Dec 23, 2022
PyTorch implementation of Deformable Convolution

PyTorch implementation of Deformable Convolution !!!Warning: There is some issues in this implementation and this repo is not maintained any more, ple

Wei Ouyang 893 Dec 18, 2022
Official implementation of "DSP: Dual Soft-Paste for Unsupervised Domain Adaptive Semantic Segmentation"

DSP Official implementation of "DSP: Dual Soft-Paste for Unsupervised Domain Adaptive Semantic Segmentation". Accepted by ACM Multimedia 2021. Authors

20 Oct 24, 2022
SAT: 2D Semantics Assisted Training for 3D Visual Grounding, ICCV 2021 (Oral)

SAT: 2D Semantics Assisted Training for 3D Visual Grounding SAT: 2D Semantics Assisted Training for 3D Visual Grounding by Zhengyuan Yang, Songyang Zh

Zhengyuan Yang 22 Nov 30, 2022
Dynamic View Synthesis from Dynamic Monocular Video

Towards Robust Monocular Depth Estimation: Mixing Datasets for Zero-shot Cross-dataset Transfer This repository contains code to compute depth from a

Intelligent Systems Lab Org 2.3k Jan 01, 2023
On Evaluation Metrics for Graph Generative Models

On Evaluation Metrics for Graph Generative Models Authors: Rylee Thompson, Boris Knyazev, Elahe Ghalebi, Jungtaek Kim, Graham Taylor This is the offic

13 Jan 07, 2023
Contrastive Language-Image Pretraining

CLIP [Blog] [Paper] [Model Card] [Colab] CLIP (Contrastive Language-Image Pre-Training) is a neural network trained on a variety of (image, text) pair

OpenAI 11.5k Jan 08, 2023
Data reduction pipeline for KOALA on the AAT.

KOALA KOALA, the Kilofibre Optical AAT Lenslet Array, is a wide-field, high efficiency, integral field unit used by the AAOmega spectrograph on the 3.

4 Sep 26, 2022
Official re-implementation of the Calibrated Adversarial Refinement model described in the paper Calibrated Adversarial Refinement for Stochastic Semantic Segmentation

Official re-implementation of the Calibrated Adversarial Refinement model described in the paper Calibrated Adversarial Refinement for Stochastic Semantic Segmentation

Elias Kassapis 31 Nov 22, 2022
The mini-AlphaStar (mini-AS, or mAS) - mini-scale version (non-official) of the AlphaStar (AS)

A mini-scale reproduction code of the AlphaStar program. Note: the original AlphaStar is the AI proposed by DeepMind to play StarCraft II.

Ruo-Ze Liu 216 Jan 04, 2023
Pytorch implementation of Each Part Matters: Local Patterns Facilitate Cross-view Geo-localization https://arxiv.org/abs/2008.11646

[TCSVT] Each Part Matters: Local Patterns Facilitate Cross-view Geo-localization LPN [Paper] NEWs Prerequisites Python 3.6 GPU Memory = 8G Numpy 1.

46 Dec 14, 2022
A convolutional recurrent neural network for classifying A/B phases in EEG signals recorded for sleep analysis.

CAP-Classification-CRNN A deep learning model based on Inception modules paired with gated recurrent units (GRU) for the classification of CAP phases

Apurva R. Umredkar 2 Nov 25, 2022
A PyTorch implementation: "LASAFT-Net-v2: Listen, Attend and Separate by Attentively aggregating Frequency Transformation"

LASAFT-Net-v2 Listen, Attend and Separate by Attentively aggregating Frequency Transformation Woosung Choi, Yeong-Seok Jeong, Jinsung Kim, Jaehwa Chun

Woosung Choi 29 Jun 04, 2022
Python implementation of Bayesian optimization over permutation spaces.

Bayesian Optimization over Permutation Spaces This repository contains the source code and the resources related to the paper "Bayesian Optimization o

Aryan Deshwal 9 Dec 23, 2022