The coda and data for "Measuring Fine-Grained Domain Relevance of Terms: A Hierarchical Core-Fringe Approach" (ACL '21)

Overview

README

The coda and data for "Measuring Fine-Grained Domain Relevance of Terms: A Hierarchical Core-Fringe Approach" (ACL '21)

Introduction

We propose a hierarchical core-fringe learning framework to measure fine-grained domain relevance of terms – the degree that a term is relevant to a broad (e.g., computer science) or narrow (e.g., deep learning) domain.

image-20210528201234901

Requirements

See requirements.txt

To install torch_geometric, please follow the instruction on pytorch_geometric

Reproduction

To reproduce the results in the paper (using word2vec embeddings)

Download data from Google Drive, unzip and put all the folders in the root directory of this repo (details about data are described below)

For broad domains (e.g., CS)

python run.py --domain cs --method cfl

For narrow domains (e.g., ML)

python run.py --domain cs --method hicfl --narrow

For narrow domains (PU setting) (e.g., ML)

python run.py --domain cs --method hicfl --narrow --pu

All experiments are run on an NVIDIA Quadro RTX 5000 with 16GB of memory under the PyTorch framework. The training of CFL for the CS domain can finish in 1 minute.

Query

To handle user query (using compositional GloVe embeddings as an example)

Download data from Google Drive, unzip and put all the folders in the root directory of this repo

Download GloVe embeddings from https://nlp.stanford.edu/projects/glove/, save the file to features/glove.6B.100d.txt

Example:

python query.py --domain cs --method cfl

The first run will train a model and save the model to model/. For the follow-up queries, the trained model can be loaded for prediction.

You can use the model either in a transductive or in an inductive setting (i.e., whether to include the query terms in training).

Options

You can check out the other options available using:

python run.py --help

Data

Data can be downloaded from Google Drive:

term-candidates/: list of seed terms. Format: term frequency

features/: features of terms (term embeddings trained by word2vec). To use compositional GloVe embeddings as features, you can download GloVe embeddings from https://nlp.stanford.edu/projects/glove/. To load the features, refer to utils.py for more details.

wikipedia/: Wikipedia search results for constructing the core-anchored semantic graph / automatic annotation

  • core-categories/: categories of core terms collected from Wikipedia. Format: term catogory ... category

  • gold-subcategories/: gold-subcategories for each domain collected from Wikipedia. Format: level#Category

  • ranking-results/: Wikipedia search results. 0 means using exact match, 1 means without exact match. Format: term result_1 ... result_k.

    The results are collected by the following script:

    # https://pypi.org/project/wikipedia/
    import wikipedia
    def get_wiki_search_result(term, mode=0):
        if mode==0:
            return wikipedia.search(f"\"{term}\"")
        else:
            return wikipedia.search(term)

train-valid-test/: train/valid/test split for evaluation with core terms

manual-data/:

  • ml2000-test.csv: manually created test set for ML
  • domain-relevance-comparison-pairs.csv: manually created test set for domain relevance comparison

Term lists

Several term lists with domain relevance scores produced by CFL/HiCFL are available on term-lists/

Format:

term  domain relevance score  core/fringe

Sample results for Machine Learning:

image-20210528201345177

Citation

The details of this repo are described in the following paper. If you find this repo useful, please kindly cite it:

@inproceedings{huang2021measuring,
  title={Measuring Fine-Grained Domain Relevance of Terms: A Hierarchical Core-Fringe Approach},
  author={Huang, Jie and Chang, Kevin Chen-Chuan and Xiong, Jinjun and Hwu, Wen-mei},
  booktitle={Proceedings of ACL-IJCNLP},
  year={2021}
}
Owner
Jie Huang
Jie Huang
Repository for Driving Style Recognition algorithms for Autonomous Vehicles

Driving Style Recognition Using Interval Type-2 Fuzzy Inference System and Multiple Experts Decision Making Created by Iago Pachêco Gomes at USP - ICM

Iago Gomes 9 Nov 28, 2022
Real-Time High-Resolution Background Matting

Real-Time High-Resolution Background Matting Official repository for the paper Real-Time High-Resolution Background Matting. Our model requires captur

Peter Lin 6.1k Jan 03, 2023
Unofficial Implementation of RobustSTL: A Robust Seasonal-Trend Decomposition Algorithm for Long Time Series (AAAI 2019)

RobustSTL: A Robust Seasonal-Trend Decomposition Algorithm for Long Time Series (AAAI 2019) This repository contains python (3.5.2) implementation of

Doyup Lee 222 Dec 21, 2022
Differential rendering based motion capture blender project.

TraceArmature Summary TraceArmature is currently a set of python scripts that allow for high fidelity motion capture through the use of AI pose estima

William Rodriguez 4 May 27, 2022
Reduce end to end training time from days to hours (or hours to minutes), and energy requirements/costs by an order of magnitude using coresets and data selection.

COResets and Data Subset selection Reduce end to end training time from days to hours (or hours to minutes), and energy requirements/costs by an order

decile-team 244 Jan 09, 2023
Image Captioning on google cloud platform based on iot

Image-Captioning-on-google-cloud-platform-based-on-iot - Image Captioning on google cloud platform based on iot

Shweta_kumawat 1 Jan 20, 2022
An efficient PyTorch library for Global Wheat Detection using YOLOv5. The project is based on this Kaggle competition Global Wheat Detection (2021).

Global-Wheat-Detection An efficient PyTorch library for Global Wheat Detection using YOLOv5. The project is based on this Kaggle competition Global Wh

Chuxin Wang 11 Sep 25, 2022
tf2-keras implement yolov5

YOLOv5 in tesnorflow2.x-keras yolov5数据增强jupyter示例 Bilibili视频讲解地址: 《yolov5 解读,训练,复现》 Bilibili视频讲解PPT文件: yolov5_bilibili_talk_ppt.pdf Bilibili视频讲解PPT文件:

yangcheng 254 Jan 08, 2023
PaddleViT: State-of-the-art Visual Transformer and MLP Models for PaddlePaddle 2.0+

PaddlePaddle Vision Transformers State-of-the-art Visual Transformer and MLP Models for PaddlePaddle 🤖 PaddlePaddle Visual Transformers (PaddleViT or

1k Dec 28, 2022
The Codebase for Causal Distillation for Language Models.

Causal Distillation for Language Models Zhengxuan Wu*,Atticus Geiger*, Josh Rozner, Elisa Kreiss, Hanson Lu, Thomas Icard, Christopher Potts, Noah D.

Zen 20 Dec 31, 2022
A model to classify a piece of news as REAL or FAKE

Fake_news_classification A model to classify a piece of news as REAL or FAKE. This python project of detecting fake news deals with fake and real news

Gokul Stark 1 Jan 29, 2022
A python script to convert images to animated sus among us crewmate twerk jifs as seen on r/196

img_sussifier A python script to convert images to animated sus among us crewmate twerk jifs as seen on r/196 Examples How to use install python pip i

41 Sep 30, 2022
Code for the paper "Regularizing Variational Autoencoder with Diversity and Uncertainty Awareness"

DU-VAE This is the pytorch implementation of the paper "Regularizing Variational Autoencoder with Diversity and Uncertainty Awareness" Acknowledgement

Dazhong Shen 4 Oct 19, 2022
CDTrans: Cross-domain Transformer for Unsupervised Domain Adaptation

[ICCV2021] TransReID: Transformer-based Object Re-Identification [pdf] The official repository for TransReID: Transformer-based Object Re-Identificati

DamoCV 569 Dec 30, 2022
This repository contains the code used to quantitatively evaluate counterfactual examples in the associated paper.

On Quantitative Evaluations of Counterfactuals Install To install required packages with conda, run the following command: conda env create -f requi

Frederik Hvilshøj 1 Jan 16, 2022
The NEOSSat is a dual-mission microsatellite designed to detect potentially hazardous Earth-orbit-crossing asteroids and track objects that reside in deep space

The NEOSSat is a dual-mission microsatellite designed to detect potentially hazardous Earth-orbit-crossing asteroids and track objects that reside in deep space

John Salib 2 Jan 30, 2022
BboxToolkit is a tiny library of special bounding boxes.

BboxToolkit is a light codebase collecting some practical functions for the special-shape detection, such as oriented detection

jbwang1997 73 Jan 01, 2023
This is a simple backtesting framework to help you test your crypto currency trading. It includes a way to download and store historical crypto data and to execute a trading strategy.

You can use this simple crypto backtesting script to ensure your trading strategy is successful Minimal setup required and works well with static TP a

Andrei 154 Sep 12, 2022
Indonesian Car License Plate Character Recognition using Tensorflow, Keras and OpenCV.

Monopol Indonesian Car License Plate (Indonesia Mobil Nomor Polisi) Character Recognition using Tensorflow, Keras and OpenCV. Background This applicat

Jayaku Briliantio 3 Apr 07, 2022
AdaNet is a lightweight TensorFlow-based framework for automatically learning high-quality models with minimal expert intervention

AdaNet is a lightweight TensorFlow-based framework for automatically learning high-quality models with minimal expert intervention. AdaNet buil

3.4k Jan 07, 2023