Code for Discovering Topics in Long-tailed Corpora with Causal Intervention.

Last update: Dec 16, 2022

Overview

This repository contains the code for the paper "Discovering Topics in Long-tailed Corpora with Causal Intervention", published in Findings of ACL 2021.

Usage

0. Prepare environment

Requirements:

python==3.6
tensorflow-gpu==1.13.1
scipy==1.5.2
scikit-learn==0.23.2
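
Because these pins are old, it can help to confirm that the active environment actually matches them before training. The following is a minimal sketch, not part of the repository: the names PINNED, installed_version, and find_mismatches are hypothetical, and it assumes the packages expose their version through the usual `__version__` attribute under the import names tensorflow, scipy, and sklearn.

```python
import importlib
import sys

# Import names (not pip names) mapped to the versions pinned above.
# tensorflow-gpu is imported as "tensorflow", scikit-learn as "sklearn".
PINNED = {"tensorflow": "1.13.1", "scipy": "1.5.2", "sklearn": "0.23.2"}


def installed_version(module_name):
    """Return the module's __version__, or None if it cannot be imported."""
    try:
        module = importlib.import_module(module_name)
    except ImportError:
        return None
    return getattr(module, "__version__", None)


def find_mismatches(pinned, lookup=installed_version):
    """Return {module: (wanted, found)} for every pin that is not met."""
    mismatches = {}
    for name, wanted in pinned.items():
        found = lookup(name)
        if found != wanted:
            mismatches[name] = (wanted, found)
    return mismatches


def python_matches(expected=(3, 6)):
    """Check the interpreter's major.minor version against the pin."""
    return tuple(sys.version_info[:2]) == tuple(expected)
```

Calling find_mismatches(PINNED) in the target environment returns an empty dict when all three packages match; any other result lists the wanted and found versions side by side.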

1. Prepare data

Download the preprocessed datasets from Google Drive and extract the files to ./data.

2. Run the model

python main.py --data_dir ./data/{dataset} --output_dir ./output
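
To train on several datasets in a row, the command above can be wrapped in a small driver. This is a sketch under assumptions: it treats every subdirectory of ./data as one dataset, and build_command and run_all are hypothetical helper names. Only the --data_dir and --output_dir flags come from the command above.

```python
import subprocess
import sys
from pathlib import Path


def build_command(dataset_dir, output_dir="./output"):
    """Build the argument list for one training run of main.py."""
    return [
        sys.executable, "main.py",
        "--data_dir", str(dataset_dir),
        "--output_dir", str(output_dir),
    ]


def run_all(data_root="./data", output_dir="./output", dry_run=True):
    """Run main.py once per dataset directory found under data_root.

    With dry_run=True the commands are only printed, not executed.
    """
    dataset_dirs = sorted(p for p in Path(data_root).iterdir() if p.is_dir())
    commands = [build_command(p, output_dir) for p in dataset_dirs]
    for cmd in commands:
        print(" ".join(cmd))
        if not dry_run:
            # check=True stops the loop if a run exits with an error.
            subprocess.run(cmd, check=True)
    return commands
```

Whether consecutive runs overwrite each other's files in ./output is not stated here, so passing a separate output directory per dataset may be the safer choice.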

3. Evaluation

topic coherence: coherence score.

topic diversity:

python utils/TU.py --data_path {path of topic word file}
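
For readers unfamiliar with the metric, topic diversity is commonly measured as topic uniqueness (TU): each top word of a topic scores 1 divided by the number of topics whose top-word list contains it, and the scores are averaged per topic and then over all topics. The sketch below illustrates that definition only. It is not the code in utils/TU.py, and it assumes a topic word file with one topic per line and whitespace-separated words, with no word repeated inside a topic. read_topics and topic_uniqueness are hypothetical names.

```python
from collections import Counter


def read_topics(path):
    """Read one topic per line; each line is a whitespace-separated word list."""
    with open(path, encoding="utf-8") as f:
        return [line.split() for line in f if line.strip()]


def topic_uniqueness(topics):
    """Average over topics of mean(1 / number of topics containing each word).

    The result is 1.0 when no word is shared between topics and 1 / K
    when all K topics have identical top words.
    """
    # Count, for each word, how many topics list it among their top words.
    counts = Counter(word for topic in topics for word in set(topic))
    per_topic = [
        sum(1.0 / counts[word] for word in topic) / len(topic)
        for topic in topics
    ]
    return sum(per_topic) / len(per_topic)
```

For example, topic_uniqueness([["a", "b"], ["a", "c"]]) gives 0.75, because the shared word "a" contributes 0.5 to each topic while "b" and "c" contribute 1.0.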

Citation

If you are interested in our work, please cite it as:

@inproceedings{wu2021discovering,
    title = "Discovering Topics in Long-tailed Corpora with Causal Intervention",
    author = "Wu, Xiaobao  and
    Li, Chunping  and
    Miao, Yishu",
    booktitle = "Findings of the Association for Computational Linguistics: ACL-IJCNLP 2021",
    month = aug,
    year = "2021",
    address = "Online",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/2021.findings-acl.15",
    doi = "10.18653/v1/2021.findings-acl.15",
    pages = "175--185",
}

Other related works

EMNLP 2020: Short Text Topic Modeling with Topic Distribution Quantization and Negative Sampling Decoder

NLPCC 2020: Learning Multilingual Topics with Neural Variational Inference

Owner

Xiaobao Wu
