Multilingual emotion classification using BERT (fine-tuning). Published at the WASSA workshop (ACL 2022).

Last update: Sep 17, 2022

Overview

XLM-EMO: Multilingual Emotion Prediction in Social Media Text

Abstract

Detecting emotion in text allows social and computational scientists to study how people behave and react to online events. However, developing these tools for different languages requires data that is not always available. This paper collects the available emotion detection datasets across 19 languages. We train a multilingual emotion prediction model for social media data, XLM-EMO. The model shows competitive performance in a zero-shot setting, suggesting it is helpful in the context of low-resource languages. We release our model to the community so that interested researchers can directly use it.

See the paper for additional details:

Bianchi, F., Nozza, D., & Hovy, D. "XLM-EMO: Multilingual Emotion Prediction in Social Media Text". In Proceedings of the 12th Workshop on Computational Approaches to Subjectivity, Sentiment and Social Media Analysis (Forthcoming). Association for Computational Linguistics, 2022.

Free software: MIT license

Installing

pip install -U xlm-emo

Important: If you want to use CUDA, you need to install the version of CUDA that matches your system; see the PyTorch installation instructions.
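A quick way to check whether the installed PyTorch build can see a GPU is sketched below. The helper name describe_torch_device is hypothetical and not part of xlm-emo; it relies only on the standard torch.cuda.is_available() call and returns a placeholder string if PyTorch is not installed.

```python
import importlib.util


def describe_torch_device() -> str:
    """Return "cuda", "cpu", or "torch-not-installed" (hypothetical helper)."""
    # Avoid an ImportError when PyTorch is missing from the environment.
    if importlib.util.find_spec("torch") is None:
        return "torch-not-installed"
    import torch

    # True only if the installed PyTorch build matches a usable CUDA setup.
    return "cuda" if torch.cuda.is_available() else "cpu"


print(describe_torch_device())
```

If this prints "cpu" on a machine with a GPU, the installed PyTorch build probably does not match the local CUDA setup.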

Features

from xlm_emo.classifier import EmotionClassifier
ec = EmotionClassifier()

ec.predict(["senti testa di cazzo", "I am very happy"])

>> ["anger", "joy"]
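Since predict appears to return one label per input text, the predictions can be tallied to summarize a collection of posts. The sketch below assumes only that interface. StubClassifier is a hypothetical stand-in so the example runs without downloading the model; in practice an EmotionClassifier instance would be passed instead.

```python
from collections import Counter


def emotion_counts(classifier, texts):
    """Count how often each predicted emotion label occurs.

    Assumes classifier.predict(list_of_str) returns one label per text.
    """
    return Counter(classifier.predict(list(texts)))


class StubClassifier:
    """Hypothetical keyword-based stand-in for EmotionClassifier."""

    def predict(self, texts):
        return ["joy" if "happy" in t.lower() else "anger" for t in texts]


counts = emotion_counts(
    StubClassifier(),
    ["I am very happy", "This is outrageous", "So happy today"],
)
print(counts.most_common())
```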

Models

Model	Link	Macro F1 on Test Set
XLM-EMO-T	https://huggingface.co/MilaNLProc/xlm-emo-t	0.85
XLM-EMO-B	TBD	TBD
XLM-EMO-L	TBD	TBD
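For readers unfamiliar with the metric in the table, macro F1 is the unweighted mean of the per-class F1 scores, so rare emotions count as much as frequent ones. The sketch below shows the computation in plain Python on toy labels. The data is illustrative only and is unrelated to the test set behind the reported score.

```python
def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 over all labels seen in either list."""
    labels = sorted(set(y_true) | set(y_pred))
    scores = []
    for label in labels:
        pairs = list(zip(y_true, y_pred))
        tp = sum(1 for t, p in pairs if t == label and p == label)
        fp = sum(1 for t, p in pairs if t != label and p == label)
        fn = sum(1 for t, p in pairs if t == label and p != label)
        denom = 2 * tp + fp + fn
        # F1 = 2TP / (2TP + FP + FN); defined as 0 when the class never occurs.
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Toy labels, for illustration only.
gold = ["joy", "joy", "anger", "sadness"]
pred = ["joy", "anger", "anger", "sadness"]
print(round(macro_f1(gold, pred), 4))
```

Here joy and anger each score 2/3 and sadness scores 1, giving a macro F1 of 7/9.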

Reference

If you use this tool please cite the following paper:

@inproceedings{bianchi-etal-2022-xlmemo,
title = {{XLM-EMO}: Multilingual Emotion Prediction in Social Media Text},
author = "Bianchi, Federico and Nozza, Debora and Hovy, Dirk",
booktitle = "Proceedings of the 12th Workshop on Computational Approaches to Subjectivity, Sentiment and Social Media Analysis",
year = "2022",
publisher = "Association for Computational Linguistics"
}

Credits

This package was created with Cookiecutter and the audreyr/cookiecutter-pypackage project template.


Owner

MilaNLP
