Package for extracting emotions from social media text. Tailored for financial data.

Overview

EmTract: Extracting Emotions from Social Media Text Tailored for Financial Contexts

EmTract is a tool that extracts emotions from social media text. It accounts for key aspects of social media data (e.g., non-standard phrases, emojis, and emoticons) and uses cutting-edge natural language processing (NLP) techniques to learn latent representations of word order, word usage, and local context, which it uses to predict emotions.

Details on the model and text processing are in the appendix of EmTract: Investor Emotions and Market Behavior.

User Guide

Installation

Before using the package, python3 must be installed. We also recommend using a virtual environment so that the tool runs with the same dependencies with which it was developed. Instructions on how to set up a virtual environment can be found here.

Once the basic requirements are set up, follow these instructions:

  1. Clone the repository: git clone https://github.com/dvamossy/EmTract.git
  2. Navigate into repository: cd EmTract
  3. (Optional) Create and activate virtual environment:
    python3 -m venv venv
    source venv/bin/activate
    
  4. Run ./install.sh. This installs the Python requirements and downloads our model files.

Usage

Our package should be run with the following command:

python3 -m emtract.inference [args]

Where args are the following:

  • --model_type: can be twitter or stocktwits. Default is stocktwits
  • --interactive: Run in interactive mode
  • --input_file/-i: input to use for predictions (only for non-interactive mode)
  • --output_file/-o: output location for predictions (only for non-interactive mode)

Output

For each input text, EmTract outputs probabilities (they sum to 1!) corresponding to seven emotional states: neutral, happy, sad, anger, disgust, surprise, and fear. It also labels the text by computing the argmax of the probabilities.
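As an illustration, the output can be consumed with pandas as sketched below; the exact column names written to the predictions file are an assumption and may differ from the actual output.

import pandas as pd

emotions = ["neutral", "happy", "sad", "anger", "disgust", "surprise", "fear"]
preds = pd.read_csv("predictions.csv")

# Probabilities across the seven emotions should sum to (approximately) 1 per row.
assert (preds[emotions].sum(axis=1).round(3) == 1.0).all()

# The label is the emotion with the highest probability (argmax).
preds["label"] = preds[emotions].idxmax(axis=1)
print(preds.head())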

Modes

Our tool can be run in two execution modes.

Interactive mode allows the user to input a tweet and evaluate it in real time. This is great for exploratory analysis.

python3 -m emtract.inference --interactive

The other mode is intended for automated (batch) predictions. Here, an input file must be specified to serve as the prediction input. This file must be a CSV or text file with a single column containing the messages/text to predict on.

python3 -m emtract.inference -i tweets_example.csv -o predictions.csv
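A minimal sketch of preparing such a single-column input file with pandas; the column name is arbitrary and used here only for illustration.

import pandas as pd

messages = [
    "$TSLA to the moon 🚀🚀",
    "ugh, another red day for my portfolio...",
]
pd.DataFrame({"text": messages}).to_csv("tweets_example.csv", index=False)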

Model Types

Our models combine GloVe embeddings with a bidirectional GRU architecture.
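For illustration, below is a minimal Keras sketch of this kind of architecture; the vocabulary size, embedding dimension, GRU width, and use of random embedding values are assumptions to keep the sketch self-contained, not EmTract's actual configuration.

import numpy as np
import tensorflow as tf

VOCAB_SIZE = 50000   # assumed vocabulary size
EMBED_DIM = 100      # GloVe vectors commonly come in 50/100/200/300 dimensions
NUM_CLASSES = 7      # neutral, happy, sad, anger, disgust, surprise, fear

# In practice, glove_matrix is built by mapping each vocabulary token to its GloVe vector;
# random values are used here only so the sketch runs on its own.
glove_matrix = np.random.normal(size=(VOCAB_SIZE, EMBED_DIM)).astype("float32")

model = tf.keras.Sequential([
    tf.keras.layers.Embedding(
        VOCAB_SIZE,
        EMBED_DIM,
        embeddings_initializer=tf.keras.initializers.Constant(glove_matrix),
        trainable=False,  # keep the pre-trained embeddings frozen
    ),
    tf.keras.layers.Bidirectional(tf.keras.layers.GRU(128)),
    tf.keras.layers.Dense(NUM_CLASSES, activation="softmax"),  # probabilities sum to 1
])
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy", metrics=["accuracy"])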

We trained our emotion models with two different data sources: one from Twitter and another from StockTwits. The Twitter training data comes from here; it is available at data/twitter_emotion.csv. The StockTwits training data is described in the paper.

One of the key concerns with using emotion packages is that it is unknown how well they transfer to financial text data. We alleviate this concern by hand-tagging 10,000 StockTwits messages. These are available at data/hand_tagged_sample.parquet.snappy; they were not used to train any of our models. We use this sample to test the performance of our models as well as alternative emotion packages (notebooks/Alternative Packages.ipynb).
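A minimal sketch of loading this sample for evaluation (requires pyarrow or fastparquet); the column layout of the parquet file is not documented here, so inspecting it first is assumed.

import pandas as pd

# Load the hand-tagged StockTwits sample (not used in training).
sample = pd.read_parquet("data/hand_tagged_sample.parquet.snappy")
print(sample.shape)             # expected to contain 10,000 hand-tagged messages
print(sample.columns.tolist())  # inspect the schema before evaluating predictions against it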

We found our StockTwits model to perform best on the hand-tagged sample, and therefore it is used as the default for predictions.

Alternative Models

We also provide a DistilBERT implementation trained on the Twitter data in notebooks/Alternative Models.ipynb, which can easily be extended to other state-of-the-art models. We find marginal performance gains on the hand-tagged sample, at the cost of far slower inference.
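As a rough illustration of how such a transformer baseline could be set up, the sketch below uses Hugging Face transformers with a generic DistilBERT checkpoint; it is not the notebook's exact code, and the model would still need fine-tuning on the emotion labels.

from transformers import AutoTokenizer, AutoModelForSequenceClassification

tokenizer = AutoTokenizer.from_pretrained("distilbert-base-uncased")
model = AutoModelForSequenceClassification.from_pretrained(
    "distilbert-base-uncased",
    num_labels=7,  # neutral, happy, sad, anger, disgust, surprise, fear
)

inputs = tokenizer("$TSLA to the moon 🚀🚀", return_tensors="pt", truncation=True)
logits = model(**inputs).logits  # meaningless until the classification head is fine-tuned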

Citation

If you use EmTract in your research, please cite us as follows:

Domonkos Vamossy and Rolf Skog. EmTract: Investor Emotions and Market Behavior. https://papers.ssrn.com/sol3/papers.cfm?abstract_id=3975884, 2021.

Contributing and Feedback

This project welcomes contributions and suggestions.

Our goal is to provide a unified framework for extracting emotions from financial social media text. Labeling financial social media text would be particularly useful for research on emotions in financial contexts. We plan to upload sample text upon request.
