A library for preparing, training, and evaluating scalable deep learning hybrid recommender systems using PyTorch.

Overview

collie

PyPI version versions Workflows Passing Documentation Status codecov license

Collie is a library for preparing, training, and evaluating implicit deep learning hybrid recommender systems, named after the Border Collie dog breed.

Collie offers a collection of simple APIs for preparing and splitting datasets, incorporating item metadata directly into a model architecture or loss, efficiently evaluating a model's performance on the GPU, and so much more. Above all else though, Collie is built with flexibility and customization in mind, allowing for faster prototyping and experimentation.

See the documentation for more details.

"We adopted 2 Border Collies a year ago and they are about 3 years old. They are completely obsessed with fetch and tennis balls and it's getting out of hand. They live in the fenced back yard and when anyone goes out there they instantly run around frantically looking for a tennis ball. If there is no ball they will just keep looking and will not let you pet them. When you do have a ball, they are 100% focused on it and will not notice anything else going on around them, like it's their whole world."

-- A Reddit thread on r/DogTraining

Installation

pip install collie

Through July 2021, this library used to be under the name collie_recs. While this version is still available on PyPI, it is no longer supported or maintained. All users of the library should use collie for the latest and greatest version of the code!

Quick Start

Implicit Data

Creating and evaluating a matrix factorization model with implicit MovieLens 100K data is simple with Collie:

Open In Colab

from collie.cross_validation import stratified_split
from collie.interactions import Interactions
from collie.metrics import auc, evaluate_in_batches, mapk, mrr
from collie.model import MatrixFactorizationModel, CollieTrainer
from collie.movielens import read_movielens_df
from collie.utils import convert_to_implicit


# read in explicit MovieLens 100K data
df = read_movielens_df()

# convert the data to implicit
df_imp = convert_to_implicit(df)

# store data as ``Interactions``
interactions = Interactions(users=df_imp['user_id'],
                            items=df_imp['item_id'],
                            allow_missing_ids=True)

# perform a data split
train, val = stratified_split(interactions)

# train an implicit ``MatrixFactorization`` model
model = MatrixFactorizationModel(train=train,
                                 val=val,
                                 embedding_dim=10,
                                 lr=1e-1,
                                 loss='adaptive',
                                 optimizer='adam')
trainer = CollieTrainer(model, max_epochs=10)
trainer.fit(model)
model.eval()

# evaluate the model
auc_score, mrr_score, mapk_score = evaluate_in_batches(metric_list=[auc, mrr, mapk],
                                                       test_interactions=val,
                                                       model=model)

print(f'AUC:          {auc_score}')
print(f'MRR:          {mrr_score}')
print(f'[email protected]:       {mapk_score}')

More complicated examples of implicit pipelines can be viewed for MovieLens 100K data here, in notebooks here, and documentation here.

Explicit Data

Collie also handles the situation when you instead have explicit data, such as star ratings. Note how similar the pipeline and APIs are compared to the implicit example above:

Open In Colab

from collie.cross_validation import stratified_split
from collie.interactions import ExplicitInteractions
from collie.metrics import explicit_evaluate_in_batches
from collie.model import MatrixFactorizationModel, CollieTrainer
from collie.movielens import read_movielens_df

from torchmetrics import MeanAbsoluteError, MeanSquaredError


# read in explicit MovieLens 100K data
df = read_movielens_df()

# store data as ``Interactions``
interactions = ExplicitInteractions(users=df['user_id'],
                                    items=df['item_id'],
                                    ratings=df['rating'])

# perform a data split
train, val = stratified_split(interactions)

# train an implicit ``MatrixFactorization`` model
model = MatrixFactorizationModel(train=train,
                                 val=val,
                                 embedding_dim=10,
                                 lr=1e-2,
                                 loss='mse',
                                 optimizer='adam')
trainer = CollieTrainer(model, max_epochs=10)
trainer.fit(model)
model.eval()

# evaluate the model
mae_score, mse_score = explicit_evaluate_in_batches(metric_list=[MeanAbsoluteError(),
                                                                 MeanSquaredError()],
                                                    test_interactions=val,
                                                    model=model)

print(f'MAE: {mae_score}')
print(f'MSE: {mse_score}')

Comparison With Other Open-Source Recommendation Libraries

On some smaller screens, you might have to scroll right to see the full table. ➡️

Aspect Included in Library Surprise LightFM FastAI Spotlight RecBole TensorFlow Recommenders Collie
Implicit data support for when we only know when a user interacts with an item or not, not the explicit rating the user gave the item
Explicit data support for when we know the explicit rating the user gave the item
Support for side-data incorporated directly into the models
Support a flexible framework for new model architectures and experimentation
Deep learning libraries utilizing speed-ups with a GPU and able to implement new, cutting-edge deep learning algorithms
Automatic support for multi-GPU training
Actively supported and maintained
Type annotations for classes, methods, and functions
Scalable for larger, out-of-memory datasets
Includes model zoo with two or more model architectures implemented
Includes implicit loss functions for training and metric functions for model evaluation
Includes adaptive loss functions for multiple negative examples
Includes loss functions with partial credit for side-data

The following table notes shows the results of an experiment training and evaluating recommendation models in some popular implicit recommendation model frameworks on a common MovieLens 10M dataset. The data was split via a 90/5/5 stratified data split. Each model was trained for a maximum of 40 epochs using an embedding dimension of 32. For each model, we used default hyperparameters (unless otherwise noted below).

Model [email protected] Score Notes
Randomly initialized, untrained model 0.0001
Logistic MF 0.0128 Using the CUDA implementation.
LightFM with BPR Loss 0.0180
ALS 0.0189 Using the CUDA implementation.
BPR 0.0301 Using the CUDA implementation.
Spotlight 0.0376 Using adaptive hinge loss.
LightFM with WARP Loss 0.0412
Collie MatrixFactorizationModel 0.0425 Using a separate SGD bias optimizer.

At ShopRunner, we have found Collie models outperform comparable LightFM models with up to 64% improved [email protected] scores.

Development

To run locally, begin by creating a data path environment variable:

# Define where on your local hard drive you want to store data. It is best if this
# location is not inside the repo itself. An example is below
export DATA_PATH=$HOME/data/collie

Run development from within the Docker container:

docker build -t collie .

# run the container in interactive mode, leaving port ``8888`` open for Jupyter
docker run \
    -it \
    --rm \
    -v "${DATA_PATH}:/collie/data/" \
    -v "${PWD}:/collie" \
    -p 8888:8888 \
    collie /bin/bash

Run on a GPU:

docker build -t collie .

# run the container in interactive mode, leaving port ``8888`` open for Jupyter
docker run \
    -it \
    --rm \
    --gpus all \
    -v "${DATA_PATH}:/collie/data/" \
    -v "${PWD}:/collie" \
    -p 8888:8888 \
    collie /bin/bash

Start JupyterLab

To run JupyterLab, start the container and execute the following:

jupyter lab --ip 0.0.0.0 --no-browser --allow-root

Connect to JupyterLab here: http://localhost:8888/lab

Unit Tests

Library unit tests in this repo are to be run in the Docker container:

# execute unit tests
pytest --cov-report term --cov=collie

Note that a handful of tests require the MovieLens 100K dataset to be downloaded (~5MB in size), meaning that either before or during test time, there will need to be an internet connection. This dataset only needs to be downloaded a single time for use in both unit tests and tutorials.

Docs

The Collie library supports Read the Docs documentation. To compile locally,

cd docs
make html

# open local docs
open build/html/index.html
PyTorch Implementation of Realtime Multi-Person Pose Estimation project.

PyTorch Realtime Multi-Person Pose Estimation This is a pytorch version of Realtime_Multi-Person_Pose_Estimation, origin code is here Realtime_Multi-P

Dave Fang 157 Nov 12, 2022
Code release for "MERLOT Reserve: Neural Script Knowledge through Vision and Language and Sound"

merlot_reserve Code release for "MERLOT Reserve: Neural Script Knowledge through Vision and Language and Sound" MERLOT Reserve (in submission) is a mo

Rowan Zellers 92 Dec 11, 2022
KeypointDeformer: Unsupervised 3D Keypoint Discovery for Shape Control

KeypointDeformer: Unsupervised 3D Keypoint Discovery for Shape Control Tomas Jakab, Richard Tucker, Ameesh Makadia, Jiajun Wu, Noah Snavely, Angjoo Ka

Tomas Jakab 87 Nov 30, 2022
Multi-layer convolutional LSTM with Pytorch

Convolution_LSTM_pytorch Thanks for your attention. I haven't got time to maintain this repo for a long time. I recommend this repo which provides an

Zijie Zhuang 733 Dec 30, 2022
Active learning for Mask R-CNN in Detectron2

MaskAL - Active learning for Mask R-CNN in Detectron2 Summary MaskAL is an active learning framework that automatically selects the most-informative i

49 Dec 20, 2022
Automatic deep learning for image classification.

AutoDL AutoDL automates machine learning tasks enabling you to easily achieve strong predictive performance in your applications. With just a few line

wenqi 2 Oct 12, 2022
Ppq - A powerful offline neural network quantization tool with custimized IR

PPL Quantization Tool(PPL 量化工具) PPL Quantization Tool (PPQ) is a powerful offlin

605 Jan 03, 2023
Adaptation through prediction: multisensory active inference torque control

Adaptation through prediction: multisensory active inference torque control Submitted to IEEE Transactions on Cognitive and Developmental Systems Abst

Cristian Meo 1 Nov 07, 2022
Implementation of STAM (Space Time Attention Model), a pure and simple attention model that reaches SOTA for video classification

STAM - Pytorch Implementation of STAM (Space Time Attention Model), yet another pure and simple SOTA attention model that bests all previous models in

Phil Wang 109 Dec 28, 2022
For IBM Quantum Challenge Africa 2021, 9 September (07:00 UTC) - 20 September (23:00 UTC).

IBM Quantum Challenge Africa 2021 To ensure Africa is able to apply quantum computing to solve problems relevant to the continent, the IBM Research La

Qiskit Community 48 Dec 25, 2022
PyTorch Implementation of DSB for Score Based Generative Modeling. Experiments managed using Hydra.

Diffusion Schrödinger Bridge with Applications to Score-Based Generative Modeling This repository contains the implementation for the paper Diffusion

James Thornton 50 Jan 03, 2023
Scheduling BilinearRewards

Scheduling_BilinearRewards Requirement Python 3 =3.5 Structure main.py This file includes the main function. For getting the results in Figure 1, ple

junghun.kim 0 Nov 25, 2021
A project studying the influence of communication in multi-objective normal-form games

Communication in Multi-Objective Normal-Form Games This repo consists of five different types of agents that we have used in our study of communicatio

Willem Röpke 0 Dec 17, 2021
School of Artificial Intelligence at the Nanjing University (NJU)School of Artificial Intelligence at the Nanjing University (NJU)

F-Principle This is an exercise problem of the digital signal processing (DSP) course at School of Artificial Intelligence at the Nanjing University (

Thyrix 5 Nov 23, 2022
Code for the paper "How Attentive are Graph Attention Networks?"

How Attentive are Graph Attention Networks? This repository is the official implementation of How Attentive are Graph Attention Networks?. The PyTorch

175 Dec 29, 2022
Building blocks for uncertainty-aware cycle consistency presented at NeurIPS'21.

UncertaintyAwareCycleConsistency This repository provides the building blocks and the API for the work presented in the NeurIPS'21 paper Robustness vi

EML Tübingen 19 Dec 12, 2022
Model of an AI powered sign language interpreter.

TEXT AND SPEECH TO SIGN LANGUAGE. A web application which takes in text or live audio speech recording as input, converts and displays the relevant Si

Mark Gatere 4 Mar 30, 2022
Code accompanying "Learning What To Do by Simulating the Past", ICLR 2021.

Learning What To Do by Simulating the Past This repository contains code that implements the Deep Reward Learning by Simulating the Past (Deep RSLP) a

Center for Human-Compatible AI 24 Aug 07, 2021
[CVPRW 2022] Attentions Help CNNs See Better: Attention-based Hybrid Image Quality Assessment Network

Attention Helps CNN See Better: Hybrid Image Quality Assessment Network [CVPRW 2022] Code for Hybrid Image Quality Assessment Network [paper] [code] T

IIGROUP 49 Dec 11, 2022
RuDOLPH: One Hyper-Modal Transformer can be creative as DALL-E and smart as CLIP

[Paper] [Хабр] [Model Card] [Colab] [Kaggle] RuDOLPH 🦌 🎄 ☃️ One Hyper-Modal Tr

Sber AI 230 Dec 31, 2022