๐Ÿงฎ Matrix Factorization for Collaborative Filtering is just Solving an Adjoint Latent Dirichlet Allocation Model after All

Overview

LDA4Rec

Project generated with PyScaffold

Accompanying source code to the paper "Matrix Factorization for Collaborative Filtering is just Solving an Adjoint Latent Dirichlet Allocation Model After All" by Florian Wilhelm. The preprint can be found here along with the following statement:

"ยฉ Florian Wilhelm 2021. This is the author's version of the work. It is posted here for your personal use. Not for redistribution. The definitive version was published in RecSys '21: Fifteenth ACM Conference on Recommender Systems Proceedings, https://doi.org/10.1145/3460231.3474266."

Installation

In order to set up the necessary environment:

  1. review and uncomment what you need in environment.yml and create an environment lda4rec with the help of conda:
    conda env create -f environment.yml
    
  2. activate the new environment with:
    conda activate lda4rec
    
  3. (optionally) get a free neptune.ai account for experiment tracking and save the api token under ~/.neptune_api_token (default).

Running Experiments

First check out and adapt the default experiment config configs/default.yaml and run it with:

lda4rec -c configs/default.yaml run

A config like configs/default.yaml can also be used as a template to create an experiment set with:

lda4rec -c configs/default.yaml create -ds movielens-100k

using the Movielens-100k dataset. Check out cli.py for more details.

Cloud Setup

Commands for setting up an Ubuntu 20.10 VM with at least 20 GiB of HD on e.g. a GCP c2-standard-30 instance:

tmux
sudo apt-get install -y build-essential
curl https://sh.rustup.rs -sSf | sh
source $HOME/.cargo/env
cargo install pueue
curl https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh -O
sh Miniconda3-latest-Linux-x86_64.sh
source ~/.bashrc
git clone https://github.com/FlorianWilhelm/lda4rec.git
cd lda4rec
conda env create -f environment.yml
conda activate lda4rec
vim ~/.neptune_api_token # and copy it over

Then create and run all experiments for full control over parallelism with pueue:

pueued -d # only once to start the daemon
pueue parallel 10
export OMP_NUM_THREADS=4  # to limit then number of threads per model
lda4rec -c configs/default.yaml create # to create the config files
find ./configs -maxdepth 1 -name "exp_*.yaml" -exec pueue add "lda4rec -c {} run" \; -exec sleep 30 \;

Remark: -exec sleep 30 avoids race condition when reading datasets if parallelism is too high.

Dependency Management & Reproducibility

  1. Always keep your abstract (unpinned) dependencies updated in environment.yml and eventually in setup.cfg if you want to ship and install your package via pip later on.
  2. Create concrete dependencies as environment.lock.yml for the exact reproduction of your environment with:
    conda env export -n lda4rec -f environment.lock.yml
    For multi-OS development, consider using --no-builds during the export.
  3. Update your current environment with respect to a new environment.lock.yml using:
    conda env update -f environment.lock.yml --prune

Project Organization

โ”œโ”€โ”€ AUTHORS.md              <- List of developers and maintainers.
โ”œโ”€โ”€ CHANGELOG.md            <- Changelog to keep track of new features and fixes.
โ”œโ”€โ”€ LICENSE.txt             <- License as chosen on the command-line.
โ”œโ”€โ”€ README.md               <- The top-level README for developers.
โ”œโ”€โ”€ configs                 <- Directory for configurations of model & application.
โ”œโ”€โ”€ data                    <- Downloaded datasets will be stored here.
โ”œโ”€โ”€ docs                    <- Directory for Sphinx documentation in rst or md.
โ”œโ”€โ”€ environment.yml         <- The conda environment file for reproducibility.
โ”œโ”€โ”€ notebooks               <- Jupyter notebooks. Naming convention is a number (for
โ”‚                              ordering), the creator's initials and a description,
โ”‚                              e.g. `1.0-fw-initial-data-exploration`.
โ”œโ”€โ”€ logs                    <- Generated logs are collected here.
โ”œโ”€โ”€ results                 <- Results as exported from neptune.ai.
โ”œโ”€โ”€ setup.cfg               <- Declarative configuration of your project.
โ”œโ”€โ”€ setup.py                <- Use `python setup.py develop` to install for development or
โ”‚                              or create a distribution with `python setup.py bdist_wheel`.
โ”œโ”€โ”€ src
โ”‚   โ””โ”€โ”€ lda4rec             <- Actual Python package where the main functionality goes.
โ”œโ”€โ”€ tests                   <- Unit tests which can be run with `py.test`.
โ”œโ”€โ”€ .coveragerc             <- Configuration for coverage reports of unit tests.
โ”œโ”€โ”€ .isort.cfg              <- Configuration for git hook that sorts imports.
โ””โ”€โ”€ .pre-commit-config.yaml <- Configuration of pre-commit git hooks.

How to Cite

Please cite LDA4Rec if it helps your research. You can use the following BibTeX entry:

@inproceedings{wilhelm2021lda4rec,
author = {Wilhelm, Florian},
title = {Matrix Factorization for Collaborative Filtering Is Just Solving an Adjoint Latent Dirichlet Allocation Model After All},
year = {2021},
month = sep,
isbn = {978-1-4503-8458-2/21/09},
publisher = {Association for Computing Machinery},
address = {New York, NY, USA},
url = {https://doi.org/10.1145/3460231.3474266},
doi = {10.1145/3460231.3474266},
booktitle = {Fifteenth ACM Conference on Recommender Systems},
numpages = {8},
location = {Amsterdam, Netherlands},
series = {RecSys '21}
}

License

This sourcecode is AGPL-3-only licensed. If you require a more permissive licence, e.g. for commercial reasons, contact me to obtain a licence for your business.

Acknowledgement

Special thanks goes to Du Phan and Fritz Obermeyer from the (Num)Pyro project for their kind help and helpful comments on my code.

Note

This project has been set up using PyScaffold 4.0 and the dsproject extension 0.6. Some source code was taken from Spotlight (MIT-licensed) by Maciej Kula as well as lrann (MIT-licensed) by Florian Wilhelm and Marcel Kurovski.

Owner
Florian Wilhelm
Data Scientist with a mathematical background.
Florian Wilhelm
A cool little repl-based simulation written in Python

A cool little repl-based simulation written in Python planned to integrate machine-learning into itself to have AI battle to the death before your eye

Em 6 Sep 17, 2022
This is the code for CVPR 2021 oral paper: Jigsaw Clustering for Unsupervised Visual Representation Learning

JigsawClustering Jigsaw Clustering for Unsupervised Visual Representation Learning Pengguang Chen, Shu Liu, Jiaya Jia Introduction This project provid

DV Lab 73 Sep 18, 2022
Fully convolutional deep neural network to remove transparent overlays from images

Fully convolutional deep neural network to remove transparent overlays from images

Marc Belmont 1.1k Jan 06, 2023
Official implementation of NeurIPS'2021 paper TransformerFusion

TransformerFusion: Monocular RGB Scene Reconstruction using Transformers Project Page | Paper | Video TransformerFusion: Monocular RGB Scene Reconstru

Aljaz Bozic 118 Dec 25, 2022
A fast Evolution Strategy implementation in Python

Evostra: Evolution Strategy for Python Evolution Strategy (ES) is an optimization technique based on ideas of adaptation and evolution. You can learn

Mika 251 Dec 08, 2022
Ontologysim: a Owlready2 library for applied production simulation

Ontologysim: a Owlready2 library for applied production simulation Ontologysim is an open-source deep production simulation framework, with an emphasi

10 Nov 30, 2022
Code for "The Box Size Confidence Bias Harms Your Object Detector"

The Box Size Confidence Bias Harms Your Object Detector - Code Disclaimer: This repository is for research purposes only. It is designed to maintain r

Johannes G. 24 Dec 07, 2022
Official PyTorch implementation of the paper Image-Based CLIP-Guided Essence Transfer.

TargetCLIP- official pytorch implementation of the paper Image-Based CLIP-Guided Essence Transfer This repository finds a global direction in StyleGAN

Hila Chefer 221 Dec 13, 2022
Creating multimodal multitask models

Fusion Brain Challenge The English version of the document can be found here. ะžะฑะฝะพะฒะปะตะฝะธั 01.11 ะœั‹ ะฒั‹ะบะปะฐะดั‹ะฒะฐะตะผ ะฟั€ะธะผะตั€ ะดะฐะฝะฝั‹ั…, ะฐะฝะฐะปะพะณะธั‡ะฝั‹ั… private test

Sber AI 43 Nov 28, 2022
[CVPR 2021] Modular Interactive Video Object Segmentation: Interaction-to-Mask, Propagation and Difference-Aware Fusion

[CVPR 2021] Modular Interactive Video Object Segmentation: Interaction-to-Mask, Propagation and Difference-Aware Fusion

Rex Cheng 364 Jan 03, 2023
Code for EmBERT, a transformer model for embodied, language-guided visual task completion.

Code for EmBERT, a transformer model for embodied, language-guided visual task completion.

41 Jan 03, 2023
๐Ÿš€ PyTorch Implementation of "Progressive Distillation for Fast Sampling of Diffusion Models(v-diffusion)"

PyTorch Implementation of "Progressive Distillation for Fast Sampling of Diffusion Models(v-diffusion)" Unofficial PyTorch Implementation of Progressi

Vitaliy Hramchenko 58 Dec 19, 2022
GPU implementation of $k$-Nearest Neighbors and Shared-Nearest Neighbors

GPU implementation of kNN and SNN GPU implementation of $k$-Nearest Neighbors and Shared-Nearest Neighbors Supported by numba cuda and faiss library E

Hyeon Jeon 7 Nov 23, 2022
Keras implementation of Normalizer-Free Networks and SGD - Adaptive Gradient Clipping

Keras implementation of Normalizer-Free Networks and SGD - Adaptive Gradient Clipping

Yam Peleg 63 Sep 21, 2022
Voxel Set Transformer: A Set-to-Set Approach to 3D Object Detection from Point Clouds (CVPR 2022)

Voxel Set Transformer: A Set-to-Set Approach to 3D Object Detection from Point Clouds (CVPR2022)[paper] Authors: Chenhang He, Ruihuang Li, Shuai Li, L

Billy HE 141 Dec 30, 2022
Info and sample codes for "NTU RGB+D Action Recognition Dataset"

"NTU RGB+D" Action Recognition Dataset "NTU RGB+D 120" Action Recognition Dataset "NTU RGB+D" is a large-scale dataset for human action recognition. I

Amir Shahroudy 578 Dec 30, 2022
Open source code for the paper of Neural Sparse Voxel Fields.

Neural Sparse Voxel Fields (NSVF) Project Page | Video | Paper | Data Photo-realistic free-viewpoint rendering of real-world scenes using classical co

Meta Research 647 Dec 27, 2022
An example to implement a new backbone with OpenMMLab framework.

Backbone example on OpenMMLab framework English | ็ฎ€ไฝ“ไธญๆ–‡ Introduction This is an template repo about how to use OpenMMLab framework to develop a new bac

Ma Zerun 22 Dec 29, 2022
Official implementation for the paper: "Multi-label Classification with Partial Annotations using Class-aware Selective Loss"

Multi-label Classification with Partial Annotations using Class-aware Selective Loss Paper | Pretrained models Official PyTorch Implementation Emanuel

99 Dec 27, 2022
Predicts an answer in yes or no.

Oui-ou-non-prediction Predicts an answer in 'yes' or 'no'. It is based on the game 'effeuiller la marguerite' in which the person plucks flower petals

Ananya Gupta 1 Jan 15, 2022