scAR (single-cell Ambient Remover) is a package for data denoising in single-cell omics.

Overview

scAR

scAR (single-cell Ambient Remover) is a package for denoising multiple types of single-cell omics data. It can be used for several tasks, such as sgRNA assignment for scCRISPRseq, identity barcode assignment for cell indexing, protein denoising for CITE-seq, and mRNA denoising for scRNAseq. It is built using probabilistic deep learning.

Installation

Clone this repository:

$ git clone https://github.com/Novartis/scAR.git

Enter the cloned directory:

$ cd scAR

To install the dependencies, create a conda environment.

If you have an NVIDIA graphics card and the corresponding driver installed, use scAR-gpu:

$ conda env create -f scAR-gpu.yml

If you don't have a graphics card available, use scAR-cpu:

$ conda env create -f scAR-cpu.yml

To activate the scAR conda environment run:

$ conda activate scAR

Usage

There are two ways to run scAR.

  1. Use the scAR API if you are a Python user
>>> from scAR import model
>>> scarObj = model(adata.to_df(), empty_profile)  # adata.to_df() returns adata.X as a DataFrame
>>> scarObj.train()
>>> scarObj.inference()
>>> adata.layers["X_scAR_denoised"] = scarObj.native_counts
>>> adata.obsm["X_scAR_assignment"] = scarObj.feature_assignment  # feature assignment (e.g., sgRNAs, tags); only available in 'cropseq' mode

See the tutorials

  2. Run scAR from the command line
$ scar raw_count_matrix.pickle -t technology -e empty_profile.pickle -o output

raw_count_matrix.pickle: a pickle-formatted raw count matrix (MxN) with cells in rows and features in columns
empty_profile.pickle: pickle-formatted feature frequencies (Nx1) in empty droplets
technology: a string, one of 'scRNAseq', 'CROPseq', or 'CITEseq'
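
As a rough sketch of how these two inputs might be prepared, the snippet below builds a toy droplets-by-features count table, pools the droplets with few total counts to estimate feature frequencies, and keeps the remaining droplets as the raw count matrix. The helper name `ambient_frequencies`, the total-count cutoff, and the toy data are illustrative assumptions and not part of scAR; saving the tables as pandas pickles is also an assumption about what the command line tool reads.

```python
import numpy as np
import pandas as pd


def ambient_frequencies(droplets: pd.DataFrame, max_total: int) -> pd.DataFrame:
    """Return feature frequencies (N x 1) pooled over low-count droplets.

    Hypothetical helper: `max_total` is an illustrative cutoff, not a scAR default.
    """
    totals = droplets.sum(axis=1)               # total counts per droplet
    empty = droplets.loc[totals <= max_total]   # droplets assumed to be empty
    pooled = empty.sum(axis=0)                  # pooled counts per feature
    return (pooled / pooled.sum()).to_frame(name="frequency")


rng = np.random.default_rng(0)
features = ["gene_a", "gene_b", "gene_c"]

# Toy data: 5 cell-containing droplets and 20 near-empty droplets.
cells = pd.DataFrame(rng.poisson(50, size=(5, 3)), columns=features)
empties = pd.DataFrame(rng.poisson(1, size=(20, 3)), columns=features)
all_droplets = pd.concat([cells, empties], ignore_index=True)

empty_profile = ambient_frequencies(all_droplets, max_total=20)   # N x 1
raw_counts = all_droplets.loc[all_droplets.sum(axis=1) > 20]      # M x N

if __name__ == "__main__":
    # File names follow the command line example above.
    raw_counts.to_pickle("raw_count_matrix.pickle")
    empty_profile.to_pickle("empty_profile.pickle")
```

The frequencies sum to 1 by construction, and the rows of `raw_counts` are cells while its columns are features, matching the layout described above.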

Run scar --help to see other optional arguments and parameters.

The output folder contains four (or five) files:

output
├── denoised_counts.pickle		# denoised count matrix
├── expected_noise_ratio.pickle	# estimated noise ratio
├── BayesFactor.pickle			# Bayes factor of ambient contamination
├── expected_native_freq.pickle	# estimated native frequencies
└── assignment.pickle			# feature assignment (e.g., sgRNAs, tags); generated in 'cropseq' mode
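
A minimal sketch for reading these results back into Python is shown below. It assumes the files can be read with `pandas.read_pickle`, which is not stated above; the helper name `load_scar_outputs` is hypothetical. Files that are absent (for example assignment.pickle outside 'cropseq' mode) are skipped.

```python
from pathlib import Path

import pandas as pd

# File stems taken from the output listing above.
OUTPUT_FILES = [
    "denoised_counts",
    "expected_noise_ratio",
    "BayesFactor",
    "expected_native_freq",
    "assignment",
]


def load_scar_outputs(folder):
    """Load whichever of the listed output files exist in `folder`.

    Hypothetical helper; assumes each file is a pandas-readable pickle.
    """
    results = {}
    for stem in OUTPUT_FILES:
        path = Path(folder) / f"{stem}.pickle"
        if path.exists():
            results[stem] = pd.read_pickle(path)
    return results
```

For example, `load_scar_outputs("output")["denoised_counts"]` would return the denoised count matrix if that file is present.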

Dependencies

PyTorch 1.8
Python 3.8.6
torchvision 0.9.0
tqdm 4.62.3
scikit-learn 1.0.1

Resources

License

This project is licensed under the terms of License.
Copyright 2022 Novartis International AG.

Reference

If you use scAR in your research, please consider citing our manuscript:

@article {Sheng2022.01.14.476312,
	author = {Sheng, Caibin and Lopes, Rui and Li, Gang and Schuierer, Sven and Waldt, Annick and Cuttat, Rachel and Dimitrieva, Slavica and Kauffmann, Audrey and Durand, Eric and Galli, Giorgio G and Roma, Guglielmo and de Weck, Antoine},
	title = {Probabilistic modeling of ambient noise in single-cell omics data},
	elocation-id = {2022.01.14.476312},
	year = {2022},
	doi = {10.1101/2022.01.14.476312},
	publisher = {Cold Spring Harbor Laboratory},
	URL = {https://www.biorxiv.org/content/early/2022/01/14/2022.01.14.476312},
	eprint = {https://www.biorxiv.org/content/early/2022/01/14/2022.01.14.476312.full.pdf},
	journal = {bioRxiv}
}
Comments
  • Stochastic rounding to integers for downstream use in TotalVI/SCVI

    Hi Caibin,

    I tried using scar's output as input for TotalVI/SCVI. As expected, those gave an error because the input is not integer anymore. I would suggest implementing stochastic rounding to integers as done in SoupX.

    Let me know if you're interested and I can find the time to implement it.

    Regards, Mikhael

    enhancement 
    opened by mdmanurung 9
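
The stochastic rounding suggested in this issue can be sketched as follows. This is a generic NumPy illustration, not the code used by scAR or SoupX: each value is rounded down, then rounded up with probability equal to its fractional part, so the result is an integer whose expectation equals the original value. The function name `stochastic_round` is hypothetical.

```python
import numpy as np


def stochastic_round(values, rng=None):
    """Randomly round each value to a neighbouring integer, preserving the mean in expectation."""
    rng = np.random.default_rng() if rng is None else rng
    values = np.asarray(values, dtype=float)
    floor = np.floor(values)
    frac = values - floor                       # fractional part in [0, 1)
    round_up = rng.random(values.shape) < frac  # True with probability `frac`
    return (floor + round_up).astype(np.int64)


# Toy denoised counts; values that are already integers are left unchanged.
denoised = np.array([[0.0, 0.3, 2.5], [1.0, 4.75, 0.0]])
rounded = stochastic_round(denoised, rng=np.random.default_rng(0))
```

Passing a seeded generator makes the rounding reproducible, which may matter when the rounded counts feed into downstream models.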
  • BiocondaBot not triggered

    Hi @fgypas , I made a new release v0.4.1 but bioconda somehow is not triggered upon the new release.

    In the new release, some code related to the build process has been refactored.

    • All information in setup.py (deleted) is integrated into setup.cfg.
    • An extra pyproject.toml file is added.

    I am wondering whether these affect the bioconda-recipes.

    Many thanks, Caibin

    opened by CaibinSh 7
  • New release

    Hi @fgypas ,

    I am making a new release. There are three main changes: 1) addition of readthedocs; 2) code reformatting via black and pylint (pylint now scores >7, so I have increased the threshold in the Action test from 0.5 to 6); 3) renaming 'scAR' to 'scar'.

    I have a couple of questions regarding whether these changes influence the bioconda recipe.

    • Will renaming the package (scAR) require modification of the bioconda PR? Uppercase 'scAR' has been changed to lowercase 'scar' everywhere possible (including folders, the environment, etc.). But the repo name may stay as 'scAR' for a while, as renaming the repo requires permission from Nick.

    • Should we exclude the datasets folder from the conda recipe? A folder named 'datasets', containing >100 MB of data, has been added for the tutorial.

    question 
    opened by CaibinSh 3
  • Implementation in scvi-tools

    Hi scAR team,

    I'm reaching out to gauge interest in having a mirror implementation in scvi-tools for scAR. Given the existing infrastructure in the scvi-tools repository, I was able to create a port of scAR quite easily as an external module. Of course, the implementation will link to this repository as the original and cites the paper in the docs. On top of that, the port would allow users of scvi-tools to use the pretrained scAR encoder for doublet detection using the solo model.

    Here's the pending pull request so you can check out what it would look like in the final implementation: https://github.com/scverse/scvi-tools/pull/1683

    Please let me know what you think!

    opened by ricomnl 2
  • Positive-valued denoising results for ADTs with raw 0 counts

    Hi scar team!

    Thank you for developing this interesting package. I had a question about the resulting denoised values for CITE-seq experiments.

    I've noticed that some cells that originally have a 0 value for an ADT (as a raw count) will have a positive value (>0) for that ADT after the denoising procedure. Below, I show this case for the CD25 ADT in the 10xPBMC5k CITE-seq dataset (from the tutorial at https://scar-tutorials.readthedocs.io/en/latest/tutorials/scAR_tutorial_denoising_CITEseq.html).

    I'm a bit confused about how to best interpret these values and how they are occurring. Should these be set to 0 after the denoising procedure?

    question 
    opened by diegoalexespi 2
  • Sparsity values for mRNA decontamination?

    Hello,

    I was wondering what the recommended sparsity value would be for denoising mRNA, specifically if we don't know much about the data besides UMI/nGenes in the cells, etc. I noticed it's generally set to 1 for sgRNA decontamination, but what would the general recommended value be for mRNA?

    Thanks, Chang

    question 
    opened by cnk113 1
  • Number of training epochs + batch size

    Dear scAR-Team,

    thank you for developing this package. I am currently exploring it and I would like to ask you

    1. how do you determine the number of epochs the user should use for feature_type = "mRNA"? In your tutorials you used 400 epochs and in your paper you mentioned that you fixed the epochs to 800. I applied it for various batch sizes (up to 1000) and noticed that the model is sensitive to it.

    2. I noticed that you use a rather small batch size. Is scAR sensitive to the batch size, or is this due to computational limitations or better performance?

    Thank you in advance!

    Best,

    question 
    opened by KalinNonchev 1
  • bump to version 0.3.2

    • fix(*): changelog
    • docs: adding docstring in documentation
    • docs: adding Release notes in documentation
    • docs: adding docstring in documentation
    • test: adding semantic release
    • refactor: further refactoring codes
    • fix semantic release

    opened by CaibinSh 1
  • ask for permission of Webhooks

    Hi @kliatsko ,

    We are currently refactoring and adding functionalities to scAR.

    Could you please grant the Webhooks permission for us to automate the documentation?

    Many thanks in advance. Best regards, Caibin on behalf of the scar team @fgypas @Tobias-Ternent @mr-nvs @AlexMTYZ.

    help wanted 
    opened by CaibinSh 1
  • New release

    • Additions of readthedocs
    • Code refactoring
    1. Renaming module names, e.g. changing "scAR" -> "scar"
    2. Renaming parameter names, e.g. changing "scRNAseq_tech" -> "feature_type" and "model" -> "count_model"

    • Black and Pylint re-formatting the code
    enhancement 
    opened by CaibinSh 1
  • Black github action

    Addition of black github action that runs on every push and every pull request. It shows in the stdout all the changes that need to be made (--diff), but returns exit code 0, even if errors are observed.

    opened by fgypas 1
Releases(v0.4.4)
  • v0.4.4(Aug 9, 2022)

    Documentation

    • Update dependency (03cf19e)
    • Update dependencies (9bd7f1c)
    • Update documentations (418996c)
    • Update dependencies (1bde351)
    • main: Add link to anndata and scanpy (8436e05)
    • main: Update dependencies (984df35)
    • main: Update documentation for .h5 file (2a309e0)
    • Add a link of binary installers (2faed3e)
    • Update documentations (e26a6e9)
    • Add competing methods (8564b2b)
    • scar: Add versionadded directives for parameter sparsity and round_to_int (33e35ca)
    • Update docs (a4da539)
    • Update introduction (a036b24)
    • Change readthedocs template (421e52f)
    • data_generator: Update docs (1f8f668)
    • data_generator: Re-style docs (afef9fb)
    • *: Re-style docs (2d550fa)

    Performance

    • main: Command line tool supports a new input: filtered_feature_bc_matrix.h5 (73bc13e)
    • setup: Add an error raise statement (f4fb1a8)
  • v0.4.3(Jun 15, 2022)

    Fix

    • setup: Fix a bug to allow sampling a reasonable number of droplets (ef6f7e4)
    • main: Fix a bug in main to set default NN number (794ff17)

    Documentation

    • main: Add scanpy as dependency (252a492)

    Performance

    • main: Set a separate batchsize_infer parameter for inference (8727f04)
    • setup: Add an option of random sampling droplets to speed up calculation (ce042dd)
    • setup: Enable manipulation of large-scale empty droplets (15f1840)
  • v0.4.2(Jun 7, 2022)

  • v0.4.1(May 19, 2022)

    What's Changed

    Feature

    • inference: add a round_to_int parameter to round the counts (float) for easy interpretation and better integration into other methods (#47) (902a2b9) (8694239)

    Build

    • setup: replace setup.py with setup.cfg and pyproject.toml (#51) (3dc999a)

    Documentation

    • readthedocs: add scAR_logo image (#51) (c34f362)
    • tutorials: add ci=None to speed up plotting (#51) (902a2b9)

    Contributor

    @CaibinSh and @mdmanurung

    Full Changelog: https://github.com/Novartis/scar/compare/v0.4.0...v0.4.1

  • v0.4.0(May 5, 2022)

  • v0.3.5(May 3, 2022)

  • v0.3.4(May 1, 2022)

  • v0.3.3(May 1, 2022)

  • v0.3.1(Apr 29, 2022)

  • v0.3.0(Apr 27, 2022)

    What's Changed

    • Renaming module names, e.g. changing "scAR" -> "scar"
    • Renaming parameter names, e.g. "scRNAseq_tech" -> "feature_type", "model" -> "count_model", "empty_profile" -> "ambient_profile", ...

    • Black and Pylint re-formatting the code
    • New release by @CaibinSh in https://github.com/Novartis/scAR/pull/26

    Contributor

    @CaibinSh @fgypas @mr-nvs @Tobias-Ternent

    Full Changelog: https://github.com/Novartis/scAR/compare/v0.2.3...v0.3.0

  • v0.2.3(Apr 20, 2022)

    • Add integration test
    • Black formatting
    • Bump version to 0.2.3

    Contributors: @fgypas , @mr-nvs and @CaibinSh

    What's Changed

    • Develop by @CaibinSh in https://github.com/Novartis/scAR/pull/19

    Full Changelog: https://github.com/Novartis/scAR/compare/v0.2.2...v0.2.3

  • v0.2.2(Apr 4, 2022)

    • Remove torchaudio
    • Add test data for integration tests
    • Bump version to 0.2.2

    Contributors: @CaibinSh @fgypas

    What's Changed

    • Remove torchaudio, add test data and bump version to 0.2.2 by @fgypas in https://github.com/Novartis/scAR/pull/15

    Full Changelog: https://github.com/Novartis/scAR/compare/v0.2.1-beta...v0.2.2

  • v0.2.1-beta(Apr 1, 2022)

    • fix a typo in scAR-gpu.yml
    • reorganise init.py files

    Contributor: @CaibinSh

    What's Changed

    • Develop by @CaibinSh in https://github.com/Novartis/scAR/pull/12

    Full Changelog: https://github.com/Novartis/scAR/compare/v0.2.0-beta...v0.2.1-beta

  • v0.2.0-beta(Apr 1, 2022)

    • Support for training of the model with CPUs
    • Addition of two yaml files for CPU/GPU installation
    • Refactor of setup.py and structure of the package
    • Addition of tests with pytest
    • Addition of lint checks
    • Automate build with github actions (install package and run lint checks and pytest)
    • Update documentation
    • Version 0.2.0

    Co-authored-by: @CaibinSh @mr-nvs @Tobias-Ternent @fgypas

    What's Changed

    • 0.2.0-release by @fgypas in https://github.com/Novartis/scAR/pull/11

    Full Changelog: https://github.com/Novartis/scAR/commits/v0.2.0-beta
