CVNets: A library for training computer vision networks

This repository contains the source code for training computer vision models. Specifically, it contains the source code of the MobileViT paper for the following tasks:

Image classification on the ImageNet dataset
Object detection using SSD
Semantic segmentation using DeepLabv3

Note: Any image classification backbone can be used with the object detection and semantic segmentation models.

Training can be done with two samplers:

Standard distributed sampler
Multi-scale distributed sampler

We recommend using the multi-scale sampler, as it improves generalization and leads to better performance. See MobileViT for details.
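To illustrate the idea behind multi-scale sampling, here is a minimal single-process sketch in plain Python. It is not the CVNets implementation: the function name `multi_scale_batches`, its parameters, and the default resolutions are all hypothetical. It assumes that each batch draws one spatial resolution at random and that the batch size is scaled inversely with the number of pixels, so the per-batch compute stays roughly constant. A distributed variant would additionally shard the indices across ranks.

```python
import random
from typing import Iterator, List, Sequence, Tuple


def multi_scale_batches(
    num_samples: int,
    base_size: Tuple[int, int] = (256, 256),
    base_batch_size: int = 32,
    scales: Sequence[Tuple[int, int]] = (
        (160, 160), (192, 192), (256, 256), (288, 288), (320, 320),
    ),
    seed: int = 0,
) -> Iterator[Tuple[int, int, List[int]]]:
    """Yield (height, width, sample_indices) tuples covering one epoch.

    Hypothetical helper for illustration only; names and defaults are
    assumptions, not part of the CVNets API.
    """
    rng = random.Random(seed)
    indices = list(range(num_samples))
    rng.shuffle(indices)

    base_h, base_w = base_size
    start = 0
    while start < num_samples:
        # Pick one resolution for the whole batch.
        height, width = rng.choice(list(scales))
        # Smaller images get larger batches and vice versa, so the number
        # of pixels per batch stays close to the base configuration.
        batch_size = max(
            1, (base_h * base_w * base_batch_size) // (height * width)
        )
        batch = indices[start:start + batch_size]
        start += batch_size
        yield height, width, batch


if __name__ == "__main__":
    for height, width, batch in multi_scale_batches(
        num_samples=100, base_batch_size=8
    ):
        print("{}x{}: {} samples".format(height, width, len(batch)))
```

In a real training loop, each yielded batch would be resized to the chosen `(height, width)` before being fed to the model.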

Installation

CVNets can be installed in the local Python environment using the commands below:

    git clone git@github.com:apple/ml-cvnets.git
    cd ml-cvnets
    pip install -r requirements.txt
    pip install --editable .

We recommend using Python 3.6+ and PyTorch (version >= 1.8.0) in a conda environment. For setting up a Python environment with conda, see here.
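As a quick sanity check before training, the version recommendations above can be verified with a short script. This is a sketch, not something shipped with CVNets: `parse_version` and `check_environment` are hypothetical helper names, and the only library attribute relied on is `torch.__version__`.

```python
import sys
from itertools import takewhile
from typing import List, Tuple


def parse_version(version: str) -> Tuple[int, ...]:
    """Turn a string such as '1.8.0+cu111' into (1, 8, 0)."""
    release = version.split("+")[0]  # drop local build tags like '+cu111'
    numbers = []
    for piece in release.split(".")[:3]:
        digits = "".join(takewhile(str.isdigit, piece))
        if not digits:
            break
        numbers.append(int(digits))
    # Pad short versions such as '1.10' so tuple comparison behaves.
    while len(numbers) < 3:
        numbers.append(0)
    return tuple(numbers)


def check_environment() -> List[str]:
    """Return a list of problems; an empty list means the checks passed."""
    problems = []
    if sys.version_info < (3, 6):
        problems.append(
            "Python 3.6+ is recommended, found {}.{}".format(
                sys.version_info[0], sys.version_info[1]
            )
        )
    try:
        import torch
    except ImportError:
        problems.append("PyTorch is not installed")
    else:
        if parse_version(str(torch.__version__)) < (1, 8, 0):
            problems.append(
                "PyTorch >= 1.8.0 is recommended, found {}".format(
                    torch.__version__
                )
            )
    return problems


if __name__ == "__main__":
    issues = check_environment()
    print("Environment looks fine" if not issues else "\n".join(issues))
```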

Getting Started

General instructions for training and evaluating different models are given here.
Examples for training and evaluating a specific model are provided in the examples folder. Right now, we support the following models.
For converting PyTorch models to CoreML, see README-pytorch-to-coreml.md.

Citation

If you find our work useful, please cite the following paper:

@article{mehta2021mobilevit,
  title={MobileViT: Light-weight, General-purpose, and Mobile-friendly Vision Transformer},
  author={Mehta, Sachin and Rastegari, Mohammad},
  journal={arXiv preprint arXiv:2110.02178},
  year={2021}
}

Owner

Apple
