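PASSL includes contrastive-learning-based self-supervised image algorithms such as SimCLR, MoCo, BYOL, and CLIP, as well as vision Transformer algorithms such as Vision-Transformer, Swin-Transformer, BEiT, CVT, T2T, and MLP_Mixer.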

Overview

PASSL

Introduction

PASSL is a PaddlePaddle-based vision library for state-of-the-art self-supervised learning research. PASSL aims to accelerate the research cycle in self-supervised learning: from designing a new self-supervised task to evaluating the learned representations.

  • Reproducible implementations of SOTA self-supervised methods: SimCLR, MoCo (v1), MoCo (v2), MoCo-BYOL, and CLIP are implemented. BYOL is coming soon. Supervised training is also supported.
  • Modular: Easy to build new tasks and reuse the existing components from other tasks (Trainer, models and heads, data transforms, etc.).

Installation

Implemented Models

Benchmark Linear Image Classification on ImageNet-1K

| Method | Epochs | Official results | PASSL results | Backbone | Model |
| -- | -- | -- | -- | -- | -- |
| MoCo | 200 | 60.6 | 60.64 | ResNet-50 | download |
| SimCLR | 100 | 64.5 | 65.3 | ResNet-50 | download |
| MoCo v2 | 200 | 67.7 | 67.72 | ResNet-50 | download |
| MoCo-BYOL | 300 | 71.56 | 72.10 | ResNet-50 | download |
| BYOL | 300 | 72.50 | 71.62 | ResNet-50 | download |
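
As a quick reading aid, the gap between the PASSL and official numbers can be computed directly from the benchmark table. The sketch below only restates the table values; the variable names are illustrative and are not part of PASSL.

```python
# Linear-evaluation accuracy (%) copied from the benchmark table:
# method -> (official result, PASSL result)
results = {
    "MoCo": (60.6, 60.64),
    "SimCLR": (64.5, 65.3),
    "MoCo v2": (67.7, 67.72),
    "MoCo-BYOL": (71.56, 72.10),
    "BYOL": (72.50, 71.62),
}

# Positive delta: the PASSL reproduction is above the official number.
deltas = {
    name: round(passl - official, 2)
    for name, (official, passl) in results.items()
}

for name, delta in deltas.items():
    print(f"{name:10s} {delta:+.2f}")
```

With these values, four of the five reproductions are at or above the official number, and BYOL is 0.88 points below it.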

Getting Started

Please see GETTING_STARTED.md for the basic usage of PASSL.

Tutorials

Comments
  • MLP-Mixer: An all-MLP Architecture for Vision

    Are the Top-1 accuracies of the two models in the README swapped? The larger model has a lower accuracy than the smaller one.

    | Arch | Weight | Top-1 Acc | Top-5 Acc | Crop ratio | # Params |
    | -- | -- | -- | -- | -- | -- |
    | mlp_mixer_b16_224 | pretrain 1k | 76.60 | 92.23 | 0.875 | 60.0M |
    | mlp_mixer_l16_224 | pretrain 1k | 72.06 | 87.67 | 0.875 | 208.2M |

    opened by gaorui999 3
  • I am closely following progress in self-supervised image classification

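    I would like to ask: what is the current state of self-supervised learning for image classification? For example, what accuracy does it reach on a typical binary task such as cat vs. dog classification, and on ImageNet-1K? Which self-supervised image classification algorithms or models does PASSL include? Could you give an example showing how to use them? PASSL currently has only one issue, and I could not find any documentation. Would it be possible to add me on WeChat or QQ for a short chat? I care a great deal about self-supervised image classification.

    Another question: with a self-supervised image classification model, if I supply a pile of images, will the model sort them into classes after running? Do I need to specify the number of classes? In short, I want to understand the usage workflow for self-supervised image classification. The library is at 1.0 now, so it should be of some practical use. If a model does sort the images into N classes, how can I judge whether the classification is correct? Are there algorithms for this? I have asked many questions and would be very grateful for an answer to each one. Thank you.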

    opened by yuwoyizhan 2
  • Unintended behavior in clip_logit_scale

    https://github.com/PaddlePaddle/PASSL/blob/83c49e6a5ba3444cee7f054122559d7759152764/passl/modeling/backbones/clip.py#L317

    See this issue for reference: https://github.com/PaddlePaddle/Paddle/issues/43710

    Suggested approach (with non-public API)

    logit_scale_buffer = self.logit_scale.clip(-4.6, 4.6)
    logit_scale_buffer._share_buffer_to(self.logit_scale)
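
For context on the bounds in the suggestion: in CLIP-style models the logit scale is commonly stored in log space and exponentiated before it multiplies the similarity logits. Assuming that convention holds here (this is not verified against the linked file), clipping the parameter to [-4.6, 4.6] bounds the effective multiplier to roughly [0.01, 100]. The stdlib sketch below uses a hypothetical helper, `clip_value`, and also shows that an out-of-place clip leaves the original value unchanged unless the result is written back, which is presumably why the suggestion copies the clipped buffer into `self.logit_scale`.

```python
import math


def clip_value(x, lo, hi):
    """Clamp a scalar to [lo, hi] and return the new value (out of place)."""
    return max(lo, min(hi, x))


# Bounds taken from the suggested snippet in the issue.
LO, HI = -4.6, 4.6

# Hypothetical log-space scale that has drifted past the upper bound.
log_scale = 5.2
clipped = clip_value(log_scale, LO, HI)

# The original variable is untouched unless the result is assigned back.
print(log_scale, clipped)

# Effective multiplier range implied by the bounds: about 0.01 to 99.48.
print(math.exp(LO), math.exp(HI))
```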
    
    opened by minogame 1
  • Suggestions

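    1. Much of the PASSL text, including documents such as the quick-start guide, is in English. Chinese documentation would be appreciated.
    2. I would like to know how far research on self-supervised image classification has progressed. For example, what accuracy is reached on a binary task such as cat vs. dog classification, and on ImageNet? When using PASSL for image classification, does the total number of classes need to be specified?
    3. Could we chat briefly over QQ or WeChat? I have some questions and would appreciate the help. QQ: 1226194560, WeChat: 18820785964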

    opened by yuwoyizhan 1
  • fix bug of mixup for DeiT

    DeiT/B-16 pretrained on ImageNet1K:

    [01/21 02:54:46] passl.engine.trainer INFO: Validate Epoch [290] acc1 (81.336), acc5 (95.544)
    [01/21 03:02:31] passl.engine.trainer INFO: Validate Epoch [291] acc1 (81.328), acc5 (95.580)
    [01/21 03:10:20] passl.engine.trainer INFO: Validate Epoch [292] acc1 (81.390), acc5 (95.608)
    [01/21 03:18:10] passl.engine.trainer INFO: Validate Epoch [293] acc1 (81.484), acc5 (95.636)
    [01/21 03:26:00] passl.engine.trainer INFO: Validate Epoch [294] acc1 (81.452), acc5 (95.600)
    [01/21 03:33:52] passl.engine.trainer INFO: Validate Epoch [295] acc1 (81.354), acc5 (95.528)
    [01/21 03:41:38] passl.engine.trainer INFO: Validate Epoch [296] acc1 (81.338), acc5 (95.562)
    [01/21 03:49:25] passl.engine.trainer INFO: Validate Epoch [297] acc1 (81.344), acc5 (95.542)
    [01/21 03:57:15] passl.engine.trainer INFO: Validate Epoch [298] acc1 (81.476), acc5 (95.550)
    [01/21 04:05:03] passl.engine.trainer INFO: Validate Epoch [299] acc1 (81.476), acc5 (95.572)
    [01/21 04:12:51] passl.engine.trainer INFO: Validate Epoch [300] acc1 (81.386), acc5 (95.536)
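
The log lines above follow a regular pattern, so the best validation epoch can be extracted with a short script. This is a minimal sketch using only the standard library; the function name `parse_validation_log` is hypothetical and the regular expression assumes the exact line format shown in this log.

```python
import re

# A few lines copied from the validation log in this pull request.
log_text = """\
[01/21 03:10:20] passl.engine.trainer INFO: Validate Epoch [292] acc1 (81.390), acc5 (95.608)
[01/21 03:18:10] passl.engine.trainer INFO: Validate Epoch [293] acc1 (81.484), acc5 (95.636)
[01/21 03:26:00] passl.engine.trainer INFO: Validate Epoch [294] acc1 (81.452), acc5 (95.600)
[01/21 04:12:51] passl.engine.trainer INFO: Validate Epoch [300] acc1 (81.386), acc5 (95.536)
"""

# Matches the "Validate Epoch [N] acc1 (x), acc5 (y)" part of each line.
PATTERN = re.compile(
    r"Validate Epoch \[(\d+)\] acc1 \(([\d.]+)\), acc5 \(([\d.]+)\)"
)


def parse_validation_log(text):
    """Return a list of (epoch, acc1, acc5) tuples found in the log text."""
    return [
        (int(epoch), float(acc1), float(acc5))
        for epoch, acc1, acc5 in PATTERN.findall(text)
    ]


records = parse_validation_log(log_text)
best_epoch, best_acc1, _ = max(records, key=lambda r: r[1])
print(best_epoch, best_acc1)
```

Applied to the full log, the highest acc1 is 81.484 at epoch 293, while the final epoch (300) reports 81.386.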
    
    opened by GuoxiaWang 1
  • BYOL pretraining appears to use gt_label?

    • The BYOL config sets num_classes=1000: https://github.com/PaddlePaddle/PASSL/blob/9d7a9fd4af41772e29120553dddab1c162e4cb70/configs/byol/byol_r50_IM.yaml#L34
    • The model defines self.classifier = nn.Linear(embedding_dim, num_classes), and forward passes classif_out together with label to the head

    https://github.com/PaddlePaddle/PASSL/blob/9d7a9fd4af41772e29120553dddab1c162e4cb70/passl/modeling/architectures/BYOL.py#L263

    • The L2 head adds the contrastive loss and the supervised CE loss together and returns the sum

    https://github.com/PaddlePaddle/PASSL/blob/9d7a9fd4af41772e29120553dddab1c162e4cb70/passl/modeling/heads/l2_head.py#L43
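
To make the concern concrete, here is a minimal stdlib sketch of a loss that sums a label-free BYOL-style term and a supervised cross-entropy term, as the issue describes. It is not PASSL's implementation; the function names and toy values are hypothetical.

```python
import math


def byol_l2_loss(p, z):
    """Squared L2 distance between unit-normalized vectors.

    This equals 2 - 2 * cosine_similarity(p, z) and needs no labels.
    """
    dot = sum(a * b for a, b in zip(p, z))
    norm_p = math.sqrt(sum(a * a for a in p))
    norm_z = math.sqrt(sum(b * b for b in z))
    return 2.0 - 2.0 * dot / (norm_p * norm_z)


def cross_entropy(logits, label):
    """Supervised cross-entropy for one sample; needs the ground-truth label."""
    m = max(logits)  # subtract the max for numerical stability
    log_sum_exp = m + math.log(sum(math.exp(x - m) for x in logits))
    return log_sum_exp - logits[label]


# Hypothetical toy values for a single sample.
prediction, target = [1.0, 0.0], [0.0, 1.0]  # online prediction, target projection
class_logits, gt_label = [0.0, 0.0], 0       # classifier output and its label

label_free = byol_l2_loss(prediction, target)
combined = label_free + cross_entropy(class_logits, gt_label)
print(label_free, combined)
```

Only the second term depends on gt_label. Whether its gradient reaches the backbone depends on details (for example, whether the classifier input is detached) that this sketch does not model.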

    opened by youqingxiaozhua 0
  • [PaddlePaddle Paper Reproduction Challenge (6th edition)] (85) Emerging Properties in Self-Supervised Vision Transformers

    PR types

    New features

    PR changes

    APIs

    Describe

    • Task: https://github.com/PaddlePaddle/Paddle/issues/41482
    • Add passl.model.architectures.dino

    Performance

    | Model | Official | Passl |
    | ---- | ---- | ---- |
    | DINO | 74.0 | 73.6 |

    • [x] Pretraining and linear probe code
    • [ ] Pretraining and linear probe weights
    • [ ] Documentation
    • [ ] TIPC
    opened by fuqianya 0
Releases(v1.0.0)
  • v1.0.0(Feb 24, 2022)

    • Added the XCiT vision Transformer model. The training metrics of the distilled xcit_nano_12_p8_224 model are aligned. Thanks to @BrilliantYuKaimin for the high-quality contribution 🎉 🎉 🎉

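    PASSL is PaddlePaddle's core library for self-supervised learning. It provides many high-accuracy self-supervised vision models and vision Transformer models, and supports distributed training of very large vision models. It aims to improve the modeling efficiency of PaddlePaddle developers in the self-supervised domain and to offer best practices for very large vision models based on PaddlePaddle 2.2.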
