Classifies galaxy morphology with Bayesian CNN

Related tags

Deep Learningzoobot
Overview

Zoobot

Documentation Status

Zoobot classifies galaxy morphology with deep learning. This code will let you:

  • Reproduce and improve the Galaxy Zoo DECaLS automated classifications
  • Finetune the classifier for new tasks

For example, you can train a new classifier like so:

model = define_model.get_model(
    output_dim=len(schema.label_cols),  # schema defines the questions and answers
    input_size=initial_size, 
    crop_size=int(initial_size * 0.75),
    resize_size=resize_size
)

model.compile(
    loss=losses.get_multiquestion_loss(schema.question_index_groups),
    optimizer=tf.keras.optimizers.Adam()
)

training_config.train_estimator(
    model, 
    train_config,  # parameters for how to train e.g. epochs, patience
    train_dataset,
    test_dataset
)

Install using git and pip: git clone [email protected]:mwalmsley/zoobot.git pip install -r zoobot/requirements.txt (virtual env or conda highly recommended) pip install -e zoobot The main branch is for stable-ish releases. The dev branch includes the shiniest features but may change at any time.

To get started, see the documentation.

I also include some working examples for you to copy and adapt:

Latest cool features on dev branch (June 2021):

  • Multi-GPU distributed training
  • Support for Weights and Biases (wandb)
  • Worked examples for custom representations

Contributions are welcome and will be credited in any future work.

If you use this repo for your research, please cite the paper.

Comments
  • Benchmarks

    Benchmarks

    It's important that Zoobot has proper benchmarks so that we can be confident new releases work properly for users. This PR adds those benchmarks.

    In the course of setting up the benchmarks, I have made some major changes/improvements:

    • pytorch-galaxy-datasets refactored to work for tensorflow, imports adapted
    • both tensorflow and pytorch zoobot versions use albumentations for augmentations. Old TF code removed.
    • tensorflow version bumped to 2.10 (current latest) while I'm at it
    • pytorch version now has logging for per-question loss. Loss func aggregation has new option to support this.
    • TensorFlow version has per-question logging also, but awaiting issue with Keras team to enable
    • Created minimal_example.py for TensorFlow (thanks, @katgre )
    • Support CPU-only PyTorch training
    • Refactor TF TrainingConfig to Trainer object, Lightning style, for consistency
    enhancement 
    opened by mwalmsley 3
  • on_train_batch_end is slow in TF

    on_train_batch_end is slow in TF

    Unclear what's causing this slowness. Presumably a callback I added - but none look like they should be heavy? Perhaps something wandb is doing?

    • Remove all callbacks and rerun
    • Remove wandb and rerun For each, check if slow warning continues (or if training speed changes at all)
    enhancement 
    opened by mwalmsley 3
  • add gh action to publish package to pypi

    add gh action to publish package to pypi

    Related to https://github.com/mwalmsley/zoobot/issues/18#issuecomment-1278635788

    This PR adds an auto CI release mechanism for publishing zoobot to pypi. It uses the GH action to release to pypi https://github.com/pypa/gh-action-pypi-publish

    opened by camallen 3
  • Publish latest version to PyPi?

    Publish latest version to PyPi?

    A question rather than a request. Are there any plans to publish the refactored work ?

    PyPi shows v0.0.1 is published https://pypi.org/project/zoobot/#history on 15th March 2021 but the latest code is ~v0.0.3 (tags) and the refactor seems to be working well.

    Ideally I can pull in these packages to my own env / container and then train with the latest code vs pulling in from github etc.

    opened by camallen 3
  • setup branch protection rules on 'main'

    setup branch protection rules on 'main'

    https://docs.github.com/en/repositories/configuring-branches-and-merges-in-your-repository/defining-the-mergeability-of-pull-requests/managing-a-branch-protection-rule

    It may be too restrictive for your use case / dev flows but we use this for contributor PRs etc. Basically we ensure that a PR meets certain criteria in terms of our CI runs, can only merge a PR once one of the CI runs v3.7 or v3.9 tests pass.

    Feel free to close if you don't think this is useful.

    enhancement 
    opened by camallen 2
  • Deprecate TFRecords

    Deprecate TFRecords

    TFRecords are cumbersome and take up a lot of disk space. It's much simpler to learn directly from images on disk, at the cost of some I/O performance.

    This PR removes support for TFRecords in favour of images-on-disk. This will ultimately enable new TensorFlow weights trained on all of DESI (impractical with TFRecords).

    Breaking change for anyone using TFRecords (i.e. everyone using TensorFlow to train from scratch). Finetuning should not be affected.

    TODO - will require new greyscale/colour pretrained models, just for safety.

    opened by mwalmsley 2
  • feat(CI): Add proposed python CI GH Action

    feat(CI): Add proposed python CI GH Action

    This PR proposes to add a simple GH Action script that establishes a python environment, downloads the requirements and runs pytest.

    Some other things to consider might be to use conda for virtual environments and creating CI scripts for Docker as well.

    opened by SauravMaheshkar 2
  • Improve data files for docker

    Improve data files for docker

    This PR changes the docker / compose setup, specifically it

    • consolidates the docker files to cuda and tensorflow base images (no need for a python base image)
    • adds a .dockerignore entry for all data files when building the container to keep the size down
    • and provides an easy way to inject them at run time via local directory mounts in the compose file
    • finally this removes specific to my machine local directory setup for injecting unrelated data files
    opened by camallen 2
  • add wandb logging, freeze batchnorm by default

    add wandb logging, freeze batchnorm by default

    Doing some polishing on finetuning

    • Add wandb logging to the full_tree example. @camallen use this for dashboard. You will need to add import wandb, wandb.init(authkey, etc) just before when running on Azure.
    • Freeze batch norm layers by default when finetuning, with new recursive function
    • Pass additional params via config (thanks Cam)
    • Minor cleanup
    opened by mwalmsley 1
  • Add PyTorch Finetuning Capability, Examples

    Add PyTorch Finetuning Capability, Examples

    Key change is adding pytorch.training.finetune() method. Works on either classification (e.g. 0, 1) data or count (e.g. 12 said yes, 4 said no) data.

    Includes three working examples:

    • Binary classification, with tiny rings subset
    • Counts for single question, with full internal rings data
    • Counts for all questions, with GZ Cosmic Dawn schema

    Also updates various imports for the galaxy-datasets refactor, fixes prediction method to work on unlabelled data, minor QoL improvements.

    Finally, changes PyTorch dense layer initialisation to custom high-uncertainty initialisation - see efficientnet_custom.py

    cc @camallen

    opened by mwalmsley 1
  • Add v0.02 changes

    Add v0.02 changes

    Adds support (minimal working examples, a guide) for calculating new representations with a trained model.

    Also adds significant new features:

    • Distributed training with several GPUs
    • Metric logging with Weights&Biases (add your own login credentials)
    • Train on color (3-band) images, not just greyscale

    Also adds a critical bugfix (when loading images for direct predictions i.e. not via TFRecords, correctly normalise to the 0-1 interval expected (without documentation) by the tf.keras.experimental.preprocessing layers).

    Also adds misc. minor fixes and documentation tweaks.

    This code was used for the morphology tools paper (to be submitted shortly).

    opened by mwalmsley 1
  • Avoid --extra-index-url via dependency_links

    Avoid --extra-index-url via dependency_links

    It should be possible to search for non-standard package repositories using just setup.py, without having the user also set --extra-index-url.

    https://setuptools.pypa.io/en/latest/deprecated/dependency_links.html

    But I couldn't get this to work on a quick try.

    enhancement help wanted 
    opened by mwalmsley 1
  • Can't import finetune while going through finetune_binary_classification.py

    Can't import finetune while going through finetune_binary_classification.py

    I tried to go through finetune_binary_classification.py, but got the error:

    ImportError: cannot import name 'finetune' from 'zoobot.pytorch.training' (/usr/local/lib/python3.8/dist-packages/zoobot/pytorch/training/init.py)

    I tried it both with kasia and dev branch, went through "git clone" and "pip install" (I remembered there were some issues during Hackaton regarding that), also tried to import other features from the folder (i.e. losses) and it worked fine.

    bug 
    opened by katgre 2
  • Create a simple decision tree in minimal_example.py

    Create a simple decision tree in minimal_example.py

    Instead of using on of the complicated decision trees from decals dr5, let's create a simple decision tree with one dependency already written in the minimal_example.py.

    opened by katgre 0
Releases(v0.0.3)
  • v0.0.3(Apr 25, 2022)

    Improved documentation and refactored train API (pytorch).

    Awaiting results from several segmentation experiments ahead of public release (inc pytorch version).

    Source code(tar.gz)
    Source code(zip)
  • v0.0.2(Oct 4, 2021)

  • beta(Sep 29, 2021)

    Initial release.

    This had enough documentation and code to replicate the DECaLS model and make predictions. There are a few minor missing arguments and similar typos that you might have stumbled into, because I made some last minute changes without updating the docs, but everything worked with a little stack tracing.

    Source code(tar.gz)
    Source code(zip)
Owner
Mike Walmsley
Mike Walmsley
Contrastive Learning for Metagenomic Binning

CLMB A simple framework for CLMB - a novel deep Contrastive Learningfor Metagenomic Binning Created by Pengfei Zhang, senior of Department of Computer

1 Sep 14, 2022
nanodet_plus,yolov5_v6.0

OAK_Detection OAK设备上适配nanodet_plus,yolov5_v6.0 Environment pytorch = 1.7.0

炼丹去了 1 Feb 18, 2022
Automated Hyperparameter Optimization Competition

QQ浏览器2021AI算法大赛 - 自动超参数优化竞赛 ACM CIKM 2021 AnalyticCup 在信息流推荐业务场景中普遍存在模型或策略效果依赖于“超参数”的问题,而“超参数"的设定往往依赖人工经验调参,不仅效率低下维护成本高,而且难以实现更优效果。因此,本次赛题以超参数优化为主题,从真

20 Dec 09, 2021
🐦 Quickly annotate data from the comfort of your Jupyter notebook

🐦 pigeon - Quickly annotate data on Jupyter Pigeon is a simple widget that lets you quickly annotate a dataset of unlabeled examples from the comfort

Anastasis Germanidis 647 Jan 05, 2023
Shape-aware Semi-supervised 3D Semantic Segmentation for Medical Images

SASSnet Code for paper: Shape-aware Semi-supervised 3D Semantic Segmentation for Medical Images(MICCAI 2020) Our code is origin from UA-MT You can fin

klein 125 Jan 03, 2023
Model parallel transformers in Jax and Haiku

Mesh Transformer Jax A haiku library using the new(ly documented) xmap operator in Jax for model parallelism of transformers. See enwik8_example.py fo

Ben Wang 4.8k Jan 01, 2023
Repository for the paper "PoseAug: A Differentiable Pose Augmentation Framework for 3D Human Pose Estimation", CVPR 2021.

PoseAug: A Differentiable Pose Augmentation Framework for 3D Human Pose Estimation Code repository for the paper: PoseAug: A Differentiable Pose Augme

Pyjcsx 328 Dec 17, 2022
BBB streaming without Xorg and Pulseaudio and Chromium and other nonsense (heavily WIP)

BBB Streamer NG? Makes a conference like this... ...streamable like this! I also recorded a small video showing the basic features: https://www.youtub

Lukas Schauer 60 Oct 21, 2022
IRON Kaggle project done while doing IRONHACK Bootcamp where we had to analyze and use a Machine Learning Project to predict future sales

IRON Kaggle project done while doing IRONHACK Bootcamp where we had to analyze and use a Machine Learning Project to predict future sales. In this case, we ended up using XGBoost because it was the o

1 Jan 04, 2022
Implementation of Axial attention - attending to multi-dimensional data efficiently

Axial Attention Implementation of Axial attention in Pytorch. A simple but powerful technique to attend to multi-dimensional data efficiently. It has

Phil Wang 250 Dec 25, 2022
METS/ALTO OCR enhancing tool by the National Library of Luxembourg (BnL)

Nautilus-OCR The National Library of Luxembourg (BnL) started its first initiative in digitizing newspapers, with layout recognition and OCR on articl

National Library of Luxembourg 36 Dec 05, 2022
Neural Network Libraries

Neural Network Libraries Neural Network Libraries is a deep learning framework that is intended to be used for research, development and production. W

Sony 2.6k Dec 30, 2022
A Gura parser implementation for Python

Gura Python parser This repository contains the implementation of a Gura (compliant with version 1.0.0) format parser in Python. Installation pip inst

Gura Config Lang 19 Jan 25, 2022
ObjectDrawer-ToolBox: a graphical image annotation tool to generate ground plane masks for a 3D object reconstruction system

ObjectDrawer-ToolBox is a graphical image annotation tool to generate ground plane masks for a 3D object reconstruction system, Object Drawer.

77 Jan 05, 2023
Secure Distributed Training at Scale

Secure Distributed Training at Scale This repository contains the implementation of experiments from the paper "Secure Distributed Training at Scale"

Yandex Research 9 Jul 11, 2022
This is RFA-Toolbox, a simple and easy-to-use library that allows you to optimize your neural network architectures using receptive field analysis (RFA) and create graph visualizations of your architecture.

ReceptiveFieldAnalysisToolbox This is RFA-Toolbox, a simple and easy-to-use library that allows you to optimize your neural network architectures usin

84 Nov 23, 2022
This code provides various models combining dilated convolutions with residual networks

Overview This code provides various models combining dilated convolutions with residual networks. Our models can achieve better performance with less

Fisher Yu 1.1k Dec 30, 2022
Code of TIP2021 Paper《SFace: Sigmoid-Constrained Hypersphere Loss for Robust Face Recognition》. We provide both MxNet and Pytorch versions.

SFace Code of TIP2021 Paper 《SFace: Sigmoid-Constrained Hypersphere Loss for Robust Face Recognition》. We provide both MxNet, PyTorch and Jittor versi

Zhong Yaoyao 47 Nov 25, 2022
Implementation of SE3-Transformers for Equivariant Self-Attention, in Pytorch.

SE3 Transformer - Pytorch Implementation of SE3-Transformers for Equivariant Self-Attention, in Pytorch. May be needed for replicating Alphafold2 resu

Phil Wang 207 Dec 23, 2022
Project ArXiv Citation Network

Project ArXiv Citation Network Overview This project involved the analysis of the ArXiv citation network. Usage The complete code of this project is i

Dennis Núñez-Fernández 5 Oct 20, 2022