Research using python - Guide for development of research code (using Anaconda Python)

Overview

Guide for development of research code
(using Anaconda Python)

TL;DR:

One time setup

  1. Install git and go through its one time setup, bare minimum:
    git config --global user.name “First Last”
    git config --global user.email “first[email protected]”
    git config --global core.editor editor_of_choice
    
    Editor option for the few folks on windows (haven't tried it myself):
    git config --global core.editor "'input/path/to/notepad++.exe' -multiInst -notabbar -nosession -noPlugin"
    
  2. Install git-lfs and run git lfs install.
  3. Install miniconda.
  4. Sign up for a GitHub account.
  5. Generate an SSH key and add it to your GitHub account.

Once per repository setup

  1. Create empty repository on GitHub, lets call it my_project.
  2. Initial commit into local repository and push to remote: 0. Create local repository (also creates new directory) git init my_project
    1. Create a markdown file, README.md describing the project.
    2. Create an environment_dev.yml file based on this example. Change the environment name to an appropriate one and add relevant packages.
    3. Copy this pre-commit configuration file.
    4. Copy this .gitignore file and add file types you want git to ignore.
    5. Add file types to be tracked by git-lfs based on file extension, creates the .gitattributes file (e.g. git lfs track "*.pth")
    6. Copy this .flake8 file to customize the tool settings.
git add README.md environment_dev.yml .pre-commit-config.yaml .gitattributes .gitignore .flake8
git commit
git branch -M main
git remote add origin [email protected]:user_name/my_project.git
git push -u origin main
  1. Create virtual environment activate it and set up pre-commit:
    conda env create -f environment_dev.yml
    conda activate my_project_dev
    pre-commit install
    

Start working

  1. Activate virtual environment conda activate my_project_dev
  2. Create new branch off of main:
git checkout main
git checkout -b my_new_branch
  1. Work.
  2. Commit locally and push to remote (origin can be either a fork, if using a triangular workflow, or the original repository if using a centralized workflow):
git add file1 file2 file3
git commit
git push origin my_new_branch
  1. Create a pull request on GitHub and after tests pass merge into main branch.

If code is not in the remote repository, consider it lost.

Long version

Why should you care?

Most scientists need to write code as part of their research. This is a "physical" embodiment of the underlying algorithmic and mathematical theory. Traditionally the software engineering standards applied to code written as part of research have been rather low (rampant code duplication...). In the past decade we have seen this change. Primarily because it is now much more common for researchers to share their code (often due to the "encouragement" of funding agencies) in all its glory.

When sharing code, we expect it to comply with some minimal software engineering standards including design, readability, and testing.

I strive to follow the guidance below, but don't always. Still, it's important to have a goal to strive towards. To quote Lewis Carol (If you don't know where you're going, any road will take you there). From Alice's Adventures in Wonderland:

“Would you tell me, please, which way I ought to go from here?” “That depends a good deal on where you want to get to,” said the Cat. “I don’t much care where-” said Alice. “Then it doesn’t matter which way you go,” said the Cat. "-so long as I get somewhere,” Alice added as an explanation.“Oh, you’re sure to do that,” said the Cat, “if you only walk long enough.”

Personal pet peeves, in no particular order:

  • A single commit of all the code in the GitHub repository. Yes, you're sharing code but it did not magically materialize in its final form, be transparent so that we can trust the code and see how it developed over time. We can learn from paths that did not pan out almost as much as from the path that did. By providing all of the history we can see which algorithmic paths were attempted and did not work out. Help others avoid going down dead-end paths.
  • Repository contains .DS_Store files. Yes, we know you are proud of your Mac. I like OSX too, but seriously, you should have added this file type to the .gitignore file when setting up the repository.
  • Deep learning code sans-data, sans-weight files. This is completely useless in terms of reproducibility. Don't "share" like this.
  • Code duplication with minor, hard to detect, differences between copies.

Version control

  1. Use a version control system, currently Git is the VCS of the day. Learn how to use it (introduction to git slide deck).
  2. Use a remote repository, your cloud backup. Keep it private during development and then make it public upon publication acceptance. Free services GitHub, BitBucket.
  3. Do not commit binary or large files into the repository. Use git-lfs. Beware the Jupyter notebook. Do not commit notebooks with output as this will cause the repository size to blow up, particularly if output includes images. Clear the output before committing.
  4. Use the pre-commit framework to improve (1) compliance to code style (2) avoid commits of large/binary files, AWS credentials and private keys. We all need a little help complying with our self imposed constraints (example configuration file). Note that git pre-commit hooks do not preclude non-compliant commits, as a determined user can go around the hooks, git commit --no-verify.

Writing code (Python as a use case)

Many languages have style, testing and documentation tools and conventions. Here we focus on Python, but the concepts are similar for all languages.

  1. Style - Use consistent style and enforce it. Other human beings need to read the code and readily understand it. Write code that is compliant with PEP8 (the Python style guide):
    • Use flake8 to enforce PEP8.
    • Use the Black code formatter, works for scripts and Jupyter notebooks (for Jupyter notebook support pip install black[jupyter] instead of the regular pip install black). It does not completely agree with flake8, so use both?
    • Some folks don't like the Black formatting, it isn't all roses. An alternative is autopep8.
  2. Testing - Write nominal regression tests at the same time you implement the functionality. Non-rigorous regression testing is acceptable in a research setting as we explore various solutions. The more rigorous the testing the easier it will be for a development team to get code into production. Use pytest for this task.
  3. Documentation - Write the documentation while you are implementing. Start by adding a README file to your repository (use markdown or restructured text). It should include a general description of the repository contents, how to run the programs and possibly instructions on how to build them from source code. Generally, when we postpone writing documentation we will likely never do it. That's fine too, as long as you are willing to admit to yourself that you are consciously choosing to not document your code. In Python, use a consistent Docstring format. Two popular ones are Google style and NumPy style.
  4. Reproducible environment - include instructions or configuration files to reproduce the environment in which the code is expected to work. In Python you provide files listing all package dependencies enabling the creation of the appropriate virtual environment in which to run the program. A requirements.txt for plain Python, or an environment.yml for the anaconda Python distribution. For development we often rely on additional packages not required for usage (e.g. pytest). Consequentially we include a requirements_dev.txt (environment_dev.yml) in addition to the requirements.txt (environment.yml) files. Sample requirements.txt, requirements_dev.txt and environment.yml, environment_dev.yml files.
  5. Your code is a mathematical multi-parametric function that depends on many parameters beyond the input. These parameters are either:
  • Hard coded - best avoided if they need to be changed for different inputs.
  • Given as arguments on the command-line, appropriate when you have a few, less than five. Several popular Python modules/packages that support parsing command-line arguments: argparse, click and docopt. Personally I use argparse (example usage available here).
  • Specified in a configuration file. These usually use XML or JSON formats. I use JSON (example configuration file and short script that reads it). The parameters file is given on the command-line so we also get to use argparse.

Continuous integration

Automate testing and possibly delivery using continuous integration. There are many CI services that readily integrate with remote hosted git services. In the past I've used TravisCI and CircleCI. Currently using GitHub Actions. All of these rely on a yaml based configuration files to define workflows.

An example GitHub actions workflow which runs the same tests as the pre-commit defined above is available here.

Owner
Ziv Yaniv
Ziv Yaniv
Attempt at a Windows version of the plotman Chia Plot Manager system

windows plotman: an attempt to get plotman to work on windows THIS IS A BETA. Not ready for production use just yet. Almost, but not quite there yet.

59 May 11, 2022
A python script made for personal use to monitor for sports card restocks on target.com since they are sold out often

TargetProductMonitor A python script made for personal use to monitor for sports card resocks on target.com since they are sold out often. When a rest

Bryan Lorden 2 Jul 31, 2022
Hashcrack - A non-object oriented open source, Software for Windows/Linux made in Python 3

Multi Force This project is a non-object oriented open source, Software for Wind

Radiationbolt 3 Jan 02, 2023
Fast STL (ASCII & Binary) importer for Blender

blender-fast-stl-importer Fast STL (ASCII & Binary) importer for Blender based on https://en.wikipedia.org/wiki/STL_(file_format) Technical notes: flo

Iyad Ahmed 7 Apr 17, 2022
A Lite Package focuses on making overwrite and mending functions easier and more flexible.

Overwrite Make Overwrite More flexible In Python A Lite Package focuses on making overwrite and mending functions easier and more flexible. Certain Me

2 Jun 15, 2022
Tensorboard plugin 3d with python

tensorboard-plugin-3d Overview In this example, we render a run selector dropdown component. When the user selects a run, it shows a preview of all sc

KitwareMedical 26 Nov 14, 2022
Fast Base64 encoding/decoding in Python

Fast Base64 implementation This project is a wrapper on libbase64. It aims to provide a fast base64 implementation for base64 encoding/decoding. Insta

Matthieu Darbois 96 Dec 26, 2022
Some usefull scripts for the Nastran's 145 solution (Flutter Analysis) using the pyNastran package.

nastran-aero-flutter This project is intended to analyse the Supersonic Panel Flutter using the NASTRAN software. The project uses the pyNastran and t

zuckberj 11 Nov 16, 2022
Handwrite - Type in your Handwriting!

Handwrite - Type in your Handwriting! Ever had those long-winded assignments, that the teacher always wants handwritten?

coded 7 Dec 06, 2022
Sheet2export - FreeCAD macro to export spreadsheet

Description This is FreeCAD macro to export spreadsheet to file.

Darek L 3 Jul 09, 2022
A site that went kinda viral that lets you put Bernie Sanders in places

Bernie In Places An app that accidentally went viral! Read the story in WIRED here Install First, create a python virtual environment, and install all

310 Aug 22, 2022
Distributed behavioral experiments

Autopilot Docs Paper Forum Hardware Autopilot is a Python framework for performing complex, hardware-intensive behavioral experiments with swarms of n

70 Dec 14, 2022
Python script that automates the tasks involved in starting a new coding project

Auto Project Builder Automates the repetitive tasks while starting a new project Installation Use the REQUIREMENTS.txt file to install the dependencie

Prathap S S 1 Feb 03, 2022
Fused multiply-add (with a single rounding) for Python.

pyfma Fused multiply-add for Python. Fused multiply-add computes (x*y) + z with a single rounding. Useful for dot products, matrix multiplications, po

Nico Schlömer 18 Nov 08, 2022
Calculadora-basica - Calculator with basic operators

Calculadora básica Calculadora com operadores básicos; O programa solicitará a d

Vitor Antoni 2 Apr 26, 2022
Pokemon catch events project to demonstrate data pipeline on AWS

Pokemon Catches Data Pipeline This is a sample project to practice end-to-end data project; Terraform is used to deploy infrastructure; Kafka is the t

Vitor Carra 4 Sep 03, 2021
Repository for my Monika Assistant project

Monika_Assistant Repository for my Monika Assistant project Major changes: Added face tracker Added manual daily log to see how long it takes me to fi

3 Jan 10, 2022
PythonCalculator - A simple Calculator made in python using tkinter for GUI

PythonCalculator A simple Calculator made in python using tkinter for GUI. For P

ʀᴇxɪɴᴀᴢᴏʀ 1 Jan 01, 2022
CountBoard 是一个基于Tkinter简单的,开源的桌面日程倒计时应用。

CountBoard 是一个基于Tkinter简单的,开源的桌面日程倒计时应用。 基本功能 置顶功能 是否使窗体一直保持在最上面。 简洁模式 简洁模式使窗体更加简洁。 此模式下不可调整大小,请提前在普通模式下调整大小。 设置功能 修改主窗体背景颜色,修改计时模式。 透明设置 调整窗体的透明度。 修改

gaoyongxian 130 Dec 01, 2022
Just some mtk tool for exploitation, reading/writing flash and doing crazy stuff

Just some mtk tool for exploitation, reading/writing flash and doing crazy stuff. For linux, a patched kernel is needed (see Setup folder) (except for read/write flash). For windows, you need to inst

Bjoern Kerler 1.1k Dec 31, 2022