ONT Analysis Toolkit (OAT)

Overview

Code style: black

                                       ,d
                                       88
              ,adPPYba,  ,adPPYYba, MM88MMM
              a8"     "8a ""     `Y8   88   
              8b       d8 ,adPPPPP88   88   
              "8a,   ,a8" 88,    ,88   88,  
              `"YbbdP"'  `"8bbdP"Y8   "Y888

ONT Analysis Toolkit (OAT)

A pipeline to facilitate sequencing of viral genomes (amplified with a tiled amplicon scheme) and assembly into consensus genomes. Supported viruses currently include Human betaherpesvirus 5 (CMV) and SARS-CoV-2. ONT sequencing data from a MinION can be monitored in real time with the rampart module. All analysis steps are handled by the analysis module, whereby sequencing data are analysed using a pipeline written in snakemake, with the choice of tools heavily influenced by the artic minion pipeline. Both steps can be run in order with a single command using the all module.

Author: Dr Charles Foster

Starting out

To begin with, clone this github repository:

git clone https://github.com/charlesfoster/ont-analysis-toolkit.git

cd ont-analysis-toolkit

Next, install most dependencies using conda:

conda env create -f environment.yml

Pro tip: if you install mamba, you can create the environment with that command instead of conda. A lot of conda headaches go away: it's a much faster drop-in replacement for conda.

conda install mamba
mamba env create -f environment.yml

Install the pipeline correctly by activating the conda environment and using the setup.py script with:

conda activate oat
pip install .

Other dependencies:

  • Demultiplexing is done using guppy_barcoder. The program will need to be installed and in your path.
  • If analysing SARS-CoV-2 data, lineages are typed using pangolin. Accordingly, pangolin needs to be installed according to instructions at https://github.com/cov-lineages/pangolin. The pipeline will fail if the pangolin environment can't be activated.
  • Variants are called using medaka and longshot. Ideally we could install these via mamba in the main environment.yml file, but there are sadly some necessary libraries for medaka that are incompatible with our main oat environment. Consequently, I have written the analysis pipeline so that snakemake automatically installs medaka and its dependencies into an isolated environment during execution of the oat pipeline. The environment is only created the first time you run the pipeline, but if you run the pipeline from a different working directory in the future, the environment will be created again. Solution: always run oat from the same working directory

tl;dr: you don't need to do anything for variant calling to work; just don't get confused during the initial pipeline run when the terminal indicates creation of a new conda environment. You should run oat from the same directory each time, otherwise a new conda environment will be created each time.

Usage

The environment with all necessary tools is installed as 'oat' for brevity. The environment should first be activated:

conda activate oat

Then, to run the pipeline, it's as simple as:

oat 
   

   

where should be replaced with the full path to a spreadsheet with minimal metadata for the ONT sequencing run (see example spreadsheet: run_data_example.csv). The most important thing to remember is that the 'run_name' in the spreadsheet must exactly match the name of the sequencing run, as set up in MinKNOW.

Note that there are many additional options/settings to take advantage of:

A pipeline for sequencing and analysis of viral genomes using an ONT MinION positional arguments: samples_file Path to file with sample metadata (.csv format). See example spreadsheet for minimum necessary information. optional arguments: -h, --help show this help message and exit -b , --barcode_kit Barcode kit that you used: '12' (SQK-RBK004) or '96' (SQK-RBK110-96) (default: 12) -c , --consensus_freq Variant allele frequency threshold for a variant to be incorporated into consensus genome. Variants below this frequency will be incorporated with an IUPAC ambiguity. Default: 0.8 -d, --demultiplex Demultiplex reads using guppy_barcoder. By default, assumes reads were already demultiplexed by MinKNOW. Reads are demultiplexed into the output directory. -f, --force Force overwriting of completed files in snakemake analysis (Default: files not overwritten) -n, --dry_run Dry run only -m rampart | analysis | all, --module rampart | analysis | all Pipeline module to run: 'rampart', 'analysis' or 'all' (rampart followed by analysis). Default: 'all' -o OUTDIR, --outdir OUTDIR Output directory. Default: /path/to/ont- analysis-toolkit/analysis_results + 'run_name' from samples spreadsheet --rampart_outdir RAMPART_OUTDIR Output directory. Default: /path/to/ont- analysis-toolkit/rampart_files -p, --print_dag Save directed acyclic graph (DAG) of workflow to outdir -r REFERENCE, --reference REFERENCE Reference genome to use: 'MN908947.3' (SARS-CoV-2), 'NC_006273.2' (CMV Merlin). Other references can be used, but the corresponding assembly (fasta) and annotation (gff3 from Ensembl) must be added to /home/cfos/miniconda3/envs/oat/lib/python3.9/site- packages/oat/references (Default: MN908947.3) -t , --threads Number of threads to use -v VARIANT_CALLER, --variant_caller VARIANT_CALLER Variant caller to use. Choices: 'medaka-longshot'. Default: 'medaka-longshot' --create_envs_only Create conda environments for snakemake analysis, but do no further analysis. Useful for initial pipeline setup. Default: False --snv_min SNV_MIN Minimum variant allele frequency for an SNV to be kept Default: 0.2 --delete_reads Delete demultiplexed reads after analysis --redo_analysis Delete entire analysis output directory and contents for a fresh run --version show program's version number and exit --minknow_data MINKNOW_DATA Location of MinKNOW data root. Default: /var/lib/minknow/data --max_memory Maximum memory (in MB) that you would like to provide to snakemake. Default: 53456MB --quiet Stop printing of snakemake commands to screen. --report Generate report (currently non-functional).">

                                       ,d
                                       88
              ,adPPYba,  ,adPPYYba, MM88MMM
              a8"     "8a ""     `Y8   88   
              8b       d8 ,adPPPPP88   88   
              "8a,   ,a8" 88,    ,88   88,  
              `"YbbdP"'  `"8bbdP"Y8   "Y888

        OAT: ONT Analysis Toolkit (version 0.2.0)

usage: oat [options] 
          
           

A pipeline for sequencing and analysis of viral genomes using an ONT MinION

positional arguments:
  samples_file          Path to file with sample metadata (.csv format). See
                        example spreadsheet for minimum necessary information.

optional arguments:
  -h, --help            show this help message and exit
  -b 
           
            , --barcode_kit 
            
             
                        Barcode kit that you used: '12' (SQK-RBK004) or '96'
                        (SQK-RBK110-96) (default: 12)
  -c 
             
              , --consensus_freq 
              
                Variant allele frequency threshold for a variant to be incorporated into consensus genome. Variants below this frequency will be incorporated with an IUPAC ambiguity. Default: 0.8 -d, --demultiplex Demultiplex reads using guppy_barcoder. By default, assumes reads were already demultiplexed by MinKNOW. Reads are demultiplexed into the output directory. -f, --force Force overwriting of completed files in snakemake analysis (Default: files not overwritten) -n, --dry_run Dry run only -m rampart | analysis | all, --module rampart | analysis | all Pipeline module to run: 'rampart', 'analysis' or 'all' (rampart followed by analysis). Default: 'all' -o OUTDIR, --outdir OUTDIR Output directory. Default: /path/to/ont- analysis-toolkit/analysis_results + 'run_name' from samples spreadsheet --rampart_outdir RAMPART_OUTDIR Output directory. Default: /path/to/ont- analysis-toolkit/rampart_files -p, --print_dag Save directed acyclic graph (DAG) of workflow to outdir -r REFERENCE, --reference REFERENCE Reference genome to use: 'MN908947.3' (SARS-CoV-2), 'NC_006273.2' (CMV Merlin). Other references can be used, but the corresponding assembly (fasta) and annotation (gff3 from Ensembl) must be added to /home/cfos/miniconda3/envs/oat/lib/python3.9/site- packages/oat/references (Default: MN908947.3) -t 
               
                , --threads 
                
                  Number of threads to use -v VARIANT_CALLER, --variant_caller VARIANT_CALLER Variant caller to use. Choices: 'medaka-longshot'. Default: 'medaka-longshot' --create_envs_only Create conda environments for snakemake analysis, but do no further analysis. Useful for initial pipeline setup. Default: False --snv_min SNV_MIN Minimum variant allele frequency for an SNV to be kept Default: 0.2 --delete_reads Delete demultiplexed reads after analysis --redo_analysis Delete entire analysis output directory and contents for a fresh run --version show program's version number and exit --minknow_data MINKNOW_DATA Location of MinKNOW data root. Default: /var/lib/minknow/data --max_memory 
                 
                   Maximum memory (in MB) that you would like to provide to snakemake. Default: 53456MB --quiet Stop printing of snakemake commands to screen. --report Generate report (currently non-functional). 
                 
                
               
              
             
            
           
          

What does the pipeline do?

RAMPART Module

All input files for RAMPART are generated based on your input spreadsheet, and a web browser is launched to view the sequencing in real time.

Analysis Module

Reads are mapped to the relevant reference genome with minimap2. Amplicon primers are trimmed using samtools ampliconclip. Variants are called using medaka and longshot, followed by filtering and consensus genome assembly using bcftools. The amino acid consequences of SNPs are inferred using bcftools csq. If analysing SARS-CoV-2, lineages are inferred using pangolin. Finally, a variety of sample QC metrics are combined into a final QC file.

Other Notes

Protocols

A protocol for the amplicon scheme needs (a) to be installed in the pipeline, and (b) named in the run_data.csv spreadsheet for analyses to work correctly. The pipeline comes with the Midnight protocol for SARS-CoV-2 pre-installed (https://www.protocols.io/view/sars-cov2-genome-sequencing-protocol-1200bp-amplic-bwyppfvn). Adding additional protocols is fairly easy:

  1. Make a directory called /path/to/ont-analysis-toolkit/oat/protocols/ARTICV3 (needs to be in all caps)

  2. Make a directory within ARTICV3 called 'rampart'

    (a) Put the normal rampart files within that directory (genome.json, primers.json, protocol.json, references.fasta)

  3. Make a directory within ARTICV3 called 'schemes'

    (a) Put the 'scheme.bed' file with primer coordinates in the 'schemes' directory

  4. Make sure you're in /path/to/ont-analysis-toolkit/, then activate the conda environment and use the following command: pip install .

Done!

Amino acid consequences

For the amino acid consequences step to work, a requirement is an annotation file for the chosen reference genome. The annotations must be in gff3 format, and must be in the 'Ensembl flavour' of gff3. There is a script included in the repository that can convert an NCBI gff3 file into an 'Ensembl flavour' gff3 file: /path/to/ont-analysis-toolkit/oat/scripts/gff2gff.py.

Credits

  • When this pipeline is used, citations should be found for the programs used internally.
  • The gff3 file I included for SARS-CoV-2 was originally sent to me by Torsten Seemann.
  • Being new to using snakemake + wrapper scripts, I used pangolin as a guide for directory structure and rule creation - so thanks to them.
  • The analysis module was heavily influenced by the ARTIC team, especially the artic minion pipeline.
  • gff2gff.py is based on work by Damien Farrell https://dmnfarrell.github.io/bioinformatics/bcftools-csq-gff-format
You might also like...
Red Team Toolkit is an Open-Source Django Offensive Web-App which is keeping the useful offensive tools used in the red-teaming together.
Red Team Toolkit is an Open-Source Django Offensive Web-App which is keeping the useful offensive tools used in the red-teaming together.

RedTeam Toolkit Note: Only legal activities should be conducted with this project. Red Team Toolkit is an Open-Source Django Offensive Web-App contain

A Static Analysis Tool for Detecting Security Vulnerabilities in Python Web Applications
A Static Analysis Tool for Detecting Security Vulnerabilities in Python Web Applications

This project is no longer maintained March 2020 Update: Please go see the amazing Pysa tutorial that should get you up to speed finding security vulne

SpiderFoot automates OSINT collection so that you can focus on analysis.
SpiderFoot automates OSINT collection so that you can focus on analysis.

SpiderFoot is an open source intelligence (OSINT) automation tool. It integrates with just about every data source available and utilises a range of m

Yuyu Scanner is a Web Reconnaissance & Web Analysis Scanner to find assets and information about targets.
Yuyu Scanner is a Web Reconnaissance & Web Analysis Scanner to find assets and information about targets.

Yuyu Scanner Yuyu Scanner is a Web Reconnaissance & Web Analysis Scanner to find assets and information about targets. installation ! run as root

ThePhish: an automated phishing email analysis tool
ThePhish: an automated phishing email analysis tool

ThePhish ThePhish is an automated phishing email analysis tool based on TheHive, Cortex and MISP. It is a web application written in Python 3 and base

Lazarus analysis tools and research report
Lazarus analysis tools and research report

Lazarus Research This repository publishes analysis reports and analysis tools for Operation Dream Job and Operation JTrack for Lazarus. Tools Python

IDA scripts for hypervisor (Hyper-v) analysis and reverse engineering automation
IDA scripts for hypervisor (Hyper-v) analysis and reverse engineering automation

Re-Scripts IA32-VMX-Helper (IDA-Script) IA32-MSR-Decoder (IDA-Script) IA32 VMX Helper It's an IDA script (Updated IA32 MSR Decoder) which helps you to

Android Malware (Analysis | Scoring) System
Android Malware (Analysis | Scoring) System

An Obfuscation-Neglect Android Malware Scoring System Quark-Engine is also bundled with Kali Linux, BlackArch. A trust-worthy, practical tool that's r

Malware arcane - Scripts and notes on my malware analysis journey

Malware Arcane Repository of notes and scripts I use when doing malware analysis

Releases(v0.10.1)
  • v0.10.1(Apr 27, 2022)

    While the 'oat' pipeline has undergone continuous development since its inception, this is the first designated release (well, pre-release). The pipeline should be fully functional, but please let me know if you encounter any errors or bugs.

    The pipeline has the most incorporated features for SARS-CoV-2, but works well with other viruses if you install them properly as per the instructions in the README.md file.

    Source code(tar.gz)
    Source code(zip)
xray多线程批量扫描工具

Auto_xray xray多线程批量扫描工具 简介 xray社区版貌似没有批量扫描,这就让安服仔使用起来很不方便,扫站得一个个手动添加,非常难受 Auto_xray目录下记得放xray,就跟平时一样的。 选项1:oneforall+xray 输入一个主域名,自动采集子域名然后添加到xray任务列表

1frame 13 Nov 09, 2022
Simple Python 3 script to detect the "Log4j" Java library vulnerability (CVE-2021-44228) for a list of URL with multithreading

log4j-detect Simple Python 3 script to detect the "Log4j" Java library vulnerability (CVE-2021-44228) for a list of URL with multithreading The script

Víctor García 187 Jan 03, 2023
Docker Compose based system for running remote browsers (including Flash and Java support) connected to web archives

pywb Remote Browsers This repository provides a simple configuration for deploying any pywb with remote browsers provided by OWT/Shepherd Remote Brows

Webrecorder 10 Jul 28, 2022
ProxyLogon Pre-Auth SSRF To Arbitrary File Write

ProxyLogon Pre-Auth SSRF To Arbitrary File Write For Education and Research Usage: C:\python proxylogon.py mail.evil.corp lulz 117 Nov 28, 2022

IDA loader for Apple's iBoot, SecureROM and AVPBooter

IDA iBoot Loader IDA loader for Apple's iBoot, SecureROM and AVPBooter Installation Copy iboot-loader.py to the loaders folder in IDA directory. Credi

matteyeux 74 Dec 23, 2022
Python library to prevent XSS(cross site scripting attach) by removing harmful content from data.

A tool for removing malicious content from input data before saving data into database. It takes input containing HTML with XSS scripts and returns va

2 Jul 05, 2022
CVE-2021-40346 integer overflow enables http smuggling

CVE-2021-40346-POC CVE-2021-40346 integer overflow enables http smuggling Reference: https://jfrog.com/blog/critical-vulnerability-in-haproxy-cve-2021

donky16 34 Nov 15, 2022
Deobfuscate Log4Shell payloads with ease

Ox4Shell Deobfuscate Log4Shell payloads with ease. Description Since the release

Oxeye 137 Jan 02, 2023
A tool to extract the IdP cert from vCenter backups and log in as Administrator

vCenter SAML Login Tool A tool to extract the Identity Provider (IdP) cert from vCenter backups and log in as Administrator Background Commonly, durin

Horizon 3 AI Inc 343 Dec 31, 2022
Hashpic - Hashpic creates an image from a MD5 or SHA512 hash

Hashpic Hashpic creates an image from the MD5 hash of your input. Since v0.2.0 i

0xflotus 15 Nov 23, 2022
LaxrFar Python Obfuscator

LaxrFar Python Obfuscator Usage First do the things from "Upload to Webserver" o

LaxrFar 5 Jul 19, 2022
Search Shodan for Minecraft server IPs to grief

GriefBuddy This script searches Shodan for Minecraft server IPs to grief. This will return all servers connected to the public internet which Shodan h

26 Dec 29, 2022
Tool to check if your DNS comply to Polish Ministry of Finance gambling domains restrictions

dns-mf-hazard Tool to check if your DNS comply to Polish Ministry of Finance gambling domains restrictions How to use it? Installation You need python

Marek Wajdzik 2 Jan 01, 2022
Scanning for CVE-2021-44228

Filesystem log4j_scanner for windows and Unix. Scanning for CVE-2021-44228, CVE-2021-45046, CVE-2019-17571 Requires a minimum of Python 2.7. Can be ex

Brett England 4 Jan 09, 2022
The Web Application Firewall Paranoia Level Test Tool.

Quick WAF "paranoid" Doctor Evaluation WAFPARAN01D3 The Web Application Firewall Paranoia Level Test Tool. — From alt3kx.github.io Introduction to Par

22 Jul 25, 2022
WhPhisher: a Phishing tool With Python

WhPhisher Herramienta para hacer phishing con muchos métodos de túneling -----Como Instalarlo------- pkg install python3 pkg install git git clone htt

WhBeatZ 80 Jan 02, 2023
GRR Rapid Response: remote live forensics for incident response

GRR Rapid Response is an incident response framework focused on remote live forensics. Build Type Status Tests End-to-end Tests Windows Templates Linu

Google 4.3k Jan 05, 2023
Magicspoofing - A python3 script for search possible misconfiguration in a DNS related to security protections of email service from the domain name

A python3 script for search possible misconfiguration in a DNS related to security protections of email service from the domain name. This project is for educational use, we are not responsible for i

20 Dec 02, 2022
Operational information regarding the vulnerability in the Log4j logging library.

Log4j Vulnerability (CVE-2021-44228) This repo contains operational information regarding the vulnerability in the Log4j logging library (CVE-2021-442

Nationaal Cyber Security Centrum (NCSC-NL) 1.9k Dec 26, 2022