HiFi DeepVariant + WhatsHap workflow

Workflow steps

align HiFi reads to reference with pbmm2
call small variants with DeepVariant, using a two-pass method (DeepVariant ➡️ WhatsHap phase ➡️ WhatsHap haplotag ➡️ DeepVariant)
phase small variants with WhatsHap
haplotag aligned BAMs with WhatsHap and merge
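
The order of the steps above can be sketched as a list of command lines. This is only an illustration under stated assumptions, not the workflow's actual Snakemake rules: the `two_pass_commands` helper and every output file name are hypothetical, `run_deepvariant` stands in for however DeepVariant is invoked in your environment, and index steps plus the final merge across input files are left out. The real rules may use different options.

```python
from typing import List


def two_pass_commands(ref: str, reads: str, sample: str) -> List[List[str]]:
    """Build (but do not run) the commands for one input file, in order.

    All output names are hypothetical; VCF/BAM indexing and the merge of
    haplotagged BAMs across several input files are not shown.
    """
    aligned = f"{sample}.aligned.bam"
    vcf1 = f"{sample}.pass1.vcf.gz"
    phased1 = f"{sample}.pass1.phased.vcf.gz"
    tagged1 = f"{sample}.pass1.haplotagged.bam"
    vcf2 = f"{sample}.pass2.vcf.gz"
    phased2 = f"{sample}.phased.vcf.gz"
    tagged2 = f"{sample}.haplotagged.bam"

    def deepvariant(bam: str, vcf: str) -> List[str]:
        return ["run_deepvariant", "--model_type=PACBIO",
                f"--ref={ref}", f"--reads={bam}", f"--output_vcf={vcf}"]

    def phase(vcf: str, bam: str, out: str) -> List[str]:
        return ["whatshap", "phase", "--reference", ref, "--output", out, vcf, bam]

    def haplotag(vcf: str, bam: str, out: str) -> List[str]:
        return ["whatshap", "haplotag", "--reference", ref, "--output", out, vcf, bam]

    return [
        # 1. align HiFi reads to the reference
        ["pbmm2", "align", "--preset", "CCS", "--sort", ref, reads, aligned],
        # 2. two-pass calling: DeepVariant, phase, haplotag, DeepVariant again
        deepvariant(aligned, vcf1),
        phase(vcf1, aligned, phased1),
        haplotag(phased1, aligned, tagged1),
        deepvariant(tagged1, vcf2),
        # 3. phase the final small variant calls
        phase(vcf2, aligned, phased2),
        # 4. haplotag the aligned BAM with the final phased calls
        haplotag(phased2, aligned, tagged2),
    ]


if __name__ == "__main__":
    for cmd in two_pass_commands("reference.fasta", "movie.ccs.bam", "HG002"):
        print(" ".join(cmd))
```

Returning the commands as lists keeps the sketch side-effect free, so the ordering can be inspected without any of the tools installed.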

Directory structure within basedir

.
├── cluster_logs  # slurm stderr/stdout logs
├── reference
│   ├── reference.chr_lengths.txt  # cut -f1,2 reference.fasta.fai > reference.chr_lengths.txt
│   ├── reference.fasta
│   └── reference.fasta.fai
├── samples
│   └── <sample_id>  # sample_id regex: r'[A-Za-z0-9_-]+'
│       ├── whatshap/  # phased small variants; merged haplotagged alignments
│       ├── logs/  # per-rule stdout/stderr logs
│       ├── aligned/  # intermediate
│       ├── deepvariant/  # intermediate
│       ├── deepvariant_intermediate/  # intermediate
│       └── whatshap_intermediate/  # intermediate
├── smrtcells
│   ├── done  # move folders from smrtcells/ready to smrtcells/done to prevent re-processing
│   └── ready
│       └── <sample_id>  # uBAMs or FASTQs per sample
│                        # filename regex: r'(m\d{5}[Ue]?_\d{6}_\d{6}).(ccs|hifi_reads).bam' or r'(m\d{5}[Ue]?_\d{6}_\d{6}).fastq.gz'
└── workflow  # clone of this repo
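
The naming rules noted in the tree above can be checked before submitting a job. The following is a minimal sketch, not code from the workflow: the function names are hypothetical, the dots in the filename patterns are escaped so they match a literal `.`, and whole-string matching with `fullmatch` is an assumption about how the workflow applies these patterns.

```python
import re
from typing import Optional

# Sample ID pattern as given in the directory tree.
SAMPLE_ID_RE = re.compile(r"[A-Za-z0-9_-]+")

# Filename patterns from the directory tree; dots are escaped here
# (an assumption, stricter than the patterns as written in the tree).
BAM_RE = re.compile(r"(m\d{5}[Ue]?_\d{6}_\d{6})\.(ccs|hifi_reads)\.bam")
FASTQ_RE = re.compile(r"(m\d{5}[Ue]?_\d{6}_\d{6})\.fastq\.gz")


def is_valid_sample_id(sample_id: str) -> bool:
    """Return True if the whole string matches the sample_id pattern."""
    return SAMPLE_ID_RE.fullmatch(sample_id) is not None


def movie_name(filename: str) -> Optional[str]:
    """Return the movie prefix of a uBAM or FASTQ filename, or None if it does not match."""
    for pattern in (BAM_RE, FASTQ_RE):
        match = pattern.fullmatch(filename)
        if match:
            return match.group(1)
    return None
```

For example, `movie_name("m64012_190920_173625.ccs.bam")` returns `"m64012_190920_173625"`, while a file named `reads.bam` returns `None` and would need to be renamed before it can match.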

To run the pipeline

$ conda create \
    --channel bioconda \
    --channel conda-forge \
    --prefix ./conda_env \
    python=3 snakemake mamba lockfile

$ conda activate ./conda_env

$ sbatch workflow/run_snakemake.sh <sample_id>
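
Once a sample has been processed, its folder should be moved from smrtcells/ready to smrtcells/done to prevent re-processing, as noted in the directory structure. A small helper for that move might look like the sketch below; `mark_sample_done` is a hypothetical name and is not part of the workflow, and it assumes it is called with the basedir path.

```python
import shutil
from pathlib import Path


def mark_sample_done(basedir: str, sample_id: str) -> Path:
    """Move smrtcells/ready/<sample_id> to smrtcells/done/<sample_id>.

    Raises instead of overwriting if the sample is missing from ready/
    or already present in done/.
    """
    ready = Path(basedir) / "smrtcells" / "ready" / sample_id
    done_root = Path(basedir) / "smrtcells" / "done"
    target = done_root / sample_id

    if not ready.is_dir():
        raise FileNotFoundError(f"no such sample folder: {ready}")
    if target.exists():
        raise FileExistsError(f"already marked done: {target}")

    done_root.mkdir(parents=True, exist_ok=True)
    shutil.move(str(ready), str(target))
    return target
```

Refusing to overwrite an existing folder in done/ is a deliberate choice in this sketch, so that input files from an earlier run are never silently replaced.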

Owner

William Rowell
