minimizer-space de Bruijn graphs (mdBG) for whole genome assembly

Overview

rust-mdbg: Minimizer-space de Bruijn graphs (mdBG) for whole-genome assembly

rust-mdbg is an ultra-fast minimizer-space de Bruijn graph (mdBG) implementation, geared towards the assembly of long and accurate reads such as PacBio HiFi.

Rationale

rust-mdbg performs mdBG construction of a 52x human genome HiFi data in around 10 minutes on 8 threads, with 10GB of maximum RAM usage.

rust-mdbg is fast because it operates in minimizer-space, meaning that the reads, the assembly graph, and the final assembly, are all represented as ordered lists of minimizers, instead of strings of nucleotides. A conversion step then yields a classical base-space representation.

Limitations

However, this high speed comes at a cost! :)

  • rust-mdbg gives good-quality results but still of lower contiguity and completeness than state-of-the-art assemblers such as HiCanu and hifiasm.
  • rust-mdbg performs best with at least 40x to 50x of coverage.
  • No polishing step is implemented; so, assemblies will have around the same accuracy as the reads.

Installation

Clone the repository (make sure you have a working Rust environment), and run

cargo build --release

For performing graph simplifications, gfatools is required.

Quick start

cargo build --release
target/release/rust-mdbg reads-0.00.fa.gz -k 7 --density 0.0008 -l 10 --minabund 2 --prefix example
utils/magic_simplify example

Multi-k assembly

For better contiguity, try the provided multi-k assembly script. It performs assembly iteratively, starting with k= 10, up to an automatically-determined largest k. This comes at the expense of ~7x longer running time.

utils/multik <reads.fq.gz> <some_output_prefix> <nb_threads>

Overview

rust-mdbg is a modular assembler. It consists of three components:

  1. rust-mdbg, to perform assembly in minimizer-space
  2. gfatools (external component), to perform graph simplifications
  3. to_basespace, to convert a minimizer-space assembly to base-space

For convenience, components 2 and 3 are wrapped into a script called magic_simplify.

Input

rust-mdbg takes a single FASTA/FASTQ input (gzip-compressed or not). Multi-line sequences, and sequences with lowercase characters, are not supported.

If you have seqtk installed, you can use

seqtk seq -A reads.unformatted.fq > reads.fa

to format reads accordingly.

Output data

The output of rust-mdbg consists of:

  • A .gfa file containing the minimizer-space de Bruijn graph, without sequences,
  • Several .sequences files containing the sequences of the nodes of the graph.

The executable to_basespace allows to combine both outputs and produce a .gfa file, with sequences.

Running an example

A sample set of reads is provided in the example/ folder. Run

target/release/rust-mdbg reads-0.00.fa.gz -k 7 --density 0.0008 -l 10 --minabund 2 --prefix example

which will create an example.gfa file.

In order to populate the .gfa file with base-space sequences and perform graph simplification, run

utils/magic_simplify example

which will create example.msimpl.gfa and example.msimpl.fa files.

Parameters

The main parameters of rust-mdbg are the k-min-mer value k, the minimizer length l, and the minimizer density d (delta in the paper). Another parameter is --presimp, set by default to 0.01, which performs a graph simplification: a neighbor node is deleted if its abundance is below 1% that of min(max(abundance of other neighbors), abundance of current node). For better results, and also without the need to set any parameter, try the multi-k strategy (see Multi-k assembly section). This section explains how parameters are set in single-k assembly.

All three parameters k, l, and d significantly impact the quality of results. One can think of them as a generalization of the k parameter in classical de Bruijn graphs. When you run rust-mdbg without specifying parameters, it sets them to:

d = 0.003

l = 12

k = 0.75 * average_readlen * d

These parameters will give reasonable, but far from optimal, draft assemblies. We experimentally found that the best results are often obtained with k values within 20-40, l within 10-14, and d within 0.001-0.005. Setting k and d such that the ratio k/d is slightly below the read length appears to be an effective strategy.

For further information on usage and parameters, run

target/release/rust-mdbg -h

for a one-line summary of each flag, or run

target/release/rust-mdbg --help

for a lengthy explanation of each flag.

Performance

Dataset Genome size (HPC) Coverage
Parameters
N50 Runtime Memory
D. melanogaster HiFi 98Mbp 100x auto
multi-k
k=35,l=12,d=0.002
2.5Mbp
2.5Mbp
6.0Mbp
2m15s
15m
1m9s
2.5GB
1.8GB
1.5GB
Strawberry HiFi 0.7Gbp 36x auto
multi-k
k=38,l=14,d=0.003
0.5Mbp
1Mbp
0.7Mbp
6m12s
40m
5m31s
12GB
11GB
10GB
H. sapiens (HG002) HiFi 2.2Gbp 52x auto
multi-k
k=21,l=14,d=0.003
1.0Mbp
16.9Mbp
13.9Mbp
27m30s
3h15m
10m23s
16.9GB
20GB
10.1GB

Runtime breakdown:
H. sapiens: 10m23s = 6m51s rust-mdbg + 1m48s gfatools + 1m44s to_basespace

The runs with custom parameters (from the paper) were made with commit b99d938, and unlike in the paper, we did not use robust minimizers which requires additional l-mer counting beforehand. For historical reasons, reads and assemblies were homopolymer-compressed in those experiments and the homopolymer-compressed genome size is reported. So beware that these numbers are not directly comparable to the output of other assemblers. In addition to the parameters shown in the table, the rust-mdbg command line also contained --bf --no-error-correct --threads 8.

Running rust-mdbg without graph simplifications

To convert an assembly to base-space without performing any graph simplifications, there are two ways:

  • with gfatools
gfatools asm -u  example.gfa > example.unitigs.gfa
target/release/to_basespace --gfa example.unitigs.gfa --sequences example.sequences
  • without gfatools (slower, but the code is more straightforward to understand)

utils/complete_gfa.py example.sequences example.gfa

In both cases, this will create an example.complete.gfa file that you can convert to FASTA with

bash utils/gfa2fasta.sh example.complete

License

rust-mdbg is freely available under the MIT License.

Developers

  • Barış Ekim, supervised by Bonnie Berger at the Computer Science and Artificial Intelligence Laboratory (CSAIL) at Massachusetts Institute of Technology (MIT)
  • Rayan Chikhi at the Department of Computational Biology at Institut Pasteur

Citation

Minimizer-space de Bruijn graphs (2021) BiorXiv

@article {mdbg,
	author = {Ekim, Bar{\i}{\c s} and Berger, Bonnie and Chikhi, Rayan},
	title = {Minimizer-space de Bruijn graphs},
	year = {2021},
	doi = {10.1101/2021.06.09.447586},
	publisher = {Cold Spring Harbor Laboratory},
	journal = {bioRxiv}
}

Contact

Should you have any inquiries, please contact Barış Ekim at baris [at] mit [dot] edu, or Rayan Chikhi at rchikhi [at] pasteur [dot] fr.

Comments
  • m1 arm support

    m1 arm support

    Hello rust-mdbg team,

    It seems that there is no support for ARM structure yet, I have the following error when compiling on ARM64:

    The following warnings were emitted during compilation:

    warning: cc: error: unrecognized command-line option '-msse4.2' warning: cc: error: unrecognized command-line option '-maes' warning: cc: error: unrecognized command-line option '-mavx' warning: cc: error: unrecognized command-line option '-mavx2'

    error: failed to run custom build command for fasthash-sys v0.3.2

    Caused by: process didn't exit successfully: /Users/jianshuzhao/Github/rust-mdbg/target/release/build/fasthash-sys-17495dcf061597dc/build-script-build (signal: 6, SIGABRT: process abort signal) --- stdout TARGET = Some("aarch64-apple-darwin") OPT_LEVEL = Some("3") TARGET = Some("aarch64-apple-darwin") HOST = Some("aarch64-apple-darwin") TARGET = Some("aarch64-apple-darwin") TARGET = Some("aarch64-apple-darwin") HOST = Some("aarch64-apple-darwin") CC_aarch64-apple-darwin = None CC_aarch64_apple_darwin = None HOST_CC = None CC = None HOST = Some("aarch64-apple-darwin") TARGET = Some("aarch64-apple-darwin") HOST = Some("aarch64-apple-darwin") CFLAGS_aarch64-apple-darwin = None CFLAGS_aarch64_apple_darwin = None HOST_CFLAGS = None CFLAGS = None DEBUG = Some("false") running: "cc" "-O3" "-ffunction-sections" "-fdata-sections" "-fPIC" "-Wno-implicit-fallthrough" "-Wno-unknown-attributes" "-msse4.2" "-maes" "-mavx" "-mavx2" "-DT1HA0_RUNTIME_SELECT=1" "-DT1HA0_AESNI_AVAILABLE=1" "-Wall" "-Wextra" "-o" "/Users/jianshuzhao/Github/rust-mdbg/target/release/build/fasthash-sys-d509c7de4ba60bc4/out/src/fasthash.o" "-c" "src/fasthash.cpp" cargo:warning=cc: error: unrecognized command-line option '-msse4.2' cargo:warning=cc: error: unrecognized command-line option '-maes' cargo:warning=cc: error: unrecognized command-line option '-mavx' cargo:warning=cc: error: unrecognized command-line option '-mavx2' exit status: 1

    --- stderr thread 'main' panicked at '

    Internal error occurred: Command "cc" "-O3" "-ffunction-sections" "-fdata-sections" "-fPIC" "-Wno-implicit-fallthrough" "-Wno-unknown-attributes" "-msse4.2" "-maes" "-mavx" "-mavx2" "-DT1HA0_RUNTIME_SELECT=1" "-DT1HA0_AESNI_AVAILABLE=1" "-Wall" "-Wextra" "-o" "/Users/jianshuzhao/Github/rust-mdbg/target/release/build/fasthash-sys-d509c7de4ba60bc4/out/src/fasthash.o" "-c" "src/fasthash.cpp" with args "cc" did not execute successfully (status code exit status: 1).

    ', /Users/jianshuzhao/.cargo/registry/src/github.com-1ecc6299db9ec823/gcc-0.3.55/src/lib.rs:1672:5 note: run with RUST_BACKTRACE=1 environment variable to display a backtrace fatal runtime error: failed to initiate panic, error 5 warning: build failed, waiting for other jobs to finish... error: build failed

    Any possibilities to provide support?

    Thanks,

    Jianshu

    opened by jianshu93 6
  • Recommended parameters for metagenome assembly and a related question

    Recommended parameters for metagenome assembly and a related question

    Hi,

    I want to try mdBG on real metagenome samples. I wonder if you could suggest a parameter combo to use (or combos to try out). And should I do the multi-k mode?

    For the real samples, I could crudely guess the number of species in the library, and perhaps an exaggerated total genome size from it as well. I'm not sure if these could be useful.

    Another question is: could mdBG output contig coverage estimates?

    Thank you!

    question 
    opened by xfengnefx 5
  • Unable to assemble the D.melanogaster genome from 24kb HiFi reads

    Unable to assemble the D.melanogaster genome from 24kb HiFi reads

    thread 'main' panicked at 'called Result::unwrap() on an Err value: Error { kind: BufferLimit }', src/main.rs:187:33

    Reads taken from: https://sra-downloadb.be-md.ncbi.nlm.nih.gov/sos2/sra-pub-run-17/SRR1023860/SRR10238607.1

    Command: utils/multik <reads> <output prefix> 56

    Happens both with and without homopolymer compression.

    bug 
    opened by sebschmi 5
  • multik executes run with k < l

    multik executes run with k < l

    When assembling E.coli with the multik script, it runs mdbg with k = 10 and l = 12, resulting in mdbg panicking with "Non-ACGTN nucleotide encountered!"

    The multik script then continues silently.

    output
    thread '<unnamed>' panicked at 'Non-ACGTN nucleotide encountered!', /home/sebschmi/.cargo/registry/src/github.com-1ecc6299db9ec823/nthash-0.5.0/src/lib.rs:43:9
    stack backtrace:
       0: std::panicking::begin_panic
       1: <nthash::NtHashIterator as core::iter::traits::iterator::Iterator>::next
       2: rust_mdbg::read::Read::extract
       3: rust_mdbg::main::{{closure}}
       4: rust_mdbg::main::{{closure}}
       5: <F as scoped_threadpool::FnBox>::call_box
    note: Some details are omitted, run with `RUST_BACKTRACE=full` for a verbose backtrace.
    thread '<unnamed>' panicked at 'called `Result::unwrap()` on an `Err` value: SendError { .. }', /home/sebschmi/.cargo/registry/src/github.com-1ecc6299db9ec823/scoped_threadpool-0.1.9/src/lib.rs:213:73
    stack backtrace:
       0: rust_begin_unwind
                 at /rustc/db9d1b20bba1968c1ec1fc49616d4742c1725b4b/library/std/src/panicking.rs:498:5
       1: core::panicking::panic_fmt
                 at /rustc/db9d1b20bba1968c1ec1fc49616d4742c1725b4b/library/core/src/panicking.rs:107:14
       2: core::result::unwrap_failed
                 at /rustc/db9d1b20bba1968c1ec1fc49616d4742c1725b4b/library/core/src/result.rs:1613:5
       3: scoped_threadpool::Pool::scoped
       4: core::ops::function::FnOnce::call_once{{vtable.shim}}
    note: Some details are omitted, run with `RUST_BACKTRACE=full` for a verbose backtrace.
    thread '<unnamed>' panicked at 'called `Result::unwrap()` on an `Err` value: SendError { .. }', /home/sebschmi/.cargo/registry/src/github.com-1ecc6299db9ec823/scoped_threadpool-0.1.9/src/lib.rs:219:72
    stack backtrace:
       0:     0x55e87265b2ec - std::backtrace_rs::backtrace::libunwind::trace::h09f7e4e089375279
                                   at /rustc/db9d1b20bba1968c1ec1fc49616d4742c1725b4b/library/std/src/../../backtrace/src/backtrace/libunwind.rs:93:5
       1:     0x55e87265b2ec - std::backtrace_rs::backtrace::trace_unsynchronized::h1ec96f1c7087094e
                                   at /rustc/db9d1b20bba1968c1ec1fc49616d4742c1725b4b/library/std/src/../../backtrace/src/backtrace/mod.rs:66:5
       2:     0x55e87265b2ec - std::sys_common::backtrace::_print_fmt::h317b71fc9a5cf964
                                   at /rustc/db9d1b20bba1968c1ec1fc49616d4742c1725b4b/library/std/src/sys_common/backtrace.rs:67:5
       3:     0x55e87265b2ec - <std::sys_common::backtrace::_print::DisplayBacktrace as core::fmt::Display>::fmt::he3555b48e7dfe7f0
                                   at /rustc/db9d1b20bba1968c1ec1fc49616d4742c1725b4b/library/std/src/sys_common/backtrace.rs:46:22
       4:     0x55e87267d4fc - core::fmt::write::h513b07ca38f4fb1b
                                   at /rustc/db9d1b20bba1968c1ec1fc49616d4742c1725b4b/library/core/src/fmt/mod.rs:1149:17
       5:     0x55e872657995 - std::io::Write::write_fmt::haf8c932b52111354
                                   at /rustc/db9d1b20bba1968c1ec1fc49616d4742c1725b4b/library/std/src/io/mod.rs:1697:15
       6:     0x55e87265cec0 - std::sys_common::backtrace::_print::h195c38364780a303
                                   at /rustc/db9d1b20bba1968c1ec1fc49616d4742c1725b4b/library/std/src/sys_common/backtrace.rs:49:5
       7:     0x55e87265cec0 - std::sys_common::backtrace::print::hc09dfdea923b6730
                                   at /rustc/db9d1b20bba1968c1ec1fc49616d4742c1725b4b/library/std/src/sys_common/backtrace.rs:36:9
       8:     0x55e87265cec0 - std::panicking::default_hook::{{closure}}::hb2e38ec0d91046a3
                                   at /rustc/db9d1b20bba1968c1ec1fc49616d4742c1725b4b/library/std/src/panicking.rs:211:50
       9:     0x55e87265ca75 - std::panicking::default_hook::h60284635b0ad54a8
                                   at /rustc/db9d1b20bba1968c1ec1fc49616d4742c1725b4b/library/std/src/panicking.rs:228:9
      10:     0x55e87265d574 - std::panicking::rust_panic_with_hook::ha677a669fb275654
                                   at /rustc/db9d1b20bba1968c1ec1fc49616d4742c1725b4b/library/std/src/panicking.rs:606:17
      11:     0x55e87265d050 - std::panicking::begin_panic_handler::{{closure}}::h976246fb95d93c31
                                   at /rustc/db9d1b20bba1968c1ec1fc49616d4742c1725b4b/library/std/src/panicking.rs:502:13
      12:     0x55e87265b794 - std::sys_common::backtrace::__rust_end_short_backtrace::h38077ee5b7b9f99a
                                   at /rustc/db9d1b20bba1968c1ec1fc49616d4742c1725b4b/library/std/src/sys_common/backtrace.rs:139:18
      13:     0x55e87265cfb9 - rust_begin_unwind
                                   at /rustc/db9d1b20bba1968c1ec1fc49616d4742c1725b4b/library/std/src/panicking.rs:498:5
      14:     0x55e872545651 - core::panicking::panic_fmt::h35f3a62252ba0fd2
                                   at /rustc/db9d1b20bba1968c1ec1fc49616d4742c1725b4b/library/core/src/panicking.rs:107:14
      15:     0x55e872545743 - core::result::unwrap_failed::hb53671404b9e33c2
                                   at /rustc/db9d1b20bba1968c1ec1fc49616d4742c1725b4b/library/core/src/result.rs:1613:5
      16:     0x55e8725e8f9f - scoped_threadpool::Scope::join_all::hcb532061605ab1b0
      17:     0x55e87255ee33 - scoped_threadpool::Pool::scoped::hb64980f16173dad1
      18:     0x55e87255b128 - core::ops::function::FnOnce::call_once{{vtable.shim}}::hf2fa39940289df70
      19:     0x55e87255ee5e - std::sys_common::backtrace::__rust_begin_short_backtrace::h6bd664fd6d7bb829
      20:     0x55e87259a883 - core::ops::function::FnOnce::call_once{{vtable.shim}}::ha5de8d6fee3bff3e
      21:     0x55e872660893 - <alloc::boxed::Box<F,A> as core::ops::function::FnOnce<Args>>::call_once::hcbc6d2d80772be64
                                   at /rustc/db9d1b20bba1968c1ec1fc49616d4742c1725b4b/library/alloc/src/boxed.rs:1694:9
      22:     0x55e872660893 - <alloc::boxed::Box<F,A> as core::ops::function::FnOnce<Args>>::call_once::h9bffa2ca65a1d6e6
                                   at /rustc/db9d1b20bba1968c1ec1fc49616d4742c1725b4b/library/alloc/src/boxed.rs:1694:9
      23:     0x55e872660893 - std::sys::unix::thread::Thread::new::thread_start::ha678a8b0caec8f55
                                   at /rustc/db9d1b20bba1968c1ec1fc49616d4742c1725b4b/library/std/src/sys/unix/thread.rs:106:17
      24:     0x7f16121a96db - start_thread
                                   at /build/glibc-S9d2JN/glibc-2.27/nptl/pthread_create.c:463
      25:     0x7f161193071f - __GI___clone
                                   at /build/glibc-S9d2JN/glibc-2.27/misc/../sysdeps/unix/sysv/linux/x86_64/clone.S:95
      26:                0x0 - <unknown>
    thread panicked while panicking. aborting.
    Command terminated by signal 4
    625.22user 29.59system 1:35.45elapsed 685%CPU (0avgtext+0avgdata 579660maxresident)k
    
    bug 
    opened by sebschmi 5
  • Missing assembly-final.msimpl.fa in multik mode

    Missing assembly-final.msimpl.fa in multik mode

    Hello,

    Thank you for this tool.

    I ran mdbg with the following command line: multik reads.fastq.gz assembly 56 10 1000

    I get the files assembly-k*.gfa, assembly-k*.msimpl.gfa and assembly-k*.msimpl.fa with k from 10 to 1000, but I do not get the final output assembly-final.msimpl.fa.

    bug 
    opened by nadegeguiglielmoni 5
  • example.sequences file

    example.sequences file

    Sorry if I'm being slow but when I create the gfa file multiple .sequences files are created but when then in the readme to_basespace takes only a single example.sequences file. Where does this come from? Do you combine the .sequences files in some way or..?

    Thanks!

    question 
    opened by samlipworth 4
  • KSizeOutOfRange errors during rust-mdbg run

    KSizeOutOfRange errors during rust-mdbg run

    Hi there,

    Trying out multik with some Nanopore metagenomics reads (seqtk-formatted) and I'm currently getting the errors below as it iteratively goes through the different -k values. Any ideas on what might be going wrong and how I might fix it?

    So far, the run hasn't fully aborted and I'm letting it run until I get some output - will let y

    $ ../rust-mdbg/utils/multik sup.fastq.gz std_sipp 10
    avg readlen: 6147875, max k: 17521
    assembly with k=10
        Finished release [optimized] target(s) in 0.09s
         Running `sup.fastq.gz -k 10 -l 12 --density 0.003 --minabund 2 --threads 10 --prefix std_sipp-k10 --bf`
    thread '<unnamed>' panicked at 'called `Result::unwrap()` on an `Err` value: KSizeOutOfRange { ksize: 12, seq_size: 4 }', src/read.rs:148:63
    note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace
    thread '<unnamed>' panicked at 'called `Result::unwrap()` on an `Err` value: KSizeOutOfRange { ksize: 12, seq_size: 1 }', src/read.rs:148:63
    thread '<unnamed>' panicked at 'called `Result::unwrap()` on an `Err` value: KSizeOutOfRange { ksize: 12, seq_size: 11 }', src/read.rs:148:63
    thread '<unnamed>' panicked at 'called `Result::unwrap()` on an `Err` value: KSizeOutOfRange { ksize: 12, seq_size: 4 }', src/read.rs:148:63
    thread '<unnamed>' panicked at 'called `Result::unwrap()` on an `Err` value: KSizeOutOfRange { ksize: 12, seq_size: 1 }', src/read.rs:148:63
    thread '<unnamed>' panicked at 'called `Result::unwrap()` on an `Err` value: KSizeOutOfRange { ksize: 12, seq_size: 5 }', src/read.rs:148:63
    thread '<unnamed>' panicked at 'called `Result::unwrap()` on an `Err` value: KSizeOutOfRange { ksize: 12, seq_size: 3 }', src/read.rs:148:63
    thread '<unnamed>' panicked at 'called `Result::unwrap()` on an `Err` value: KSizeOutOfRange { ksize: 12, seq_size: 1 }', src/read.rs:148:63
    thread '<unnamed>' panicked at 'called `Result::unwrap()` on an `Err` value: KSizeOutOfRange { ksize: 12, seq_size: 10 }', src/read.rs:148:63
    thread '<unnamed>' panicked at 'called `Result::unwrap()` on an `Err` value: KSizeOutOfRange { ksize: 12, seq_size: 10 }', src/read.rs:148:63
    thread '<unnamed>' panicked at 'called `Result::unwrap()` on an `Err` value: "SendError(..)"', /home/andre/.cargo/registry/src/github.com-1ecc6299db9ec823/scoped_threadpool-0.1.9/src/lib.rs:213:73
    thread '<unnamed>' panicked at 'called `Result::unwrap()` on an `Err` value: "SendError(..)"', /home/andre/.cargo/registry/src/github.com-1ecc6299db9ec823/scoped_threadpool-0.1.9/src/lib.rs:219:72
    stack backtrace:
       0:     0x5591d2438050 - std::backtrace_rs::backtrace::libunwind::trace::h63b7a90188ab5fb3
                                   at /rustc/9bc8c42bb2f19e745a63f3445f1ac248fb015e53/library/std/src/../../backtrace/src/backtrace/libunwind.rs:90:5
       1:     0x5591d2438050 - std::backtrace_rs::backtrace::trace_unsynchronized::h80aefbf9b851eca7
                                   at /rustc/9bc8c42bb2f19e745a63f3445f1ac248fb015e53/library/std/src/../../backtrace/src/backtrace/mod.rs:66:5
       2:     0x5591d2438050 - std::sys_common::backtrace::_print_fmt::hbef05ae4237a4d72
                                   at /rustc/9bc8c42bb2f19e745a63f3445f1ac248fb015e53/library/std/src/sys_common/backtrace.rs:67:5
       3:     0x5591d2438050 - <std::sys_common::backtrace::_print::DisplayBacktrace as core::fmt::Display>::fmt::h28abce2fdb9884c2
                                   at /rustc/9bc8c42bb2f19e745a63f3445f1ac248fb015e53/library/std/src/sys_common/backtrace.rs:46:22
       4:     0x5591d245670f - core::fmt::write::h3b84512577ca38a8
                                   at /rustc/9bc8c42bb2f19e745a63f3445f1ac248fb015e53/library/core/src/fmt/mod.rs:1092:17
       5:     0x5591d24352b2 - std::io::Write::write_fmt::h465f8feea02e2aa1
                                   at /rustc/9bc8c42bb2f19e745a63f3445f1ac248fb015e53/library/std/src/io/mod.rs:1572:15
       6:     0x5591d243a185 - std::sys_common::backtrace::_print::h525280ee0d29bdde
                                   at /rustc/9bc8c42bb2f19e745a63f3445f1ac248fb015e53/library/std/src/sys_common/backtrace.rs:49:5
       7:     0x5591d243a185 - std::sys_common::backtrace::print::h1f0f5b9f3ef8fb78
                                   at /rustc/9bc8c42bb2f19e745a63f3445f1ac248fb015e53/library/std/src/sys_common/backtrace.rs:36:9
       8:     0x5591d243a185 - std::panicking::default_hook::{{closure}}::ha5838f6faa4a5a8f
                                   at /rustc/9bc8c42bb2f19e745a63f3445f1ac248fb015e53/library/std/src/panicking.rs:208:50
       9:     0x5591d2439c33 - std::panicking::default_hook::hfb9fe98acb0dcb3b
                                   at /rustc/9bc8c42bb2f19e745a63f3445f1ac248fb015e53/library/std/src/panicking.rs:225:9
      10:     0x5591d243a78d - std::panicking::rust_panic_with_hook::hb89f5f19036e6af8
                                   at /rustc/9bc8c42bb2f19e745a63f3445f1ac248fb015e53/library/std/src/panicking.rs:591:17
      11:     0x5591d243a327 - std::panicking::begin_panic_handler::{{closure}}::h119e7951427f41da
                                   at /rustc/9bc8c42bb2f19e745a63f3445f1ac248fb015e53/library/std/src/panicking.rs:497:13
      12:     0x5591d243850c - std::sys_common::backtrace::__rust_end_short_backtrace::hce386c44bf47a128
                                   at /rustc/9bc8c42bb2f19e745a63f3445f1ac248fb015e53/library/std/src/sys_common/backtrace.rs:141:18
      13:     0x5591d243a289 - rust_begin_unwind
                                   at /rustc/9bc8c42bb2f19e745a63f3445f1ac248fb015e53/library/std/src/panicking.rs:493:5
      14:     0x5591d2323341 - core::panicking::panic_fmt::h2242888e8769cd33
                                   at /rustc/9bc8c42bb2f19e745a63f3445f1ac248fb015e53/library/core/src/panicking.rs:92:14
      15:     0x5591d2323233 - core::option::expect_none_failed::hb1edf11f73e63728
                                   at /rustc/9bc8c42bb2f19e745a63f3445f1ac248fb015e53/library/core/src/option.rs:1329:5
      16:     0x5591d23c108f - scoped_threadpool::Scope::join_all::hd6132fc8a04c2f8d
      17:     0x5591d233fcbb - core::ops::function::FnOnce::call_once{{vtable.shim}}::h198262ef865dc7ad
      18:     0x5591d2391412 - std::sys_common::backtrace::__rust_begin_short_backtrace::he7799c2fe1d42088
      19:     0x5591d234f443 - core::ops::function::FnOnce::call_once{{vtable.shim}}::hc12e7712db099355
      20:     0x5591d243d28a - <alloc::boxed::Box<F,A> as core::ops::function::FnOnce<Args>>::call_once::hc444a77f8dd8d825
                                   at /rustc/9bc8c42bb2f19e745a63f3445f1ac248fb015e53/library/alloc/src/boxed.rs:1546:9
      21:     0x5591d243d28a - <alloc::boxed::Box<F,A> as core::ops::function::FnOnce<Args>>::call_once::h8b68a0a9a2093dfc
                                   at /rustc/9bc8c42bb2f19e745a63f3445f1ac248fb015e53/library/alloc/src/boxed.rs:1546:9
      22:     0x5591d243d28a - std::sys::unix::thread::Thread::new::thread_start::hb95464447f61f48d
                                   at /rustc/9bc8c42bb2f19e745a63f3445f1ac248fb015e53/library/std/src/sys/unix/thread.rs:71:17
      23:     0x7f00c05ae6ba - start_thread
      24:     0x7f00bfdd751d - clone
      25:                0x0 - <unknown>
    thread panicked while panicking. aborting.
    Command terminated by signal 4
    537.76user 35.81system 3:28.70elapsed 274%CPU (0avgtext+0avgdata 855256maxresident)k
    0inputs+1781280outputs (0major+10608356minor)pagefaults 0swaps
    + /usr/bin/time /home/andre/gfatools/gfatools asm std_sipp-k10.gfa -t 10,50000 -t 10,50000 -b 100000 -b 100000 -t 10,50000 -b 100000 -b 100000 -b 100000 -t 10,50000 -b 100000 -t 10,50000 -b 1000000 -t 10,150000 -b 1000000 -u
    ERROR: failed to read the graph
    Command exited with non-zero status 2
    0.00user 0.00system 0:00.00elapsed ?%CPU (0avgtext+0avgdata 1572maxresident)k
    0inputs+0outputs (0major+68minor)pagefaults 0swaps
    + python /home/andre/rust-mdbg/utils/gfa_break_loops.py std_sipp-k10.tmp1.gfa
    + [[ ! std_sipp-k10 == *--old-behavior* ]]
    + cargo run --manifest-path /home/andre/rust-mdbg/utils/../Cargo.toml --release --bin to_basespace -- --gfa std_sipp-k10.tmp2.gfa --sequences std_sipp-k10
        Finished release [optimized] target(s) in 0.07s
         Running `/home/andre/rust-mdbg/target/release/to_basespace --gfa std_sipp-k10.tmp2.gfa --sequences std_sipp-k10`
    + mv std_sipp-k10.tmp2.gfa.complete.gfa std_sipp-k10.tmp2.gfa
    + /usr/bin/time /home/andre/gfatools/gfatools asm std_sipp-k10.tmp2.gfa -t 10,50000 -b 100000 -t 10,100000 -b 1000000 -t 10,150000 -b 1000000 -u
    [M::main] Version: 0.4-r214-dirty
    [M::main] CMD: /home/andre/gfatools/gfatools asm -t 10,50000 -b 100000 -t 10,100000 -b 1000000 -t 10,150000 -b 1000000 -u std_sipp-k10.tmp2.gfa
    [M::main] Real time: 0.000 sec; CPU: 0.000 sec
    0.00user 0.00system 0:00.00elapsed 0%CPU (0avgtext+0avgdata 1752maxresident)k
    0inputs+0outputs (0major+74minor)pagefaults 0swaps
    ++ stat -c%s std_sipp-k10.tmp2.gfa
    + filesize=9
    + ((  filesize > 100000000 ))
    + mv std_sipp-k10.tmp3.gfa std_sipp-k10.msimpl.gfa
    + [[ std_sipp-k10 != *\-\-\k\e\e\p* ]]
    + rm -rf std_sipp-k10.tmp1.gfa std_sipp-k10.tmp2.gfa
    + bash /home/andre/rust-mdbg/utils/gfa2fasta.sh std_sipp-k10.msimpl
    2.19user 0.28system 0:02.50elapsed 99%CPU (0avgtext+0avgdata 24008maxresident)k
    0inputs+8outputs (0major+9763minor)pagefaults 0swaps
    
    bug 
    opened by GeoMicroSoares 4
  • Nanopore metagenome assembly parameters

    Nanopore metagenome assembly parameters

    Hi there,

    Congratulations, this tool seems amazing and I can't wait to use it with my data! Are there specific parameters that I can use/optimize with rust-mdbg to assemble Nanopore metagenomes?

    Thanks.

    enhancement 
    opened by GeoMicroSoares 4
  • Problems in ruinning `rust-mdbg` without graph simplifications

    Problems in ruinning `rust-mdbg` without graph simplifications

    Hi,

    I've installed rust-mdbg

    git clone --recursive https://github.com/ekimb/rust-mdbg.git
    cd rust-mdbg
    cargo build --release
    

    I've run it

    ~/git/rust-mdbg/target/release/rust-mdbg ~/git/rust-mdbg/example/reads-0.00.fa.gz -k 7 --threads 1 --density 0.0008 -l 10 --minabund 2 --prefix example
    ls example*
    
    example.140646999713344.sequences  example.gfa
    

    and finally tried both approaches to go in base-space

    gfatools asm -u  example.gfa > example.unitigs.gfa
    ~/git/rust-mdbg/target/release/to_basespace --gfa example.unitigs.gfa --sequences example.sequences
    
    [M::main] Version: 0.5-r250-dirty
    [M::main] CMD: gfatools asm -u example.gfa
    [M::main] Real time: 0.001 sec; CPU: 0.003 sec
    Done parsing unitigs GFA, got 1 unitigs.
    Done parsing original GFA, with 0 k-min-mers.
    Done parsing .sequences file, recorded 0 sequences.
    thread 'main' panicked at 'called `Option::unwrap()` on a `None` value', src/to_basespace.rs:258:55
    note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace
    

    and

    python3 ~/git/rust-mdbg/utils/complete_gfa.py example.*.sequences example.gfa
    
    Traceback (most recent call last):
      File "/home/guarracino/git/rust-mdbg/utils/complete_gfa.py", line 32, in <module>
        source_minims = node_minims[spl[1]]
    KeyError: '7'
    

    Am I doing silly errors somewhere?

    opened by AndreaGuarracino 3
  • Differences between uncompressed & compressed fastq files

    Differences between uncompressed & compressed fastq files

    This seems like a bug, but maybe I'm just misunderstanding something with how mdbg works.

    I discovered this after trying to run several human assemblies of varying input coverage (20x,30x,40x,50x) starting from hifi_reads.fq.gz files.

    The contiguity (n50) of all of the assemblies was in the same ballpark as the read n50 and there appeared to be no benefit to increased coverage. This coupled with the poor results in general had me scratching my head so I tried a different test dataset that was an uncompressed hifi_reads.fq and I got a great assembly.

    Curiosity piqued, I went back an unzipped the 20x coverage point I had tried earlier and got a much better assembly.

    See attached logs for logs from both the 20x assemblies starting from both hifi_reads.fq and hifi_reads.fq.gz

    Is this an actual bug, or is it just user error?

    hifi_reads_gzipped.log hifi_reads.log

    bug 
    opened by gconcepcion 3
  • magic_simplify crashes while running in Docker container on HPC cluster

    magic_simplify crashes while running in Docker container on HPC cluster

    Hey! I'm currently trying to run rust-mdbg as part of a fungi genome assembly pipeline using nextflow and docker containers on an HPC cluster and I'm running into these issues where the magic_simplify script crashes with os error 30 : read-only file system. I already checked the docker container and made sure the rust-mdbg dir is not read-only so I'm not sure what exactly is happening here. Maybe someone knows whats up? I'm using singularity to run the docker containers on the HPC cluster I just hope this is not some compatibility issue with singularity/rust..

    command.log

    bug 
    opened by fischer-hub 3
Releases(v1.0.1)
Owner
Barış Ekim
PhD student in Berger Group at @mit.
Barış Ekim
ML-Decoder: Scalable and Versatile Classification Head

ML-Decoder: Scalable and Versatile Classification Head Paper Official PyTorch Implementation Tal Ridnik, Gilad Sharir, Avi Ben-Cohen, Emanuel Ben-Baru

189 Jan 04, 2023
WTTE-RNN a framework for churn and time to event prediction

WTTE-RNN Weibull Time To Event Recurrent Neural Network A less hacky machine-learning framework for churn- and time to event prediction. Forecasting p

Egil Martinsson 727 Dec 28, 2022
Solutions and questions for AoC2021. Merry christmas!

Advent of Code 2021 Merry christmas! 🎄 🎅 To get solutions and approximate execution times for implementations, please execute the run.py script in t

Wilhelm Ågren 5 Dec 29, 2022
Adabelief-Optimizer - Repository for NeurIPS 2020 Spotlight "AdaBelief Optimizer: Adapting stepsizes by the belief in observed gradients"

AdaBelief Optimizer NeurIPS 2020 Spotlight, trains fast as Adam, generalizes well as SGD, and is stable to train GANs. Release of package We have rele

Juntang Zhuang 998 Dec 29, 2022
Code for the upcoming CVPR 2021 paper

The Temporal Opportunist: Self-Supervised Multi-Frame Monocular Depth Jamie Watson, Oisin Mac Aodha, Victor Prisacariu, Gabriel J. Brostow and Michael

Niantic Labs 496 Dec 30, 2022
Corruption Invariant Learning for Re-identification

Corruption Invariant Learning for Re-identification The official repository for Benchmarks for Corruption Invariant Person Re-identification (NeurIPS

Minghui Chen 73 Dec 08, 2022
QICK: Quantum Instrumentation Control Kit

QICK: Quantum Instrumentation Control Kit The QICK is a kit of firmware and software to use the Xilinx RFSoC to control quantum systems. It consists o

81 Dec 15, 2022
A Temporal Extension Library for PyTorch Geometric

Documentation | External Resources | Datasets PyTorch Geometric Temporal is a temporal (dynamic) extension library for PyTorch Geometric. The library

Benedek Rozemberczki 1.9k Jan 07, 2023
🔎 Monitor deep learning model training and hardware usage from your mobile phone 📱

Monitor deep learning model training and hardware usage from mobile. 🔥 Features Monitor running experiments from mobile phone (or laptop) Monitor har

labml.ai 1.2k Dec 25, 2022
Time Series Cross-Validation -- an extension for scikit-learn

TSCV: Time Series Cross-Validation This repository is a scikit-learn extension for time series cross-validation. It introduces gaps between the traini

Wenjie Zheng 222 Jan 01, 2023
A fast, distributed, high performance gradient boosting (GBT, GBDT, GBRT, GBM or MART) framework based on decision tree algorithms, used for ranking, classification and many other machine learning tasks.

Light Gradient Boosting Machine LightGBM is a gradient boosting framework that uses tree based learning algorithms. It is designed to be distributed a

Microsoft 14.5k Jan 08, 2023
Official implementation of "Dynamic Anchor Learning for Arbitrary-Oriented Object Detection" (AAAI2021).

DAL This project hosts the official implementation for our AAAI 2021 paper: Dynamic Anchor Learning for Arbitrary-Oriented Object Detection [arxiv] [c

ming71 215 Nov 28, 2022
Official PyTorch implementation of "Contrastive Learning from Extremely Augmented Skeleton Sequences for Self-supervised Action Recognition" in AAAI2022.

AimCLR This is an official PyTorch implementation of "Contrastive Learning from Extremely Augmented Skeleton Sequences for Self-supervised Action Reco

Gty 44 Dec 17, 2022
Prototype python implementation of the ome-ngff table spec

Prototype python implementation of the ome-ngff table spec

Kevin Yamauchi 8 Nov 20, 2022
ExCon: Explanation-driven Supervised Contrastive Learning

ExCon: Explanation-driven Supervised Contrastive Learning Link to the paper: https://arxiv.org/pdf/2111.14271.pdf Contributors of this repo: Zhibo Zha

Zhibo (Darren) Zhang 18 Nov 01, 2022
Codes for SIGIR'22 Paper 'On-Device Next-Item Recommendation with Self-Supervised Knowledge Distillation'

OD-Rec Codes for SIGIR'22 Paper 'On-Device Next-Item Recommendation with Self-Supervised Knowledge Distillation' Paper, saved teacher models and Andro

Xin Xia 11 Nov 22, 2022
Human motion synthesis using Unity3D

Human motion synthesis using Unity3D Prerequisite: Software: amc2bvh.exe, Unity 2017, Blender. Unity: RockVR (Video Capture), scenes, character models

Hao Xu 9 Jun 01, 2022
Video2x - A lossless video/GIF/image upscaler achieved with waifu2x, Anime4K, SRMD and RealSR.

Official Discussion Group (Telegram): https://t.me/video2x A Discord server is also available. Please note that most developers are only on Telegram.

K4YT3X 5.9k Dec 31, 2022
EdMIPS: Rethinking Differentiable Search for Mixed-Precision Neural Networks

EdMIPS is an efficient algorithm to search the optimal mixed-precision neural network directly without proxy task on ImageNet given computation budgets. It can be applied to many popular network arch

Zhaowei Cai 47 Dec 30, 2022
[CVPR 2020] 3D Photography using Context-aware Layered Depth Inpainting

[CVPR 2020] 3D Photography using Context-aware Layered Depth Inpainting [Paper] [Project Website] [Google Colab] We propose a method for converting a

Virginia Tech Vision and Learning Lab 6.2k Jan 01, 2023