A modified version of DeepMind's AlphaFold2 that separates the CPU part (MSA and template searching) from the GPU part (prediction model)

Overview

ParallelFold

Author: Bozitao Zhong

This is a modified version of DeepMind's AlphaFold2 that separates the CPU part (MSA and template searching) from the GPU part (prediction model) of a local AlphaFold2 installation.

How to install

First, install AlphaFold2 locally using one of the following methods:

  • Use the official version from DeepMind with Docker.
  • Use one of the other versions that install AlphaFold without Docker.
  • Use my guide, which is based on the non-docker version and can be adapted to different CUDA versions (CUDA driver >= 10.1).

Then put the following 4 files in your AlphaFold folder (the folder that contains the original run_alphafold.py). I use a run_alphafold.sh script to run AlphaFold easily (adapted from the non-docker version).

4 files:

  • run_alphafold.py: modified version of the original run_alphafold.py; it skips the featurization step when feature.pkl already exists in the output folder
  • run_alphafold.sh: bash script to run run_alphafold.py
  • run_feature.py: modified version of the original run_alphafold.py; it exits the Python process after it finishes writing feature.pkl
  • run_feature.sh: bash script to run run_feature.py
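The skip-if-exists behaviour of the modified run_alphafold.py can be pictured with the minimal sketch below. It is an illustration, not the repository's code: `load_or_compute_features` and the `compute_features` callback are hypothetical names, and it assumes feature.pkl is a pickled dict stored directly in the job's output folder.

```python
import os
import pickle
from typing import Any, Callable, Dict


def load_or_compute_features(output_dir: str,
                             compute_features: Callable[[], Dict[str, Any]]) -> Dict[str, Any]:
    """Reuse output_dir/feature.pkl if it exists; otherwise compute and cache it.

    Hypothetical helper illustrating the skip-if-exists idea.
    """
    pkl_path = os.path.join(output_dir, "feature.pkl")
    if os.path.exists(pkl_path):
        # Features were already written (e.g. by a CPU-only job): skip featurization.
        with open(pkl_path, "rb") as f:
            return pickle.load(f)
    # No cached features: run the expensive MSA/template search pipeline.
    features = compute_features()
    os.makedirs(output_dir, exist_ok=True)
    with open(pkl_path, "wb") as f:
        pickle.dump(features, f, protocol=4)
    return features
```

In the two-stage workflow, the CPU job would take the compute-and-write branch and the later GPU job the read branch.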

How to run

First, run run_feature.sh on CPUs:

./run_feature.sh -d data -o output -m model_1 -f input/test3.fasta -t 2021-07-27

8 CPUs are enough: in my tests, more CPUs did not improve speed.

A GPU can accelerate the hhblits step (but you probably chose this repo because GPUs are expensive).

The featurization step writes feature.pkl and the MSA folder to your output folder: ./output/JOBNAME/
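Before submitting the GPU job, it can help to check what the CPU stage produced. The sketch below loads feature.pkl and reports each feature's shape; it assumes the file is a pickled dict mapping feature names to arrays, and `summarize_features` is a hypothetical helper name. Unpickling needs NumPy importable if the file stores NumPy arrays.

```python
import pickle
from typing import Dict, Tuple


def summarize_features(pkl_path: str) -> Dict[str, Tuple[int, ...]]:
    """Return {feature_name: shape} for a pickled feature dict (hypothetical helper)."""
    with open(pkl_path, "rb") as f:
        features = pickle.load(f)
    summary = {}
    for name, value in sorted(features.items()):
        # Array-like values expose .shape; scalars and other objects get ().
        summary[name] = tuple(getattr(value, "shape", ()))
    return summary
```

For example, `summarize_features("./output/JOBNAME/feature.pkl")` returns the shape of every stored feature.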

PS: I keep my input files in an input folder to organize them; this is optional.

Second, run run_alphafold.sh on a GPU:

./run_alphafold.sh -d data -o output -m model_1,model_2,model_3,model_4,model_5 -f input/test.fasta -t 2021-07-27

If feature.pkl was generated successfully, the featurization step reuses it, so this stage is very fast.
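The two stages above can be driven from one script. This sketch only assembles the two command lines shown above (same -d/-o/-m/-f/-t flags) and runs them in order; the function names are hypothetical and not part of ParallelFold.

```python
import subprocess
from typing import List, Sequence


def build_command(script: str, data_dir: str, output_dir: str,
                  models: Sequence[str], fasta: str, max_template_date: str) -> List[str]:
    """Assemble argv for run_feature.sh / run_alphafold.sh using the README's flags."""
    return [script, "-d", data_dir, "-o", output_dir,
            "-m", ",".join(models), "-f", fasta, "-t", max_template_date]


def stage_commands(data_dir: str, output_dir: str, fasta: str,
                   models: Sequence[str], max_template_date: str) -> List[List[str]]:
    """Return [cpu_stage, gpu_stage] commands for one FASTA file."""
    # Stage 1 (CPU): as in the README example, run_feature.sh gets a single model name.
    cpu = build_command("./run_feature.sh", data_dir, output_dir,
                        models[:1], fasta, max_template_date)
    # Stage 2 (GPU): run_alphafold.sh reuses feature.pkl and runs every requested model.
    gpu = build_command("./run_alphafold.sh", data_dir, output_dir,
                        models, fasta, max_template_date)
    return [cpu, gpu]


def run_stages(commands: List[List[str]]) -> None:
    """Run the stages in order; check=True stops before the GPU stage if featurization fails."""
    for cmd in commands:
        # Passing a list (no shell) keeps each argument intact, even paths with spaces.
        subprocess.run(cmd, check=True)
```

On a Slurm cluster, the two commands would go into separate CPU and GPU jobs rather than being run back to back on one node.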

I have also uploaded my scripts for the SJTU HPC (using Slurm): sub_alphafold.slurm and sub_feature.slurm.

Other Files

In the ./Alphafold folder, I modified some Python files (hhblits.py, hmmsearch.py, jackhmmer.py) to give these steps more CPUs. However, testing showed that these processes do not get faster with more CPUs, probably because DeepMind runs them as wrapped subprocesses. I'm trying to improve this (work in progress).
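In AlphaFold, the CPU count reaches the search tools only as a command-line flag of the subprocess (`--cpu` for jackhmmer and `-cpu` for hhblits, as visible in the logs quoted in the issues below). The sketch below rebuilds the jackhmmer invocation seen in those logs; `jackhmmer_command` and its `n_cpu` parameter are hypothetical and not the signature of the wrappers in ./Alphafold.

```python
from typing import List


def jackhmmer_command(binary: str, fasta: str, database: str,
                      sto_out: str, n_cpu: int = 8, n_iter: int = 1) -> List[str]:
    """Hypothetical builder mirroring the jackhmmer call seen in the AlphaFold logs."""
    if n_cpu < 1:
        raise ValueError("n_cpu must be >= 1")
    return [
        binary,
        "-o", "/dev/null",          # discard the human-readable report
        "-A", sto_out,              # save the alignment of hits (Stockholm format)
        "--noali",                  # omit alignments from the (discarded) report
        "--F1", "0.0005", "--F2", "5e-05", "--F3", "5e-07",  # prefilter thresholds
        "--incE", "0.0001", "-E", "0.0001",                  # E-value cutoffs
        "--cpu", str(n_cpu),        # number of worker threads
        "-N", str(n_iter),          # number of search iterations
        fasta, database,
    ]
```

Raising `n_cpu` only changes the thread count passed to the binary; whether that speeds up the search depends on the tool itself, which matches the observation above that more CPUs did not help.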

If you have any questions, please open an issue.

Comments
  • Still having problems after running the script


    Hello, Doctor! After running the scripts, the CPU part runs normally, but the GPU part fails for both short sequences (200+ aa) and long sequences (1800+ aa). My script is:

        #!/bin/bash
        module load anaconda/2020.11
        source activate /data/home/zhoujy/run/alphafold2
        ./run_feature.sh -d /data/public/alphafold2 -o /data/home/zhoujy/run/output -m model_1 -f /data/home/zhoujy/run/input/Q9NYP9.fasta -t 2021-07-27
        ./run_alphafold.sh -d /data/public/alphafold2 -o /data/home/zhoujy/run/output -m model_1,model_2,model_3,model_4,model_5 -f /data/home/zhoujy/run/input/Q9NYP9.fasta -t 2021-07-27

    I submitted it with 1 GPU card.

    The error output is:

    87 I0927 17:05:14.162350 139818804778816 xla_bridge.py:226] Unable to initialize backend 'tpu': Invalid argument: TpuPlatform is not available.
    88 I0927 17:05:23.883118 139818804778816 run_alphafold.py:272] Have 5 models: ['model_1', 'model_2', 'model_3', 'model_4', 'model_5']
    89 I0927 17:05:23.883379 139818804778816 run_alphafold.py:285] Using random seed 491376288278862761 for the data pipeline
    90 I0927 17:05:23.892619 139818804778816 run_alphafold.py:151] Running model model_1
    91 I0927 17:05:34.480318 139818804778816 model.py:131] Running predict with shape(feat) = {'aatype': (4, 233), 'residue_index': (4, 233), 'seq_length': (4,), 'template_aatype': (4, 4, 233), 'template_all_atom_masks': (4, 4, 233, 37), 'template_all_atom_positions': (4, 4, 233, 37, 3), 'template_sum_probs': (4, 4, 1), 'is_distillation': (4,), 'seq_mask': (4, 233), 'msa_mask': (4, 508, 233), 'msa_row_mask': (4, 508), 'random_crop_to_size_seed': (4, 2), 'template_mask': (4, 4), 'template_pseudo_beta': (4, 4, 233, 3), 'template_pseudo_beta_mask': (4, 4, 233), 'atom14_atom_exists': (4, 233, 14), 'residx_atom14_to_atom37': (4, 233, 14), 'residx_atom37_to_atom14': (4, 233, 37), 'atom37_atom_exists': (4, 233, 37), 'extra_msa': (4, 5120, 233), 'extra_msa_mask': (4, 5120, 233), 'extra_msa_row_mask': (4, 5120), 'bert_mask': (4, 508, 233), 'true_msa': (4, 508, 233), 'extra_has_deletion': (4, 5120, 233), 'extra_deletion_value': (4, 5120, 233), 'msa_feat': (4, 508, 233, 49), 'target_feat': (4, 233, 22)}
    92 2021-09-27 17:05:35.143686: W external/org_tensorflow/tensorflow/stream_executor/gpu/asm_compiler.cc:81] Couldn't get ptxas version string: Internal: Running ptxas --version returned 32512
    93 2021-09-27 17:05:35.324896: F external/org_tensorflow/tensorflow/compiler/xla/service/gpu/nvptx_compiler.cc:479] ptxas returned an error during compilation of ptx to sass: 'Internal: ptxas exited with non-zero error code 32512, output: ' If the error message indicates that a file could not be written, please verify that sufficient filesystem space is provided.
    94 Fatal Python error: Aborted
    95
    96 Thread 0x00007f2a1a311740 (most recent call first):
    97 File "/data/home/zhoujy/run/alphafold2/lib/python3.8/site-packages/jax/interpreters/xla.py", line 387 in backend_compile
    98 File "/data/home/zhoujy/run/alphafold2/lib/python3.8/site-packages/jax/interpreters/xla.py", line 324 in xla_primitive_callable
    99 File "/data/home/zhoujy/run/alphafold2/lib/python3.8/site-packages/jax/_src/util.py", line 188 in cached
    100 File "/data/home/zhoujy/run/alphafold2/lib/python3.8/site-packages/jax/_src/util.py", line 195 in wrapper
    101 File "/data/home/zhoujy/run/alphafold2/lib/python3.8/site-packages/jax/interpreters/xla.py", line 275 in apply_primitive
    102 File "/data/home/zhoujy/run/alphafold2/lib/python3.8/site-packages/jax/core.py", line 612 in process_primitive
    103 File "/data/home/zhoujy/run/alphafold2/lib/python3.8/site-packages/jax/core.py", line 267 in bind
    104 File "/data/home/zhoujy/run/alphafold2/lib/python3.8/site-packages/jax/_src/lax/lax.py", line 388 in shift_right_logical
    105 File "/data/home/zhoujy/run/alphafold2/lib/python3.8/site-packages/jax/_src/prng.py", line 229 in threefry_seed
    106 File "/data/home/zhoujy/run/alphafold2/lib/python3.8/site-packages/jax/_src/prng.py", line 191 in seed_with_impl
    107 File "/data/home/zhoujy/run/alphafold2/lib/python3.8/site-packages/jax/_src/random.py", line 105 in PRNGKey
    108 File "/data/run01/zhoujy/ParallelFold-main/alphafold/model/model.py", line 133 in predict
    109 File "/data/run01/zhoujy/ParallelFold-main/run_alphafold.py", line 158 in predict_structure
    110 File "/data/run01/zhoujy/ParallelFold-main/run_alphafold.py", line 289 in main
    111 File "/data/home/zhoujy/.local/lib/python3.8/site-packages/absl/app.py", line 258 in _run_main
    112 File "/data/home/zhoujy/.local/lib/python3.8/site-packages/absl/app.py", line 312 in run
    113 File "/data/run01/zhoujy/ParallelFold-main/run_alphafold.py", line 316 in

    The error in lines 92-113 appears for every sequence, short or long. What is causing it? @Zuricho

    opened by zhoujingyu13687306871 16
  • Where Can I find The Protein sequence?


    After reading the article "AlphaFold Deployment and Optimization on HPC Platform", I want to run some experiments following it, but I cannot find the protein sequences online. Can you tell me how to download the FASTA files used in the article?

    opened by yanchenmochen 4
  • How to run GPU part?


    How do I run model inference (the GPU part of the process) after the featurization step? Does the inference step automatically find feature.pkl in some folder?

    opened by hrzolix 4
  • How to accelerate the HHBLITS step with GPU


    Hello! Thanks for your great work! I have some questions:

    Q1: Do you know how to accelerate the HHblits step with a GPU?

    Q2: I use --cpu 8 to run jackhmmer, but it always uses just 2 CPUs and I don't know why.

    opened by Licko0909 4
  • 2022-01-11 09:19:03.536275: F external/org_tensorflow/tensorflow/compiler/xla/service/gpu/nvptx_compiler.cc:479] ptxas returned an error during compilation of ptx to sass: 'Internal: ptxas exited with non-zero error code 32512, output: '  If the error message indicates that a file could not be written, please verify that sufficient filesystem space is provided. Fatal Python error: Aborted


    2022-01-11 09:19:02.638037: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1402] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 28422 MB memory) -> physical GPU (device: 0, name: Tesla V100-SXM2-32GB, pci bus id: 0000:41:00.0, compute capability: 7.0)
    I0111 09:19:03.171788 47078973446272 model.py:165] Running predict with shape(feat) = {'aatype': (4, 45), 'residue_index': (4, 45), 'seq_length': (4,), 'template_aatype': (4, 4, 45), 'template_all_atom_masks': (4, 4, 45, 37), 'template_all_atom_positions': (4, 4, 45, 37, 3), 'template_sum_probs': (4, 4, 1), 'is_distillation': (4,), 'seq_mask': (4, 45), 'msa_mask': (4, 508, 45), 'msa_row_mask': (4, 508), 'random_crop_to_size_seed': (4, 2), 'template_mask': (4, 4), 'template_pseudo_beta': (4, 4, 45, 3), 'template_pseudo_beta_mask': (4, 4, 45), 'atom14_atom_exists': (4, 45, 14), 'residx_atom14_to_atom37': (4, 45, 14), 'residx_atom37_to_atom14': (4, 45, 37), 'atom37_atom_exists': (4, 45, 37), 'extra_msa': (4, 5120, 45), 'extra_msa_mask': (4, 5120, 45), 'extra_msa_row_mask': (4, 5120), 'bert_mask': (4, 508, 45), 'true_msa': (4, 508, 45), 'extra_has_deletion': (4, 5120, 45), 'extra_deletion_value': (4, 5120, 45), 'msa_feat': (4, 508, 45, 49), 'target_feat': (4, 45, 22)}
    2022-01-11 09:19:03.503247: W external/org_tensorflow/tensorflow/stream_executor/gpu/asm_compiler.cc:81] Couldn't get ptxas version string: Internal: Running ptxas --version returned 32512
    2022-01-11 09:19:03.536275: F external/org_tensorflow/tensorflow/compiler/xla/service/gpu/nvptx_compiler.cc:479] ptxas returned an error during compilation of ptx to sass: 'Internal: ptxas exited with non-zero error code 32512, output: ' If the error message indicates that a file could not be written, please verify that sufficient filesystem space is provided.
    Fatal Python error: Aborted

    Thread 0x00002ad16d7d1880 (most recent call first):
      File "/public/software/.local/easybuild/software/Anaconda3/2020.02/envs/alphafold/lib/python3.8/site-packages/jax/interpreters/xla.py", line 360 in backend_compile
      File "/public/software/.local/easybuild/software/Anaconda3/2020.02/envs/alphafold/lib/python3.8/site-packages/jax/interpreters/xla.py", line 297 in xla_primitive_callable
      File "/public/software/.local/easybuild/software/Anaconda3/2020.02/envs/alphafold/lib/python3.8/site-packages/jax/_src/util.py", line 179 in cached
      File "/public/software/.local/easybuild/software/Anaconda3/2020.02/envs/alphafold/lib/python3.8/site-packages/jax/_src/util.py", line 186 in wrapper
      File "/public/software/.local/easybuild/software/Anaconda3/2020.02/envs/alphafold/lib/python3.8/site-packages/jax/interpreters/xla.py", line 248 in apply_primitive
      File "/public/software/.local/easybuild/software/Anaconda3/2020.02/envs/alphafold/lib/python3.8/site-packages/jax/core.py", line 603 in process_primitive
      File "/public/software/.local/easybuild/software/Anaconda3/2020.02/envs/alphafold/lib/python3.8/site-packages/jax/core.py", line 264 in bind
      File "/public/software/.local/easybuild/software/Anaconda3/2020.02/envs/alphafold/lib/python3.8/site-packages/jax/_src/lax/lax.py", line 382 in shift_right_logical
      File "/public/software/.local/easybuild/software/Anaconda3/2020.02/envs/alphafold/lib/python3.8/site-packages/jax/_src/random.py", line 75 in PRNGKey
      File "/public/software/.local/easybuild/software/ParallelFold/ParallelFold/alphafold/model/model.py", line 167 in predict
      File "/public/software/.local/easybuild/software/ParallelFold/ParallelFold/run_alphafold.py", line 210 in predict_structure
      File "/public/software/.local/easybuild/software/ParallelFold/ParallelFold/run_alphafold.py", line 429 in main
      File "/public/software/.local/easybuild/software/Anaconda3/2020.02/envs/alphafold/lib/python3.8/site-packages/absl/app.py", line 258 in _run_main
      File "/public/software/.local/easybuild/software/Anaconda3/2020.02/envs/alphafold/lib/python3.8/site-packages/absl/app.py", line 312 in run
      File "/public/software/.local/easybuild/software/ParallelFold/ParallelFold/run_alphafold.py", line 455 in
    ./run_alphafold.sh: line 233: 7015 Aborted python $alphafold_script --fasta_paths=$fasta_path --model_names=$model_selection --data_dir=$data_dir --output_dir=$output_dir --jackhmmer_binary_path=$jackhmmer_binary_path --hhblits_binary_path=$hhblits_binary_path --hhsearch_binary_path=$hhsearch_binary_path --hmmsearch_binary_path=$hmmsearch_binary_path --hmmbuild_binary_path=$hmmbuild_binary_path --kalign_binary_path=$kalign_binary_path --uniref90_database_path=$uniref90_database_path --mgnify_database_path=$mgnify_database_path --bfd_database_path=$bfd_database_path --small_bfd_database_path=$small_bfd_database_path --uniclust30_database_path=$uniclust30_database_path --uniprot_database_path=$uniprot_database_path --pdb70_database_path=$pdb70_database_path --pdb_seqres_database_path=$pdb_seqres_database_path --template_mmcif_dir=$template_mmcif_dir --max_template_date=$max_template_date --obsolete_pdbs_path=$obsolete_pdbs_path --db_preset=$db_preset --model_preset=$model_preset --benchmark=$benchmark --amber_relaxation=$amber_relaxation --recycling=$recycling --run_feature=$run_feature --logtostderr

    opened by chenshixinnb 3
  • ValueError: jaxlib is version 0.1.69, but this version of jax requires version 0.1.74.


    I installed the conda environment following your steps. Running import jax; print(jax.devices()) inside the conda environment gives the error: ValueError: jaxlib is version 0.1.69, but this version of jax requires version 0.1.74. How can I solve this? Thanks!

    opened by chenshixinnb 3
  • Something wrong occurred when I ran the job


    Hi, dear author. I installed the required modules following the linked requirements, but the following error occurred when I ran the script. Can you help me find the cause? My installation steps were:

    1. conda create --prefix=/data/home/zhoujy/run/alphafold2 python=3.8
    2. conda activate /data/home/zhoujy/run/alphafold2
    3. conda install cudatoolkit=10.1 cudnn
    4. pip install tensorflow==2.3.0
    5. pip install biopython==1.79 chex==0.0.7 dm-haiku==0.0.4 dm-tree==0.1.6 immutabledict==2.0.0 jax==0.2.14 ml-collections==0.1.0
    6. pip install --upgrade jax jaxlib==0.1.69+cuda101 -f https://storage.googleapis.com/jax-releases/jax_releases.html

    Then I ran the script:

        #!/bin/bash
        module load anaconda/2020.11
        source activate /data/home/zhoujy/run/alphafold2
        ./run_feature.sh -d /data/public/alphafold2 -o /data/home/zhoujy/run/output -m model_1 -f /data/home/zhoujy/run/input/Tb927.10.2950.fasta -t 2021-07-27

    The result is as follows:

    Traceback (most recent call last):
      File "/data/run01/zhoujy/ParallelFold-main/run_feature.py", line 33, in
        from alphafold.model import data
      File "/data/run01/zhoujy/ParallelFold-main/alphafold/model/data.py", line 20, in
        from alphafold.model import utils
      File "/data/run01/zhoujy/ParallelFold-main/alphafold/model/utils.py", line 21, in
        import haiku as hk
      File "/data/home/zhoujy/run/alphafold2/lib/python3.8/site-packages/haiku/__init__.py", line 17, in
        from haiku import data_structures
      File "/data/home/zhoujy/run/alphafold2/lib/python3.8/site-packages/haiku/data_structures.py", line 17, in
        from haiku._src.data_structures import to_immutable_dict
      File "/data/home/zhoujy/run/alphafold2/lib/python3.8/site-packages/haiku/_src/data_structures.py", line 30, in
        from haiku._src import utils
      File "/data/home/zhoujy/run/alphafold2/lib/python3.8/site-packages/haiku/_src/utils.py", line 24, in
        import jax
      File "/data/home/zhoujy/run/alphafold2/lib/python3.8/site-packages/jax/__init__.py", line 16, in
        from .api import (
      File "/data/home/zhoujy/run/alphafold2/lib/python3.8/site-packages/jax/api.py", line 38, in
        from . import core
      File "/data/home/zhoujy/run/alphafold2/lib/python3.8/site-packages/jax/core.py", line 31, in
        from . import dtypes
      File "/data/home/zhoujy/run/alphafold2/lib/python3.8/site-packages/jax/dtypes.py", line 31, in
        from .lib import xla_client
      File "/data/home/zhoujy/run/alphafold2/lib/python3.8/site-packages/jax/lib/__init__.py", line 51, in
        from jaxlib import pytree
    ImportError: cannot import name 'pytree' from 'jaxlib' (/data/home/zhoujy/run/alphafold2/lib/python3.8/site-packages/jaxlib/__init__.py)

    Why? I need your help.

    opened by zhoujingyu13687306871 2
  • Limit RAM usage


    I'm trying to run a FASTA file with a sequence of length 3643. The MSA part finished, but the inference part tried to allocate 80 GB of VRAM on the GPU, which I don't have; our graphics cards are NVIDIA Tesla V100 16 GB. Now I'm trying to run inference on the CPU, which is very slow, and the job keeps using more and more RAM as time passes. Can I limit RAM usage somehow? Or can I run inference on multiple graphics cards, perhaps in parallel?

    opened by hrzolix 1
  • GPU utilization issue


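    Hello, Doctor! After several attempts yesterday it runs now, but I noticed that while run_alphafold.sh is running, the GPU computation part stays on the CPU for a long time, with GPU utilization at 0 for long periods. I tried a protein with a sequence length of 2000 on 4 V100 cards, and it took 9 days. Is this speed normal? Also, in the earlier TensorFlow installation step, is it necessary to install the GPU version of TensorFlow?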

    @Zuricho

    opened by zhoujingyu13687306871 1
  • Error after GPU part


    Hi, after installation the "CPU part" (jackhmmer and hhblits) works well. But when I start the GPU part, I get this error message: TypeError: take requires ndarray or scalar arguments, got <class 'list'> at position 0.

    1st part: ./run_feature.sh -d data -o ./tmp -m model_1,model_2,model_3,model_4,model_5 -f ./query/1crn.fasta -t 2021-07-27
    2nd part: ./run_alphafold.sh -d data -o ./tmp -m model_1,model_2,model_3,model_4,model_5 -f ./query/1crn.fasta -t 2021-07-27

    Full error message:

      File "/softwares/alphafold/run_alphafold.py", line 316, in
        app.run(main)
      File "/softwares/alphafold/envs/alphafold/lib/python3.8/site-packages/absl/app.py", line 312, in run
        _run_main(main, args)
      File "/softwares/alphafold/envs/alphafold/lib/python3.8/site-packages/absl/app.py", line 258, in _run_main
        sys.exit(main(argv))
      File "/softwares/alphafold/run_alphafold.py", line 289, in main
        predict_structure(
      File "/softwares/alphafold/run_alphafold.py", line 188, in predict_structure
        relaxed_pdb_str, _, _ = amber_relaxer.process(prot=unrelaxed_protein)
      File "/softwares/alphafold/alphafold/relax/relax.py", line 58, in process
        out = amber_minimize.run_pipeline(
      File "/softwares/alphafold/alphafold/relax/amber_minimize.py", line 482, in run_pipeline
        ret.update(get_violation_metrics(prot))
      File "/softwares/alphafold/alphafold/relax/amber_minimize.py", line 356, in get_violation_metrics
        structural_violations, struct_metrics = find_violations(prot)
      File "/softwares/alphafold/alphafold/relax/amber_minimize.py", line 338, in find_violations
        violations = folding.find_structural_violations(
      File "/softwares/alphafold/alphafold/model/folding.py", line 757, in find_structural_violations
        atom14_atom_radius = batch['atom14_atom_exists'] * utils.batched_gather(
      File "/softwares/alphafold/alphafold/model/utils.py", line 39, in batched_gather
        return take_fn(params, indices)
      File "/softwares/alphafold/alphafold/model/utils.py", line 36, in
        take_fn = lambda p, i: jnp.take(p, i, axis=axis)
      File "/softwares/alphafold/envs/alphafold/lib/python3.8/site-packages/jax/_src/numpy/lax_numpy.py", line 5383, in take
        return _take(a, indices, None if axis is None else operator.index(axis), out,
      File "/softwares/alphafold/envs/alphafold/lib/python3.8/site-packages/jax/_src/traceback_util.py", line 162, in reraise_with_filtered_traceback
        return fun(*args, **kwargs)
      File "/softwares/alphafold/envs/alphafold/lib/python3.8/site-packages/jax/_src/api.py", line 411, in cache_miss
        out_flat = xla.xla_call(
      File "/softwares/alphafold/envs/alphafold/lib/python3.8/site-packages/jax/core.py", line 1618, in bind
        return call_bind(self, fun, *args, **params)
      File "/softwares/alphafold/envs/alphafold/lib/python3.8/site-packages/jax/core.py", line 1609, in call_bind
        outs = primitive.process(top_trace, fun, tracers, params)
      File "/softwares/alphafold/envs/alphafold/lib/python3.8/site-packages/jax/core.py", line 1621, in process
        return trace.process_call(self, fun, tracers, params)
      File "/softwares/alphafold/envs/alphafold/lib/python3.8/site-packages/jax/core.py", line 615, in process_call
        return primitive.impl(f, *tracers, **params)
      File "/softwares/alphafold/envs/alphafold/lib/python3.8/site-packages/jax/interpreters/xla.py", line 622, in _xla_call_impl
        compiled_fun = _xla_callable(fun, device, backend, name, donated_invars,
      File "/softwares/alphafold/envs/alphafold/lib/python3.8/site-packages/jax/linear_util.py", line 262, in memoized_fun
        ans = call(fun, *args)
      File "/softwares/alphafold/envs/alphafold/lib/python3.8/site-packages/jax/interpreters/xla.py", line 694, in _xla_callable
        return lower_xla_callable(fun, device, backend, name, donated_invars, *arg_specs).compile().unsafe_call
      File "/softwares/alphafold/envs/alphafold/lib/python3.8/site-packages/jax/interpreters/xla.py", line 702, in lower_xla_callable
        jaxpr, out_avals, consts = pe.trace_to_jaxpr_final(
      File "/softwares/alphafold/envs/alphafold/lib/python3.8/site-packages/jax/interpreters/partial_eval.py", line 1522, in trace_to_jaxpr_final
        jaxpr, out_avals, consts = trace_to_subjaxpr_dynamic(fun, main, in_avals)
      File "/softwares/alphafold/envs/alphafold/lib/python3.8/site-packages/jax/interpreters/partial_eval.py", line 1500, in trace_to_subjaxpr_dynamic
        ans = fun.call_wrapped(*in_tracers)
      File "/softwares/alphafold/envs/alphafold/lib/python3.8/site-packages/jax/linear_util.py", line 166, in call_wrapped
        ans = self.f(*args, **dict(self.params, **kwargs))
      File "/softwares/alphafold/envs/alphafold/lib/python3.8/site-packages/jax/_src/numpy/lax_numpy.py", line 5390, in _take
        _check_arraylike("take", a)
      File "/softwares/alphafold/envs/alphafold/lib/python3.8/site-packages/jax/_src/numpy/lax_numpy.py", line 559, in _check_arraylike
        raise TypeError(msg.format(fun_name, type(arg), pos))
    jax._src.traceback_util.UnfilteredStackTrace: TypeError: take requires ndarray or scalar arguments, got <class 'list'> at position 0.

    The stack trace below excludes JAX-internal frames. The preceding is the original exception that occurred, unmodified.


    The above exception was the direct cause of the following exception:

    Traceback (most recent call last):
      File "/softwares/alphafold/run_alphafold.py", line 316, in
        app.run(main)
      File "/softwares/alphafold/envs/alphafold/lib/python3.8/site-packages/absl/app.py", line 312, in run
        _run_main(main, args)
      File "/softwares/alphafold/envs/alphafold/lib/python3.8/site-packages/absl/app.py", line 258, in _run_main
        sys.exit(main(argv))
      File "/softwares/alphafold/run_alphafold.py", line 289, in main
        predict_structure(
      File "/softwares/alphafold/run_alphafold.py", line 188, in predict_structure
        relaxed_pdb_str, _, _ = amber_relaxer.process(prot=unrelaxed_protein)
      File "/softwares/alphafold/alphafold/relax/relax.py", line 58, in process
        out = amber_minimize.run_pipeline(
      File "/softwares/alphafold/alphafold/relax/amber_minimize.py", line 482, in run_pipeline
        ret.update(get_violation_metrics(prot))
      File "/softwares/alphafold/alphafold/relax/amber_minimize.py", line 356, in get_violation_metrics
        structural_violations, struct_metrics = find_violations(prot)
      File "/softwares/alphafold/alphafold/relax/amber_minimize.py", line 338, in find_violations
        violations = folding.find_structural_violations(
      File "/softwares/alphafold/alphafold/model/folding.py", line 757, in find_structural_violations
        atom14_atom_radius = batch['atom14_atom_exists'] * utils.batched_gather(
      File "/softwares/alphafold/alphafold/model/utils.py", line 39, in batched_gather
        return take_fn(params, indices)
      File "/softwares/alphafold/alphafold/model/utils.py", line 36, in
        take_fn = lambda p, i: jnp.take(p, i, axis=axis)
      File "/softwares/alphafold/envs/alphafold/lib/python3.8/site-packages/jax/_src/numpy/lax_numpy.py", line 5383, in take
        return _take(a, indices, None if axis is None else operator.index(axis), out,
      File "/softwares/alphafold/envs/alphafold/lib/python3.8/site-packages/jax/_src/numpy/lax_numpy.py", line 5390, in _take
        _check_arraylike("take", a)
      File "/softwares/alphafold/envs/alphafold/lib/python3.8/site-packages/jax/_src/numpy/lax_numpy.py", line 559, in _check_arraylike
        raise TypeError(msg.format(fun_name, type(arg), pos))
    TypeError: take requires ndarray or scalar arguments, got <class 'list'> at position 0.

    opened by ebettler 1
  • Running ParallelFold on reduced database?


    Is it possible to run ParallelFold on reduced_dbs, or is that not yet supported? I tried -c reduced_dbs but it did not work. Then I tried modifying the bfd_path set in run_alphafold.sh, but it threw a "directory/file not found" error (I'm pretty sure the file is there, because I can run AlphaFold with it). Thank you in advance for your help!

    opened by xinyu-g 0
  • Is CPU acceleration failed?


    Yesterday I ran some experiments on a server with ./run_alphafold.sh -d /dataset/ -o result -p monomer -m model_2 -i input/T1061.fasta, and the log confused me. T1061 is 949 aa long.

    I0822 07:33:00.806264 140553952322112 jackhmmer.py:133] Launching subprocess "/opt/conda/bin/jackhmmer -o /dev/null -A /tmp/tmpxbrk9wt6/output.sE 0.0001 -E 0.0001 --cpu 8 -N 1 input/T1061.fasta /dataset//uniref90/uniref90.fasta"
    I0822 07:33:01.157015 140553952322112 utils.py:36] Started Jackhmmer (uniref90.fasta) query
    I0822 07:37:27.058227 140553952322112 utils.py:40] Finished Jackhmmer (uniref90.fasta) query in 265.901 seconds
    I0822 07:37:27.072012 140553952322112 jackhmmer.py:133] Launching subprocess "/opt/conda/bin/jackhmmer -o /dev/null -A /tmp/tmpnn6am537/output.sE 0.0001 -E 0.0001 --cpu 8 -N 1 input/T1061.fasta /dataset//mgnify/mgy_clusters_2018_12.fa"
    I0822 07:37:27.439405 140553952322112 utils.py:36] Started Jackhmmer (mgy_clusters_2018_12.fa) query
    I0822 07:42:42.192071 140553952322112 utils.py:40] Finished Jackhmmer (mgy_clusters_2018_12.fa) query in 314.752 seconds
    I0822 07:42:42.364272 140553952322112 hhsearch.py:85] Launching subprocess "/opt/conda/bin/hhsearch -i /tmp/tmpog4q4684/query.a3m -o /tmp/tmpog40/pdb70"
    I0822 07:42:42.712445 140553952322112 utils.py:36] Started HHsearch query
    I0822 07:44:18.199999 140553952322112 utils.py:40] Finished HHsearch query in 95.487 seconds
    I0822 07:44:18.555797 140553952322112 hhblits.py:128] Launching subprocess "/opt/conda/bin/hhblits -i input/T1061.fasta -cpu 4 -oa3m /tmp/tmpz9oq 1000000 -realign_max 100000 -maxfilt 100000 -min_prefilter_hits 1000 -d /dataset//bfd/bfd_metaclust_clu_complete_id30_c90_final_seq.sorted_optst30_2018_08"
    I0822 07:44:19.050110 140553952322112 utils.py:36] Started HHblits query

    I0822 09:01:02.278290 140553952322112 utils.py:40] Finished HHblits query in 4603.228 seconds
    feature extraction spend time: 5305.185729026794
    feature extraction Completed succesfully

    I printed the feature extraction time and found that 5305 s is almost equal to the sum of the individual database search times. But according to the article, I expected the feature extraction time to be roughly equal to the HHblits search time alone. Can you explain this?

    opened by yanchenmochen 3
  • failed to alloc 2147483648 bytes on host: CUDA_ERROR_OPERATING_SYSTEM: OS call failed or operation not supported on this OS


    When I use the code to compute T1050.fasta, which has 700 residues, the command line prints this error. The environment is an A100 GPU on Ubuntu, but I use newer versions of jax and jaxlib. Could that be the cause?

    (parafold) pip list | grep jax
    jax 0.3.15
    jaxlib 0.3.15+cuda11.cudnn82

    opened by yanchenmochen 3
  • Too many command-line arguments


    Hi,

    First of all, thanks for developing this tool, I'm looking forward to playing with it!

    I installed ParallelFold on an Ubuntu 18 machine and the full AlphaFold database on an external drive.

    When running the command: $ ./run_alphafold.sh -d /media/qhr/"My Passport"/alphafold/AlphaFold_DB -o output -p monomer_ptm -i input/GA98.fasta -m model_1 -f

    I get the error: Too many command-line arguments.

    I also get the same error when calling run_alphafold.py directly: $ python3 run_alphafold.py --fasta_paths=input/GA98.fasta --model_preset=monomer --data_dir=/media/qhr/"My Passport"/alphafold/AlphaFold_DB --output_dir=output --uniref90_database_path=/media/qhr/"My Passport"/alphafold/AlphaFold_DB/uniref90 --mgnify_database_path=/media/qhr/"My Passport"/alphafold/AlphaFold_DB/mgnify --template_mmcif_dir=/media/qhr/"My Passport"/alphafold/AlphaFold_DB/pdb_mmcif --obsolete_pdbs_path=/media/qhr/"My Passport"/alphafold/AlphaFold_DB/pdb_mmcif/obsolete.dat --use_gpu_relax=True bfd_database_path=/media/qhr/"My Passport"/alphafold/AlphaFold_DB/bfd --max_template_date=2020-05-14

    Is it possible that the space in the name of the external drive "My Passport" is causing such error?

    Thanks! Ana

    opened by AnaValero 1
  • Alphafold2 v/s Parafold timings


    I have a basic question about how AlphaFold2 and Parafold differ in the way they run: how can I tell whether Parafold runs the first-step searches (Jackhmmer, Jackhmmer, and HHblits) in parallel, unlike AlphaFold2, which runs them sequentially?

    Snippets of log files obtained from running Alphafold2 and Parafold

    Alphafold2 log:

    I0409 14:04:28.020900 139865793787712 run_alphafold.py:376] Have 5 models: ['model_1_pred_0', 'model_2_pred_0', 'model_3_pred_0', 'model_4_pred_0', 'model_5_pred_0']
    I0409 14:04:28.021180 139865793787712 run_alphafold.py:393] Using random seed 1420247507508611084 for the data pipeline
    I0409 14:04:28.021463 139865793787712 run_alphafold.py:161] Predicting seq1
    I0409 14:04:28.037414 139865793787712 jackhmmer.py:133] Launching subprocess "/.conda/envs/alphafold/bin/jackhmmer -o /dev/null -A /tmp/tmpm1u84thu/output.sto --noali --F1 0.0005 --F2 5e-05 --F3 5e-07 --incE 0.0001 -E 0.0001 --cpu 8 -N 1 /fasta_files/seq1.fasta /alphafold_data//uniref90/uniref90.fasta"
    I0409 14:04:28.111756 139865793787712 utils.py:36] Started Jackhmmer (uniref90.fasta) query
    I0409 14:10:17.276236 139865793787712 utils.py:40] Finished Jackhmmer (uniref90.fasta) query in 349.164 seconds
    I0409 14:10:17.462168 139865793787712 jackhmmer.py:133] Launching subprocess "/.conda/envs/alphafold/bin/jackhmmer -o /dev/null -A /tmp/tmpub1qi595/output.sto --noali --F1 0.0005 --F2 5e-05 --F3 5e-07 --incE 0.0001 -E 0.0001 --cpu 8 -N 1 /fasta_files/seq1.fasta /alphafold_data//mgnify/mgy_clusters_2018_12.fa"
    I0409 14:10:17.513182 139865793787712 utils.py:36] Started Jackhmmer (mgy_clusters_2018_12.fa) query
    I0409 14:16:32.112656 139865793787712 utils.py:40] Finished Jackhmmer (mgy_clusters_2018_12.fa) query in 374.599 seconds
    I0409 14:16:33.369129 139865793787712 hhsearch.py:85] Launching subprocess "/.conda/envs/alphafold/bin/hhsearch -i /tmp/tmpyot74k7r/query.a3m -o /tmp/tmpyot74k7r/output.hhr -maxseq 1000000 -d /alphafold_data//pdb70/pdb70"
    I0409 14:16:33.466009 139865793787712 utils.py:36] Started HHsearch query
    I0409 14:22:32.148045 139865793787712 utils.py:40] Finished HHsearch query in 358.682 seconds
    I0409 14:22:32.838686 139865793787712 hhblits.py:128] Launching subprocess "/.conda/envs/alphafold/bin/hhblits -i /fasta_files/seq1.fasta -cpu 4 -oa3m /tmp/tmpedyoxta1/output.a3m -o /dev/null -n 3 -e 0.001 -maxseq 1000000 -realign_max 100000 -maxfilt 100000 -min_prefilter_hits 1000 -d /alphafold_data//bfd/bfd_metaclust_clu_complete_id30_c90_final_seq.sorted_opt -d /alphafold_data//uniclust30/uniclust30_2018_08/uniclust30_2018_08"
    I0409 14:22:32.926801 139865793787712 utils.py:36] Started HHblits query
    I0409 18:56:30.223437 139865793787712 utils.py:40] Finished HHblits query in 16437.296 seconds
    

    Parafold log:

    I0427 21:17:27.915049 140305630689088 run_alphafold.py:397] Have 5 models: ['model_1_pred_0', 'model_2_pred_0', 'model_3_pred_0', 'model_4_pred_0', 'model_5_pred_0']
    I0427 21:17:27.915312 140305630689088 run_alphafold.py:414] Using random seed 1534697036303804749 for the data pipeline
    I0427 21:17:27.915629 140305630689088 run_alphafold.py:165] Predicting seq2
    I0427 21:17:27.925500 140305630689088 jackhmmer.py:133] Launching subprocess "/.conda/envs/alphafold/bin/jackhmmer -o /dev/null -A /tmp/tmp5fo28348/output.sto --noali --F1 0.0005 --F2 5e-05 --F3 5e-07 --incE 0.0001 -E 0.0001 --cpu 8 -N 1 /fasta_files/seq2.fasta /alphafold_data//uniref90/uniref90.fasta"
    I0427 21:17:27.996705 140305630689088 utils.py:36] Started Jackhmmer (uniref90.fasta) query
    I0427 21:23:54.643056 140305630689088 utils.py:40] Finished Jackhmmer (uniref90.fasta) query in 386.646 seconds
    I0427 21:23:54.829476 140305630689088 jackhmmer.py:133] Launching subprocess "/.conda/envs/alphafold/bin/jackhmmer -o /dev/null -A /tmp/tmprs3za6w_/output.sto --noali --F1 0.0005 --F2 5e-05 --F3 5e-07 --incE 0.0001 -E 0.0001 --cpu 8 -N 1 /fasta_files/seq2.fasta /alphafold_data//mgnify/mgy_clusters_2018_12.fa"
    I0427 21:23:54.875119 140305630689088 utils.py:36] Started Jackhmmer (mgy_clusters_2018_12.fa) query
    I0427 21:31:38.409492 140305630689088 utils.py:40] Finished Jackhmmer (mgy_clusters_2018_12.fa) query in 463.534 seconds
    I0427 21:31:39.768360 140305630689088 hhsearch.py:85] Launching subprocess "/.conda/envs/alphafold/bin/hhsearch -i /tmp/tmpjgr58ebb/query.a3m -o /tmp/tmpjgr58ebb/output.hhr -maxseq 1000000 -d /alphafold_data//pdb70/pdb70"
    I0427 21:31:39.850885 140305630689088 utils.py:36] Started HHsearch query
    I0427 21:39:23.420352 140305630689088 utils.py:40] Finished HHsearch query in 463.569 seconds
    I0427 21:39:24.173583 140305630689088 hhblits.py:128] Launching subprocess "/.conda/envs/alphafold/bin/hhblits -i /fasta_files/seq2.fasta -cpu 4 -oa3m /tmp/tmpmzl5arhr/output.a3m -o /dev/null -n 3 -e 0.001 -maxseq 1000000 -realign_max 100000 -maxfilt 100000 -min_prefilter_hits 1000 -d /alphafold_data//bfd/bfd_metaclust_clu_complete_id30_c90_final_seq.sorted_opt -d /alphafold_data//uniclust30/uniclust30_2018_08/uniclust30_2018_08"
    I0427 21:39:24.259592 140305630689088 utils.py:36] Started HHblits query
    I0428 01:34:31.302148 140305630689088 utils.py:40] Finished HHblits query in 14107.042 seconds
    

    They look similar to me, and both use 8, 8, and 4 CPUs for the two Jackhmmer searches and the HHblits search, respectively. Please clarify this for me.

    Thank you, Aditi

    opened by adi1bioinfo 0
  • An error in feature generation

    Hi, when I used your new version to generate feature.pkl, the following error occurred. Could you give any advice on how to solve it?

    FATAL Flags parsing error: Unknown command line flag 'model_names'. Did you mean: model_preset ? Pass --helpshort or --helpfull to see help on flags.

    opened by YiningWang2 1
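For the issue above about the external drive "My Passport": one way to check whether the shell keeps a quoted path containing a space as a single argument is Python's standard shlex module, which splits a string using POSIX shell rules. The sketch below uses a shortened version of the command from that issue (only four of its flags); it illustrates shell word splitting and is not a diagnosis of the reported error.

```python
import shlex

# Shortened version of the command quoted in the issue (illustrative subset of flags).
cmd = (
    'python3 run_alphafold.py --fasta_paths=input/GA98.fasta '
    '--data_dir=/media/qhr/"My Passport"/alphafold/AlphaFold_DB '
    '--use_gpu_relax=True '
    'bfd_database_path=/media/qhr/"My Passport"/alphafold/AlphaFold_DB/bfd'
)

# shlex.split applies POSIX shell quoting rules: the quoted "My Passport"
# segment stays inside one argument and the quote characters are removed.
args = shlex.split(cmd)

# Arguments after the script name that do not start with "-" are not flags;
# absl leaves them in argv as positional arguments.
positional = [a for a in args[2:] if not a.startswith("-")]

for a in args:
    print(repr(a))
print("positional:", positional)
```

With the quotes in place, the space survives inside a single argument, so the interactive shell itself does not split the path. The sketch also shows that bfd_database_path=... in that command has no leading "--", so absl would treat it as a positional argument rather than a flag, which may be worth checking. This only covers the interactive shell: a wrapper script such as run_alphafold.sh that expands variables without double quotes could still split such a path.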
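For the "Alphafold2 v/s Parafold timings" issue above, one way to see from a log whether the MSA and template searches overlapped is to pair each "Started ... query" line with its "Finished ... query" line and check that every step starts only after the previous one has finished. The following is a minimal sketch, assuming the absl log prefix shown in those logs (level letter, MMDD date, time). Because the log does not record the year, a placeholder year is assumed. The log lines are copied from the Parafold log in that issue; parse_steps and is_sequential are hypothetical helper names.

```python
import re
from datetime import datetime

# Started/Finished lines copied from the Parafold log in the issue.
LOG = """\
I0427 21:17:27.996705 140305630689088 utils.py:36] Started Jackhmmer (uniref90.fasta) query
I0427 21:23:54.643056 140305630689088 utils.py:40] Finished Jackhmmer (uniref90.fasta) query in 386.646 seconds
I0427 21:23:54.875119 140305630689088 utils.py:36] Started Jackhmmer (mgy_clusters_2018_12.fa) query
I0427 21:31:38.409492 140305630689088 utils.py:40] Finished Jackhmmer (mgy_clusters_2018_12.fa) query in 463.534 seconds
I0427 21:31:39.850885 140305630689088 utils.py:36] Started HHsearch query
I0427 21:39:23.420352 140305630689088 utils.py:40] Finished HHsearch query in 463.569 seconds
I0427 21:39:24.259592 140305630689088 utils.py:36] Started HHblits query
I0428 01:34:31.302148 140305630689088 utils.py:40] Finished HHblits query in 14107.042 seconds
"""

# Prefix: level letter, "MMDD HH:MM:SS.ffffff", thread id, "file:line]".
LINE = re.compile(
    r"^[IWEF](\d{4} \d{2}:\d{2}:\d{2}\.\d{6}) .*?\] (Started|Finished) (.+?) query"
)


def parse_steps(log: str, year: int = 2022) -> list[tuple[str, datetime, datetime]]:
    """Return (step, start, end) tuples in log order.

    year is a placeholder (assumption): the log prefix does not include it.
    """
    starts: dict[str, datetime] = {}
    steps = []
    for line in log.splitlines():
        m = LINE.match(line)
        if not m:
            continue
        stamp = datetime.strptime(f"{year}{m.group(1)}", "%Y%m%d %H:%M:%S.%f")
        event, name = m.group(2), m.group(3)
        if event == "Started":
            starts[name] = stamp
        else:
            steps.append((name, starts.pop(name), stamp))
    return steps


def is_sequential(steps) -> bool:
    """True if every step starts only after the previous step has finished."""
    return all(prev[2] <= cur[1] for prev, cur in zip(steps, steps[1:]))


steps = parse_steps(LOG)
for name, start, end in steps:
    print(f"{name:40s} {(end - start).total_seconds():10.1f} s")
print("sequential:", is_sequential(steps))
```

For this excerpt, the durations computed from the timestamps match the "Finished ... in N seconds" values in the log, and the check reports that the four searches ran one after another. The same functions can be run on the Alphafold2 log for comparison.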
Releases: v1.1

Owner: Bozitao Zhong (Protein Design)