Upgrading AlphaFold to AlphaFold-Multimer
2022-04-23 03:28:00 【Python code doctor】
Preface
After AlphaFold2 (AF2, nicknamed "Two Dogs" in Chinese) was released, many scientists played with this, the most accurate protein structure prediction model to date, and posted their new findings on Twitter in the spirit of open science. One of the most important findings was that although AF2 was trained only on single chains, it can also predict complexes to some extent. Answering this open-science call, the DeepMind team released AlphaFold-Multimer, a neural network retrained specifically for complexes, on 6 October 2021. Now on to the main text (my knowledge is limited, so corrections are welcome).
1. Update the code: run the following in the alphafold directory
git fetch origin main
git merge origin/main
2. Download the UniProt database
The UniProt database provides the species information used to pair MSAs across chains (probably also related to the eukaryote/prokaryote distinction).
./download_uniprot.sh <DOWNLOAD_DIR>
3. Remove the old pdb_mmcif directory
rm -r <DOWNLOAD_DIR>/pdb_mmcif
4. Download the pdb_mmcif files, running several rsync processes in parallel
#!/bin/sh
src='rsync.rcsb.org::ftp_data/structures/divided/mmCIF' # Source path, no trailing slash
dst='./pdb_mmcif/raw' # Destination path, no trailing slash
opt="--recursive --links --perms --times --compress --info=progress2 --delete --port=33444" # rsync options
num=10 # Maximum number of concurrent rsync processes
depth='5 4 3 2 1' # Directory depths to walk, deepest first
# State files are keyed on a hash of the source path so an interrupted run can resume
task=/tmp/$(echo "$src" | md5sum | head -c 16)
# Directories started in the previous run are skipped this time
[ -f "$task-next" ] && cp "$task-next" "$task-skip"
[ -f "$task-skip" ] || touch "$task-skip"
# Create the destination directory structure (directories only, no files)
rsync $opt --include "*/" --exclude "*" "$src/" "$dst"
# Synchronize directories from deepest to shallowest
for l in $depth; do
  # Launch one background rsync per directory at this depth
  for i in $(find "$dst" -maxdepth "$l" -mindepth "$l" -type d); do
    i=$(echo "$i" | sed "s#$dst/##")
    # Exact, fixed-string match against the skip list
    if grep -qxF "$i" "$task-skip"; then
      echo "skip $i"
      continue
    fi
    while true; do
      # Count running rsync processes that write into $dst (excluding daemons)
      now_num=$(ps axw | grep rsync | grep "$dst" | grep -v -- '--daemon' | wc -l)
      if [ $now_num -lt $num ]; then
        echo "rsync $opt $src/$i/ $dst/$i" >>"$task-log"
        rsync $opt "$src/$i/" "$dst/$i" &
        echo "$i" >>"$task-next"
        sleep 1
        break
      else
        sleep 5
      fi
    done
  done
done
# Wait for the remaining background rsync processes before exiting
wait
Save the script above in DOWNLOAD_DIR and run it from that directory. Do not set the number of concurrent processes num too high, or the server may block you.
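The script above only mirrors the compressed files into pdb_mmcif/raw, while the directory tree in step 8 expects decompressed .cif files in a single pdb_mmcif/mmcif_files directory. A minimal sketch of that post-processing, assuming the raw/<two-letter dir>/<id>.cif.gz layout of the RCSB divided mirror; the function name flatten_mmcif is hypothetical, and the steps follow what, to my understanding, the official download_pdb_mmcif.sh does after its own rsync.

```shell
# Hypothetical helper: decompress the rsync mirror in <root>/raw and
# collect all .cif files into one flat directory, <root>/mmcif_files.
# Assumes the layout raw/<two-letter dir>/<id>.cif.gz produced by the script above.
flatten_mmcif() {
  root=$1
  raw="$root/raw"
  out="$root/mmcif_files"
  mkdir -p "$out"
  # Decompress every .cif.gz in place (gunzip replaces x.cif.gz with x.cif).
  find "$raw" -type f -name '*.cif.gz' -exec gunzip -f {} +
  # Move the .cif files out of the two-letter subdirectories in batches;
  # inside sh -c, $0 is the destination and "$@" is the batch of files.
  find "$raw" -type f -name '*.cif' -exec sh -c 'mv "$@" "$0"' "$out" {} +
}
```

Usage: flatten_mmcif "$DOWNLOAD_DIR/pdb_mmcif". The tree also lists obsolete.dat; as far as I know the official script fetches it from ftp://ftp.wwpdb.org/pub/pdb/data/status/obsolete.dat into the same directory. Because gunzip removes the .gz files, re-running the rsync script afterwards would download them all again.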
5. Download the pdb_seqres file
./download_pdb_seqres.sh <DOWNLOAD_DIR>
6. Remove the old parameter files
This deletes the old monomer structure prediction models.
rm -r <DOWNLOAD_DIR>/params
7. Download the new parameter files
This adds 5 new models, params_model_{1,2,3,4,5}_multimer.npz.
./download_alphafold_params.sh <DOWNLOAD_DIR>
8. After downloading, check the directory structure against the tree below, item by item. The actual sizes may be larger than listed, but not smaller. Make sure every entry matches as closely as possible!
To see the size of a folder, run du -sh in it.
$DOWNLOAD_DIR/                  # Total: ~ 2.2 TB (download: 438 GB)
    bfd/                        # ~ 1.7 TB (download: 271.6 GB)
        # 6 files.
    mgnify/                     # ~ 64 GB (download: 32.9 GB)
        mgy_clusters_2018_12.fa
    params/                     # ~ 3.5 GB (download: 3.5 GB)
        # 5 CASP14 models,
        # 5 pTM models,
        # 5 AlphaFold-Multimer models,
        # LICENSE,
        # = 16 files.
    pdb70/                      # ~ 56 GB (download: 19.5 GB)
        # 9 files.
    pdb_mmcif/                  # ~ 206 GB (download: 46 GB)
        mmcif_files/
            # About 180,000 .cif files.
        obsolete.dat
    pdb_seqres/                 # ~ 0.2 GB (download: 0.2 GB)
        pdb_seqres.txt
    small_bfd/                  # ~ 17 GB (download: 9.6 GB)
        bfd-first_non_consensus_sequences.fasta
    uniclust30/                 # ~ 86 GB (download: 24.9 GB)
        uniclust30_2018_08/
            # 13 files.
    uniprot/                    # ~ 98.3 GB (download: 49 GB)
        uniprot.fasta
    uniref90/                   # ~ 58 GB (download: 29.7 GB)
        uniref90.fasta
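Checking the tree by hand is tedious. A minimal sketch that checks only the presence of the files and directories named in the tree above (not their sizes); check_af_layout is a hypothetical name, and it assumes the exact file names shown, which can differ between database versions. If you only use reduced_dbs and never downloaded bfd/ or uniclust30/, it will report those as missing.

```shell
# Hypothetical helper: report files/directories from the tree above that are
# missing or empty. Returns 0 only if everything is present.
check_af_layout() {
  dir=$1
  missing=0
  # Single files that must exist and be non-empty
  for f in \
    mgnify/mgy_clusters_2018_12.fa \
    pdb_seqres/pdb_seqres.txt \
    small_bfd/bfd-first_non_consensus_sequences.fasta \
    uniprot/uniprot.fasta \
    uniref90/uniref90.fasta \
    pdb_mmcif/obsolete.dat
  do
    if [ ! -s "$dir/$f" ]; then
      echo "missing or empty: $f"
      missing=$((missing + 1))
    fi
  done
  # Directories whose contents are listed only as file counts
  for d in bfd pdb70 pdb_mmcif/mmcif_files uniclust30/uniclust30_2018_08 params; do
    if [ ! -d "$dir/$d" ]; then
      echo "missing directory: $d"
      missing=$((missing + 1))
    fi
  done
  # The tree lists 16 files in params/ (15 model files + LICENSE)
  n=$(ls "$dir/params" 2>/dev/null | wc -l | tr -d ' ')
  if [ "$n" -lt 16 ]; then
    echo "params/ has $n files, expected 16"
    missing=$((missing + 1))
  fi
  [ "$missing" -eq 0 ]
}
```

Usage: check_af_layout "$DOWNLOAD_DIR" && du -sh "$DOWNLOAD_DIR"/* prints the per-directory sizes to compare with the comments in the tree once all names check out.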
9. Run predictions
AlphaFold-Multimer can predict monomers, homomers, and heteromers.
--db_preset=reduced_dbs
Uses the reduced databases (8 CPUs, 600 GB of disk space).
--db_preset=full_dbs
Uses all the databases (the same configuration as in CASP14).
1. monomer model preset: monomeric proteins
FASTA file format:
>sequence_name
<SEQUENCE>
python3 docker/run_docker.py \
--fasta_paths=T1050.fasta \
--max_template_date=2020-05-14 \
--model_preset=monomer \
--db_preset=reduced_dbs \
--data_dir=$DOWNLOAD_DIR
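When juggling many input files it is easy to pass a multi-chain FASTA with --model_preset=monomer. A minimal sketch that counts the records in a FASTA file and suggests a preset, assuming the rule of thumb from this section (one sequence means monomer, several mean multimer); suggest_preset is a hypothetical name and not part of AlphaFold.

```shell
# Hypothetical helper: suggest --model_preset from the number of FASTA records.
# One record -> monomer; two or more (homomer or heteromer) -> multimer.
suggest_preset() {
  # grep -c prints 0 (and exits non-zero) when there are no matches
  n=$(grep -c '^>' "$1")
  if [ "$n" -eq 0 ]; then
    echo "no FASTA records in $1" >&2
    return 1
  elif [ "$n" -eq 1 ]; then
    echo monomer
  else
    echo multimer
  fi
}
```

For example, --model_preset=$(suggest_preset T1050.fasta) in the command above.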
2. multimer model preset: homomeric proteins
One suggestion is that specifying whether the organism is a prokaryote or a eukaryote makes the prediction more accurate; to do this, add --is_prokaryote_list=true in front of --max_template_date.
By default, each of the 5 models runs 5 seeds, for 25 predictions in total. Passing --num_multimer_predictions_per_model=1 makes each model run only one seed (5 predictions).
>sequence_1
<SEQUENCE>
>sequence_2
<SEQUENCE>
>sequence_3
<SEQUENCE>
python3 docker/run_docker.py \
--fasta_paths=multimer.fasta \
--max_template_date=2020-05-14 \
--model_preset=multimer \
--data_dir=$DOWNLOAD_DIR
3. Heteromeric proteins, for example 2 copies of sequence A and 3 copies of sequence B:
>sequence_1
<SEQUENCE A>
>sequence_2
<SEQUENCE A>
>sequence_3
<SEQUENCE B>
>sequence_4
<SEQUENCE B>
>sequence_5
<SEQUENCE B>
The command is the same as for homomers (--model_preset=multimer).
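Writing one record per copy by hand gets error-prone for larger assemblies. A minimal sketch that generates such a file from SEQUENCE:COPIES pairs, assuming the sequence_1, sequence_2, ... naming used in the examples above; make_multimer_fasta is a hypothetical name.

```shell
# Hypothetical helper: write a multimer FASTA file from SEQUENCE:COPIES pairs.
# Each copy becomes its own record, numbered sequence_1, sequence_2, ...
make_multimer_fasta() {
  out=$1
  shift
  : >"$out"              # create or truncate the output file
  k=0
  for spec in "$@"; do
    seq=${spec%%:*}      # text before the first ':' is the sequence
    copies=${spec#*:}    # text after it is the number of copies
    c=0
    while [ "$c" -lt "$copies" ]; do
      k=$((k + 1))
      c=$((c + 1))
      printf '>sequence_%d\n%s\n' "$k" "$seq" >>"$out"
    done
  done
}
```

The heteromer above would be make_multimer_fasta heteromer.fasta "$SEQ_A:2" "$SEQ_B:3"; a homotrimer is make_multimer_fasta multimer.fasta "$SEQ:3".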
4. Predicting several monomers in one run
python3 docker/run_docker.py \
--fasta_paths=monomer1.fasta,monomer2.fasta \
--max_template_date=2021-11-01 \
--model_preset=monomer \
--data_dir=$DOWNLOAD_DIR
5. Predicting several multimers in one run
python3 docker/run_docker.py \
--fasta_paths=multimer1.fasta,multimer2.fasta \
--max_template_date=2021-11-01 \
--model_preset=multimer \
--data_dir=$DOWNLOAD_DIR
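--fasta_paths takes a comma-separated list, and all files in one call share the same --model_preset. A minimal sketch that builds that list from every .fasta file in a directory, assuming the directory holds only inputs of one kind; join_fasta_paths is a hypothetical name.

```shell
# Hypothetical helper: print all *.fasta files in a directory as a
# comma-separated list suitable for --fasta_paths.
join_fasta_paths() {
  out=""
  for f in "$1"/*.fasta; do
    [ -e "$f" ] || continue     # no match: the glob stays literal, skip it
    out="${out:+$out,}$f"       # add a comma only after the first entry
  done
  printf '%s\n' "$out"
}
```

For example: python3 docker/run_docker.py --fasta_paths="$(join_fasta_paths multimers)" --max_template_date=2021-11-01 --model_preset=multimer --data_dir=$DOWNLOAD_DIR.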
10. Output files
1. features.pkl: a pickle of the NumPy arrays (input features) used to generate the structures.
2. unrelaxed_model_*.pdb: the predicted structures in PDB format, exactly as output by the model.
3. relaxed_model_*.pdb: the structures after Amber relaxation of the unrelaxed predictions.
4. ranked_*.pdb: the relaxed structures, renamed in order of model confidence. The monomer presets rank by pLDDT; the multimer preset ranks by ipTM+pTM. ranked_0.pdb contains the most confident prediction and the last file (ranked_4.pdb when there are 5 predictions) the least confident.
5. ranking_debug.json: a JSON file mapping each original model name to its ranking score (pLDDT, or ipTM+pTM for multimers) and giving the resulting order.
6. timings.json: a JSON file with the time taken by each stage of the AlphaFold pipeline.
7. msas/: the MSA files produced by the various genetic search tools and used to build the input features.
8. result_model_*.pkl: pickles of the raw model outputs, including the structure module output and the auxiliary outputs.
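AlphaFold writes the per-residue pLDDT into the B-factor column of its PDB files, so a quick overview of each ranked model needs no Python. A minimal sketch that averages that column over CA atoms, assuming standard fixed-column PDB records (atom name in columns 13-16, B-factor in 61-66); mean_plddt and report_plddt are hypothetical names. For multimers the ranked_ order comes from ipTM+pTM, so the mean pLDDT printed here need not decrease monotonically.

```shell
# Hypothetical helper: mean of the B-factor column (pLDDT in AlphaFold output)
# over all CA atoms of a PDB file, printed with two decimals.
mean_plddt() {
  awk 'substr($0, 1, 4) == "ATOM" && substr($0, 13, 4) == " CA " {
         sum += substr($0, 61, 6)   # columns 61-66: B-factor
         n++
       }
       END { if (n > 0) printf "%.2f\n", sum / n }' "$1"
}

# Print "file<TAB>mean pLDDT" for every ranked_*.pdb in an output directory.
report_plddt() {
  for f in "$1"/ranked_*.pdb; do
    [ -e "$f" ] || continue
    printf '%s\t%s\n' "$(basename "$f")" "$(mean_plddt "$f")"
  done
}
```

Usage: report_plddt <output_dir>/<fasta_name>, where the directory is the one holding the files listed above.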
Reference articles:
https://blog.csdn.net/qq_20291997/article/details/122613497
https://www.rehiy.com/post/134
Copyright notice
This article was written by [Python code doctor]. Please include a link to the original when reposting. Thanks.
https://yzsam.com/2022/04/202204220619243571.html