Python script to download (TCR) genes from IMGT/GENE-DB

Related tags

DownloaderIMGTgeneDL
Overview

IMGTgeneDL

0.1.0

Jamie Heather | CCR @ MGH | 2021

This script provides an alternative way to access TCR and IG genes stored in IMGT/GENE-DB. It's primarily designed for downloading human/mouse TCRs, but it's readily adaptable to other species/loci.

Usage

This script is tested on python >= 3.6, not requiring any non-standard packages.

Download specific gene types

The primary way this script is intended to be used is to tell it the species, loci, and sequence types that you want to download. This will be downloaded and saved to a file named with the date the IMGT release used, and details of the combination of parameters searched for (unless overriden with the -o / --out_path flag).

Species

The script must be run on single species at a time, given via the -s / --species flag as a full genus species with a '+' symbol in place of the space. E.g.:

  • -s Homo+sapiens
  • -s Mus+musculus

Note that it doesn't seem that the IMGT URL interface will accept either genus or species alone, and it can be particular about formatting so maintaining proper case is advised. However it seems to download sub-species (e.g.\ searching for Mus+musculus will return all covered strains).

Loci

This script is currently configured to download the four common TCR loci:

  • A / TRA / alpha
  • B / TRB / beta
  • G / TRG / gamma
  • D / TRD / delta

These must be provided to the script using the -L / --loci flag, giving it the desired loci as a string of characters, e.g. -L AB to just download alpha and beta sequences, or -L G to just download gamma. Alternatively -L TR will simply download all four chains' sequences (equivalent to -L ABGD).

Sequence types

This script is designed to help in the aid in the analysis of typical expressed repertoires, and thus is configured by default to download the relevant parts of the loci that end up involved in expressed transcripts. The download of each is achieved using specific flags:

  • -l / --get_l: download leader sequences
  • -v / --get_v: download V sequences
  • -d / --get_d: download D sequences
  • -j / --get_j: download J sequences
  • -c / --get_c: download constant region sequences

Note that these can be combined, e.g. -vdj will just download the V, D, and J gene sequences. Alternatively users can apply the -r / --get_all_regions flag to just download all of these regions (equivalent to -lvdjc).

Examples

The following is the basic command to download all relevant human sequences for all chains:

python3 IMGTgeneDL.py -s Homo+sapiens -L TR -r

While this is a command to just download delta chain J genes from mice:

python3 -i IMGTgeneDL.py -s Mus+musculus -j -L D

Download whole database

If no locus and gene type flags are used, or if the -a / --get_all flag is used, then the script will just download the whole of GENE-DB - all species, all genes, all loci. By default this downloads the ungapped nucleotide file, with all pseudogenes, and saves this to a file named with the date and the IMGT release used. This can be changed using the following flags:

  • -gap / --gapped: downloads the gapped FASTA instead of the ungapped
  • -ifp / --in_frame_p: downloads the 'inframeP' FASTA instead of the 'allP'
  • -o / --out_path: as above, sets the path to a specific file if you don't wish to use the automatic names or save in the same directory

Notes

Downloaded regions

The architecture of the TCR loci differs a little between the genes and across species, which the IMGT nomenclature has specific terms to cope with. However the URL based searching this script does requires provision exact exon names for each species. This script assumes generic defaults, but these can be overriden by providing specific details in the tab-delimited region-overrides.tsv file. This allows users to override which fields are downloaded, or download additional fields, by adding an entry for the relevant gene/species combination and filling in the final 'Field(s)' comma-delimited field with the IMGT labels to be downloaded.

The most relevant place this comes in to place is in the constant regions, which have differing numbers and names of exons. The relevant differences for these loci/species are that exon 4 of the alpha and delta chains is an UTR, while gamma chains lack a fourth exon and have duplicated exon 2 variants. If users wish to run the script to download specific sequences including constant regions for species other than humans or mice they will need to edit this document appropriately first.

The other default IMGT labels downloaded are:

  • L-PART1+L-PART2 for leader sequences
  • V-/D-/J-REGION for V/D/J genes

IMGT FASTA headers

The IMGT header FASTA fields (as reported in the output of GENE-DB) are:

The FASTA header contains 15 fields separated by '|':

1. IMGT/LIGM-DB accession number(s)
2. IMGT gene and allele name
3. species
4. IMGT allele functionality
5. exon(s), region name(s), or extracted label(s)
6. start and end positions in the IMGT/LIGM-DB accession number(s)
7. number of nucleotides in the IMGT/LIGM-DB accession number(s)
8. codon start, or 'NR' (not relevant) for non coding labels
9. +n: number of nucleotides (nt) added in 5' compared to the corresponding label extracted from IMGT/LIGM-DB
10. +n or -n: number of nucleotides (nt) added or removed in 3' compared to the corresponding label extracted from IMGT/LIGM-DB
11. +n, -n, and/or nS: number of added, deleted, and/or substituted nucleotides to correct sequencing errors, or 'not corrected' if non corrected sequencing errors
12. number of amino acids (AA): this field indicates that the sequence is in amino acids
13. number of characters in the sequence: nt (or AA)+IMGT gaps=total
14. partial (if it is)
15. reverse complementary (if it is)
Disclaimer

I am not affiliated with IMGT, and this tool is only shared as a way to increase the utility of their platform. Please TCR responsibly.

Owner
Jamie Heather
Postdoc research working in cancer immunology at MGH.
Jamie Heather
🔥 A Bot To Telegram For Download High Qulity Videos & Songs From Youtube

🔥 A Bot To Telegram For Download High Qulity Videos & Songs From Youtube 🎗 Fast And Free Bot No Need To Pay ✅ By SL-Alpha-X-Team ⚡

Official Alpha-X-Team Account 7 Aug 31, 2022
Python/Selenium script to scrape data about university courses

university-courses Python/Selenium script to scrape data about university courses. Script first extracts URLs of each courses homepage, then trawls ea

Sam Brown 1 Feb 02, 2022
A Python script to download PDB files associated with a Portable Executable (PE)

A Python script to download PDB files associated with a Portable Executable (PE)

Podalirius 33 Jan 03, 2023
A program which takes an Anime name or URL and downloads the specified range of episodes.

super-anime-downloader A console application written in Python3.x (GUI will be added soon) which takes a Anime Name/URL as input and downloads the ran

Sayyid Ali Sajjad Rizavi 26 Jul 18, 2022
😷 Dowload dos documentos da CPI da Pandemia

A CPI da Pandemia recebeu milhares de documentos públicos, todos disponibilizados no site do Senado Federal.

Eduardo Cuducos 98 Sep 23, 2022
A prometheus exporter for torrent downloader like qbittorrent/transmission/deluge

downloader-exporter A prometheus exporter for qBitorrent/Transmission/Deluge. Get metrics from multiple servers and offers them in a prometheus format

Lei Shi 41 Nov 18, 2022
Pypixiv - A fully-typed, asynchronous api wrapper for pixiv

pypixiv this library is a fully-typed, asynchronous api wrapper for pixiv. featu

DeltaLaboratory 2 Nov 16, 2022
You Can download any video/image in all social medias very easy and High Speed.

All-Downloader You Can download any video/image in all social medias very easy and High Speed. also you can easily download videos from web browsers s

Razor Kenway 14 Dec 16, 2022
Um projeto modesto para baixar vídeos do youtube usando tkinter como gui

Youtube Downloader Um projeto modesto para baixar vídeos do youtube usando tkinter como gui Instalação dos requirements: python3 setup.py ou python se

Sunlyx 2 Nov 25, 2021
⚙️ A CLI tool that can download songs from youtube.

⚙️ Music Downloader Music Downloader is a tool that can download songs from Youtube. Installation Base requirements: Python 3.7+ If you have Python 3.

matjs 4 Nov 03, 2021
Archivist - Easily archive 📦 Download folder to Google Drive ☁️

Archivist Script for archiving Download folder by uploading unmodified files to a Google Drive folder. Modified files will remain in the Download fold

Timing Liu 3 Sep 30, 2022
Make YouTube videos tasks in Todoist faster and time efficient!

Youtubist Basically fork of yt-dlp python module to my needs. You can paste playlist or channel link on the YouTube. It will automatically format to s

Konrad Konieczny 1 Dec 04, 2022
Automatically download and crop key information from the arxiv daily paper. (cpu version)

Automatically download and crop key information from the arxiv daily paper. (cpu version)

HeoLis 4 Jul 30, 2022
A cross platform front-end GUI of the popular youtube-dl written in wxPython.

youtube-dlG A cross platform front-end GUI of the popular youtube-dl media downloader written in wxPython. Supported sites Screenshots Requirements Py

8.7k Dec 31, 2022
Python-Youtube-Downloader - An Open Source Python Youtube Downloader

Python-Youtube-Downloader Hello There This Is An Open Source Python Youtube Down

Flex Tools 3 Jun 14, 2022
A Simple YouTube Video Downloader With Python

Simple YouTube Video Downloader Simple YouTube Video Downloader is an open source project with a very simple UI that tries to speed up the process of

Brian Han 2 Jan 03, 2022
Twitter Media Downloader (Telegram Bot)

Twitter Media Downloader (Telegram Bot)

Matin Baloochestani 8 Oct 27, 2022
Youtube Downloader Telegram Bot 😉

Youtube Dl bot 😉 Prerequisite ffmpeg install dependencies pip3 install -r requirements.txt Setup Bot - Change configuration config.py File - insta

Aryan Vikash 285 Dec 06, 2022
A tool written in Python to download all Snapmaps content from a specific location.

snapmap-archiver A tool written in Python to download all Snapmaps content from a specific location.

46 Dec 09, 2022
YouTube-Video-Downloader - Download Youtube Videos for free.

YouTube-Video-Downloader Download Youtube Videos for free. Installing Dependencies:- Windows pip install pytube Mac/Linux pip3 install pytube Clonin

Xception Inc. 1 Jan 01, 2022