This program analyzes a DNA sequence and outputs snippets of DNA that are likely to be protein-coding genes.

Overview

The Gene_finder program

This program is designed to manipulate data from DNA and find the genes that are encoded by the DNA sequence. It contains 11 independent functions.These functions are :

  • get_complement(nucleotide)

    • Returns the complementary nucleotide nucleotide: a nucleotide (A, C, G, or T) represented as a string returns: the complementary nucleotide get_complement('A') 'T'
  • get_reverse_complement(dna)

    • Computes the reverse complementary sequence of DNA for the specfied DNA sequence
  • rest_of_ORF(dna)

    • Takes a DNA sequence that is assumed to begin with a start codon and returns the sequence up to but not including the first in frame stop codon. If there is no in frame stop codon, returns the whole string.
  • find_all_ORFs_oneframe(dna)

    • Finds all non-nested open reading frames in the given DNA sequence and returns them as a list. This function should only find ORFs that are in the default frame of the sequence (i.e. they start on indices that are multiples of 3). By non-nested we mean that if an ORF occurs entirely within another ORF, it should not be included in the returned list of ORFs.
  • find_all_ORFs(dna)

    • Finds all non-nested open reading frames in the given DNA sequence in all 3 possible frames and returns them as a list. By non-nested we mean that if an ORF occurs entirely within another ORF and they are both in the same frame, it should not be included in the returned list of ORFs.
  • longest_ORF(dna)

    • Finds the longest ORF on both strands of the specified DNA and returns it as a string
  • find_all_ORFs_both_strands(dna)

    • Finds all non-nested open reading frames in the given DNA sequence on both strands
  • shuffle_string(s)

    • Shuffles the characters in the input string
  • longest_ORF_noncoding(dna, num_trials)

    • Computes the maximum length of the longest ORF over num_trials shuffles of the specfied DNA sequence
  • coding_strand_to_AA(dna)

    • Computes the Protein encoded by a sequence of DNA. This function does not check for start and stop codons (it assumes that the input DNA sequence represents an protein coding region)
  • gene_finder(dna)

    • Returns the amino acid sequences that are likely coded by the specified dna

To start the program import the MainMenu() function from the menu module:

disclaimer!!

In the menu module, MainMenu function ... you need to ensure the path matches your path. The path given here matches the data given

COVID-19 deaths statistics around the world

COVID-19-Deaths-Dataset COVID-19 deaths statistics around the world This is a daily updated dataset of COVID-19 deaths around the world. The dataset c

Nisa Efendioğlu 4 Jul 10, 2022
Automated Exploration Data Analysis on a financial dataset

Automated EDA on financial dataset Just a simple way to get automated Exploration Data Analysis from financial dataset (OHLCV) using Streamlit and ta.

Darío López Padial 28 Nov 27, 2022
collect training and calibration data for gaze tracking

Collect Training and Calibration Data for Gaze Tracking This tool allows collecting gaze data necessary for personal calibration or training of eye-tr

Pascal 5 Dec 17, 2022
A Pythonic introduction to methods for scaling your data science and machine learning work to larger datasets and larger models, using the tools and APIs you know and love from the PyData stack (such as numpy, pandas, and scikit-learn).

This tutorial's purpose is to introduce Pythonistas to methods for scaling their data science and machine learning work to larger datasets and larger models, using the tools and APIs they know and lo

Coiled 102 Nov 10, 2022
Flenser is a simple, minimal, automated exploratory data analysis tool.

Flenser Have you ever been handed a dataset you've never seen before? Flenser is a simple, minimal, automated exploratory data analysis tool. It runs

John McCambridge 79 Sep 20, 2022
Hatchet is a Python-based library that allows Pandas dataframes to be indexed by structured tree and graph data.

Hatchet Hatchet is a Python-based library that allows Pandas dataframes to be indexed by structured tree and graph data. It is intended for analyzing

Lawrence Livermore National Laboratory 14 Aug 19, 2022
Hangar is version control for tensor data. Commit, branch, merge, revert, and collaborate in the data-defined software era.

Overview docs tests package Hangar is version control for tensor data. Commit, branch, merge, revert, and collaborate in the data-defined software era

Tensorwerk 193 Nov 29, 2022
ETL pipeline on movie data using Python and postgreSQL

Movies-ETL ETL pipeline on movie data using Python and postgreSQL Overview This project consisted on a automated Extraction, Transformation and Load p

Juan Nicolas Serrano 0 Jul 07, 2021
EOD Historical Data Python Library (Unofficial)

EOD Historical Data Python Library (Unofficial) https://eodhistoricaldata.com Installation python3 -m pip install eodhistoricaldata Note Demo API key

Michael Whittle 20 Dec 22, 2022
Scraping and analysis of leetcode-compensations page.

Leetcode compensations report Scraping and analysis of leetcode-compensations page.

utsav 96 Jan 01, 2023
This mini project showcase how to build and debug Apache Spark application using Python

Spark app can't be debugged using normal procedure. This mini project showcase how to build and debug Apache Spark application using Python programming language. There are also options to run Spark a

Denny Imanuel 1 Dec 29, 2021
Tools for working with MARC data in Catalogue Bridge.

catbridge_tools Tools for working with MARC data in Catalogue Bridge. Borrows heavily from PyMarc

1 Nov 11, 2021
Codes for the collection and predictive processing of bitcoin from the API of coinmarketcap

Codes for the collection and predictive processing of bitcoin from the API of coinmarketcap

Teo Calvo 5 Apr 26, 2022
Pipeline to convert a haploid assembly into diploid

HapDup (haplotype duplicator) is a pipeline to convert a haploid long read assembly into a dual diploid assembly. The reconstructed haplotypes

Mikhail Kolmogorov 50 Jan 05, 2023
Multiple Pairwise Comparisons (Post Hoc) Tests in Python

scikit-posthocs is a Python package that provides post hoc tests for pairwise multiple comparisons that are usually performed in statistical data anal

Maksim Terpilowski 264 Dec 30, 2022
ASTR 302: Python for Astronomy (Winter '22)

ASTR 302, Winter 2022, University of Washington: Python for Astronomy Mario Jurić Location When: 2:30-3:50, Monday & Wednesday, Winter quarter 2022 Wh

UW ASTR 302: Python for Astronomy 4 Jan 12, 2022
Approximate Nearest Neighbor Search for Sparse Data in Python!

Approximate Nearest Neighbor Search for Sparse Data in Python! This library is well suited to finding nearest neighbors in sparse, high dimensional spaces (like text documents).

Meta Research 906 Jan 01, 2023
CaterApp is a cross platform, remotely data sharing tool created for sharing files in a quick and secured manner.

CaterApp is a cross platform, remotely data sharing tool created for sharing files in a quick and secured manner. It is aimed to integrate this tool with several more features including providing a U

Ravi Prakash 3 Jun 27, 2021
DenseClus is a Python module for clustering mixed type data using UMAP and HDBSCAN

DenseClus is a Python module for clustering mixed type data using UMAP and HDBSCAN. Allowing for both categorical and numerical data, DenseClus makes it possible to incorporate all features in cluste

Amazon Web Services - Labs 53 Dec 08, 2022
Nobel Data Analysis

Nobel_Data_Analysis This project is for analyzing a set of data about people who have won the Nobel Prize in different fields and different countries

Mohammed Hassan El Sayed 1 Jan 24, 2022