pyspark_sam: Analyze Big Sequence Alignments with PySpark in AWS EMR

Overview

pyspark_sam

This repo hosts my code for the article "Analyze Big Sequence Alignments with PySpark in AWS EMR".
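The job reads a SAM file (test.sam in the command below). Judging only by the script name spark_3mer.py, it likely counts 3-mers in the aligned reads. The sketch below is an assumption, not the repo's code: it shows one way to extract overlapping k-mers from the SEQ column (the 10th mandatory SAM field) of each alignment line. The function name `sam_line_to_kmers` is hypothetical. Because it is plain Python, it could be passed to PySpark's `rdd.flatMap`.

```python
from collections import Counter
from typing import Iterator

SEQ_COLUMN = 9  # SEQ is the 10th mandatory (tab-separated) field of a SAM record


def sam_line_to_kmers(line: str, k: int = 3) -> Iterator[str]:
    """Yield every overlapping k-mer from the SEQ field of one SAM line.

    Hypothetical helper for illustration; spark_3mer.py may work differently.
    """
    # Header lines start with '@' and carry no read sequence.
    if not line or line.startswith("@"):
        return
    fields = line.rstrip("\n").split("\t")
    if len(fields) <= SEQ_COLUMN:
        return  # malformed or truncated record
    seq = fields[SEQ_COLUMN].upper()
    if seq == "*":
        return  # '*' means the sequence is not stored
    for i in range(len(seq) - k + 1):
        yield seq[i : i + k]


if __name__ == "__main__":
    sample = [
        "@HD\tVN:1.6\tSO:coordinate",
        "r1\t0\tchr1\t100\t60\t6M\t*\t0\t0\tACGTAC\t*",
    ]
    # Local equivalent of: sc.textFile(path).flatMap(sam_line_to_kmers).countByValue()
    counts = Counter(kmer for line in sample for kmer in sam_line_to_kmers(line))
    print(counts)
```

On a cluster, the same function would run per line across partitions. The counting logic stays identical whether it runs locally or in Spark.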

Prerequisites

  1. Spark

  2. AWS CLI

  3. AWS Account

Run

Follow the instructions in the article. Once you have uploaded the files to your S3 bucket, run:

aws emr create-cluster --name "Spark_step_pip" \
    --release-label emr-6.5.0 \
    --applications Name=Spark \
    --log-uri s3://[your_S3_bucket]/logs/ \
    --instance-type m5.xlarge \
    --instance-count 3 \
    --bootstrap-actions Path=s3://[your_S3_bucket]/emr_bootstrap.sh \
    --use-default-roles --auto-terminate \
    --steps "Type=Spark,Name=SparkProgram,ActionOnFailure=CONTINUE,Args=[--deploy-mode,cluster,--master,yarn,--py-files,s3://[your_S3_bucket]/helper_function.py,s3://[your_S3_bucket]/spark_3mer.py,s3://[your_S3_bucket]/test.sam,[your_S3_bucket],sankey.json]" 
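If you prefer to submit the same step from Python, the sketch below builds a step definition shaped like the one accepted by boto3's EMR client, for example `emr.add_job_flow_steps(JobFlowId=cluster_id, Steps=[step])`. It assumes that the CLI's `Type=Spark` shorthand corresponds to running `spark-submit` through `command-runner.jar`, which is our understanding of how EMR 4.x and later execute Spark steps. The function name `build_spark_step` is hypothetical and not part of this repo.

```python
from typing import Any, Dict, List


def build_spark_step(bucket: str, output_name: str = "sankey.json") -> Dict[str, Any]:
    """Build an EMR step dict mirroring the --steps argument of the CLI command.

    Hypothetical helper; assumes Type=Spark == spark-submit via command-runner.jar.
    """
    s3 = f"s3://{bucket}"
    spark_args: List[str] = [
        "spark-submit",
        "--deploy-mode", "cluster",
        "--master", "yarn",
        "--py-files", f"{s3}/helper_function.py",  # helper module shipped to executors
        f"{s3}/spark_3mer.py",                      # driver script
        f"{s3}/test.sam",                           # script arg 1: input SAM file
        bucket,                                     # script arg 2: bucket name (no s3:// prefix)
        output_name,                                # script arg 3: output file name
    ]
    return {
        "Name": "SparkProgram",
        "ActionOnFailure": "CONTINUE",
        "HadoopJarStep": {"Jar": "command-runner.jar", "Args": spark_args},
    }


if __name__ == "__main__":
    print(build_spark_step("my-example-bucket"))
```

Note that the CLI command above uses --auto-terminate, so the cluster shuts down once its steps finish. Adding steps later with boto3 only works against a cluster that is still running.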

When the job finishes, download sankey.json and run the following command to visualize it:

python sankey.py sankey.json

Authors

  • Sixing Huang - Concept and Coding

License

This project is licensed under the MIT License. See the LICENSE file for details.

Owner
Sixing Huang
A triple Neo4j certified data scientist. I am currently working at BGI in Shenzhen.