Pyspark sam - Analyze Big Sequence Alignments with PySpark in AWS EMR

Overview

pyspark_sam

This repo hosts my code for the article "Analyze Big Sequence Alignments with PySpark in AWS EMR".

Prerequisite

  1. Spark

  2. AWS CLI

  3. AWS Account

Run

Follow the instruction in the article. Once you have uploaded the files into your S3 bucket, run

aws emr create-cluster --name "Spark_step_pip" \
    --release-label emr-6.5.0 \
    --applications Name=Spark \
    --log-uri s3://[your_S3_bucket]/logs/ \
    --instance-type m5.xlarge \
    --instance-count 3 \
    --bootstrap-actions Path=s3://[your_S3_bucket]/emr_bootstrap.sh \
    --use-default-roles --auto-terminate \
    --steps "Type=Spark,Name=SparkProgram,ActionOnFailure=CONTINUE,Args=[--deploy-mode,cluster,--master,yarn,--py-files,s3://[your_S3_bucket]/helper_function.py,s3://[your_S3_bucket]/spark_3mer.py,s3://[your_S3_bucket]/test.sam,[your_S3_bucket],sankey.json]" 

When the job finishes, download the sankey.json. And run this command to visualize:

python sankey.py sankey.json

Authors

  • Sixing Huang - Concept and Coding

License

This project is licensed under the MIT License - see the LICENSE file for details

Owner
Sixing Huang
A triple Neo4j certified data scientist. I am currently working at BGI in Shenzhen.
Sixing Huang
An open-source, multipurpose, configurable discord bot that does it all

Spacebot is an open source discord bot that is designed to be fun, easy to use, and replace every other discord bot out there!! Feel free to add a star ⭐ to the repository to promote the project!

Dhravya Shah 41 Dec 10, 2022
A modular Telegram Python bot running on python3 with a sqlalchemy database

Nao Tomori Robot Found Me On Telegram As Nao Tomori 🌼 A modular Telegram Python bot running on python3 with a sqlalchemy database. How to setup/deplo

Sena 84 Jan 04, 2023
WeChat SDK for Python

___ __ _______ ________ ___ ___ ________ _________ ________ ___ ___ |\ \ |\ \|\ ___ \ |\ ____\|\ \|\ \|\ __ \|\___

wechatpy 3.3k Dec 26, 2022
A Python client for the Softcite software mention recognizer server

Softcite software mention recognizer client Python client for using the Softcite software mention recognition service. It can be applied to individual

4 Feb 02, 2022
Validate all your Customer IAM Policies against AWS Access Analyzer - Policy Validation

✅ Access Analyzer - Batch Policy Validator This script will analyze using AWS Access Analyzer - Policy Validation all your account customer managed IA

Victor GRENU 41 Dec 12, 2022
Use Node JS Keywords In Python!!!

Use Node JS Keywords In Python!!!

Sancho Godinho 1 Oct 23, 2021
a discord libary that use to make discord bot with low efficiency and bad performance because I don't know how to manage the project

Aircord 🛩️ a discord libary that use to make discord bot with low efficiency and bad performance because I don't know how to manage the project Examp

Aircord 2 Oct 24, 2021
Project made to analyse movie trends

MovieTrends Project to analyse the daily movie trends from the website The Movie DataBase. The main idea is upload the results to a PostgreSQL server

Jazmín López Chacón 0 Feb 15, 2022
CryptoBar - A simple MenuBar app that shows the price of 3 cryptocurrencies

CryptoBar A very simple MenuBar app that shows the price of the following crypto

4 Jul 04, 2022
A simple Facebook Account generator, written in python (needs different Email so Accounts do not get banned)

FacebookAccountGenerator FAB is a Facebook-Account generating script, written in python Installation Use the package manager pip to install selenium p

MrOverload 7 Jan 05, 2023
RequestTrackerBot - Request Tracker Bot With Python

Request Tracker Bot This is a Request Tracker Bot repo, It is for those who uplo

Prince Jaiswal 1 Dec 30, 2021
Role Discord Members (by username) from File

Role Discord Members (by username) from File Bot Setup Navigate to https://discord.com/developers/applications Create a new application Navigate to th

Dylan Orrell 3 Jan 06, 2022
A wrapper for aqquiring Choice Coin directly through a Python Terminal. Leverages the TinyMan Python-SDK.

CHOICE_TinyMan_Wrapper A wrapper that allows users to acquire Choice Coin directly through their Terminal using ALGO and various Algorand Standard Ass

Choice Coin 16 Sep 24, 2022
A modular telegram Python bot running on python3 with an sqlalchemy database.

Saber A modular telegram Python bot running on python3 with an sqlalchemy database. Originally a marie fork - Saber has evolved further and was built

ZERO • アクバル . 4 Nov 09, 2021
Bot that embeds a random hysterical meme from Reddit into your text channel as an embedded message, using an API call.

Discord_Meme_Bot 🤣 Bot that embeds a random hysterical meme from Reddit into your text channel as an embedded message, using an API call. Add the bot

2 Jan 16, 2022
A free, minimal, lightweight, cross-platform, easily expandable Twitch IRC/API bot.

parky's twitch bot A free, minimal, lightweight, cross-platform, easily expandable Twitch IRC/API bot. Features 🔌 Connect to Twitch IRC chat! 🔌 Conn

Andreas Schneider 10 Dec 30, 2022
🤖 Telegram UserBot Untuk Memutar Lagu Dan Video Di Obrolan Suara Telegram.

🤖 Telegram UserBot Untuk Memutar Lagu Dan Video Di Obrolan Suara Telegram.

Fariz 2 Nov 13, 2021
Say "good morning" on Discord, in batch, one-click.

🌞 gm Good Morning! Usage Simply copy the channel_list to gm.py and fill authorization_list with authorization token(s). Enjoy. Authorization Please r

e 3 Nov 18, 2022
Request based Python module(s) to help with the Newegg raffle.

Newegg Shuffle Python module(s) to help you with the Newegg raffle How to use $ git clone https://github.com/Matthew17-21/Newegg-Shuffle $ cd Newegg-S

Matthew 45 Dec 01, 2022
An async python wrapper to interact with the Steam API and its CMs

steam.py A modern, easy to use, and async ready package to interact with the Steam API. Heavily inspired by discord.py and borrowing functionality fro

James Hilton-Balfe 90 Dec 15, 2022