Pyspark sam - Analyze Big Sequence Alignments with PySpark in AWS EMR

Overview

pyspark_sam

This repo hosts my code for the article "Analyze Big Sequence Alignments with PySpark in AWS EMR".

Prerequisite

  1. Spark

  2. AWS CLI

  3. AWS Account

Run

Follow the instruction in the article. Once you have uploaded the files into your S3 bucket, run

aws emr create-cluster --name "Spark_step_pip" \
    --release-label emr-6.5.0 \
    --applications Name=Spark \
    --log-uri s3://[your_S3_bucket]/logs/ \
    --instance-type m5.xlarge \
    --instance-count 3 \
    --bootstrap-actions Path=s3://[your_S3_bucket]/emr_bootstrap.sh \
    --use-default-roles --auto-terminate \
    --steps "Type=Spark,Name=SparkProgram,ActionOnFailure=CONTINUE,Args=[--deploy-mode,cluster,--master,yarn,--py-files,s3://[your_S3_bucket]/helper_function.py,s3://[your_S3_bucket]/spark_3mer.py,s3://[your_S3_bucket]/test.sam,[your_S3_bucket],sankey.json]" 

When the job finishes, download the sankey.json. And run this command to visualize:

python sankey.py sankey.json

Authors

  • Sixing Huang - Concept and Coding

License

This project is licensed under the MIT License - see the LICENSE file for details

Owner
Sixing Huang
A triple Neo4j certified data scientist. I am currently working at BGI in Shenzhen.
Sixing Huang
Gathers data and displays metrics related to climate change and resource depletion on a PowerBI report.

Apocalypse Status Dashboard Purpose Climate change and resource depletion are grave long-term dangers. The code in this repository will pull data from

Summer Is Here 1 Nov 12, 2021
:electric_plug: Generating short urls with python has never been easier

pyshorteners A simple URL shortening API wrapper Python library. Installing pip install pyshorteners Documentation https://pyshorteners.readthedocs.i

Ellison 351 Jan 03, 2023
A WhatsApp Crashing Tool for Termux

CrashW A WhatsApp Crashing Tool For Termux Users Installing : apt update && apt upgrade -y pkg install python3 pkg install git git clone git://gith

Gokul Mahato 20 Dec 27, 2022
Discord-Mass-Mention - Yup the title says it all

Protocol - Mass Mention (i havent tested this with any token other than my own t

Mallowies 14 Nov 06, 2022
Wrapper for the Swiss Parliament API for Python

swissparlpy This module provides easy access to the data of the OData webservice of the Swiss parliament. Table of Contents Installation Usage Get tab

Stefan Oderbolz 8 Jun 13, 2022
Adriano's Diets Consulting Bot - Parses and extracts informations about your diet (files in the Adriano's format).

Adriano's Diets Consulting Bot - Parses and extracts informations about your diet (files in the Adriano's format).

Marco A. 2 Feb 07, 2022
Parse discord tokens from any file, even if there is other shit in the file with them.

Discord-Token-Parser Parse discord tokens from any file, even if there is other shit in the file with them. Any. File. I glued together all html from

4 May 07, 2022
MusicBot is the original Discord music bot written for Python 3.5+, using the discord.py library

The original MusicBot for Discord (formerly SexualRhinoceros/MusicBot)

Just Some Bots 2.9k Jan 02, 2023
PRAW, an acronym for "Python Reddit API Wrapper", is a python package that allows for simple access to Reddit's API.

PRAW: The Python Reddit API Wrapper PRAW, an acronym for "Python Reddit API Wrapper", is a Python package that allows for simple access to Reddit's AP

Python Reddit API Wrapper Development 3k Dec 29, 2022
Sms-bomber - A Simple Browser Automated Bomber

A Simple Browser Automated Bomber which uses selenium :D Star the Repo and Follo

Terminal1337 9 Apr 11, 2022
Status-embed - Cool open source profile embed for Discord

Current Status : Incomplete Status Embed Status Embed is an awesome open source

Ritabrata Das 2 Feb 17, 2022
Discord E-Store Bot

A delivery bot for Discord, works like Amazon where real users can pack & deliver orders in different servers!

Amit Pathak 2 Jan 28, 2022
The Research PACS on AWS solution facilitates researchers' access medical images stored in the clinical PACS in a secure and seamless manner

Research PACS on AWS Challenge to solve Solution presentation Deploy the solution Further reading Releases License Challenge to solve The rise of new

AWS Samples 23 Sep 09, 2022
Python library for generating sequences with uniform stimulus history

Sampling Euler tours for uniform stimulus history Table of Contents About Examples Experiment 1 Experiment 2 Experiment 3 Experiment 4 Experiment 5 Co

5 Nov 11, 2021
A Discord bot that rewards players in Minecraft for sending messages on Discord

MCRewards-Discord-Bot A Discord bot that rewards players in Minecraft for sending messages on Discord How to setup: Download this git as a .zip, or cl

3 Dec 26, 2021
Python library for Spurwing API to schedule appointments, manage calendars and custom integrations.

Spurwing API Python Library Lightweight Python library for Spurwing's API. Spurwing's API makes it easy to add robust scheduling and booking to your a

Spurwing 1 Jul 14, 2021
⚡ Simple mass dm selfbot for Discord written in python3.

Zapp Simple mass dm selfbot for Discord written in python3. Warning. This project was made for educational purposes only! I take no responsibility for

Ѵιcнч 34 Nov 01, 2022
Full-featured Python interface for the Slack API

This repository is archived and will not receive any updates It's time to say goodbye. I'm archiving Slacker. It's been getting harder to find time to

Oktay Sancak 1.6k Dec 13, 2022
A python Discord wrapper made in well, python.

discord.why A python Discord wrapper made in well, python. Made to be used by devs who want something a bit more, general. Basic Examples Sending a me

HellSec 6 Mar 26, 2022
A customizable, multilanguage Telegram shop bot with Telegram Payments support

Greed A customizable, multilanguage Telegram shop bot with Telegram Payments support! Demo Send a message to @greedtestbot on Telegram to view a demo

Stefano Pigozzi 328 Dec 29, 2022