Pyspark sam - Analyze Big Sequence Alignments with PySpark in AWS EMR

Overview

pyspark_sam

This repo hosts my code for the article "Analyze Big Sequence Alignments with PySpark in AWS EMR".

Prerequisites

  1. Spark

  2. AWS CLI

  3. AWS Account
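
Before launching anything, it can help to confirm that the command-line tools are installed. The following is a minimal sketch, not part of this repo: the helper name `missing_tools` is hypothetical, and it assumes Spark is exposed as `spark-submit` and the AWS CLI as `aws` on your PATH. It cannot check whether your AWS account is configured.

```python
import shutil
from typing import Iterable, List


def missing_tools(tools: Iterable[str] = ("spark-submit", "aws")) -> List[str]:
    """Return the names of the given executables that are not found on PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]


if __name__ == "__main__":
    missing = missing_tools()
    if missing:
        print("Not found on PATH: " + ", ".join(missing))
    else:
        print("spark-submit and aws were both found on PATH.")
```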

Run

Follow the instructions in the article. Once you have uploaded the files to your S3 bucket, run:

aws emr create-cluster --name "Spark_step_pip" \
    --release-label emr-6.5.0 \
    --applications Name=Spark \
    --log-uri s3://[your_S3_bucket]/logs/ \
    --instance-type m5.xlarge \
    --instance-count 3 \
    --bootstrap-actions Path=s3://[your_S3_bucket]/emr_bootstrap.sh \
    --use-default-roles --auto-terminate \
    --steps "Type=Spark,Name=SparkProgram,ActionOnFailure=CONTINUE,Args=[--deploy-mode,cluster,--master,yarn,--py-files,s3://[your_S3_bucket]/helper_function.py,s3://[your_S3_bucket]/spark_3mer.py,s3://[your_S3_bucket]/test.sam,[your_S3_bucket],sankey.json]" 
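
The `[your_S3_bucket]` placeholder appears six times in the command above, so it is easy to miss one when editing by hand. As a rough sketch, the same command can be assembled in Python with the bucket name filled in once. The function name `build_create_cluster_command` and the bucket `my-example-bucket` are hypothetical, and the sketch assumes the square brackets are part of the placeholder rather than literal characters.

```python
import shlex
from typing import List


def build_create_cluster_command(
    bucket: str,
    sam_file: str = "test.sam",
    output_name: str = "sankey.json",
) -> List[str]:
    """Return the README's `aws emr create-cluster` command as an argument list."""
    base = f"s3://{bucket}"
    # Arguments handed to the Spark step, in the same order as the README command.
    step_args = [
        "--deploy-mode", "cluster",
        "--master", "yarn",
        "--py-files", f"{base}/helper_function.py",
        f"{base}/spark_3mer.py",
        f"{base}/{sam_file}",
        bucket,
        output_name,
    ]
    step = (
        "Type=Spark,Name=SparkProgram,ActionOnFailure=CONTINUE,"
        f"Args=[{','.join(step_args)}]"
    )
    return [
        "aws", "emr", "create-cluster",
        "--name", "Spark_step_pip",
        "--release-label", "emr-6.5.0",
        "--applications", "Name=Spark",
        "--log-uri", f"{base}/logs/",
        "--instance-type", "m5.xlarge",
        "--instance-count", "3",
        "--bootstrap-actions", f"Path={base}/emr_bootstrap.sh",
        "--use-default-roles", "--auto-terminate",
        "--steps", step,
    ]


if __name__ == "__main__":
    # Print a shell-quoted command line; nothing is sent to AWS here.
    print(shlex.join(build_create_cluster_command("my-example-bucket")))
```

The script only prints the command, so you can review it before pasting it into a terminal.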

When the job finishes, download sankey.json and run this command to visualize it:

python sankey.py sankey.json

Authors

  • Sixing Huang - Concept and Coding

License

This project is licensed under the MIT License. See the LICENSE file for details.
