Pyspark sam - Analyze Big Sequence Alignments with PySpark in AWSย EMR

Overview

pyspark_sam

This repo hosts my code for the article "Analyze Big Sequence Alignments with PySpark in AWS EMR".

Prerequisite

  1. Spark

  2. AWS CLI

  3. AWS Account

Run

Follow the instruction in the article. Once you have uploaded the files into your S3 bucket, run

aws emr create-cluster --name "Spark_step_pip" \
    --release-label emr-6.5.0 \
    --applications Name=Spark \
    --log-uri s3://[your_S3_bucket]/logs/ \
    --instance-type m5.xlarge \
    --instance-count 3 \
    --bootstrap-actions Path=s3://[your_S3_bucket]/emr_bootstrap.sh \
    --use-default-roles --auto-terminate \
    --steps "Type=Spark,Name=SparkProgram,ActionOnFailure=CONTINUE,Args=[--deploy-mode,cluster,--master,yarn,--py-files,s3://[your_S3_bucket]/helper_function.py,s3://[your_S3_bucket]/spark_3mer.py,s3://[your_S3_bucket]/test.sam,[your_S3_bucket],sankey.json]" 

When the job finishes, download the sankey.json. And run this command to visualize:

python sankey.py sankey.json

Authors

  • Sixing Huang - Concept and Coding

License

This project is licensed under the MIT License - see the LICENSE file for details

Owner
Sixing Huang
A triple Neo4j certified data scientist. I am currently working at BGI in Shenzhen.
Sixing Huang
๐€ ๐ฆ๐จ๐๐ฎ๐ฅ๐š๐ซ ๐“๐ž๐ฅ๐ž๐ ๐ซ๐š๐ฆ ๐†๐ซ๐จ๐ฎ๐ฉ ๐ฆ๐š๐ง๐š๐ ๐ž๐ฆ๐ž๐ง๐ญ ๐›๐จ๐ญ ๐ฐ๐ข๐ญ๐ก ๐ฎ๐ฅ๐ญ๐ข๐ฆ๐š๐ญ๐ž ๐Ÿ๐ž๐š๐ญ๐ฎ๐ซ๐ž๐ฌ

๐‡๐จ๐ฐ ๐“๐จ ๐ƒ๐ž๐ฉ๐ฅ๐จ๐ฒ For easiest way to deploy this Bot click on the below button ๐Œ๐š๐๐ž ๐๐ฒ ๐’๐ฎ๐ฉ๐ฉ๐จ๐ซ๐ญ ๐†๐ซ๐จ๐ฎ๐ฉ ๐’๐จ๐ฎ๐ซ๐œ๐ž๐ฌ ๐†๐ž๐ง๐ž?

Mukesh Solanki 2 Oct 06, 2021
A Python wrapper for the DeepL API

deepl.py A Python wrapper for the DeepL API installing Install and update using pip: pip install deepl.py A simple example. # Sync Sample import deep

grarich 18 Dec 12, 2022
RChecker - Checker for minecraft servers

๐Ÿ”Ž RChecker v1.0 Checker for Minecraft Servers ๐Ÿ’ป Supported operating systems: โœ…

Pedro Vega 1 Aug 30, 2022
Generate discord nitro codes and check them

Discord Nitro Generator and Checker A discord nitro generator and checker for all your nitro needs Explore the docs ยป Report Bug ยท Request Feature ยท J

509 Jan 02, 2023
The best discord Nuk3r !

Discord - Nuker โ˜ข๏ธ Nuk3r โ˜ข๏ธ STEP 1 โœ… We go create discord bot ! [] Go on https://discord.com/developers/applications [] Set the name of your applica

2s.py 1 Apr 16, 2022
A simple tool that lets you know when you are out of Lost Ark's queues

Overview A simple tool that lets you know when you are out of Lost Ark's queues. You can be notified via: Sound: the app will play a sound Discord web

Nelson 3 Feb 15, 2022
Telegram bot to provide links of different types of files you send

File To Link Bot - IDN-C-X Telegram bot to provide links of different types of files you send. WHAT CAN THIS BOT DO Is it a nuisance to send huge file

IDNCoderX 3 Oct 26, 2021
Yes, it's true :yellow_heart: This repository has 326 stars.

Yes, it's true! Inspired by a similar repository from @RealPeha, but implemented using a webhook on AWS Lambda and API Gateway, so it's serverless! If

510 Dec 28, 2022
Implement backup and recovery with AWS Backup across your AWS Organizations using a CI/CD pipeline (AWS CodePipeline).

Backup and Recovery with AWS Backup This repository provides you with a management and deployment solution for implementing Backup and Recovery with A

AWS Samples 8 Nov 22, 2022
Binance Futures Client

Binance Futures Client

4 Aug 02, 2022
Cookiecutter templates for Serverless applications using AWS SAM and the Rust programming language.

Cookiecutter SAM template for Lambda functions in Rust This is a Cookiecutter template to create a serverless application based on the Serverless Appl

AWS Samples 24 Nov 11, 2022
An API wrapper for the file.io web service.

๐Ÿ—ƒ๏ธ File.io An API wrapper for the file.io web service. Install $ pip3 install fileio or

nkot56297 1 Dec 18, 2021
Discord Bot for server hosts, devs, and admins. Analyzes timings reports & uploads text files to hastebin. Developed by https://birdflop.com.

"Botflop" Click here to invite Botflop to your server. Current abilities Analyze timings reports Paste a timings report to review an in-depth descript

Purpur 76 Dec 31, 2022
Flask-SQLAlchemy API for daisuki-web

๐Ÿ’Ÿ Anime Daisuki! API API de animes com cadastro de usuรกrios. O usuรกrio autenticado pode avaliar e favoritar animes, comentar episรณdios e verificar o

Paulo Thor 1 Nov 04, 2021
A supabase client for python

supabase-client A Supabase client for Python. This mirrors the design of supabase-js Full documentation: https://keosariel.github.io/2021/08/08/supaba

kenneth gabriel 11 Dec 19, 2022
โ›‘ REDCap API interface in Python

REDCap API in Python Description Supports structured data extraction for REDCap projects. The API module d3b_redcap_api.redcap.REDCapStudy can be logi

D3b 1 Nov 21, 2022
Free and Open Source Group Voice chat music player for telegram โค๏ธ with button support youtube playback support

Free and Open Source Group Voice chat music player for telegram โค๏ธ with button support youtube playback support

Sehath Perera 1 Jan 08, 2022
A multi-purpose Discord bot with simple moderation commands, reaction roles, reminders, and much more!

Nokari This is the rewrite of Nokari. There are still a lot of things to be done. I'm still working on the internal logic, so the bot basically has no

Norizon 13 Nov 17, 2022
Python wrappers for INHECO ODTC and SCILA libraries by INHECO GmbH.

Python wrappers for INHECO ODTC and SCILA libraries by INHECO GmbH.

1 Feb 09, 2022