Pyspark sam - Analyze Big Sequence Alignments with PySpark in AWS EMR

Overview

pyspark_sam

This repo hosts my code for the article "Analyze Big Sequence Alignments with PySpark in AWS EMR".

Prerequisite

  1. Spark

  2. AWS CLI

  3. AWS Account

Run

Follow the instruction in the article. Once you have uploaded the files into your S3 bucket, run

aws emr create-cluster --name "Spark_step_pip" \
    --release-label emr-6.5.0 \
    --applications Name=Spark \
    --log-uri s3://[your_S3_bucket]/logs/ \
    --instance-type m5.xlarge \
    --instance-count 3 \
    --bootstrap-actions Path=s3://[your_S3_bucket]/emr_bootstrap.sh \
    --use-default-roles --auto-terminate \
    --steps "Type=Spark,Name=SparkProgram,ActionOnFailure=CONTINUE,Args=[--deploy-mode,cluster,--master,yarn,--py-files,s3://[your_S3_bucket]/helper_function.py,s3://[your_S3_bucket]/spark_3mer.py,s3://[your_S3_bucket]/test.sam,[your_S3_bucket],sankey.json]" 

When the job finishes, download the sankey.json. And run this command to visualize:

python sankey.py sankey.json

Authors

  • Sixing Huang - Concept and Coding

License

This project is licensed under the MIT License - see the LICENSE file for details

Owner
Sixing Huang
A triple Neo4j certified data scientist. I am currently working at BGI in Shenzhen.
Sixing Huang
Wrapper for the Swiss Parliament API for Python

swissparlpy This module provides easy access to the data of the OData webservice of the Swiss parliament. Table of Contents Installation Usage Get tab

Stefan Oderbolz 8 Jun 13, 2022
A little proxy tool based on Tencent Cloud Function Service.

SCFProxy 一个基于腾讯云函数服务的免费代理池。 安装 python3 -m venv .venv source .venv/bin/activate pip3 install -r requirements.txt 项目配置 函数配置 开通腾讯云函数服务 在 函数服务 新建 中使用自定义

Mio 716 Dec 26, 2022
AI-El-Yazisini-Tanima - Fotoğraflardaki El Yazını Yapay Zeka İle Otomatik Tanıma Yazılımı

AI-El Yazısını Tanıma Fotoğraflardaki El Yazını Yapay Zeka İle Otomatik Tanıma Yazılımı Amaç : Birden fazla makine öğrenmesi modelini bir arada kullan

Özgür Tokay 3 Mar 02, 2022
Netflix Movies and TV Series Downloader Tool including CDM L1 which you guys can Donwload 4K Movies

NFRipper2.0 I could not shared all the code here Because its has lots of files inisde it https://new.gdtot.me/file/86651844 - Downoad File From Here.

Kiran 15 May 06, 2022
Pretend to be a discord bot

Pretendabot © Pretend to be a discord bot! About Pretendabot© is an app that lets you become a discord bot!. It uses discord intrigrations(webhooks) a

Advik 3 Apr 24, 2022
Turns any script into a telegram bot

pytobot Turns any script into a telegram bot Install pip install --upgrade pytobot Usage Script: while True: message = input() if message == "

Dmitry Kotov 17 Jan 06, 2023
Lending-Club-Loans - Using TensorFlow to create an ANN model to predict whether people would charge off or pay back their loans.

Lending Club Loans: Brief Introduction LendingClub is a US peer-to-peer lending company, headquartered in San Francisco, California.[3] It was the fir

Ali Akram 1 Jan 03, 2022
A bot that connects your guild chat to a Discord channel, written in Python.

Guild Chat Bot A bot that connects your guild chat to a discord channel. Uses discord.py and pyCraft Deploy on Railway Railway is a cloud development

Evernote 10 Sep 25, 2022
Py hec token mgr - Create HEC tokens in Cribl Stream through the API

Add HEC tokens via API calls This script is intended as an example of how to aut

Jon Rust 3 Mar 04, 2022
Signs API calls to SberCloud.Advanced with AK/SK

sbercloud-api-aksk Signs API calls to SberCloud.Advanced with AK/SK This script is a courtesy of @sadpdtchr Description Sometimes there is a need to m

Peter Predtechensky 1 Nov 30, 2021
使用appium进行抖音粉丝的自动化获取

DYfans 使用appium进行抖音粉丝的自动化获取 工具: appium appium inspector Fiddler 夜神模拟器或者安卓手机 mitmdump mitmproxy 推荐使用安卓5.0夜神模拟器 库: appium selenium json 环境: jdk 安卓sdk 安卓

kaba 0 Mar 25, 2022
Stock market bot that will be used to learn about API calls and database connections.

Stock market bot that will be used to learn about API calls and database connections.

1 Dec 24, 2021
Jupyter notebooks and AWS CloudFormation template to show how Hudi, Iceberg, and Delta Lake work

Modern Data Lake Storage Layers This repository contains supporting assets for my research in modern Data Lake storage layers like Apache Hudi, Apache

Damon P. Cortesi 25 Oct 31, 2022
Black-hat with python

black-hat_python Advantages - More advance tool Easy to use allows updating tool update - run bash update.sh Here -: Command to install tool main- clo

Hackers Tech 2 Feb 10, 2022
Georeferencing large amounts of data for free.

Geolocate Georeferencing large amounts of data for free. Special thanks to @brunodepauloalmeida and the whole team for the contributions. How? It's us

Gabriel Gazola Milan 23 Dec 30, 2022
A Discord Rich Presence App to set your own custom rich presence.

discord-rich-presence A Discord Rich Presence App to set your own custom rich presence. #BUILDS Ready to use package are available inside "finalpackag

1 Nov 22, 2021
A Fork of Gitlab's Permifrost tool for managing Snowflake Permissions

permifrost-fork This is a fork of the GitLab permifrost project. As the GitLab team is not currently maintaining the project, we've taken on maintenac

Hightouch 7 Oct 13, 2021
A lightweight Python wrapper for the IG Markets API

trading_ig A lightweight Python wrapper for the IG Markets API. Simplifies access to the IG REST and Streaming APIs with a live or demo account. What

IG Python 247 Dec 08, 2022
A python script to send sms anonymously with SMS Gateway API. Works on command line terminal.

incognito-sms-sender A python script to send sms anonymously with SMS Gateway API. Works on command line terminal. Download and run script Go to API S

ʀᴇxɪɴᴀᴢᴏʀ 1 Oct 25, 2021