Rootski - Full codebase for rootski.io (without the data)

Overview

breakdown-svg

📣 Welcome to the Rootski codebase!

This is the codebase for the application running at rootski.io.

🗒 Note: You can find information and training on the architecture, ticket board, development practices, and how to contribute on our knowledge base.

Rootski is a full-stack application for studying the Russian language by learning roots.

Rootski uses an A.I. algorithm called a "transformer" to break Russian words into roots. Rootski enriches the word breakdowns with data such as definitions, grammar information, related words, and examples and then displays this information to users for them to study.

How is the Rootski project run? (Hint, get involved here 😃 )

Rootski is developed by volunteers!

We use Rootski as a platform to learn and mentor anyone with an interest in frontend/backend development, developing data science models, data engineering, MLOps, DevOps, UX, and running a business. Although the code is open-source, the license for reuse and redistribution is tightly restricted.

The premise for building Rootski "in the open" is this: possibly the best ways to learn to write production-ready, high quality software is to

  1. explore other high-quality software that is already written
  2. develop an application meant to support a large number of users
  3. work with experienced mentors

For better or worse, it's hard to find code for large software systems built to be hosted in the cloud and used by a large number of customers. This is because virtually all apps that fit this description... are proprietary 🤣 . That makes (1) hard.

(2) can be inaccessible due to the amount of time it takes to write well-written software systems without a team (or mentorship). If you're only interested in a sub-part of engineering, or if you are a beginner, it can be infeasible to build an entire production system on your own. Think of this as working on a personal project... with a bunch of other fun people working on it with you.

Contributors

Onboarded and contributed features :D

  • Eric Riddoch - Been working on Rootski for 3 years and counting!
  • Ryan Gardner - Helping with all of the legal/business aspects and dabbling in development

Friends

Completed a lot of the Rootski onboarding and chat with us in our Slack workspace about miscellanious code questions, careers, advice, etc.

  • Isaac Robbins - Learning and building experience in MLOps and DevOps!
  • Colin Varney - Full-stack python guy. Is working his first full-time software job!
  • Fazleem Baig - MLOps guy. Quite experienced with Python and learning about AWS. Working for an AI startup in Canada.
  • Ayse (Aysha) Arslan - Learning about all things MLOps. Working her first MLE/MLOps job!
  • Sebastian Sanchez - Learning about frontend development.
  • Yashwanth (Yash) Kumar - Finishing up the Georgia Tech online masters in CS.






The Technical Stuff

How to deploy an entire Rootski environment from scratch

Going through this, you'll notice that there are several one-time, manual steps. This is common even for teams with a heavily automated infrastructure-as-code workflow, particularly when it comes to the creation of users and storing of credentials.

Once these steps are complete, all subsequent interactions with our Rootski infrastructure can be done using our infrastructure as code and other automation tools.

1. Create an AWS account and user

  1. Create an IAM user with programmatic access
  2. Install the AWS CLI
  3. Run aws configure --profile rootski and copy the credentials from step (1). Set the region to us-west-2.

🗒 Note: this IAM user will need sufficient permissions to create and access the infrastructure that will be discussed below. This includes creating several types of infrastructure using CloudFormation.

2. Create an SSH key pair

  1. In the AWS console, go to EC2 and create an SSH key pair named rootski.
  2. Download the key pair.
  3. Save the key pair somewhere you won't forget. If the pair isn't already named, I like to rename them and store them at ~/.ssh/rootski/rootski.id_rsa (private key) and ~/.ssh/rootski/rootski.id_rsa.pub (public key).
  4. Create a new GitHub account for a "Machine User". Copy/paste the contents of rootski.id_rsa.pub into any boxes you have to to make this work :D this "machine user" is now authorized to clone the rootski repository!

3. Create several parameters in AWS SSM Parameter Store

Parameter Description
/rootski/ssh/private_key The contents of the private key needed to clone the rootski repository.
/rootski/prod/database_config A stringified JSON object with database connection information (see below)
{
    "postgres_user": "rootski-db-user",
    "postgres_password": "rootski-db-pass",
    "postgres_host": "database.rootski.io",
    "postgres_port": "5432",
    "postgres_db": "rootski-db-database-name"
}

4. Purchase a domain name that happens to be rootski.io

You know, the domain name rootski.io is hard coded in a few places throughout the Rootski infrastructure. It felt wasteful to parameterize this everywhere since... it's unlikely that we will ever change our domain name.

If we ever have a need for this, we can revisit it :D

5. Create an ACM TLS certificate verified with the DNS challenge for *.rootski.io

You'll need to do this in the AWS console. This certificate will allow us to access rootski.io and all of its subdomains over HTTPS. You'll need the ARN of this certificate for a later step.

4. Create the rootski infrastructure

Before running these commands, copy/paste the ARN of the *.rootski.io ACM certificate into the appropriate place in infrastructure/iac/cloudformation/front-end/static-website.yml.

# create the S3 bucket and Route53 hosted zone for hosting the React application as a static site
...

# create the AWS Cognito user pool
...

# create the AWS Lightsail instance with the backend database (simultaneously deploys the database)
...

# deploy the API Gateway and Lambda function
...

5. Deploy the frontend site

make deploy-frontend

DONE!

Owner
Eric
In modern Applied Mathematics, we specialize in algorithms. I'm a data scientist with a strong background in algorithm design and software development.
Eric
This is the main repository of open-sourced speech technology by Huawei Noah's Ark Lab.

Speech-Backbones This is the main repository of open-sourced speech technology by Huawei Noah's Ark Lab. Grad-TTS Official implementation of the Grad-

HUAWEI Noah's Ark Lab 295 Jan 07, 2023
Utilities for preprocessing text for deep learning with Keras

Note: This utility is really old and is no longer maintained. You should use keras.layers.TextVectorization instead of this. Utilities for pre-process

Hamel Husain 180 Dec 09, 2022
Continuously update some NLP practice based on different tasks.

NLP_practice We will continuously update some NLP practice based on different tasks. prerequisites Software pytorch = 1.10 torchtext = 0.11.0 sklear

0 Jan 05, 2022
PyTorch implementation of Tacotron speech synthesis model.

tacotron_pytorch PyTorch implementation of Tacotron speech synthesis model. Inspired from keithito/tacotron. Currently not as much good speech quality

Ryuichi Yamamoto 279 Dec 09, 2022
Basic yet complete Machine Learning pipeline for NLP tasks

Basic yet complete Machine Learning pipeline for NLP tasks This repository accompanies the article on building basic yet complete ML pipelines for sol

Ivan 20 Aug 22, 2022
Pre-training with Extracted Gap-sentences for Abstractive SUmmarization Sequence-to-sequence models

PEGASUS library Pre-training with Extracted Gap-sentences for Abstractive SUmmarization Sequence-to-sequence models, or PEGASUS, uses self-supervised

Google Research 1.4k Dec 22, 2022
Toward Model Interpretability in Medical NLP

Toward Model Interpretability in Medical NLP LING380: Topics in Computational Linguistics Final Project James Cross ( 1 Mar 04, 2022

Transformer-based Text Auto-encoder (T-TA) using TensorFlow 2.

T-TA (Transformer-based Text Auto-encoder) This repository contains codes for Transformer-based Text Auto-encoder (T-TA, paper: Fast and Accurate Deep

Jeong Ukjae 13 Dec 13, 2022
Twewy-discord-chatbot - Build a Discord AI Chatbot that Speaks like Your Favorite Character

Build a Discord AI Chatbot that Speaks like Your Favorite Character! This is a Discord AI Chatbot that uses the Microsoft DialoGPT conversational mode

Lynn Zheng 231 Dec 30, 2022
Task-based datasets, preprocessing, and evaluation for sequence models.

SeqIO: Task-based datasets, preprocessing, and evaluation for sequence models. SeqIO is a library for processing sequential data to be fed into downst

Google 290 Dec 26, 2022
Text-to-Speech for Belarusian language

title emoji colorFrom colorTo sdk app_file pinned Belarusian TTS 🐸 green green gradio app.py false Belarusian TTS 📢 🤖 Belarusian TTS (text-to-speec

Yurii Paniv 1 Nov 27, 2021
Natural Language Processing Tasks and Examples.

Natural Language Processing Tasks and Examples With the advancement of A.I. technology in recent years, natural language processing technology has bee

Soohwan Kim 53 Dec 20, 2022
Bidirectional LSTM-CRF and ELMo for Named-Entity Recognition, Part-of-Speech Tagging and so on.

anaGo anaGo is a Python library for sequence labeling(NER, PoS Tagging,...), implemented in Keras. anaGo can solve sequence labeling tasks such as nam

Hiroki Nakayama 1.5k Dec 05, 2022
This is a project built for FALLABOUT2021 event under SRMMIC, This project deals with NLP poetry generation.

FALLABOUT-SRMMIC 21 POETRY-GENERATION HINGLISH DESCRIPTION We have developed a NLP(natural language processing) model which automatically generates a

7 Sep 28, 2021
Simple python code to fix your combo list by removing any text after a separator or removing duplicate combos

Combo List Fixer A simple python code to fix your combo list by removing any text after a separator or removing duplicate combos Removing any text aft

Hamidreza Dehghan 3 Dec 05, 2022
Unlimited Call - Text Bombing Tool

FastBomber Unlimited Call - Text Bombing Tool Installation On Termux

Aryan 6 Nov 10, 2022
Leon is an open-source personal assistant who can live on your server.

Leon Your open-source personal assistant. Website :: Documentation :: Roadmap :: Contributing :: Story 👋 Introduction Leon is an open-source personal

Leon AI 11.7k Dec 30, 2022
Installation, test and evaluation of Scribosermo speech-to-text engine

Scribosermo STT Setup Scribosermo is a LGPL licensed, open-source speech recognition engine to "Train fast Speech-to-Text networks in different langua

Florian Quirin 3 Jun 20, 2022
Chinese Pre-Trained Language Models (CPM-LM) Version-I

CPM-Generate 为了促进中文自然语言处理研究的发展,本项目提供了 CPM-LM (2.6B) 模型的文本生成代码,可用于文本生成的本地测试,并以此为基础进一步研究零次学习/少次学习等场景。[项目首页] [模型下载] [技术报告] 若您想使用CPM-1进行推理,我们建议使用高效推理工具BMI

Tsinghua AI 1.4k Jan 03, 2023
This simple Python program calculates a love score based on your and your crush's full names in English

This simple Python program calculates a love score based on your and your crush's full names in English. There is no logic or reason in the calculation behind the love score. The calculation could ha

p.katekomol 1 Jan 24, 2022