Simple Python script to scrape the YouTube channels of "Parity Technologies and Web3 Foundation" and translate their transcripts into Braille or any other language

Overview

The script can scrape any channel or video, and it can also fetch automatically generated captions. Automatic captions are available in Dutch, English, French, German, Indonesian, Italian, Japanese, Korean, Portuguese, Russian, Spanish, Turkish, Vietnamese and other languages, so use whichever you need.

Usage

pip install youtube_transcript_api scrapetube codext

For the default channel

python tube.py 

Custom channel

python tube.py UCSs5vZi0U7qHLkUjF3QnaWg
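
The source of tube.py is not reproduced in this README, so the following is only a sketch of how an optional channel ID argument could be handled. The function name `channel_id_from_argv` is hypothetical, and the default ID is an assumption: it is simply the channel ID that appears in the scrapetube.get_channel call later in this README.

```python
import sys

# Assumed default: the channel ID used in the scrapetube.get_channel
# call shown later in this README.
DEFAULT_CHANNEL_ID = "UClnw_bcNg4CAzF772qEtq4g"


def channel_id_from_argv(argv, default=DEFAULT_CHANNEL_ID):
    """Return the first command-line argument, or the default when none is given."""
    # argv[0] is the script name, so a custom channel ID would be argv[1].
    return argv[1] if len(argv) > 1 else default


if __name__ == "__main__":
    print(channel_id_from_argv(sys.argv))
```

Passing the argument list in explicitly keeps the function easy to try out without touching the real command line.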

Get all videos for a channel

import scrapetube

videos = scrapetube.get_channel("UCCezIgC97PvUuR4_gbFUs5g")

for video in videos:
    print(video['videoId'])
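
A channel can hold many videos, so it may help to cap how many are processed and to turn each ID into a watch URL like the one in the example output. This is a minimal sketch that assumes only what the loop above shows: an iterable of dicts with a 'videoId' key. The helper name `video_urls` and the second sample ID are made up for illustration.

```python
from itertools import islice

WATCH_URL = "https://www.youtube.com/watch?v={}"


def video_urls(videos, limit=None):
    """Yield a watch URL for each video dict, stopping after `limit` items."""
    # islice with limit=None simply walks the whole iterable.
    for video in islice(videos, limit):
        yield WATCH_URL.format(video["videoId"])


# Stand-in data shaped like the dicts printed above; real use would pass
# the result of scrapetube.get_channel(...) instead.
sample = [{"videoId": "ouMK-Q9S7cc"}, {"videoId": "hypothetical02"}]
print(list(video_urls(sample, limit=1)))
```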

Filter for manually created transcripts

transcript = transcript_list.find_manually_created_transcript(['de', 'en'])

or automatically generated ones

transcript = transcript_list.find_generated_transcript(['de', 'en'])

The methods find_transcript, find_manually_created_transcript and find_generated_transcript return Transcript objects. They contain metadata regarding the transcript:

print(
    transcript.video_id,
    transcript.language,
    transcript.language_code,
    # whether it has been manually created or generated by YouTube
    transcript.is_generated,
    # whether this transcript can be translated or not
    transcript.is_translatable,
    # a list of languages the transcript can be translated to
    transcript.translation_languages,
)
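
The metadata can also drive the choice between manual and generated transcripts. The sketch below prefers a manually created transcript and falls back to a generated one, using only the `language_code` and `is_generated` attributes listed above. It assumes the transcript list can be iterated over; the function name `pick_transcript` and the stand-in objects are hypothetical, not part of youtube_transcript_api.

```python
from types import SimpleNamespace


def pick_transcript(transcripts, languages=("en",)):
    """Prefer a manually created transcript; fall back to a generated one."""
    matches = [t for t in transcripts if t.language_code in languages]
    # False sorts before True, so manual transcripts come first; ties are
    # broken by the order of `languages`.
    matches.sort(key=lambda t: (t.is_generated, languages.index(t.language_code)))
    return matches[0] if matches else None


# Stand-in objects carrying only the two attributes used above.
fake_list = [
    SimpleNamespace(language_code="en", is_generated=True),
    SimpleNamespace(language_code="de", is_generated=False),
]
chosen = pick_transcript(fake_list, languages=("en", "de"))
print(chosen.language_code, chosen.is_generated)
```

Returning None instead of raising keeps the caller free to skip videos that have no usable transcript.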

Codext, a contraction of "codecs" and "extension", is a tiny library that gathers a few additional encodings for use with codecs. When imported, it registers new encodings with a proxy codecs registry, making them available through the codecs.(decode|encode|open) calls.

The script is currently set to Braille, as in codext.encode("Little Endian", "braille"); other codecs such as morse are accepted as well.
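
To illustrate the registry mechanism without depending on codext, here is a small standard-library sketch that registers a toy codec with `codecs.register`. It is not codext's implementation: the codec name `demo_braille` is invented, only the letters a-z and the space are mapped, and everything else passes through unchanged.

```python
import codecs
from string import ascii_lowercase

# Standard Braille dot numbers for a-z; dot n maps to bit n-1 above U+2800.
DOTS = ("1 12 14 145 15 124 1245 125 24 245 13 123 134 1345 135 "
        "1234 12345 1235 234 2345 136 1236 2456 1346 13456 1356").split()


def cell(dots):
    """Build the Unicode Braille pattern for a string of dot numbers."""
    return chr(0x2800 + sum(1 << (int(d) - 1) for d in dots))


ENCODE = {letter: cell(dots) for letter, dots in zip(ascii_lowercase, DOTS)}
ENCODE[" "] = cell("")  # blank Braille cell, U+2800
DECODE = {v: k for k, v in ENCODE.items()}


def _encode(text, errors="strict"):
    # Characters without a mapping pass through unchanged in this sketch.
    return "".join(ENCODE.get(c, c) for c in text.lower()), len(text)


def _decode(text, errors="strict"):
    return "".join(DECODE.get(c, c) for c in text), len(text)


def _search(name):
    # The registry calls this with the requested (normalized) codec name.
    if name == "demo_braille":
        return codecs.CodecInfo(encode=_encode, decode=_decode, name=name)
    return None


codecs.register(_search)
print(codecs.encode("web", "demo_braille"))
```

Once the search function is registered, the toy codec is reachable through the ordinary `codecs.encode` and `codecs.decode` calls, which is the same entry point the paragraph above describes for codext's encodings.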

Codecs categories

  • native: the built-in codecs from the original codecs package
  • non-native: this special category regroups all the categories mentioned hereafter
  • base: baseX codecs (e.g. base, base100)
  • binary: codecs working on strings but applying their algorithms on their binary forms (e.g. baudot, manchester)
  • common: common codecs not included in the native ones or simply added for the purpose of standardization (e.g. octal, ordinal)
  • crypto: codecs related to cryptography algorithms (e.g. barbie, rot, xor)
  • language: language-related codecs (e.g. morse, navajo)
  • other: uncategorized codecs (e.g. letters, url)
  • stegano: steganography-related codecs (e.g. sms, resistor)
Apart from native and non-native, the category names are simply the names of the subdirectories of the codext package (with the trailing "s" stripped).
codext.list("binary")
['baudot', 'baudot-spaced', 'baudot-tape', 'bcd', 'bcd-extended0', 'bcd-extended1', 'excess3', 'gray', 'manchester', 'manchester-inverted']
codext.list("language")
['braille', 'leet', 'morse', 'navajo', 'radio', 'southpark', 'southpark-icase', 'tom-tom']
codext.list("native")
['ascii', 'base64_codec', 'big5', 'big5hkscs', 'bz2_codec', 'cp037', 'cp273', 'cp424', 'cp437', 'cp500', 'cp775', 'cp850', 'cp852', 'cp855', 'cp857', 'cp858', 'cp860', 'cp861', 'cp862', 'cp863', ...]

Current channels whose English transcript subtitles are scraped and translated to Braille

To use your own list, just replace the YouTube channel ID string at 🤯

videoListName = scrapetube.get_channel("UClnw_bcNg4CAzF772qEtq4g")

YouTube uses automatic speech recognition to add automatic captions to videos. The feature is available in English, Dutch, French, German, Italian, Japanese, Korean, Portuguese, Russian, and Spanish. ASR is not available for all videos.

You can edit the language at 😇

transcript = transcript_list.find_generated_transcript(['en']).fetch()
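
The example output below shows a whole transcript as a single line, which suggests the fetched snippets are joined before encoding. The following sketch shows one way to do that. It assumes the fetched transcript is a list of dicts with a 'text' key, which is how older releases of youtube_transcript_api are understood to behave; newer releases may return objects instead. The function name `transcript_text` is hypothetical, and the timing values in the sample are placeholders.

```python
def transcript_text(snippets):
    """Join caption snippets into one line of text."""
    # Captions often contain line breaks inside a snippet; flatten them.
    parts = (s["text"].replace("\n", " ").strip() for s in snippets)
    return " ".join(p for p in parts if p)


# Sample snippets; the start and duration values are placeholders.
sample = [
    {"text": "i think there were a lot of people", "start": 0.0, "duration": 3.2},
    {"text": "that really believed\nthe internet", "start": 3.2, "duration": 2.9},
]
print(transcript_text(sample))
```

The resulting string can then be handed to the encoder in one call.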

Example output:

https://www.youtube.com/watch?v=ouMK-Q9S7cc
Web3 Foundation - The Next Evolution of the Internet - Dr. Gavin Wood
⠺⠑⠃⠒⠀⠋⠕⠥⠝⠙⠁⠞⠊⠕⠝⠀⠤⠀⠞⠓⠑⠀⠝⠑⠭⠞⠀⠑⠧⠕⠇⠥⠞⠊⠕⠝⠀⠕⠋⠀⠞⠓⠑⠀⠊⠝⠞⠑⠗⠝⠑⠞⠀⠤⠀⠙⠗⠨⠀⠛⠁⠧⠊⠝⠀⠺⠕⠕⠙
⠊⠀⠞⠓⠊⠝⠅⠀⠞⠓⠑⠗⠑⠀⠺⠑⠗⠑⠀⠁⠀⠇⠕⠞⠀⠕⠋⠀⠏⠑⠕⠏⠇⠑⠀⠞⠓⠁⠞⠀⠗⠑⠁⠇⠇⠽⠀⠃⠑⠇⠊⠑⠧⠑⠙⠀⠞⠓⠑⠀⠊⠝⠞⠑⠗⠝⠑⠞⠀⠺⠁⠎⠀⠺⠁⠎⠀⠛⠕⠝⠝⠁⠀⠃⠑⠀⠁⠀⠞⠗⠁⠝⠎⠋⠕⠗⠍⠁⠞⠊⠧⠑⠀⠞⠑⠉⠓⠝⠕⠇⠕⠛⠽⠀⠋⠕⠗⠀⠎⠕⠉⠊⠑⠞⠽⠀⠁⠝⠙⠀⠊⠀⠞⠓⠊⠝⠅⠀⠺⠓⠁⠞⠀⠓⠁⠏⠏⠑⠝⠑⠙⠀⠺⠁⠎⠀⠞⠓⠑⠀⠊⠝⠞⠑⠗⠝⠑⠞⠀⠺⠁⠎⠀⠙⠑⠎⠊⠛⠝⠑⠙⠀⠊⠝⠀⠎⠥⠉⠓⠀⠁⠀⠺⠁⠽⠀⠞⠓⠁⠞⠀⠊⠞⠀⠁⠇⠇⠕⠺⠑⠙⠀⠊⠞⠀⠺⠁⠎⠀⠋⠇⠑⠭⠊⠃⠇⠑⠀⠊⠞⠀⠁⠇⠇⠕⠺⠑⠙⠀⠑⠭⠊⠎⠞⠊⠝⠛⠀⠎⠞⠗⠥⠉⠞⠥⠗⠑⠎⠀⠕⠋⠀⠎⠕⠉⠊⠑⠞⠽⠀⠑⠭⠊⠎⠞⠊⠝⠛⠀⠺⠁⠽⠎⠀⠕⠋⠀⠙⠕⠊⠝⠛⠀⠃⠥⠎⠊⠝⠑⠎⠎⠀⠞⠕⠀⠎⠊⠍⠏⠇⠽⠀⠍⠕⠧⠑⠀⠕⠧⠑⠗⠀⠕⠝⠞⠕⠀⠞⠓⠑⠀⠙⠊⠛⠊⠞⠁⠇⠀⠙⠕⠍⠁⠊⠝⠀⠎⠕⠀⠺⠓⠑⠝⠀⠺⠑⠀⠙⠕⠀⠃⠁⠝⠅⠊⠝⠛⠀⠕⠝⠀⠞⠓⠑⠀⠊⠝⠞⠑⠗⠝⠑⠞⠀⠺⠑⠀⠎⠞⠊⠇⠇⠀⠥⠎⠑⠀⠁⠀⠃⠁⠝⠅⠀⠺⠑⠀⠎⠞⠊⠇⠇⠀⠥⠎⠑⠀⠕⠥⠗⠀⠑⠭⠊⠎⠞⠊⠝⠛⠀⠃⠗⠊⠉⠅⠤⠁⠝⠙⠤⠍⠕⠗⠞⠁⠗⠀⠞⠗⠁⠙⠊⠞⠊⠕⠝⠁⠇⠀⠲⠴⠴⠀⠽⠑⠁⠗⠀⠕⠇⠙⠀⠃⠁⠝⠅⠊⠝⠛⠀⠕⠗⠛⠁⠝⠊⠵⠁⠞⠊⠕⠝⠀⠊⠞⠄⠎⠀⠚⠥⠎⠞⠀⠞⠓⠁⠞⠀⠺⠑⠀⠁⠉⠉⠑⠎⠎⠀⠞⠓⠑⠍⠀⠞⠓⠗⠕⠥⠛⠓⠀⠁⠀⠺⠑⠃⠀⠏⠁⠛⠑⠀⠊⠞⠀⠓⠁⠎⠝⠄⠞⠀⠗⠑⠁⠇⠇⠽⠀⠁⠇⠞⠑⠗⠑⠙⠀⠎⠕⠉⠊⠑⠞⠽⠀⠊⠞⠀⠗⠑⠁⠇⠇⠽⠀⠺⠁⠎⠝⠄⠞⠀⠞⠗⠁⠝⠎⠋⠕⠗⠍⠁⠞⠊⠧⠑⠀⠁⠝⠙⠀⠊⠀⠞⠓⠊⠝⠅⠀⠞⠓⠁⠞⠄⠎⠀⠞⠓⠁⠞⠄⠎⠀⠑⠧⠑⠗⠍⠕⠗⠑⠀⠉⠇⠑⠁⠗⠀⠺⠓⠑⠝⠀⠺⠑⠀⠺⠓⠑⠝⠀⠺⠑⠀⠞⠓⠊⠝⠅⠀⠁⠃⠕⠥⠞⠀⠋⠁⠉⠑⠃⠕⠕⠅⠀⠁⠝⠙⠀⠺⠑⠀⠞⠓⠊⠝⠅⠀⠁⠃⠕⠥⠞⠀⠛⠕⠕⠛⠇⠑⠀⠞⠓⠑⠎⠑⠀⠁⠗⠑⠀⠝⠕⠞⠀⠝⠑⠺⠀⠺⠁⠽⠎⠀⠕⠋⠀⠺⠕⠗⠅⠊⠝⠛⠀⠝⠑⠺⠀⠺⠁⠽⠎⠀⠕⠋⠀⠏⠑⠕⠏⠇⠑⠀⠺⠕⠗⠅⠊⠝⠛⠀⠞⠕⠛⠑⠞⠓⠑⠗⠀⠊⠝⠀⠗⠑⠁⠇⠊⠞⠽⠀⠞⠓⠑⠽⠄⠗⠑⠀⠞⠓⠑⠀⠎⠁⠍⠑⠀⠅⠊⠝⠙⠎⠀⠕⠋⠀⠎⠞⠗⠥⠉⠞⠥⠗⠑⠎⠀⠞⠓⠁⠞⠀⠞⠓⠑⠀⠎⠁⠍⠑⠀⠓⠊⠑⠗⠁⠗⠉⠓⠊⠉⠁⠇⠀⠕⠗⠛⠁⠝⠊⠵⠁⠞⠊⠕⠝⠎⠀⠞⠓⠁⠞⠀⠓⠁⠧⠑⠀⠞⠓⠑⠀⠎⠁⠍⠑⠀⠉⠑⠝⠞⠗⠁⠇⠊⠵⠑⠙⠀⠃⠁⠝⠅⠀⠁⠉⠉⠕⠥⠝⠞⠎⠀⠞⠓⠁⠞⠀⠓⠁⠧⠑⠀⠞⠓⠑⠀⠎⠁⠍⠑⠀⠎⠕⠗⠞⠀⠕⠋⠀⠍⠥⠇⠞⠊⠝⠁⠞⠊⠕⠝⠁⠇⠀⠎⠞⠗⠥⠉⠞⠥⠗⠑⠀⠁⠎⠀⠁⠇⠇⠀⠕⠋⠀⠞⠓⠑⠀⠧⠁⠗⠊⠕⠥⠎⠀⠕⠞⠓⠑⠗⠀⠋⠕⠗⠞⠥⠝⠑⠀⠢⠴⠴⠀⠉⠕⠗⠏⠕⠗⠁⠞⠑⠀⠉⠕⠍⠏⠁⠝⠊⠑⠎⠀⠊⠝⠀⠗⠑⠁⠇⠊⠞⠽⠀⠞⠕⠀⠉⠓⠁⠝⠛⠑⠀⠎⠕⠉⠊⠑⠞⠽⠀⠺⠑⠀⠗⠑⠁⠇⠇⠽⠀⠝⠑⠑⠙⠀⠞⠕⠀⠙⠕⠀⠎⠕⠍⠑⠞⠓⠊⠝⠛⠀⠃⠑⠞⠞⠑⠗⠀⠞⠓⠁⠝⠀⠉⠗⠑⠁⠞⠊⠝⠛⠀⠞⠑⠉⠓⠝⠕⠇⠕⠛⠊⠑⠎⠀⠞⠓⠁⠞⠀⠚⠥⠎⠞⠀⠁⠇⠇⠕⠺⠀⠥⠎⠀⠞⠕⠀⠍⠊⠗⠗⠕⠗⠀⠓⠕⠺⠀⠎⠕⠉⠊⠑⠞⠽⠀⠺⠕⠗⠅⠎⠀⠁⠝⠽⠺⠁⠽⠀⠺⠑⠀⠝⠑⠑⠙⠀⠞⠕⠀⠉⠗⠑⠁⠞⠑⠀⠞⠑⠉⠓⠝⠕⠇⠕⠛⠊⠑⠎⠀⠞⠓⠁⠞⠀⠋⠕⠗⠛⠑⠀⠝⠑⠺⠀⠺⠁⠽⠎⠀⠕⠋⠀⠃⠑⠊⠝⠛⠀⠁⠃⠇⠑⠀⠞⠕⠀⠺⠕⠗⠅⠀⠺⠊⠞⠓⠀⠑⠁⠉⠓⠀⠕⠞⠓⠑⠗⠀⠁⠝⠙⠀⠞⠓⠁⠞⠄⠎⠀⠙⠊⠋⠋⠑⠗⠑⠝⠞⠀⠞⠕⠀⠝⠑⠺⠀⠺⠁⠽⠎⠀⠕⠋⠀⠃⠑⠊⠝⠛⠀⠁⠃⠇⠑⠀⠞⠕⠀⠉⠕⠍⠍⠥⠝⠊⠉⠁⠞⠑⠀⠺⠊⠞⠓⠀⠑⠁⠉⠓⠀⠕⠞⠓⠑⠗⠀⠊⠞⠄⠎⠀⠁⠇⠎⠕⠀⠛⠕⠞⠀⠞⠕⠀⠃⠑⠀⠝⠑⠺⠀⠺⠁⠽⠎⠀⠕⠋⠀⠃⠑⠊⠝⠛⠀⠁⠃⠇⠑⠀⠞⠕⠀⠕⠗⠛⠁⠝⠊⠵⠑⠀⠁⠝⠙⠀⠞⠗⠥⠎⠞⠀⠞⠓⠁⠞⠀⠑⠁⠉⠓⠀⠕⠞⠓⠑⠗⠀⠊⠎⠀⠛⠕⠊⠝⠛⠀⠞⠕⠀⠙⠕⠀⠺⠓⠁⠞⠀⠺⠓⠁⠞⠀⠞⠓⠑⠽⠀⠝⠑⠑⠙⠀⠞⠕⠀⠙⠕⠀⠊⠝⠀⠕⠗⠙⠑⠗⠀⠞⠕⠀⠓⠁⠧⠑⠀⠎⠕⠍⠑⠀⠎⠕⠗⠞⠀⠕⠋⠀⠎⠓⠁⠗⠑⠙⠀⠉⠕⠝⠉⠇⠥⠎⠊⠕⠝⠀⠕⠗⠀⠗⠁⠍⠊⠋⠊⠉⠁⠞⠊⠕⠝⠀⠞⠕⠀⠞⠓⠑⠀⠉⠕⠕⠏⠑⠗⠁⠞⠊⠕⠝⠀⠁⠝⠙⠀⠞⠓⠁⠞⠄⠎⠀⠗⠑⠁⠇⠇⠽⠀⠁⠀⠃⠊⠛⠀⠉⠕⠍⠏⠕⠝⠑⠝⠞⠀⠕⠋⠀⠺⠑⠃⠀⠒⠀⠺⠑⠃⠀⠒⠀⠊⠎⠀⠗⠑⠁⠇⠇⠽⠀⠁⠃⠕⠥⠞⠀⠁⠇⠇⠕⠺⠊⠝⠛⠀⠏⠑⠕⠏⠇⠑⠀⠞⠕⠀⠉⠕⠍⠑⠀⠞⠕⠛⠑⠞⠓⠑⠗⠀⠁⠝⠙⠀⠉⠕⠕⠗⠙⠊⠝⠁⠞⠑⠀⠞⠓⠑⠊⠗⠀⠑⠋⠋⠕⠗⠞⠎⠀⠋⠕⠗⠀⠎⠕⠍⠑⠞⠓⠊⠝⠛⠀⠛⠗⠑⠁⠞⠑⠗⠀⠞⠓⠑⠀⠞⠓⠁⠝⠀⠞⠓⠑⠀⠎⠥⠍⠀⠕⠋⠀⠊⠞⠎⠀⠏⠁⠗⠞⠎⠀⠪⠍⠥⠎⠊⠉⠻

GitHub Actions workflow file used to run the script, as a real-world example

Available operating systems: [ windows-latest, macos-latest, ubuntu-latest ]

name: Cross-platform matrix run
on: [push]
jobs:
  build:
    runs-on: ${{ matrix.os }}
    strategy:
      matrix:
        os: [ubuntu-latest]
        python-version: ['3.6', '3.9']
        exclude:
          - os: ubuntu-latest
            python-version: '3.6'
    steps:
      - uses: actions/checkout@<version>
      - name: Set up Python
        uses: actions/setup-python@<version>
        with:
          python-version: ${{ matrix.python-version }}
      - name: Install dependencies 
        run: pip install youtube_transcript_api scrapetube codext
      - name: Web3 Foundation videos to braille language 
        run: python tube.py

For Support && Nominations

  • Display name. KSMNETWORK

  • Email [email protected]

  • Riot @gtoocool:matrix.org

  • KUSAMA (KSM) Address

  • H1bSKJxoxzxYRCdGQutVqFGeW7xU3AcN6vyEdZBU7Qb1rsZ

  • PolkaDOT (DOT) Address:

  • 15FxvBFDd3X7H9qcMGqsiuvFYEg4D3mBoTA2LQufreysTHKA

  • https://ksm.network

Owner
Little Endian