In this workshop, we explore state-of-the-art NLP transformers, including models such as T5 and BERT, and then build a model using the Hugging Face Transformers framework.

Overview

Transformers are all you need

Table of Contents

The workshop is divided into four parts:

  1. Introduction to the hype around Transformers
  2. Sneak peek at the theory behind Transformers
  3. Quick tour (Hugging Face framework)
  4. Lab

Note that you can always open the notebooks on Google Colab (no need to install anything); you just need a stable internet connection:

- Fine-tune a translation model (open in Colab)

How to get started

  1. Fork this repository
  2. Create a branch named after you
  3. Go through the notebook and complete all tasks
  4. Submit a pull request

Homework exercise

Your task is to fine-tune a classification model:

  1. Use Hugging Face Transformers and Datasets.
  2. Fine-tune the model on one of the classification tasks of the GLUE benchmark (CoLA, to be specific).
  3. Use a checkpoint from the Hub ("distilbert-base-uncased", for example).
  4. Once finished, submit a pull request to this repo. Make sure to place your .ipynb file in the submissions folder (YOUR_NAME.ipynb).
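
As a starting point, the homework steps above could be sketched roughly as follows. This is a minimal outline under stated assumptions, not a reference solution: the output directory name, epoch count, batch size, and learning rate are illustrative guesses, and the small pure-Python Matthews correlation helper is a stand-in for a library metric (CoLA is scored with Matthews correlation in GLUE). The Hugging Face imports live inside `main()` so the helper functions can be tried without those packages installed.

```python
"""Sketch: fine-tune a Hub checkpoint on GLUE CoLA (hyperparameters are assumptions)."""

CHECKPOINT = "distilbert-base-uncased"  # example checkpoint named in the task


def matthews_corrcoef(labels, preds):
    """Matthews correlation coefficient for binary labels (0 or 1)."""
    pairs = list(zip(labels, preds))
    tp = sum(1 for y, p in pairs if y == 1 and p == 1)
    tn = sum(1 for y, p in pairs if y == 0 and p == 0)
    fp = sum(1 for y, p in pairs if y == 0 and p == 1)
    fn = sum(1 for y, p in pairs if y == 1 and p == 0)
    denom = ((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn)) ** 0.5
    # By convention the score is 0 when any marginal count is empty.
    return 0.0 if denom == 0 else (tp * tn - fp * fn) / denom


def compute_metrics(eval_pred):
    """Convert a (logits, labels) pair into a metrics dict for Trainer."""
    logits, labels = eval_pred
    # Argmax over the class dimension, one row of logits per example.
    preds = [max(range(len(row)), key=row.__getitem__) for row in logits]
    return {"matthews_correlation": matthews_corrcoef(list(labels), preds)}


def main():
    # Imported lazily: these require `pip install transformers datasets`.
    from datasets import load_dataset
    from transformers import (
        AutoModelForSequenceClassification,
        AutoTokenizer,
        DataCollatorWithPadding,
        Trainer,
        TrainingArguments,
    )

    raw = load_dataset("glue", "cola")  # splits: train / validation / test
    tokenizer = AutoTokenizer.from_pretrained(CHECKPOINT)

    def tokenize(batch):
        # CoLA stores its text in the "sentence" column.
        return tokenizer(batch["sentence"], truncation=True)

    encoded = raw.map(tokenize, batched=True)
    model = AutoModelForSequenceClassification.from_pretrained(
        CHECKPOINT, num_labels=2  # acceptable vs. unacceptable
    )

    args = TrainingArguments(
        output_dir="cola-distilbert",    # assumed directory name
        num_train_epochs=3,              # assumed value
        per_device_train_batch_size=16,  # assumed value
        learning_rate=2e-5,              # assumed value
    )
    trainer = Trainer(
        model=model,
        args=args,
        train_dataset=encoded["train"],
        eval_dataset=encoded["validation"],
        data_collator=DataCollatorWithPadding(tokenizer),  # pads each batch dynamically
        compute_metrics=compute_metrics,
    )
    trainer.train()
    print(trainer.evaluate())
```

In a notebook, calling `main()` starts the download and the training run. A GPU runtime (for example on Colab) is likely needed for it to finish in reasonable time.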

Useful resources: text_classification

Owner
Aymen Berriche
CS student at ESI Algiers | Dev Co-Manager at GDG Algiers | GitHub Campus Expert