Natural Language Processing

Here you will find the teaching materials for the "Natural Language Processing" course at EDHEC Business School, 2022.

What is the course about?

The course is an introduction to the basics of natural language processing for analyzing unstructured, user-generated content. It is aimed at beginners to the topic (and to NLP in general), but basic knowledge of Python and some familiarity with data science techniques will be helpful.

Topics covered include:

text preprocessing in Python,
collecting your own data from Twitter and Reddit,
content analysis,
text embeddings, and
supervised learning with text data.
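
To give a flavour of the first topic, here is a minimal sketch of text preprocessing on user-generated content. It is not taken from the course notebooks: the `preprocess` function, the tiny `STOP_WORDS` set, and the sample posts are all made up for illustration, and it uses only the Python standard library.

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny stop-word list (illustration only).
STOP_WORDS = {"a", "an", "and", "the", "is", "to", "of", "in", "this", "it"}


def preprocess(text):
    """Lowercase, strip URLs and @mentions, tokenize, and drop stop words."""
    text = text.lower()
    text = re.sub(r"https?://\S+", " ", text)  # remove URLs
    text = re.sub(r"@\w+", " ", text)          # remove user mentions
    tokens = re.findall(r"[a-z0-9']+", text)   # keep letters, digits, apostrophes
    return [tok for tok in tokens if tok not in STOP_WORDS]


# Made-up example posts standing in for collected social media data.
posts = [
    "Loving this NLP course! https://example.com @edhec",
    "The course is great, and the notebooks are great too.",
]

# Count token frequencies across all posts.
counts = Counter(tok for post in posts for tok in preprocess(post))
print(counts.most_common(3))
```

In practice a library tokenizer and a fuller stop-word list would usually replace the regular expressions used here.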

What materials are available here?

The slides will be posted on the course BlackBoard page. They mostly serve as a high-level introduction to the examples and exercises (in Colab notebooks), which are linked from the slides themselves. Copies of the Colab notebooks can also be found in the /colab folder in this repository.

Can I work through the material on my own?

If you didn't attend the class, you can certainly work through the materials on your own: the Colab notebooks are designed to be readable and doable for individuals working at their own pace. The slides posted on BlackBoard will guide you through the content. The notebooks are intended to be worked through in order. Each one has examples to view and one or two practice exercises to complete.

Acknowledgements

I would like to acknowledge Steve Wilson at Oakland University for making his DS3 workshop materials publicly available under an MIT license.
