Course materials for a 3-day seminar "Machine Learning and NLP: Advances and Applications" at New College of Florida

Overview

Machine Learning and NLP: Advances and Applications

This repository hosts the course materials used for a 3-day seminar "Machine Learning and NLP: Advances and Applications" as part of Independent Study Period 2020 at New College of Florida.

Note that the seminar was held in Jan 2020, and the content may be a little bit oudated (as of Feb 2022). Please also refer to a Fall 2021 full semester course "CIS6930 Topics in Computing for Data Science", which covers much wider (and a little bit newer) Deep Learning topics.

Syllabus

Course Description

This 3-day course provides students with an opportunity to learn Machine Learning and Natural Language Processing (NLP) from basics to applications. The course covers some state-of-the-art NLP techniques including Deep Learning. Each day consists of a lecture and a hands-on session to help students learn how to apply those techniques to real-world applications. During the hands-on session, students will be given assignments to develop programming code in Python. Three days are too short to fully understand the concepts that are covered by the course and learn to apply those techniques to actual problems. Students are strongly encouraged to complete reading assignments before the lecture to be ready for the course assignments, and bring a lot of questions to the course. :)

Learning Objectives

Students successfully completing the course will

  • demonstrate the ability to apply machine learning and natural language processing techniques to various types of problems.
  • demonstrate the ability to build their own machine learning models using Python libraries.
  • demonstrate the ability to read and understand research papers in ML and NLP.

Course Outline

  • Wed 1/22 Day 1: Machine Learning basics [Slides]

    • Machine learning examples
    • Problem formulation
    • Evaluation and hyper-parameter tuning
    • Data Processing basics with pandas
    • Machine Learning with scikit-learn
    • Hands-on material: [ipynb] Open In Colab
  • Thu 1/23 Day 2: NLP basics [Slides]

    • Unsupervised learning and visualization
    • Topic models
    • NLP basics with SpaCy and NLTK
    • Understanding NLP pipeline for feature extraction
    • Machine learning for NLP tasks (text classification, sequential tagging)
    • Hands-on material [ipynb] Open In Colab
    • Follow-up
      • Commonsense Reasoning (Winograd Schema Challenge)
  • Fri 1/24 Day 3: Advanced techniques and applications [Slides]

    • Basic Deep Learning techniques
    • Word embeddings
    • Advanced Deep Learning techniques for NLP
    • Problem formulation and applications to (non-)NLP tasks
    • Pre-training models: ELMo and BERT
    • Hands-on material: [ipynb] Open In Colab
    • Follow-up
      • The Illustrated Transformer – Jay Alammar – Visualizing machine learning one concept at a time
      • Cross-lingual word/sentence embeddings

Reading Assignments & Recommendations:

The following online tutorials for students who are not familiar with the Python libraries used in the course. Each day will have a hands-on session that requires those libraries. Please do not expect to have enough time to learn how to use those libraries during the lecture.

The following list is a good starting point.

The course will cover the following papers as examples of (non-NLP) applications (probably in Day 3.) Students who'd like to learn how to apply Deep Learning techniques to your own problems are encouraged to read the following papers.

  • [1] A. Asai, S. Evensen, B. Golshan, A. Halevy, V. Li, A. Lopatenko, D. Stepanov, Y. Suhara, W.-C. Tan, Y. Xu, "HappyDB: A Corpus of 100,000 Crowdsourced Happy Moments" Proc LREC 18, 2018. [Paper] [Dataset]
  • [2] S. Evensen, Y. Suhara, A. Halevy, V. Li, W.-C. Tan, S. Mumick, "Happiness Entailment: Automating Suggestions for Well-Being," Proc. ACII 2019, 2019. [Paper]
  • [3] Y. Suhara, Y. Xu, A. Pentland, "DeepMood: Forecasting Depressed Mood Based on Self-Reported Histories via Recurrent Neural Networks," Proc. WWW '17, 2017. [Paper]
  • [4] N. Bhutani, Y. Suhara, W.-C. Tan, A. Halevy, H. V. Jagadish, "Open Information Extraction from Question-Answer Pairs," Proc. NAACL-HLT 2019, 2019. [Paper]

Computing Resources:

The course requires students to write code:

  • Students are expected to have a personal computer at their disposal. Students should have a Python interpreter and the listed libraries installed on their machines.

The hands-on sessions will require the following Python libraries. Please install those libraries on your computer prior to the course. See also the reading assignment section for the recommended tutorials.

  • pandas
  • scikit-learn
  • gensim
  • spacy
  • nltk
  • torch (PyTorch)
Owner
Yoshi Suhara
Yoshi Suhara
Python Projects is an Open Source to enhance your python skills

Welcome! 👋🏽 Python Project is Open Source to enhance your python skills. You're free to contribute. 🤓 You just need to give us your scripts written

Tristán 6 Nov 28, 2022
a sketch of what a zkvm could look like

We want to build a ZKP that validates an entire EVM block or as much of it as we can efficiently. Its okay to adjust the gas costs for every EVM opcode. Its also to exclude some opcodes for now if th

25 Dec 30, 2022
Dump Data from FTDI Serial Port to Binary File on MacOS

Dump Data from FTDI Serial Port to Binary File on MacOS

pandy song 1 Nov 24, 2021
Машинное обучение на ФКН ВШЭ

Курс "Машинное обучение" на ФКН ВШЭ Конспекты лекций, материалы семинаров и домашние задания (теоретические, практические, соревнования) по курсу "Маш

Evgeny Sokolov 2.2k Jan 04, 2023
Criando um jogo de naves espaciais com Pygame. Para iniciantes em Python

Curso de Programação de Jogos com Pygame Criando um jogo de naves espaciais com Pygame. Para iniciantes em Python Pré-requisitos Antes de começar este

Flávio Codeço Coelho 33 Dec 02, 2022
GitHub saver for stargazers, forks, repos

GitHub backup repositories Save your repos and list of stargazers & list of forks for them. Pure python3 and git with no dependencies to install. GitH

Alexander Kapitanov 23 Aug 21, 2022
A tool to allow New World players to calculate the best place to put their Attribute Points for their build and level

New World Damage Simulator A tool designed to take a characters base stats including armor and weapons, level, and base damage of their items (slash d

Joseph P Langford 31 Nov 01, 2022
An app to help people apply for admissions on schools/hostels

Admission-helper About An app to help people apply for admissions on schools/hostels This app is a rewrite of Admission-helper-beta-v5.8.9 and I impor

Advik 3 Apr 24, 2022
The purpose of this script is to bypass disablefund, provide some useful information, and dig the hook function of PHP extension.

The purpose of this script is to bypass disablefund, provide some useful information, and dig the hook function of PHP extension.

Firebasky 14 Aug 02, 2021
Python package for handling and analyzing PSRFITS files

PyPulse A pure-Python package for handling and analyzing PSRFITS files. Read the documentation here. This is an alternate code base from PSRCHIVE. Req

Michael Lam 15 Nov 30, 2022
A repository containing an introduction to Panel made to be support videos and talks.

👍 Awesome Panel - Introduction to Panel THIS REPO IS WORK IN PROGRESS. PRE-ALPHA Panel is a very powerful framework for exploratory data analysis and

Marc Skov Madsen 51 Nov 17, 2022
Have an idea for a Python package? Register the name on PyPI 💡

Register Package Names on PyPI Have an idea for a Python package? Thought of a great name? Register it on PyPI, before someone else does! A tool that

Alex Ioannides 1 Jul 15, 2022
A programming language that for tech savvy graphic designers

Microsoft Hackathon - PhoTex Idea A programming language that allows tech savvy graphic designers develop scalable vector graphics using plain text co

Joe Furfaro 5 Nov 14, 2021
Gobigger Explore For Python

Gobigger-Explore 🔮 GoBigger Challenge 2021 Baseline en/中文 🤖 Introduction This is the baseline of GoBigger Multi-Agent Decision Intelligence Challeng

OpenDILab 145 Dec 22, 2022
This is the DBMS Project done in 5th sem of B.E CS.

Student-Result-Management-System This is the DBMS Project done in 5th sem of B.E CS. You need to install SQlite DB Browser in your pc or laptop to ope

Vivek kulkarni 1 Jan 14, 2022
Repositório para estudo do airflow

airflow-101 Repositório para estudo do airflow Docker criado baseado no tutorial Exemplo de API da pokeapi Para executar clone o repo execute as confi

Gabriel (Gabu) Bellon 1 Nov 23, 2021
Absolute solvation free energy calculations with OpenFF and OpenMM

ABsolute SOLVantion Free Energy Calculations The absolv framework aims to offer a simple API for computing the change in free energy when transferring

7 Dec 07, 2022
An osu! cheat made in c++ rewritten in python and currently undetected.

megumi-python An osu! cheat made in c++ rewritten in python and currently undetected. Installation Guide Download python 3.9 from https://python.org C

Elaina 2 Nov 18, 2022
This script can be used to get unlimited Gb for WARP.

Warp-Unlimited-GB This script can be used to get unlimited Gb for WARP. How to use Change the value of the 'referrer' to warp id of yours You can down

Anix Sam Saji 1 Feb 14, 2022
Eatlocal - This package helps users solve PyBites code challenges on their local machine

eatlocal This package helps the user solve Pybites code challenges locally. Inst

Russell 0 Jul 25, 2022