Data and code to support "Applied Natural Language Processing" (INFO 256, Fall 2021, UC Berkeley)

Related tags

Text Data & NLPanlp21
Overview

anlp21

Course materials for "Applied Natural Language Processing" (INFO 256, Fall 2021, UC Berkeley) Syllabus: http://people.ischool.berkeley.edu/~dbamman/info256.html

Notebook Description
1.words/EvaluateTokenizationForSentiment The impact of tokenization choices on sentiment classification.
1.words/ExploreTokenization Different methods for tokenizing texts (whitespace, NLTK, spacy, regex)
1.words/TokenizePrintedBooks Design a better tokenizer for printed books
1.words/Text_Complexity Implement type-token ratio and Flesch-Kincaid Grade Level scores for text
2.compare/ChiSquare, Mann-Whitney Tests Explore two tests for finding distinctive terms
2.compare/Log-odds ratio with priors Implement the log-odds ratio with an informative (and uninformative) Dirichlet prior
3.dictionaries/DictionaryTimeSeries Plot sentiment over time using human-defined dictionaries
3.dictionaries/Empath Explore using Empath dictionaries to characterize texts
4.embeddings/DistributionalSimilarity Explore distributional hypothesis to build high-dimensional, sparse representations for words
4.embeddings/WordEmbeddings Explore word embeddings using Gensim
4.embeddings/Semaxis Implement SemAxis for scoring terms along a user-defined axis (e.g., positive-negative, concrete-abstract, hot-cold),
4.embeddings/BERT Explore the basics of token representations in BERT and use it to find token nearest neighbors
4.embedings/SequenceEmbeddings Use sequence embeddings to find TV episode summaries most similar to a short description
5.eda/WordSenseClustering Inferring distinct word senses using KMeans clustering over BERT representations
5.eda/Haiku KMeans Explore text representation in clustering by trying to group haiku and non-haiku poems into two distinct clusters
Owner
David Bamman
David Bamman
Code for Editing Factual Knowledge in Language Models

KnowledgeEditor Code for Editing Factual Knowledge in Language Models (https://arxiv.org/abs/2104.08164). @inproceedings{decao2021editing, title={Ed

Nicola De Cao 86 Nov 28, 2022
Sentence Embeddings with BERT & XLNet

Sentence Transformers: Multilingual Sentence Embeddings using BERT / RoBERTa / XLM-RoBERTa & Co. with PyTorch This framework provides an easy method t

Ubiquitous Knowledge Processing Lab 9.1k Jan 02, 2023
๐ŸŽ a python library for doing approximate and phonetic matching of strings.

jellyfish Jellyfish is a python library for doing approximate and phonetic matching of strings. Written by James Turk James Turk 1.8k Dec 21, 2022

Simple Speech to Text, Text to Speech

Simple Speech to Text, Text to Speech 1. Download Repository Opsi 1 Download repository ini, extract di lokasi yang diinginkan Opsi 2 Jika sudah famil

Habib Abdurrasyid 5 Dec 28, 2021
Prompt tuning toolkit for GPT-2 and GPT-Neo

mkultra mkultra is a prompt tuning toolkit for GPT-2 and GPT-Neo. Prompt tuning injects a string of 20-100 special tokens into the context in order to

61 Jan 01, 2023
Code for Findings of ACL 2022 Paper "Sentiment Word Aware Multimodal Refinement for Multimodal Sentiment Analysis with ASR Errors"

SWRM Code for Findings of ACL 2022 Paper "Sentiment Word Aware Multimodal Refinement for Multimodal Sentiment Analysis with ASR Errors" Clone Clone th

14 Jan 03, 2023
NeuTex: Neural Texture Mapping for Volumetric Neural Rendering

NeuTex: Neural Texture Mapping for Volumetric Neural Rendering Paper: https://arxiv.org/abs/2103.00762 Running Run on the provided DTU scene cd run ba

Fanbo Xiang 68 Jan 06, 2023
NLP made easy

GluonNLP: Your Choice of Deep Learning for NLP GluonNLP is a toolkit that helps you solve NLP problems. It provides easy-to-use tools that helps you l

Distributed (Deep) Machine Learning Community 2.5k Jan 04, 2023
Sapiens is a human antibody language model based on BERT.

Sapiens: Human antibody language model ____ _ / ___| __ _ _ __ (_) ___ _ __ ___ \___ \ / _` | '_ \| |/ _ \ '

Merck Sharp & Dohme Corp. a subsidiary of Merck & Co., Inc. 13 Nov 20, 2022
Implementation of ProteinBERT in Pytorch

ProteinBERT - Pytorch (wip) Implementation of ProteinBERT in Pytorch. Original Repository Install $ pip install protein-bert-pytorch Usage import torc

Phil Wang 92 Dec 25, 2022
Question and answer retrieval in Turkish with BERT

trfaq Google supported this work by providing Google Cloud credit. Thank you Google for supporting the open source! ๐ŸŽ‰ What is this? At this repo, I'm

M. Yusuf Sarฤฑgรถz 13 Oct 10, 2022
A Python 3.6+ package to run .many files, where many programs written in many languages may exist in one file.

RunMany Intro | Installation | VSCode Extension | Usage | Syntax | Settings | About A tool to run many programs written in many languages from one fil

6 May 22, 2022
Predict the spans of toxic posts that were responsible for the toxic label of the posts

toxic-spans-detection An attempt at the SemEval 2021 Task 5: Toxic Spans Detection. The Toxic Spans Detection task of SemEval2021 required participant

Ilias Antonopoulos 3 Jul 24, 2022
A Telegram bot to add notes to Flomo.

flomo bot ไฝฟ็”จ Telegram ๆœบๅ™จไบบๅ‘้€็ฌ”่ฎฐๅˆฐไฝ ็š„ Flomo. ไฝ ้œ€่ฆๆœ‰ไธ€ๅฐๅฏ่ฎฟ้—ฎ Telegram ็š„ๆœๅŠกๅ™จใ€‚ Steps @BotFather ๆ–ฐๅปบๆœบๅ™จไบบ๏ผŒ่Žทๅ– token Flomo ๅฎ˜็ฝ‘่Žทๅ– API๏ผŒ้“พๆŽฅ https://flomoapp.com/mine?source=in

Zhen 44 Dec 30, 2022
NLPIR tutorial: pretrain for IR. pre-train on raw textual corpus, fine-tune on MS MARCO Document Ranking

pretrain4ir_tutorial NLPIR tutorial: pretrain for IR. pre-train on raw textual corpus, fine-tune on MS MARCO Document Ranking ็”จไฝœNLPIRๅฎž้ชŒๅฎค, Pre-training

ZYMa 12 Apr 07, 2022
pyupbit ๋ผ์ด๋ธŒ๋Ÿฌ๋ฆฌ๋ฅผ ํ™œ์šฉํ•˜์—ฌ upbit์—์„œ ๋น„ํŠธ์ฝ”์ธ์„ ์ž๋™๋งค๋งคํ•˜๋Š” ์ฝ”๋“œ์ž…๋‹ˆ๋‹ค. ์กฐ์ฝ”๋”ฉ ์œ ํŠœ๋ธŒ ์ฑ„๋„์—์„œ ์ž์„ธํ•œ ๊ฐ•์˜ ์˜์ƒ์„ ๋ณด์‹ค ์ˆ˜ ์žˆ์Šต๋‹ˆ๋‹ค.

ํŒŒ์ด์ฌ ๋น„ํŠธ์ฝ”์ธ ํˆฌ์ž ์ž๋™ํ™” ๊ฐ•์˜ ์ฝ”๋“œ by ์œ ํŠœ๋ธŒ ์กฐ์ฝ”๋”ฉ ์ฑ„๋„ pyupbit ๋ผ์ด๋ธŒ๋Ÿฌ๋ฆฌ๋ฅผ ํ™œ์šฉํ•˜์—ฌ upbit ๊ฑฐ๋ž˜์†Œ์—์„œ ๋น„ํŠธ์ฝ”์ธ ์ž๋™๋งค๋งค๋ฅผ ํ•˜๋Š” ์ฝ”๋“œ์ž…๋‹ˆ๋‹ค. ํŒŒ์ผ ๊ตฌ์„ฑ test.py : ์ž”๊ณ  ์กฐํšŒ (1๊ฐ•) backtest.py : ๋ฐฑํ…Œ์ŠคํŒ… ์ฝ”๋“œ (2๊ฐ•) bestK.p

์กฐ์ฝ”๋”ฉ JoCoding 186 Dec 29, 2022
Example code for "Real-World Natural Language Processing"

Real-World Natural Language Processing This repository contains example code for the book "Real-World Natural Language Processing." AllenNLP (2.5.0 or

Masato Hagiwara 303 Dec 17, 2022
Summarization, translation, sentiment-analysis, text-generation and more at blazing speed using a T5 version implemented in ONNX.

Summarization, translation, Q&A, text generation and more at blazing speed using a T5 version implemented in ONNX. This package is still in alpha stag

Abel 211 Dec 28, 2022
Visual Automata is a Python 3 library built as a wrapper for Caleb Evans' Automata library to add more visualization features.

Visual Automata Copyright 2021 Lewi Lie Uberg Released under the MIT license Visual Automata is a Python 3 library built as a wrapper for Caleb Evans'

Lewi Uberg 55 Nov 17, 2022
Chinese segmentation library

What is loso? loso is a Chinese segmentation system written in Python. It was developed by Victor Lin ( Fang-Pen Lin 82 Jun 28, 2022