The aim of this task is to predict someone's English proficiency based on a text input.

Last update: Dec 13, 2021

Overview

English_proficiency_prediction_NLP

The project uses the NICT JLE Corpus, available here: https://alaginrc.nict.go.jp/nict_jle/index_E.html

The corpus consists of transcripts of audio-recorded speech samples from 1,281 participants (1.2 million words, 300 hours in total) in an English oral proficiency interview test. Based on this test, each participant received an SST (Standard Speaking Test) score between 1 (low proficiency) and 9 (high proficiency).

The goal is to build a machine learning model that predicts each participant's SST score from their transcript.

Steps:

1 - Pre-process the dataset: extract the participant's transcript (the content of all of the participant's tags). Within the participant's transcript, remove all other tags and keep only the English words.

2 - Process the dataset: extract features with the Bag of Words (BoW) technique.

3 - Train a classifier to predict the SST score.

4 - Compute the accuracy of your system (the proportion of participants classified correctly) and plot the confusion matrix.

5 - Try to improve your system (for example, by using GloVe word embeddings instead of BoW).
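Step 1 can be sketched with the standard `re` module alone. This is a minimal sketch under an assumption to check against the corpus documentation: that the participant's (interviewee's) turns are wrapped in `<B>...</B>` and the interviewer's in `<A>...</A>`, as in the NICT JLE tag set. The tag name is a parameter in case the files differ, and `extract_participant_text` is a hypothetical helper name, not part of this repository.

```python
import re


# Assumption: the participant's (interviewee's) turns are wrapped in <B>...</B>,
# as in the NICT JLE tag set; change the tag argument if your files differ.
def extract_participant_text(raw: str, tag: str = "B") -> str:
    """Return the participant's words, lower-cased, with all markup removed."""
    # Collect the content of every participant turn (non-greedy, across newlines).
    turns = re.findall(rf"<{tag}>(.*?)</{tag}>", raw, flags=re.DOTALL)
    text = " ".join(turns)
    # Drop any other tags nested inside those turns but keep their text.
    text = re.sub(r"<[^>]+>", " ", text)
    # Keep only English-looking words (ASCII letters, optional apostrophe part).
    words = re.findall(r"[A-Za-z]+(?:'[A-Za-z]+)?", text)
    return " ".join(word.lower() for word in words)


# Made-up example, not taken from the corpus:
sample = "<A>How are you?</A><B>I'm <F>er</F> fine, thank you.</B>"
print(extract_participant_text(sample))  # i'm er fine thank you
```

Text inside nested tags is kept and only the markup is dropped; if the content of some tags should be discarded entirely, remove those spans before the final `re.sub`.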
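The Bag of Words features in step 2 amount to a vocabulary built from the training transcripts plus one word-count vector per transcript. The sketch below shows the idea without third-party libraries (the function names are hypothetical); scikit-learn's `CountVectorizer` implements the same technique with more options. It assumes transcripts have already been cleaned into space-separated lower-case words.

```python
from collections import Counter


def build_vocabulary(documents: list[str]) -> dict[str, int]:
    """Map every word seen in the training documents to a column index."""
    words = sorted({word for doc in documents for word in doc.split()})
    return {word: i for i, word in enumerate(words)}


def bag_of_words(document: str, vocab: dict[str, int]) -> list[int]:
    """Count how often each vocabulary word occurs; unknown words are ignored."""
    vector = [0] * len(vocab)
    for word, count in Counter(document.split()).items():
        if word in vocab:
            vector[vocab[word]] = count
    return vector


vocab = build_vocabulary(["i like tea", "i like like coffee"])
print(bag_of_words("i like like coffee", vocab))  # [1, 1, 2, 0]
```

Building the vocabulary on the training transcripts only keeps test-set words from leaking into the features; words first seen at test time simply get no column.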
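For step 3, one simple baseline for word-count features is multinomial Naive Bayes. The sketch below implements it from scratch with Laplace smoothing (`alpha`), taking count vectors and integer SST levels (1 to 9) as class labels. It is an illustrative choice, not necessarily the classifier used in this repository; scikit-learn's `MultinomialNB` is the off-the-shelf equivalent.

```python
import math
from collections import defaultdict


def train_naive_bayes(X: list[list[int]], y: list[int], alpha: float = 1.0):
    """Fit multinomial Naive Bayes: per class, a log prior and per-word log likelihoods."""
    n_features = len(X[0])
    class_counts = defaultdict(int)
    word_totals = defaultdict(lambda: [0] * n_features)
    for vector, label in zip(X, y):
        class_counts[label] += 1
        totals = word_totals[label]
        for j, count in enumerate(vector):
            totals[j] += count
    model = {}
    for label, n_docs in class_counts.items():
        totals = word_totals[label]
        # Laplace smoothing: add alpha to every word count of this class.
        denom = sum(totals) + alpha * n_features
        log_prior = math.log(n_docs / len(y))
        log_likelihoods = [math.log((c + alpha) / denom) for c in totals]
        model[label] = (log_prior, log_likelihoods)
    return model


def predict(model, vector: list[int]) -> int:
    """Return the SST level with the highest posterior log-probability."""
    def score(label: int) -> float:
        log_prior, log_likelihoods = model[label]
        return log_prior + sum(c * lp for c, lp in zip(vector, log_likelihoods))
    return max(model, key=score)
```

Working in log space avoids numerical underflow when long transcripts multiply many small word probabilities together.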
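Step 4 asks for two things: accuracy, the fraction of participants whose predicted SST level equals the true one, and a 9-by-9 confusion matrix whose rows are true levels and columns are predicted levels. A minimal sketch follows (helper names are hypothetical). It prints the matrix as text; for an actual plot, the same nested list can be passed to matplotlib's `imshow` or to scikit-learn's `ConfusionMatrixDisplay`.

```python
def accuracy(y_true: list[int], y_pred: list[int]) -> float:
    """Fraction of participants whose predicted SST level is correct."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def confusion_matrix(y_true, y_pred, labels=range(1, 10)) -> list[list[int]]:
    """Rows are true SST levels, columns are predicted SST levels."""
    labels = list(labels)
    index = {label: i for i, label in enumerate(labels)}
    matrix = [[0] * len(labels) for _ in labels]
    for t, p in zip(y_true, y_pred):
        matrix[index[t]][index[p]] += 1
    return matrix


def print_confusion_matrix(matrix, labels=range(1, 10)) -> None:
    """Text rendering: one row per true level, one column per predicted level."""
    labels = list(labels)
    print("true\\pred " + " ".join(f"{label:>3}" for label in labels))
    for label, row in zip(labels, matrix):
        print(f"{label:>9} " + " ".join(f"{count:>3}" for count in row))
```

Correct predictions fall on the diagonal; because SST levels are ordered, mass just off the diagonal indicates near misses rather than gross errors.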
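For step 5, a common way to feed GloVe into a classifier is to represent each transcript by the average of the pre-trained vectors of its words. The sketch assumes a GloVe file in the standard text format (one word per line followed by its space-separated vector components), such as the files distributed by the Stanford NLP group; `load_glove` and `document_embedding` are hypothetical helper names.

```python
def load_glove(path: str) -> dict[str, list[float]]:
    """Read a GloVe text file: each line is a word followed by its vector components."""
    vectors = {}
    with open(path, encoding="utf-8") as f:
        for line in f:
            word, *values = line.rstrip().split(" ")
            vectors[word] = [float(v) for v in values]
    return vectors


def document_embedding(document: str, vectors: dict[str, list[float]], dim: int) -> list[float]:
    """Average the vectors of the words found in GloVe; all zeros if none are found."""
    known = [vectors[word] for word in document.split() if word in vectors]
    if not known:
        return [0.0] * dim
    # zip(*known) walks the vectors column by column, one dimension at a time.
    return [sum(column) / len(known) for column in zip(*known)]
```

Averaged embeddings can contain negative values, so a count-based model such as multinomial Naive Bayes no longer applies; a classifier such as logistic regression is a usual choice for these dense features.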
