Natural Language Processing for Adverse Drug Reaction (ADR) Detection

This repo contains code from a project to identify ADRs in discharge summaries at Austin Health. The model uses the HuggingFace Transformers library, beginning with the pretrained DeBERTa model. Further masked language modelling (MLM) pre-training is performed on a large corpus of unannotated discharge summaries. Finally, fine-tuning is performed on a corpus of annotated discharge summaries (annotated using Prodigy). The model performs NER, but final performance is measured at the document level: each document is scored by its maximum token-level score.
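The document-level scoring described above can be illustrated with a minimal sketch. It assumes each token already has a single ADR probability; the function names and the 0.5 threshold are hypothetical and are not taken from the repo.

```python
from typing import Sequence


def document_score(token_scores: Sequence[float]) -> float:
    """Return the highest token-level ADR score, or 0.0 for an empty document."""
    return max(token_scores, default=0.0)


def document_prediction(token_scores: Sequence[float], threshold: float = 0.5) -> bool:
    """Flag the document when its score reaches the (assumed) threshold."""
    return document_score(token_scores) >= threshold


# Hypothetical per-token ADR probabilities for one discharge summary.
scores = [0.02, 0.10, 0.91, 0.40]
print(document_score(scores))
print(document_prediction(scores))
```

Taking the maximum means a single confidently tagged token is enough to flag the whole summary, which suits a screening task where the question is whether any ADR is mentioned.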

We used Weights and Biases for experiment tracking.

The pretrain script takes a folder of discharge summaries stored in CSV files, tokenizes them, and continues MLM training from deberta-base.
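As a rough illustration of the first step, the sketch below gathers the summary text from every CSV file in a folder using only the standard library. The function name and the `text` column name are assumptions; the actual pretrain script may read its data differently.

```python
import csv
from pathlib import Path
from typing import List


def load_summaries(folder: str, text_column: str = "text") -> List[str]:
    """Collect non-empty summaries from every CSV file in `folder`.

    `text_column` is an assumed column name; change it to match your export.
    """
    texts: List[str] = []
    for csv_path in sorted(Path(folder).glob("*.csv")):
        with open(csv_path, newline="", encoding="utf-8") as handle:
            for row in csv.DictReader(handle):
                text = (row.get(text_column) or "").strip()
                if text:
                    texts.append(text)
    return texts
```

The resulting list of strings is what would then be tokenized and passed to MLM training.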

Fine-tuning can then be performed with the finetune script using CLI commands. This script assumes the data is either a JSONL file of annotated text exported from Prodigy (--datafile example.jsonl) or a saved HuggingFace Dataset. If you run this script once on a JSONL file of annotations, you can choose to save the Dataset into a folder (--save_data_dir "save_to_here") and use that folder for subsequent training runs (--datafile "save_to_here").
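To show what the JSONL input looks like in practice, here is a minimal sketch that turns Prodigy-style NER records into per-token labels. It assumes each record carries `text`, `tokens` and `spans` with character offsets, as in Prodigy's standard NER export. The BIO tagging scheme, the `ADR` label name and the function name are assumptions, not details confirmed by the repo.

```python
import json
from typing import Dict, Iterable, Iterator, List


def prodigy_to_bio(lines: Iterable[str]) -> Iterator[Dict[str, List[str]]]:
    """Yield tokens and BIO labels for each accepted Prodigy record."""
    for line in lines:
        line = line.strip()
        if not line:
            continue
        record = json.loads(line)
        # Skip examples the annotator rejected or ignored.
        if record.get("answer", "accept") != "accept":
            continue
        tokens = record["tokens"]
        labels = ["O"] * len(tokens)
        for span in record.get("spans", []):
            # Tokens whose character range lies inside the annotated span.
            inside = [
                i for i, tok in enumerate(tokens)
                if tok["start"] >= span["start"] and tok["end"] <= span["end"]
            ]
            for position, i in enumerate(inside):
                prefix = "B" if position == 0 else "I"
                labels[i] = f"{prefix}-{span['label']}"
        yield {"tokens": [tok["text"] for tok in tokens], "labels": labels}


# Hypothetical record in the shape of a Prodigy NER export.
example = json.dumps({
    "text": "Rash after penicillin",
    "tokens": [
        {"text": "Rash", "start": 0, "end": 4, "id": 0},
        {"text": "after", "start": 5, "end": 10, "id": 1},
        {"text": "penicillin", "start": 11, "end": 21, "id": 2},
    ],
    "spans": [{"start": 0, "end": 4, "label": "ADR"}],
    "answer": "accept",
})
print(list(prodigy_to_bio([example])))
```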

Example usage:

python .\finetune.py --folds 5 --epochs 15 --lr 5e-5 --wandb_on --hub_off --project 'CLI Tests' --run_name cross-validation --datafile 'data'
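For readers unfamiliar with the flags in the command above, the sketch below shows one way they could be parsed with `argparse`. The flag names come from this README; the types, defaults and help text are assumptions and may not match finetune.py.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build a parser for the flags shown in this README (types and defaults assumed)."""
    parser = argparse.ArgumentParser(description="Fine-tune an NER model for ADR detection.")
    parser.add_argument("--datafile", required=True,
                        help="Prodigy JSONL export or a folder holding a saved Dataset")
    parser.add_argument("--save_data_dir", default=None,
                        help="folder in which to save the processed Dataset")
    parser.add_argument("--folds", type=int, default=1, help="number of cross-validation folds")
    parser.add_argument("--epochs", type=int, default=15, help="training epochs per fold")
    parser.add_argument("--lr", type=float, default=5e-5, help="learning rate")
    parser.add_argument("--wandb_on", action="store_true", help="log the run to Weights and Biases")
    parser.add_argument("--hub_off", action="store_true", help="do not push to the HuggingFace Hub")
    parser.add_argument("--project", default=None, help="Weights and Biases project name")
    parser.add_argument("--run_name", default=None, help="name for this run")
    return parser


# Parse the same arguments as the example command.
args = build_parser().parse_args([
    "--folds", "5", "--epochs", "15", "--lr", "5e-5", "--wandb_on", "--hub_off",
    "--project", "CLI Tests", "--run_name", "cross-validation", "--datafile", "data",
])
print(args.folds, args.epochs, args.lr, args.project)
```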

Note: you might find that your exported annotations file (JSONL) is not encoded in UTF-8, which will prevent this code from working. There are various ways to change the encoding, all of which can be found with a quick Google search. On a Windows machine, for example, adapt the following PowerShell command:

Get-Content .\name_of_file.jsonl -Encoding Unicode | Set-Content -Encoding UTF8 .\name_of_new_file.jsonl
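On other platforms, the same conversion can be done in Python. This is a minimal sketch that assumes the export is UTF-16, which is what PowerShell calls "Unicode"; the function name is hypothetical.

```python
def reencode_to_utf8(source: str, target: str, source_encoding: str = "utf-16") -> None:
    """Rewrite `source` as UTF-8 at `target`, decoding it with `source_encoding`."""
    # newline="" keeps the original line endings unchanged in both directions.
    with open(source, "r", encoding=source_encoding, newline="") as src:
        text = src.read()
    with open(target, "w", encoding="utf-8", newline="") as dst:
        dst.write(text)


# Example call (file names as in the PowerShell command above):
# reencode_to_utf8("name_of_file.jsonl", "name_of_new_file.jsonl")
```

If the file turns out to use a different encoding, pass it as `source_encoding`.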

Owner

Medicines Optimisation Service - Austin Health
