A Japanese Medical Information Extraction Toolkit

Last update: Dec 12, 2022


Overview

JaMIE: a Japanese Medical Information Extraction toolkit

Joint Japanese Medical Problem, Modality and Relation Recognition

The Train/Test phases require all train, dev, and test files to be converted to CONLL-style. See data_converter.py for the conversion.

Installation (python3.8)

git clone https://github.com/racerandom/JaMIE.git
cd JaMIE

Required Python packages

pip install -r requirements.txt

Morphological analyzers required:

jumanpp
mecab (juman-dict)

Pretrained BERT required:

NICT-BERT (NICT_BERT-base_JapaneseWikipedia_32K_BPE)
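
Before training or converting data, it can help to confirm that the morphological analyzers are discoverable. The following is a minimal sketch, not part of JaMIE: it assumes the executables are named `jumanpp` and `mecab` and live on `PATH`, and the helper name `find_missing_tools` is hypothetical.

```python
import shutil

# Assumed executable names; adjust if your installation uses different ones.
REQUIRED_TOOLS = ("jumanpp", "mecab")


def find_missing_tools(tools=REQUIRED_TOOLS):
    """Return the tools from `tools` that cannot be found on PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]


if __name__ == "__main__":
    missing = find_missing_tools()
    if missing:
        print("Missing from PATH:", ", ".join(missing))
    else:
        print("All required analyzers were found on PATH.")
```

This only checks that the executables exist; it does not verify that the juman dictionary for mecab or the NICT-BERT files are in place.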

Train:

CUDA_VISIBLE_DEVICES=$SEED python clinical_joint.py \
--pretrained_model $PRETRAINED_BERT \
--train_file $TRAIN_FILE \
--dev_file $DEV_FILE \
--dev_output $DEV_OUT \
--saved_model $MODEL_DIR_TO_SAVE \
--enc_lr 2e-5 \
--batch_size 4 \
--warmup_epoch 2 \
--num_epoch 20 \
--do_train \
--fp16 # requires apex
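
When launching training from a script rather than a shell, the same command can be assembled as an argument list. This is a minimal sketch under stated assumptions: the flags and values are the ones shown in the command above, while the helper names `build_train_command` and `run_training`, the example paths, and the default GPU id `"0"` are hypothetical.

```python
import os
import shlex
import subprocess


def build_train_command(pretrained_model, train_file, dev_file, dev_output,
                        saved_model, use_fp16=False):
    """Assemble the clinical_joint.py training command as an argument list."""
    cmd = [
        "python", "clinical_joint.py",
        "--pretrained_model", pretrained_model,
        "--train_file", train_file,
        "--dev_file", dev_file,
        "--dev_output", dev_output,
        "--saved_model", saved_model,
        "--enc_lr", "2e-5",
        "--batch_size", "4",
        "--warmup_epoch", "2",
        "--num_epoch", "20",
        "--do_train",
    ]
    if use_fp16:
        cmd.append("--fp16")  # requires apex
    return cmd


def run_training(cmd, gpu_id="0"):
    """Run the command with CUDA_VISIBLE_DEVICES set to `gpu_id`."""
    env = dict(os.environ, CUDA_VISIBLE_DEVICES=gpu_id)
    return subprocess.run(cmd, env=env, check=True)


if __name__ == "__main__":
    # Hypothetical paths, for illustration only.
    command = build_train_command(
        "path/to/NICT_BERT", "train.conll", "dev.conll", "dev.out", "model_dir"
    )
    # Print the command instead of running it; call run_training(command)
    # from the JaMIE directory to actually start training.
    print(shlex.join(command))
```

Passing a list to `subprocess.run` avoids shell quoting issues with paths that contain spaces.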

The models trained on radiography interpretation reports of Lung Cancer (LC) and general medical reports of Idiopathic Pulmonary Fibrosis (IPF) will be made available: link1, link2.

Test:

CUDA_VISIBLE_DEVICES=$SEED python clinical_joint.py \
--saved_model $SAVED_MODEL \
--test_file $TEST_FILE \
--test_output $TEST_OUT \
--batch_size 4

Batch Converter from XML (or raw text) to CONLL for Train/Test

Convert XML files to CONLL files for Train/Test. You can also convert raw text to CONLL-style for Test.

# --cv_num: 5 means 5-fold cross-validation; 0 generates a single CONLL file
# --doc_level: generate document-level CONLL files ([SEP] denotes sentence boundaries) instead of sentence-level ones
# --segmenter: currently, please use mecab together with NICT BERT
python data_converter.py \
--mode xml2conll \
--xml $XML_FILES_DIR \
--conll $OUTPUT_CONLL_DIR \
--cv_num 5 \
--doc_level \
--segmenter mecab \
--bert_dir $PRETRAINED_BERT
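
For readers unfamiliar with k-fold cross-validation, the idea behind `--cv_num 5` can be illustrated generically. This sketch is not the toolkit's implementation: how data_converter.py assigns documents to folds is not described here, and the function `k_fold_splits` and the file names are hypothetical.

```python
def k_fold_splits(items, k):
    """Yield (train, test) lists for k-fold cross-validation over `items`."""
    if k < 2:
        raise ValueError("k must be at least 2")
    # Deal items round-robin into k folds so fold sizes differ by at most one.
    folds = [items[i::k] for i in range(k)]
    for i in range(k):
        test = folds[i]
        train = [x for j, fold in enumerate(folds) if j != i for x in fold]
        yield train, test


if __name__ == "__main__":
    # Hypothetical document names, for illustration only.
    docs = [f"doc{i}.xml" for i in range(10)]
    for fold_id, (train, test) in enumerate(k_fold_splits(docs, 5)):
        print(f"fold {fold_id}: {len(train)} train docs, {len(test)} test docs")
```

Each document appears in exactly one test fold, so every document is evaluated once across the five runs.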

Batch Converter from predicted CONLL to XML

python data_converter.py \
--mode conll2xml \
--xml $XML_FILES_DIR \
--conll $OUTPUT_CONLL_DIR

Citation

If you use our code in your research, please cite our work:

@inproceedings{cheng2021jamie,
   title={JaMIE: A Pipeline Japanese Medical Information Extraction System},
   author={Fei Cheng and Shuntaro Yada and Ribeka Tanaka and Eiji Aramaki and Sadao Kurohashi},
   booktitle={arXiv},
   year={2021}
}
