Implementation of legal QA system based on SentenceKoBART

Last update: Dec 27, 2022

Overview

LegalQA using SentenceKoBART

- How to train SentenceKoBART
- Based on the neural search engine Jina
- Provides Korean legal QA data (1,830 pairs)
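The following toy Bash/awk sketch illustrates the general idea behind neural (dense) search. Each stored answer has an embedding vector, and a query is answered by ranking the stored vectors by similarity to the query vector and keeping the best top_k. Several assumptions apply here. The vectors are made-up stand-ins for SentenceKoBART embeddings, rank_by_cosine is a hypothetical helper, and cosine similarity is an assumed metric. This sketch does not describe how Jina or LegalQA actually implement search.

```shell
#!/usr/bin/env bash
# Toy dense-retrieval ranking (illustration only, not LegalQA's code).
# Vectors are hypothetical stand-ins for SentenceKoBART embeddings, and
# cosine similarity is an assumed scoring function.

# rank_by_cosine QUERY_VEC TOP_K < INDEX_FILE
#   QUERY_VEC : space-separated floats, e.g. "1 0 0"
#   INDEX_FILE: one entry per line, "<id> <v1> <v2> ... <vn>"
# Prints "<id> <score>" for the TOP_K entries most similar to the query.
rank_by_cosine() {
  local query="$1" top_k="$2"
  awk -v q="$query" '
    BEGIN {
      # Split the query vector and precompute its Euclidean norm.
      n = split(q, qv, " ")
      for (i = 1; i <= n; i++) qn += qv[i] * qv[i]
      qn = sqrt(qn)
    }
    # Only score lines whose dimension matches the query.
    NF == n + 1 {
      dot = 0; dn = 0
      for (i = 1; i <= n; i++) {
        dot += qv[i] * $(i + 1)
        dn  += $(i + 1) * $(i + 1)
      }
      if (qn > 0 && dn > 0) printf "%s %.4f\n", $1, dot / (qn * sqrt(dn))
    }
  ' | sort -k2,2nr | head -n "$top_k"
}
```

For example, printf 'a1 1 0 0\na2 0.7 0.7 0\na3 0 0 1\n' | rank_by_cosine "1 0 0" 2 keeps the two entries closest to the query (a1, then a2). This is the same role the top_k field plays in the REST request.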

Setup

# Install Git LFS (see https://github.com/git-lfs/git-lfs/wiki/Installation)
curl -s https://packagecloud.io/install/repositories/github/git-lfs/script.deb.sh | sudo bash
sudo apt install git-lfs
git clone https://github.com/haven-jeon/LegalQA.git
cd LegalQA
git lfs pull
pip install -r requirements.txt
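If git lfs pull did not fetch a file, the checkout keeps a small Git LFS pointer file instead, and the first line of that file is "version https://git-lfs.github.com/spec/v1". The Bash sketch below detects such pointer files. It also counts records in a JSON Lines file, which holds one JSON record per line. The helper names are hypothetical. The default path data/legalqa.jsonlines is only an assumption about which file LFS tracks, so pass whichever LFS-tracked file you want to verify.

```shell
#!/usr/bin/env bash
# Post-setup sanity checks (sketch). Helper names are hypothetical; the
# default path assumes data/legalqa.jsonlines is an LFS-tracked file.

# Returns 0 if FILE is still a Git LFS pointer, i.e. the real content
# was not downloaded by `git lfs pull`.
is_lfs_pointer() {
  local file="$1"
  [ -f "$file" ] || return 1
  head -n 1 "$file" | grep -q '^version https://git-lfs.github.com/spec/v1'
}

# Counts non-empty lines; in JSON Lines each line holds one record.
count_records() {
  grep -c -v '^[[:space:]]*$' "$1"
}

# Reports the record count, or fails if the file is missing or a pointer.
check_data() {
  local file="${1:-data/legalqa.jsonlines}"
  if [ ! -f "$file" ]; then
    echo "missing: $file" >&2
    return 1
  fi
  if is_lfs_pointer "$file"; then
    echo "$file is an LFS pointer; run 'git lfs pull'" >&2
    return 1
  fi
  echo "$file: $(count_records "$file") records"
}
```

If each QA pair occupies one line (an assumption about the file layout), check_data data/legalqa.jsonlines should report 1,830 records.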

Index

python app.py -t index

GPU-based indexing is available as an option. To enable it, set the following in pods/encoder.yml:

on_gpu: true
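To switch the flag from a script rather than by hand, the following Bash sketch may help. It assumes that on_gpu appears in pods/encoder.yml as "on_gpu: true" or "on_gpu: false" on its own line, so check the file before relying on it. set_on_gpu is a hypothetical helper.

```shell
#!/usr/bin/env bash
# Sketch: set the on_gpu flag in a Jina pod config. Assumes the key is
# written as "on_gpu: true|false" on its own line; set_on_gpu is hypothetical.

set_on_gpu() {
  local value="$1" file="${2:-pods/encoder.yml}"
  case "$value" in
    true|false) ;;
    *) echo "value must be true or false" >&2; return 2 ;;
  esac
  grep -q '^[[:space:]]*on_gpu:' "$file" || {
    echo "no on_gpu key in $file" >&2
    return 1
  }
  # Rewrite via a temp file (sed -i differs between GNU and BSD sed) and
  # copy back with cat so the original file permissions are kept.
  local tmp
  tmp="$(mktemp)" || return 1
  sed -E "s/^([[:space:]]*on_gpu:)[[:space:]]*(true|false)/\1 $value/" "$file" > "$tmp" \
    && cat "$tmp" > "$file"
  rm -f "$tmp"
}
```

Usage: set_on_gpu true (or set_on_gpu false to go back to CPU) before running python app.py -t index. The helper preserves indentation, so the key stays at the same nesting level in the YAML.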

Search

With REST API

To start the Jina server with the REST API:

python app.py -t query_restful
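The server may need some time to load the model before it accepts connections. The Bash sketch below waits until the port is open before you send queries. wait_for_port is a hypothetical helper. It depends on Bash's /dev/tcp redirection, so it will not run under plain sh. Port 1234 comes from the example URL.

```shell
#!/usr/bin/env bash
# Sketch: wait until HOST:PORT accepts TCP connections, or give up after
# TIMEOUT seconds. Requires Bash (/dev/tcp is a Bash feature).

wait_for_port() {
  local host="${1:-127.0.0.1}" port="${2:-1234}" timeout="${3:-60}"
  local deadline=$(( SECONDS + timeout ))
  while (( SECONDS < deadline )); do
    # Opening /dev/tcp/HOST/PORT succeeds only if the connection is
    # accepted; the subshell keeps a failed exec from exiting our shell.
    if (exec 3<>"/dev/tcp/$host/$port") 2>/dev/null; then
      return 0
    fi
    sleep 1
  done
  echo "timed out waiting for $host:$port" >&2
  return 1
}
```

Usage: python app.py -t query_restful & wait_for_port 127.0.0.1 1234 120. This starts the server in the background and returns once it is listening, or after 120 seconds.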

Then use a client to query:

curl --request POST -d '{"top_k": 1, "mode": "search",  "data": ["상속 관련 문의"]}' -H 'Content-Type: application/json' 'http://0.0.0.0:1234/api/search'
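The Bash sketch below wraps the curl request above in a function. The payload fields (top_k, mode, data) and the /api/search endpoint are copied from the example request. json_escape, build_payload, and legalqa_search are hypothetical helpers. Escaping the query keeps quotes or backslashes in the question from producing invalid JSON.

```shell
#!/usr/bin/env bash
# Sketch of a client wrapper for the REST search endpoint. Field names and
# the URL are taken from the example curl request; helpers are hypothetical.

# Escape a string for use inside a JSON double-quoted value
# (backslash, double quote, newline, tab, carriage return).
json_escape() {
  local s="$1"
  s=${s//\\/\\\\}
  s=${s//\"/\\\"}
  s=${s//$'\n'/\\n}
  s=${s//$'\t'/\\t}
  s=${s//$'\r'/\\r}
  printf '%s' "$s"
}

# Build the request body: {"top_k": N, "mode": "search", "data": ["QUERY"]}
build_payload() {
  local top_k="$1" query="$2"
  printf '{"top_k": %d, "mode": "search", "data": ["%s"]}' \
    "$top_k" "$(json_escape "$query")"
}

# POST one query to the search endpoint and print the raw JSON response.
legalqa_search() {
  local query="$1" top_k="${2:-1}" url="${3:-http://0.0.0.0:1234/api/search}"
  curl --silent --request POST \
    -H 'Content-Type: application/json' \
    -d "$(build_payload "$top_k" "$query")" \
    "$url"
}
```

Usage: legalqa_search "상속 관련 문의" 3 sends the same query as the example, but asks for three results.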

Alternatively, use Jinabox with the endpoint http://127.0.0.1:1234/api/search.

From the terminal

python app.py -t query

Demo

http://ec2-3-36-123-253.ap-northeast-2.compute.amazonaws.com:7874/

Citation

Model training, data crawling, and the demo system were all supported by the AWS Hero program.

@misc{heewon2021,
author = {Heewon Jeon},
title = {LegalQA using SentenceKoBART},
publisher = {GitHub},
journal = {GitHub repository},
howpublished = {\url{https://github.com/haven-jeon/LegalQA}}
}

License

The QA data in data/legalqa.jsonlines was crawled from www.freelawfirm.co.kr in accordance with its robots.txt. The data may be used for academic purposes only; commercial use is prohibited.
We are not responsible for any legal decisions made based on the resources provided here.

Owner

Heewon Jeon (gogamza)