Converting Tagtog annotations into a CSV dataset

Last update: Dec 28, 2021

Overview

tagtog_relation_extraction

A tool for converting Tagtog annotations into a CSV dataset for relation extraction.

How to Use

On Tagtog

1. Go to Project > Downloads
2. Download all documents, using the button below

On Local

1. Place folders and files according to the structure specified below:

tagtog_relation_extraction
├──main.py
├──util.py
├──.gitignore
├──README.md
├──requirements.txt
└──Your_download_file_Name
   ├──annotations-legend.json
   ├──ann.json
   │  └──master
   │     └──pool/
   ├──plain.html
   │  └──pool/
   ├──guidelines.md
   └──README.md

2. Install the required packages:

tqdm==4.62.3
pandas==1.1.5
beautifulsoup4==4.10.0

$ pip install -r $ROOT/tagtog_relation_extraction/requirements.txt

3. Run

$ python main.py --path Your_download_file_Name
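Before running main.py, it can help to confirm that the download folder matches the layout from step 1. The following is a minimal sketch, not part of this repository: the function name `missing_entries` and the list `EXPECTED_ENTRIES` are hypothetical, and the checked paths are taken only from the directory tree above.

```python
from pathlib import Path

# Entries that the directory tree in step 1 lists inside the Tagtog download folder.
# This list is an assumption derived from that tree, not from main.py itself.
EXPECTED_ENTRIES = [
    "annotations-legend.json",
    "ann.json/master/pool",
    "plain.html/pool",
]


def missing_entries(download_dir):
    """Return the expected entries that are absent from download_dir."""
    root = Path(download_dir)
    return [entry for entry in EXPECTED_ENTRIES if not (root / entry).exists()]


# Example usage (hypothetical folder name):
# problems = missing_entries("Your_download_file_Name")
# if problems:
#     print("Missing from download folder:", problems)
```

An empty list means every expected entry was found; anything else names the paths to fix before running step 3.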

Result

1. Dataset file (dataset.csv)

A CSV file whose rows follow the KLUE dataset format.
Example:

sentence: 가장 가능성이 높은 새 대안은 플랑크 상수를 통해 질량을 정의하는 방안이다.질량의 단위는 킬로그램 외에도 여러가지가 있는데, 그중 대표적인 단위가 바로 원자질량단위이다
sub_tag: {'word': '원자질량단위', 'start_idx': 85, 'end_idx': 90, 'type': 'POH'}
obj_tag: {'word': '플랑크 상수', 'start_idx': 17, 'end_idx': 22, 'type': 'POH'}
label: POH:no_relation

2. File for checking answers (answer_check.csv)

A CSV file designed for checking entity taggings and labels.
Example:

sentence: 가장 가능성이 높은 새 대안은 를 통해 질량을 정의하는 방안이다.질량의 단위는 킬로그램 외에도 여러가지가 있는데, 그중 대표적인 단위가 바로 이다
sub_tag: POH
obj_tag: POH
label: POH:no_relation
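The dataset.csv example above suggests that each tag's `start_idx` and `end_idx` are character offsets into the sentence, with `end_idx` inclusive. The sketch below checks that reading against the example row. It assumes the tag cells are stored as Python dict literals (as they appear in the example); the helper name `tag_matches_sentence` is hypothetical and not part of this repository.

```python
import ast


def tag_matches_sentence(sentence, tag_cell):
    """Check that the tag's word equals the sentence slice it points to.

    tag_cell is the text of a sub_tag or obj_tag cell, assumed to be a
    Python dict literal. end_idx is treated as inclusive (an assumption
    based on the example row).
    """
    tag = ast.literal_eval(tag_cell)
    start, end = tag["start_idx"], tag["end_idx"]
    return sentence[start:end + 1] == tag["word"]


# Values copied from the dataset.csv example.
sentence = "가장 가능성이 높은 새 대안은 플랑크 상수를 통해 질량을 정의하는 방안이다.질량의 단위는 킬로그램 외에도 여러가지가 있는데, 그중 대표적인 단위가 바로 원자질량단위이다"
sub_tag = "{'word': '원자질량단위', 'start_idx': 85, 'end_idx': 90, 'type': 'POH'}"
obj_tag = "{'word': '플랑크 상수', 'start_idx': 17, 'end_idx': 22, 'type': 'POH'}"

print(tag_matches_sentence(sentence, sub_tag))
print(tag_matches_sentence(sentence, obj_tag))
```

Running the same check over every row is a quick way to spot offsets that drifted during annotation or export.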

Restrictions

Entity labels must follow this form:

SUBJ-{ENT_TYPE}-{RELATION_NAME}
OBJ-{ENT_TYPE}-{RELATION_NAME}

If your labels use a different form, you may need to modify util.py accordingly.
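To illustrate the label form, here is a minimal sketch of how such a label could be split into its parts. It is not the code in util.py; the function name `parse_entity_label` is hypothetical, and the choice to allow hyphens inside the relation name is an assumption.

```python
def parse_entity_label(label):
    """Split 'SUBJ-{ENT_TYPE}-{RELATION_NAME}' or 'OBJ-...' into its three parts.

    Returns a (role, ent_type, relation) tuple and raises ValueError when the
    label does not follow the expected form.
    """
    # maxsplit=2 keeps any further hyphens inside the relation name.
    parts = label.split("-", 2)
    if len(parts) != 3 or parts[0] not in ("SUBJ", "OBJ") or not all(parts):
        raise ValueError(f"unexpected entity label: {label!r}")
    role, ent_type, relation = parts
    return role, ent_type, relation


# Example usage:
# parse_entity_label("SUBJ-POH-no_relation")  gives ("SUBJ", "POH", "no_relation")
```

Labels that fail this check are the ones most likely to need handling of their own in util.py.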

Owner

hyeong
