This is an NLP-based project to extract the effective date of a contract from its text file.

Last update: Jan 26, 2022

Overview

Date-Extraction-from-Contracts

Problem statement

The goal of this NLP project is to identify the effective date of each contract from the contract's text data. The dates can appear in any format, for example: 01/01/2022; 1st Jan, 2022; 1st January, 2022; 01 Jan 2022.

Libraries Used

NumPy
TensorFlow
Keras
NLTK
scikit-learn
Matplotlib
pandas

Approach

Data preprocessing

A custom function was developed to preprocess the text, because conventional libraries are not designed to handle dates in a text corpus. Tokenization and vectorization were done with nltk instead of the TensorFlow or Keras text preprocessors. Preprocessing includes data cleaning (removing improper data labeling or file naming), stopword removal, punctuation removal that keeps punctuation occurring within a date (such as '/'), inserting spaces to separate dates from words (in some cases the digits of a date were joined to the preceding word), tokenization, and word vectorization. Plain word-level vectorization was used, since TF-IDF would not have made much difference for date extraction.
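The steps described above can be sketched roughly as follows. This is a minimal illustration, not the project's actual function: the names `preprocess`, `build_vocab`, and `vectorize` are hypothetical, and a tiny hard-coded stopword set stands in for the nltk stopword list so that the snippet runs without downloading any corpora.

```python
import re

# Tiny illustrative stopword set (assumption); the project used nltk,
# whose English stopword list is much larger.
STOPWORDS = {"a", "an", "and", "as", "by", "is", "of", "the", "this", "to"}


def preprocess(text):
    """Clean contract text while keeping date tokens such as 01/01/2022 intact."""
    text = text.lower()
    # Separate digits joined to a preceding word: "dated01/01/2022" -> "dated 01/01/2022".
    text = re.sub(r"(?<=[a-z])(?=\d)", " ", text)
    # Remove punctuation, but keep '/' because it can appear inside a date.
    text = re.sub(r"[^a-z0-9/\s]", " ", text)
    # Whitespace tokenization followed by stopword removal.
    return [tok for tok in text.split() if tok not in STOPWORDS]


def build_vocab(token_lists):
    """Map each distinct token to an integer id (0 = padding, 1 = unknown)."""
    vocab = {"<pad>": 0, "<unk>": 1}
    for tokens in token_lists:
        for tok in tokens:
            vocab.setdefault(tok, len(vocab))
    return vocab


def vectorize(tokens, vocab, max_len):
    """Turn tokens into a fixed-length list of ids, truncating or padding as needed."""
    ids = [vocab.get(tok, vocab["<unk>"]) for tok in tokens][:max_len]
    return ids + [vocab["<pad>"]] * (max_len - len(ids))


sample = "This Agreement is dated01/01/2022, effective as of 1st Jan, 2022."
tokens = preprocess(sample)
vocab = build_vocab([tokens])
vector = vectorize(tokens, vocab, max_len=10)
```

Here each word is simply replaced by its vocabulary index, which is one plausible reading of "word-level vectorization"; the padding length of 10 is arbitrary.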

Preprocessed data before vectorization:

Model Building

The model for this problem is an RNN with a bidirectional LSTM layer. Its input is the preprocessed data, and it has three outputs that predict the day, month, and year respectively.
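A model of this shape might look like the Keras sketch below. Only the single bidirectional LSTM layer and the three outputs come from the description above; everything else is an assumption made for illustration, including the embedding layer, all layer sizes, the `build_model` name, and the choice to treat day, month, and year as classification heads (the original may equally have used regression outputs).

```python
from tensorflow import keras
from tensorflow.keras import layers


def build_model(vocab_size=5000, max_len=50, embed_dim=64, lstm_units=64, num_years=50):
    """Bidirectional LSTM with three heads. All sizes here are assumed, not from the project."""
    tokens = keras.Input(shape=(max_len,), dtype="int32", name="tokens")
    # Learned word embeddings for the integer token ids (assumption).
    x = layers.Embedding(vocab_size, embed_dim)(tokens)
    # The single bidirectional LSTM layer mentioned in the text.
    x = layers.Bidirectional(layers.LSTM(lstm_units))(x)
    # One output per date component, each a softmax over its classes (assumption).
    day = layers.Dense(31, activation="softmax", name="day")(x)
    month = layers.Dense(12, activation="softmax", name="month")(x)
    year = layers.Dense(num_years, activation="softmax", name="year")(x)
    model = keras.Model(inputs=tokens, outputs=[day, month, year])
    model.compile(
        optimizer=keras.optimizers.Adam(),
        loss=["sparse_categorical_crossentropy"] * 3,  # one loss per head
    )
    return model
```

With classification heads, the targets would be zero-based class indices, for example day minus 1, month minus 1, and year minus some base year.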

The model was trained with a decaying learning rate starting at 0.001, for 80 epochs with a batch size of 8.
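The exact decay schedule is not stated. As one common possibility, an exponential decay starting from 0.001 can be written as below; the `decay_rate` and `decay_steps` values and the training-set size of 400 are assumptions chosen only to make the example concrete.

```python
import math


def decayed_learning_rate(step, initial_lr=0.001, decay_rate=0.96, decay_steps=100):
    """Exponential decay: initial_lr * decay_rate ** (step / decay_steps)."""
    return initial_lr * decay_rate ** (step / decay_steps)


EPOCHS = 80        # from the text
BATCH_SIZE = 8     # from the text
NUM_SAMPLES = 400  # hypothetical training-set size

steps_per_epoch = math.ceil(NUM_SAMPLES / BATCH_SIZE)
total_steps = EPOCHS * steps_per_epoch

lr_start = decayed_learning_rate(0)
lr_end = decayed_learning_rate(total_steps)
```

In Keras, the same formula is available as `keras.optimizers.schedules.ExponentialDecay`, which can be passed to an optimizer in place of a fixed learning rate.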

Model Architecture:

Results

The model performed quite well for a baseline that extracts dates using just a single bidirectional LSTM layer. The prediction file is attached for reference.

Owner

Sambhav Garg
